Experience:
Microsoft - Senior Software Engineer
Microsoft - Software Engineer (Jul 1, 2013 - Jul 2017)
University of Washington - Ph.D. (Sep 2007 - Jun 2013)
Microsoft - Research Intern (Jun 2012 - Sep 2012)
Apple - Software Engineer Intern (Jun 2011 - Sep 2011)
Education:
University of Washington 2007 - 2013
Doctorate, Doctor of Philosophy (Ph.D.)
Skills:
Image Processing, MATLAB, C++, C, Signal Processing, DSP, Machine Learning, Python, Deep Learning, Acoustic Echo Cancellation
Inventors:
Kaibao Nie - Bothell WA, US; Les Atlas - Seattle WA, US; Jay Rubinstein - Seattle WA, US; Xing Li - Bellevue WA, US; Charles Pascal Clark - Seattle WA, US
Assignee:
University of Washington - Seattle WA
International Classification:
A61N 1/00
US Classification:
607/57
Abstract:
The restoration of melody perception is a key remaining challenge in cochlear implants. A novel sound coding strategy is proposed that converts an input audio signal into time-varying electrically stimulating pulse trains. A sound is first split into several frequency sub-bands with a fixed filter bank or a dynamic filter bank that tracks harmonics in sounds. Each sub-band signal is coherently shifted downward to a low-frequency baseband. The resulting coherent envelope signals have Hermitian-symmetric frequency spectra and are thus real-valued. A peak detector or high-rate sampler of the half-wave rectified coherent envelope signal in each sub-band further converts the coherent envelopes into rate-varying, interleaved pulse trains. Acoustic simulations of cochlear implants using this new technique with normal-hearing listeners showed significant improvement in melody recognition over the most common conventional stimulation approach used in cochlear implants.
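As a rough illustration of the coding chain the abstract describes, the sketch below (Python with NumPy/SciPy) band-passes a signal into sub-bands, coherently shifts each sub-band to baseband via its analytic signal, and converts the half-wave rectified coherent envelope into a pulse train by peak picking. The filter-bank design, band edges, and sampling rate are illustrative assumptions, not the patented parameters.

```python
# Minimal sketch of coherent-envelope coding, assuming a fixed Butterworth
# filter bank; band edges and sampling rate are chosen only for demonstration.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, find_peaks

fs = 16000                      # sampling rate (Hz), assumed
bands = [(300, 600), (600, 1200), (1200, 2400), (2400, 4800)]  # example sub-bands

def coherent_envelope(x, lo, hi, fs):
    """Band-pass one sub-band, then coherently shift it down to baseband."""
    sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
    sub = sosfiltfilt(sos, x)
    t = np.arange(len(sub)) / fs
    fc = 0.5 * (lo + hi)        # shift by the band center frequency
    # The analytic signal times exp(-j*2*pi*fc*t) moves the band down to a
    # low-frequency baseband; its real part is the real-valued coherent envelope.
    return np.real(hilbert(sub) * np.exp(-2j * np.pi * fc * t))

def envelope_to_pulses(env):
    """Half-wave rectify and peak-pick to get a rate-varying pulse train."""
    rect = np.maximum(env, 0.0)
    peaks, _ = find_peaks(rect)
    pulses = np.zeros_like(rect)
    pulses[peaks] = rect[peaks]  # pulse amplitude follows the envelope peak
    return pulses

x = np.random.randn(fs)          # stand-in for an input audio signal
pulse_trains = [envelope_to_pulses(coherent_envelope(x, lo, hi, fs))
                for lo, hi in bands]
```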
Method For Binary To Contone Conversion With Non-Solid Edge Detection
Inventors:
Yingjun Bai - San Jose CA, US; Xing Li - Webster NY, US
Assignee:
Xerox Corporation - Norwalk CT
International Classification:
G06K 9/32
US Classification:
382/293
Abstract:
A system and method convert a pixel of binary image data to a pixel of contone image data by determining whether a predetermined pixel of binary image data is part of a solid edge or part of a fuzzy edge. A binary-to-contone conversion circuit converts the predetermined pixel of binary image data to a first contone image data value, and a filter circuit converts the predetermined pixel of binary image data to a second contone image data value. The filter circuit uses an adaptive filtering operation that utilizes one of a plurality of sets of weighting coefficients to change a characteristic of the filtering operation. The set of weighting coefficients used in the filtering operation is selected in response to a fuzzy edge detection. A selection between the first contone image data value and the second contone image data value is made based upon the determination as to whether the predetermined pixel of binary image data is part of a solid edge.
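The following sketch illustrates the selection logic the abstract describes: classify each pixel's 3x3 neighborhood as a solid or fuzzy edge, then output either a direct binary-to-contone value or an adaptively filtered value. The kernels, thresholds, and the fuzzy-edge test are placeholder assumptions, not the patented circuits.

```python
import numpy as np

# Hypothetical 3x3 weighting-coefficient sets; the patent selects among
# several such sets based on fuzzy-edge detection.
KERNELS = {
    "smooth": np.full((3, 3), 1 / 9.0),
    "sharp":  np.array([[0, 1, 0], [1, 4, 1], [0, 1, 0]]) / 8.0,
}

def binary_to_contone(binary, fuzzy_thresh=(2, 7)):
    """Convert a 0/1 image to 8-bit contone, pixel by pixel."""
    h, w = binary.shape
    out = np.zeros((h, w), dtype=np.uint8)
    padded = np.pad(binary, 1)
    for y in range(h):
        for x in range(w):
            win = padded[y:y + 3, x:x + 3]
            ones = win.sum()
            # Solid edge: nearly uniform neighborhood -> direct conversion
            # (the "first contone image data value").
            if ones <= fuzzy_thresh[0] or ones >= fuzzy_thresh[1]:
                out[y, x] = 255 * binary[y, x]
            else:
                # Fuzzy edge: adaptive filtering; the kernel choice stands in
                # for selecting one of several weighting-coefficient sets.
                k = KERNELS["smooth"] if ones <= 4 else KERNELS["sharp"]
                out[y, x] = np.uint8(np.clip(255 * (win * k).sum(), 0, 255))
    return out
```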
Abstract:
A portable, computer-vision-based, non-contact vibration detection system and method. The system can process small vibrations and large vibrations separately in the captured images. The small vibrations can be enhanced and then analyzed; the analysis results for the small and large vibrations are fused, and the processed images are displayed through a GUI. The analysis results include displacements in the region of interest, vibration frequencies or cycles, vibration amplitudes and phase angles, and root mean square (RMS) values, along with overall ‘virtual’ snapshots of the maximum-amplitude vibrations during the working period of the camera.
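A toy sketch of the small-vibration path described above: per-pixel intensity residuals are treated as the small-motion signal, amplified in a temporal frequency band (an Eulerian-magnification-style step), and summarized by a dominant frequency and an RMS value. The band, gain, and whole-frame region of interest are illustrative assumptions.

```python
import numpy as np

def analyze_vibration(frames, fps, band=(5.0, 30.0), gain=10.0):
    """Amplify small intensity variations in a temporal band, then estimate
    a dominant vibration frequency and the RMS of the motion trace."""
    stack = np.asarray(frames, dtype=np.float64)    # (T, H, W) grayscale
    mean = stack.mean(axis=0)
    small = stack - mean                             # small residual motion signal
    # Temporal FFT band-pass isolates the vibration band of interest.
    spec = np.fft.rfft(small, axis=0)
    freqs = np.fft.rfftfreq(stack.shape[0], d=1.0 / fps)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    spec[~mask] = 0.0
    enhanced = mean + gain * np.fft.irfft(spec, n=stack.shape[0], axis=0)
    # One motion trace for the ROI (here: the whole frame, for simplicity).
    trace = small.mean(axis=(1, 2))
    power = np.abs(np.fft.rfft(trace)) ** 2
    power[~mask] = 0.0
    dominant_hz = freqs[np.argmax(power)]
    rms = float(np.sqrt((trace ** 2).mean()))
    return enhanced, dominant_hz, rms
```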
Automatically And Precisely Generating Highlight Videos With Artificial Intelligence
Abstract:
Presented herein are systems, methods, and datasets for automatically and precisely generating highlight or summary videos of content. For example, in one or more embodiments, videos of sporting events may be digested or condensed into highlights, which will dramatically benefit sports media, broadcasters, video creators or commentators, and other short-video creators in terms of cost reduction, fast and mass production, and savings in tedious engineering hours. Embodiments of the framework may also be used, or adapted for use, to better promote sports teams, players, and/or games, and to produce stories that glorify the spirit of sports or its players. While presented in the context of sports, it shall be noted that the methodology may be used for videos comprising other content and events.
Generating Highlight Video From Video And Text Inputs
Inventors:
- Sunnyvale CA, US; Le KANG - Dublin CA, US; Zhiyu CHENG - Sunnyvale CA, US; Hao TIAN - Cupertino CA, US; Daming LU - Dublin CA, US; Dapeng LI - Los Altos CA, US; Jingya XUN - San Jose CA, US; Jeff WANG - San Jose CA, US; Xi CHEN - San Jose CA, US; Xing LI - Santa Clara CA, US
Abstract:
Presented herein are systems, methods, and datasets for automatically and precisely generating highlight or summary videos of content. In one or more embodiments, the inputs comprise a text (e.g., an article) describing the key event(s) (e.g., a goal, a player action, etc.) in an activity (e.g., a game, a concert, etc.) and a video or videos of the activity. In one or more embodiments, the output is a short video of an event or events in the text, in which the video may include commentary about the highlighted events and/or other audio (e.g., music), which may also be automatically synthesized.
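A minimal sketch of the input/output contract described in the abstract, with stub stages standing in for the real text parsing and cross-modal localization; all names, types, and the fixed clip times are assumptions for illustration.

```python
from dataclasses import dataclass

# Hypothetical data types for a text+video highlight pipeline.
@dataclass
class Event:
    description: str     # e.g., "goal by player X"
    start_s: float       # localized start time in the source video
    end_s: float

def extract_events(article: str) -> list[str]:
    """Stub: pull key-event sentences from the text (e.g., an article)."""
    return [s.strip() for s in article.split(".") if "goal" in s.lower()]

def localize(event_text: str, video_path: str) -> Event:
    """Stub: align an event description to a time span in the video.
    A real system would use cross-modal retrieval; times are fixed here."""
    return Event(event_text, start_s=0.0, end_s=10.0)

def make_highlight(article: str, video_path: str) -> list[Event]:
    clips = [localize(e, video_path) for e in extract_events(article)]
    # Commentary/music synthesis would be appended per clip in a full system.
    return clips

print(make_highlight("Early goal by Smith. Halftime stats.", "match.mp4"))
```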
Inventors:
- Sunnyvale CA, US; Le KANG - Dublin CA, US; Xin ZHOU - Mountain View CA, US; Hao TIAN - Cupertino CA, US; Xing LI - Santa Clara CA, US; Bo HE - Sunnyvale CA, US; Jingyu XIN - Tucson AZ, US
Assignee:
Baidu USA LLC - Sunnyvale CA
International Classification:
G06V 20/40 G06N 3/08 G06V 10/42
Abstract:
With rapidly evolving technologies and emerging tools, sports-related videos generated online are rapidly increasing. To automate the sports video editing/highlight generation process, a key task is to precisely recognize and locate events-of-interest in videos. Embodiments herein comprise a two-stage paradigm to detect categories of events and when these events happen in videos. In one or more embodiments, multiple action recognition models extract high-level semantic features, and a transformer-based temporal detection module locates target events. These novel approaches achieved state-of-the-art performance in both action spotting and replay grounding. While presented in the context of sports, it shall be noted that the systems and methods herein may be used for videos comprising other content and events.
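The sketch below mirrors the two-stage shape the abstract describes: stage one is assumed to produce per-snippet feature vectors from action-recognition models (random tensors stand in for that output here), and stage two is a small transformer encoder that emits per-position event-class logits. All dimensions, layer counts, and the class count are assumptions, not the published configuration.

```python
import torch
import torch.nn as nn

class TemporalEventDetector(nn.Module):
    """Stage two: locate target events from high-level semantic features."""
    def __init__(self, feat_dim=2048, d_model=256, n_classes=17):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_classes + 1)  # +1 for "background"

    def forward(self, feats):            # feats: (batch, time, feat_dim)
        x = self.encoder(self.proj(feats))
        return self.head(x)              # per-position class logits

# Stage one would produce `feats` by running action-recognition models over
# video snippets; random features stand in for that output here.
feats = torch.randn(2, 120, 2048)        # 2 videos, 120 temporal positions
logits = TemporalEventDetector()(feats)  # (2, 120, 18)
print(logits.shape)
```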
Semiconductor Layer Structure With A Thin Blocking Layer
Inventors:
- San Jose CA, US; Xing LI - San Jose CA, US; Hery DJIE - San Jose CA, US
International Classification:
H01S 5/34
Abstract:
A semiconductor layer structure may include a substrate, a blocking layer disposed over the substrate, and one or more epitaxial layers disposed over the blocking layer. The blocking layer may have a thickness of between 50 nanometers (nm) and 4000 nm. The blocking layer may be configured to suppress defects from the substrate propagating to the one or more epitaxial layers. The one or more epitaxial layers may include a quantum-well layer that includes a quantum-well intermixing region formed using a high temperature treatment.
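As a simple way to make the stated constraint concrete, the sketch below models the layer stack as data and enforces the 50-4000 nm blocking-layer thickness from the abstract; the layer names and the other thicknesses are placeholders, not values from the patent.

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    thickness_nm: float

def build_stack(blocking_nm: float) -> list[Layer]:
    # The abstract specifies a blocking layer between 50 nm and 4000 nm.
    if not 50 <= blocking_nm <= 4000:
        raise ValueError("blocking layer must be 50-4000 nm per the abstract")
    return [
        Layer("substrate", 100_000.0),                        # placeholder
        Layer("blocking layer", blocking_nm),                 # suppresses defects
        Layer("epitaxial stack incl. quantum-well layer", 2_000.0),  # placeholder
    ]

print(build_stack(200.0))
```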
Inventors:
- Sunnyvale CA, US; Zhenyu ZHONG - Sunnyvale CA, US; Yueqiang CHENG - Sunnyvale CA, US; Xing LI - Sunnyvale CA, US; Tao WEI - Sunnyvale CA, US
International Classification:
G10L 13/047 G10L 19/018 G10L 25/30
Abstract:
According to various embodiments, an end-to-end TTS framework can integrate a watermarking process into the training of the TTS framework, which enables watermarks to be imperceptible within a synthesized/cloned audio segment generated by the framework. Watermarks added in such a manner are statistically undetectable, preventing unauthorized removal. According to an exemplary method of training the TTS framework, a TTS neural network model and a watermarking neural network model in the TTS framework are trained in an end-to-end manner, with the watermarking being part of the optimization process of the TTS framework. During the training, neuron values of the TTS neural network model are adjusted based on training data to prepare one or more spaces for adding a watermark in a synthesized audio segment to be generated by the TTS framework. In response to the neuron value adjustments in the TTS neural network model, neuron values of the watermarking neural network model are adjusted accordingly to add the watermark to the one or more prepared spaces.
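A minimal sketch of the end-to-end idea: a toy TTS network and a watermark embedder/detector share one optimizer, so the watermark-recovery loss shapes the TTS weights during training. The architectures, losses, and weighting below are stand-in assumptions, not the patented framework.

```python
import torch
import torch.nn as nn

class TinyTTS(nn.Module):
    """Stand-in TTS model: text embedding -> synthesized waveform frames."""
    def __init__(self, text_dim=64, audio_len=800):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(text_dim, 256), nn.ReLU(),
                                 nn.Linear(256, audio_len))
    def forward(self, text_emb):
        return self.net(text_emb)

class Watermarker(nn.Module):
    """Stand-in watermark embedder plus detector for recovering the bits."""
    def __init__(self, audio_len=800, bits=32):
        super().__init__()
        self.embed = nn.Linear(audio_len + bits, audio_len)
        self.detect = nn.Linear(audio_len, bits)
    def forward(self, audio, message):
        marked = self.embed(torch.cat([audio, message], dim=-1))
        return marked, self.detect(marked)

tts, wm = TinyTTS(), Watermarker()
opt = torch.optim.Adam(list(tts.parameters()) + list(wm.parameters()), lr=1e-3)

for _ in range(3):                                # toy training steps
    text = torch.randn(8, 64)
    target = torch.randn(8, 800)                  # reference audio frames
    msg = torch.randint(0, 2, (8, 32)).float()    # watermark bits
    audio = tts(text)
    marked, decoded = wm(audio, msg)
    # TTS quality + imperceptibility + watermark recovery, optimized jointly
    # so watermarking is part of the end-to-end objective.
    loss = (nn.functional.mse_loss(marked, target)
            + 0.1 * nn.functional.mse_loss(marked, audio)
            + nn.functional.binary_cross_entropy_with_logits(decoded, msg))
    opt.zero_grad(); loss.backward(); opt.step()
```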