- Redmond WA, US Victor Manuel FRAGOSO ROJAS - Bellevue WA, US Mei CHEN - Redmond WA, US Jedrzej Jakub KOZERAWSKI - Goleta CA, US
International Classification:
G06K 9/62 G06N 3/04 G06N 3/08
Abstract:
A method of balancing a dataset for a machine learning model includes identifying confusing classes of few-shot classes for a machine learning model during validation. One of the confusing classes and an image from one of the few-shot classes are selected. An image perturbation is computed such that the selected image is classified as the selected confusing class. The selected image is modified with the computed perturbation. The modified selected image is added to a batch for training the machine learning model.
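The perturbation step in this abstract can be illustrated with a small sketch. It is not the patented method: it assumes a toy linear softmax classifier over a two-value "image", and the names `predict`, `targeted_perturbation`, `step`, and `max_iters` are hypothetical. Keeping the original few-shot label on the modified sample is also an assumption, since the abstract does not say which label is used.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, x):
    """Class probabilities of a toy linear classifier: softmax(W x)."""
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])

def argmax(values):
    return values.index(max(values))

def targeted_perturbation(weights, x, target, step=0.1, max_iters=200):
    """Gradient steps on -log p[target] until the model predicts `target`.

    For a linear softmax model the gradient with respect to x is
    sum_k (p_k - [k == target]) * W_k.
    """
    x_adv = list(x)
    for _ in range(max_iters):
        probs = predict(weights, x_adv)
        if argmax(probs) == target:
            break
        for j in range(len(x_adv)):
            grad_j = sum(
                (probs[k] - (1.0 if k == target else 0.0)) * weights[k][j]
                for k in range(len(weights))
            )
            x_adv[j] -= step * grad_j
    return [a - b for a, b in zip(x_adv, x)]

# Hypothetical data: class 0 is the few-shot class, class 1 the confusing class.
weights = [[1.0, 0.0], [0.0, 1.0]]
image = [2.0, 0.5]
few_shot_label, confusing_class = 0, 1

delta = targeted_perturbation(weights, image, confusing_class)
modified = [v + d for v, d in zip(image, delta)]

# Assumption: the modified image keeps its few-shot label in the training batch.
batch = [(modified, few_shot_label)]
```

In this sketch the loop stops as soon as the classifier's top prediction flips to the confusing class, so the perturbation stays close to the smallest one this step size can find.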
- Redmond WA, US Xiyang DAI - Seattle WA, US Mengchen LIU - Redmond WA, US Dongdong CHEN - Bellevue WA, US Lu YUAN - Redmond WA, US Zicheng LIU - Bellevue WA, US Ye YU - Redmond WA, US Mei CHEN - Bellevue WA, US Yunsheng LI - San Diego CA, US
Assignee:
Microsoft Technology Licensing, LLC - Redmond WA
International Classification:
G06N 3/04 G06F 17/16
Abstract:
A computer device for automatic feature detection comprises a processor, a communication device, and a memory configured to hold instructions executable by the processor to instantiate a dynamic convolution neural network, receive input data via the communication network, and execute the dynamic convolution neural network to automatically detect features in the input data. The dynamic convolution neural network compresses the input data from an input space having a dimensionality equal to a predetermined number of channels into an intermediate space having a dimensionality less than the number of channels. The dynamic convolution neural network dynamically fuses the channels into an intermediate representation within the intermediate space and expands the intermediate representation from the intermediate space to an expanded representation in an output space having a higher dimensionality than the dimensionality of the intermediate space. The features in the input data are automatically detected based on the expanded representation.
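The compress, fuse, and expand sequence described here can be sketched on a single feature vector. This is only an illustration under stated assumptions: the matrices, the sigmoid gate used to make the fusion depend on the input, and the function names are all hypothetical and are not taken from the patent.

```python
import math

def matvec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def fusion_matrix(x, base):
    """Hypothetical input-dependent fusion: scale a base L x L matrix by a
    sigmoid gate computed from the mean of the input channels."""
    gate = 1.0 / (1.0 + math.exp(-sum(x) / len(x)))
    return [[gate * b for b in row] for row in base]

def compress_fuse_expand(x, compress, base_fusion, expand):
    z = matvec(compress, x)              # C channels -> L < C dimensions
    phi = fusion_matrix(x, base_fusion)  # L x L matrix that depends on x
    fused = matvec(phi, z)               # intermediate representation
    return matvec(expand, fused)         # L dimensions -> larger output space

# Assumed sizes: C = 4 input channels, L = 2 intermediate dimensions.
compress = [[1, 0, 1, 0], [0, 1, 0, 1]]
base_fusion = [[1.0, 0.5], [0.5, 1.0]]
expand = [[1, 0], [0, 1], [1, 1], [1, -1]]

expanded = compress_fuse_expand([1, -1, 1, -1], compress, base_fusion, expand)
```

Because the fusion matrix is recomputed for each input, two inputs with the same compressed vector can still yield different expanded representations.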
- Redmond WA, US Dongdong CHEN - Bellevue WA, US Yinpeng CHEN - Sammamish WA, US Mengchen LIU - Redmond WA, US Ye YU - Redmond WA, US Zicheng LIU - Bellevue WA, US Mei CHEN - Bellevue WA, US Lu YUAN - Redmond WA, US Junru WU - College Station TX, US
International Classification:
G06N 3/04 G06N 3/08 G06K 9/62 G06F 11/34
Abstract:
A neural architecture search (NAS) with a weak predictor comprises: receiving network architecture scoring information; iteratively sampling a search space, wherein the sampling comprises: generating a set of candidate architectures within the search space; learning a first predictor; evaluating performance of the candidate architectures; and based on at least the performance of the set of candidate architectures and the network architecture scoring information, refining the search space to a smaller search space; based on at least the network architecture scoring information, thresholding the performance of candidate architectures to determine scored output candidate architectures; and reporting the scored output candidate architectures. In some examples, the candidate architectures each comprise a machine learning (ML) model, for example a neural network (NN). In some examples, searching continues to iterate until a stopping criterion is met, such as reaching a specified maximum number of iterations or a set of candidate architectures achieving a performance goal.
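The sample, predict, and shrink loop in this abstract can be sketched as follows. This is a minimal illustration under loud assumptions, not the claimed procedure: architectures are reduced to hypothetical `(depth, width)` pairs, `evaluate` stands in for real training, the "weak predictor" is a 1-nearest-neighbour lookup, and the step ordering is simplified.

```python
import random

def evaluate(arch):
    """Hypothetical stand-in for training and validating an architecture."""
    depth, width = arch
    return -((depth - 5) ** 2 + (width - 3) ** 2)

def fit_weak_predictor(history):
    """Return a 1-nearest-neighbour predictor over evaluated architectures."""
    def predict(arch):
        nearest = min(
            history,
            key=lambda h: (h[0] - arch[0]) ** 2 + (h[1] - arch[1]) ** 2,
        )
        return history[nearest]
    return predict

def weak_predictor_search(space, samples_per_iter=6, max_iters=4,
                          goal=0, threshold=-4, seed=0):
    rng = random.Random(seed)
    history = {}
    for _ in range(max_iters):
        # Generate and evaluate a set of candidates from the current space.
        for arch in rng.sample(space, min(samples_per_iter, len(space))):
            history[arch] = evaluate(arch)
        # Stopping criterion: a candidate reached the performance goal.
        if max(history.values()) >= goal:
            break
        # Learn the predictor, then keep the better-predicted half of the space.
        predict = fit_weak_predictor(history)
        ranked = sorted(space, key=predict, reverse=True)
        space = ranked[: max(len(space) // 2, 1)]
    # Threshold evaluated candidates to obtain the reported set.
    scored = {a: s for a, s in history.items() if s >= threshold}
    return scored, space

full_space = [(d, w) for d in range(1, 9) for w in range(1, 9)]
scored, final_space = weak_predictor_search(full_space)
```

The predictor only has to rank regions of the space well enough to discard the weaker half each round, which is why a crude model can be sufficient in a sketch like this.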
Leveraging Unsupervised Meta-Learning To Boost Few-Shot Action Recognition
The disclosure herein describes preparing and using a cross-attention model for action recognition using pre-trained encoders and novel class fine-tuning. Training video data is transformed into augmented training video segments, which are used to train an appearance encoder and an action encoder. The appearance encoder is trained to encode video segments based on spatial semantics and the action encoder is trained to encode video segments based on spatio-temporal semantics. A set of hard-mined training episodes are generated using the trained encoders. The cross-attention module is then trained for action-appearance aligned classification using the hard-mined training episodes. Then, support video segments are obtained, wherein each support video segment is associated with video classes. The cross-attention module is fine-tuned using the obtained support video segments and the associated video classes. A query video segment is obtained and classified as a video class using the fine-tuned cross-attention module.
Task-Aware Recommendation Of Hyperparameter Configurations
- Redmond WA, US Victor Manuel FRAGOSO ROJAS - Bellevue WA, US Mei CHEN - Bellevue WA, US Chang LIU - Medford MA, US
International Classification:
G06N 3/08 G06K 9/62
Abstract:
Providing a task-aware recommendation of hyperparameter configurations for a neural network architecture. First, a joint space of tasks and hyperparameter configurations is constructed using a plurality of tasks (each of which corresponds to a dataset) and a plurality of hyperparameter configurations. The joint space is used as training data to train and optimize a performance prediction network, such that for a given unseen task corresponding to one of the plurality of tasks and a given hyperparameter configuration corresponding to one of the plurality of hyperparameter configurations, the performance prediction network is configured to predict performance that is to be achieved for the unseen task using the hyperparameter configuration.
- Redmond WA, US Mei CHEN - Bellevue WA, US Gabriel TAKACS - Issaquah WA, US
Assignee:
Microsoft Technology Licensing, LLC - Redmond WA
International Classification:
G06T 7/73 G06T 7/60 G06T 7/80 H04N 5/247
Abstract:
A scale and pose estimation method for a camera system is disclosed. Camera data for a scene acquired by the camera system is received. A rotation prior parameter characterizing a gravity direction is received. A scale prior parameter characterizing scale of the camera system is received. A cost of a cost function is calculated for a similarity transformation that is configured to encode a scale and pose of the camera system. The cost of the cost function is influenced by the rotation prior parameter and the scale prior parameter. A solved similarity transformation is determined upon calculating a cost for the cost function that is less than a threshold cost. An estimated scale and pose of the camera system is output based on the solved similarity transformation.
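A cost function influenced by a rotation prior and a scale prior might look like the sketch below. The abstract does not specify the data term or the form of the penalties, so the point-alignment residual, the quadratic scale penalty, the `1 - cos` gravity penalty, the weights, and all names here are assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rotate(rotation, vec):
    return [dot(row, vec) for row in rotation]

def apply_similarity(scale, rotation, translation, point):
    return [scale * v + t for v, t in zip(rotate(rotation, point), translation)]

def similarity_cost(scale, rotation, translation, pairs,
                    gravity_cam, gravity_prior, scale_prior,
                    w_rot=1.0, w_scale=1.0):
    # Assumed data term: squared alignment error over point pairs (p, q).
    data = sum(
        sum((a - b) ** 2
            for a, b in zip(apply_similarity(scale, rotation, translation, p), q))
        for p, q in pairs
    )
    # Rotation prior: rotated camera gravity should match the prior direction.
    rot_penalty = 1.0 - dot(rotate(rotation, gravity_cam), gravity_prior)
    # Scale prior: penalise deviation from the expected scale.
    scale_penalty = (scale - scale_prior) ** 2
    return data + w_rot * rot_penalty + w_scale * scale_penalty

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
points = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pairs = [(p, [2.0 * v for v in p]) for p in points]  # true scale is 2
gravity = [0.0, 0.0, -1.0]
threshold = 1e-6

def cost_for(scale):
    return similarity_cost(scale, identity, [0.0, 0.0, 0.0], pairs,
                           gravity, gravity, scale_prior=2.0)

# Accept the first candidate scale whose cost falls below the threshold.
solved_scale = next((s for s in [1.0, 1.5, 2.0] if cost_for(s) < threshold), None)
```

Only the scale is searched here to keep the example short; a fuller solver would also vary the rotation and translation.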
Aug 2012 to Jan 2013 Part-Time Physical Therapy Aide
Multi-tasked Aug 2012 to Jan 2013 Part-Time Physical Therapy Aide
New York Cares Dec 2011 to 2013 Volunteer position at New York Cares
Education:
Sunshine Developmental School Apr 2014 to Jul 2014 DPT in Clinical Experience
Margaret Tietz Nursing & Rehabilitation Center Jun 2011 to Jul 2011 Clinical Internship
Stony Brook University Bachelor of Health Sciences in Biology
Stony Brook University DPT
May 2014 to 2000 Sales Associate
Chinese Culture Language Club Norfolk, VA Nov 2009 to Dec 2013 President/Secretary
System Technologies Advanced Research Virginia Beach, VA May 2013 to Jul 2013 Account Executive Intern
Studio Center Total Production Virginia Beach, VA Jan 2013 to Apr 2013 Marketing/Sales Intern
Smoothie King Alexandria, VA May 2010 to Aug 2012 Shift Leader/Team Member
Smoothie King Virginia Beach, VA Dec 2011 to Feb 2012 Sales Associate
Haagen Dazs Washington, DC Jun 2008 to Aug 2009 Team Member
Education:
Old Dominion University Norfolk, VA Dec 2013 B.A. in Communications
Skills:
SKILLS:
Language - English, Mandarin, Fuzhounese
Programming Language - Basic HTML
Soft Skills - Adaptable, Creative, Dependable, Passionate, Team Player
TOOLS:
Content Management System - WordPress
Graphic Editor - Adobe InDesign, Adobe Photoshop, GIMP
Social Media - Facebook, Google Plus, LinkedIn, OrgSync, Pinterest, Twitter, YouTube
Word Processing - Microsoft Office Suite, Open Office Suite