Microsoft
Research Hardware Design Engineer
Cornell University Jul 2009 - Jul 2015
Graduate Research Assistant
Intel Corporation May 2012 - Aug 2012
Graduate Technical Intern
Caltech Jun 2006 - Dec 2008
Undergraduate Research Assistant
Applied Minds Jun 2008 - Sep 2008
Intern
Education:
Cornell University 2009 - 2015
Doctorate, Doctor of Philosophy, Computer Engineering
Caltech 2005 - 2009
Bachelor's, Bachelor of Science, Electrical Engineering
Skills:
MATLAB, Python, Algorithms, C++, Embedded Systems, C, LaTeX, Computer Architecture, Verilog, VHDL, Assembly Language, Git, Subversion
Interests:
Mathematics, Computer Programming, Academia, NFL, Processors, Electronics, Intel, Computer Science, Science, Football (US), Computer Architecture, Graduate School, Electrical Engineering, Rubik's Cube, Computer Hardware, California Institute of Technology, Cars and Automobiles, Video Games
Inventors:
- Redmond WA, US; Daniel LO - Bothell WA, US; Haishan ZHU - Bellevue WA, US; Eric Sen CHUNG - Woodinville WA, US
Assignee:
Microsoft Technology Licensing, LLC - Redmond WA
International Classification:
G06N 3/084
Abstract:
Bounding box quantization can reduce the quantity of bits utilized to express numerical values prior to the multiplication of matrices comprised of such numerical values, thereby reducing both memory consumption and processor utilization. Stochastic rounding can provide sufficient precision to enable the storage of weight values in reduced-precision formats without having to separately store weight values in a full-precision format. Alternatively, other rounding mechanisms, such as round to nearest, can be utilized to exchange weight values in reduced-precision formats, while also storing weight values in full-precision formats for subsequent updating. To facilitate conversion, reduced-precision formats such as brain floating-point format can be utilized.
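A minimal sketch of the stochastic-rounding step described in this abstract, assuming the reduced-precision format is bfloat16 (the top 16 bits of a float32 bit pattern); the function name and truncation model are illustrative, not taken from the patent.

```python
import numpy as np

def stochastic_round_to_bfloat16(x: np.ndarray) -> np.ndarray:
    """Stochastically round float32 values to bfloat16.

    bfloat16 keeps the high 16 bits of a float32 pattern, so the low
    16 bits are the truncation error. Adding uniform noise in
    [0, 2**16) before truncating rounds a value up with probability
    proportional to that error, making the rounding unbiased. This is
    what lets weights be stored only in the reduced-precision format.
    """
    bits = x.astype(np.float32).view(np.uint32)
    noise = np.random.randint(0, 1 << 16, size=bits.shape, dtype=np.uint32)
    return ((bits + noise) & np.uint32(0xFFFF0000)).view(np.float32)

w = np.array([0.1234567, 1.9876543], dtype=np.float32)
print(stochastic_round_to_bfloat16(w))  # values snap to nearby bfloat16 grid points
```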
Neural Network Training With Decreased Memory Consumption And Processor Utilization
Inventors:
- Redmond WA, US; Daniel LO - Bothell WA, US; Haishan ZHU - Bellevue WA, US; Eric Sen CHUNG - Woodinville WA, US
International Classification:
G06N 3/08
Abstract:
Bounding box quantization can reduce the quantity of bits utilized to express numerical values prior to the multiplication of matrices comprised of such numerical values, thereby reducing both memory consumption and processor utilization. Stochastic rounding can provide sufficient precision to enable the storage of weight values in reduced-precision formats without having to separately store weight values in a full-precision format. Alternatively, other rounding mechanisms, such as round to nearest, can be utilized to exchange weight values in reduced-precision formats, while also storing weight values in full-precision formats for subsequent updating. To facilitate conversion, reduced-precision formats such as brain floating-point format can be utilized.
Mixed Precision Training Of An Artificial Neural Network
Inventors:
- Redmond WA, US; Taesik NA - Issaquah WA, US; Daniel LO - Bothell WA, US; Eric S. CHUNG - Redmond WA, US
International Classification:
G06N 3/08 G06N 3/04 G06F 7/483
Abstract:
The use of mixed precision values when training an artificial neural network (ANN) can increase performance while reducing cost. Certain portions and/or steps of an ANN may be selected to use higher or lower precision values when training. Additionally, or alternatively, early phases of training are accurate enough with lower levels of precision to quickly refine an ANN model, while higher levels of precision may be used to increase accuracy for later steps and epochs. Similarly, different gates of a long short-term memory (LSTM) may be supplied with values having different precisions.
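A toy sketch of the precision schedule this abstract describes, assuming the early/late split happens at a fixed epoch and the master copy of the weights stays in float32; the threshold and names are assumptions, not details from the patent.

```python
import numpy as np

def precision_for(epoch: int, switch_epoch: int = 10):
    """Early epochs refine the model quickly at low precision;
    later epochs use higher precision to improve accuracy."""
    return np.float16 if epoch < switch_epoch else np.float32

def sgd_step(weights, grads, lr, epoch):
    dtype = precision_for(epoch)
    # Compute the update at the precision chosen for this phase of
    # training, but keep the master copy of the weights in float32.
    w = weights.astype(dtype)
    g = grads.astype(dtype)
    return (w - dtype(lr) * g).astype(np.float32)
```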
Abstract:
Machine learning may include training and drawing inference from artificial neural networks, processes which may include performing convolution and matrix multiplication operations. Convolution and matrix multiplication operations are performed using vectors of block floating-point (BFP) values that may include outliers. BFP format stores floating-point values using a plurality of mantissas of a fixed bit width and a shared exponent. Elements are outliers when they are too large to be represented precisely with the fixed bit width mantissa and shared exponent. Outlier values are split into two mantissas. One mantissa is stored in the vector with non-outliers, while the other mantissa is stored outside the vector. Operations, such as a dot product, may be performed on the vectors in part by combining the in-vector mantissa and exponent of an outlier value with the out-of-vector mantissa and exponent.
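A simplified sketch of the outlier-splitting scheme described above, with an 8-bit mantissa width and a median-based shared exponent as assumptions; real BFP hardware packs sign, mantissa, and exponent fields rather than using Python integers.

```python
import numpy as np

MANT_BITS = 8                                     # fixed mantissa width (assumption)
MANT_MAX = (1 << (MANT_BITS - 1)) - 1

def to_bfp_with_outliers(x):
    """Encode a float vector as a shared exponent plus fixed-width
    integer mantissas; elements too large for the fixed width are
    split into an in-vector low part and an out-of-vector high part."""
    shared_exp = int(np.floor(np.log2(np.median(np.abs(x))))) - (MANT_BITS - 2)
    mant = np.round(x / 2.0 ** shared_exp).astype(np.int64)
    outliers = {}                                 # index -> high-order mantissa
    for i, m in enumerate(mant):
        if abs(m) > MANT_MAX:
            hi = int(m) >> MANT_BITS              # stored outside the vector
            mant[i] = int(m) - (hi << MANT_BITS)  # stays in the vector
            outliers[i] = hi                      # its exponent: shared_exp + MANT_BITS
    return mant, shared_exp, outliers

def bfp_dot(mant, exp, outliers, b):
    """Dot product: the dense part uses the in-vector mantissas, then
    each outlier's out-of-vector mantissa and exponent are folded in."""
    acc = float(mant @ b) * 2.0 ** exp
    for i, hi in outliers.items():
        acc += hi * 2.0 ** (exp + MANT_BITS) * b[i]
    return acc

x = np.array([0.375, -0.25, 300.0, 0.125])        # 300.0 overflows 8 mantissa bits
mant, exp, out = to_bfp_with_outliers(x)
print(bfp_dot(mant, exp, out, np.ones(4)), "vs", x.sum())  # both 300.25
```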
Deriving A Concordant Software Neural Network Layer From A Quantized Firmware Neural Network Layer
Inventors:
- Redmond WA, US; Daniel Lo - Bothell WA, US; Deeksha Dangwal - Santa Barbara CA, US
International Classification:
G06N 3/063 G06F 17/16 G06F 9/30
Abstract:
Systems and methods for deriving a concordant software neural network layer are provided. A method includes receiving first instructions configured to, using a neural network processor (NNP), process a first set of data corresponding to a neural network layer, where the NNP is configured to quantize the first set of the data to generate a set of quantized data and then perform matrix-vector multiply operations on the set of quantized data using a matrix-vector-multiplier incorporated within hardware associated with the NNP to generate a first set of results. The method further includes processing the first instructions to automatically generate second instructions configured for use with at least one processor, different from the NNP, such that the second instructions, when executed by the at least one processor to perform matrix multiply operations, generate a second set of results that are concordant with the first set of results.
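A compact sketch of the concordance property this abstract claims, with a uniform per-tensor quantizer standing in for the NNP's quantization step (an assumption); the point is that the derived software path must replay that quantization, or a plain float multiply will disagree with the hardware result.

```python
import numpy as np

def quantize(x, bits=8):
    """Stand-in for the NNP's quantizer: uniform, per-tensor scale."""
    scale = (2 ** (bits - 1) - 1) / np.max(np.abs(x))
    return np.round(x * scale) / scale

def nnp_layer(W, v):
    # First instructions: quantize, then matrix-vector multiply on the NNP.
    return quantize(W) @ quantize(v)

def derived_software_layer(W, v):
    # Second instructions: the same quantization replayed on a different
    # processor, so the results are concordant with the NNP's.
    return quantize(W) @ quantize(v)

rng = np.random.default_rng(0)
W, v = rng.standard_normal((4, 4)), rng.standard_normal(4)
assert np.allclose(nnp_layer(W, v), derived_software_layer(W, v))
print("unquantized float path agrees?", np.allclose(W @ v, nnp_layer(W, v)))  # False
```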
Neural Network Layer Processing With Scaled Quantization
Abstract:
Processors and methods for neural network processing are provided. A method includes receiving a subset of data corresponding to a layer of a neural network. The method further includes prior to performing any matrix operations using the subset of the data, scaling the subset of the data by a scaling factor to generate a scaled subset of data. The method further includes quantizing the scaled subset of the data to generate a scaled and quantized subset of data. The method further includes performing the matrix operations using the scaled and quantized subset of the data to generate a subset of results of the matrix operations. The method further includes descaling the subset of the results of the matrix operations, by multiplying the subset of the results of the matrix operations with an inverse of the scaling factor, to generate a descaled subset of results of the matrix operations.
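A minimal sketch of the four steps in this abstract (scale, quantize, matrix operation, descale), assuming a signed 8-bit integer grid and a power-of-two scaling factor; both constants are assumptions.

```python
import numpy as np

def quantize(x, bits=8):
    """Round to the nearest integer on a signed fixed-point grid."""
    lim = 2 ** (bits - 1)
    return np.clip(np.round(x), -lim, lim - 1)

def layer_forward(W, x, scale=1024.0):
    x_scaled = x * scale          # 1. scale before any matrix operation
    x_q = quantize(x_scaled)      # 2. quantize the scaled data
    y = W @ x_q                   # 3. matrix operation on scaled, quantized data
    return y * (1.0 / scale)      # 4. descale with the inverse scaling factor

W = np.array([[1.0, 2.0], [3.0, 4.0]])
x = np.array([0.031, -0.017])     # small values that would vanish if quantized unscaled
print(layer_forward(W, x))        # approximates W @ x
```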
Neural Network Layer Processing With Normalization And Transformation Of Data
Abstract:
Processors and methods for neural network processing are provided. A method includes receiving a subset of data corresponding to a layer of a neural network for processing using the processor. The method further includes during a forward propagation pass: (1) normalizing the subset of the data corresponding to the layer of the neural network based on an average associated with the subset of the data and a variance associated with the subset of the data, where the normalizing the subset of the data comprises dynamically updating the average and dynamically updating the variance, to generate normalized data and (2) applying a transformation to the normalized data using a fixed scale parameter corresponding to the subset of the data and a fixed shift parameter corresponding to the subset of the data such that during the forward propagation pass neither the fixed scale parameter nor the fixed shift parameter is updated.
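A sketch of the forward-pass behavior this abstract describes, with an exponential moving average standing in for the dynamic updates; the momentum, epsilon, scale, and shift values are all assumptions.

```python
import numpy as np

class FrozenAffineNorm:
    """Normalization whose running average and variance are updated
    dynamically on every forward pass, while the scale and shift
    parameters stay fixed (never updated during forward propagation)."""

    def __init__(self, dim, scale=1.0, shift=0.0, momentum=0.1, eps=1e-5):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.gamma = np.full(dim, scale)   # fixed scale parameter
        self.beta = np.full(dim, shift)    # fixed shift parameter
        self.momentum, self.eps = momentum, eps

    def forward(self, x):
        # (1) Normalize using the dynamically updated average and variance.
        self.mean = (1 - self.momentum) * self.mean + self.momentum * x.mean(axis=0)
        self.var = (1 - self.momentum) * self.var + self.momentum * x.var(axis=0)
        x_hat = (x - self.mean) / np.sqrt(self.var + self.eps)
        # (2) Transform with the fixed parameters; neither is updated here.
        return self.gamma * x_hat + self.beta
```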
Dithered Quantization Of Parameters During Training With A Machine Learning Tool
Inventors:
- Redmond WA, US; Haishan ZHU - Redmond WA, US; Daniel LO - Bothell WA, US; Eric S. CHUNG - Woodinville WA, US
Assignee:
Microsoft Technology Licensing, LLC - Redmond WA
International Classification:
G06N 3/08 G06F 7/499 G06N 20/00
Abstract:
A machine learning tool uses dithered quantization of parameters during training of a machine learning model such as a neural network. The machine learning tool receives training data and initializes certain parameters of the machine learning model (e.g., weights for connections between nodes of a neural network, biases for nodes). The machine learning tool trains the parameters in one or more iterations based on the training data. In particular, in a given iteration, the machine learning tool applies the machine learning model to at least some of the training data and, based at least in part on the results, determines parameter updates to the parameters. The machine learning tool updates the parameters using the parameter updates and a dithered quantizer function, which can add random values before a rounding or truncation operation.
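A small sketch of the dithered quantizer function named at the end of this abstract, assuming a uniform dither over one quantization step and a power-of-two step size; both are assumptions, not details from the patent.

```python
import numpy as np

def dithered_quantize(x, step=2.0 ** -8, rng=None):
    """Add random values before rounding, then snap to the grid.

    Uniform dither over [-step/2, step/2) makes the expected value of
    the quantized parameter equal the full-precision parameter, so
    small gradient updates are not systematically rounded away."""
    if rng is None:
        rng = np.random.default_rng()
    dither = rng.uniform(-step / 2, step / 2, size=x.shape)
    return np.round((x + dither) / step) * step

# One illustrative iteration: compute the parameter update in full
# precision, then store the weights through the dithered quantizer.
w = np.array([0.10314, -0.25627])
grad = np.array([0.012, -0.034])
w = dithered_quantize(w - 0.1 * grad)
print(w)
```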