- Redmond WA, US Boris BOBROV - Kirkland WA, US Kent D. CEDOLA - Bellevue WA, US Chad Balling MCBRIDE - North Bend WA, US George PETRE - Redmond WA, US Larry Marvin WALL - Seattle WA, US
Neural processing elements are configured with a hardware AND gate that performs a logical AND operation between a sign extend signal and the most significant bit (“MSB”) of an operand. The state of the sign extend signal can be based upon the type of the layer of a deep neural network (“DNN”) that generated the operand. If the sign extend signal is logical FALSE, no sign extension is performed. If the sign extend signal is logical TRUE, a concatenator concatenates the output of the hardware AND gate and the operand, thereby extending the operand from an N-bit unsigned binary value to an N+1-bit signed binary value. The neural processing element can also include another hardware AND gate and another concatenator for similarly processing another operand. The outputs of the concatenators for both operands are provided to a hardware binary multiplier.
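The AND-gate behavior described above can be modeled in software. The following Python sketch (an illustration, not the patented circuit) shows how ANDing the sign extend signal with the operand's MSB and concatenating the result yields either an unsigned or a two's-complement signed interpretation of the same N-bit pattern:

```python
def extend_operand(operand: int, sign_extend: bool, n_bits: int = 8) -> int:
    """Model the per-operand sign-extension hardware: an AND gate
    combines the sign-extend signal with the operand's MSB, and a
    concatenator prepends that bit, producing an (N+1)-bit value."""
    msb = (operand >> (n_bits - 1)) & 1          # most significant bit
    extra_bit = msb & int(sign_extend)           # hardware AND gate
    extended = (extra_bit << n_bits) | operand   # concatenation -> N+1 bits
    # Interpret the (N+1)-bit pattern as a two's-complement signed value.
    if extended & (1 << n_bits):
        return extended - (1 << (n_bits + 1))
    return extended

# Sign extension off: 0xFF is treated as the unsigned value 255.
assert extend_operand(0xFF, sign_extend=False) == 255
# Sign extension on: the same bit pattern is treated as signed -1.
assert extend_operand(0xFF, sign_extend=True) == -1
```

The two extended operands would then feed the hardware binary multiplier, e.g. `extend_operand(a, sa) * extend_operand(b, sb)`.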
Reducing Power Consumption In A Neural Network Environment Using Data Management
- Redmond WA, US Chad Balling MCBRIDE - North Bend WA, US George PETRE - Redmond WA, US Kent D. CEDOLA - Bellevue WA, US Larry Marvin WALL - Seattle WA, US
International Classification:
G06F 1/32 G06N 3/063 G06N 3/04
Abstract:
Techniques provide for improved (i.e., reduced) power consumption in an exemplary neural network (NN) and/or deep neural network (DNN) environment using data management. Improved power consumption in the NN/DNN may be achieved by reducing the number of bit flips needed to process operands associated with one or more storages. Reducing the number of bit flips may be achieved by multiplying an operand associated with a first storage with a plurality of individual operands associated with a plurality of kernels of the NN/DNN. The operand associated with the first storage may be neuron input data, and the plurality of individual operands, associated with a second storage, may be weight values for multiplication with the neuron input data. The plurality of kernels may be arranged or sorted and subsequently processed in a manner that improves power consumption in the NN/DNN.
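One way to picture the sorting idea is as an ordering problem over the weight operands: consecutive values that differ in few bit positions toggle fewer bus lines. This Python sketch uses a simple greedy nearest-neighbor ordering as an illustrative stand-in for whatever arrangement the hardware actually applies:

```python
def bit_flips(a: int, b: int) -> int:
    """Hamming distance between two 8-bit operands, i.e. the number of
    bus lines that toggle when b follows a."""
    return bin((a ^ b) & 0xFF).count("1")

def sort_weights_for_low_toggling(weights):
    """Greedy ordering: each next weight is the remaining one with the
    fewest bit flips relative to the previous weight."""
    remaining = list(weights)
    order = [remaining.pop(0)]
    while remaining:
        nxt = min(remaining, key=lambda w: bit_flips(order[-1], w))
        remaining.remove(nxt)
        order.append(nxt)
    return order

def total_flips(seq):
    """Total bit flips incurred by processing seq in order."""
    return sum(bit_flips(x, y) for x, y in zip(seq, seq[1:]))

weights = [0b11110000, 0b00001111, 0b11110001, 0b00001110]
reordered = sort_weights_for_low_toggling(weights)
# The reordered sequence never toggles more bits than the original.
assert total_flips(reordered) <= total_flips(weights)
```

Multiplying a single input operand against the reordered weights then visits the kernels in the low-toggling sequence.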
Dynamic Sequencing Of Data Partitions For Optimizing Memory Utilization And Performance Of Neural Networks
- Redmond WA, US Chad Balling McBRIDE - North Bend WA, US Amol Ashok AMBARDEKAR - Redmond WA, US George PETRE - Redmond WA, US Larry Marvin WALL - Seattle WA, US Boris BOBROV - Kirkland WA, US
International Classification:
G06N 3/04 G06F 3/06 G06N 3/063
Abstract:
Optimized memory usage and management are crucial to the overall performance of a neural network (NN) or deep neural network (DNN) computing environment. Using various characteristics of the input data's dimensions, an apportionment sequence is calculated for the input data to be processed by the NN or DNN that optimizes the efficient use of the local and external memory components. The apportionment sequence can describe how to parcel the input data (and its associated processing parameters—e.g., processing weights) into one or more portions, as well as how such portions of input data (and its associated processing parameters) are passed between the local memory, external memory, and processing unit components of the NN or DNN. Additionally, the apportionment sequence can include instructions to store generated output data in the local and/or external memory components so as to optimize the efficient use of the local and/or external memory components.
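A minimal sketch of such an apportionment calculation, under the simplifying assumption that the input is parceled row-wise to fit a fixed local-memory budget (the function and parameter names here are hypothetical):

```python
def apportion(input_rows: int, bytes_per_row: int, local_capacity: int):
    """Return a list of (start_row, num_rows) portions describing the
    sequence in which input data moves from external to local memory,
    each portion sized to fit the local-memory budget."""
    rows_per_portion = max(1, local_capacity // bytes_per_row)
    schedule = []
    start = 0
    while start < input_rows:
        n = min(rows_per_portion, input_rows - start)
        schedule.append((start, n))
        start += n
    return schedule

# e.g. 10 rows of 256 bytes with a 1 KiB local buffer -> portions of 4 rows
assert apportion(10, 256, 1024) == [(0, 4), (4, 4), (8, 2)]
```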
Minimizing Memory Reads And Increasing Performance Of A Neural Network Environment Using A Directed Line Buffer
- Redmond WA, US Chad Balling McBRIDE - North Bend WA, US Amol Ashok AMBARDEKAR - Redmond WA, US Kent D. CEDOLA - Bellevue WA, US Larry Marvin WALL - Seattle WA, US Boris BOBROV - Kirkland WA, US
International Classification:
G06N 3/04 G06F 12/08 H04L 29/08
Abstract:
The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the management of data among the various memory components of the NN/DNN. Using a directed line buffer that operatively inserts one or more shifting bits in data blocks to be processed, data reads/writes to the line buffer can be optimized for processing by the NN/DNN, thereby enhancing the overall performance of the NN/DNN. Operatively, an operations controller and/or iterator can generate one or more instructions having a calculated shifting bit(s) for communication to the line buffer. Illustratively, the shifting bit(s) can be calculated using various characteristics of the input data as well as the NN/DNN, inclusive of the data dimensions. The line buffer can read data for processing, insert the shifting bits, and write the data in the line buffer for subsequent processing by cooperating processing unit(s).
Flexible Hardware For High Throughput Vector Dequantization With Dynamic Vector Length And Codebook Size
- Redmond WA, US Aleksandar TOMIC - Dublin, IE Chad Balling McBRIDE - North Bend WA, US George PETRE - Redmond WA, US Kent D. CEDOLA - Bellevue WA, US Larry Marvin Wall - Seattle WA, US Boris BOBROV - Kirkland, US
International Classification:
G06N 3/04
Abstract:
The performance of a neural network (NN) and/or deep neural network (DNN) can be limited by the number of operations being performed as well as by the memory data management of the NN/DNN. Using vector quantization of neuron weight values, the processing of data by neurons can be optimized to reduce the number of operations as well as memory utilization, enhancing the overall performance of the NN/DNN. Operatively, one or more contiguous segments of weight values can be converted into one or more vectors of arbitrary length, and each of the one or more vectors can be assigned an index. The generated indexes can be stored in an exemplary vector quantization lookup table and retrieved by exemplary fast weight lookup hardware at run time, on the fly, as part of an exemplary data processing function of the NN, as part of an inline de-quantization operation to obtain the needed one or more neuron weight values.
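The lookup-table mechanism can be illustrated with a small Python sketch. The codebook contents and the 2-bit index width below are invented for illustration; in the described hardware both the vector length and the codebook size are dynamic:

```python
# Hypothetical codebook: each 2-bit index selects a length-4 vector of
# weight values (vector length and codebook size chosen for this example).
CODEBOOK = [
    [0.0, 0.25, -0.25, 0.5],
    [1.0, -1.0, 0.75, -0.5],
    [0.1, 0.1, 0.1, 0.1],
    [0.0, 0.0, 0.0, 0.0],
]

def dequantize(indices):
    """Inline de-quantization: expand each stored index back into the
    contiguous segment of weight values it stands for."""
    weights = []
    for idx in indices:
        weights.extend(CODEBOOK[idx])
    return weights

# Three stored indices expand to twelve weight values at run time.
assert dequantize([1, 0, 2]) == [1.0, -1.0, 0.75, -0.5,
                                 0.0, 0.25, -0.25, 0.5,
                                 0.1, 0.1, 0.1, 0.1]
```

Storing a 2-bit index in place of four multi-bit weights is where the memory saving comes from; the table lookup restores the weights just before multiplication.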
Reducing Power Consumption In A Neural Network Processor By Skipping Processing Operations
- Redmond WA, US Chad Balling McBRIDE - North Bend WA, US George PETRE - Redmond WA, US Larry Marvin WALL - Seattle WA, US Kent D. CEDOLA - Bellevue WA, US Boris BOBROV - Kirkland WA, US
International Classification:
G06N 3/04 G06N 3/08 G06N 3/063
Abstract:
A deep neural network (“DNN”) module can determine whether processing of certain values in an input buffer or a weight buffer by neurons can be skipped. For example, the DNN module might determine whether neurons can skip the processing of values in entire columns of a neuron buffer. Processing of these values might be skipped if an entire column of an input buffer or a weight buffer are zeros, for example. The DNN module can also determine whether processing of single values in rows of the input buffer or the weight buffer can be skipped (e.g. if the values are zero). Neurons that complete their processing early as a result of skipping operations can assist other neurons with their processing. A combination operation can be performed following the completion of processing that transfers the results of the processing operations performed by a neuron to their correct owner.
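The column-skipping test can be sketched in a few lines of Python. This is an illustrative model only (buffer layout and helper names are assumptions), showing why skipping all-zero columns leaves each neuron's result unchanged:

```python
def find_skippable_columns(input_buf, weight_buf):
    """Return the column indices where every value in the input buffer
    or every value in the weight buffer is zero, so all neurons can
    skip that column's multiply-accumulate step."""
    cols = len(input_buf[0])
    return [c for c in range(cols)
            if all(row[c] == 0 for row in input_buf)
            or all(row[c] == 0 for row in weight_buf)]

def neuron_output(inputs, weights, skip):
    """Dot product that skips the flagged columns; skipping is safe
    because every skipped product is guaranteed to be zero."""
    return sum(x * w for c, (x, w) in enumerate(zip(inputs, weights))
               if c not in skip)

input_buf = [[1, 0, 3], [2, 0, 4]]    # column 1 is all zeros
weight_buf = [[5, 6, 0], [7, 8, 0]]   # column 2 is all zeros
skip = set(find_skippable_columns(input_buf, weight_buf))
assert skip == {1, 2}
# The skipped result matches the full dot product.
assert neuron_output(input_buf[0], weight_buf[0], skip) == 5
```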
Neural Network Processor Using Compression And Decompression Of Activation Data To Reduce Memory Bandwidth Utilization
- Redmond WA, US Benjamin Eliot LUNDELL - Seattle WA, US Larry Marvin WALL - Seattle WA, US Chad Balling McBRIDE - North Bend WA, US Amol Ashok AMBARDEKAR - Redmond WA, US George PETRE - Redmond WA, US Kent D. CEDOLA - Bellevue WA, US Boris BOBROV - Kirkland WA, US
International Classification:
G06N 3/04 G06N 3/063 H03M 7/30
Abstract:
A deep neural network (“DNN”) module can compress and decompress neuron-generated activation data to reduce the utilization of memory bus bandwidth. The compression unit can receive an uncompressed chunk of data generated by a neuron in the DNN module. The compression unit generates a mask portion and a data portion of a compressed output chunk. The mask portion encodes the presence and location of the zero and non-zero bytes in the uncompressed chunk of data. The data portion stores truncated non-zero bytes from the uncompressed chunk of data. A decompression unit can receive a compressed chunk of data from memory in the DNN processor or memory of an application host. The decompression unit decompresses the compressed chunk of data using the mask portion and the data portion. This can reduce memory bus utilization, allow a DNN module to complete processing operations more quickly, and reduce power consumption.
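As a rough illustration of the mask/data scheme described above, the following Python sketch packs one mask bit per input byte marking non-zero positions, and stores only the non-zero bytes in the data portion. The LSB-first bit order is an assumption, and the truncation of non-zero bytes mentioned in the abstract is omitted, so this sketch round-trips losslessly:

```python
def compress(chunk: bytes):
    """Split a chunk into a mask portion (one bit per byte, set where
    the byte is non-zero) and a data portion (the non-zero bytes only)."""
    mask = bytearray((len(chunk) + 7) // 8)
    data = bytearray()
    for i, b in enumerate(chunk):
        if b != 0:
            mask[i // 8] |= 1 << (i % 8)   # record position (LSB-first)
            data.append(b)                 # record value
    return bytes(mask), bytes(data)

def decompress(mask: bytes, data: bytes, length: int) -> bytes:
    """Rebuild the chunk: non-zero bytes go where the mask bits are set;
    every other position is zero."""
    out = bytearray(length)
    nonzero = iter(data)
    for i in range(length):
        if mask[i // 8] & (1 << (i % 8)):
            out[i] = next(nonzero)
    return bytes(out)

chunk = bytes([0, 7, 0, 0, 42, 0, 0, 9])
mask, data = compress(chunk)
assert decompress(mask, data, len(chunk)) == chunk
```

For sparse activation data (many zero bytes), the mask plus data portions are much smaller than the original chunk, which is what reduces memory bus traffic.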
William Beaumont Army Medical Center Cardiovascular Disease 5005 N Piedras St RM 4278, El Paso, TX 79920 (915) 742-1840 (phone), (915) 742-8306 (fax)
Education:
Medical School: Kansas City University of Medicine and Biosciences College of Osteopathic Medicine, Graduated: 2007
Languages:
English
Description:
Dr. McBride graduated from the Kansas City University of Medicine and Biosciences College of Osteopathic Medicine in 2007. He works in El Paso, TX and specializes in Cardiovascular Disease. Dr. McBride is affiliated with William Beaumont Army Medical Center.
Logic Design Lead - Bus Interface Unit - Wii gaming CPU at IBM Systems & Technology Group
Location:
Redmond, Washington
Industry:
Computer Hardware
Work:
IBM Systems & Technology Group - Rochester, Minnesota Area since Jan 2009
Logic Design Lead - Bus Interface Unit - Wii gaming CPU
Microsoft 2012 - 2013
Senior Hardware Designer
Microsoft 2012 - 2012
Senior Hardware Designer
IBM Systems & Technology Group - Rochester, Minnesota Area Oct 2007 - Jan 2012
Logic Design Lead - security engine and CPU bus interface - Xbox 360 CPU
IBM Systems & Technology Group - San Jose Nov 2006 - Jan 2007
Logic Design - Cisco Switch Chip
Education:
University of Minnesota-Twin Cities 1996 - 2001
Master's, Electrical Engineering
Utah State University 1990 - 1996
BS, Electrical Engineering