AI-Assisted Programmable Hardware Video Codec
- George Town, KY; Lingjie XU - San Mateo CA, US; Minghai QIN - San Mateo CA, US; Ping CHEN - San Mateo CA, US; Xinyang YU - San Mateo CA, US; Qinggang ZHOU - San Mateo CA, US
Abstract:
An AI-assisted programmable hardware video codec is disclosed. According to certain embodiments, a video processing apparatus includes a programmable hardware encoder configured to execute an encoding process on a plurality of input video frames. The video processing apparatus further includes a controller coupled with the programmable hardware encoder. The controller is configured to execute a set of instructions to cause the video processing apparatus to: determine first information of the plurality of input video frames, and adjust the encoding process based on the first information.
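For illustration only, the following Python sketch mimics the control loop the abstract describes: a controller derives simple per-frame statistics (standing in for the "first information") and adjusts a quantization parameter before handing each frame to an encoder stub. The HardwareEncoderStub class, the complexity heuristic, and every name below are assumptions, not the patented design.

# Minimal sketch (not the patented design): a software stand-in for a
# controller that derives per-frame statistics and adjusts an encoder
# parameter before each frame is encoded.
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    pixels: List[int]  # flattened luma samples, 0-255


class HardwareEncoderStub:
    """Stands in for the programmable hardware encoder."""

    def encode(self, frame: Frame, qp: int) -> bytes:
        # Placeholder: a real device would return a compressed bitstream.
        return bytes([qp]) + bytes(len(frame.pixels) // max(qp, 1))


def frame_complexity(frame: Frame) -> float:
    """Crude spatial-activity estimate used as the 'first information'."""
    diffs = [abs(a - b) for a, b in zip(frame.pixels, frame.pixels[1:])]
    return sum(diffs) / max(len(diffs), 1)


def encode_sequence(frames: List[Frame], base_qp: int = 30) -> List[bytes]:
    encoder = HardwareEncoderStub()
    out = []
    for frame in frames:
        complexity = frame_complexity(frame)
        # Busier frames get a slightly higher QP (coarser quantization).
        qp = min(51, max(10, base_qp + int(complexity // 16)))
        out.append(encoder.encode(frame, qp))
    return out


if __name__ == "__main__":
    flat = Frame(pixels=[128] * 256)
    noisy = Frame(pixels=[(i * 37) % 256 for i in range(256)])
    print([len(b) for b in encode_sequence([flat, noisy])])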
System For Deep Learning Training Using Edge Devices
- George Town, KY; Lingjie XU - San Mateo CA, US; Lingling JIN - San Mateo CA, US; Wei ZHANG - San Mateo CA, US
International Classification:
G06N 3/08 G06N 20/00
Abstract:
The present disclosure provides systems and methods for deep learning training using edge devices. The methods can include identifying one or more edge devices, determining characteristics of the identified edge devices, evaluating a deep learning workload to determine an amount of resources for processing, assigning the deep learning workload to one or more identified edge devices based on the characteristics of the one or more identified edge devices, and facilitating communication between the one or more identified edge devices for completing the deep learning workload.
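As a rough sketch of the assignment step only, the Python below implements a greedy scheduler that matches workload shards to edge devices by their reported memory and compute. The EdgeDevice and WorkloadShard fields and the greedy policy itself are assumptions, not the claimed method.

# Minimal sketch, assuming a simple greedy scheduler; device discovery,
# the resource model, and all names here are hypothetical.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EdgeDevice:
    name: str
    memory_gb: float
    tflops: float


@dataclass
class WorkloadShard:
    name: str
    memory_gb: float
    tflops: float


def assign_shards(devices: List[EdgeDevice],
                  shards: List[WorkloadShard]) -> Dict[str, List[str]]:
    """Greedy fit: give each shard to the capable device with the most
    remaining compute."""
    remaining = {d.name: (d.memory_gb, d.tflops) for d in devices}
    plan: Dict[str, List[str]] = {d.name: [] for d in devices}
    for shard in sorted(shards, key=lambda s: s.tflops, reverse=True):
        candidates = [n for n, (mem, fl) in remaining.items()
                      if mem >= shard.memory_gb and fl >= shard.tflops]
        if not candidates:
            raise RuntimeError(f"no device can host {shard.name}")
        target = max(candidates, key=lambda n: remaining[n][1])
        mem, fl = remaining[target]
        remaining[target] = (mem - shard.memory_gb, fl - shard.tflops)
        plan[target].append(shard.name)
    return plan


if __name__ == "__main__":
    devices = [EdgeDevice("phone", 4, 1.0), EdgeDevice("gateway", 8, 4.0)]
    shards = [WorkloadShard(f"layer-{i}", 1.0, 0.8) for i in range(4)]
    print(assign_shards(devices, shards))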
Methods And Devices For Power Management Based On Synthetic Machine Learning Benchmarks
- George Town, KY; Lingjie XU - San Mateo CA, US; Lingling JIN - San Mateo CA, US; Wei ZHANG - San Mateo CA, US
International Classification:
G06F 1/329 G06N 20/00 G06N 5/04
Abstract:
A method for power management based on synthetic machine learning benchmarks, including generating a record of synthetic machine learning benchmarks for synthetic machine learning models that are obtained by changing machine learning network topology parameters, receiving hardware information from a client device executing a machine learning program or preparing to execute a machine learning program, selecting a synthetic machine learning benchmark based on the correlation of the hardware information with the synthetic machine learning models, and determining work schedules based on the selected synthetic machine learning benchmark.
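The Python sketch below illustrates the selection-and-scheduling flow in miniature: a record of synthetic benchmarks keyed by the topology parameters that were varied, a nearest-match lookup against the hardware information a client reports, and a duty-cycle schedule derived from the chosen benchmark's power figures. All fields and formulas are hypothetical stand-ins for the ideas in the abstract.

# Minimal sketch, assuming a lookup-by-similarity scheme; the benchmark
# record, hardware descriptor, and schedule fields are placeholders.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SyntheticBenchmark:
    name: str
    layers: int          # topology parameter that was varied
    width: int           # topology parameter that was varied
    joules_per_batch: float
    ms_per_batch: float


@dataclass
class HardwareInfo:
    est_layers: int      # reported by the client device
    est_width: int


def select_benchmark(record: List[SyntheticBenchmark],
                     hw: HardwareInfo) -> SyntheticBenchmark:
    """Pick the synthetic model whose topology best matches the report."""
    return min(record, key=lambda b: abs(b.layers - hw.est_layers)
               + abs(b.width - hw.est_width))


def work_schedule(bench: SyntheticBenchmark,
                  power_budget_watts: float) -> Dict[str, float]:
    """Derive a duty cycle that keeps average power under the budget."""
    watts = bench.joules_per_batch / (bench.ms_per_batch / 1000.0)
    duty = min(1.0, power_budget_watts / watts)
    return {"duty_cycle": duty, "batch_ms": bench.ms_per_batch}


if __name__ == "__main__":
    record = [SyntheticBenchmark("small", 10, 128, 2.0, 40.0),
              SyntheticBenchmark("large", 50, 512, 9.0, 120.0)]
    bench = select_benchmark(record, HardwareInfo(12, 160))
    print(bench.name, work_schedule(bench, power_budget_watts=30.0))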
System And Method For Synthetic-Model-Based Benchmarking Of AI Hardware
- George Town, KY; Lingjie Xu - Sunnyvale CA, US; Lingling Jin - Sunnyvale CA, US
Assignee:
Alibaba Group Holding Limited - George Town
International Classification:
G06N 3/10 G06N 3/08 G06N 3/04
Abstract:
Embodiments described herein provide a system for facilitating efficient benchmarking of a piece of hardware configured to process artificial intelligence (AI) related operations. During operation, the system determines the workloads of a set of AI models based on layer information associated with a respective layer of a respective AI model. The set of AI models are representative of applications that run on the piece of hardware. The system forms a set of workload clusters from the workloads and determines a representative workload for a workload cluster. The system then determines, using a meta-heuristic, an input size that corresponds to the representative workload. The system determines, based on the set of workload clusters, a synthetic AI model configured to generate a workload that represents statistical properties of the workloads on the piece of hardware. The input size can generate the representative workload at a computational layer of the synthetic AI model.
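As a simplified illustration of the clustering-and-sizing idea, the Python sketch below uses per-layer FLOP counts as the workload measure, a tiny one-dimensional k-means as the clustering step, and random search in place of the meta-heuristic to find an input size that reproduces each representative workload. The layer shapes, FLOP model, and search method are assumptions, not the system's actual procedure.

# Minimal sketch: FLOP counts as "workload", a small k-means as the
# clustering step, random search standing in for the meta-heuristic.
import random
from typing import List


def conv_flops(h: int, w: int, cin: int, cout: int, k: int) -> float:
    """Rough multiply-accumulate count for one convolutional layer."""
    return 2.0 * h * w * cin * cout * k * k


def kmeans_1d(values: List[float], k: int, iters: int = 50) -> List[float]:
    centers = random.sample(values, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            buckets[min(range(k), key=lambda i: abs(v - centers[i]))].append(v)
        centers = [sum(b) / len(b) if b else c
                   for b, c in zip(buckets, centers)]
    return centers


def find_input_size(target_flops: float, cin: int = 64, cout: int = 64,
                    k: int = 3, trials: int = 2000) -> int:
    """Random search for a square input size whose conv workload is
    closest to a cluster's representative workload."""
    best, best_err = 1, float("inf")
    for _ in range(trials):
        side = random.randint(4, 512)
        err = abs(conv_flops(side, side, cin, cout, k) - target_flops)
        if err < best_err:
            best, best_err = side, err
    return best


if __name__ == "__main__":
    random.seed(0)
    layers = [conv_flops(s, s, 64, 64, 3) for s in (224, 112, 56, 28, 14, 7)]
    representatives = kmeans_1d(layers, k=2)
    for rep in representatives:
        print(f"representative {rep:.2e} FLOPs -> input side "
              f"{find_input_size(rep)}")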
Redundancy Elimination In Single Instruction Multiple Data/Thread (SIMD/T) Execution Processing
- Suwon-si, KR; John Brothers - Calistoga CA, US; Santosh Abraham - Pleasanton CA, US; Lingjie Xu - San Jose CA, US; Maxim Lukyanov - Sunnyvale CA, US; Alex Grosul - Santa Clara CA, US
International Classification:
G06F 9/30 G06F 9/38
Abstract:
A method for reducing execution of redundant threads in a processing environment. The method includes detecting threads that include redundant work among many different threads. Multiple threads from the detected threads are grouped into one or more thread clusters based on determining same thread computation results. Execution of all but a particular one thread in each of the one or more thread clusters is suppressed. The particular one thread in each of the one or more thread clusters is executed. Results determined from execution of the particular one thread in each of the one or more thread clusters are broadcasted to other threads in each of the one or more thread clusters.
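A software analogy of this idea, not the SIMD/T hardware mechanism, is sketched below in Python: work items with identical inputs form a cluster, one representative per cluster executes the kernel, and its result is broadcast to the other members. All names are hypothetical.

# Minimal sketch: model threads as work items; items with identical
# inputs share one execution, whose result is broadcast to the cluster.
from collections import defaultdict
from typing import Callable, Dict, List


def run_with_dedup(inputs: List[int],
                   kernel: Callable[[int], float]) -> List[float]:
    """Execute `kernel` once per distinct input and reuse the result for
    every other 'thread' in the same cluster."""
    clusters: Dict[int, List[int]] = defaultdict(list)
    for tid, value in enumerate(inputs):
        clusters[value].append(tid)          # group redundant threads

    results = [0.0] * len(inputs)
    executed = 0
    for value, members in clusters.items():
        out = kernel(value)                  # only the representative runs
        executed += 1
        for tid in members:                  # broadcast to the cluster
            results[tid] = out
    print(f"executed {executed} of {len(inputs)} threads")
    return results


if __name__ == "__main__":
    # 8 'threads', but only 3 distinct inputs, so only 3 executions.
    data = [2, 2, 3, 3, 3, 5, 2, 5]
    print(run_with_dedup(data, kernel=lambda x: x ** 0.5))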