Benjamin G. Alexander - Austin TX, US; Gregory H. Bellows - Austin TX, US; Joaquin Madruga - Austin TX, US; Barry L. Minor - Austin TX, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 11/00 G06F 11/14
US Classification:
714/13, 714/10, 714/15, 714/38.1
Abstract:
Disclosed are a method, a system, and a computer program product for operating a data processing system that can include or be coupled to multiple processor cores. In one or more embodiments, an error can be determined while two or more processor cores are processing a first group of two or more work items, and the error can be signaled to an application. The application can determine a state of progress of processing the two or more work items and at least one dependency from the state of progress. In one or more embodiments, a second group of two or more work items that are scheduled for processing can be unscheduled in response to determining the error. In one or more embodiments, the application can process at least one work item that caused the error, and the second group of two or more work items can be rescheduled for processing.
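A minimal sketch of the recovery flow this abstract describes (error signaled to the application, pending group unscheduled, faulting items handled, group rescheduled). All type and function names below are illustrative assumptions, not taken from the patent itself.

```cpp
// Minimal sketch of the error-recovery flow; every name here
// (WorkItem, recover_from_error, ...) is an illustrative assumption.
#include <utility>
#include <vector>

struct WorkItem {
    int  id;
    bool completed = false;
    bool faulted   = false;   // set by the runtime when this item caused the error
};

// Application-side handler invoked when the runtime signals an error raised
// while the first group of work items was executing on the processor cores.
void recover_from_error(std::vector<WorkItem>& running_group,
                        std::vector<WorkItem>& pending_group)
{
    // 1. Unschedule the second group, which has not started executing yet.
    std::vector<WorkItem> parked;
    parked.swap(pending_group);

    // 2. Inspect the state of progress of the running group and find the
    //    work item(s) that caused the error.
    for (WorkItem& item : running_group) {
        if (item.faulted) {
            // 3. The application itself processes the faulting item,
            //    e.g. by re-running it serially on the host.
            item.faulted   = false;
            item.completed = true;
        }
    }

    // 4. Reschedule the previously unscheduled group for processing.
    pending_group = std::move(parked);
}
```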
Reducing Queue Synchronization Of Multiple Work Items In A System With High Memory Latency Between Processing Nodes
Benjamin G. Alexander - Austin TX, US; Gregory H. Bellows - Austin TX, US; Joaquin Madruga - Austin TX, US; Barry L. Minor - Austin TX, US
Assignee:
International Business Machines Corporation - Armonk NY
International Classification:
G06F 9/46
US Classification:
718/104
Abstract:
A system efficiently dispatches/completes a work element within a multi-node data processing system that has a global command queue (GCQ) and at least one high latency node. The system comprises: at the high latency processor node, work scheduling logic establishing a local command/work queue (LCQ) in which multiple work items for execution by local processing units can be staged prior to execution; a first local processing unit retrieving via a work request a larger chunk size of work than can be completed in a normal work completion/execution cycle by the local processing unit; storing the larger chunk size of work retrieved in a local command/work queue (LCQ); enabling the first local processing unit to locally schedule and complete portions of the work stored within the LCQ; and transmitting a next work request to the GCQ only when all the work within the LCQ has been dispatched by the local processing units.
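A minimal sketch of the GCQ/LCQ staging idea described above, assuming simple illustrative types (the actual queue layout is not specified in the abstract): one work request pulls a larger chunk from the global queue, the chunk is staged in a local queue, and the global queue is contacted again only once the local queue is drained.

```cpp
// Minimal sketch of GCQ/LCQ chunk staging; all types and names are
// illustrative assumptions, not the patent's own structures.
#include <cstddef>
#include <deque>
#include <mutex>
#include <optional>
#include <vector>

struct Work { int id; };

struct GlobalCommandQueue {                 // shared GCQ, expensive to reach
    std::mutex m;
    std::deque<Work> items;

    // A single work request returns a larger chunk of work than one normal
    // execution cycle can complete.
    std::vector<Work> take_chunk(std::size_t chunk_size) {
        std::lock_guard<std::mutex> lock(m);
        std::vector<Work> chunk;
        while (chunk.size() < chunk_size && !items.empty()) {
            chunk.push_back(items.front());
            items.pop_front();
        }
        return chunk;
    }
};

struct LocalCommandQueue {                  // per-node LCQ, cheap to reach
    std::deque<Work> staged;

    // Go back to the GCQ only when everything staged locally is dispatched.
    void refill_from(GlobalCommandQueue& gcq, std::size_t chunk_size) {
        if (staged.empty()) {
            for (const Work& w : gcq.take_chunk(chunk_size))
                staged.push_back(w);
        }
    }

    // Local processing units schedule and complete work from the LCQ.
    std::optional<Work> next() {
        if (staged.empty()) return std::nullopt;
        Work w = staged.front();
        staged.pop_front();
        return w;
    }
};
```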
Benjamin G. Alexander - Austin TX, US; Gregory H. Bellows - Austin TX, US; Joaquin Madruga - Austin TX, US; Brian D. Watt - Round Rock TX, US
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION - Armonk NY
International Classification:
G06F 9/46
US Classification:
718/102
Abstract:
Execution units process commands from one or more command queues. Once a command is available on the queue, each unit participating in the execution of the command atomically decrements the command's work groups remaining counter by the work group reservation size and processes a corresponding number of work groups within a work group range. Once all work groups within a range are processed, an execution unit increments a work groups processed counter. The unit that increments the work groups processed counter to the value stored in a work groups to be executed counter signals completion of the command. Each execution unit that accesses a command also marks a work group seen counter. Once the work groups processed counter equals the work groups to be executed counter and the work group seen counter equals the number of execution units, the command may be removed or overwritten on the command queue.
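A minimal sketch of the counter protocol described above using std::atomic; the member and function names are illustrative assumptions, not the patent's identifiers.

```cpp
// Minimal sketch of the work-group counter protocol with std::atomic; names
// are illustrative assumptions.
#include <atomic>

struct Command {
    std::atomic<int> wg_remaining;   // work groups remaining to be reserved
    std::atomic<int> wg_processed;   // work groups whose execution has finished
    std::atomic<int> units_seen;     // execution units that have accessed the command
    int wg_total;                    // work groups to be executed (fixed)
};

// One execution unit reserving and processing work groups from a command.
// Returns true if this unit was the one to signal completion of the command.
bool process_command(Command& cmd, int reservation_size)
{
    cmd.units_seen.fetch_add(1);                        // mark the "seen" counter

    for (;;) {
        // Atomically claim up to reservation_size work groups.
        int before = cmd.wg_remaining.fetch_sub(reservation_size);
        if (before <= 0) break;                         // nothing left to reserve
        int claimed = before < reservation_size ? before : reservation_size;

        // ... execute the `claimed` work groups in the reserved range ...

        // Record that this range has been processed.
        int done = cmd.wg_processed.fetch_add(claimed) + claimed;
        if (done == cmd.wg_total)
            return true;                                // this unit signals completion
    }
    return false;
}

// The command may be removed or overwritten only once all work groups are
// processed and every participating execution unit has seen the command.
bool command_retirable(const Command& cmd, int num_units)
{
    return cmd.wg_processed.load() == cmd.wg_total
        && cmd.units_seen.load() == num_units;
}
```

In this sketch the unit whose increment brings the processed counter up to the total is the one that signals completion, while retirement of the command additionally waits for the seen counter to reach the number of participating execution units.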
Reducing Cross Queue Synchronization On Systems With Low Memory Latency Across Distributed Processing Nodes
Benjamin G. Alexander - Austin TX, US; Gregory H. Bellows - Austin TX, US; Joaquin Madruga - Austin TX, US; Barry L. Minor - Austin TX, US
Assignee:
IBM CORPORATION - Armonk NY
International Classification:
G06F 9/46 G06F 15/76 G06F 9/308
US Classification:
718/104, 712/28, 712/E9.019, 718/105
Abstract:
A method for efficient dispatch/completion of a work element within a multi-node data processing system. The method comprises: selecting specific processing units from among the processing nodes to complete execution of a work element that has multiple individual work items that may be independently executed by different ones of the processing units; generating an allocated processor unit (APU) bit mask that identifies at least one of the processing units that has been selected; placing the work element in a first entry of a global command queue (GCQ); associating the APU mask with the work element in the GCQ; and responsive to receipt at the GCQ of work requests from each of the multiple processing nodes or the processing units, enabling only the selected specific ones of the processing nodes or the processing units to be able to retrieve work from the work element in the GCQ.
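A minimal sketch of how an allocated-processor-unit (APU) bit mask might gate work retrieval from the GCQ; the types and field names are illustrative assumptions, not the patent's own.

```cpp
// Minimal sketch of APU-mask gated work retrieval from the GCQ; the types
// and field names are illustrative assumptions.
#include <cstdint>
#include <optional>
#include <vector>

struct WorkElement {
    int           id;
    std::uint64_t apu_mask;    // bit i set => processing unit i may execute this element
    int           items_left;  // independently executable work items remaining
};

struct GlobalCommandQueue {
    std::vector<WorkElement> entries;

    // A work request from processing unit `unit_id` is only served from
    // elements whose APU mask has that unit's bit set.
    std::optional<int> request_work(int unit_id) {
        const std::uint64_t unit_bit = std::uint64_t{1} << unit_id;
        for (WorkElement& we : entries) {
            if ((we.apu_mask & unit_bit) != 0 && we.items_left > 0) {
                --we.items_left;          // hand one work item to this unit
                return we.id;             // identify which element supplied the work
            }
        }
        return std::nullopt;              // no element allocated to this unit has work
    }
};
```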
Reducing Cross Queue Synchronization On Systems With Low Memory Latency Across Distributed Processing Nodes
Benjamin G. Alexander - Austin TX, US; Gregory H. Bellows - Austin TX, US; Joaquin Madruga - Austin TX, US; Barry L. Minor - Austin TX, US
Assignee:
IBM CORPORATION - Armonk NY
International Classification:
G06F 9/50
US Classification:
718/104
Abstract:
A method for efficient dispatch/completion of a work element within a multi-node data processing system. The method comprises: selecting specific processing units from among the processing nodes to complete execution of a work element that has multiple individual work items that may be independently executed by different ones of the processing units; generating an allocated processor unit (APU) bit mask that identifies at least one of the processing units that has been selected; placing the work element in a first entry of a global command queue (GCQ); associating the APU mask with the work element in the GCQ; and responsive to receipt at the GCQ of work requests from each of the multiple processing nodes or the processing units, enabling only the selected specific ones of the processing nodes or the processing units to be able to retrieve work from the work element in the GCQ.
Method To Reduce Queue Synchronization Of Multiple Work Items In A System With High Memory Latency Between Processing Nodes
Benjamin G. Alexander - Austin TX, US; Gregory H. Bellows - Austin TX, US; Joaquin Madruga - Austin TX, US; Barry L. Minor - Austin TX, US
Assignee:
IBM CORPORATION - Armonk NY
International Classification:
G06F 9/50
US Classification:
718/104
Abstract:
A method efficiently dispatches/completes a work element within a multi-node data processing system that has a global command queue (GCQ) and at least one high latency node. The method comprises: at the high latency processor node, work scheduling logic establishing a local command/work queue (LCQ) in which multiple work items for execution by local processing units can be staged prior to execution; a first local processing unit retrieving via a work request a larger chunk size of work than can be completed in a normal work completion/execution cycle by the local processing unit; storing the larger chunk size of work retrieved in a local command/work queue (LCQ); enabling the first local processing unit to locally schedule and complete portions of the work stored within the LCQ; and transmitting a next work request to the GCQ only when all the work within the LCQ has been dispatched by the local processing units.