GPUIterator: Bridging the Gap between Chapel and GPU Platforms (CHIUW 2019)

Sat 22 - Wed 26 June 2019 Phoenix, Arizona, United States

Who

Akihiro Hayashi, Sri Raj Paul, Vivek Sarkar

Track

CHIUW 2019

When

Sat 22 Jun 2019 10:00 - 10:25 at 212A - Chapel Implementation Improvements Chair(s): Michelle Strout

Abstract

PGAS (Partitioned Global Address Space) programming models were originally designed to facilitate productive parallel programming at both the intra-node and inter-node levels in homogeneous parallel machines. However, there is a growing need to support accelerators, especially GPU accelerators, in heterogeneous nodes in a cluster. Among high-level PGAS programming languages, Chapel is well suited for this task due to its use of locales and domains to help abstract away low-level details of data and compute mappings for different compute nodes, as well as for different processing units (CPU vs. GPU) within a node. In this paper, we address some of the key limitations of past approaches to mapping Chapel onto GPUs as follows. First, we introduce a Chapel module, GPUIterator, which is a portable programming interface that supports GPU execution of a Chapel forall loop. This module makes it possible for Chapel programmers to easily use hand-tuned native GPU programs/libraries, which is an important requirement in practice since there is still a big performance gap between compiler-generated GPU code and hand-tuned GPU code; hand-optimization of CPU-GPU data transfers is also an important contributor to this performance gap. Second, though Chapel programs are regularly executed on multi-node clusters, past work on GPU enablement of Chapel programs mainly focused on single-node execution. In contrast, our work supports execution across multiple CPU+GPU nodes by accepting Chapel’s distributed domains. Third, our approach supports hybrid execution of a Chapel parallel (forall) loop across both a GPU and CPU cores, which is beneficial for specific platforms. Our preliminary performance evaluations show that the use of the GPUIterator is a promising approach for Chapel programmers to easily utilize a single or multiple CPU+GPU node(s) while maintaining portability.
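
The hybrid execution described in the abstract can be pictured as splitting one loop's iteration range between CPU cores and a GPU. The following is a minimal conceptual sketch in Python, not the Chapel module itself. All names here (`split_range`, `hybrid_forall`, `cpu_percent`, `gpu_callback`) are hypothetical, and the percentage-based split and the `(lo, hi, count)` callback shape are assumptions made for illustration only.

```python
from typing import Callable, List, Tuple


def split_range(lo: int, hi: int, cpu_percent: int) -> Tuple[range, range]:
    """Split the inclusive range [lo, hi] into a CPU part and a GPU part.

    cpu_percent is the (assumed) share of iterations kept on the CPU.
    """
    if not 0 <= cpu_percent <= 100:
        raise ValueError("cpu_percent must be between 0 and 100")
    n = hi - lo + 1
    if n <= 0:
        return range(0), range(0)
    cpu_n = n * cpu_percent // 100
    return range(lo, lo + cpu_n), range(lo + cpu_n, hi + 1)


def hybrid_forall(
    lo: int,
    hi: int,
    cpu_percent: int,
    cpu_body: Callable[[int], None],
    gpu_callback: Callable[[int, int, int], None],
) -> None:
    """Run cpu_body per index on the CPU part; hand the rest to gpu_callback.

    In this sketch the CPU part runs sequentially. A real runtime would run
    it in parallel and launch the GPU part concurrently.
    """
    cpu_part, gpu_part = split_range(lo, hi, cpu_percent)
    for i in cpu_part:
        cpu_body(i)
    if len(gpu_part) > 0:
        # The callback stands in for a hand-tuned native GPU routine.
        gpu_callback(gpu_part.start, gpu_part.stop - 1, len(gpu_part))


def demo() -> List[int]:
    """Copy b into a, with 30% of the indices handled by the CPU body."""
    n = 10
    b = list(range(1, n + 1))
    a = [0] * n

    def cpu_body(i: int) -> None:
        a[i] = b[i]

    def gpu_callback(lo: int, hi: int, count: int) -> None:
        # Placeholder for a GPU kernel launch over indices lo..hi.
        a[lo:hi + 1] = b[lo:hi + 1]

    hybrid_forall(0, n - 1, 30, cpu_body, gpu_callback)
    return a
```

Calling `demo()` copies every element of `b` into `a`, with the first three indices handled by the per-index CPU body and the remaining seven passed as one contiguous chunk to the callback. Setting `cpu_percent` to 0 or 100 sends the whole range to one side, which is one way to read the portability claim: the loop body stays the same whether or not a GPU is used.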

Akihiro Hayashi

Rice University, USA

United States

Sri Raj Paul

Georgia Institute of Technology

United States

Vivek Sarkar

Rice University, USA

United States

Session Program

Sat 22 Jun
Displayed time zone: (GMT-07:00) Tijuana, Baja California

10:00 - 10:50	Chapel Implementation Improvements (CHIUW) at 212A. Chair(s): Michelle Strout (University of Arizona)

10:00 25m Research paper		GPUIterator: Bridging the Gap between Chapel and GPU Platforms (CHIUW). Akihiro Hayashi (Rice University, USA), Sri Raj Paul (Georgia Institute of Technology), Vivek Sarkar (Rice University, USA)
10:25 25m Talk		Calling Chapel Code: Interoperability Improvements (CHIUW). Lydia Duncan (Cray Inc.), David Iten (Cray Inc.)