Sat 22 Jun 2019 16:00 - 16:30 at 106C - Session 5 Chair(s): Lenore Mullin

We present ALPyNA, an automatic loop parallelization framework for Python, which analyzes data dependences within nested loops and dynamically generates CUDA kernels for GPU execution. The ALPyNA system applies classical dependence analysis techniques to discover and exploit potential parallelism. The skeletal structure of the dependence graph is determined statically; this is combined with type and bounds information discovered at runtime to auto-generate high-performance kernels for offload to the GPU. We demonstrate speedups of up to 1000x relative to the native CPython interpreter across four array-intensive numerical Python benchmarks. The performance improvement depends on the iteration domain size and the complexity of the dependence graph. Nevertheless, this approach promises to bring the benefits of manycore parallelism to end-user developers.
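To make the approach concrete, the sketch below shows the kind of loop nest such a framework targets: a nested Python loop with no loop-carried dependences, paired with a hand-written GPU kernel using numba.cuda as a stand-in. This is purely illustrative and assumes numba is available; it is not ALPyNA's interface or its generated code, only an example of mapping dependence-free iterations to one GPU thread each.

# Illustrative only: NOT ALPyNA's API or output. The loop nest below has no
# cross-iteration dependence edges, so every (i, j) iteration may run in parallel.
import numpy as np
from numba import cuda

def stencil_cpu(a, b):
    # Original interpreted Python loop nest: each output element reads only
    # from the input array, so the dependence graph has no carried dependences.
    for i in range(1, a.shape[0] - 1):
        for j in range(1, a.shape[1] - 1):
            b[i, j] = 0.25 * (a[i - 1, j] + a[i + 1, j] + a[i, j - 1] + a[i, j + 1])

@cuda.jit
def stencil_kernel(a, b):
    # One GPU thread per (i, j) iteration of the original loop nest.
    i, j = cuda.grid(2)
    if 1 <= i < a.shape[0] - 1 and 1 <= j < a.shape[1] - 1:
        b[i, j] = 0.25 * (a[i - 1, j] + a[i + 1, j] + a[i, j - 1] + a[i, j + 1])

if __name__ == "__main__":
    a = np.random.rand(1024, 1024)
    b = np.zeros_like(a)
    d_a, d_b = cuda.to_device(a), cuda.to_device(b)
    threads = (16, 16)
    blocks = ((a.shape[0] + 15) // 16, (a.shape[1] + 15) // 16)
    stencil_kernel[blocks, threads](d_a, d_b)
    result = d_b.copy_to_host()

In ALPyNA the kernel and launch configuration would instead be derived automatically, using the statically computed dependence-graph skeleton together with the array types and loop bounds observed at runtime.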

Sat 22 Jun

Displayed time zone: Tijuana, Baja California

16:00 - 17:30  Session 5 (ARRAY) at 106C
Chair(s): Lenore Mullin, SUNY Albany, USA

16:00  (30m, Talk)  ALPyNA: Acceleration of Loops in Python for Novel Architectures (ARRAY)
       Dejice Jacob, Jeremy Singer (University of Glasgow)

16:30  (30m, Talk)  Code Generation in Linnea (extended abstract) (ARRAY)
       Henrik Barthels (RWTH Aachen), Paolo Bientinesi (Umeå University)

17:00  (30m, Talk)  High-Level Synthesis of Functional Patterns with Lift (ARRAY)
       Martin Kristien (University of Edinburgh, UK), Bruno Bodin (Yale-NUS College), Michel Steuwer (University of Glasgow), Christophe Dubach (University of Edinburgh)