6th ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming
Array-oriented programming offers a unique blend of programmer productivity and high-performance parallel execution. As an abstraction, it directly mirrors high-level mathematical constructions commonly used in many fields, from the natural sciences through engineering to financial modelling. As a language feature, it exposes regular control flow, exhibits structured data dependencies, and lends itself to many types of program analysis. Furthermore, many modern computer architectures, particularly highly parallel architectures such as GPUs and FPGAs, are well suited to executing array operations efficiently.
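As a small illustration of this style (a hypothetical NumPy sketch, not taken from the workshop material), the same computation can be written as an explicit scalar loop or as a single whole-array expression:

```python
import numpy as np

# Scalar formulation: explicit indexing hides the parallelism.
def axpy_loop(a, x, y):
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out

# Array formulation: one elementwise expression, directly mirroring
# the mathematics (a*x + y) and mapping naturally to SIMD/GPU hardware.
def axpy_array(a, x, y):
    return a * x + y

x = np.arange(4.0)            # [0. 1. 2. 3.]
y = np.ones(4)
print(axpy_array(2.0, x, y))  # [1. 3. 5. 7.]
```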
The ARRAY workshop is intended to bring together researchers from many different communities, including language designers, library developers, compiler researchers, and practitioners, who are using or working on numeric, array-centric aspects of programming languages, libraries and methodologies from all domains: imperative or declarative; object-oriented or functional; interpreted or compiled; strongly typed, weakly typed, or untyped.
Submissions are due on 8th April 2019, and the workshop takes place on Saturday 22nd June 2019, co-located with PLDI in Phoenix, Arizona. For more information about how to submit, see the call for papers.
The ARRAY series of workshops explores:
- formal semantics and design issues of array-oriented languages and libraries;
- productivity and performance in compute-intensive application areas of array programming;
- systematic notation for array programming, including axis- and index-based approaches;
- intermediate languages, virtual machines, and program-transformation techniques for array programs;
- representation of and automated reasoning about mathematical structure, such as static and dynamic sparsity, low-rank patterns, and hierarchies of these, with connections to applications such as graph processing, HPC, tensor computation and deep learning;
- interfaces between array- and non-array code, including approaches for embedding array programs in general-purpose programming languages; and
- efficient mapping of array programs, through compilers, libraries, and code generators, onto execution platforms, targeting multi-cores, SIMD devices, GPUs, distributed systems, and FPGA hardware, by fully automatic and user-assisted means.
Array programming is at home in many communities, including language design, library development, compiler optimization, and scientific computing, and it cuts across many existing language communities. ARRAY is intended as a forum where these communities can exchange ideas on the construction of computational tools for manipulating arrays.
The invited talk will be given by Peter J. Braam:
Array Processing on Steroids for the SKA Radio-Telescope
Abstract: The Square Kilometre Array (SKA) radio telescope will be a massive scientific instrument entering service in the late 2020s. The conversion of its antenna signals to images and the detection of transient phenomena is a massive computational undertaking, requiring 200 PB/s of memory bandwidth, all dedicated to array processing. In this talk we will give an overview of the data processing in the telescope and the process that has been followed to design suitable algorithms and systems. We will highlight parts of the challenge that have interesting relationships to computer science, and then transition to review recent technological developments such as memory, machine-learning accelerators, and new floating-point formats that may prove helpful.
Bio: Peter Braam is a scientist and entrepreneur focused on problems in large scale computing. Originally trained as a mathematician, he has worked at several academic institutions including Oxford, CMU and Cambridge. One of his startup companies developed the widely used Lustre file system. During the last few years he has focused on computing for the SKA telescope and on research in data intensive computing.
- ARRAY 2018, 19th June 2018, Philadelphia
- ARRAY 2017, 18th June 2017, Barcelona
- ARRAY 2016, 14th June 2016, Santa Barbara
- ARRAY 2015, 13th June 2015, Portland
- ARRAY 2014, 12th-13th June 2014, Edinburgh
Call for Papers
Submissions are welcome in two categories: full papers and extended abstracts. All submissions should be formatted in conformance with the ACM SIGPLAN proceedings style. Accepted submissions in either category will be presented at the workshop.
Full papers may be up to 12pp, on any topic related to the focus of the workshop. They will be thoroughly reviewed according to the usual criteria of relevance, soundness, novelty, and significance; accepted submissions will be published in the ACM Digital Library.
Extended abstracts may be up to 2pp; they may describe work in progress, tool demonstrations, or summaries of work published in full elsewhere. The focus of an extended abstract should be to explain why the proposed presentation will be of interest to the ARRAY audience. Submissions will be lightly reviewed only for relevance to the workshop, and will not be published in the ACM Digital Library.
Whether full papers or extended abstracts, submissions must be in PDF format, printable in black and white on US Letter sized paper. Papers must adhere to the standard SIGPLAN conference format: two columns, ten-point font. A suitable document template for LaTeX is available at http://www.sigplan.org/Resources/Author/.
Papers must be submitted using EasyChair.
Authors take note: The official publication date of full papers is the date the proceedings are made available in the ACM Digital Library. This date may be up to two weeks prior to the workshop. The official publication date affects the deadline for any patent filings related to published work.
TeIL: a type-safe imperative Tensor Intermediate Language
Abstract: Each of the popular tensor frameworks from the machine learning domain comes with its own language for expressing tensor kernels. Since these tensor languages lack precise specifications, it is impossible to understand and reason about tensor kernels that exhibit unexpected behaviour. In this paper, we give examples of such kernels. The tensor languages are superficially similar to the well-known functional array languages, for which formal definitions often exist. However, the tensor languages are inherently imperative. In this paper we present TeIL, an imperative tensor intermediate language with precise formal semantics. For the popular tensor languages, TeIL can serve as a common ground on the basis of which precise reasoning about kernels becomes possible. Based on TeIL's formal semantics we develop a type-safety result in the Coq proof assistant.
ALPyNA: Acceleration of Loops in Python for Novel Architectures
Abstract: We present ALPyNA, an automatic loop parallelization framework for Python, which analyzes data dependences within nested loops and dynamically generates CUDA kernels for GPU execution. The ALPyNA system applies classical dependence analysis techniques to discover and exploit potential parallelism. The skeletal structure of the dependence graph is determined statically; this is combined with type and bounds information discovered at runtime to auto-generate high-performance kernels for offload to the GPU. We demonstrate speedups of up to 1000x relative to the native CPython interpreter across four array-intensive numerical Python benchmarks. Performance improvement is related to iteration domain sizes and the complexity of the dependence graph. Nevertheless, this approach promises to bring the benefits of manycore parallelism to end-user developers.
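As an illustration of the distinction such dependence analysis draws (the functions below are our own examples, not part of ALPyNA's API), compare a loop whose iterations are all independent with one that carries a dependence between iterations:

```python
# Every iteration writes a distinct out[i] from inputs only:
# no loop-carried dependence, so iterations can map to GPU threads.
def saxpy(a, x, y, out):
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]

# Each iteration reads out[i-1], written by the previous iteration:
# a loop-carried dependence, so this loop is not trivially parallel.
def prefix_sum(x, out):
    out[0] = x[0]
    for i in range(1, len(x)):
        out[i] = out[i - 1] + x[i]

out = [0] * 4
prefix_sum([1, 2, 3, 4], out)
print(out)  # [1, 3, 6, 10]
```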
Data-Parallel Flattening by Expansion
Abstract: We present a higher-order programmer-level technique for compiling particular kinds of irregular data-parallel problems to parallel hardware. The technique, which we have named "flattening-by-expansion", builds on a number of segmented data-parallel operations but is itself implemented as a higher-order generic function, which makes it useful for many irregular problems. Concretely, the implementation is given in Futhark; we demonstrate its usefulness on a number of irregular problems and show that, in practice, they are compiled to efficient parallel code that can be executed on GPUs. The technique is useful in any data-parallel language that provides a key set of primitives.
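The core idea can be sketched outside Futhark. In this loose NumPy version (the names `expand`, `size`, and `get` are ours, standing in for the paper's segmented primitives), each input element expands into `size(x)` output elements, each computed by `get(x, i)`:

```python
import numpy as np

def expand(size, get, xs):
    # how many outputs each input element expands into
    sizes = np.array([size(x) for x in xs])
    # segment id of every output slot: which input it came from
    seg = np.repeat(np.arange(len(xs)), sizes)
    # position of every output slot within its own segment
    offs = np.concatenate([[0], np.cumsum(sizes)[:-1]])
    idx = np.arange(sizes.sum()) - offs[seg]
    return np.array([get(xs[s], i) for s, i in zip(seg, idx)])

# e.g. expand each n into the sequence 0..n-1 (an irregular "iota")
print(expand(lambda n: n, lambda n, i: i, [2, 3, 1]))  # [0 1 0 1 2 0]
```

The irregular problem (segments of differing lengths) is reduced to flat, regular operations: `repeat`, `cumsum`, and an elementwise map.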
Records with Rank Polymorphism
Abstract: In a rank-polymorphic programming language, all functions automatically lift to operate on arbitrarily high-dimensional aggregate data. By adding records to such a language, we can support computation on data frames, a tabular data structure containing heterogeneous data but in which individual columns are homogeneous. In such a setting, a data frame is a vector of records, subject to both ordinary array operations (e.g., filtering, reducing, sorting) and lifted record operations—projecting a field lifts to projecting a column. Data frames have become a popular tool for exploratory data analysis, but fluidity of interacting with data frames via lifted record operations depends on how the language's records are designed.
We investigate three languages with different notions of record data: Racket, Standard ML, and Python. For each, we examine several common tasks for working with data frames and how the language's records make these tasks easy or hard. Based on their advantages and disadvantages, we synthesize their ideas to produce a design for record types which is flexible for both scalar and lifted computation.
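To make the lifting concrete, here is a small hypothetical Python sketch (plain dicts stand in for records; none of this is the paper's code): a data frame is a vector of records, and projecting a field lifts pointwise to projecting a column.

```python
# A data frame as a vector of records.
frame = [
    {"name": "a", "score": 3},
    {"name": "b", "score": 5},
    {"name": "c", "score": 4},
]

# Scalar field projection, lifted pointwise over the vector of records:
# projecting a field becomes projecting a whole column.
def project(field, records):
    return [r[field] for r in records]

# Ordinary array operations still apply to the frame itself.
high = [r for r in frame if r["score"] >= 4]  # filtering rows
print(project("name", high))                  # ['b', 'c']
```

In a rank-polymorphic language this lifting is automatic rather than written out as a comprehension, which is exactly the fluidity the paper's record design aims to preserve.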
Code Generation in Linnea (extended abstract)
Abstract: Linnea is a code generator for the translation of high-level linear algebra problems to efficient code. Unlike other languages and libraries for linear algebra, Linnea heavily relies on domain-specific knowledge to rewrite expressions and infer matrix properties.
Here we focus on two aspects related to code generation and matrix properties: 1) The automatic generation of code consisting of explicit calls to BLAS and LAPACK kernels, and the corresponding challenge with specialized storage formats. 2) A general notion of banded matrices can be used to simplify the inference of many matrix properties. While it is crucial to make use of matrix properties to achieve high performance, inferring those properties is challenging. We show how matrix bandwidth can be used as a unifying language to reason about many common matrix properties.
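As a taste of this kind of lowering (illustrative only, using SciPy's BLAS bindings rather than Linnea's generated code), a high-level expression such as 2*A@B + C maps onto a single dgemm kernel call instead of a separate multiply and add:

```python
import numpy as np
from scipy.linalg import blas

A = np.random.rand(3, 4)
B = np.random.rand(4, 2)
C = np.random.rand(3, 2)

# dgemm computes alpha*A@B + beta*C in one BLAS kernel invocation.
E = blas.dgemm(alpha=2.0, a=A, b=B, beta=1.0, c=C)
assert np.allclose(E, 2.0 * A @ B + C)
```

A generator like Linnea must additionally decide which kernels to use and exploit matrix properties (symmetry, bandedness, storage format) that a direct translation would miss.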
Finite Difference Methods Fengshui: Alignment through a Mathematics of Arrays
Abstract: Numerous scientific-computational domains make use of array data. The core computations of the numerical methods and algorithms involved revolve around the manipulation of multi-dimensional arrays. The memory layout and access patterns of that data are crucial to the performance of array-based computations. As we move towards exascale computing, writing portable code for efficient data-parallel computation increasingly requires an abstract, productive working environment. In this paper, we present the design of a productive, reusable and tunable framework for optimising scientific array-based computations, building a case study around a partial differential equation solver. By embedding the Mathematics of Arrays formalism in the Magnolia programming language, we assemble a software stack capable of abstracting the continuous high-level application layer from the discrete formulation of the collective array-based numerical methods and algorithms, and from the final detailed low-level code. The case study shows how optimised memory layout and efficient computations can be achieved while keeping the application abstraction layer stable and independent of changes in the underlying algorithms and architectures.
Linear Algebraic Depth-First Search
Abstract: There is a recent push by a segment of the graph community to implement graph algorithms in the language of linear algebra.
However, graph algorithms that depend on depth-first search techniques are often highlighted as limitations of the linear algebraic approach, as there is no existing formulation of depth-first search graph algorithms expressed in the language of linear algebra.
This paper provides a linear algebraic approach for developing depth-first search graph traversal algorithms and demonstrates its use by defining three classical DFS-based algorithms: binary tree traversal, graph traversal, and biconnected components.
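For readers unfamiliar with the linear algebraic style, here is the standard textbook illustration (our own sketch; the classic formulation is breadth-first, and a depth-first counterpart is precisely the paper's contribution): multiplying a frontier vector by the adjacency matrix yields the next frontier.

```python
import numpy as np

# Adjacency matrix of a small DAG: edges 0->1, 0->2, 1->3, 2->3.
A = np.array([[0, 1, 1, 0],
              [0, 0, 0, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0]])

frontier = np.array([1, 0, 0, 0])  # start traversal at vertex 0
visited = frontier.astype(bool)
while frontier.any():
    reached = (frontier @ A) > 0               # neighbours of the frontier
    frontier = (reached & ~visited).astype(int)  # drop already-seen vertices
    visited |= reached
print(np.flatnonzero(visited))  # [0 1 2 3]
```

The traversal itself contains no per-vertex loops; all the work is in matrix-vector products and elementwise masks.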
Toward Generalized Tensor Algebra for ab initio Quantum Chemistry Methods
Abstract: The widespread use of tensor operations in describing electronic structure calculations has motivated the design of software frameworks for the productive development of scalable, optimized tensor-based electronic structure methods. Whereas prior work focused on Cartesian abstractions for dense tensors, we present an algebra to specify and perform tensor operations on a larger class of block-sparse tensors. We illustrate the use of this framework in expressing real-world computational chemistry calculations beyond the reach of existing frameworks.
Convolutional Neural Networks in APL
Abstract: This paper shows how Convolutional Neural Networks (CNN) can be implemented in APL. Its first-class array support ideally fits that domain, and its operators facilitate rapid and concise creation of generically reusable building blocks. For our example, there are ten such blocks, expressed as ten lines of native APL code, free of explicit array indexing. Compositions of such APL abstractions are very useful for prototyping, particularly by domain experts whose primary interests lie outside of programming. The functional nature of operators provides a highly portable specification that is suitable for high-performance optimizations and parallel execution. We explain each CNN building block, and briefly discuss the performance of the resulting specification.
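The paper's building blocks are written in APL; as a loose sketch of the same index-free, array-oriented idea in NumPy (our own code, not the paper's), a 2-D convolution block can be expressed without explicit indexing of output elements:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Convolution as pure array operations: gather all kernel-sized
# windows, then contract each window against the weights.
def conv2d(img, w):
    windows = sliding_window_view(img, w.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum('ijkl,kl->ij', windows, w)

img = np.arange(16.0).reshape(4, 4)
w = np.ones((2, 2))
print(conv2d(img, w).shape)  # (3, 3)
```

In APL the corresponding block is even terser, since windowing and reduction are built-in operators rather than library calls.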
High-Level Synthesis of Functional Patterns with Lift
Abstract: High-level languages are commonly seen as a good fit to tackle the problem of performance portability across parallel architectures. The Lift framework is a recent approach which has successfully demonstrated its ability to address the challenge of performance portability across multiple types of CPU and GPU devices. The functional and pattern-based nature of Lift simplifies the optimization of programs using rewrite-rules.
In this work we extend Lift to target FPGA-based platforms. We designed parallel patterns operating on data streams, and we implemented a Lift-to-VHDL backend. This contribution has been evaluated on a Xilinx XC7Z010 FPGA using matrix multiplication, for which we observe up to 10x speed-up over highly optimized CPU code and over a commercial HLS tool. Furthermore, considering the potential for design space exploration enabled by Lift, this work is a stepping stone towards automatically generating competitive code for FPGAs.