Sparse Computation Data Dependence Simplification for Efficient Compiler-Generated Inspectors
This paper presents a combined compile-time and runtime loop-carried
dependence analysis of sparse matrix codes and evaluates its
performance in the context of wavefront parallelism.
Sparse computations involve indirect memory accesses, such as x[col[j]],
whose memory locations cannot be determined until runtime.
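For concreteness, the fragment below is a minimal sketch of such a computation: a sparse lower-triangular solve with the matrix in CSR form (the arrays rowptr, col, and val and the diagonal-last storage convention are assumptions of this illustration, not code from the paper). Whether iteration i reads a value written by an earlier iteration i' depends on the runtime contents of col and rowptr, so the loop-carried dependences cannot be resolved at compile time.

```c
/* Sketch: forward substitution L*x = b, with lower-triangular L in CSR
 * (rowptr, col, val) and each row's diagonal entry stored last.
 * The read x[col[k]] is an indirect access: which earlier iteration
 * produced that value is only known once col and rowptr are known. */
void forward_solve(int n, const int *rowptr, const int *col,
                   const double *val, const double *b, double *x) {
  for (int i = 0; i < n; i++) {
    double s = b[i];
    for (int k = rowptr[i]; k < rowptr[i + 1] - 1; k++)
      s -= val[k] * x[col[k]];          /* indirect read of x */
    x[i] = s / val[rowptr[i + 1] - 1];  /* write read by later rows */
  }
}
```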
The key contributions of this paper are two compile-time techniques for
significantly reducing the overhead of runtime dependence testing:
(1) identifying new equality constraints that result in more efficient runtime inspectors, and
(2) identifying subset relations between dependence constraints such
that one dependence test subsumes another, which can then be eliminated.
The discovery of new equality constraints is enabled by exploiting
domain-specific knowledge about index arrays such as col[j].
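As an illustration of how index-array knowledge can surface an equality (the constraint system here is our own simplified example, not one taken from the paper): suppose a candidate dependence requires a single nonzero index k to fall inside the row segments of both row i and row i' of a CSR matrix,

\[
  \mathit{rowptr}(i) \le k < \mathit{rowptr}(i+1),
  \qquad
  \mathit{rowptr}(i') \le k < \mathit{rowptr}(i'+1).
\]

Because rowptr is non-decreasing, the segments of distinct rows are disjoint, so these constraints force the equality i = i'; substituting it eliminates one quantified variable and removes a loop from the corresponding inspector.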
These simplifications lead to automatically generated
inspectors that make it practical to parallelize such computations.
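To make the inspector's role concrete, here is a minimal hand-written sketch of a wavefront (level-set) inspector for the triangular solve above; it is our illustration of the general idea, not the automatically generated, simplified inspector the paper produces. Each row is assigned a level one greater than the maximum level of the rows it reads; rows within the same level carry no dependences among themselves and can be executed in parallel by the executor.

```c
#include <stdlib.h>

/* Sketch of a level-set inspector for the forward solve above.
 * level[i] = 1 + max(level of every row that i reads through x[col[k]]),
 * or 0 if row i reads no earlier rows.  An executor then runs the rows
 * of each level in parallel (e.g., one OpenMP parallel loop per level).
 * Because L is lower triangular, col[k] < i, so level[col[k]] is
 * already computed when row i is visited. */
int *compute_wavefronts(int n, const int *rowptr, const int *col) {
  int *level = malloc(n * sizeof *level);
  if (!level) return NULL;
  for (int i = 0; i < n; i++) {
    int lvl = 0;
    for (int k = rowptr[i]; k < rowptr[i + 1] - 1; k++) {
      int src = col[k];                 /* row whose x value row i reads */
      if (level[src] + 1 > lvl)
        lvl = level[src] + 1;
    }
    level[i] = lvl;
  }
  return level;
}
```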
We analyze our simplification methods for a collection of seven sparse computations.
The evaluation shows that our methods significantly reduce the
complexity of the runtime inspectors.
Experimental results for a collection of five large matrices
show parallel speedups ranging from 2x to more than 8x on an 8-core CPU.