Title/Author/Abstract
- Title:
Partial Array Expansion for Irregular Reductions
- Author:Eladio Gutierrez, Oscar Plata and Emilio L. Zapata
- Full Paper(.ps version)
- Abstract:
Array expansion is a very well-known technique for the
automatic parallelization of reductions on shared memory
multiprocessors. The technique is general and easy to use,
but is poorly scalable regarding to memory overhead
(the memory overhead is proportional to the number of
processors). This fact limits its applicability to large
problems and/or large machines. In addition, array expansion
does not take into account locality properties in the memory
access patterns, penalizing the performance in the present
day distributed shared memory machines.
For regular problems, many language and compile-time techniques
have been developed to both exploit data locality and reduce
memory overhead efficiently. In case of irregular codes, however,
the problem is much harder to solve and some runtime support
is required.
This paper presents an approach to a partial array expansion
for irregular reductions in which scalability (memory overhead)
is improved and data locality is exploited. To capture locality
in the irregular access pattern a prefetching phase is implemented.
The recognized locality allows to define sets of reduction iterations
with nonconflicting writes to the expanded array. This computation
reorganization permits to reduce the number of replicated arrays
(partial expansion) regarding array expansion with no significant
loose of parallelism and no computation replication. If the code
exhibits poor or null data locality (which is not frequent), the
proposed partial array expansion becomes array expansion.
For different classes of typical irregular programs we show that,
despite the prefetching phase memory cost,
there is enough locality to reduce significantly the total memory
overhead of the array expansion, improving its scalability.
In addition, as data locality is exploited, the efficiency of
execution of the parallel irregular codes is improved on
distributed shared memory machines.
Finally, an implementation of the partial array expansion is
presented, showing that the transformation that the compiler
must apply to the irregular reduction code is not complex.
Although data locality exploitation is accomplished at runtime,
no special runtime support is needed.
Abstract:
Please contact our
webadmin with any comments or changes.
Unless explicitly stated otherwise, all material is
copyright © The University of Edinburgh.