Multipartitionings, also known as Bruno-Capella partitionings, are a strategy for partitioning multidimensional arrays among a collection of processors so that line-sweep computations can be solved efficiently. The principal property of these partitionings is that for any directional sweep across a multipartitioned array, all processors have a tile to compute in each phase of the computation yielding a perfectly balanced parallelization with no serialization. A secondary benefit of these partitionings is that only coarse-grain communication is required. Any number of processors can be utilized with a two-dimensional multipartitioning, but a 3-dimensional multipartioning requires that the number of processors be a perfect square. In this work we explore the problem of generalizing partitionings in three or more dimensions to any number of processors that is a composite number. Two interesting problems must be solved for such partitionings. First, we must determine the number of cuts to be applied to each partitioned dimension to maintain full parallelism in each phase of a sweep across any data dimension. The second problem is determining the assignment of tiles to each processor. We are exploring how to best compute effective assignments for the more general case.
We are in the process of completing an implementation of multipartitioning in the Rice dHPF compiler for High Performance Fortran. Our prototype has been used to parallelize a sequential version of the NAS BT computational fluid dynamics benchmark from NASA Ames Laboratory, and, to date, this compiler-based parallelization achieves 80% of the parallel efficiency of the hand-coded parallelization on 25 processors. We intend to incorporate our strategy for achieving partitioning in three or more dimensions for a composite number of processors into a run-time library for the dHPF compiler.