DescriptionFMM and H-matrices share a rare combination of optimally low O(N) arithmetic complexity and high arithmetic intensity (flops/Byte), with important phases being nearly compute-bound. This stands in contrast to workhorse solvers that have either low arithmetic complexity with low arithmetic intensity (e.g., Fast Fourier Transform and multigrid), or high arithmetic intensity with high arithmetic complexity (e.g., dense linear algebra and direct N-body summation). In short, fast multipole and H-matrix methods are mathematically efficient algorithms that are and likely will remain computationally efficient, in the sense of being compute-bound, on future architectures. Furthermore, when it comes to distributed memory applications, these methods have a communication complexity of O(log P) for P processors, and permit high asynchronicity in their communication. They are therefore amenable to asynchronous programming models, which have been gaining popularity as purely bulk synchronous models appear stressed by energy-austere exascale hardware trends. Much research exists on the mathematics of hierarchically low-rank methods and many high-performance implementations are available for traditional full rank methods. HiCMA’s target is their intersection on accelerators, where the hierarchy needs flattening. Some modules of this open source project have already been adopted in the software libraries of major vendors. We describe currently available modules of this work in progress.