Memory Management on IBM Power Systems with NUMA Characteristics based on the PGASUS Programming Framework
Master's thesis by Karsten Tausche, Hasso-Plattner-Institut at the University of Potsdam | October 19, 2017

In today's server market, non-uniform memory access (NUMA) architectures are becoming increasingly common. They allow high degrees of parallelization within a single machine while circumventing the physical limitations of multi-core CPUs. Exploiting NUMA machines efficiently requires NUMA-aware software. While NUMA-aware operating systems handle basic cases efficiently, sophisticated optimizations require knowledge of application-level logic. Commonly used general-purpose programming languages such as C++, however, currently lack support for NUMA-aware programming. The PGASUS C++ programming framework employs a programming model that enables NUMA-aware programming in C++. Previously presented benchmarks demonstrate that PGASUS may provide substantial performance gains in specific applications. However, a thorough analysis of the advantages and disadvantages of PGASUS-based applications across different usage contexts was still missing.

This thesis presents an analysis of PGASUS-based, NUMA-aware memory management. The following insights were gained. For memory-bound applications, NUMA-aware programming with PGASUS provides substantial performance gains: in benchmarks, throughput increases of up to 1.5 orders of magnitude were achieved. In compute-bound benchmarks, however, employing PGASUS did not lead to performance gains. Instead, PGASUS's memory allocation functions introduce an overhead of around 10% compared to the standard glibc allocator. Based on analyses of the benchmark results, the following issues and required optimizations of PGASUS were identified. First, small allocations are slowed down because every allocation takes a lock. Second, large allocations suffer from frequent and expensive library calls for adjusting NUMA policies.
To solve these issues, we recommend employing caching for all allocation sizes, together with per-thread data structures that handle most allocations without locks. Both approaches are derived from concepts used in high-concurrency memory allocators. Moreover, this thesis presents a preliminary evaluation of task management approaches on NUMA architectures. Promising benchmark results were achieved by combining PGASUS-based memory management with task management based on OpenMP. PGASUS's own task management module, however, did not deliver competitive performance in its current state. Finally, in the scope of this thesis, PGASUS was ported to the POWER platform. The porting steps include replacing x86 assembly with Power ISA equivalents and generalizing x86-specific assumptions to support the virtualized topologies of POWER LPARs.