Non-Uniform Memory Access (NUMA) Seminar (2014)
Prof. Dr. Andreas Polze
Felix Eberhardt, M.Sc.
Frank Feinbube, M.Sc.
Organization
Extent: 2 semester hours (3 graded credit points)
Dates: Wednesday, 11.00 - 12.30, HS3
The seminar focusses on literature review. Participants are required to read fundamental scientific publications on non-uniform memory access (NUMA) systems, and present them to their fellow students.
- Each participant is expected to give a 30-45 minute presentation on a topic.
- Presentation slides should be discussed with a supervisor one week prior to the presentation date.
- At the end of the seminar, we plan to assemble a technical report about your seminar topics.
Topics
1. NUMA system architecture
1.1 Multiprocessor architectures (historic, current): AMD, Intel, IBM, Sparc
1.2 Interconnection technologies
1.3 Cache coherency
2. Operating systems
2.1 Scientific approaches: Thread and data placement
2.2 Topology discovery
2.3 Kernel APIs for thread and data placement
3. Programming models
3.1 NUMA-aware algorithms
3.2 OpenMP
3.3 OpenMPI
3.4 NUMA-aware hybrid computing with OpenCL/CUDA
3.5 NUMA support in high level programming languages (Java, Python, C#, ...)
3.6 PGAS (Unified Parallel C, Coarray Fortran, Fortress, Chapel, X10, and Global Arrays)
3.7 C++11 Memory consistency protocol
4. Profiling
4.1 Performance counter
4.2 Instruction based sampling
4.3 Scientific approaches: Profilers/analyzing runtime behaviour
4.4 Profiling tools: Intel, AMD, IBM, Sun/Sparc
5. Case study: Linux NUMA balance evolution
We are open to any topic suggestions. Each of the proposed topics may be worked on by one or two students.
Presentation Dates
According to your prioritized lists of topics, the seminar schedule is as follows:
Date | Topic | Presenter | Topic discussion | Presentation review |
15.10.2014 | Introduction | Felix Eberhardt | - | - |
22.10.2014 | Topic assignment | Felix Eberhardt | - | - |
29.10.2014 | FutureSOC Lab Day | - | - | - |
05.11.2014 | tba | Alexander Böhm (SAP) | - | - |
12.11.2014 | Introduction to FutureSOC Lab | Felix Eberhardt | - | - |
19.11.2014 | Multiprocessor architectures | Kirstin Heidler | 05.11.2014 | 12.11.2014 |
19.11.2014 | Cache coherency | Johannes Frohnhofen | 05.11.2014 | 12.11.2014 |
26.11.2014 | Interconnection technologies | Elina Zarisheva | 12.11.2014 | 24.11.2014 |
26.11.2014 | Scientific approaches: Thread and data placement | Fabian Eckert | 12.11.2014 | 24.11.2014 |
03.12.2014 | Case study: Linux NUMA evolution | Fredrik Teschke, Lukas Pirl | 06.11.2014 | 26.11.2014 |
03.12.2014 | Kernel APIs for thread and data placement | Dimitri Korsch | 19.11.2014 | 26.11.2014 |
10.12.2014 | Scientific approaches: NUMA Profilers/analyzing runtime behaviour | Malte Swart | 24.11.2014 | 04.12.2014 |
10.12.2014 | Performance Counter | Karsten Tausche | 24.11.2014 | 04.12.2014 |
17.12.2014 | Topology discovery | Sven Knebel | 01.12.2014 | 10.12.2014 |
17.12.2014 | NUMA in high level programming languages | Patrick Siegler | 01.12.2014 | 10.12.2014 |
07.01.2015 | NUMA with OpenCL | Jan Philipp Sachse | 25.11.2014 | 16.12.2014 |
07.01.2015 | C++11: Memory consistency | Sebastian Gerstenberg | 17.12.2014 | 28.12.2014 |
14.01.2015 | NUMA with OpenMP | Matthias Springer | 17.12.2014 | 12.01.2015 |
14.01.2015 | NUMA with OpenMPI | Carolin Fiedler | 17.12.2014 | 12.01.2015 |
21.01.2015 | NUMA-aware algorithms (matrix multiplication) | Max Reimann, Philipp Otto | 24.11.2014 | 19.01.2015 |
28.01.2015 | NUMA-aware algorithms (reader-writer locks) | Tom Herold, Marco Lamina | 03.12.2014 | 21.01.2015 |
04.02.2015 | NUMA-aware algorithms (SURF) | Christoph Sterz, Patrick Schmidt | 19.12.2014 | 02.02.2015 |
Start Literature
- Molka, Daniel, et al. "Memory performance and cache coherency effects on an Intel Nehalem multiprocessor system." Parallel Architectures and Compilation Techniques, 2009. PACT'09. 18th International Conference on. IEEE, 2009.
- Majo, Zoltan, and Thomas R. Gross. "Memory System Performance in a NUMA Multicore Multiprocessor." (2011).
- Antony, Joseph, Pete P. Janes, and Alistair P. Rendell. "Exploring thread and memory placement on NUMA architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport." High Performance Computing-HiPC 2006. Springer Berlin Heidelberg, 2006. 338-352.
- LaRowe Jr, Richard P., and Carla Schlatter Ellis. "Experimental comparison of memory management policies for NUMA multiprocessors." ACM Transactions on Computer Systems (TOCS) 9.4 (1991): 319-363.
- Su, ChunYi, et al. "Critical path-based thread placement for NUMA systems." ACM SIGMETRICS Performance Evaluation Review 40.2 (2012): 106-112.
- Fowler, Rob, Anirban Mandal, and Min Yeol Lim. "Performance Consistency on Multi-socket AMD Opteron Systems." (2008).
- Löf, Henrik, and Sverker Holmgren. "affinity-on-next-touch: increasing the performance of an industrial PDE solver on a cc-NUMA system." Proceedings of the 19th annual international conference on Supercomputing. ACM, 2005.
- Blagodurov, Sergey, et al. "A case for NUMA-aware contention management on multicore systems." Proceedings of the 19th international conference on Parallel architectures and compilation techniques. ACM, 2010.
- Majo, Zoltan, and Thomas R. Gross. "Memory management in NUMA multicore systems: trapped between cache contention and interconnect overhead." ACM SIGPLAN Notices. Vol. 46. No. 11. ACM, 2011.
- Dashti, Mohammad, et al. "Traffic management: a holistic approach to memory placement on NUMA systems." ACM SIGPLAN Notices 48.4 (2013): 381-394.
- Li, Yinan, et al. "NUMA-aware algorithms: the case of data shuffling." CIDR. 2013.
- Majo, Zoltan, and Thomas R. Gross. "A template library to integrate thread scheduling and locality management for NUMA multiprocessors." Proc. of the 4th USENIX conference on Hot Topics in Parallelism (HotPar). 2012.
- Broquedis, François, et al. "ForestGOMP: an efficient OpenMP environment for NUMA architectures." International Journal of Parallel Programming 38.5-6 (2010): 418-439.
- Broquedis, François, et al. "Dynamic task and data placement over NUMA architectures: an OpenMP runtime perspective." Evolving OpenMP in an Age of Extreme Parallelism. Springer Berlin Heidelberg, 2009. 79-92.
- Olivier, Stephen L., et al. "OpenMP task scheduling strategies for multicore NUMA systems." International Journal of High Performance Computing Applications (2012): 1094342011434065.
- Durand, Marie, et al. "An efficient OpenMP loop scheduler for irregular applications on large-scale NUMA machines." OpenMP in the Era of Low Power Devices and Accelerators. Springer Berlin Heidelberg, 2013. 141-155.
- Li, Shigang, Torsten Hoefler, and Marc Snir. "NUMA-aware shared-memory collective communication for MPI." Proceedings of the 22nd international symposium on High-performance parallel and distributed computing. ACM, 2013.
- Marathe, Jaydeep, and Frank Mueller. "Hardware profile-guided automatic page placement for ccNUMA systems." Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming. ACM, 2006.
- Zaparanuks, Dmitrijs, Milan Jovic, and Matthias Hauswirth. "Accuracy of performance counter measurements." Performance Analysis of Systems and Software, 2009. ISPASS 2009. IEEE International Symposium on. IEEE, 2009.
- McCurdy, Collin, and Jeffrey Vetter. "Memphis: Finding and fixing NUMA-related performance problems on multi-core platforms." Performance Analysis of Systems & Software (ISPASS), 2010 IEEE International Symposium on. IEEE, 2010.
- Lachaize, Renaud, Baptiste Lepers, and Vivien Quéma. "MemProf: A Memory Profiler for NUMA Multicore Systems." USENIX Annual Technical Conference. 2012.
- http://thread.gmane.org/gmane.linux.kernel/1699668
- http://lwn.net/Articles/486858/