Parallel Programming and Heterogeneous Computing (2019)

Prof. Dr. Andreas Polze

Max Plauth
Felix Eberhardt
Sven Köhler
Lukas Wenzel
Robert Schmid

In this lecture you will learn about the theoretical and practical solutions available for parallel software development.

Since the very beginning of computing, processors were built with ever-increasing clock frequencies and hardware optimizations for faster serial code execution, such as instruction-level parallelism (ILP), caches, or speculative execution. Software developers and industry got used to the fact that applications get faster by just exchanging the underlying hardware. For several years now, this has no longer held true. Moore's law about the ever-increasing number of transistors per die is still valid, but decreased structure sizes and increased power consumption force clock frequencies to stall, or even to drop. Due to this development, serial execution performance no longer improves automatically with the next processor generation.

In the current 'many-core era', additional transistors are used not to speed up serial code paths, but to offer multiple execution engines ('cores') per processor. This turns every desktop, server, or even mobile system into a parallel computer. Exploiting the additional transistors is therefore now the responsibility of software, which makes parallel programming mandatory for all software with scalability demands.

The following topics and technologies will be covered:

Theory: Flynn's Taxonomy, Strategies, Memory Models
Shared Memory: PThreads, C++11 Threads, Futures, OpenMP, TBB
Non-Uniform Memory Access: libnuma, PGASUS
On-Chip Accelerators: SIMD (SSE, AVX, AltiVec), Compression, LLVM-IR
External Accelerators: GPUs (OpenCL/CUDA), Xeon Phi, FPGAs (CAPI SNAP/MetalFS)
Shared Nothing: MPI, CloudCL, Actors (Erlang)
Future Developments: RISC-V, OpenCAPI, Gen-Z

Lecture

Extent: 4 SWS / 6 ECTS

Wed, 11:00 - 12:30, H.E-51
Thu, 13:30 - 15:00, H.E-51

All slides will be available in English. The lecture will be held in German, unless English is requested.

Exams

The oral exams will take place on September 9th and 13th and on October 7th and 8th. Each exam lasts 30 minutes, and you are free to start with a short 3-5 minute presentation (no slides) on a topic of your choice from the lecture. Sign up for an exam slot using our online tool.

Assignments

Assignments have to be solved in teams of two. You are admitted to the oral exam if 50% of each assignment is solved correctly. We will provide one optional assignment at the end of the semester that can be used to compensate for one failed assignment.

All assignments are submitted to and validated by our submission system. Submissions that fail the validator script will be treated as not handed in (and thus failed).

Lectures

This preliminary schedule may change in the course of the semester.

Date	Content/Slides	Remarks
Wed, 2019-04-10	Introduction: Why Parallel Programming?
Thu, 2019-04-11	Symposium on Future Trends in Service-Oriented Computing	No lecture
Wed, 2019-04-17	Terminology
Thu, 2019-04-18	Lecture canceled due to sickness	No lecture
Wed, 2019-04-24	Metrics & Workloads
Thu, 2019-04-25	Shared-Memory: Concurrency, Assignment #1: PThreads and Synchronization
Wed, 2019-05-01	May Day (public holiday)
Thu, 2019-05-02	Shared-Memory: Programming Models
Wed, 2019-05-08	Shared-Memory: Programming Models (cont.: C++)	Demo: C++11, Atomic, OpenMP
Thu, 2019-05-09	Shared-Memory: Programming Models (cont.: OpenMP), Shared-Memory: Hardware
Wed, 2019-05-15	Feedback: Assignment 1, Assignment #2: Advanced Shared Memory and OpenMP, Hardware (cont.: ILP)	`[huge heat map test]`, Experiment ILP
Thu, 2019-05-16	Shared-Memory: Hardware (cont.: SMT)	Experiment SMT
Wed, 2019-05-22	Shared-Memory: Hardware (cont.: Memory Consistency)
Thu, 2019-05-23	Shared-Memory: Hardware (cont.: Cache Coherence), NUMA
Wed, 2019-05-29	NUMA (cont.), Profiling	Experiment FirstTouch, Matmul
Thu, 2019-05-30	Ascension Day (public holiday)	No lecture
Wed, 2019-06-05	SIMD: Integrated Accelerators	Demo AltiVec, Auto Vectorization
Thu, 2019-06-06	Heterogeneous Computing with GPUs and OpenCL, Assignment #3: Accelerators with OpenCL
Wed, 2019-06-12	Heterogeneous Computing with GPUs and OpenCL (cont.: Host Code Workflow)	Demo OpenCL
Thu, 2019-06-13	FPGA Accelerators, Feedback: Assignment 2
Wed, 2019-06-19	MetalFS and FPGA Programming Hands-On
Thu, 2019-06-20	Shared-Nothing: Theory, Shared-Nothing: Models, Assignment #4: Field-Programmable Gate Arrays	BSP Paper, LogP Paper
Wed, 2019-06-26	Hands-On: Software Development and Optimization Workflow with HLS
Thu, 2019-06-27	Shared-Nothing: MPI
Wed, 2019-07-03	Shared-Nothing: Cluster and Interconnects, Hands-On: Compiling, Executing and Placing of MPI Ranks/OMP Threads	MPI Placement Demos
Thu, 2019-07-04	Shared-Nothing: Actors
Wed, 2019-07-10	Feedback: Assignment 3, Assignment #5: Message Passing Interface and Actors	`wator2ppm.sh`
Thu, 2019-07-11	Feedback: Assignment 4
Wed, 2019-07-17	Summary
Thu, 2019-07-18	The End: Berkeley Dwarfs, Hardware Trends, Your Next Thesis
Thu, 2019-07-25	Feedback: Assignment 5	Erlang nqueens