Parallel Programming Concepts

Winter term 2011/12

Dr. Peter Tröger, Prof. Dr. Andreas Polze
Frank Feinbube

Since the very beginning of computers, processors were build with ever-increasing clock frequencies and instruction-level optimizations for faster serial code execution, such as ILP, caches, or speculative engines. Software developers and industry got used to the fact that applications get faster by just exchanging the underlying hardware. For several years now, these rules are proven to be no longer valid. Moore's law about the ever-increasing number of transistors per die is still valid, but decreased structural sizes and increased power consumption demand stalling, or even reduced, clock frequencies. Due to this development, serial execution performance no longer improves automatically with the next processor generation.

In the 'many-core era' that happens now, additional transistors are used not to speed up serial code paths, but to offer multiple execution engines ('cores') per processor. This changes every desktop-, server-, or even mobile system into a parallel computer. The exploitation of additional transistors is therefore now the responsibility of software, which makes parallel programming a mandatory approach for all software with scalability demands.

In this course, we want to discuss the relevant theoretical and practical solutions available for parallel software development. The following topics are covered:

Block 1: Introduction
- Introduction
Block 2: Formal aspects of concurrency
- Theory of Concurrency - Shared Memory
- Theory of Concurrency - Multicomputer
Block 3: Parallel hardware
- Parallel Hardware and Interconnects
- GPU Compute Devices
Block 4: Parallel programming
Block 5: Examples

Assignments

Please make sure that your group is registered in the assignment submission system. You can use your HPI login credentials (information from the lecture was incorrect).

Assignment 1: Shared memory programming with basic synchronization primitives
Assignment 2: Data-Parallel Programming (OpenCL example code)
Assignment 3: Wator with Scala

Rules

There are three mandatory assignments, to be solved alone or as team of two persons. The oral exam admittance is achieved if two out of three assignments are solved correctly.

Recommended Readings

Book

C. Breshears, The Art of Concurrency: A Thread Monkey's Guide to Writing Parallel Applications. O'Reilly Media, Inc., 2009.
K. Mani Chandy and J. Misra, Parallel Programming Design. .
I. Foster, Designing and Building Parallel Programs. Addison-Wesley, 1995.
P. B. Hansen, Ed., The origin of concurrent programming: From semaphores to remote procedure calls. New York, NY, USA: Springer-Verlag New York, Inc., 2002.
M. Herlihy and N. Shavit, The Art of Multiprocessor Programming. Morgan Kaufmann, 2008.
J. Jaja, An introduction to parallel algorithms. Redwood City, CA, USA: Addison Wesley Longman Publishing Co., Inc., 1992.
D. B. Kirk and W. W. Hwu, Programming Massively Parallel Processors: A Hands-on Approach, 1st ed. Morgan Kaufmann, 2010.
N. Lynch, Distributed Algorithms. Morgan Kaufmann Series in Data Management Systems, 1997.
T. G. Mattson, B. A. S, ers, and B. L. Massingill, Patterns for Parallel Programming (Software Patterns Series), 1st ed. Addison-Wesley Professional, 2004.
R. H. Perrott, Parallel Programming. Addison-Wesley Publishing Company, 1987.
S. Schneider, Concurrent and Real Time Systems: The CSP Approach. New York, NY, USA: John Wiley Sons, Inc., 1999.

Book Section

G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, pp. 79-81.
P. J. Courtois, F. Heymans, and D. L. Parnas, "Concurrent control with 'readers' and 'writers'," D. M. Hoffman and D. M. Weiss, Eds. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2001, pp. 389-392.
E. W. Dijkstra, "Hierarchical ordering of sequential processes," in The origin of concurrent programming: From semaphores to remote procedure calls, New York, NY, USA: Springer-Verlag New York, Inc., 2002, pp. 198-227.
E. W. Dijkstra, "Cooperating sequential processes," in The origin of concurrent programming: From semaphores to remote procedure calls, New York, NY, USA: Springer-Verlag New York, Inc., 2002, pp. 65-138.
C. A. R. Hoare, "Towards a theory of parallel programming," in The origin of concurrent programming: From semaphores to remote procedure calls, New York, NY, USA: Springer-Verlag New York, Inc., 2002, pp. 231-244.

Journal Article

M. E. Conway, "Design of a separable transition-diagram compiler," Commun. ACM, vol. 6, no. 7, pp. 396-408, Jul. 1963.
F. Feinbube, P. Tröger, and A. Polze, "Joint Forces: From Multithreaded Programming to GPU Computing," IEEE Software (Software), vol. 28, no. 1, pp. 51-57, Oct. 2010.
D. Gelernter, "Generative communication in Linda," ACM Trans. Program. Lang. Syst., vol. 7, no. 1, pp. 80-112, Jan. 1985.
H. González-Vélez and M. Leyton, "A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers," Software: Practice and Experience, vol. 40, no. 12, pp. 1135-1160, Nov. 2010.
J. L. Gustafson, "Reevaluating Amdahl's law," Commun. ACM, vol. 31, no. 5, pp. 532-533, 1988.
P. Brinch Hansen, "Structured multiprogramming," Commun. ACM, vol. 15, no. 7, pp. 574-578, 1972.
P. Brinch Hansen, "Distributed processes: a concurrent programming concept," Commun. ACM, vol. 21, no. 11, pp. 934-941, 1978.
M. D. Hill and M. R. Marty, "Amdahl's Law in the Multicore Era," IEEE Computer, vol. 41, no. 7, pp. 33-38, Jul. 2008.
C. Antony Richard Hoare, "Communicating Sequential Processes," Commun. ACM, vol. 21, no. 8, pp. 666-677, 1978.
C. A. R. Hoare, "Monitors: an operating system structuring concept," Commun. ACM, vol. 17, no. 10, pp. 549-557, 1974.
H. Sutter, "The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software," Dr. Dobb's Journal, vol. 30, no. 3, Mar. 2005.
L. G. Valiant, "A bridging model for parallel computation," Commun. ACM, vol. 33, no. 8, pp. 103-111, 1990.

Conference Paper

R. D. Blumofe and C. E. Leiserson, "Scheduling Multithreaded Computations by Work Stealing," in In Proceedings of the 35th Annual Symposium on Foundations of Computer Science (FOCS, 1994, pp. 356-368.
G. Bronevetsky, I. Laguna, S. Bagchi, B. R. de Supinski, D. H. Ahn, and M. Schulz, "AutomaDeD: Automata-based debugging for dissimilar parallel tasks," in International Conference on Dependable Systems and Networks (DSN), 2010, pp. 231-240.
A. P. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey, and R. W. Brodersen, "Optimizing Power Using Transformations."
D. E. Culler, R. M. Karp, D. A. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian, and T. von Eicken, "LogP: Towards a Realistic Model of Parallel Computation," 1993, pp. 1-12.
E. W. Dijkstra, "The structure of the 'THE'-multiprogramming system," in SOSP '67: Proceedings of the first ACM symposium on Operating System Principles, 1967.
W. Gellerich and M. . Gutzmann, "Massively Parallel Programming Languages - A Classification," in ISCA International Conference on Parallel and Distributed Computing Systems, vol. 1, pp. 110-118.
K. Kennedy, C. Koelbel, and H. Zima, "The rise and fall of High Performance Fortran: an historical object lesson," in third ACM SIGPLAN conference on History of programming languages, 2007, pp. 7-1.
D. M. Tullsen, S. J. Eggers, and H. M. Levy, "Simultaneous Multithreading: Maximizing On-Chip Parallelism," in 22nd Annual International Symposium on Computer Architecture.

Web Page

B. Barney, "Introduction to Parallel Computing." .
C. Breshears, "Intro to Parallel Programming - Video Course." [Online].
G. V. Wilson, "The History of the Development of Parallel Computing." .
"Classification of the principal programming paradigms." [Online]. Available: http://www.info.ucl.ac.be/~pvr/paradigms.html. [Accessed: 29-Mar-2011].

Slide

A. Bechtolsheim, The Road To Exascale. 2009.