Fault Tolerant Systems Lecture and Project (2016)
Prof. Dr. Andreas Polze
Lena Feinbube, M.Sc.
Daniel Richter, M.Sc.
Andreas Grapentin, M.Sc.
Final project presentations: 7 February 2016, 15:15, Room A 1.2
Dates for oral examinations: 15/16 February 2016.
A software system is dependable if it delivers its service in a way that can be trusted. Software dependability is gaining importance as software becomes ubiquitous in our lives and, at the same time, increasingly complex. Modern software systems are not only growing in size but also gaining complexity through additional layers of abstraction, interaction between components, concurrency, and other sources of nondeterminism. This lecture introduces the state of the art in software dependability means, with a strong practical focus on case studies and real-world large-scale software systems. Topics covered include:
- Software Fault, Timing, and Consistency Models
- Fault Prevention
  - Processes for dependable software design
  - Development practices
- Fault Tolerance
  - Patterns for fault tolerance and detection
  - Distributed systems: theory and applications
  - Fault tolerance in operating systems
- Fault Removal and Forecasting
  - Formal methods
  - Testing and Debugging
  - Fault Forecasting
  - Fault injection
  - Dependability modelling and analysis
- Discussion of Case Studies and Postmortems from Practice
Much of the course time is spent on practical project work in small teams. In a coding project, students will make an application fault-tolerant and then evaluate it by implementing fault injection. The hands-on project work is designed as a competition and spans the entire semester (details to be announced).
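To give a rough idea of what "make an application fault-tolerant and evaluate it with fault injection" can look like, here is a minimal sketch in Python. It is not the project framework or the actual assignment: the names `inject_faults`, `with_retry`, `success_ratio`, and `square` are hypothetical and chosen only for illustration, and retry is just one of many possible fault tolerance patterns.

```python
import random


class InjectedFault(Exception):
    """Raised by the fault injector in place of a real failure (hypothetical)."""


def inject_faults(func, failure_rate, rng):
    """Wrap func so that each call fails with probability failure_rate."""
    def wrapper(*args, **kwargs):
        if rng.random() < failure_rate:
            raise InjectedFault("injected fault")
        return func(*args, **kwargs)
    return wrapper


def with_retry(func, max_attempts):
    """Wrap func so that injected faults are retried up to max_attempts times."""
    def wrapper(*args, **kwargs):
        for attempt in range(1, max_attempts + 1):
            try:
                return func(*args, **kwargs)
            except InjectedFault:
                if attempt == max_attempts:
                    raise  # give up: the fault is no longer masked
    return wrapper


def success_ratio(func, calls):
    """Return the fraction of calls that complete without an injected fault."""
    ok = 0
    for i in range(calls):
        try:
            func(i)
            ok += 1
        except InjectedFault:
            pass
    return ok / calls


def square(x):
    """Stand-in for the application under test."""
    return x * x


# Compare the unprotected and the retry-protected service under the same fault load.
plain = inject_faults(square, failure_rate=0.3, rng=random.Random(42))
hardened = with_retry(
    inject_faults(square, failure_rate=0.3, rng=random.Random(42)),
    max_attempts=5,
)
print("plain:   ", success_ratio(plain, 1000))
print("hardened:", success_ratio(hardened, 1000))
```

The fixed seed makes the experiment repeatable, which matters when results of different teams are compared. Retrying only helps against transient faults; a permanent fault (failure rate 1.0) still surfaces after the last attempt.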
Organization
Extent: 4 semester hours (6 graded credit points)
Lecture: Thursday, 13:30 - 15:00 in HS3
Project slots: Tuesday, 15:15 - 16:45 in HS3
Identifiers (SO2010): OSIS, SAMT, IST
Grading: Oral Examination
Material
- Introduction
- Dependability Fundamentals 1
- Dependability Fundamentals 2
- Fault Prevention
- Fault Tolerance in Distributed Systems: 1, 2, 3, Applications
- Fault Tolerance Patterns (Recap)
- Fault Removal: Testing & Debugging
- Fault Removal: Formal Methods
- Software Dependability Approaches
- Hardware Fault Tolerance
- Fault Injection
- Fault Forecasting: Dependability Modelling
- Site Reliability Engineering
- Human Aspects
Exercise
Additional agreed conditions:
- The implementations will be compiled in DEBUG mode.