Teaching Site Reliability Engineering as a Computer Science Elective
Published 2023 in Technical Symposium on Computer Science Education
ABSTRACT
Much of the previous work on preparing undergraduates for industry focuses on software engineering and the skills needed to design and implement new software systems. Relatively little attention has been given to the skills needed to maintain, modify, and repair systems already in use. These skills are captured in the emerging discipline of site reliability engineering, a relative of software reliability engineering. Site reliability engineers use a distinct set of skills, tools, and techniques for managing complex production systems. More importantly, they have a mindset that prioritizes high performance and reliability while seeking to minimize repetitive work done by human operators. In this paper we describe an upper-division elective designed to introduce students to site reliability engineering through hands-on assignments that require teams to deploy, maintain, and scale a working software system, alongside readings and discussion of high-stakes episodes from the broader history of complex systems. We discuss the design of the course and reflect on what worked well, and what did not, in its initial offerings.
PUBLICATION RECORD
- Publication year: 2023
- Venue: Technical Symposium on Computer Science Education
- Publication date: 2023-03-02
- Fields of study: Computer Science, Engineering