Software development is getting faster and more complex, which frustrates IT operations teams more than ever. DevOps gained popularity as a way to combat siloed workflows, decreased collaboration and a lack of visibility. While establishing a culture of DevOps has helped teams collaborate better and deliver reliable software faster, DevOps teams don’t necessarily have someone specifically dedicated to developing systems that increase site reliability and performance. That’s where a site reliability engineer (SRE) comes into the picture.
The concept of SRE was initially brought to life by Google engineer Ben Treynor. Google later published its popular SRE book, which helped the movement gain traction in the industry. Site reliability engineers sit at the crossroads of traditional IT and software development. Basically, SRE teams are made up of software engineers who build and implement software to improve the reliability of their systems.
So, let’s first define the basic roles and responsibilities of a site reliability engineer and show how SRE can drastically improve the resilience of your people, processes and technology.
What is site reliability engineering (SRE)?
In the words of Ben Treynor, SRE is “what happens when you ask a software engineer to design an operations function.” In a traditional setup of siloed IT operations and software development teams, developers would throw their code over the wall to IT professionals. IT would then be in charge of deployment, maintenance and any on-call responsibilities associated with the system in production. Luckily, DevOps came along and pushed developers to share accountability for systems in production, own their code and take on on-call responsibilities.