Mid & Senior Site Reliability Engineers Technology and Security - GDS - G7
Government Digital & Data
The Government Digital Service (GDS) is the digital centre of government — we are responsible for setting, leading and delivering the vision for a modern digital government.
Our priorities are to drive a modern digital government by:
- joining up public sector services
- harnessing the power of AI for the public good
- strengthening and extending our digital and data public infrastructure
- elevating leadership and investing in talent
- funding for outcomes and procuring for growth and innovation
- committing to transparency and driving accountability
We are home to the Incubator for Artificial Intelligence (I.AI) and the world-leading GOV.UK, and we are at the forefront of coordinating the UK’s geospatial strategy and activity. We lead the Government Digital and Data function and champion the work of digital teams across government.
We’re part of the Department for Science, Innovation and Technology (DSIT) and employ more than 1,000 people all over the UK, with hubs in Manchester, London and Bristol.
The Government Digital Service is where talent translates into impact. From your first day, you’ll be working with some of the world’s most highly skilled digital professionals, all contributing their knowledge to make change on a national scale.
Join us for rewarding work that makes a difference across the UK. You'll solve some of the nation’s highest-priority digital challenges, helping millions of people access services they need.
Reporting to the GDS Product Group CTO, the Technology programme is boosting efficiency, strengthening security and improving developer experience by providing infrastructure, tools and standards. Building on our current initiatives within GDS's AWS cloud environment, we are establishing four new multidisciplinary teams to broaden our scope of work:
- Cloud Platform team - owns and operates a thin central platform for our AWS estate.
- Developer Experience and FinOps team - manages core engineering tooling, works proactively to improve developer practice and experience, and ensures value from our SaaS services.
- Engineering Access Operations team - owns and operates identity and access management for our systems and acts as an intelligent customer for IT services, improving overall effectiveness.
- Business Enablement team - manages core business tooling and services, supporting business impact and agility.
Read about a day in the life of a GDS Site Reliability Engineer on the GDS blog, and watch our video about Becoming a site reliability engineer at GDS.
Job description
We're seeking to hire SREs into the following teams:
- Cloud Platform team - 1 Senior, 1 Mid-level.
- Developer Experience and FinOps team - 1 Senior.
- Engineering Access Operations team - 2 Senior, 1 Mid-level.
As a Site Reliability Engineer you will:
- Be part of a multidisciplinary team developing and supporting our central cloud, developer and identity and access platforms
- Write infrastructure as code using Terraform to ensure our infrastructure is consistent, reusable and reliable
- Deploy and configure observability tools to enable our teams to identify and respond to operational issues quickly and effectively
- Build CI/CD pipelines to enable the team to get code into production quickly and reliably
- Provide day-to-day support for our platforms and tools to ensure they remain available, secure and robust
- Participate in on-call rotations when necessary
- Solve complex and interesting problems
- Share your knowledge and expertise with your peers and the wider team to drive consistency and develop a culture of openness and learning
In addition to the above, as a Senior Site Reliability Engineer, you will:
- Line manage 1 to 2 technologists, supporting their growth and development
- Provide technical leadership within a team, working with other team members to identify the best approaches and solutions
Person specification
We’re interested in people who have:
- A deep understanding of Linux operating system internals and confidence working with Linux virtual machines or containers
- Strong experience of working with infrastructure technologies such as databases, web servers, DNS, CDNs, reverse proxies, message queues and load balancers
- Experience of building and maintaining services in the cloud (preferably AWS)
- Extensive experience of creating infrastructure as code using Terraform or CloudFormation (preferably Terraform)
- Experience of using container orchestration systems such as Kubernetes or ECS, or of serverless application design with AWS Lambda
- Experience supporting large production services
- Proficiency in at least one programming language (we use Ruby and Python)
- Strong Git skills
- Experience of creating pipelines in a CI/CD tool like GitHub Actions or AWS CodePipeline
- A strong understanding of security principles and how to keep large operational services secure
In addition to the above, as a Senior SRE you will have:
- Experience of technical leadership, for example acting as a tech lead for a team or leading a technical initiative
- Experience of line management, mentoring or coaching
If you meet some of these criteria but think you might not meet every last one, don’t let that stop you from submitting an application.