Site Reliability Engineer - CO - G7
Government Digital & Data
Location
Bristol, London, Manchester
About the job
Job summary
GovWifi is a government-critical service that enables secure, consistent WiFi access across the UK public sector, supporting staff and visitors in thousands of locations. We’re looking for a skilled DevOps Engineer to help keep this high-profile platform reliable, secure, and future-ready.
You’ll work with a multi-disciplinary team to maintain service availability, automate infrastructure, and deliver improvements. From deploying secure solutions in AWS to strengthening monitoring and incident response, you’ll play a vital role in keeping GovWifi resilient at scale.
If you enjoy solving complex problems, collaborating with diverse teams, and want your engineering skills to directly benefit the public sector, this is your opportunity to make a real impact on a service used nationwide.
Job description
As a DevOps Engineer on the GovWifi service, you will be part of a cross-disciplinary team responsible for ensuring the secure, reliable, and efficient operation of a government-critical platform. Your work will directly support thousands of users across the UK public sector, helping create a seamless and secure WiFi experience in government buildings nationwide.
What you’ll be doing:
- Maintaining service reliability: Monitor, manage and improve the availability of GovWifi, ensuring the platform consistently meets service level objectives. Respond to and resolve incidents quickly, serving as a point of escalation when needed.
- Automating infrastructure: Use Terraform (or other IaC tools) to automate deployments and infrastructure changes, reducing manual intervention and improving consistency.
- Deploying securely: Carry out safe, reliable deployments of code and configuration into AWS environments (ECS, EC2, CloudWatch, ELB, CodeBuild, CodePipeline).
- Improving system resilience: Design, build and implement monitoring, alerting, and recovery mechanisms to keep systems highly available and secure.
- Mitigating risks: Identify, assess, and reduce security vulnerabilities across the platform, applying web security best practices and implementing protective measures.
- Supporting migrations and transitions: Assist with tool changes, platform improvements, or policy-driven migrations that affect GovWifi operations.
- Building for users: Develop new features or improvements through prototyping, proof-of-concepts, and continuous iteration in collaboration with product managers and developers.
- Knowledge sharing: Document technical decisions clearly, add to the team’s knowledge base, and explain complex issues to non-technical colleagues in a clear, supportive way.
- Customer support: Engage with end-user requests and issues through support tools such as Zendesk, helping resolve technical problems that directly affect users.
- Driving continuous improvement: Pair with teammates, contribute to engineering improvement initiatives, and promote best practices across the service.
Ways of working:
You’ll spend your time collaborating closely with site reliability engineers, developers, product managers, and central teams. You’ll work independently when needed, but also in pairs and group settings to solve problems. You’ll play an active role in incident reviews, retrospectives, and roadmap planning. The role requires curiosity, adaptability, and a commitment to secure, user-centred service delivery.
Person specification
Essential Criteria
- Strong technical expertise in AWS cloud services (ECS, EC2, CloudWatch, ELB, CodeBuild, CodePipeline).
- Strong expertise in Terraform or CloudFormation, with a willingness to learn new technologies.
- Proficient in at least one scripting or programming language (Python, JavaScript, Ruby, Bash).
- Solid understanding of network protocols (TCP/UDP), AWS VPC networking, ports, and security groups.
- Familiarity with containerisation technologies, particularly Docker.
- Experience building, deploying, and maintaining resilient, highly available, monitored systems.
- Good knowledge of cybersecurity principles and secure system design.
- Comfortable working with CI/CD pipelines using tools like Jenkins, GitHub Actions, Concourse, or CodePipeline.
- Experience with Linux operating systems and web application technologies.
- Ability and willingness to document work clearly and share knowledge across technical and non-technical audiences.
- Strong problem-solving skills with a proactive approach to identifying and resolving complex issues.
- Comfortable working independently and collaboratively (pair programming, agile teamwork).
- Excellent communication skills including explaining technical issues to non-technical stakeholders.
- Willingness to engage in customer-facing support through ticketing tools such as Zendesk.
- Experience working in agile environments and ability to prototype and iterate on new solutions.
Desirable (not essential but advantageous)
- Experience with RADIUS or network engineering.
- Leading or contributing to engineering improvement projects.
- Passion for improving public sector IT services and working within a collaborative, inclusive team.
- Empathetic, supportive, and adaptable mindset.
Additional information:
A minimum of 60% of your working time should be spent at your principal workplace, although time spent attending other locations for official business will also count towards this level of attendance.
Qualifications
- Terraform
- Python or Bash
- Experience with AWS Cloud Platform
Behaviours
We'll assess you against these behaviours during the selection process:
- Working Together
- Leadership
- Delivering at Pace
- Making Effective Decisions
- Communicating and Influencing
- Changing and Improving
We only ask for evidence of these behaviours on your application form:
- Working Together
- Leadership
Technical skills
We'll assess you against these technical skills during the selection process:
- AWS Cloud Platform Expertise
- Infrastructure as Code (IaC)
- Scripting and Automation
- Containerisation and Orchestration
- Networking and Security Fundamentals
- CI/CD Pipeline Development and Maintenance
- Incident Management and Troubleshooting
- Monitoring and Observability
We only ask for evidence of these technical skills on your application form:
- AWS Cloud Platform Expertise
- Infrastructure as Code (IaC)
- Scripting and Automation
- CI/CD Pipeline Development and Maintenance