
Senior Specialist Engineer (SRE) - UKHSA - SEO

Government Digital & Data

Full-time (Permanent)
National £41,983 to £48,128; Inner £46,310 to £52,113
Published on
11 December 2025
Deadline
5 January 2026

Location

This role is being offered as hybrid working based at any of our core HQs.

We offer great flexible working opportunities at UKHSA and operate using a hybrid working model where business needs allow. This provides us with greater flexibility about how and where we work, to get the best from our workforce. As a hybrid worker, you will be expected to spend a minimum of 60% of your contractual working hours (approximately 3 days a week pro rata, averaged over a month) working at one of UKHSA's core HQs (Birmingham, Leeds, Liverpool, and London).

Our core HQ offices are modern and newly refurbished, with excellent city centre transport links, and benefit from co-location with other government departments such as the Department for Health and Social Care (DHSC).

About the job

Job summary

The Digital and Data Directorate has primary responsibility for scientific and research computing services and support. The key functions of the Digital Development and Operations unit are to provide and support the platforms required by the staff of the UK Health Security Agency, and to provide the technical capabilities to enable public health services, both within the Organisation and between the Organisation and its customers and stakeholders.

As a Specialist Site Reliability Engineer (SRE) you will:

  • Remediate infrastructure and operational problems
  • Leverage automation and Continuous Integration/Continuous Delivery (CI/CD) to ensure our services run reliably, are scalable, and perform optimally
  • Monitor and manage these aspects while taking responsibility for multiple cloud infrastructure services
  • Observe systems to prioritise the operational service and performance improvements needed to meet or exceed Service Level Objectives (SLOs)

The role will be responsible to the Principal Specialist Engineer SRE and is part of the High Performance Computing, Site Reliability Engineering, Artificial Intelligence (HPC/SRE/AI) & research computing unit, whose remit is to:

  • Architect, develop & manage multi-cloud HPC platforms and on-premise infrastructure
  • Ensure services are highly available, scalable and resilient
  • Manage performance, capability and capacity planning
  • Support UKHSA's AI requirements

This role attracts a Market Pay Supplement of up to £5,000.


Working for your organisation

We pride ourselves on being an employer of choice, where Everyone Matters, promoting equality of opportunity and actively encouraging applications from everyone, including groups currently underrepresented in our workforce.

UKHSA's ethos is to be an inclusive organisation for all our staff and stakeholders, and to create, nurture and sustain an inclusive culture where differences drive innovative solutions to meet the needs of our workforce and wider communities. We do this by celebrating and protecting differences, removing barriers, and promoting equity and equality of opportunity for all.

Please visit our careers site for more information: https://gov.uk/ukhsa/careers

Job description

We are seeking a highly motivated and experienced SRE to join our HPC & SRE engineering team. As an SRE, you will play a critical role in ensuring the stability, scalability, and performance of our services. You will combine software engineering and systems engineering to build, improve and run reliable, scalable production systems.

Key Responsibilities

Service Reliability & Performance

  • Ensure services are stable, scalable, and performant through engineering best practices and system design.
  • Proactively identify and address system bottlenecks using advanced problem-solving and performance tuning techniques.
  • Conduct capacity planning and implement solutions to ensure systems can support current and future workloads.

Incident Response & Troubleshooting

  • Respond swiftly to production incidents, ensuring minimal downtime and quick restoration of services.
  • Perform root cause analysis and postmortems, implementing lessons learned to prevent recurrence.

Monitoring, Alerting & Observability

  • Contribute to the design and implementation of effective monitoring and alerting systems using tools and dashboards.
  • Improve observability of services, ensuring issues are identified and addressed before impacting users.
  • Continuously refine monitoring practices to reduce alert fatigue and improve response times.

Automation & Tooling

  • Develop automation to eliminate manual, repetitive tasks and improve operational efficiency.
  • Write clear, maintainable, and well-tested code to support automation efforts and system tooling.
  • Drive initiatives to reduce operational toil and improve reliability through Infrastructure as Code (IaC).

Service Level Objectives & Operational Improvements

  • Contribute to the definition, tracking, and continuous improvement of SLOs, Service Level Indicators (SLIs), and error budgets.
  • Identify and prioritise operational improvements that align with business goals and user experience.

SRE Best Practices & Advocacy

  • Help to evangelise SRE principles across the organisation.
  • Collaborate with stakeholders to integrate reliability practices into the development lifecycle.

Collaboration & Knowledge Sharing

  • Work closely with software engineering, DevOps, and infrastructure teams to streamline deployment and operational workflows.
  • Improve cross-functional collaboration and promote a culture of shared responsibility for service reliability.

Documentation & Training

  • Maintain accurate technical documentation, runbooks, and post-incident reports.
  • Provide training and mentorship to engineering teams on best practices and tools.

Main duties of the job

  • Ensure services are stable, scalable, performant and automated.
  • Respond to incidents, troubleshooting issues, and restoring services as quickly as possible.
  • Prioritise operational service improvements to meet or exceed SLOs, minimising downtime.
  • Ensure that effective monitoring and alerting are in place to proactively identify issues using tools and dashboards, reducing the time taken to respond to issues.
  • Leverage automation to streamline tasks, reduce overhead on repeatable operations, reduce manual intervention and improve efficiency. Write code that is maintainable, clear, and concise.
  • Optimise system performance using strong problem-solving skills to identify bottlenecks with an engineering mindset.
  • Ensure systems can handle current and future workloads through automation and capacity planning.
  • Continuously improve services through observability, and identify ways to improve observability practices.
  • Follow SRE principles. Guide and educate stakeholders to adopt implemented principles.
  • Provide technical documentation for engineers, and provide training where appropriate.
  • Work closely with engineering and technology teams to improve operational processes, reduce manual tasks, ensure seamless collaboration and knowledge sharing, reduce risks and adapt to new ways of working.

This list is not exhaustive.

Person specification

Essential criteria:

  • Experience as a Site Reliability Engineer, DevOps Engineer, Operations Engineer or similar role
  • Coding skills in programming/scripting languages such as Python, PowerShell or Bash
  • Understanding of Linux/Unix & Windows systems, networking, and distributed systems
  • Experience with observability tools (e.g., Prometheus, Grafana, Datadog) and alerting systems
  • Understanding of infrastructure automation (e.g., Terraform, Ansible, PowerShell, Helm)
  • Excellent communication and collaboration skills
  • Experience with security best practices
  • Problem-solving skills and the ability to respond to sudden, unexpected demands

Desirable criteria:

  • Experience with CI/CD pipelines, cloud platforms (e.g., Amazon Web Services (AWS), Google Cloud Platform (GCP), Azure) and container orchestration (e.g., Kubernetes)
  • Experience with post-incident reviews
  • Previous involvement in driving adoption of SRE practices across an organisation
  • Experience delivering training or mentoring junior engineers

