Site Reliability Engineer

Government Digital Service

Full-time (Permanent)
£53,400 - £60,990
Published on
8 August 2023
Deadline
4 September 2023

Job summary

GDS exists to help government make brilliant public services that empower people in the UK. We work at the very centre of government to drive digital transformation, focused on users. We build and maintain common platforms, products and tools for others to use and create great public services that are accessible, inclusive and easy to use. We also work with departments to identify patterns, share learning and create change to make government more efficient.

Our teams are organised around delivering on our priorities. These are:

  • making it easier for people to find what they are looking for on GOV.UK
  • building common service platforms to make it simpler and cheaper to build quality digital services
  • promoting agile, user-centred design practices both in the UK and across the world

GOV.UK Pay lets service teams across the public sector take online and over-the-phone card payments from their users quickly and easily. It also helps them manage their income, issue refunds and run financial reports. It provides a simple, accessible and secure payment experience to millions of people.

Since its launch over 6 years ago, use of GOV.UK Pay has grown rapidly. Today over 300 organisations use it to take payments for over 850 public services. We’ve processed over 57 million transactions with a value of £3.6 billion. GOV.UK Pay is expanding to support more ways to take payments, and allow deeper integrations with finance systems. The next few years are an opportunity to radically improve how the public sector handles payments.

If this sounds like the next step in your career, we’d love to hear from you.

Find out more at the GDS Blog.

Job description

As a Site Reliability Engineer at GDS you will:

  • be part of a multidisciplinary service team working with and supporting front-end and back-end developers, delivery and product managers, tech writers and architects
  • build and maintain resilient, highly available and secure systems to meet the needs of our users
  • take responsibility for solving complex and interesting problems
  • create infrastructure as code to ensure our infrastructure and deployment pipelines are reusable, repeatable and reliable
  • ensure our systems are appropriately monitored and instrumented to enable our teams to identify and respond to operational issues quickly and effectively 
  • build CI/CD pipelines to enable our developers to get their code into production as quickly and safely as possible
  • support the live operation of the services we run, and participate in out-of-hours support rotas where necessary (you'll be paid an allowance, plus an hourly payment, for any duties you perform while on call)
  • share knowledge of tools and practices with your wider team and peers to drive consistency and maintain our high engineering standards, with the option of presenting at conferences and meetups
  • use your learning and development budget to develop your career
  • help recruit other site reliability engineers and, where appropriate, get involved with sifting and interviewing

Person specification

We’re interested in people who:

  • are experienced with Linux operating system internals and are comfortable working with Linux virtual machines or containers
  • have experience of working with technologies that underpin digital services such as databases, web servers, DNS, CDNs, reverse proxies, message queues and load balancers
  • have experience of cloud infrastructure providers such as AWS
  • are familiar with container orchestration technologies such as Kubernetes, ECS or Cloud Foundry; or serverless application design such as AWS Lambda
  • have an understanding of SRE principles such as capacity planning, SLOs and SLIs, and of how to design and support resilient, large-scale, high-performance services in a production environment
  • can deploy monitoring tools to ensure systems are appropriately monitored and instrumented to enable teams to identify and respond to operational issues quickly and effectively
  • are familiar with at least one programming language (at GDS we use Node.js, Java, Python, Ruby and Go)
  • are very proficient using Git for version control
  • understand the benefits of continuous integration and continuous deployment and have experience with CI/CD tools such as Concourse, Jenkins, GitHub Actions and CodePipeline
  • have a strong preference for automation and experience of using Infrastructure as Code tools such as Terraform or CloudFormation
  • are able to use automated testing and test-driven development (TDD) to validate solutions and maintain code quality
  • have a good understanding of security principles and how to keep large operational services secure

Benefits

The benefits of working at GDS

There are many benefits of working at GDS, including:

  • flexible hybrid working with flexi-time and the option to work part-time or condensed hours
  • a Civil Service Pension with an average employer contribution of 27%
  • 25 days of annual leave, increasing by a day each year up to a maximum of 30 days
  • an extra day off for The King’s birthday
  • an in-year bonus scheme to recognise high performance
  • career progression and coaching, including a training budget for personal development
  • paid volunteering leave
  • a focus on wellbeing with access to an employee assistance programme
  • job satisfaction from making government services easier to use and more inclusive for people across the UK
  • advances on pay, including for travel season tickets
  • death in service benefits
  • cycle to work scheme and facilities
  • access to children's holiday play schemes across different locations in central London
  • access to an employee discounts scheme
  • 10 learning days per year
  • volunteering opportunities (5 special leave days per year)
  • access to a suite of learning activities through Civil Service Learning

GDS offers hybrid working for all employees. This means that everyone does some working from home and also spends some time in their local office. You’ll agree to your hybrid working arrangement with your line manager in line with your preferences and business needs.

Any move to Government Digital Service from another employer will mean you can no longer access childcare vouchers. This includes moves between government departments. You may however be eligible for other government schemes, including Tax Free Childcare. Determine your eligibility at https://www.childcarechoices.gov.uk

Things you need to know

Selection process details

The standard selection process for roles at GDS consists of:

  • a simple application screening process: we only ask for a CV and a cover letter of up to 750 words. Important tip: make sure your cover letter explains how you meet the skills and experience listed in the “Person specification” section above
  • a 30-minute phone screen (which may not be required, depending on the volume of applications)
  • a 90-minute technical interview (conducted over video conferencing)
  • a 60-minute Civil Service behavioural interview (conducted over video conferencing)

Depending on how many applications we get, there might also be an extra stage before the video interviews, for example a phone interview or a technical exercise.

In the Civil Service, we use Success Profiles to evaluate your skills and ability. This gives us the best possible chance of finding the right person for the job, increases performance and improves diversity and inclusivity. We’ll be assessing your technical abilities, skills, experience and behaviours that are relevant to this role.

For this role we’ll be assessing you against the following Civil Service Behaviours:

  • working together
  • changing and improving
  • making effective decisions

We’ll also be assessing your experience and specialist technical skills against the following skills defined in the Digital, Data and Technology Profession Capability Framework for the DDaT DevOps role:

  • availability and capacity management
  • development process optimisation
  • information security
  • modern standards approach
  • programming and build (software engineering)
  • service support
  • systems design
  • systems integration

Candidates who do not pass the interview but demonstrate an acceptable standard may be considered for similar roles at a lower grade.

A reserve list will be held for a period of 12 months, from which further appointments can be made.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status or disability status.


Feedback will only be provided if you attend an interview or assessment.
