Mid & Senior Site Reliability Engineers - GDS - G7
Government Digital & Data
Location
Bristol, London, Manchester
About the job
Job summary
The Government Digital Service (GDS) is the digital centre of government — we are responsible for setting, leading and delivering the vision for a modern digital government.
Our priorities are to drive a modern digital government by:
- joining up public sector services
- harnessing the power of AI for the public good
- strengthening and extending our digital and data public infrastructure
- elevating leadership and investing in talent
- funding for outcomes and procuring for growth and innovation
- committing to transparency and driving accountability
We are home to the Incubator for Artificial Intelligence (I.AI) and the world-leading GOV.UK, and we are at the forefront of coordinating the UK’s geospatial strategy and activity. We lead the Government Digital and Data function and champion the work of digital teams across government.
We’re part of the Department for Science, Innovation and Technology (DSIT) and employ more than 1,000 people all over the UK, with hubs in Manchester, London and Bristol.
The Government Digital Service is where talent translates into impact. From your first day, you’ll be working with some of the world’s most highly skilled digital professionals, all contributing their knowledge to make change on a national scale.
Join us for rewarding work that makes a difference across the UK. You'll solve some of the nation’s highest-priority digital challenges, helping millions of people access services they need.
Job description
About Engineering Enablement
Reporting to the GDS Product Group CTO, the Technology programme is boosting efficiency, strengthening security and improving developer experience by providing infrastructure, tools and standards. Building on our current initiatives within GDS's AWS cloud environment, we are establishing four new multidisciplinary teams to broaden our scope of work:
- Cloud Platform team - owns and operates a thin central platform for our AWS estate. We are looking for 1 senior and 1 mid-level SRE to join this team.
- Developer Experience and FinOps team - manages core engineering tooling, proactively works to improve developer practice and experience, and ensures we get value from our SaaS services. We are looking for 1 senior SRE to join this team.
- Engineering Access Operations team - owns and operates identity and access management for our systems and acts as an intelligent customer for IT services, improving overall effectiveness. We are looking for 2 senior SREs and 1 mid-level SRE to join this team.
- Business Enablement team - manages core business tooling and services, supporting business impact and agility.
About GOV.UK One Login
The GOV.UK One Login programme represents a once-in-a-generation, once-in-a-career opportunity to simplify and widen access to all digital government services. Sitting at the heart of government, we are building a single simple, safe and secure way for users to log in and prove who they are, which will work across all government services.
Effective identity assurance is central to digital transformation and GOV.UK One Login enables people to prove who they are online, with the necessary level of confidence to access and use particular services. Our technology runs on AWS, using serverless compute and storage products. Backend services are written in TypeScript/Node.js and JVM technologies. Web applications also use TypeScript.
The GOV.UK One Login programme is full of talented and passionate people who are focused on delivering high-quality products for services. We’re halfway through our build phase, and features are being shipped weekly as we work to mature our product so we can expand the range of services and departments benefiting from our work.
GOV.UK One Login is being designed and built for the many. It will unite services across government, revolutionising the way government departments interact digitally with users. One Login will deliver an accessible and essential function that will change lives and help millions.
We are recruiting for 2 mid-level and 3 senior SREs for One Login.
About GOV.UK Pay
GOV.UK Pay lets service teams across the public sector take card payments from their users online and over the phone, quickly and easily. It also helps them manage their income, issue refunds and run financial reports. It provides a simple, accessible and secure payment experience to millions of people.
Since its launch over 9 years ago, GOV.UK Pay has grown rapidly. Today over 500 organisations use it to take payments for over 1,400 public services. We’ve processed over 114 million transactions with a value of £7.5 billion. GOV.UK Pay is expanding to support more ways to take payments and to allow deeper integrations with finance systems. The next few years are an opportunity to radically improve how the public sector handles payments.
Come and join a well-motivated, multidisciplinary delivery team working to deliver on our commitments and roadmap. We are an ambitious, fast-paced and visionary team. If you have a background in software delivery and are used to working in a scaled agile environment, this could be the place for you!
We are recruiting for one senior SRE for GOV.UK Pay.
Read about a day in the life of a GDS Site Reliability Engineer on the GDS blog, and watch our video about Becoming a site reliability engineer at GDS.
As a Site Reliability Engineer you will:
- be part of a multidisciplinary team developing and supporting one of our product areas
- write infrastructure as code using Terraform or CloudFormation to ensure our infrastructure is consistent, reusable and reliable
- deploy and configure observability tools to enable our teams to identify and respond to operational issues quickly and effectively
- build CI/CD pipelines to enable the team to get code into production quickly and reliably
- provide day-to-day support for our platforms and tools to ensure they remain available, secure and robust
- participate in on-call rotations when necessary
- solve complex and interesting problems
- share your knowledge and expertise with your peers and the wider team to drive consistency and develop a culture of openness and learning
In addition to the above, as a Senior Site Reliability Engineer, you will:
- line manage 1-2 technologists, supporting their growth and development
- provide technical leadership within a team, working with other team members to identify the best approaches and solutions
Person specification
We’re interested in people who have:
- a deep understanding of Linux operating system internals and are comfortable working with Linux virtual machines or containers
- proficiency in at least one programming language (we use Ruby, Java, TypeScript and Python)
- strong experience of working with infrastructure technologies such as databases, web servers, DNS, CDNs, reverse proxies, message queues and load balancers
- experience of building and maintaining services in the cloud (preferably AWS)
- extensive experience of creating infrastructure as code using Terraform or CloudFormation
- experience of using container orchestration systems such as Kubernetes or ECS, or of serverless application design with AWS Lambda
- experience supporting large production services
- strong Git skills
- experience of creating pipelines in a CI/CD tool like GitHub Actions or AWS CodePipeline
- a strong understanding of security principles and how to keep large operational services secure
In addition to the above, as a Senior SRE you will have:
- experience of technical leadership, for example acting as a tech lead for a team or leading a technical initiative
- experience of line management, mentoring or coaching
If you meet a few of these criteria but think you might not meet every last one, don’t let that stop you from submitting an application.