
Site Reliability Engineering Lead
Greater London, South East, England
Apply by 2 Apr 2026
£90,000 - £110,000 per annum
Job Ref.: BH-56921
Job Description
The successful candidate will lead and mentor the SRE team, set the technical direction for reliability engineering, and take end-to-end ownership of production systems. They will be accountable for availability, performance, and incident response, while working closely with Product and Engineering to define SLIs and establish meaningful SLOs that balance stability with delivery pace. They will champion a blameless culture, embedding robust incident management processes and driving continuous, systemic improvement.
Key skills and experience:
- Proven experience as a Lead or Senior SRE with a strong software engineering background
- Strong programming ability in PHP and Java or .NET
- Experience defining SLIs, setting SLOs, and using error budgets to guide decision-making
- Demonstrated ownership of production systems with full accountability for uptime and resilience
- Hands-on experience building and running incident management processes, including blameless postmortems
- Strong knowledge of observability and monitoring tools (e.g. Prometheus, Grafana, Datadog)
- Solid Linux systems expertise and experience with MySQL and PostgreSQL
- Experience with cloud platforms (Azure preferred), Kubernetes, and Infrastructure as Code
- Proven leadership capability, including mentoring engineers and influencing cross-functional teams