
SRE | Permanent | London, Hybrid, AWS
London, United Kingdom (Britain / UK)
Apply by 1 Jun 2026
£90,000 per annum
Job Ref.: 56925
Job Type: Permanent
Job Description
About the job
- Role: Site Reliability Engineer
- Type: Full-time permanent role
- Location: Hybrid, London City - 3 days per week on-site
- Salary: £90,000 per annum
- Industry: Technology - Gaming Platforms
Our client is a premier provider of high-volume software solutions for the global iGaming and predictive analytics sector. With a footprint spanning the USA, UK, and Europe, they partner with industry leaders to engineer sophisticated platforms for sports wagering, prize-based systems, and complex market simulation environments. Their vision is to lead the evolution of interactive technology through intelligent, data-driven architecture that ensures seamless user experiences. The firm is driven by a culture of teamwork, transparency, and technical excellence.
The role
You will help shape and drive how the firm builds and operates reliable, observable, secure, and cost-efficient systems on AWS. Working closely with development, platform, and incident management teams, you will define reliability in measurable terms and build the tooling and processes to achieve it, improving platform speed, stability, and scalability.
Key responsibilities
- Partner with engineering teams to define, measure, and manage SLOs/SLIs, using error budgets to guide delivery decisions.
- Enhance observability across services (metrics, logs, traces) to detect and resolve issues proactively.
- Lead cost optimisation: monitor spend, right-size workloads, tune autoscaling, and improve infrastructure efficiency.
- Improve production readiness via pre-deployment checks, post-release validation, and robust platform guardrails.
- Introduce and run chaos engineering experiments to strengthen resilience and recovery.
- Automate operational processes to reduce manual intervention and toil across the stack.
- Support major incident response, root-cause analysis, and continual improvement actions.
- Collaborate cross-functionally to raise standards for stability, security, performance, and compliance.
Required skills & experience
- 3+ years’ experience in SRE, Platform, or DevOps roles within production environments.
- Strong Kubernetes operational experience (on-prem and AWS EKS).
- Hands-on experience defining and operating SLOs/SLIs, alerting, and incident workflows.
- Deep understanding of observability and telemetry (monitoring, logging, tracing).
- Infrastructure as Code with Terraform; experience with GitOps workflows and CI/CD.
- Scripting proficiency in Python, Bash, or Go.
- Proven ability to balance cost efficiency with reliability and performance.
- Excellent communication skills and the ability to work effectively across multiple teams.
Strongly desirable for this role
- Experience running chaos engineering experiments.
- Exposure to high-throughput, low-latency systems.
- FinOps knowledge or cost management practices.
- AWS certifications (e.g., Solutions Architect, DevOps Engineer).