SRE | Permanent | London, Hybrid, AWS

London, Greater London, South East, England

Apply by 2 Apr 2026

£90,000 per annum

Job Ref.: BH-56925

Job Description

About the job
  • Role: Site Reliability Engineer
  • Type: Full-time permanent role
  • Location: Hybrid, London City - 3 days per week on-site
  • Salary: £90,000 per annum
  • Industry: Technology - Gaming Platforms 
Our client is a premier provider of high-volume software solutions for the global iGaming and predictive analytics sector. With a footprint spanning the USA, UK, and Europe, they partner with industry leaders to engineer sophisticated platforms for sports wagering, prize-based systems, and complex market simulation environments. Their vision is to lead the evolution of interactive technology through intelligent, data-driven architecture that ensures seamless user experiences. The firm is driven by a culture of teamwork, transparency, and technical excellence.

The role
You will help shape and drive how the firm builds and operates reliable, observable, secure, and cost-efficient systems on AWS. Working closely with development, platform, and incident management teams, you will define reliability in measurable terms and build the tooling and processes to achieve it, improving platform speed, stability, and scalability.

Key responsibilities
  • Partner with engineering teams to define, measure, and manage SLOs/SLIs, using error budgets to guide delivery decisions.
  • Enhance observability across services (metrics, logs, traces) to detect and resolve issues proactively.
  • Lead cost optimisation: monitor spend, right-size workloads, tune autoscaling, and improve infrastructure efficiency.
  • Improve production readiness via pre-deployment checks, post-release validation, and robust platform guardrails.
  • Introduce and run chaos engineering experiments to strengthen resilience and recovery.
  • Automate operational processes to reduce manual intervention and toil across the stack.
  • Support major incident response, root-cause analysis, and continual improvement actions.
  • Collaborate cross-functionally to raise standards for stability, security, performance, and compliance.
Required skills & experience
  • 3 years’ experience in SRE, Platform, or DevOps roles within production environments.
  • Strong Kubernetes operational experience (on-prem and AWS EKS).
  • Hands-on experience defining and operating SLOs/SLIs, alerting, and incident workflows.
  • Deep understanding of observability and telemetry (monitoring, logging, tracing).
  • Infrastructure as Code with Terraform; experience with GitOps workflows and CI/CD.
  • Scripting proficiency in Python, Bash, or Go.
  • Proven ability to balance cost efficiency with reliability and performance.
  • Excellent communication skills and the ability to work effectively across multiple teams.
Strong Desirables for this role 
  • Experience running chaos engineering experiments.
  • Exposure to high-throughput, low-latency systems.
  • FinOps knowledge or cost management practices.
  • AWS certifications (e.g., Solutions Architect, DevOps Engineer).
