Description:
We are looking for an enthusiastic and proactive Site Reliability Engineer to join our SRE team and help us provide world-class resilience and performance across the platform. The role’s remit is to advise on all aspects of site reliability, including availability, scalability, observability and capacity planning. It’s a broad and exciting role, so we’re looking for someone who is up for a challenge. If you’re an energetic and collaborative Site Reliability Engineer, this is the role for you.
Core responsibilities
Proactively monitor and analyse platform performance.
Collaborate with engineering teams to address performance bottlenecks and ensure scalability.
Assist engineering teams with implementing and reviewing SLOs.
Continually improve observability through monitoring, alerting and dashboards, using tools such as Datadog or Prometheus.
Work with other teams to ensure observability is effective and provides full coverage.
Ensure the service is highly available and resilient.
Champion best practices in design for high availability.
Devise runbooks and run game day sessions to test our DR plan, HA and backups.
Conduct assessments of capacity and plan for scaling to meet current and future business needs.
Work closely with the Head of Platform Engineering and Head of SRE to plan and implement scalable solutions.
Work closely with the Platform team, feature teams, 2nd line support and other stakeholders to ensure a good level of service for our customers and to embed SRE practices.
Act as a key player in incident response and troubleshooting, ensuring rapid resolution and minimising downtime.
Participate in blameless postmortems to identify root causes and corrective actions.
Develop and maintain playbooks and documentation.
Organization | Arbor Education |
Industry | Engineering Jobs |
Occupational Category | Site Reliability Engineer |
Job Location | London, UK |
Shift Type | Morning |
Job Type | Full Time |
Gender | No Preference |
Career Level | Intermediate |
Experience | 2 Years |
Posted at | 2024-05-01 6:21 am |
Expires on | 2025-01-24 |