Lead Site Reliability Engineer
Description
About The Role
AIQ is looking for a Lead Site Reliability Engineer to drive reliability, performance, and scalability across our infrastructure. This role will lead SRE initiatives, mentor team members, and collaborate with engineering and product teams to build robust systems that can scale globally.

Responsibilities
- Architect and lead reliability strategies across services and environments.
- Define and enforce SLOs, SLIs, and error budgets with engineering leadership.
- Lead incident response and root cause analysis.
- Implement automation to reduce toil and improve system resilience.
- Manage capacity planning, traffic forecasting, and cost optimization.
- Mentor junior and senior SREs in technical and process excellence.
- Collaborate with MLOps, DevSecOps, and CloudOps teams to enforce best practices.
- Champion observability, metrics-driven decisions, and platform maturity.

Qualifications
- 12+ years of experience in relevant roles.
- At least 1 year of experience leading a team.
- Expertise in Kubernetes, CI/CD (e.g., GitLab, Argo), and infrastructure-as-code (Terraform/Helm).
- Strong experience with cloud platforms (Azure, AWS, or GCP).
- Proven background in SRE principles, SLIs/SLOs, and reliability-focused engineering.
- Programming proficiency in Python or Shell (nice to have).
- Deep understanding of distributed systems, networking, and incident management.
Posted: 19th June 2025 7.42 am
Application Deadline: N/A