Principal Site Reliability Engineering Manager

Full time at a Laimoon Verified Company in India
Posted on April 27, 2024

Job details

Principal Site Reliability Engineer

We are seeking a highly skilled and experienced Principal Site Reliability Engineer (SRE) to join the Lab45 team at Wipro. As a Principal SRE, you will play a critical role in ensuring the reliability, availability, and performance of our systems and applications. Your expertise and leadership will be essential in driving the adoption of best practices, designing scalable architectures, and improving the overall reliability of our infrastructure.

Responsibilities:

- Design, implement, and maintain highly available and scalable systems, services, and architectures to support our organization's applications and infrastructure.
- Lead efforts to improve system reliability, monitoring, and performance, using automation and best practices for continuous integration and deployment.
- Collaborate with cross-functional teams to identify and resolve performance bottlenecks, scalability issues, and architectural challenges.
- Develop and implement incident response procedures, conduct post-incident analysis, and drive root cause analysis to prevent future incidents.
- Define and enforce service-level objectives (SLOs) and service-level agreements (SLAs) to ensure the reliability and availability of our systems and applications.
- Automate deployment and configuration processes using infrastructure-as-code and configuration management tools.
- Stay up to date with the latest industry trends and technologies in site reliability engineering, and proactively recommend and implement improvements.
- Mentor and provide technical leadership to SRE and engineering teams, promoting a culture of reliability, performance, and scalability.

Requirements:

- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Extensive experience (10+ years) in site reliability engineering or a similar role, with a strong focus on designing and maintaining scalable, reliable, high-performance systems.
- Deep understanding of cloud technologies and platforms (e.g., AWS, Azure, Google Cloud) and experience with cloud-based infrastructure management.
- Proficiency in scripting and programming languages (e.g., Python, Go, Java) for automation and infrastructure-as-code.
- Strong knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes) and experience with microservices architectures.
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack) for tracking system health and performance.
- Familiarity with incident management and response processes, including on-call rotations and post-incident analysis.
- Excellent troubleshooting and problem-solving skills, with the ability to quickly diagnose and resolve complex system issues.
- Strong communication and collaboration skills, with the ability to work effectively with technical and non-technical stakeholders.
- Relevant certifications (e.g., AWS Certified DevOps Engineer, Google Cloud Professional DevOps Engineer) are highly desirable.

Join our team as a Principal Site Reliability Engineer and contribute to the design, implementation, and maintenance of our organization's reliable and scalable systems. Apply your expertise to drive site reliability initiatives, mentor engineering teams, and ensure the highest levels of system performance, availability, and resilience.

