Distributed Cloud | Azure Site Reliability Engineer, Hybrid
Job Description

Our Devoteam Distributed Cloud Unit is looking for Azure Site Reliability Engineers to join our Infrastructure Microsoft team and work on several projects within the banking sector.

We are seeking an experienced and highly skilled Senior Site Reliability Engineer (SRE) with Azure and Kubernetes certifications and extensive experience working with microservices. The ideal candidate will be responsible for ensuring the reliability, scalability, and performance of our cloud-based microservices architecture. You will collaborate closely with development and operations teams to design, implement, and maintain systems that are robust, resilient, and highly available.

Infrastructure Management: Design, implement, and manage scalable, secure, and reliable cloud infrastructure on Microsoft Azure and Google Cloud Platform.
Microservices Management: Design, implement, and manage a scalable and resilient microservices architecture on Microsoft Azure.
Kubernetes Administration: Deploy, manage, and optimize Kubernetes clusters, ensuring smooth operation and integration with existing systems.
Automation and Monitoring: Develop and maintain automation scripts and tools to improve system efficiency and reduce manual intervention. Implement comprehensive monitoring and alerting to identify and resolve issues proactively.
Performance Optimization: Analyze system performance metrics and recommend improvements. Implement performance tuning and optimization strategies.
Incident Response: Lead incident response efforts, including troubleshooting, root cause analysis, and post-incident reviews. Develop and implement strategies to minimize downtime and prevent recurrences.
Collaboration: Work closely with development, operations, and security teams to ensure seamless integration of new features and technologies. Advocate for best practices in reliability, security, and performance.
Documentation: Create and maintain detailed documentation of system architecture, processes, and procedures. Ensure knowledge sharing across the team.
Mentorship: Provide guidance and mentorship to junior SREs and other team members. Foster a culture of continuous learning and improvement.

Qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field; a Master's degree is a plus.
Minimum of 5 years of experience as a Site Reliability Engineer or in a similar role, with a strong background in cloud infrastructure and container orchestration.
Extensive experience with Kubernetes, including deployment, scaling, and maintenance of clusters.
Strong understanding of microservices architecture and best practices.
Strong programming and scripting skills (e.g., Python, PowerShell, Go, Bash, Azure CLI).
Strong experience with Infrastructure as Code (IaC) and configuration management tools (e.g., Ansible, Terraform).
Familiarity with CI/CD pipelines and tools (e.g., Jenkins, GitLab CI).
Knowledge of networking, security, and best practices in cloud environments.
Excellent problem-solving and analytical skills.
Strong communication and collaboration skills.
Ability to work effectively in a fast-paced, dynamic environment.
Leadership skills, with the ability to mentor and guide team members.
Good command of English (mandatory).
Engineering and technology