Site Reliability Engineer
About Us:
At InstaService, we are committed to delivering reliable, high-performance home services to our customers. As a fast-growing on-demand services platform, we are looking for a talented DevOps / Site Reliability Engineer (SRE) to join our dynamic team. This role is crucial to scaling and maintaining our infrastructure and to keeping our platform fast, secure, and reliable.

Role Overview:
As a DevOps / Site Reliability Engineer (SRE), you will be responsible for the stability, performance, and scalability of our systems. You will work closely with engineering teams to improve deployment pipelines, monitor system health, and optimize performance. Experience with Kubernetes for deploying applications and with GCP (Google Cloud Platform) is a must. We are looking for someone with 5+ years of experience in SRE and DevOps.

Key Responsibilities:
- System Reliability: Ensure the reliability, uptime, and scalability of production systems, improving service availability.
- DevOps Practices: Implement and maintain CI/CD pipelines to streamline and automate deployments.
- Kubernetes Deployment: Use Kubernetes to manage the deployment, scaling, and operation of our applications.
- Cloud Management: Use GCP to build and manage cloud infrastructure, optimizing resource usage.
- Monitoring & Alerts: Set up and manage monitoring and alerting systems to proactively identify and resolve infrastructure issues.
- Incident Response: Lead incident response efforts, troubleshoot production issues, and perform post-incident root cause analysis.
- Performance Tuning: Analyze system performance and identify bottlenecks to address.
- Automation: Build and maintain infrastructure as code (IaC) solutions to automate processes and reduce manual effort.
- Collaboration: Work with software engineering, QA, and product teams to ensure new features meet reliability and scalability standards.
- Documentation: Maintain comprehensive documentation of infrastructure, processes, and playbooks for operational tasks.

Qualifications:
- 5+ years of experience as a DevOps Engineer or Site Reliability Engineer (SRE).
- Strong proficiency with Kubernetes for deploying, scaling, and managing applications.
- Experience with GCP (Google Cloud Platform) for building and managing cloud-based infrastructure.
- Proficiency with CI/CD pipelines and tools such as Jenkins, GitLab CI, or CircleCI.
- Hands-on experience with Infrastructure as Code (IaC) tools such as Terraform, Ansible, or CloudFormation.
- Expertise in monitoring tools such as Prometheus, Grafana, Datadog, or similar.
- Strong scripting and programming skills in Python, Go, Bash, or similar.
- Experience with version control systems (e.g., Git) and branching strategies.
- Familiarity with logging and alerting tools such as the ELK Stack, Splunk, or similar.
- Excellent troubleshooting and problem-solving skills.
- Strong communication skills and the ability to work collaboratively with cross-functional teams.

Preferred Qualifications:
- Experience with microservices architecture and distributed systems.
- Knowledge of security best practices in cloud environments.
- Experience with load testing and performance optimization.

Why Join InstaService:
- Opportunity to work on cutting-edge technologies in a fast-paced, growing company.
- Collaborative and innovative work environment.
- Competitive salary and benefits package.
- Opportunities for professional development and growth.