Staff Site Reliability Engineer

Full time at nference in Online
Posted on September 30, 2024

Job details

Job Location: Bangalore
Work Mode: Hybrid (3 days in the office, 2 days remote)

As a Staff Site Reliability Engineer (SRE) at Nference, you will ensure the reliability, scalability, and performance of our nSights platform. You will collaborate closely with engineering teams to design, build, and maintain the systems that support our global partners and customers.

Key Responsibilities:

  • System Design and Architecture: Design highly available, fault-tolerant systems that scale to meet customer and partner demands.
  • Automation and Tooling: Develop and maintain monitoring, deployment, and operational tools for efficiency and reduced manual intervention.
  • Incident Response and Root Cause Analysis: Lead post-incident reviews to identify causes and implement preventive measures.
  • Cross-functional Collaboration: Work with software engineering teams to advocate reliability best practices and influence architectural decisions.
  • Mentorship and Leadership: Mentor junior engineers, conduct technical interviews, and contribute to the SRE community.
Requirements:
  • 7+ years of experience in Site Reliability Engineering, or in a blend of software engineering and DevOps.
  • Strong Linux fundamentals, including system administration, scripting, performance tuning, and troubleshooting.
  • Proficiency in one programming language: Python, Java, or Golang.
  • Deep understanding of AWS, GCP, Azure cloud platforms, and Kubernetes orchestration.
  • Experience building and managing Kubernetes clusters using Terraform. CRD and operator implementation experience preferred.
  • Familiarity with ArgoCD, Nexus repository is advantageous.
  • Skilled in creating and utilizing Terraform modules and CI/CD pipelines.
  • Implementation experience with open-source observability and alerting tools like Prometheus, Grafana, Cortex, Thanos, Alertmanager, etc.
  • Networking knowledge (VPC, VNet, DNS) and an understanding of the TCP/IP stack, internet routing, and load balancing.
  • Excellent interpersonal, communication, and teamwork skills across diverse environments including SREs, Engineers, and Product Managers.
  • Project or team leadership experience, with a commitment to mentoring and developing junior engineers.

Join us in advancing the reliability and scalability of our platform. Apply now and contribute to our dynamic team environment focused on innovation and excellence.

