Site Reliability Engineer [T500-9984]

Full-time at a Laimoon Verified Company in India
Posted on June 17, 2024

Job Details

Short Description:
A site reliability engineer (SRE) combines software engineering and systems engineering to ensure that a software system is available, scalable, and maintainable 24/7/365, with an "Always ON" posture, for Ford's e-Commerce Platform.

Description for Internal Candidates:
- Strong background in software development and systems administration, as well as excellent problem-solving and communication skills.
- Improve the reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
- Identify and reduce or eliminate toil via automation to maximize the time spent on engineering and innovation.
- Perform root cause analysis of production incidents and implement preventive measures.

Responsibilities for Internal Candidates:
- Strong background in software development and systems administration, as well as excellent problem-solving and communication skills.
- Run the production environment by monitoring availability and taking a holistic view of system health.
- Develop, improve, and operate the deployment and orchestration of a complex distributed system.
- Improve the reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
- Provide primary operational and engineering support for multiple large, distributed software applications.
- Identify and reduce or eliminate toil via automation to maximize the time spent on engineering and innovation.
- Collaborate with development teams to design, build, and operate scalable and resilient software systems.
- Automate deployment, monitoring, and incident response processes.
- Perform root cause analysis of production incidents and implement preventive measures.
- Conduct performance analysis and optimization of the system.
- Ensure compliance with security and regulatory standards.
- Implement and maintain disaster recovery processes.
- Provide technical guidance and mentorship to other team members.
- Participate in an on-call rotation for incident response and support.

Qualifications:
- Four-year college degree in Computer Science or equivalent.
- 2-5 years' experience with Java, J2EE, NoSQL/SQL datastores, Spring Boot, GCP/AWS/Azure, and Docker/Kubernetes (K8s) in the maintenance and development of multi-tier applications.
- Understanding of RESTful APIs and microservices platforms.
- 2-5 years' experience with APM and other monitoring tools such as Dynatrace, New Relic, ELK, Splunk, Prometheus, Sensu, Nagios, Kafka, Datadog, and PagerDuty.
- Strong experience working with product and development teams to establish error budgets. This means identifying the right SLOs (service level objectives), SLIs (service level indicators), and KPIs (key performance indicators), then driving effective use of the budget to ensure maximum domain availability/uptime.
- Regularly review key site technical metrics such as transaction errors, logging, response times, caching strategies, conversion/bounce rates, and capacity and resource utilization.
- Proactively identify stability risks and work with engineering leadership to establish appropriate mitigation plans.
- Experience solving complex architecture/design and business problems by simplifying, optimizing, and removing bottlenecks.
- Architect, design, and develop automation to reduce toil and improve the recoverability, availability, latency, and scalability of supported applications, with an understanding of MTTD (Mean Time to Detection) and MTTR (Mean Time to Resolution).
- Maintain a knowledge repository that includes standard operating procedures, release checklists, and runbooks for incident recovery.

Same posting description for internal and external candidates. PRB
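The error-budget qualification rests on simple arithmetic. Below is a minimal sketch, assuming a request-based availability SLI, meaning the share of successful requests in one fixed window. The `ErrorBudget` class, its fields, the 99.9% target, and the release-freeze rule are all hypothetical illustrations. They are not Ford's tooling or the API of any monitoring product named in the posting.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Request-based error budget for a single SLO window (hypothetical example)."""

    slo_target: float      # e.g. 0.999 for a 99.9% availability SLO
    total_requests: int    # all requests observed in the window
    failed_requests: int   # requests the SLI counts as "bad"

    def sli(self) -> float:
        """Availability SLI: fraction of requests that succeeded."""
        if self.total_requests == 0:
            return 1.0
        return 1.0 - self.failed_requests / self.total_requests

    def allowed_failures(self) -> float:
        """Failures the SLO tolerates in this window: (1 - SLO) * traffic."""
        return (1.0 - self.slo_target) * self.total_requests

    def remaining(self) -> float:
        """Failures still available before the SLO is breached (negative = overspent)."""
        return self.allowed_failures() - self.failed_requests

    def consumed_fraction(self) -> float:
        """Share of the budget already spent (1.0 means fully spent)."""
        allowed = self.allowed_failures()
        if allowed == 0:
            return 0.0 if self.failed_requests == 0 else float("inf")
        return self.failed_requests / allowed


def should_freeze_releases(budget: ErrorBudget) -> bool:
    # Hypothetical policy: pause risky releases once the budget is exhausted.
    return budget.consumed_fraction() >= 1.0


if __name__ == "__main__":
    window = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=400)
    print(round(window.allowed_failures()), round(window.remaining()))  # 1000 600
    print(round(window.consumed_fraction(), 2), should_freeze_releases(window))  # 0.4 False
```

With a 99.9% target over one million requests, the budget is about 1,000 failed requests. If 400 requests fail, about 600 remain and 40% of the budget is spent. Real teams often use time-based SLIs or multi-window burn-rate alerts instead. This sketch shows only the basic ratio.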
