Observability / Site Reliability Engineer (DSC/JH)
Job Details
We are seeking a skilled Observability Engineer with 2-3 years of experience in observability to join our team. In this role, you will implement, manage, and optimize observability tools. You will work closely with cross-functional teams to ensure that our systems are monitored effectively and that issues are identified and resolved proactively.

Key Responsibilities:
Design, implement, and maintain observability frameworks using tools such as Prometheus, Grafana, ELK Stack, Tableau, or similar.
Design, implement, and maintain monitoring tools such as BMC, CA, SolarWinds, SCOM, Dynatrace, Datadog, or similar.
Create and manage dashboards, visualizations, and reports to communicate system health and performance metrics.
Collaborate with the sales team to understand client requirements and demonstrate how our observability solutions can address their specific needs.
Prepare and deliver presentations, demos, and workshops to potential clients, showcasing the capabilities and benefits of our observability tools.
Troubleshoot and resolve tool-related issues in a timely manner.
Assist in training and mentoring team members on observability and monitoring tools and practices.

Job Requirements:
2-3 years of experience in software development, implementation, operations, or a related field, with a focus on observability tools.
Proficiency in implementing and managing observability tools.
Solid understanding of cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
Experience with scripting languages (Python, Bash, etc.) for automation tasks.
Knowledge of best practices in monitoring, logging, and incident management.
Strong analytical skills, with the ability to diagnose issues and propose effective solutions.
Excellent communication and collaboration skills, with a proactive approach to problem-solving.
Technical experience with enterprise monitoring tools such as Dynatrace, Grafana, and BMC.
Knowledge of automation tools, cloud technologies, DevOps concepts, open systems, and networking technologies.
Good knowledge of various monitoring tools, e.g. BMC, SolarWinds, CloudWatch, and Azure.
Experience with configuration management tools (Ansible, Terraform, etc.).
Familiarity with APM (Application Performance Management) tools such as New Relic, Dynatrace, or similar.
Understanding of network protocols and architectures.
Experience with orchestration tools (e.g. BMC, Kubernetes, Apache Airflow, Jenkins) to create and manage automated workflows for deploying, monitoring, and scaling observability solutions.

Preferred Qualifications:
Proficiency in observability tools (e.g. Grafana, ELK Stack, Datadog, Prometheus).
Proficiency in ITOM tools (e.g. BMC, Dynatrace, CA, SCOM, IBM, SolarWinds).
Strong understanding of monitoring and logging frameworks.
Experience with distributed systems and microservices architecture.
Ability to write scripts for automation and data analysis.
Experience with cloud platforms (AWS, Azure, GCP) and their monitoring services.
Experience with CI/CD pipelines and infrastructure as code (IaC) tools such as Terraform or Ansible.
Relevant certifications in cloud computing, DevOps, or observability tools are a plus.

Work location: Jurong East