Job details
Must have : Incident Manager with experience handling production major incident calls, root cause analysis (RCA), and problem management.
Skills:
- Incident Management : Experience in managing incidents, including response strategies and postmortem analysis, is critical for maintaining system reliability.
- Troubleshooting : The ability to diagnose and resolve issues quickly is a key trait for any Incident Manager.
- Networking Knowledge : A solid grasp of networking concepts helps in diagnosing issues and understanding how systems communicate.
- Programming Skills : Proficiency in programming languages and platforms (such as .NET or Java) is important for reviewing proposed solutions, automating tasks, and developing tools.
- Monitoring and Observability : Skills in using monitoring tools (like AppD, Azure App Insights, EAGLE, Grafana) to track system performance and detect anomalies are essential.
- Security Awareness : Understanding security best practices helps ensure that reliability solutions do not compromise system security.
- Collaboration and Communication : Strong interpersonal skills are necessary for working effectively with development, network, firewall, and release operations teams.
- Windows/Linux/Unix Proficiency : Understanding of Windows, Linux, or Unix systems is fundamental, as many applications run on these platforms.
- Cloud Computing : Familiarity with cloud services (like AWS, Azure, or Google Cloud) is crucial, given the prevalence of cloud-based architectures.
- CI/CD Practices : Understanding Continuous Integration and Continuous Deployment (CI/CD) pipelines is vital for managing software releases and ensuring reliability.
- Capacity Planning : Skills in forecasting system needs and scaling resources accordingly are important for maintaining performance.
Apply safely
To stay safe in your job search, learn about common scams, and get free expert advice, we recommend that you visit SAFERjobs, a non-profit, joint industry and law enforcement organization working to combat job scams.