Site Reliability Engineer
Job details
Location: Singapore, Singapore
Time Type: Full Time
Posted On: 30+ Days Ago
Job Requisition ID: R-2024-11-101247

About the Team:
Our team treats infrastructure and operations as software engineering problems. We are responsible for building and progressing software platforms that enable the provisioning and management of all Digibank services in safe, reliable, and scalable ways. We consistently challenge the status quo and use new technologies to build platforms and tooling for engineering teams. Join us and make significant decisions with a huge impact on building modern banking technology.

About the Role:
We treat infrastructure and operations as software engineering problems. Our mission is to build and progress software platforms that enable the provisioning and management of all Digibank services in safe, reliable, and scalable ways. You will be part of a team responsible for designing and architecting new solutions and for finding creative ways to optimize existing ones, improving agility in managing the infrastructure of hundreds of microservices in a stable and reliable way.

If you are:
- A strong believer in automating DevOps and SRE aspects such as infrastructure provisioning, deployment, observability, incident lifecycle, and uptime SLAs
- Bold to challenge, open to being challenged, and curious to learn and grow
then this is the right place for you!

This role requires skills in Linux networking (TCP/IP, firewalls, switches, CloudFront, AWS security groups).

Roles and Responsibilities:
- Configure and install various network devices and services (e.g., routers, switches, firewalls, load balancers, VPN).
- Support IT network infrastructure-related work, such as installing Internet connections and WiFi APs, network upgrades, office builds, expansions, and relocations.
- Actively engage with business stakeholders, internal IT teams, and vendors to manage the outcome of projects.
- Use Infrastructure as Code tooling such as Terraform and Ansible to manage AWS, Azure, and Kubernetes resources.
- Troubleshoot priority incidents, facilitate blameless post-mortems, and ensure permanent closure of incidents.
- Perform analytics on previous incidents and usage patterns to better predict issues and take proactive action.
- Build and drive adoption of greater self-healing and resiliency patterns.
- Optimize infrastructure performance and cost.
- Be part of an on-call rotation for the team's tooling and provide 24x7 support coverage as needed.

Succeed, fail, and learn together with other talented people. We believe in an environment that provides an opportunity for growth.

Qualifications:
- Bachelor's degree in information systems, information technology, computer science, or similar.
- 3-5 years of professional experience.
- Extensive routing, switching, security, and wireless LAN design, implementation, and troubleshooting experience.
- Cloud (AWS/Azure) network configuration and integration with on-premises network equipment.
- Network automation experience using any scripting language (Python, Go, Perl, Bash).
- Experience managing Infrastructure as Code using Terraform.
- Direct production operations experience in a cloud environment.