[Dubai] Site Reliability Engineer

Full time at a Laimoon Verified Company in the UAE
Published on June 8, 2024

Job details

At Syndica, big things happen. Every day, we're translating vision into reality by tackling new and exciting challenges head-on. This is a breakthrough stage in our company, and you'll experience firsthand the infectious enthusiasm of our employees and leadership team. You'll have the opportunity to learn new skills, grow your career, and work with the smartest, most passionate people in crypto.

This role will have primary accountability for maintaining and operating Syndica's blockchain infrastructure platform. Golang knowledge is a necessity! The team operates with a "run what you write" philosophy, and each engineer is responsible for deploying and operating the code they write.

A successful candidate must have demonstrable experience in at least one programming language (preferably Go, Rust, or C++) and previous work in SaaS application development and operations. You will be working closely with the Support and Development team on the architecture and configuration of our AWS- and GCP-hosted infrastructure, as well as the management of our bare metal RPC nodes. You will be responsible for ensuring that the environment is configured, managed, and monitored correctly to support the business. You will drive decisions on right-sizing servers and storage, troubleshooting performance issues, ensuring the highest level of reliability for the platform, and tuning the environment for maximum scalability, cost efficiency, and security.

The ideal candidate will also have prior experience developing applications on any of the three major cloud platforms (AWS, Azure, or GCP) via Kubernetes.

Responsibilities

- Design, create, and provision infrastructure
- Administer overall site availability, security, latency, and system health
- Take responsibility for effective provisioning, installation/configuration, operation, and maintenance of services, system software, and related infrastructure
- Administer the state of all components in our cloud and bare metal environments
- Deploy, manage, and operate the cloud environments
- Design, build, manage, and operate the infrastructure and configuration of SaaS applications with a focus on automation and infrastructure as code
- Design, manage, and operate the infrastructure-as-a-service layer (hosted and cloud-based platforms) that supports the different platform services
- Develop comprehensive monitoring solutions that provide full visibility into the different platform components, using tools and services such as Kubernetes, Prometheus, Grafana, ELK, Datadog, and New Relic
- Create the environments and tooling that enable the development team to release code quickly and reliably
- Identify and troubleshoot availability and performance issues at multiple layers of deployment, from hardware to operating environment, network, and application
- Evaluate performance trends and expected changes in demand and capacity, and establish the appropriate scalability plans
- Troubleshoot and solve customer RPC issues
- Ensure that SLAs are met in executing operational tasks
- Work with development teams to ensure best practices for scalability, reliability, and security are designed and implemented from the start
- Conduct periodic on-call duties

Qualifications

- Great collaborator with 5+ years of experience in a DevOps or SRE role
- Deep understanding of infrastructure as code (Terraform, etc.) and of deploying large-scale systems reliably
- Strong experience with infrastructure-as-code and configuration management tools
- Experience with Prometheus/Grafana for metrics aggregation and visualization
- Configuration of CI/CD pipelines
- Experience using Kubernetes
- Experience with automation tools/platforms
- Experience with alerting and monitoring tools
- Strong knowledge of monitoring and performance analytics tools (Datadog, New Relic, etc.)
- Commitment to implementing reliability and security best practices
- Capacity planning experience, including resource optimization and load testing
- Experience working in a highly distributed company is a plus
- Ability to align a portion of your day with the business hours of the Central Time Zone (UTC-6)
- Working knowledge of information security issues
- Experience building and managing virtualized systems (KVM, OVM, containers/Docker) and the ability to read and understand source code
- Systematic problem-solving approach, combined with a strong sense of ownership and drive
- Firm grasp of at least one modern programming language, beyond advanced scripting (Shell or Python)
- Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.)
- Experience writing automation tools and eagerness to "automate all the things"

What does success in this role look like?

- In three months, you have become our infrastructure administrator with respect to overall site availability, security, latency, system health, customer accounts, and billing. You have taken on independent code review responsibilities and are collaborating on the design of new features.
- In six months, you have earned the trust of the team and are delivering tasks through the entire SDLC, from design through development, with minimal guidance, and you are helping to effectively mentor new engineers joining the team.
- In twelve months, you have established a cadence of predictable, on-time delivery without cutting corners.

