Site Reliability Engineer (LATAM-based)
Job Details
About Us:
At Dendi, we provide critical software infrastructure necessary for clinical labs to operate and connect patient diagnostic data. We don’t cut corners, and we work closely with our partners to continuously deliver a solution that helps them achieve their goals. We’re a motivated and entrepreneurial group of team players. We have high expectations for each other and believe in fostering a sense of camaraderie and support within our tight-knit group. We’re proud of what we do, and we’re looking for others who are proud of their work, too.

Dendi was founded in 2019 and is based in Research Triangle Park, NC. We recently raised a $3.6 million seed round from Teamworthy Ventures, QuidelOrtho, and CEOs of companies like OpenGov, G2, and ChannelAdvisor. Join us, and let’s work on improving human health together.

About the Role:
We are seeking a Site Reliability Engineer (SRE) based in LATAM to be a key player in building and maintaining our robust infrastructure, ensuring our clinical lab software operates reliably and at scale. In this role, you will take ownership of the reliability and performance of our systems, working alongside a talented team that values collaboration, innovation, and proactive problem-solving.

You will spend the majority of your time focusing on infrastructure management: designing, building, and maintaining highly scalable, cloud-based systems that power our lab solutions. You'll be leveraging AWS, automating with Infrastructure as Code tools like Terraform, and deploying efficient CI/CD pipelines.

The ideal candidate is fully fluent in both verbal and written English communication, thrives in a dynamic environment, and possesses a deep understanding of cloud services with a strong grasp of scripting and automation. You'll not only work to improve our infrastructure's reliability but also foster a culture of knowledge-sharing and operational excellence across the team. By documenting processes, communicating effectively, and adhering to best practices, you'll help ensure our technology grows as securely and efficiently as possible.

If you are passionate about reliability, automation, and continuous improvement, and want to help improve human health, we'd love to have you on our team.

Key Responsibilities:
Infrastructure Management (80%):
- Ensure the reliability, availability, and performance of applications and systems.
- Design, build, and maintain scalable and efficient infrastructure using AWS services.
- Deploy Docker containers in production environments.
- Utilize Infrastructure as Code (IaC) tools such as Terraform, Ansible, and Packer for automation.
- Develop and manage complex CI/CD pipelines using tools like GitLab CI, GitHub Actions, etc.
- Code in Python or Ruby and script in Shell (Bash) for automation and integration tasks.
- Architect, support, and deploy large-scale systems from scratch.
- Implement best practices for massive-scale data ingestion and messaging systems.
- Work closely with developers and operations teams, fostering a true DevOps culture.
- Document infrastructure and processes extensively, ensuring information is centralized in the company’s knowledge base.
- Communicate clearly in written form, using Slack, tickets, and documentation to share knowledge.
- Stay updated on security best practices and ensure the infrastructure is secure.
- Encourage collaboration and empower colleagues by sharing knowledge and offering feedback.

Requirements:
- Full fluency in written and verbal English.
- 7 years of experience as an SRE, with diverse infrastructure experience.
- In-depth knowledge of AWS services and 5+ years of hands-on experience deploying them in production environments.
- Experience with Docker containers, Infrastructure as Code tools (Terraform, Ansible, Packer), and CI/CD pipelines (GitLab CI, GitHub Actions).
- Deep SRE expertise that you can draw on when making executive decisions.
- Strong coding skills in Python or Ruby and scripting in Shell (Bash).
- Thorough understanding of the Software Development Lifecycle (SDLC).
- Proven experience in architecting, supporting, and deploying large-scale systems from scratch.
- Familiarity with massive-scale data ingestion and messaging systems.
- Strong written communication skills and a commitment to documentation.
- Curiosity and a proactive approach to problem-solving and learning new technologies.
- Ability to work collaboratively in a remote-first environment.
- Security-conscious mindset and up-to-date knowledge of security standards.

Benefits:
- Unlimited PTO.
- Observance of local holidays (no work on most U.S. holidays).
- A collaborative work environment where you can grow your skills and career.
- Opportunities to work on diverse projects and technologies.