Senior Machine Learning Operations Engineer

Full time at a Laimoon Verified Company in India
Posted on August 21, 2024

Job details

We are looking for a Senior Machine Learning Operations Engineer to join our team to design, build, and integrate MLOps for large-scale, distributed machine learning systems, focusing on cutting-edge tools, distributed GPU training, and enhancing research experimentation.

About SatSure:
SatSure is a deep tech, decision intelligence company that works primarily at the nexus of agriculture, infrastructure, and climate action, creating an impact for the other millions, focusing on the developing world. We want to make insights from earth observation data accessible to all. Join us to be at the forefront of building a deep tech company from India that solves problems for the globe.

Roles & Responsibilities:
- Architect, build, and integrate the end-to-end life cycle of large-scale, distributed machine learning systems (i.e., MLOps) using cutting-edge tools and frameworks.
- Develop tools and services for explainability of ML solutions.
- Implement distributed cloud GPU training approaches for deep learning models.
- Build software and tools that improve the rate of experimentation for the research team and extract insights from those experiments.
- Identify and evaluate new patterns and technologies to improve the performance, maintainability, and elegance of our machine learning systems.
- Lead and execute technical projects to completion.
- Communicate with peers to build requirements and track progress.
- Mentor fellow engineers in your areas of expertise.
- Contribute to a team culture that values effective collaboration, technical excellence, and innovation.
- Collaborate with engineers across various functions to solve complex data problems at scale.

Qualification:
- 5 to 8 years of professional experience implementing MLOps frameworks to scale up ML in production.
- Master's degree or PhD in Computer Science or in the Machine Learning / Deep Learning domains.

Must-have:
- Hands-on experience with Kubernetes, Kubeflow, MLflow, SageMaker, and other ML model experiment management tools, including training, inference, and evaluation.
- Experience in ML model serving (TorchServe, TensorFlow Serving, NVIDIA Triton Inference Server, etc.).
- Proficiency with ML model training frameworks (PyTorch, PyTorch Lightning, TensorFlow, etc.).
- Experience with GPU computing for data-parallel and model-parallel training.
- Solid software engineering skills in developing systems for production.
- Strong expertise in Python.
- Experience building end-to-end data systems as an ML Engineer, Platform Engineer, or equivalent.
- Experience working with cloud data processing technologies (S3, ECR, Lambda, AWS, Spark, Dask, Elasticsearch, Presto, SQL, etc.).

Geospatial / remote sensing experience is a plus.

Competencies:
- Excellent debugging and critical thinking skills.
- Excellent analytical and problem-solving skills.
- Ability to work in a fast-paced, team-based environment.

Benefits:
- Medical health cover for you and your family, including unlimited online doctor consultations
- Access to mental health experts for you and your family
- Dedicated allowances for learning and skill development
- Comprehensive leave policy with casual leaves, paid leaves, marriage leaves, and bereavement leaves
- Twice-yearly appraisals

Interview Process:
- Intro call
- Assessment
- Presentation
- Interview rounds (ideally up to 3-4 rounds)
- Culture round / HR round

Apply safely

To stay safe in your job search, learn about common scams, and get free expert advice, we recommend that you visit SAFERjobs, a non-profit, joint industry and law enforcement organization working to combat job scams.
