Machine Learning Engineer
Description
MLOps Engineer, onsite in Abu Dhabi; full relocation provided.

Key Responsibilities
- Model Deployment: Oversee the deployment and scaling of large language models (LLMs) and other deep learning systems using modern inference engines such as vLLM, Triton, or TGI, with a focus on reliability and performance (a minimal serving sketch follows the lists below).
- Pipeline Engineering: Build and manage automated pipelines for model fine-tuning, evaluation, versioning, and continuous delivery using platforms like MLflow, Kubeflow, or comparable tooling (see the tracking sketch below).
- Infrastructure Management: Design and maintain cloud-native infrastructure for ML workloads, leveraging services from major cloud providers (e.g., EC2, Kubernetes, serverless functions, managed ML services).
- Performance Optimization: Implement robust monitoring and logging strategies, ensuring low-latency, high-availability systems that meet production-grade performance metrics.
- Cross-Functional Collaboration: Partner with data scientists, ML researchers, and software engineers to support experimentation workflows and ensure research-to-production continuity.
- DevOps & Automation: Create infrastructure-as-code (IaC) solutions and CI/CD pipelines for repeatable, secure deployments of ML systems.
- Model Optimization: Apply techniques such as quantization, pruning, and distributed inference to maximize performance while minimizing computational costs.

Qualifications
- Experience: 5+ years of hands-on experience in MLOps, ML infrastructure, or related engineering roles, with a strong track record in managing the full ML lifecycle.
- Deployment Expertise: Demonstrated experience deploying large-scale ML models with advanced inference and optimization practices.
- Cloud Infrastructure: Deep understanding of cloud platforms (preferably AWS or equivalents), including scalable architecture design and cost-efficient compute management.
- Programming: Proficient in Python, with experience in C/C++ for performance-critical applications.
- Tooling: Well-versed in MLOps tools such as MLflow, Kubeflow, or SageMaker Pipelines; strong working knowledge of Docker, Kubernetes, and distributed systems.
- Optimization: Familiarity with tools and frameworks for distributed training and inference such as DeepSpeed, FSDP, or Accelerate.
- Education: Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Engineering, or a related discipline.
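For context on the inference-engine work named under Model Deployment, the following is a minimal sketch of offline generation with vLLM. It assumes vLLM is installed; the model identifier, sampling settings, and prompt are placeholders rather than anything specified in this posting, and a production deployment would typically run the engine behind an API server instead of inline.

    # Minimal vLLM sketch: load a model and generate a completion offline.
    # Model name and sampling settings are illustrative placeholders.
    from vllm import LLM, SamplingParams

    llm = LLM(model="facebook/opt-125m")  # placeholder model identifier
    params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

    outputs = llm.generate(["Describe what an MLOps engineer does."], params)
    for output in outputs:
        # Each result carries the original prompt and its completion(s).
        print(output.prompt, output.outputs[0].text)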
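The pipeline work built around MLflow usually centers on tracking runs, parameters, and metrics. The sketch below assumes the default local tracking store; the experiment name, parameters, and metric values are all illustrative placeholders.

    # Minimal MLflow tracking sketch; every name and value is illustrative.
    import mlflow

    mlflow.set_experiment("llm-finetune-eval")  # hypothetical experiment name

    with mlflow.start_run():
        mlflow.log_param("base_model", "opt-125m")   # placeholder parameter
        mlflow.log_param("learning_rate", 2e-5)
        mlflow.log_metric("eval_loss", 1.87)         # illustrative metrics
        mlflow.log_metric("latency_p95_ms", 142.0)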
Posted: 13th June 2025 8.54 am
Application Deadline: N/A