Machine Learning Engineer
Description
MLOps Engineer: onsite in Abu Dhabi, full relocation provided.

Key Responsibilities
Model Deployment: Oversee the deployment and scaling of large language models (LLMs) and other deep learning systems using modern inference engines such as vLLM, Triton, or TGI, with a focus on reliability and performance.
Pipeline Engineering: Build and manage automated pipelines for model fine-tuning, evaluation, versioning, and continuous delivery using platforms like MLflow, Kubeflow, or comparable tooling.
Infrastructure Management: Design and maintain cloud-native infrastructure for ML workloads, leveraging services from major cloud providers (e.g., EC2, Kubernetes, serverless functions, managed ML services).
Performance Optimization: Implement robust monitoring and logging strategies, ensuring low-latency, high-availability systems that meet production-grade performance metrics.
Cross-Functional Collaboration: Partner with data scientists, ML researchers, and software engineers to support experimentation workflows and ensure research-to-production continuity.
DevOps & Automation: Create infrastructure-as-code (IaC) solutions and CI/CD pipelines for repeatable, secure deployments of ML systems.
Model Optimization: Apply techniques such as quantization, pruning, and distributed inference to maximize performance while minimizing computational costs.

Qualifications
Experience: 5+ years of hands-on experience in MLOps, ML infrastructure, or related engineering roles, with a strong track record of managing the full ML lifecycle.
Deployment Expertise: Demonstrated experience deploying large-scale ML models with advanced inference and optimization practices.
Cloud Infrastructure: Deep understanding of cloud platforms (preferably AWS or equivalents), including scalable architecture design and cost-efficient compute management.
Programming: Proficient in Python, with experience in C/C++ for performance-critical applications.
Tooling: Well-versed in MLOps tools such as MLflow, Kubeflow, or SageMaker Pipelines; strong working knowledge of Docker, Kubernetes, and distributed systems.
Optimization: Familiarity with tools and frameworks for distributed training and inference such as DeepSpeed, FSDP, or Accelerate.
Education: Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Engineering, or a related discipline.
Posted: 4th July 2025 6.52 am
Application Deadline: N/A