HPC Administrator L2 & L3

Full time at a Laimoon Verified Company in India
Posted on June 13, 2024

Job details

Greetings from Yotta Data Services Pvt Ltd

Position: HPC Administrator L2 & L3
Location: Airoli/Panvel
Years of experience: 4-15 years

Job responsibilities:
- Provision, configure, and maintain GPU supercomputing clusters and the associated networking configuration.
- Collaborate with NVIDIA Solution Architect & Engineering teams on large-scale GPU-as-a-service projects, both on-premises and in cloud deployments.
- Implement and optimize software stacks including MaaS (metal-as-a-service), Job Scheduler (SLURM/PBS), Cloud Orchestration (Kubernetes), and Network Management (NetQ for Ethernet fabric and UFM for InfiniBand).
- Conduct performance engineering activities such as debugging, profiling, benchmarking, and tuning of GPU applications on large-scale supercomputing clusters.
- Run benchmarking applications from widely used platforms such as MLPerf Training & Inference, AI Training (PyTorch, TensorFlow, NeMo, Megatron-LM), and AI Inference (TensorRT-LLM, Triton Inference Server, vLLM).

Must have skills:
- Hands-on experience with NVIDIA GPUs, particularly NVIDIA Data Centre GPUs (A100/H100).
- Proficiency in provisioning and managing software stacks like MaaS, Job Scheduler (SLURM/PBS), Cloud Orchestration (Kubernetes), and Network Management (NetQ for Ethernet fabric and UFM for InfiniBand).
- Prior experience collaborating with NVIDIA Solution Architect & Engineering teams on large-scale GPU-as-a-service projects.
- Familiarity with benchmarking applications from widely used platforms and frameworks, including MLPerf, PyTorch, TensorFlow, NeMo, Megatron-LM, TensorRT-LLM, Triton Inference Server, and vLLM.
- Experience in performance engineering, including debugging, profiling, benchmarking, and tuning various GPU applications on large-scale supercomputing clusters.

Good to have skills:
- Knowledge of other HPC technologies and architectures beyond NVIDIA, broadening expertise in the field.
- Good knowledge of InfiniBand and other switches.
- Experience with other cloud platforms and orchestration tools, expanding versatility in deployment environments.
- Strong problem-solving and troubleshooting abilities, enabling quick resolution of complex technical issues.
- Excellent communication and collaboration skills to work effectively within cross-functional teams and with external partners.

If interested, please share your resume at ogholap@yotta.com

Thanks & regards,
Poonam Shevale - HR PRB
