Cloud Engineer (HPC)
Full time at KLA, Online
Posted on January 9, 2025
Job details
Responsibilities:
- Optimize performance and scalability of HPC applications running in containerized environments.
- Stay up to date with the latest advancements in HPC and cloud technologies.
- Collaborate with other DevOps engineers and developers to ensure seamless integration of HPC solutions.
- Configure Linux OS for HPC needs.
- Implement and maintain Kubernetes clusters for HPC workloads.
- Explore, qualify, and tune open-source, cloud-based technology stacks for high-performance computing demands.
- Design robust, high-performance, cloud-based software architectures involving CPU/GPU workloads, scalable and robust storage, and high-bandwidth interconnects.
Qualifications:
- Strong knowledge of HPC systems and cloud computing technologies (gRPC, Kafka, Kubernetes, ZeroMQ, Redis, Ceph, etc.).
- Strong Linux performance tuning skills.
- Proven experience with Kubernetes and container orchestration.
- Proven, in-depth, distribution-agnostic knowledge of Linux systems (SUSE, Red Hat, Rocky, Ubuntu).
- Experience with remote boot and related technologies such as systemd, network boot (PXE), and Linux HA.
- Strong understanding of TCP/IP fundamentals and of protocols such as DNS and DHCP.
- Strong fundamentals in Linux networking and storage.
- Proficiency in scripting and automation tools such as Ansible, Python, and Bash.
- Working proficiency in a low-level language such as C.
- Experience with CI/CD tools like Jenkins, GitLab or similar.
- Familiarity with HPC workload managers and schedulers (e.g., Slurm, PBS).
- Excellent problem-solving skills and attention to detail.
- Strong communication and teamwork abilities.
- Experience with one or more configuration management tools (Salt, Chef, Puppet, etc.).
- CPU and GPU performance tuning.
- BS or MS degree with 3 to 8 years of proven experience.
- Degree in Computer Engineering, Electrical Engineering, or a related field.
- Team Orientation & Interpersonal – Highly motivated team player with the ability to develop and maintain collaborative relationships at all levels, both within and outside the organization.
- Organization & Time Management – Able to plan, schedule, organize, and follow up on tasks related to the job to achieve goals within or ahead of established time frames.
- Multitasking – Able to organize, coordinate, prioritize, and perform multiple tasks simultaneously, and to quickly assess a situation, determine a logical course of action, and respond appropriately.
- Adaptability to Change – Able to be flexible and supportive, and to assimilate change positively and proactively in a rapid-growth environment.
- Outstanding team player with excellent written and verbal communication skills.