Principal/Senior Software Engineer - Performance and Scale Engineering (Waterford Office or Ireland)

Full time at Red Hat in Ireland
Posted on May 8, 2024

About the job

The Red Hat Performance and Scale Engineering team is looking for an AI Performance Engineer to join the PSAP (Performance and Scale for AI Platforms) team. As recent advances in AI have taken the world by storm, Red Hat has engineered an enterprise-grade platform, OpenShift AI, built on open source AI technologies, to help enterprises leverage the full potential of these transformative technologies. As part of this team, you will be responsible for performance and scalability assessments of the OpenShift AI platform, including but not limited to notebooks as a service, data science pipelines, the model serving stack, feature store, edge AI, and a distributed model training stack.

Our goal is to make OpenShift AI the platform of choice for Red Hat's enterprise customers who are adopting AI technologies. You will help us achieve this goal through targeted improvements in the performance and scalability of the OpenShift AI platform. You will formulate and execute performance test plans and investigate performance tuning knobs across cloud infrastructure, on-prem hardware, RHEL, OpenShift, and OpenShift AI. In addition, you will triage and potentially fix performance issues, create new benchmarking tests and automation tools as needed, and share performance results on a regular basis.

This role needs an engineer who thinks creatively, adapts to rapid change, and is willing to learn and apply new technologies. You will be joining a vibrant open source culture and helping promote performance and innovation in this Red Hat engineering team.

The broader mission of the Performance and Scale team is to establish performance and scale leadership across the Red Hat product and cloud services portfolio. The scope includes component-level, system, and solution analysis, as well as targeted enhancements. The team collaborates with engineering, product management, product marketing, and customer support, as well as with hardware and software partners.

What you will do

  • Execute performance and scalability benchmarks against various components of the OpenShift AI platform to drive improvements and detect regressions
  • Develop tools and automation to aid the performance benchmarking work
  • Collaborate with other teams to resolve performance issues
  • Triage, debug, and solve customer cases related to AI performance
  • Submit performance benchmarking results to industry consortia
  • Publish results, conclusions, recommendations, and best practices via internal test reports, presentations, and external blogs to support our partners and customers
  • Participate in internal and external conferences about your work and results
  • Provide technical leadership and guidance to the wider team

What you will bring

  • Experience in running performance tests, data capture, data analysis, and visualization
  • Experience with systems performance engineering and metrics collection tools such as iostat, vmstat, sar, perf, and Prometheus
  • Experience with container technologies (Podman, Kubernetes, Docker)
  • Programming experience in Python
  • Experience working with the Linux operating system (RHEL, Fedora or CentOS preferred)
  • Experience with AI technologies and frameworks (PyTorch, Transformers, etc.)
  • Excellent written and verbal communication skills in English

The following are considered a plus:
  • Knowledge of AI benchmarking suites such as MLPerf
  • Experience with software-defined storage and networking as they pertain to Kubernetes
  • Experience working with hardware accelerators such as NVIDIA GPUs
  • Experience working on an MLOps platform
