Senior ML Infrastructure Engineer

About Us

Hippocratic AI is developing the first safety-focused Large Language Model (LLM) for healthcare. Our mission is to dramatically improve healthcare accessibility and outcomes by bringing deep healthcare expertise to every person. No other technology has the potential for this level of global impact on health.

Why Join Our Team

  • Innovative mission: We are creating a safe, healthcare-focused LLM that can transform health outcomes on a global scale.

  • Visionary leadership: Hippocratic AI was co-founded by CEO Munjal Shah alongside physicians, hospital administrators, healthcare professionals, and AI researchers from top institutions including El Camino Health, Johns Hopkins, Washington University in St. Louis, Stanford, Google, Meta, Microsoft and NVIDIA.

  • Strategic investors: We have raised a total of $278 million in funding, backed by top investors such as Andreessen Horowitz, General Catalyst, Kleiner Perkins, NVIDIA’s NVentures, Premji Invest, SV Angel, and six health systems.

  • Team and expertise: We are working with top experts in healthcare and artificial intelligence to ensure the safety and efficacy of our technology.

For more information, visit www.HippocraticAI.com.

We value in-person teamwork and believe the best ideas happen together. Our team is expected to be in the office five days a week in Palo Alto, CA unless explicitly noted otherwise in the job description.

The Role:

We are seeking a Machine Learning Infrastructure Engineer to design, build, and manage the next-generation training and inference platform for LLMs. You will be at the heart of building scalable, efficient infrastructure that supports our researchers and engineers in training, serving, and experimenting with large models at scale. Your work will directly impact our ability to innovate with new architectures and training techniques in production environments.

Key Responsibilities:

  • LLM Training Infrastructure: Design and operate large-scale training clusters using Kubernetes and/or Slurm for LLM experimentation, fine-tuning, and RLHF workflows.

  • Cluster & GPU Management: Own scheduling, autoscaling, resource allocation, and monitoring across high-performance GPU clusters (NVIDIA, AMD).

  • Distributed Systems: Build and optimize distributed data pipelines using frameworks like Ray, enabling parallel training and inference jobs.

  • Inference Optimization: Benchmark and optimize model serving performance with technologies like vLLM, and support autoscaling of inference workloads in production environments.

  • Platform Reliability: Collaborate with infra and platform engineers to ensure system robustness, observability, and maintainability of ML workloads.

  • Research Enablement: Partner closely with ML researchers to enable rapid experimentation through flexible and efficient infrastructure tooling.

Preferred Qualifications:

  • 5+ years of experience in infrastructure, MLOps, or systems engineering, ideally with time spent in architect or staff-level roles.

  • Proven experience managing large-scale Kubernetes or Slurm clusters for training or serving ML workloads.

  • Strong proficiency in Python; familiarity with Go or Rust is a plus.

  • Hands-on experience with Ray, vLLM, Hugging Face Transformers, and/or custom LLM training stacks.

  • Deep understanding of GPU scheduling, container orchestration, and workload optimization across heterogeneous hardware.

  • Experience with inference workloads, benchmarking, latency optimization, and cost-performance tradeoffs.

  • Familiarity with Reinforcement Learning, particularly RLHF frameworks, is a strong plus.

  • Contributions to internal platforms that enabled others to train or fine-tune LLMs efficiently.

Bonus Skills:

  • Exposure to multiple hardware platforms (e.g., H100s, A100s, MI300X).

  • Experience managing storage, IOPS performance, and object store integration for ML data.

  • Familiarity with building observability into ML pipelines (e.g., Prometheus, Grafana, Datadog).

  • Ability to present infra systems/platforms to technical stakeholders.

Average salary estimate

$175,000 / year (est.)
min: $140,000
max: $210,000

If an employer mentions a salary or salary range on their job, we display it as an "Employer Estimate". If a job has no salary data, Rise displays an estimate if available.

Employment type: Full-time, onsite
Date posted: July 20, 2025