ML Platform Engineer
Foxglove · San Francisco, CA (OnSite)
On-site · $183–310k · Mid-level
Build the data infrastructure that powers physical AI.
Physical AI is moving from research labs into production fleets across industries. As robots scale across the real world, from factories to vehicles to defense, every workflow from product development to deployment becomes a data problem: what happened, when, on which robot, and why?
At Foxglove, we built the unified data platform for physical AI that development and engineering teams use to answer those questions. We help teams make vast quantities of robotics data actionable, creating the data flywheel they need to develop, test, train, deploy, and operate robots with confidence.
About the Role
We're looking for an ML Platform Engineer with deep infrastructure instincts to help design, deploy, and scale the systems that power Foxglove's data platform. This is a platform-first role: you'll own the infrastructure layer that makes ML possible in production, not just the models that run on top of it.
You'll be responsible for the reliability, scalability, and performance of the ML platform itself, from inference serving and pipeline orchestration to training infrastructure and evaluation frameworks. The problems are real and urgent: petabyte-scale multimodal robotics data, high-throughput retrieval and embedding pipelines, and the internal ML flywheel that lets our team ship fast. This is a hands-on infrastructure role, not research.
Key Responsibilities
Design, deploy, and operate production inference infrastructure — including model serving, autoscaling, load balancing, and cost optimization across cloud environments
Own the platform architecture for embedding and retrieval pipelines that power semantic search over multimodal robotics data (image, video, point cloud, and timeseries)
Build and maintain the training and evaluation infrastructure that enables rapid iteration on model performance — including job orchestration, experiment tracking, and dataset versioning
Drive cloud infrastructure decisions (AWS/GCP) that directly impact latency, throughput, reliability, and cost at scale
Define platform abstractions and internal tooling that let product engineers ship ML-powered features without needing to manage infrastructure themselves
Evaluate, integrate, and operationalize third-party ML infrastructure components; establish clear build vs. buy frameworks for the team
What We're Looking For
Deep, hands-on experience owning production ML infrastructure: inference serving (e.g., vLLM, Triton, TorchServe), model optimization, orchestration, and cloud cost management
Strong foundation in distributed systems and cloud infrastructure (AWS/GCP) — you think in terms of system reliability, failure modes, and operational burden, not just model accuracy
Experience architecting and operating retrieval systems at scale, including vector databases (e.g., Pinecone, Lance, turbopuffer, pgvector) and embedding pipelines over large, heterogeneous datasets
A platform engineer's mindset: you build systems that other engineers depend on, and you take that responsibility seriously
Proven ability to operate with high ownership — you can make hard infrastructure tradeoffs independently and move fast without breaking things
Strong communication skills; you can explain infrastructure tradeoffs clearly to both ML and non-ML engineers
Bonus Points
Familiarity with fine-tuning and domain adaptation techniques for LLMs or embedding models (e.g., SFT, PEFT)
Familiarity with data mining or hybrid search workflows, especially as applied in robotics, autonomous vehicles, or physical AI
Prior experience building ML platforms, evaluation frameworks, or data management tooling from the ground up
What We Offer
$300 monthly budget towards commuter benefits or building your personal workspace (remote only)
Competitive equity grant in a Series B company
Medical, Dental, Vision, and Term Life insurance coverage at 100% for employees and 75% for dependents
401(k) matching up to 4%
4 weeks of vacation, plus holidays and winter break
All-expenses-paid company off-sites 2× per year
Why Join Us
Work on real robotics problems. Robot data is large, messy, multimodal, time-sensitive, and tied to physical-world behavior. The problems we work on span ingestion, indexing, search, visualization, replay, connectivity, collaboration, evaluation, and operations.
Build tools engineers rely on. Foxglove is used by robotics teams investigating failures, validating changes, reviewing field behavior, curating datasets, and operating production fleets. The work you do helps teams understand what their robots saw, what they did, and why they behaved the way they did.
High-leverage product surface area. A better query path, visualization workflow, Fleet connection, UI primitive, API, onboarding flow, or customer deployment can change how an entire robotics team works.
Ownership and autonomy. We’re a small team, and people at Foxglove own meaningful work end-to-end. You’ll have real influence over product direction, technical architecture, customer outcomes, and how we operate as a company.
Strong peers and high standards. You’ll work with people who care about correctness, performance, craft, product judgment, and building software that technical users trust under pressure.
A mission grounded in production software. We accelerate robotics and physical AI by building the infrastructure teams use every day to connect to robots, inspect live telemetry, manage multimodal data, replay runs, investigate failures, and improve real systems.
What we offer
Competitive equity grant in a Series B company.
Medical, dental, vision, and term life insurance coverage at 100% for employees and 75% for dependents, for U.S. full-time employees.
401(k) matching up to 4%, for U.S. full-time employees.
4 weeks of vacation, plus holidays and winter break.
All-expenses-paid company offsites 1–2× per year.
$300 monthly budget toward commuter benefits or building your personal workspace, depending on role/location.
Equal opportunity
Foxglove is an equal opportunity employer. We welcome candidates from different backgrounds, experiences, and communities, and we’re committed to building an inclusive environment for everyone.
We encourage you to apply even if you don’t meet every nice-to-have listed above. The strongest candidates often bring a mix of relevant experience, curiosity, judgment, and the ability to learn quickly.
About Foxglove
Foxglove is the data platform for Physical AI. Built for robotics teams developing real-world systems, Foxglove provides a purpose-built, modular platform to collect, organize, and learn from vast quantities of multimodal data, creating the data flywheel to safely scale from development to distributed fleets. Founded in 2021, Foxglove supports hundreds of customers across automotive, aerospace, defense, logistics, agriculture, construction, and consumer robotics to deploy the next generation of intelligent machines. Learn more at foxglove.dev.
Posted 2026-04-02