Field AI is transforming how robots interact with the real world. We are building risk-aware, reliable, and field-ready AI systems that solve the hardest challenges in autonomy, and we are deploying them globally today to unlock the full potential of embodied intelligence. Our solutions go beyond conventional data-driven ML or purely transformer-based models. We’re building real-world AI that learns from experience and delivers tangible, continuous improvements in the field.
Are you excited by the challenge of supporting ML teams with robust, scalable infrastructure? Do you want to help accelerate real-time robotics through better developer workflows and reliable systems?
Field AI is hiring an ML Infrastructure Engineer to own the software platform and tooling that enable fast, reliable AI development and deployment across our ML and robotics stacks.
What You Will Get To Do
• Build ML Infrastructure & Developer Tooling
  • Design and implement internal tools, libraries, and CLI utilities that streamline experimentation, model training, and evaluation.
  • Improve local and cloud development environments using Docker, internal package registries, and monorepos.
  • Build reusable templates and interfaces for training, evaluation, and inference pipelines.
• Support the ML Lifecycle (Data → Models → Deployment)
  • Develop pipelines for dataset ingestion, transformation, versioning, and validation.
  • Automate model training, evaluation, packaging, and deployment to cloud and edge environments.
  • Ensure integrity and traceability across data, code, and model artifacts.
• Improve Build Systems and Developer Experience
  • Maintain and evolve a shared monorepo across ML, robotics, and software teams.
  • Leverage Bazel or similar systems to enable fast, reproducible builds and tests.
  • Enhance developer workflows to support consistent environments and reduce friction.
• Own CI/CD and Automation for ML Systems
  • Build and maintain CI/CD pipelines (e.g., GitHub Actions, AWS Step Functions) for ML experimentation and deployment.
  • Automate regression testing and benchmarking of models.
  • Develop observability tools: dashboards, telemetry systems, and model health monitoring.
• Collaborate Across Engineering & Research Teams
  • Work closely with ML scientists, software engineers, and roboticists to translate high-level platform needs into robust engineering solutions.
  • Participate in code and design reviews, documentation, and cross-team planning.
What Will Set You Apart
• Experience with distributed training frameworks (e.g., PyTorch DDP, FSDP, DeepSpeed, Megatron).
• Familiarity with orchestrating large-scale training jobs on Kubernetes or managed platforms (e.g., Ray, SageMaker, EKS, Karpenter).
• Background in hybrid edge-cloud ML deployments or infrastructure supporting robotic systems.
• Prior work in environments requiring real-time ML performance, safety validation, or regulatory traceability.