Staff MLOps Engineer

Job Post Information
Posted Date: 1/8/2026 10:44 AM
ID: 2026-2121
# of Openings: 1
Category: Engineering

Overview

We are looking for a seasoned Staff MLOps Engineer to lead the design, implementation, and scaling of enterprise-grade machine learning platforms on AWS. This role will focus on building reliable, secure, and cost-efficient MLOps systems that enable data scientists and engineers to deploy, monitor, and manage ML models in production. As a Staff Engineer, you will provide technical leadership, define best practices, and drive cross-team alignment on ML platform architecture.

Duties & Responsibilities

Key Responsibilities

MLOps Platform & Architecture

  • Architect and own scalable MLOps platforms on AWS supporting model training, deployment, monitoring, and governance. 
  • Design and maintain end-to-end ML CI/CD pipelines, including data validation, model training, testing, approval, and deployment. 
  • Establish standards for model lifecycle management, experiment tracking, versioning, reproducibility, and rollback.

Model Deployment & Monitoring

  • Enable real-time, batch, and asynchronous model inference using AWS-native and container-based solutions. 
  • Implement monitoring for model performance, data drift, concept drift, and operational metrics. 
  • Ensure high availability, fault tolerance, and observability for production ML systems.

AWS Cloud & Infrastructure

  • Lead design and implementation using AWS services, including but not limited to:
      • Amazon SageMaker (training, hosting, pipelines, feature store)
      • EKS, ECS, EC2, and Lambda for model serving and orchestration
      • S3, Glue, Athena, and Redshift for data storage and analytics
      • CloudWatch and X-Ray for logging, monitoring, and tracing
  • Implement Infrastructure as Code (IaC) using Terraform or AWS CloudFormation.
  • Optimize ML workloads for cost, performance, and scalability, including GPU/spot instance strategies.

DevOps, Security & Compliance

  • Build and maintain CI/CD pipelines using tools such as GitHub Actions, GitLab CI, Jenkins, or AWS CodePipeline. 
  • Enforce security best practices (IAM, VPC, encryption, secrets management). 
  • Support compliance, auditability, and governance requirements for ML systems.

Technical Leadership & Collaboration

  • Serve as a Staff-level technical leader, influencing MLOps architecture across multiple teams. 
  • Mentor engineers and data scientists on production ML best practices. 
  • Partner with Data Science, Data Engineering, Platform, and Product teams to align ML solutions with business goals. 
  • Contribute to the long-term ML platform roadmap and strategy.

Skills Required

  • 11–13 years of overall experience, with 5+ years in MLOps, ML Platform, or ML Infrastructure roles. 
  • Strong experience deploying and operating machine learning models in production on AWS. 
  • Proficiency in Python and experience with ML frameworks such as TensorFlow, PyTorch, or scikit-learn.
  • Deep hands-on experience with Docker and Kubernetes (EKS). 
  • Strong understanding of Amazon SageMaker and its ecosystem. 
  • Experience with CI/CD systems and Git-based workflows. 
  • Solid background in distributed systems, system design, and cloud architecture.

Preferred / Nice-to-Have Skills

  • Experience with SageMaker Feature Store, Pipelines, Model Registry, or MLflow. 
  • Exposure to LLMOps / GenAI on AWS (Bedrock, custom LLM deployment, vector databases such as OpenSearch or Pinecone).
  • Experience with streaming and real-time pipelines (Kafka, Kinesis, Spark). 
  • Experience in regulated or high-scale environments (finance, healthcare, retail, etc.). 
  • AWS certifications (Solutions Architect, Machine Learning Specialty) are a plus.

Soft Skills

  • Strong ownership and decision-making ability at a Staff level. 
  • Excellent communication skills across engineering, data science, and leadership teams. 
  • Ability to balance short-term delivery with long-term platform vision. 
  • Passion for building reliable, scalable, and maintainable ML systems.

Qualifications Required:

  • Bachelor’s degree (B.A.) from a four-year college or university, or an equivalent combination of education and experience.
  • 11–13 years of overall experience, with 5+ years in MLOps, ML Platform, or ML Infrastructure roles.

About symplr:

As a leader in healthcare operations solutions, we empower healthcare organizations to navigate the complexities of integrating critical business operations. Our customers are at the heart of everything we do, and they rely on our mission-critical systems to drive better operations and better outcomes.

We are a remote-first company with employees working across the United States, India, and the Netherlands. Guided by values, we focus on teamwork, championing our customers, being rooted in action and outcomes, overcoming challenges, and leading through equality and integrity. Read more about symplr's culture and values at symplr.com/careers.