ML Ops Support Engineer

Reading, PA
Senior level
Information Technology • Consulting
The Role
Provide 24/7 support for ML pipelines and data processing jobs, troubleshoot issues, manage Dataiku workflows, and guarantee uptime for ML environments.

Our Company

We’re Hitachi Digital Services, a global digital solutions and transformation business with a bold vision of our world’s potential. We’re people-centric and here to power good. Every day, we future-proof urban spaces, conserve natural resources, protect rainforests, and save lives. This is a world where innovation, technology, and deep expertise come together to take our company and customers from what’s now to what’s next. We make it happen through the power of acceleration.

Imagine the sheer breadth of talent it takes to bring a better tomorrow closer to today. We don’t expect you to ‘fit’ every requirement – your life experience, character, perspective, and passion for achieving great things in the world are equally important to us.

The team

We are looking for an MLOps L2 Support Engineer to provide 24/7 production support for machine learning (ML) and data pipelines. The role requires on-call support, including weekends, to ensure high availability and reliability of ML workflows. The candidate will work with Dataiku, AWS, CI/CD pipelines, and containerized deployments to maintain and troubleshoot ML models in production.

The role

Key Responsibilities:

Incident Management & Support:

  • Provide L2 support for MLOps production environments, ensuring uptime and reliability.
  • Troubleshoot ML pipelines, data processing jobs, and API issues.
  • Monitor logs, alerts, and performance metrics using Dataiku, Prometheus, Grafana, or AWS tools such as CloudWatch.
  • Perform root cause analysis (RCA) and resolve incidents within SLAs.
  • Escalate unresolved issues to L3 engineering teams when needed.

Dataiku Platform Management:

  • Manage Dataiku DSS workflows, troubleshoot job failures, and optimize performance.
  • Monitor and support Dataiku plugins, APIs, and automation scenarios.
  • Collaborate with Data Scientists and Data Engineers to debug ML model deployments.
  • Perform version control and CI/CD integration for Dataiku projects.

Deployment & Automation:

  • Support CI/CD pipelines for ML model deployment (Bamboo, Bitbucket, etc.).
  • Deploy ML models and data pipelines using Docker, Kubernetes, or Dataiku Flow.
  • Automate monitoring and alerting for ML model drift, data quality, and performance.

Cloud & Infrastructure Support:

  • Monitor AWS-based ML workloads (SageMaker, Lambda, ECS, S3, RDS).
  • Manage storage and compute resources for ML workflows.
  • Support database connections, data ingestion, and ETL pipelines (SQL, Spark, Kafka).

Security & Compliance:

  • Ensure secure access control for ML models and data pipelines.
  • Support audit, compliance, and governance for Dataiku and MLOps workflows.
  • Respond to security incidents related to ML models and data access.

What you’ll bring

Experience: 5+ years in MLOps, Data Engineering, or Production Support.
Dataiku DSS: Strong experience in Dataiku workflows, scenarios, plugins, and APIs.
Cloud Platforms: Hands-on experience with AWS ML services (SageMaker, Lambda, S3, RDS, ECS, IAM).
CI/CD & Automation: Familiarity with GitHub Actions, Jenkins, or Terraform.
Scripting & Debugging: Proficiency in Python, Bash, and SQL for automation & debugging.
Monitoring & Logging: Experience with Prometheus, Grafana, CloudWatch, or ELK Stack.
Incident Response: Ability to handle on-call support, weekend shifts, and SLA-based issue resolution.

Preferred Qualifications:

Containerization: Experience with Docker, Kubernetes, or OpenShift.
ML Model Deployment: Familiarity with TensorFlow Serving, MLflow, or Dataiku Model API.
Data Engineering: Experience with Spark, Databricks, Kafka, or Snowflake.
ITIL/DevOps Certifications: ITIL Foundation, AWS ML certifications, or Dataiku certification.

Work Schedule & On-Call Requirements:

  • Rotational on-call support (including weekends and nights).
  • Shift-based monitoring for ML workflows and Dataiku jobs.
  • Flexible work schedule to handle production incidents and critical ML model failures.

About us

We’re a global team of innovators. Together, we harness engineering excellence and passion to co-create meaningful solutions to complex challenges. We turn organizations into data-driven leaders that can make a positive impact on their industries and society. If you believe that innovation can bring a better tomorrow closer to today, this is the place for you.


Championing diversity, equity, and inclusion

Diversity, equity, and inclusion (DEI) are integral to our culture and identity. Diverse thinking, a commitment to allyship, and a culture of empowerment help us achieve powerful results. We want you to be you, with all the ideas, lived experience, and fresh perspective that brings. We support your uniqueness and encourage people from all backgrounds to apply and realize their full potential as part of our team.

How we look after you

We help take care of your today and tomorrow with industry-leading benefits, support, and services that look after your holistic health and wellbeing. We’re also champions of life balance and offer flexible arrangements that work for you (role and location dependent). We’re always looking for new ways of working that bring out our best, which leads to unexpected ideas. So here, you’ll experience a sense of belonging, and discover autonomy, freedom, and ownership as you work alongside talented people you enjoy sharing knowledge with.

We’re proud to say we’re an equal opportunity employer and welcome all applicants for employment without attention to race, colour, religion, sex, sexual orientation, gender identity, national origin, veteran status, age, disability status or any other protected characteristic. Should you need reasonable accommodations during the recruitment process, please let us know so that we can do our best to set you up for success.


Top Skills

AWS
Bash
CI/CD
Databricks
Dataiku
Docker
ECS
ELK Stack
GitHub Actions
Grafana
Jenkins
Kafka
Kubernetes
Lambda
Prometheus
Python
RDS
S3
SageMaker
Snowflake
Spark
SQL
Terraform

The Company
Hyderabad
1,644 Employees
On-site Workplace

What We Do

Hitachi Digital Services is an independent services business that focuses on delivering a unified operating model for cloud, data, IoT and managed services.

Playing a pivotal role in Hitachi's digital transformation strategy, Hitachi Digital Services places a strong emphasis on Generative AI to deliver an integrated end-to-end digital transformation for enterprises. The company is strategically positioned within the Hitachi Digital portfolio of companies to leverage the synergies between operational technology (OT), information technology (IT), and product and service offerings.

Such positioning allows Hitachi Digital Services to work closely with Hitachi Digital, the new Hitachi Vantara and Hitachi group businesses, including GlobalLogic, to create an integrated end-to-end digital transformation solution for enterprises.
