Senior MLOps Engineer - #1120251
CDG Zig

We are seeking a skilled Senior MLOps Engineer with a strong DevOps foundation to build and manage the infrastructure that powers our entire machine learning ecosystem. You will be responsible for automating the ML lifecycle, ensuring our models are deployed, monitored, and scaled with maximum reliability and efficiency. Your work will be critical in enabling our data scientists and ML engineers to innovate faster by providing a robust, scalable, and automated platform.
Job Responsibilities
Design, build, and maintain robust, scalable CI/CD pipelines specifically for machine learning, automating data validation, model training, deployment, and testing.
Take full ownership of the ML infrastructure, managing and provisioning cloud resources (AWS, GCP, or Azure) using Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
Develop and manage our containerization and orchestration strategy for ML services using Docker and Kubernetes (or platforms like Kubeflow).
Implement comprehensive monitoring solutions to track model performance, data and concept drift, and system health, with automated alerting and response mechanisms.
Establish and manage the central model registry and feature store, enforcing best practices for model versioning, lineage, and governance.
Automate and optimize ML workflows by integrating disparate systems, APIs, and tooling to ensure seamless operations from development to production.
Collaborate with Data Science, ML Engineering, and SRE teams to define and evangelize MLOps best practices across the organization.
Job Requirements
Bachelor’s degree in Computer Science, Computer Engineering, Software Engineering, or a related technical field.
Minimum of 3 to 5 years of experience in MLOps, DevOps, or a related field.
Strong command of DevOps practices and extensive experience building and managing CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI).
Expertise in a major cloud platform (AWS, GCP, or Azure), including its native data, AI/ML, compute, and storage solutions.
Proven, hands-on experience with Infrastructure as Code (IaC) tools, particularly Terraform or CloudFormation.
Practical experience with containerization (Docker) and orchestration technologies (Kubernetes) for deploying and scaling applications.
Strong scripting proficiency (e.g., Python, Bash) for automation and building tooling.
Direct experience with MLOps-specific tools such as MLflow, Kubeflow, DVC, Seldon Core, or cloud-native equivalents (e.g., Amazon SageMaker, Vertex AI).
Familiarity with the machine learning lifecycle and its unique challenges (e.g., experiment tracking, data versioning, model monitoring).
Experience supporting ML systems in the ride-hailing industry, particularly around dynamic pricing, would be a strong plus.
Proficiency with workflow orchestration tools like Apache Airflow is highly desirable.
Experience with monitoring and logging tools like Guance, Prometheus, Grafana, or the ELK stack.
A proactive and analytical approach to problem-solving, with a systems-thinking mindset.
Strong ownership mentality with the ability to manage critical infrastructure and platforms independently.
Excellent communication skills to collaborate effectively with cross-functional technical teams.