AIOps Engineer / DevOps - #1114382

Sri Trang Agro-Industry


Date: 20 hours ago
District: Singapore
Contract type: Full time
Work schedule: Full day

Position: AIOps Engineer
Location: Central Singapore

Department: IT Operations / Infrastructure & Cloud
Reports to: Head of IT Operations / IT Infrastructure Manager

Job Overview:

We are seeking a hands-on, visionary, and technically deep AIOps Engineer with a cloud-native-first mindset to play a leading role in developing our Solutions Platform, a cutting-edge Dev/Data/ML/AI/LLM-Ops platform for scalable AI/ML/Agentic innovation across Sri Trang. This role is designed for someone who thrives in complex environments, enjoys problem-solving at scale, and can architect resilient, high-performance, and automated infrastructure on multi-cloud platforms.

Join us to build scalable, intelligent, and automated infrastructures that power AI, ML, and Agentic applications at Sri Trang Group. You’ll be driving CI/CD pipelines, cloud-native deployments, and AI-enhanced solutions, ensuring our systems are not only reliable but also smart enough to heal themselves.

This is not just a role — it's a mission to redefine how AI is built, tested, deployed, and monitored at scale in our organization. If you are up to the challenge, we will be happy to get in touch with you!

Key Responsibilities:

  • DevOps – The Foundation of Your Role:

  • Develop and implement a comprehensive DevOps strategy that aligns with Sri Trang Group’s business objectives and AI transformation goals.

  • Architect and optimize CI/CD pipelines to support high-frequency deployments.

  • Build and maintain cloud-native infrastructures (preferably Azure) using Infrastructure as Code (ARM, Terraform).

  • Automate as much as possible! From deployments to monitoring, ensuring zero-touch operations whenever possible.

  • Drive observability and monitoring using cutting-edge tools like Azure Monitor, Grafana, Prometheus, and Datadog.

  • Manage CPU/GPU computing resources and workloads for seamless scalability.

  • Data Operations – Because without data we can’t develop AI:

  • Collaborate with Data Engineering and Infrastructure teams to ensure the availability, quality, and timeliness of data for model training, fine-tuning, and serving.

  • Automate workflows supporting large-scale data preparation for AI/ML/Agentic applications.

  • Integrate version control systems and CI/CD tools (Azure DevOps preferably) to streamline the deployment of scalable data pipelines.

  • Work extensively with cloud vendors (AWS, Azure, Google Cloud Platform, etc.) to scale data infrastructure leveraging cloud-native architectures like serverless computing and distributed data systems.

  • Collaborate with data engineers, data scientists, and analysts to continuously refine deployment processes.

  • Machine Learning (ML), DevOps, and Data Engineering – Where Dev Meets AI:

  • Collaborate with Data Scientists to deploy, monitor, and scale AI/ML models in production using MLflow, TensorFlow Serving, TorchServe, NVIDIA Triton, etc.

  • Collaborate with Data Scientists to automate model versioning, drift detection, and retraining for optimal performance.

  • Collaborate with Data Scientists to design ML pipelines with AzureML, Airflow, or Kubeflow for efficient data and model workflows.

  • Ensure cost-efficient inference through model optimization and resource scaling on CPU/GPU instances.

  • Large Language Model Operations – Keeping up with What’s Coming:

  • Collaborate with Data Scientists to optimize deployment and fine-tuning of LLMs like DeepSeek, BERT, and Llama.

  • Collaborate with Data Scientists to work with vector databases to enhance real-time inference and implement Agentic AI.

  • Help Data Scientists to enable scalable AI applications through prompt engineering and model optimization.

  • Artificial Intelligence for IT Operations – Make the Infrastructure Smarter:

  • In collaboration with Data Scientists, Data Engineers, and Infrastructure teams, implement AI-powered monitoring and anomaly detection to predict failures before they happen.

  • Use AI-driven automation for root cause analysis and self-healing infrastructure.

  • Enhance operational efficiency with intelligent incident response mechanisms.

  • Subject Matter Expertise: Be the go-to expert on Dev/Data/ML/AI/LLM-Ops engineering best practices, spearheading state-of-the-art implementation in our team.

  • Documentation: Develop comprehensive documentation for Dev/Data/ML/AI/LLM-Ops processes and systems. Provide training and support to team members and stakeholders on tools and best practices.

Required Qualifications:

  • Education: Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field. A Master’s degree is preferred but not required.

  • Experience:

  • 3-5 years (1-3 with a PhD) of experience in DevOps (Development and Operations), DataOps (Data Operations), MLOps (Machine Learning Operations), AIOps (Artificial Intelligence for IT Operations), or LLMOps (Large Language Model Operations), coupled with expertise in SRE and Cloud Engineering.

  • Strong coding skills in Python, Bash, and PowerShell for automation and scripting.

  • Technical Skills:

  • Deep expertise in CI/CD and multi-cloud platforms (AWS, GCP, and preferably Azure).

  • Hands-on experience deploying and managing ML models in production environments.

  • Detail-Oriented:

  • Passionate about automation, AI-driven infrastructure, and making systems smarter at the highest standard possible.

How to stand out from the rest:

  • Certification in Azure (e.g., Azure AI Engineer Associate or Azure DevOps Engineer Expert).

  • Familiarity with feature stores and model registries.

  • Experience with data versioning tools like DVC.

  • MLOps Pipelines Development:

  • Experience with on-premises and edge deployment is a big plus.

  • Familiarity with AIOps and LLMOps concepts, tools, and strategies.

  • Technical Skills: knowledge of tools and technologies such as Docker, Kubernetes, SQL, Spark, Hadoop, Kafka, ONNX, and ETL processes is a big plus.

  • Continuous Integration and Deployment: experience with A/B testing and model validation in production environments is highly desirable.

