AI Platform Operations Engineer (Ref 26288) - #1129196
Jobline Resources Pte Ltd
Responsibilities
• Perform availability monitoring, outage detection, and performance optimization of the Azure AI cloud platform
• Support incident response and root cause analysis, and implement disaster recovery strategies to ensure business continuity
• Support security audits and compliance reporting, and ensure alignment with Singtel policies, regulatory frameworks, and industry best practices
• Collaborate with other developer teams to integrate monitoring, automation, and security best practices into AI/ML workflows
• Drive continuous improvement in platform operations through automation, observability, and operational excellence initiatives
Requirements
• Bachelor’s degree in Computer Science, Engineering, or a related field
• 1-2 years of experience in cloud administration and/or operations
• Expertise in Azure operations and monitoring services, including Azure Monitor, Log Analytics, and Application Insights
• Proficiency in infrastructure-as-code (Terraform, Bicep, ARM) and automation scripting (PowerShell, Python)
• Familiarity with AI/ML infrastructure (AKS, GPU VMs, data pipelines, model hosting) and their operational demands
• Excellent problem-solving, communication, and leadership skills, especially in high-pressure incident scenarios
• Forward-thinking ability to anticipate possible failure scenarios and formulate effective response plans
Licence no: 12C6060