Job Specification: AI Operations Lead (Role: AIOps Lead)
We are seeking an experienced and highly skilled AI Operations (AIOps) Lead to drive agentic automation and AIOps implementation, as well as the operationalization, governance, monitoring, and continuous improvement of enterprise AI solutions. This role requires a proven specialist capable of establishing scalable AI operating models while providing hands-on leadership to ensure AI systems deliver reliable, secure, and measurable business outcomes.
Key Responsibilities & Requirements:
- Provide expert leadership for the operational management and continuous improvement of AI, Machine Learning, and Generative AI solutions across the organization.
- Drive agentic automation and AIOps implementation by providing oversight, resolving blockers, and ensuring smooth execution.
- Design and implement solutions, conduct code reviews, and lead the team using Google ADK (Agent Development Kit), Google Gemini 2.0 Flash or Gemini 1.5 LLMs, and LangGraph.
- Drive the team to formalize engineering and integration approaches, covering enterprise changes, impact assessments, and documentation standards.
- Establish feasibility assessments and a checklist-based transition and adoption approach, with automated verification where possible.
- Formalize and package adoption standards to support federated adoption of AI and agentic interventions.
- Run training, and manage support incidents and escalations associated with adoptions and integrations.
- For first-time cases, establish and package all materials associated with adoption.
- Develop and implement enterprise-wide AIOps frameworks, operating models, standards, and best practices to ensure scalable and sustainable AI adoption.
- Act as a hands-on contributor, working directly with program leadership, AI architects, data scientists, and engineering teams to support AI initiatives throughout their lifecycle.
- Establish monitoring, observability, and performance management capabilities for AI models, services, and AI-powered applications.
- Define and manage processes for model deployment, versioning, validation, retraining, and lifecycle management.
- Ensure AI solutions meet operational requirements related to reliability, scalability, security, compliance, and business continuity.
- Develop and track key performance indicators (KPIs) and service metrics related to AI adoption, model performance, operational efficiency, and business value realization.
- Lead incident management, root-cause analysis, and remediation efforts for AI-related production issues.
- Collaborate with data engineering, platform, security, and infrastructure teams to optimize AI platform operations and service delivery.
- Drive the implementation of MLOps and LLMOps practices to support efficient deployment, monitoring, and governance of AI solutions.
- Establish governance processes to ensure compliance with Responsible AI principles, organizational policies, and regulatory requirements.
- Identify and mitigate operational, technical, security, and governance risks associated with AI deployments.
- Support the development and execution of change management and adoption strategies to maximize the value of AI investments.
- Translate operational insights and performance data into actionable recommendations for improving AI effectiveness and business outcomes.
- Demonstrate strong stakeholder management and communication skills, particularly when engaging with senior leadership and cross-functional teams.
- Operate effectively within a fast-paced, dynamic environment, delivering measurable outcomes and driving continuous operational excellence.
Preferred Qualifications:
- Extensive experience in AI/ML operations, platform engineering, MLOps, DevOps, or enterprise technology operations.
- Strong understanding of Machine Learning, Generative AI, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and AI platform ecosystems.
- Experience implementing and managing MLOps, LLMOps, model governance, and AI monitoring frameworks in enterprise environments.
- Proven expertise with cloud-based AI and data platforms, automation tools, monitoring solutions, and CI/CD pipelines.
- Strong analytical and problem-solving skills with the ability to translate operational data into strategic improvements.
- Demonstrated experience leading large-scale AI transformation or operational excellence initiatives.
- Excellent communication, stakeholder engagement, and leadership capabilities.
This role is ideal for a specialist who can bridge AI strategy and day-to-day operations, ensuring that enterprise AI solutions remain reliable, governed, scalable, and aligned with business objectives.
