What Should I Focus on as I Take Over MLOps for Azure AI/ML?

Asked By CuriousCoder42 On

I'm taking over the Terraform repository for our Azure AI/ML projects after a teammate left; the team member who trained under him didn't pick up much. The development side will start training their own models next month, and my manager advised me to prepare by studying on my own. Currently, the Terraform repo is used mostly to deploy models and set up endpoints. I'll be responsible for building the infrastructure developers need to train their own models and for ensuring high availability.

I'm confused about the role: should I consider it MLOps, platform operations, or something else? Would the Azure AI Engineer certification be worthwhile? I'm eager but also a bit overwhelmed, so I'm looking for recommended resources or insight into what this job entails (infrastructure, CI/CD pipelines, and so on). I plan to ask my company for Pluralsight access and I already have KodeKloud, but I haven't explored the material there yet. Any advice would be greatly appreciated!

4 Answers

Answered By DevOpsDude99 On

It sounds like you’re stepping into more of an **MLOps or ML platform engineer** role. You're not training models yourself; you're managing the infrastructure that lets developers do so. Using Terraform with Azure ML is a classic MLOps setup. The **Azure AI Engineer** certification is good, but it focuses more on application-level AI. For your role, consider **DP-100** (Azure Data Scientist, which covers Azure ML) or **AZ-400** (DevOps Engineer) instead. Focus on Azure ML workspaces, compute, storage, and networking, and on the Terraform resources and modules for them. Dive into CI/CD pipelines too! Check out Microsoft's "MLOps on Azure" documentation and the **mlops-v2** GitHub repo for practical examples. Good luck!

Answered By ML_Explorer_77 On

You're basically becoming the infrastructure expert for ML workloads. I've been in similar situations where I had to get up to speed quickly on things I wasn't familiar with. The good news is that managing ML infrastructure is somewhat similar to regular infrastructure: you'll set up compute clusters and storage for training jobs. Definitely consider this role part of MLOps. The AI Engineer cert can give you insight into the services, but focus first on Azure Machine Learning workspaces and compute clusters. The Terraform docs for the Azure provider should help too. Remember that managing GPU availability and planning checkpointing for long-running training jobs are key!
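To make the checkpointing point concrete, here is a minimal sketch of the save-and-resume pattern in plain Python. It is not Azure-specific and not taken from any team's actual training code: the function names (`save_checkpoint`, `load_checkpoint`, `train`), the JSON state layout, and the `stop_after` parameter used to simulate an interrupted job are all hypothetical. The idea is that a job which gets interrupted partway through can pick up from the last saved epoch instead of starting over.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically: temp file in the same directory, then rename."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        # os.replace swaps the file in one step, so a reader never
        # sees a half-written checkpoint.
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"epoch": 0, "loss": None}
    with open(path) as f:
        return json.load(f)


def train(path, total_epochs, stop_after=None):
    """Run (or resume) training; stop_after simulates an interruption."""
    state = load_checkpoint(path)
    for epoch in range(state["epoch"], total_epochs):
        if stop_after is not None and epoch >= stop_after:
            break  # simulated interruption (hypothetical)
        loss = 1.0 / (epoch + 1)  # placeholder for a real training step
        state = {"epoch": epoch + 1, "loss": loss}
        save_checkpoint(path, state)
    return state
```

In a real setup the checkpoint path would presumably point at durable storage that outlives the compute node (for example a mounted output directory), and the state would hold model and optimizer weights rather than a single number. From the infrastructure side, the main thing to provide is that durable, writable location.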

Answered By CloudNinja88 On

Just a heads up on Terraform and Azure AI Foundry: there's a learning curve with both. You might need to look beyond plain Terraform for your needs, or use the AzAPI provider. I had planned to create deployment modules but opted to wait until the tools stabilize. Getting a grasp on classic versus new deployment methods will be crucial, given that Foundry lets developers take on more responsibility. Understanding their expectations and project patterns will help prevent conflicts over roles as you move forward.

Answered By TerraformGuru11 On

I’d suggest focusing on the tools and tasks rather than stressing over exact titles. MLOps does sound like a fit for what you'll be doing, but the same title can mean different tasks at different companies. Get familiar with the Azure products relevant to your work and dive into Terraform as well. Start by understanding what Azure offers and how it integrates with ML workflows.
