I'm setting up an enterprise-grade Azure DevOps environment with self-hosted agents running on Virtual Machine Scale Sets (VMSS), and I've hit a snag with scale-out latency. It currently takes around 10 to 20 minutes to provision a new VM and bootstrap the agent, so queued pipeline jobs sit waiting whenever no agent is free. Our aim is to cut this wait so that queued jobs start as quickly as possible.
For some context, our self-hosted agents are registered in Azure DevOps agent pools backed by VMSS for elasticity. Reliability and cost matter, but our priority is responsiveness. I'm looking for best practices or architectural recommendations to reduce this scale-out delay. Some ideas I'm considering:
- Maintaining a minimum number of warm or idle agents
- Using pre-baked VM images with agents ready to go
- Exploring alternative scaling strategies like queue-based or hybrid pools
- Assessing whether VMSS is the best fit for our needs
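To put a rough number on the first idea, one way to size the warm pool is to ask how many jobs typically arrive while a new VM is still provisioning. The sketch below assumes Poisson job arrivals, one job per agent, and that standby agents only need to cover the provisioning window; `standby_agents_needed` and its parameters are hypothetical names, and the arrival rate is an input you would take from your own queue history.

```python
import math


def standby_agents_needed(jobs_per_hour: float, provision_minutes: float,
                          coverage: float = 0.95) -> int:
    """Smallest standby count n such that, assuming Poisson arrivals, at least
    `coverage` of provisioning windows see n or fewer new jobs."""
    if not 0 < coverage < 1:
        raise ValueError("coverage must be strictly between 0 and 1")
    lam = jobs_per_hour * provision_minutes / 60.0  # expected arrivals per window
    n = 0
    term = math.exp(-lam)  # P(N = 0)
    cumulative = term      # P(N <= n)
    while cumulative < coverage:
        n += 1
        term *= lam / n    # P(N = n) from P(N = n - 1)
        cumulative += term
    return n
```

For example, at 12 jobs per hour a 15-minute provisioning window calls for 6 standby agents at 95% coverage, while cutting provisioning to 5 minutes drops that to 3, so faster images and warm agents reinforce each other.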
I'd love to hear how others achieve fast job pickup with Azure DevOps self-hosted agents, especially at scale. Any real-world insights or lessons learned would be greatly appreciated. Thanks!
5 Answers
Definitely check out Managed DevOps Pools! In my experience, the average startup time is around 5 minutes, and you can easily configure standby agents for business hours only. You pay only for the time the VMs are running, and the agent is installed automatically, which saves setup time. You can use the same images as Microsoft-hosted agents or bring your own custom images.
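Managed DevOps Pools handles standby scheduling itself, so no code is needed there. The sketch below only illustrates the decision a business-hours schedule encodes, in case you want to reproduce it for a VMSS pool (for example, from a scheduled job that updates the pool's "Number of agents to keep on standby" setting). The schedule format, `standby_count`, and the counts are hypothetical, and times are treated as naive local time.

```python
from datetime import datetime, time

# Hypothetical schedule entries: (weekdays, start, end, standby agents).
# weekday() numbering: Monday = 0 ... Sunday = 6.
BUSINESS_HOURS = [
    (frozenset(range(0, 5)), time(8, 0), time(18, 0), 4),  # Mon-Fri 08:00-18:00
]


def standby_count(now: datetime, schedule=BUSINESS_HOURS, default: int = 0) -> int:
    """Return the standby agent count for `now`; the window end is exclusive."""
    for days, start, end, count in schedule:
        if now.weekday() in days and start <= now.time() < end:
            return count
    return default  # outside every window: scale standby down to the default
```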
One effective strategy is to build a golden image with all the necessary tools and dependencies pre-installed, which can cut startup time significantly. Keeping some warm (standby) instances also lets jobs start immediately, but once those are in use you may still wait a few minutes for additional instances to spin up. We usually accept around 10 minutes of startup time, mainly because cost efficiency is our top priority.
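One way to keep a golden image honest is to fail the image build if anything a pipeline needs is missing, so nothing is left to install at boot. A minimal sketch, assuming the check runs as the last provisioner step of the image build (for example, a Packer shell provisioner); `REQUIRED_TOOLS`, `missing_tools`, and `check_image` are hypothetical, and the tool list is only an example.

```python
import shutil
import sys

# Hypothetical example list; replace with the tools your pipelines actually use.
REQUIRED_TOOLS = ["git", "docker", "az", "dotnet", "node"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


def check_image(tools=REQUIRED_TOOLS, which=shutil.which) -> int:
    """Print a report and return a process exit code (0 = image is complete)."""
    missing = missing_tools(tools, which)
    if missing:
        print("Golden image is missing: " + ", ".join(missing), file=sys.stderr)
        return 1
    print("All required tools are present.")
    return 0

# In the image build, finish with: sys.exit(check_image())
```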
VMSS does have its advantages, but I recommend considering Managed DevOps Pools. Once you enable warm (standby) instances, you'll see shorter wait times, and we found that switching to Linux images reduces startup time further. For instance, a new agent was ready in about 2 minutes with a Windows image and even faster with Linux. It could be a good option to explore!
It might be worth reviewing your current setup. We also use VMSS, but our wait time is only about 3 to 5 minutes with a custom Ubuntu image and around 5 to 7 minutes for Windows agents. We rebuild our golden images monthly with a custom Packer script. If your bootstrap takes 10 to 20 minutes, it sounds like additional steps or scripts running after the VM starts are causing the delay.
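If your bootstrap is much slower than the 3 to 7 minutes reported here, timing each post-boot step is a quick way to find the culprit. A minimal sketch, assuming your bootstrap can be expressed as a list of commands; `time_steps`, `slowest`, and the step names you pass in are hypothetical, and on Windows the commands would be PowerShell invocations instead.

```python
import subprocess
import time


def time_steps(steps):
    """Run each (name, argv) step in order; return [(name, seconds, returncode)]."""
    results = []
    for name, argv in steps:
        start = time.perf_counter()
        proc = subprocess.run(argv, capture_output=True, text=True)
        elapsed = time.perf_counter() - start
        results.append((name, elapsed, proc.returncode))
    return results


def slowest(results, top=3):
    """Return the `top` slowest steps, longest first."""
    return sorted(results, key=lambda r: r[1], reverse=True)[:top]
```

Any step that installs or downloads software at boot is a candidate to move into the golden image instead.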
We’ve opted for Azure Container Instances (ACI). The pipeline creates a container instance running a self-hosted agent and hands jobs off to it. The time from creation to agent registration is about 1 minute, and once the jobs are finished, the instance is destroyed. This could be a fast alternative for your setup!
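The ephemeral-container pattern can be sketched as an entrypoint that registers the agent unattended, runs one job, and unregisters. This assumes the container image already contains the Azure Pipelines agent files (`config.sh` and `run.sh`) and receives the organization URL, a PAT, and the pool name from the pipeline that created it. The function names are hypothetical; `--unattended`, `--url`, `--auth`, `--token`, `--pool`, `--agent`, `--replace`, `config.sh remove`, and `run.sh --once` are agent options used in Microsoft's Docker agent guidance. Creating and deleting the ACI instance itself is left to the calling pipeline.

```python
import subprocess


def config_args(url: str, token: str, pool: str, agent_name: str) -> list[str]:
    """Arguments for an unattended, PAT-authenticated agent registration."""
    return [
        "./config.sh", "--unattended",
        "--url", url,
        "--auth", "PAT", "--token", token,
        "--pool", pool,
        "--agent", agent_name,
        "--replace",  # take over the name if a stale registration exists
    ]


def remove_args(token: str) -> list[str]:
    """Arguments to unregister this agent from its pool."""
    return ["./config.sh", "remove", "--unattended", "--auth", "PAT", "--token", token]


def run_single_job(url, token, pool, agent_name, runner=subprocess.run):
    """Register, run exactly one job, then unregister (even if the job fails)."""
    runner(config_args(url, token, pool, agent_name), check=True)
    try:
        # --once: the agent exits after one job, so the container can be deleted.
        runner(["./run.sh", "--once"], check=True)
    finally:
        runner(remove_args(token), check=True)
```

Using `--once` means one container per job; drop it if a single container should serve several jobs before it is destroyed.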