I am looking at increasing the amount of CPU I have available to my CI jobs. We have some pretty heavy lifting in our CI, but it only utilises the machine 10% of the day.
It would be absolutely ideal if there was a way to automatically set up something like an Amazon Spot Instance to fire up and connect as a drone-agent, then when there are no active jobs, send off a message to tear down the spot instance.
Is there any support for this in drone 0.5? It seems such a useful thing to be able to do.
I do see this as outside the scope of the core Drone codebase at this time. I think this would be a great stand-alone utility that runs as a cron job, queries the build queue for pending jobs using the API, and determines if instances should be added or removed.
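As a rough illustration of the "determine if instances should be added or removed" step, here is a minimal sketch of the decision logic only. The function names, the `jobs_per_agent` capacity figure, and the `min_agents`/`max_agents` limits are assumptions made for this example, not part of Drone. Fetching the pending and running counts from the Drone API, and actually starting or stopping instances, are left to the caller.

```python
import math


def desired_agents(pending, running, jobs_per_agent=1, min_agents=0, max_agents=10):
    """Return how many agents the current queue calls for.

    Running jobs count towards the total, so the result never drops
    below what is needed to finish work already in progress
    (unless max_agents caps it).
    """
    total_jobs = pending + running
    needed = math.ceil(total_jobs / jobs_per_agent)
    return max(min_agents, min(max_agents, needed))


def scaling_action(current_agents, pending, running, **limits):
    """Return a signed delta: positive = add agents, negative = remove, 0 = no change."""
    return desired_agents(pending, running, **limits) - current_agents


if __name__ == "__main__":
    # Hypothetical numbers: 5 pending builds, 1 running, 1 agent up, 2 jobs per agent.
    print(scaling_action(1, pending=5, running=1, jobs_per_agent=2))
```

A cron job could call `scaling_action` once per run and pass the result to whatever provisioning tool is in use.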
I remember hearing that a few different teams created utilities for automatic scaling. I can’t remember which platforms they supported or whether they open sourced their work. I recommend sending a message in the gitter room to poll the community.
We are indirectly hoping to enable this sort of behavior by integrating with ECR, Kubernetes, and Swarm, which all have some sort of auto-scaling capabilities. I’m also interested in Hyper.sh, which provisions bare-metal Docker environments on demand with per-second billing.
Spot instances are of course very interesting as well. If we ever went that route and sponsored some sort of official project for auto-scaling, it would likely be a separate daemon that monitored the build queue (briefly mentioned in my prior post). So I would definitely say you (or anyone) can feel comfortable with that design approach, and if it really takes off, the option is always on the table to transfer the project to the drone organization.
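One possible shape for such a separate daemon is a polling loop that tears the instance down only after the queue has stayed empty for a grace period. The sketch below makes several assumptions: `fetch_active_jobs` and `tear_down` are hypothetical callables that the user would supply (for example, wrappers around the Drone API and a cloud provider's tooling), and the grace period is an addition intended to avoid starting and stopping instances between back-to-back builds.

```python
import time


class IdleTracker:
    """Tracks how long the build queue has been continuously empty."""

    def __init__(self, grace_seconds):
        self.grace_seconds = grace_seconds
        self.idle_since = None  # timestamp of the first empty poll, or None

    def should_tear_down(self, active_jobs, now):
        if active_jobs > 0:
            self.idle_since = None  # any activity resets the idle timer
            return False
        if self.idle_since is None:
            self.idle_since = now
        return now - self.idle_since >= self.grace_seconds


def run_daemon(fetch_active_jobs, tear_down, tracker, poll_seconds=30,
               clock=time.monotonic, sleep=time.sleep, max_polls=None):
    """Poll the queue; call tear_down() once it has been idle long enough.

    fetch_active_jobs and tear_down are hypothetical user-supplied callables.
    Returns True if tear_down was called, False if max_polls ran out first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        if tracker.should_tear_down(fetch_active_jobs(), clock()):
            tear_down()
            return True
        sleep(poll_seconds)
        polls += 1
    return False
```

Passing `clock` and `sleep` as parameters keeps the loop easy to test without waiting in real time.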
This is very interesting, I’m very tempted to write such a simple daemon.
I’m going to manually create and spawn EC2 instances tonight to see if they are worth supporting for our build (i.e. stable, performant).
Can I temporarily disable agents that are registered? We let volunteers submit CPU time and I don’t own those boxes.
Does Drone have a push API (callbacks or, ideally, WebSockets)? Is there a way to get information about average build times for the queued builds so that I could anticipate requirements?
I tried Google Cloud, it was useless - couldn’t even allocate me a cluster. It’s obviously over-subscribed or Google don’t care enough to make it work.
Then I tried Amazon ECS, but I am totally lost. I seem to have started a Cluster, which has an Instance, which has a Task, which has my Container in it… and the Container is the drone agent, with all the envvars it needs. But there is no output, and it doesn’t seem to do anything. I have no idea what to do next, I think I need somebody with Amazon ECS experience to hold my hand.
Why don’t you use a Kubernetes Cluster with automatic scaling?
I have used it in the Google Cloud with Concourse CI before, but now switched to Drone. The setup is not finished yet, but I think it should be very easy.