Hey Brad, thanks for the feedback!
I pre-warmed a sandbox k8s cluster with enough capacity to support my test case, which is a .drone.yml with 50 parallel pipelines that each just run a basic image build.
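In case it helps anyone reproduce this, here is a rough sketch of how a test file like that could be generated. It relies on Drone running pipelines in the same .drone.yml in parallel when they declare no depends_on. The function name, pipeline names, and the placeholder step are all hypothetical; the real test used an image build step that is not shown here.

```python
# Hypothetical generator for a multi-pipeline .drone.yml test file.
# The step below is a placeholder, NOT the actual image build from the test.
PIPELINE_TEMPLATE = """\
kind: pipeline
type: kubernetes
name: build-{index}

steps:
- name: build
  image: alpine
  commands:
  - echo "placeholder for image build {index}"
"""


def render_drone_yml(count: int = 50) -> str:
    """Return `count` independent pipelines as one multi-document YAML string."""
    docs = [PIPELINE_TEMPLATE.format(index=i) for i in range(1, count + 1)]
    # Each pipeline is its own YAML document, introduced by a '---' line.
    return "---\n" + "---\n".join(docs)


if __name__ == "__main__":
    # Print to stdout; redirect to .drone.yml to use it.
    print(render_drone_yml(), end="")
```

Running it as a script and redirecting stdout to .drone.yml gives 50 pipelines with no dependencies between them.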
When running with a single kube-runner replica, start times balloon pretty quickly to 10x+ their normal values, and build containers appear to be scheduled more or less sequentially.
Great, I did not know this was supported! I just tested this with a few iterations using the test method described above. When running with 3 kube-runner replicas, start times seemed to decrease by ~1/3, and with 20 kube-runner replicas, start times seem to return to their normal expected values.
If scaling the kube-runner requires no special consideration, then we will probably just look at throwing more replicas at it for now. I suppose we could also look into setting up an HPA with custom metrics to scale using the drone_running_jobs metric. Do you think that would be the appropriate metric to determine if the kube-runner needs more capacity?
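
To make the HPA idea a bit more concrete, here is a minimal sketch of the arithmetic I have in mind, assuming drone_running_jobs is exposed to the HPA as a cluster-wide total with an average-value target per replica. The function and parameter names are hypothetical, and the target of 5 jobs per replica is an illustrative number, not a measured one. It follows the documented HPA rule of scaling to ceil(currentReplicas * currentValue / targetValue), but ignores the HPA's tolerance band and stabilization window.

```python
import math


def desired_replicas(running_jobs: int,
                     target_jobs_per_replica: int,
                     min_replicas: int = 1,
                     max_replicas: int = 20) -> int:
    """Approximate the replica count an HPA would pick for a total metric.

    With an average-value target, the HPA formula reduces to
    ceil(total_metric / target_per_replica), clamped to [min, max].
    All names and defaults here are hypothetical.
    """
    if target_jobs_per_replica <= 0:
        raise ValueError("target_jobs_per_replica must be positive")
    wanted = math.ceil(running_jobs / target_jobs_per_replica)
    # Clamp to the configured bounds, as the HPA does.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    for jobs in (0, 12, 50, 500):
        print(jobs, desired_replicas(jobs, target_jobs_per_replica=5))
```

With those assumed numbers, the 50-job test case would ask for 10 replicas, and anything above 100 running jobs would be capped at 20.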