[Solved] [Autoscaler] Pending builds not being run despite available runners

LudvigHz · January 11, 2022, 9:37pm

I just enabled the autoscaler using Hetzner Cloud. Creating and destroying servers works great, but here is the situation:

We have two on-prem servers dedicated to Drone: one runs the Drone server plus a runner, and the other runs only a Docker runner. The servers are quite old and can't handle too many pipelines, so we want to offload heavy load to Hetzner Cloud VMs when needed.

Here is the relevant part of the autoscaler configuration, if needed:

  autoscaler:
    container_name: autoscaler
    image: drone/autoscaler
    restart: always
    ports:
      - 8080:8080
    environment:
      - DRONE_POOL_MIN=0
      - DRONE_POOL_MAX=4
      - DRONE_SERVER_PROTO=https
      - DRONE_SERVER_HOST=ci.webkom.dev
      - DRONE_SERVER_TOKEN=$DRONE_USER_TOKEN
      - DRONE_AGENT_TOKEN=$DRONE_RPC_SECRET
      - DRONE_HETZNERCLOUD_TOKEN=$HETZNER_TOKEN
      - DRONE_HETZNERCLOUD_TYPE=cpx21
      - DRONE_HETZNERCLOUD_SSHKEY=0
      - DRONE_SLACK_WEBHOOK=$SLACK_AUTOSCALER_WEBHOOK

The problem is this: a bunch of builds pile up at the same time, so say there are 4 running builds and 8 pending. The autoscaler kicks in and provisions 4 servers, but the pending builds stay pending and the newly provisioned servers do nothing. The logs on the runners show that they successfully ping the remote server and report the correct capacity. Only after some builds complete do the provisioned runners start on the pending builds (and not all of them). So the extra capacity goes largely unused, and the autoscaler-provisioned servers mostly sit idle.

~~As a side note: the Slack webhook does not seem to be doing anything either, and there are no logs from the autoscaler (debug logs not checked).~~
EDIT: That one was a configuration error. Always check your shell quoting.

LudvigHz · January 12, 2022, 1:44pm

SOLVED

Turns out we had a concurrency limit enabled in one of our pipelines, and it happens to be the one that runs the most.
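
For anyone landing here later, a per-pipeline concurrency limit in `.drone.yml` looks roughly like the sketch below. The pipeline name, step, and image are placeholders and not our actual configuration; the `concurrency` / `limit` keys are the relevant part.

```yaml
kind: pipeline
type: docker
name: default  # placeholder pipeline name

# Caps how many builds of this pipeline run at the same time.
# Builds beyond the limit stay pending even when runners are idle.
concurrency:
  limit: 1

steps:
  - name: test  # placeholder step
    image: alpine
    commands:
      - echo "running tests"
```

If a pipeline with a block like this accounts for most of the queue, extra runners will not drain it any faster, which matches what we saw.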

As a side note here as well: it might be an idea to have the autoscaler logic take this limit into account in the future. Currently, when a single project has several pending builds but also a concurrency limit, the autoscaler still provisions servers, which then do nothing because of that limit.

wojtekxtx · December 15, 2022, 9:50pm

Classic illogical behavior from drone team
