[Solved] [Autoscaler] Pending builds not being run despite available runners

I just enabled the autoscaler with Hetzner Cloud. Creating and destroying servers works great, but here is the situation:

We have two on-prem servers dedicated to Drone: one running the server and a runner, and another running just a Docker runner. The servers are quite old and can’t handle too many pipelines, so we want to offload heavy load to Hetzner Cloud VMs when needed.

Here is the relevant part of the autoscaler configuration, if needed:

  autoscaler:
    container_name: autoscaler
    image: drone/autoscaler
    restart: always
    ports:
      - 8080:8080
    environment:
      - DRONE_POOL_MIN=0
      - DRONE_POOL_MAX=4
      - DRONE_SERVER_PROTO=https
      - DRONE_SERVER_HOST=ci.webkom.dev
      - DRONE_SERVER_TOKEN=$DRONE_USER_TOKEN
      - DRONE_AGENT_TOKEN=$DRONE_RPC_SECRET
      - DRONE_HETZNERCLOUD_TOKEN=$HETZNER_TOKEN
      - DRONE_HETZNERCLOUD_TYPE=cpx21
      - DRONE_HETZNERCLOUD_SSHKEY=0
      - DRONE_SLACK_WEBHOOK=$SLACK_AUTOSCALER_WEBHOOK

The problem is this: a bunch of builds pile up at the same time, so say there are 4 running builds and 8 pending. The autoscaler kicks in and provisions 4 servers, but the pending builds stay pending and the newly provisioned servers do nothing. The logs on the runners show they successfully ping the remote server with the correct capacity. After some builds complete, the provisioned runners pick up some of the pending builds, but not all of them. So the extra capacity goes largely unused and the autoscaler-provisioned servers mostly sit idle.

As a side note: the Slack webhook does not seem to be doing anything either, and there are no logs from the autoscaler (I have not checked the debug logs).
EDIT: Configuration error. Always check your shell quotation :smile:

SOLVED

Turns out we had concurrency limits enabled in one of our pipelines (the one that runs the most).
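
For anyone hitting the same thing, this is roughly what such a limit looks like in a `.drone.yml`. It is only a sketch and not our actual pipeline: the pipeline name, the step, and the limit value are made up for illustration. With a `concurrency` block like this, Drone keeps additional builds of that pipeline pending no matter how many runners are free.

```yaml
kind: pipeline
type: docker
name: default          # hypothetical pipeline name

# Caps how many builds of this pipeline run at the same time.
# Builds beyond the limit stay pending even when runners are idle.
concurrency:
  limit: 1             # illustrative value

steps:
- name: build          # hypothetical step
  image: alpine
  commands:
  - echo "building"
```

Removing the `concurrency` block, or raising `limit`, lets the pending builds spread across the autoscaled runners.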

As a side note here as well: it might be an idea to have the autoscaler logic take this limit into account in the future. Currently, when a single project has several pending builds but a limit on concurrency, the autoscaler will still provision servers, which consequently do nothing because of said limit.

Classic illogical behavior from the Drone team.