I just enabled the autoscaler using Hetzner Cloud. Creating and destroying servers works fine, but here is the situation:
We have two on-prem servers dedicated to Drone. One runs the Drone server plus a runner; the other runs only a Docker runner. The servers are quite old and can't handle many pipelines at once, so we want to offload heavy load to Hetzner Cloud VMs when needed.
Here is the relevant part of the autoscaler's docker-compose service definition, if needed:
autoscaler:
  container_name: autoscaler
  image: drone/autoscaler
  restart: always
  ports:
    - 8080:8080
  environment:
    - DRONE_POOL_MIN=0
    - DRONE_POOL_MAX=4
    - DRONE_SERVER_PROTO=https
    - DRONE_SERVER_HOST=ci.webkom.dev
    - DRONE_SERVER_TOKEN=$DRONE_USER_TOKEN
    - DRONE_AGENT_TOKEN=$DRONE_RPC_SECRET
    - DRONE_HETZNERCLOUD_TOKEN=$HETZNER_TOKEN
    - DRONE_HETZNERCLOUD_TYPE=cpx21
    - DRONE_HETZNERCLOUD_SSHKEY=0
    - DRONE_SLACK_WEBHOOK=$SLACK_AUTOSCALER_WEBHOOK
The problem is this: a bunch of builds pile up at the same time, say 4 running and 8 pending. The autoscaler kicks in and provisions 4 servers, but the pending builds stay pending and the newly provisioned servers do nothing. The logs on the runners show that they successfully ping the server and report the correct capacity. Only after some builds complete do the provisioned runners start picking up pending builds, and even then not all of them. So the added capacity goes largely unused and the autoscaler-provisioned servers mostly sit idle.
As a side note, the Slack webhook does not seem to do anything either, and the autoscaler produces no logs (I have not checked the debug logs).
EDIT: It was a configuration error. Always check your shell quoting.
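For readers with similar symptoms: the post does not say which quoting mistake it was, so the snippet below only illustrates one well-known pitfall and is not the actual fix. With the list syntax under `environment:`, docker-compose splits each item at the first `=` and passes the rest through verbatim, so quotes written there become part of the value. With the mapping syntax, YAML itself strips the quotes. Running `docker-compose config` prints the configuration with variables interpolated, which is a quick way to check what the container will actually receive. The variable choice here is illustrative.

```yaml
autoscaler:
  environment:
    # Pitfall (list syntax): the quote characters end up inside the value,
    # so the container would see "https" including the quotes.
    # - DRONE_SERVER_PROTO="https"

    # Mapping syntax: YAML removes the quotes, so the value is plain https.
    DRONE_SERVER_PROTO: "https"
    DRONE_SERVER_HOST: "ci.webkom.dev"
    # Interpolated by docker-compose from the shell environment or .env file.
    DRONE_SERVER_TOKEN: "${DRONE_USER_TOKEN}"
```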