From time to time, I’m getting this:
This happens when drone-autoscaler has just scaled up because of additional build jobs.
I can get around this by implementing a custom userData block:
# use firewall to disable access to docker until it has restarted and has been able to pull an image
runcmd:
- ufw default allow outgoing
- ufw default allow incoming
- ufw deny 2376
- echo activating firewall
- ufw enable
- apt-get install -o Dpkg::Options::="--force-confold" --force-yes -y docker-ce # custom docker config is already in place; these options make sure installing docker doesn't overwrite it
- docker pull drone/drone-runner-docker
- echo sleeping for 30 secs
- sleep 30
- echo opening firewall
- ufw allow 2376
We inject this using the DRONE_AMAZON_USERDATA_FILE environment variable.
This lets docker get installed without overwriting our config (daemon.json), start, and complete a docker pull before the instance becomes available to drone.
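For context, this is roughly how we wire that in on the autoscaler side (a minimal sketch; the service definition and file paths are illustrative, not our exact compose file):

services:
  autoscaler:
    image: drone/autoscaler
    environment:
      # path inside the container where the custom cloud-init file is mounted
      - DRONE_AMAZON_USERDATA_FILE=/config/userdata.yml
    volumes:
      # the runcmd block shown above lives in this file on the host
      - ./userdata.yml:/config/userdata.yml:ro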
It would be good if more robustness were built into drone-autoscaler instead, so that we didn’t have to do this. For example, it could perform a docker pull (with appropriate retry logic) and only mark the runner as “ready for service” once that succeeds; see the sketch below.
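Conceptually, the readiness check could look something like this (illustrative shell only, not the autoscaler’s actual code; the image name and retry counts are just examples):

# keep retrying the pull until the docker daemon on the new instance actually responds
for attempt in $(seq 1 30); do
  if docker pull drone/drone-runner-docker; then
    echo "docker is serving requests; safe to mark the instance as ready"
    break
  fi
  echo "docker not ready yet (attempt $attempt/30), retrying in 5s"
  sleep 5
done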
I think we discussed this in another thread and the maintainer said they’d never experienced issues with drone runners coming online, but we’re seeing it quite frequently; it seems to me that drone-autoscaler simply marks the node as healthy too soon. We’d very much prefer not to have to maintain a custom userdata config.