Drone-agent in error status

Environment

drone-server: 1.4.0
autoscaler: commit: “a8ce5f982e0b516cb88c3335cbfad8344fbf4dbb”

Problem

Intermittently, agents show up in an error state when I query them with drone server info:

Name: agent-WDXdfUhX
Address: 10.0.12.8
Region:  us-west-2a
Size:    c5.9xlarge
State:   error
Error:   error during connect: Post https://10.0.12.8:2376/v1.33/images/create?fromImage=drone%2Fagent&tag=1: EOF

I run the autoscaler on AWS EC2, and the EC2 instance itself was launched successfully.

What is the possible cause of the error? Is there a way to clean up errored instances in the autoscaler? Otherwise, I plan to run a cron script that calls drone server destroy, but I’d like to know whether there is a way to prevent this from happening in the first place.
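For the cron fallback mentioned above, a minimal sketch might look like the script below. It assumes the drone CLI is installed and configured with the same environment variables you already use for drone server info, and that drone server ls prints one server name per line (an assumption, not confirmed here). It detects errored servers by matching the "State:   error" line in the info output shown above. The script name and the "run" argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical cron helper: destroy autoscaler servers whose state is "error".
# Assumes the drone CLI is installed and configured (same environment
# variables as your interactive shell) so that `drone server info` works.
set -euo pipefail

# Succeeds if `drone server info` output on stdin has a "State: error" line,
# matching the plain-text layout shown in the question ("State:   error").
# grep without -q reads all input, so the writer never gets SIGPIPE.
is_errored() {
  grep -E '^State:[[:space:]]+error[[:space:]]*$' >/dev/null
}

reap_errored_servers() {
  local name info
  # Assumption: `drone server ls` prints one server name per line.
  drone server ls | while read -r name; do
    [ -n "$name" ] || continue
    # Skip servers whose info cannot be fetched instead of aborting the run.
    info=$(drone server info "$name") || continue
    if printf '%s\n' "$info" | is_errored; then
      echo "destroying errored server: $name"
      drone server destroy "$name" || echo "failed to destroy: $name" >&2
    fi
  done
}

# Only touch real servers when invoked as: reap-errored-servers.sh run
if [ "${1:-}" = "run" ]; then
  reap_errored_servers
fi
```

A crontab entry such as */15 * * * * /path/to/reap-errored-servers.sh run would check every 15 minutes. Cron starts jobs with a minimal environment, so the variables the drone CLI needs must be set in the crontab or exported at the top of the script.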

Here is my autoscaler configuration. I specified an Ubuntu 18.04 LTS AMI so that the latest Docker version (19.03.2) is used.

sudo docker run -d \
  -e DRONE_POOL_MIN=0 \
  -e DRONE_POOL_MAX=20 \
  -e DRONE_CAPACITY_BUFFER=0 \
  -e DRONE_POOL_MIN_AGE=30m \
  -e DRONE_INTERVAL=10s \
  -e DRONE_SERVER_PROTO=https \
  -e DRONE_INSTALL_CHECK_INTERVAL=60s \
  -e DRONE_AMAZON_IMAGE=ami-08edb3f0e9f3f2557 \
  -e DRONE_SERVER_HOST=drone-ci.example.com \
  -e DRONE_SERVER_TOKEN=****** \
  -e DRONE_AGENT_CONCURRENCY=12 \
  -e DRONE_AGENT_TOKEN=****** \
  -e DRONE_AMAZON_REGION=us-west-2 \
  -e DRONE_AMAZON_SUBNET_ID=subnet-****** \
  -e DRONE_AMAZON_SECURITY_GROUP=sg-***** \
  -e DRONE_AMAZON_INSTANCE=c5.9xlarge \
  -e DRONE_AMAZON_SSHKEY=drone \
  -e AWS_ACCESS_KEY_ID=****** \
  -e AWS_SECRET_ACCESS_KEY=******* \
  -e DRONE_AMAZON_PRIVATE_IP=true \
  -p 8080:8080 \
  --restart=always \
  --name=autoscaler \
  drone/autoscaler

What is the possible cause of the error?

This is the full error message returned from Docker, which unfortunately is not very descriptive. If you have questions about specific Docker errors, we recommend engaging Docker technical support, since this is outside our area of expertise.

Is there a way to clean up errored instances in the autoscaler?

You can set DRONE_ENABLE_REAPER=true, which runs a task every hour to remove errored instances.
