Drone autoscaler - Issue with AWS EC2

Hello,

I'm opening a ticket today because I have lost a lot of time trying to get the Drone autoscaler working :confused:

I followed this link to install my Drone server (with some changes to the runner part for AWS EC2):

I couldn't find any tutorial for AWS EC2, and the documentation at "Install for Amazon EC2 | Drone" looks incomplete.

So I ran into several issues. The first one is with SSL: when I enable it, Drone doesn't create agent instances on AWS, but if I disable SSL, it works. As you probably know, though, HTTP isn't secure, so if you have a solution to make it work with HTTPS, it would be very welcome :slight_smile:

My main issue now:

With HTTP, Drone seems to work, even if this is not the best solution. But once the agent is created and I run a job from the Drone server UI, I see these lines in my Drone autoscaler logs:

{"level":"debug","module":"api","msg":"FIXME: Got an status-code for which error does not match any expected type!!!: -1","status_code":"-1","time":"2022-02-08T16:36:54Z"}
{"error":"Cannot connect to the Docker daemon at https://13.38.xxx.xxx:2376. Is the docker daemon running?","ip":"13.38.xxx.xxx","level":"debug","msg":"cannot connect, retry in 1m0s","name":"agent-Kuei60K4","time":"2022-02-08T16:36:54Z"}
{"level":"debug","module":"api","msg":"FIXME: Got an status-code for which error does not match any expected type!!!: -1","status_code":"-1","time":"2022-02-08T16:36:58Z"}
{"error":"Cannot connect to the Docker daemon at https://15.236.xxx.xxx:2376. Is the docker daemon running?","ip":"15.236.xxx.xxx","level":"debug","msg":"cannot connect, retry in 1m0s","name":"agent-NqzU628Z","time":"2022-02-08T16:36:58Z"}
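To see at a glance which agents the autoscaler cannot reach, the JSON log lines above can be summarized with a few lines of Python. This is only a minimal sketch: the function name `unreachable_agents` is made up for illustration, and it assumes the log format stays exactly as shown above (one JSON object per line, with `msg`, `name` and `ip` fields).

```python
import json


def unreachable_agents(log_lines):
    """Map agent name -> IP for every 'cannot connect' entry in the log lines."""
    agents = {}
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if not isinstance(entry, dict):
            continue
        # Assumption: connection failures always use this message prefix.
        if str(entry.get("msg", "")).startswith("cannot connect"):
            agents[entry.get("name", "?")] = entry.get("ip", "?")
    return agents
```

It accepts any iterable of lines, so an open log file or a list of pasted lines can be passed in directly.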

I've checked my AWS security rules, and they look correctly configured:

TCP	2376	sg-02c63c7b8xxx

For your information, my GitLab runner is installed on the same server and uses the same feature on AWS (with docker+machine), and it works without any issue.

When I log in to the server, I can see that Docker is running :slight_smile:

root@ip-xx-xxx-xxx-xxx:/home/ubuntu# docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES

But when I try to telnet from my Drone server to the agent server, the connection to Docker doesn't seem to work, even though the port is listening:

telnet 13.38.xxx.xxx 2376
Trying 13.38.xxx.xxx...
^C

root@ip-xx-xxx-xxx-xxx:/home/ubuntu# netstat -tulpn | grep 2376
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp6       0      0 :::2376                 :::*                    LISTEN      3022/dockerd  
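A telnet that hangs on "Trying..." while dockerd is listening usually points to packets being dropped on the way (for example by a security group or firewall) rather than to Docker itself, whereas an immediate "connection refused" would mean the host is reachable but nothing is listening. As a rough sketch, the two cases can be told apart with a short Python probe run from the Drone server. The function name `probe_tcp` and the 5 second default timeout are illustrative choices, not part of Drone.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused' or 'timeout'."""
    try:
        # create_connection resolves the host and applies the timeout to connect().
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No answer at all: packets are most likely filtered somewhere on the path.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on that port.
        return "refused"
    except OSError as exc:
        # Anything else (DNS failure, no route to host, ...).
        return "error: {}".format(exc.strerror or exc)
```

Calling `probe_tcp(agent_ip, 2376)` with the agent's public IP and then with its private IP shows which of the two addresses is actually allowed through.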

Finally, here is my configuration for the Drone server and autoscaler (I use Docker Swarm, not docker-compose):

version: "3.9"
services:
  server:
    image: drone/drone
    ports:
      - "80:80"
#      - "443:443"
    volumes:
      - /var/lib/drone:/data
    environment:
      - DRONE_LOGS_DEBUG=true
      - DRONE_SERVER_HOST=test.example.com
      - DRONE_SERVER_PROTO=http
#      - DRONE_TLS_AUTOCERT=true
#      - DRONE_HTTP_SSL_REDIRECT=true
#      - DRONE_HTTP_SSL_TEMPORARY_REDIRECT=true
#      - DRONE_HTTP_SSL_HOST=test.example.com
#      - DRONE_HTTP_STS_SECONDS=315360000
      - DRONE_USER_FILTER=admin
      - DRONE_REPOSITORY_FILTER=test
      - DRONE_GITHUB_CLIENT_ID=4xxxxxxxxx
      - DRONE_GITHUB_CLIENT_SECRET=axxxxxxxx
      - DRONE_RPC_SECRET=1xxxx
      - DRONE_DATADOG_ENABLED=false
      - DRONE_USER_CREATE=username:admin,admin:true
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 300M
      update_config:
        order: start-first
        failure_action: rollback
        delay: 10s
      rollback_config:
        parallelism: 1
        order: stop-first
      restart_policy:
        condition: any
        delay: 10s
        max_attempts: 5
        window: 120s
  autoscaler:
    image: drone/autoscaler
    ports:
      - "8080:8080"
    volumes:
      - /var/lib/autoscaler:/data
    environment:
      - DRONE_LOGS_DEBUG=true
      - DRONE_POOL_MIN=0
      - DRONE_POOL_MAX=4
      - DRONE_POOL_MIN_AGE=1m
      - DRONE_INTERVAL=30s
      - DRONE_SERVER_PROTO=http
      - DRONE_SERVER_HOST=test.example.com
      - DRONE_SERVER_TOKEN=fxxxxxxx
      - DRONE_AGENT_IMAGE=drone/drone-runner-docker
      - DRONE_AGENT_TOKEN=1xxxxx
      - DRONE_AMAZON_IMAGE=ami-0c0f763628afa7f8b # Ubuntu 20.04 LTS
      - DRONE_AMAZON_INSTANCE=t3a.small
      - DRONE_AMAZON_VOLUME_TYPE=gp3
      - DRONE_AMAZON_TAGS=Project:drone-cicd-job,Managed_by:ec2-instance-drone-autoscaler,Environment:production
      - DRONE_AMAZON_MARKET_TYPE=spot
      - DRONE_AMAZON_REGION=eu-west-3
      - DRONE_AMAZON_SUBNET_ID=subnet-0xxxxx
      - DRONE_AMAZON_SECURITY_GROUP=sg-0xxxxx
      - DRONE_AMAZON_SSHKEY=main
      - AWS_ACCESS_KEY_ID=xxx
      - AWS_SECRET_ACCESS_KEY=xxx
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 300M
      update_config:
        order: start-first
        failure_action: rollback
        delay: 10s
      rollback_config:
        parallelism: 1
        order: stop-first
      restart_policy:
        condition: any
        delay: 10s
        max_attempts: 5
        window: 120s

I think I've said everything. Thanks for your help!

My post was only approved today; since 8 February, I have found solutions on my own.

I just want to point out that the Drone server wants to use the public IP to communicate with the agent. I don't understand why we can't use the private IP when it's available. This behavior forces me to add extra automation that reruns my Terraform project to update the agent security group with the new Drone server IP, whereas with a private IP I could just add the security group's name to allow Docker communication.

If the dev team could work on this, it would be appreciated :slight_smile:

As for TLS, my issue came from GitHub: my first integration was configured with HTTP. I knew I had to change HTTP to HTTPS in the user settings, but I discovered later that I had to do the same thing on the linked projects.

Regards,