New Docker Runner breaks the registry plugin

Hi!

We use Drone in combination with the autoscaler and the ECR registry plugin to use custom build images on ECR.

Since we updated the Drone autoscaler from 1.4.4 to 1.5.0 we cannot pull images from ECR anymore.
This update included the new Docker runner, so we suspect this new runner isn’t compatible with custom registry plugin(s).

We tried both the new and old style environment variables (DRONE_REGISTRY_PLUGIN_ENDPOINT vs DRONE_REGISTRY_ENDPOINT)
The logs doesn’t contain any information about this issue.

Did something change in this new runner how registry plugins work? :slightly_smiling_face:

the only changes I am aware of are changes to the environment variables (below) however we also have code in place to read the old variables [1] to ensure backward compatibility. So this should really be a non-issue.

DRONE_REGISTRY_PLUGIN_ENDPOINT
DRONE_REGISTRY_PLUGIN_SKIP_VERIFY
DRONE_REGISTRY_PLUGIN_TOKEN

Furthermore the old and the new runner use the same drone-go library for integrating with registry plugins, and there have been no breaking changes to this library.

Therefore the only suggestion I have is to enable trace logging for your runner and inspect the runner logs. The runner outputs trace logs when it makes requests to the plugin and includes detailed information that can help with debugging [2].

[1] https://github.com/drone-runners/drone-runner-docker/blob/master/command/daemon/config.go#L113:L115
[2] https://github.com/drone/runner-go/blob/master/registry/external.go#L49:L64

I can confirm that this issue exists. I tried to get to the bottom of it, but couldn’t. Only rolling back autoscaler helped.

My findings:
both legacy and new syntax for environment variables in autoscaler don’t work.
The debug\trace log on the agent shows this:

time="2020-01-30T23:57:50Z" level=debug msg="stage received" stage.id=22086 stage.name="PR: Generate diff of actions" stage.number=1 thread=2
time="2020-01-30T23:57:50Z" level=debug msg="stage accepted" stage.id=22086 stage.name="PR: Generate diff of actions" stage.number=1 thread=2
time="2020-01-30T23:57:51Z" level=debug msg="stage details fetched" build.id=13816 build.number=41 repo.id=232 repo.name=kops repo.namespace=movio stage.id=22086 stage.name="PR: Generate diff of actions" stage.number=1 thread=2
time="2020-01-30T23:57:51Z" level=trace msg="registry: external: credential list returned" thread=2
time="2020-01-30T23:57:51Z" level=debug msg="updated stage to running" build.id=13816 build.number=41 repo.id=232 repo.name=kops repo.namespace=movio stage.id=22086 stage.name="PR: Generate diff of actions" stage.number=1 thread=2
time="2020-01-30T23:57:59Z" level=debug msg="updated stage to complete" build.id=13816 build.number=41 repo.id=232 repo.name=kops repo.namespace=movio stage.id=22086 stage.name="PR: Generate diff of actions" stage.number=1 thread=2
time="2020-01-30T23:57:59Z" level=debug msg="poller: request stage from remote server" thread=2
time="2020-01-30T23:57:59Z" level=trace msg="http: context canceled"
time="2020-01-30T23:57:59Z" level=debug msg="done listening for cancellations" build.id=13816 build.number=41 repo.id=232 repo.name=kops repo.namespace=movio stage.id=22086 stage.name="PR: Generate diff of actions" stage.number=1 thread=2

Notice the registry: external: credential list returned - looks like the agent receives the credentials, but then fails to forward them to docker daemon on pull command. Here’s docker’s debug output:

Jan 31 02:56:11 ip-10-110-20-109 dockerd[4320]: time="2020-01-31T02:56:11.373395117Z" level=debug msg="hostDir: /etc/docker/certs.d/*REDACTED*.dkr.ecr.ap-southeast-2.amazonaws.com"
Jan 31 02:56:11 ip-10-110-20-109 dockerd[4320]: time="2020-01-31T02:56:11.373432084Z" level=debug msg="Trying to pull *REDACTED*.dkr.ecr.ap-southeast-2.amazonaws.com/devops-drone-images from https://*REDACTED*.dkr.ecr.ap-southeast-2.amazonaws.com v2"
Jan 31 02:56:11 ip-10-110-20-109 dockerd[4320]: time="2020-01-31T02:56:11.379371365Z" level=info msg="Attempting next endpoint for pull after error: Get https://*REDACTED*.dkr.ecr.ap-southeast-2.amazonaws.com/v2/devops-drone-images/manifests/terraform-assume-latest: no basic auth credentials"
Jan 31 02:56:11 ip-10-110-20-109 dockerd[4320]: time="2020-01-31T02:56:11.379441515Z" level=error msg="Handler for POST /v1.33/images/create returned error: Get https://*REDACTED*.dkr.ecr.ap-southeast-2.amazonaws.com/v2/devops-drone-images/manifests/terraform-assume-latest: no basic auth credentials"

No basic auth credentials.

If I try to run docker login with the credentials from the plugin (I can get them by running drone plugins registry list), then docker logins successfully. Only login from the drone’s docker client fails.

@ashwilliams1

I setup a quick test environment and I was able to use a registry credential plugin with the new docker runner without any issues. I tested with the drone/drone-registry-plugin plugin.

Test Environment

Here is my configuration in docker-compose format:

version: "3.7"
services:
  server:
    image: drone/drone:linux-amd64
    ...

  runner:
    image: drone/drone-runner-docker:linux-amd64
    environment:
    - DRONE_RPC_HOST=server
    - DRONE_RPC_SECRET=XXX
    - DRONE_REGISTRY_PLUGIN_ENDPOINT=http://registry:3000
    - DRONE_REGISTRY_PLUGIN_SECRET=XXX
    volumes:
    - /var/run/docker.sock:/var/run/docker.sock

  registry:
    image: drone/registry-plugin
    environment:
    - DRONE_SECRET=xxx
    - DRONE_CONFIG_FILE=/opt/auths.yml
    volumes:
    - ./auths.yml:/opt/auths.yml

Test Yaml

Here is my pipeline configuration. The last step in the yaml uses a private image:

kind: pipeline
type: docker
name: test

clone:
  disable: true
  disabled: true

steps:
- name: a
  image: alpine:3.8
  commands:
  - sleep 5
  - echo bar

- name: b
  image: alpine:3.8
  commands:
  - sleep 5
  - echo bar

- name: c
  image: bradrydzewski/private-test:latest
  commands:
  - echo baz

Test Registry File

Here is the test registry file that was passed to the registry plugin:

- address: docker.io
  username: bradrydzewski
  password: XXX

Preparation

Before running any tests I cleared my local docker cache to ensure the private image was removed.

$ docker rmi bradrydzewski/private-test:latest
Untagged: bradrydzewski/private-test:latest
Untagged: bradrydzewski/private-test@sha256:0873c923e00e0fd2ba78041bfb64a105e1ecb7678916d1f7776311e45bf5634b

Testing Without the Plugin

First, I decided to test with the plugin disabled, so I commented out the registry configuration. The expected results of this test are that the final step will fail with an error due to problems pulling the image.

  runner:
    image: drone/drone-runner-docker:linux-amd64
    environment:
    - DRONE_RPC_HOST=server
    - DRONE_RPC_SECRET=XXX
-   - DRONE_REGISTRY_PLUGIN_ENDPOINT=http://registry:3000
-   - DRONE_REGISTRY_PLUGIN_SECRET=XXX
    volumes:
    - /var/run/docker.sock:/var/run/docker.sock

I ran the build and the step failed with an error, as expected:

Testing With the Plugin

Next, I re-enabled the registry plugin in the configuration. This time, the expected result is that the system will fetch the credentials from the plugin and use those credentials to pull the image.

I ran the build and the step was successful.

I saw the following log output indicating the credentials were sourced from the plugin:

runner_1    | time="2020-01-31T05:24:39Z" level=trace msg="registry: external: credential list returned" thread=2

Conclusion

I was able to successfully use registry plugins with the new docker runner. I do not have enough data available (at this time) to conclude there are compatibility issues or regressions, however, we can definitely continue debugging and analyze any additional information that you can provide.

I’m just not sure what additional info can I provide…

@mtb-xt here are some items that would be helpful:

  1. your autoscaler configuration since both yourself and the op mention the autoscaler, and since you mentioned you did not think the environment variables were working
  2. your registry plugin configuration (including name and version of plugin being used)
  3. the registries returned from “drone plugins registry list”
  4. a copy of your yaml so we can compare the image name with the registry address

the reason 3 and 4 are important is because Drone matches the image name with the hostname of the registry. Assuming the registry credentials are returned from the plugin, the only explanation (that I can think of) for them not being used to pull the image would be a matching issue.

edit I pushed an updated runner image (:latest) that includes some new trace logging entries that will print a list of registries returned from the plugin (more details here). It would be helpful to see these trace logs. We want to rule out the possibility that the plugin returned a list that did not include the expected results.

@bradrydzewski, thanks for helping us out.

Here are the details:

  • Autoscaler configuration:
ExecStart=/bin/bash -c '/usr/local/bin/summon -p summon-aws-secrets -f /etc/drone/secrets.yml /usr/bin/docker run --name drone-autoscaler \
                               --rm \
                               -p 8080:8080 \
                               --env-file @SUMMONENVFILE \
                               --env-file /etc/drone/drone-agent-ami.env \
                               --volume=/var/lib/autoscaler:/data \
                               --env=DRONE_SERVER_HOST=drone.tools.prod.movio.co \
                               --env=DRONE_SERVER_PROTO=https \
                               --env=DRONE_POOL_MIN=3  \
                               --env=DRONE_POOL_MAX=12 \
                               --env=DRONE_AGENT_CONCURRENCY=2 \
                               --env=DRONE_INTERVAL=30s \
                               --env=DRONE_LOGS_DEBUG=true \
                               --env=DRONE_AMAZON_INSTANCE=m5d.xlarge \
                               --env=DRONE_AMAZON_TAGS="Name:drone-agent,Squad:devops,dibs/team:devops" \
                               --env=DRONE_AMAZON_REGION=ap-southeast-2  \
                               --env=DRONE_AMAZON_RETRIES=10  \
                               --env=DRONE_AMAZON_SUBNET_ID=subnet-0231e30d447dece79 \
                               --env=DRONE_AMAZON_SECURITY_GROUP=sg-00206da902bd4bd8c \
                               --env=DRONE_AMAZON_SSHKEY=movio-tools-ap-southeast-2.pem \
                               --env=DRONE_ENABLE_REAPER=true  \
                               --env=DRONE_REAPER_ENABLED=true \
                               --env=DRONE_REAPER_INTERVAL=10m \
                               --env=DRONE_ENABLE_PINGER=true  \
                               --env=DRONE_AMAZON_PRIVATE_IP=true \
                               --env=DRONE_AMAZON_IAM_PROFILE_ARN=arn:aws:iam::864091978270:instance-profile/drone-agent.tools.prod.movio.co \
                               --env=AWS_IAM=true \
                               --env=DRONE_SLACK_WEBHOOK=https://hooks.slack.com/services/*REDACTED*/*REDACTED*/*REDACTED* \
                               --env=DRONE_HTTP_HOST=drone.tools.prod.movio.co  \
                               --env=DRONE_HTTP_PROTO=https \
                               --env=DRONE_AMAZON_VOLUME_SIZE=55 \
                               --env=DRONE_AGENT_ENVIRON=DRONE_LOGS_DEBUG=true,DRONE_LOGS_PRETTY=true,DRONE_LOGS_TRACE=true,DRONE_SECRET_PLUGIN_ENDPOINT=https://drone.tools.prod.movio.co,DRONE_SECRET_PLUGIN_TOKEN=*VERYSECRET*,DRONE_REGISTRY_ENDPOINT="https://drone.tools.prod.movio.co:3200",DRONE_REGISTRY_SECRET="*VERYSECRET*",DRONE_REGISTRY_VERIFY="false" \
                               --env=DRONE_DATABASE_DRIVER=mysql \
                               drone/autoscaler:1.4'

The same config was used with 1.6.

  • Registry plugin:
    We’re using https://github.com/davidbyttow/drone-ecr-registry-plugin, unfortunately, it’s not published in dockerhub, so we’re running it from our own ECR. Authentication on drone master is done via aws ecr authentication helper.

The code is EXACTLY the same as the code from the repository.

The config for the plugin:

ExecStart=/bin/bash -c '/usr/local/bin/summon -p summon-aws-secrets -f /etc/drone/secrets.yml /usr/bin/docker run --name=drone-plugin-ecr-registry \
                              --rm \
                              --volume=/var/run/docker.sock:/var/run/docker.sock \
                              --env-file @SUMMONENVFILE \
                              --publish=3200:3000 \
                              --env=PLUGIN_DEBUG=true \
                              --env=ECR_REGISTRY_LIST=191213556404.dkr.ecr.us-east-1.amazonaws.com,864091978270.dkr.ecr.ap-southeast-2.amazonaws.com,016843415717.dkr.ecr.us-east-1.amazonaws.com,649264355054.dkr.ecr.eu-central-1.amazonaws.com,650454640207.dkr.ecr.ap-southeast-2.amazonaws.com \
                              fa97908f1674'
#                             864091978270.dkr.ecr.ap-southeast-2.amazonaws.com/drone-ecr-registry-plugin:latest'

  • Registries from plugin list:
 $  drone plugins registry list
https://191213556404.dkr.ecr.us-east-1.amazonaws.com 
Username:  AWS
Password: ಠ_ಠ

https://864091978270.dkr.ecr.ap-southeast-2.amazonaws.com 
Username:  AWS
Password: ಠ_ಠ

https://016843415717.dkr.ecr.us-east-1.amazonaws.com 
Username:  AWS
Password: ಠ_ಠ

https://649264355054.dkr.ecr.eu-central-1.amazonaws.com 
Username:  AWS
Password: ಠ_ಠ

https://650454640207.dkr.ecr.ap-southeast-2.amazonaws.com 
Username:  AWS
Password: ಠ_ಠ
  • .drone.yml
{
   "kind": "pipeline",
   "name": "PR: Generate diff of actions",
   "steps": [
      {
         "commands": [
            "kubectl config set-cluster k8s.stage.movio.co --server=https://api.k8s.stage.movio.co",
            "kubectl config set-credentials k8s.stage.movio.co-aws --exec-command=/bin/aws-iam-authenticator --exec-api-version=client.authentication.k8s.io/v1alpha1 --exec-arg=token --exec-arg=-i --exec-arg=k8s.stage.movio.co --exec-arg=-r --exec-arg=arn:aws:iam::225711623517:role/DevopsDroneRole",
            "kubectl config set-context k8s.stage.movio.co --cluster=k8s.stage.movio.co --user=k8s.stage.movio.co-aws",
            "kubectl config use-context k8s.stage.movio.co",
            "kubectl config view",
            "kubectl cluster-info",
            "cd components && helmfile diff"
         ],
         "depends_on": [
            "clone"
         ],
         "image": "864091978270.dkr.ecr.ap-southeast-2.amazonaws.com/devops-drone-images:helmfiles-plugin-latest",
         "name": "PR: Generate diff of actions for stage cluster.",
         "pull": "always"
      }   ],
   "trigger": {
      "event": [
         "pull_request"
      ]
   }
}
---
{
   "kind": "pipeline",
   "name": "Deploy changes",
   "steps": [
      {
         "commands": [
            "kubectl config set-cluster k8s.stage.movio.co --server=https://api.k8s.stage.movio.co",
            "kubectl config set-credentials k8s.stage.movio.co-aws --exec-command=/bin/aws-iam-authenticator --exec-api-version=client.authentication.k8s.io/v1alpha1 --exec-arg=token --exec-arg=-i --exec-arg=k8s.stage.movio.co --exec-arg=-r --exec-arg=arn:aws:iam::225711623517:role/DevopsDroneRole",
            "kubectl config set-context k8s.stage.movio.co --cluster=k8s.stage.movio.co --user=k8s.stage.movio.co-aws",
            "kubectl config use-context k8s.stage.movio.co",
            "kubectl config view",
            "kubectl cluster-info",
            "cd components && helmfile sync"
         ],
         "depends_on": [
            "clone"
         ],
         "image": "864091978270.dkr.ecr.ap-southeast-2.amazonaws.com/devops-drone-images:helmfiles-plugin-latest",
         "name": "Deploy helmfile changes to stage cluster.",
         "pull": "always"
      }
   ],
   "trigger": {
      "event": [
         "promote"
      ],
      "target": [
         "production"
      ]
   }
}

.

I’ve updated our autoscaler to use :latest runner, and it started failing again. Here’s the debug log

time="2020-01-31T08:44:57Z" level=trace msg="http: no content returned: re-connect and re-try"
time="2020-01-31T08:44:57Z" level=trace msg="http: no content returned: re-connect and re-try"
time="2020-01-31T08:45:21Z" level=debug msg="stage received" stage.id=22158 stage.name="Deploy changes" stage.number=1 thread=2
time="2020-01-31T08:45:21Z" level=debug msg="stage accepted" stage.id=22158 stage.name="Deploy changes" stage.number=1 thread=2
time="2020-01-31T08:45:23Z" level=debug msg="stage details fetched" build.id=13888 build.number=50 repo.id=232 repo.name=kops repo.namespace=movio stage.id=22158 stage.name="Deploy changes" stage.number=1 thread=2
time="2020-01-31T08:45:23Z" level=trace msg="registry: external: received credentials" address="https://191213556404.dkr.ecr.us-east-1.amazonaws.com" thread=2 username=AWS
time="2020-01-31T08:45:23Z" level=trace msg="registry: external: received credentials" address="https://864091978270.dkr.ecr.ap-southeast-2.amazonaws.com" thread=2 username=AWS
time="2020-01-31T08:45:23Z" level=trace msg="registry: external: received credentials" address="https://016843415717.dkr.ecr.us-east-1.amazonaws.com" thread=2 username=AWS
time="2020-01-31T08:45:23Z" level=trace msg="registry: external: received credentials" address="https://649264355054.dkr.ecr.eu-central-1.amazonaws.com" thread=2 username=AWS
time="2020-01-31T08:45:23Z" level=trace msg="registry: external: received credentials" address="https://650454640207.dkr.ecr.ap-southeast-2.amazonaws.com" thread=2 username=AWS
time="2020-01-31T08:45:23Z" level=debug msg="updated stage to running" build.id=13888 build.number=50 repo.id=232 repo.name=kops repo.namespace=movio stage.id=22158 stage.name="Deploy changes" stage.number=1 thread=2




time="2020-01-31T08:45:37Z" level=trace msg="http: no content returned: re-connect and re-try"
time="2020-01-31T08:45:39Z" level=debug msg="updated stage to complete" build.id=13888 build.number=50 repo.id=232 repo.name=kops repo.namespace=movio stage.id=22158 stage.name="Deploy changes" stage.number=1 thread=2
time="2020-01-31T08:45:39Z" level=debug msg="poller: request stage from remote server" thread=2
time="2020-01-31T08:45:39Z" level=trace msg="http: context canceled"
time="2020-01-31T08:45:39Z" level=debug msg="done listening for cancellations" build.id=13888 build.number=50 repo.id=232 repo.name=kops repo.namespace=movio stage.id=22158 stage.name="Deploy changes" stage.number=1 thread=2

Thanks for the additional information. I will run some tests and see what I can figure out.

We’re using https://github.com/davidbyttow/drone-ecr-registry-plugin , unfortunately, it’s not published in dockerhub, so we’re running it from our own ECR.

In the meantime, have you considered using GitHub - drone/drone-registry-plugin: Registry extension for backward compatibility with 0.8 global registry credentials. ? This plugin supports ECR and we have a number of customers using this in production. We also publish a Docker image so you don’t have to build your own.

I think we found the problem, so thanks for providing the additional details.

The runner expects the registry plugin to return a list of hostnames, as opposed to a list of fully qualified urls. This would explain why drone/drone-registry-plugin is working but davidbyttow/drone-ecr-registry-plugin is not. The https:// prefix in the hostname was causing the matching function to fail. We patched the runner to strip the protocol before matching the image, and published a new minor release.

1 Like

I’ve updated the autoscaler to use the latest docker runner and it seems to be working.

We haven’t tried https://github.com/drone/drone-registry-plugin because it seems to require us to use an API key and secret key, thus we will be storing credentials.
With that other ECR plugin, we can authenticate to ECR with an IAM instance profile.

Glad to hear the patch worked. In terms of the drone-registry-plugin, it should fallback to IAM if the API key and secret are empty. When a nil credential object is passed the AWS client it attempts to authenticate using multiple methods.

1 Like