Runner-kube no logs

Hi! I’m using Drone on Kubernetes, and I noticed that while jobs are running under the kube runner they don’t submit step logs. All I see is “1” in the log section.
What should I do in order to see the logs?


Here is my runner deployment file:


apiVersion: apps/v1
kind: Deployment
metadata:
  name: drone
  namespace: ci
spec:
  selector:
    matchLabels:
      app: drone
  replicas: 1
  template:
    metadata:
      labels:
        app: drone
    spec:
      containers:
      - image: drone/drone-runner-kube:latest
        name: runner
        resources:
          requests:
            cpu: 100m
            memory: 64Mi
        ports:
        - containerPort: 3000
        env:
        - name: DRONE_RPC_HOST
          value: drone-service
        # issue: client is newer than server (client API version: 1.26, server API version: 1.24) · Issue #2048 · harness/gitness · GitHub
        - name: DOCKER_API_VERSION
          value: "1.24"
        - name: DRONE_RPC_SECRET
          valueFrom:
            secretKeyRef:
              name: drone-secrets
              key: server.secret
        - name: DRONE_MAX_PROCS
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: agent.max.procs
        - name: DRONE_NAMESPACE_DEFAULT
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: agent.namespace
        - name: DRONE_DEBUG
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: agent.debug
        - name: DRONE_TRACE
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: agent.trace
        - name: DRONE_RPC_DUMP_HTTP
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: agent.trace
        - name: DRONE_RPC_DUMP_HTTP_BODY
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: agent.trace
        - name: DRONE_UI_USERNAME
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: agent.username
        - name: DRONE_UI_PASSWORD
          valueFrom:
            secretKeyRef:
              name: drone-secrets
              key: agent.password

We have not been able to reproduce issues with logging, and this does not appear to be impacting the majority of our users, although one other person did indicate they had logging issues. When we cannot reproduce an issue, we ask individuals to dive into the code to help identify it and, if possible, submit a patch to resolve it. You can find more details about how to test the runner in this thread:

http://discuss.harness.io/t/contributing-to-drone-for-kubernetes/3159

I experience the same issue. It seems the runner is either not collecting the logs or not posting them to the controller. After collecting the log files, I reverted to the previous configuration, and there log collection works.

I would expect the API call /rpc/v2/step/938/logs/upload to contain the log lines, but the collection is not working. The only information passed on to the controller is [{"pos":0,"out":"","time":6}], and this is the same for all steps.
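
For reference, that payload appears to map to a log-line object along these lines (a sketch inferred from the field names in the HTTP dumps below; the actual Go type in the runner may differ):

// Sketch of the log-line wire format, inferred from the dumps in this thread.
type Line struct {
    Number    int    `json:"pos"`  // line number within the step
    Message   string `json:"out"`  // the log text; always empty in the failing case
    Timestamp int64  `json:"time"` // seconds since the step started
}

A healthy upload for the greeting step should contain entries like {"pos":0,"out":"+ echo hello\n","time":0}, whereas here every step only ever produces a single entry with an empty "out".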

I’ve studied the debug article and inspected the runner source code, but I couldn’t find the part of the code that retrieves the log from the pod. I hope someone with knowledge of the runner can give us some guidelines on how to proceed, as I would really like to have the runner up and running :slight_smile:

Below I’ve posted:

  1. the Drone pipeline
  2. the output of the controller
  3. some output of the runner

I’ve created a .drone.yaml file with a few steps to test:

kind: pipeline
type: kubernetes
name: default

platform:
  os: linux
  arch: arm64

clone:
  disable: true

steps:
- name: greeting
  image: alpine
  commands:
  - echo hello
  - echo world

- name: foo
  image: alpine
  commands:
  - touch hello.txt

- name: bar
  image: alpine
  commands:
  - ls hello.txt

- name: clone-google-kaniko
  image: drone/git
  commands: 
  - git clone https://github.com/GoogleContainerTools/kaniko /go/src/github.com/GoogleContainerTools/kaniko

Output of the controller

time="2020-01-12T21:52:41Z" level=debug fields.time="2020-01-12T21:52:41Z" latency=8.61918ms method=GET remote="10.42.2.1:56434" request=/ request-id=1WJRxzSikVGxANSqoayxJpWGRPb
time="2020-01-12T21:52:41Z" level=debug msg="manager: request queue item" arch= kernel= kind=pipeline os= type=kubernetes variant=
time="2020-01-12T21:52:42Z" level=debug msg="webhook parsed" commit=994c6f55d5c0c9cf716f2c9049a8bcd83a79e50f event=push name=drone-test namespace=Rick-Jongbloed
time="2020-01-12T21:52:42Z" level=debug msg="trigger: received" commit=994c6f55d5c0c9cf716f2c9049a8bcd83a79e50f event=push ref=refs/heads/master repo=Rick-Jongbloed/drone-test
time="2020-01-12T21:52:42Z" level=debug fields.time="2020-01-12T21:52:42Z" latency=637.907431ms method=POST remote="10.42.0.35:48966" request=/hook request-id=1WJRy5BjtYXLDgZwl6Axho9zJxw
time="2020-01-12T21:52:42Z" level=debug msg="manager: request queue item" arch= kernel= kind=pipeline os= type=kubernetes variant=
time="2020-01-12T21:52:42Z" level=debug msg="manager: accept stage" machine=drone-runner-arm stage-id=298
time="2020-01-12T21:52:42Z" level=debug msg="manager: accept stage" machine=drone-runner-arm stage-id=298
time="2020-01-12T21:52:42Z" level=debug msg="manager: request queue item" arch= kernel= kind=pipeline os= type=kubernetes variant=
time="2020-01-12T21:52:42Z" level=debug msg="manager: stage accepted" machine=drone-runner-arm stage-id=298
time="2020-01-12T21:52:42Z" level=debug msg="manager: fetching stage details" step-id=298
time="2020-01-12T21:52:42Z" level=debug msg="manager: stage processed by another agent" error="Optimistic Lock Error" machine=drone-runner-arm stage-id=298
time="2020-01-12T21:52:43Z" level=debug msg="manager: updating step status" step.id=938 step.name=greeting step.status=running
time="2020-01-12T21:52:51Z" level=debug fields.time="2020-01-12T21:52:51Z" latency=10.049557ms method=GET remote="10.42.2.1:56438" request=/ request-id=1WJRzKam6xAZSiNvYPJgzWhkVEd
time="2020-01-12T21:53:01Z" level=debug fields.time="2020-01-12T21:53:01Z" latency=9.669838ms method=GET remote="10.42.2.1:56440" request=/ request-id=1WJS0XmTcEcbdqMXvpjWRovYphK
time="2020-01-12T21:53:04Z" level=debug msg="manager: updating step status" step.id=938 step.name=greeting step.status=success
time="2020-01-12T21:53:04Z" level=debug msg="manager: updating step status" step.id=939 step.name=foo step.status=running
time="2020-01-12T21:53:11Z" level=debug fields.time="2020-01-12T21:53:11Z" latency=12.448756ms method=GET remote="10.42.2.1:56442" request=/ request-id=1WJS1lsbQJN8gF1qjmL6yX3vrmR
time="2020-01-12T21:53:12Z" level=debug msg="manager: context canceled" arch= kernel= kind=pipeline os= type=kubernetes variant=
time="2020-01-12T21:53:14Z" level=debug msg="manager: updating step status" step.id=939 step.name=foo step.status=success
time="2020-01-12T21:53:14Z" level=debug msg="manager: updating step status" step.id=940 step.name=bar step.status=running
time="2020-01-12T21:53:20Z" level=debug msg="manager: updating step status" step.id=940 step.name=bar step.status=success
time="2020-01-12T21:53:20Z" level=debug msg="manager: updating step status" step.id=941 step.name=clone-google-kaniko step.status=running
time="2020-01-12T21:53:21Z" level=debug fields.time="2020-01-12T21:53:21Z" latency=9.803758ms method=GET remote="10.42.2.1:56446" request=/ request-id=1WJS2zScvQhrLVVTOnlNJeOlFZg
time="2020-01-12T21:53:22Z" level=debug msg="manager: request queue item" arch= kernel= kind=pipeline os= type=kubernetes variant=
time="2020-01-12T21:53:31Z" level=debug fields.time="2020-01-12T21:53:31Z" latency=8.488093ms method=GET remote="10.42.2.1:56448" request=/ request-id=1WJS4Hh5xMphfp8ZtruxycIv4An
time="2020-01-12T21:53:37Z" level=debug msg="manager: updating step status" step.id=941 step.name=clone-google-kaniko step.status=success
time="2020-01-12T21:53:38Z" level=debug msg="manager: stage is complete. teardown" stage.id=298
time="2020-01-12T21:53:38Z" level=debug msg="manager: build is finished, teardown" build.id=307 build.number=12 repo.id=30 stage.id=298
time="2020-01-12T21:53:41Z" level=debug fields.time="2020-01-12T21:53:41Z" latency=9.73309ms method=GET remote="10.42.2.1:56450" request=/ request-id=1WJS5WLoXgWsLgjf73yU2juipca

A selection of the output of the runner (from the first step entering the running state through posting its logs):

{"id":298,"repo_id":0,"build_id":307,"number":1,"name":"default","kind":"pipeline","type":"kubernetes","status":"running","errignore":false,"exit_code":0,"machine":"drone-runner-arm","os":"linux","arch":"arm64","started":1578865963,"stopped":0,"created":1578865962,"updated":1578865963,"version":3,"on_success":true,"on_failure":false,"steps":[{"id":938,"step_id":298,"number":1,"name":"greeting","status":"pending","exit_code":0,"version":1},{"id":939,"step_id":298,"number":2,"name":"foo","status":"pending","exit_code":0,"version":1},{"id":940,"step_id":298,"number":3,"name":"bar","status":"pending","exit_code":0,"version":1},{"id":941,"step_id":298,"number":4,"name":"clone-google-kaniko","status":"pending","exit_code":0,"version":1}]}
time="2020-01-12T21:52:43Z" level=debug msg="updated stage to running" build.id=307 build.number=12 repo.id=30 repo.name=drone-test repo.namespace=Rick-Jongbloed stage.id=298 stage.name=default stage.number=1 thread=4
PUT /rpc/v2/step/938 HTTP/1.1
Host: drone-controller
User-Agent: Go-http-client/1.1
Content-Length: 120
X-Drone-Token: 33eb94a450d8075023820bab4177d951
Accept-Encoding: gzip

{"id":938,"step_id":298,"number":1,"name":"greeting","status":"running","exit_code":0,"started":1578865963,"version":1}
HTTP/1.1 200 OK
Content-Length: 120
Cache-Control: no-cache, no-store, must-revalidate, private, max-age=0
Content-Type: text/plain; charset=utf-8
Date: Sun, 12 Jan 2020 21:52:43 GMT
Expires: Thu, 01 Jan 1970 00:00:00 UTC
Pragma: no-cache
X-Accel-Expires: 0

{"id":938,"step_id":298,"number":1,"name":"greeting","status":"running","exit_code":0,"started":1578865963,"version":2}
POST /rpc/v2/step/938/logs/batch HTTP/1.1
Host: drone-controller
User-Agent: Go-http-client/1.1
Content-Length: 30
X-Drone-Token: 33eb94a450d8075023820bab4177d951
Accept-Encoding: gzip

[{"pos":0,"out":"","time":6}]
HTTP/1.1 200 OK
Content-Length: 0
Cache-Control: no-cache, no-store, must-revalidate, private, max-age=0
Date: Sun, 12 Jan 2020 21:52:51 GMT
Expires: Thu, 01 Jan 1970 00:00:00 UTC
Pragma: no-cache
X-Accel-Expires: 0

POST /rpc/v2/step/938/logs/upload HTTP/1.1
Host: drone-controller
User-Agent: Go-http-client/1.1
Content-Length: 30
X-Drone-Token: 33eb94a450d8075023820bab4177d951
Accept-Encoding: gzip

[{"pos":0,"out":"","time":6}]
HTTP/1.1 200 OK
Content-Length: 0
Cache-Control: no-cache, no-store, must-revalidate, private, max-age=0
Date: Sun, 12 Jan 2020 21:53:04 GMT
Expires: Thu, 01 Jan 1970 00:00:00 UTC
Pragma: no-cache
X-Accel-Expires: 0

PUT /rpc/v2/step/938 HTTP/1.1
Host: drone-controller
User-Agent: Go-http-client/1.1
Content-Length: 141
X-Drone-Token: 33eb94a450d8075023820bab4177d951
Accept-Encoding: gzip

@Rick-Jongbloed the controller is used by the legacy Kubernetes integration that was deprecated some time ago. People sometimes end up installing the deprecated Kubernetes integration by using an unofficial installation method, like Helm, which is not kept up to date and defaults to deprecated features.

I recommend installing the Kubernetes runner using the official installation instructions, or a forked Helm chart that is more actively maintained by some of our community members: https://github.com/HighwayofLife/helm-charts-drone

Thanks for your reply. I don’t use Helm; I created the deployment based on the official documentation.

kind: Deployment
apiVersion: apps/v1
metadata:
  name: drone-runner
  labels:
    app.kubernetes.io/name: drone
    component: runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: drone
      component: runner
  template:
    metadata:
      labels:
        app.kubernetes.io/name: drone
        component: runner
    spec:
      nodeSelector:
        kubernetes.io/arch: arm
        device: RaspberryPi4
      serviceAccountName: drone
      containers:
      - name: runner
        image: drone/drone-runner-kube:latest
        imagePullPolicy: Always
        envFrom:
        - configMapRef:
            name: drone-runner-arm-config
        ports:
        - name: management
          containerPort: 3000
          protocol: TCP

And the configmap:

apiVersion: v1
kind: ConfigMap
metadata:
  name: drone-runner-arm-config
data:
  DRONE_RUNNER_NAME: "drone-runner-arm"
  DRONE_RPC_HOST: drone-controller
  DRONE_RPC_PROTO: http
  DRONE_RPC_SECRET: ------------------
  DRONE_TRACE: "true"
  DRONE_DEBUG: "true"
  DRONE_RPC_DUMP_HTTP: "true"
  DRONE_RPC_DUMP_HTTP_BODY: "true"
  DRONE_UI_USERNAME: "----"
  DRONE_UI_PASSWORD: "----"
  DRONE_HTTP_BIND: ":3000"

Sorry, I was thrown off by the term “controller”, which has a special meaning: it refers to the legacy implementation and the legacy drone/controller image. In the context of this thread, however, “controller” refers to the server.

For debugging purposes, you can find the logging code here: https://github.com/drone-runners/drone-runner-kube/blob/master/engine/engine_impl.go#L224:L244
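
For orientation, that code follows the standard client-go pattern of opening a follow stream on the step container and copying it into the step’s log writer. A condensed sketch, assuming a recent k8s.io/client-go; the names here are illustrative, not the verbatim source:

package engine

import (
    "context"
    "io"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/client-go/kubernetes"
)

// tail opens a follow stream on a step container and copies its output
// to w. Condensed sketch of the pattern in engine_impl.go.
func tail(ctx context.Context, client kubernetes.Interface, namespace, pod, container string, w io.Writer) error {
    opts := &corev1.PodLogOptions{Container: container, Follow: true}
    stream, err := client.CoreV1().Pods(namespace).GetLogs(pod, opts).Stream(ctx)
    if err != nil {
        return err
    }
    defer stream.Close()
    // Bytes read here are split into lines and become the
    // /rpc/v2/step/{id}/logs/batch payloads seen in the dumps above.
    _, err = io.Copy(w, stream)
    return err
}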

Thanks, I’ll dive in deeper and try to extend the logging in this function.

This is also happening to me. I’ll see if I can figure something out over the next few days, unless you crack it for us, @Rick-Jongbloed!

I just want to confirm that I cloned the latest commit:

commit 335e9ef11947520674aba977b5f4a5879f12555f (HEAD -> master, origin/master)
Merge: 7932592 926c682
Author: Brad Rydzewski <brad.rydzewski@gmail.com>
Date:   Thu Jan 2 13:33:09 2020 -0500

    Merge pull request #19 from sethpollack/VersionedParams
    
    Fix logs

I compiled the binary:

go build

I created the following yaml for testing purposes:

kind: pipeline
type: kubernetes
name: default

clone:
  disable: true

steps:
- name: test
  image: alpine:3.10
  commands:
  - echo hello
  - sleep 5
  - echo world
  - sleep 5
  - echo goodbye
  - sleep 5
  - echo world

I ran the following command to execute the pipeline:

./drone-runner-kube exec --kubeconfig=${HOME}/.kube/config

and received the following output:

[test:1] + echo hello
[test:2] hello
[test:3] + sleep 5
[test:4] + echo world
[test:5] world
[test:6] + sleep 5
[test:7] + echo goodbye
[test:8] goodbye
[test:9] + sleep 5
[test:10] + echo world
[test:11] world
[test:12] 

Based on my testing, I cannot reproduce any issues with logging. I wish I could provide more assistance, but I cannot fix what I cannot reproduce, so I will need to rely on those who can reproduce it to dig into the code and publish their findings.

I have the same problem as all the others. No errors are logged by the Drone server or the kube runner.

Currently I suspect that the ARM64 image is somehow broken, since @lnattrass showed metadata that points to ARM64 and I am running ARM64 as well. @bradrydzewski, you tried to verify using amd64, I guess.

Logging is visible neither in the kube runner UI nor in the Drone server pipeline view. The actual build pods are logging to stdout.

Should the output be visible in stdout of the kube-runner pod as well?

@bradrydzewski I’ll test with the command you’ve posted and report back. I wonder if it’s kubernetes version related (I’m on 1.17)

@evrflx For what it’s worth, I’m on the amd64 platform

Investigating further, I’ve managed to reproduce it by executing the kubectl logs command against the pod at the exact moment it becomes “Ready”.

You can see that I get no response, not even an error. This looks like a race condition in Kubernetes, so I’ll see if I can find something over there…

See below: the first command is run the moment the pod becomes “Ready”; the second, run immediately afterwards, returns the expected log output:

$ kubectl logs -f -n default -c drone-j19473utzj5fwoxm0v91 drone-24wz7rcis6gtmnl8ldv3
$ 
$ kubectl logs -f -n default -c drone-j19473utzj5fwoxm0v91 drone-24wz7rcis6gtmnl8ldv3
+ echo hello
hello
+ sleep 5
^C
$
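
If that is the race, one defensive pattern, assuming the runner tails logs with client-go as in the sketch earlier in the thread, is to retry when the stream opens but yields zero bytes. This is purely a sketch of the idea; the eventual patch may take a different approach:

// Retry sketch for the empty-stream race: if the follow stream opens but
// returns no bytes, assume the kubelet was not ready to serve logs yet,
// back off briefly, and try again. Builds on the hypothetical tail()
// helper sketched earlier in this thread; add "time" to that file's imports.
func tailWithRetry(ctx context.Context, client kubernetes.Interface, namespace, pod, container string, w io.Writer) error {
    for attempt := 0; attempt < 5; attempt++ {
        cw := &countingWriter{w: w}
        if err := tail(ctx, client, namespace, pod, container, cw); err != nil {
            return err
        }
        if cw.n > 0 {
            return nil // real output arrived; done
        }
        time.Sleep(time.Second)
    }
    return nil // gave up after retries; surface whatever (possibly empty) logs we have
}

// countingWriter counts bytes written so we can detect an empty stream.
type countingWriter struct {
    w io.Writer
    n int64
}

func (c *countingWriter) Write(p []byte) (int, error) {
    n, err := c.w.Write(p)
    c.n += int64(n)
    return n, err
}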

Ok, I’ve made a PR that may address the issue we’re seeing:


I am running Kubernetes version 1.14.6 and am not seeing any issues with logging. Is it possible there is a regression in Kubernetes, or perhaps an issue with the specific distribution you are running?

@lnattrass, thanks for taking the time to research and send a patch. I applied the patch and can confirm that everything is working as expected (everything was already working for me, but this confirms there was no regression :slight_smile:). A new drone-runner-kube image is available, so hopefully folks can pull the latest image and give it a try.


Thanks all, I can confirm it’s working now!

I can verify the solution as well. Just tested it with the drone-runner-kube:linux-amd64 tag.