Using exit code 78 with Kubernetes runner doesn't stop execution

webvictim · August 26, 2020, 1:54pm

When using the Kubernetes runner, it seems that if you use exit code 78 to exit a pipeline early (as described here), the pipeline doesn’t actually stop running or exit early at all. Here are some screenshots with timings:

~11 minutes is the normal time for this pipeline to run when steps aren’t being skipped. It seems that the total time for the pipeline to complete is exactly the same, regardless of whether the pipeline exits early with exit code 78 or whether steps run to completion.

My expectation would be that using exit code 78 to exit early would exit the pipeline with success immediately and prevent any future steps from being executed.

My reason for using it here is to speed up the CI process by skipping expensive tests when no code changes have been made that require them to run. Unfortunately, this doesn’t currently seem to be achievable with Drone. This looks like a bug.
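As a rough sketch of the kind of check I mean, assuming the expensive tests only matter when files under a hypothetical `src/` directory change: the helper name `exit_code_for_changes` and the `src/` prefix are made up for illustration, and which commits to compare is an assumption that depends on the build.

```shell
#!/bin/sh
# Hypothetical helper: reads changed file paths on stdin (one per line)
# and prints the exit code the step should use.
#   78 -> nothing relevant changed, ask Drone to stop the pipeline early
#   0  -> relevant changes found, carry on to the expensive tests
exit_code_for_changes() {
  prefix="$1"   # assumed path prefix, e.g. "src/"
  if grep -q "^${prefix}"; then
    echo 0
  else
    echo 78
  fi
}

# Possible use inside a step's commands (the commit range is an assumption):
#   code=$(git diff --name-only HEAD~1 HEAD | exit_code_for_changes "src/")
#   if [ "$code" -eq 78 ]; then exit 78; fi
```

The helper only decides on the exit code. It is the `exit 78` in the step itself that is supposed to make Drone skip the remaining steps.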

bradrydzewski · August 26, 2020, 2:18pm

I ran a quick test and the early exit functionality worked as expected:

Are you using the latest release of the Kubernetes runner? If not, please upgrade before testing again. https://docs.drone.io/runner/kubernetes/installation/

You can find the relevant code responsible for skipping the remaining steps here:

https://github.com/drone/runner-go/blob/98e945f20c7ba4fe059020559679c61465cfb883/pipeline/runtime/execer.go#L207:L219

Note that if you are running steps in parallel, the system will skip subsequent steps, but in-progress steps will be allowed to complete.
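Roughly, the sequential behaviour described above can be pictured like this. It is an illustrative shell sketch, not the runner-go code: `simulate_pipeline` and its `name:exit_code` argument format are invented for the example, the step names are placeholders, and parallel steps are not modelled.

```shell
#!/bin/sh
# Illustrative only: NOT the runner-go implementation.
# Each argument is a hypothetical "name:exit_code" pair, run in order.
# Once a step exits with 78, every later step is reported as skipped.
simulate_pipeline() {
  skipping=0
  for step in "$@"; do
    name="${step%%:*}"   # text before the first colon
    code="${step##*:}"   # text after the last colon
    if [ "$skipping" -eq 1 ]; then
      echo "$name: skipped"
      continue
    fi
    echo "$name: ran (exit $code)"
    if [ "$code" -eq 78 ]; then
      skipping=1
    fi
  done
}

simulate_pipeline "first:0" "second:78" "third:0"
```

With these placeholder arguments, the first two steps are reported as having run and the third as skipped.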

webvictim · August 26, 2020, 4:42pm

Thanks for the reply.

Looks like I was on drone/drone-runner-kube:1.0.0-beta.1. I’ve updated to drone/drone-runner-kube:1.0.0-beta.4 (which is the latest from the Helm chart), but the issue persists.

I think this is related to the use of a services: entry. I can reproduce the issue with a simple variation on the .drone.yml you used:

---
kind: pipeline
type: kubernetes
name: test

steps:
  - name: test
    image: alpine:3.8
    commands:
      - echo hello
      - exit 78

  - name: test2
    image: alpine:3.8
    commands:
      - echo hello
      - sleep 60
      - echo world

services:
  - name: Start Docker
    image: docker:dind
    privileged: true
    volumes:
      - name: dockersock
        path: /var/run

volumes:
  - name: dockersock
    temp: {}

You can see from this screenshot that instead of exiting the whole pipeline early after test exits with code 78, the test2 step still runs and sleeps for the full 60 seconds before exiting. You can even see the output of the job in the log window.

My expected behaviour here would be that as soon as test2 is skipped, the service should shut down and the pipeline should end immediately without any subsequent steps running.

webvictim · October 15, 2020, 2:56pm

FYI I tried this test case again just now using drone/drone-runner-kube:latest and the same issue I reported previously still exists when using a services: declaration. It seems this particular bug wasn’t fixed by any changes to the execer.

As before, the steps are marked as skipped in the UI, but they still continue to run in the background and the entire pipeline does not return a success/failure status until all the subsequent steps have executed in full. It seems like the skip status is not being propagated correctly when a services: declaration is used.

Exit code 78 from test after 8 seconds, but the full pipeline still takes 1m12 to execute:

Logs from docker:dind showing it doesn’t exit until 71 seconds (1m11) in:

bradrydzewski · October 15, 2020, 3:12pm

I think we can rule out issues with the execer code, since it is shared by all runners and we cannot reproduce the problem with the docker runner [1][2]. This seems to imply the issue is isolated to the kubernetes runner. As the kubernetes runner is currently in beta, we are asking that individuals who can consistently reproduce issues consider sending pull requests for improvements. We provide a contributing guide here.

[1] https://cloud.drone.io/drone/hello-world/185
[2] https://github.com/drone/hello-world/blob/dbb02e1f07669b50e1739751b2348c7a4495d957/.drone.yml

webvictim · October 15, 2020, 4:37pm

I just tested with the Docker runner and it does exactly the same thing as the Kubernetes runner does - i.e. it fails to exit early on code 78 when you use a services: declaration.

I copied your test pipeline from the drone/hello-world .drone.yml at commit dbb02e1f07669b50e1739751b2348c7a4495d957 and just added a sleep 60 before echo goodbye:

kind: pipeline
name: build

steps:
- name: hello
  image: alpine:3.8
  commands:
  - echo hello

- name: world
  image: alpine:3.8
  commands:
  - echo world
  - exit 78

- name: goodbye
  image: alpine:3.8
  commands:
  - sleep 60
  - echo goodbye

services:
- name: redis
  image: redis

You can see in the UI that although the step says it was skipped, the pipeline is still running:

Then when the pipeline does finish, you can refresh and see the container’s output - proving it still ran despite exit 78:

Docker runner start command:

docker run -d \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -e DRONE_RPC_PROTO=https \
  -e DRONE_RPC_HOST=<host> \
  -e DRONE_RPC_SECRET=<secret> \
  -e DRONE_RUNNER_CAPACITY=1 \
  -e DRONE_RUNNER_NAME=${HOSTNAME} \
  -p 3000:3000 \
  --restart always \
  --name drone-runner \
  drone/drone-runner-docker:1

I think that your test pipeline steps ran too quickly to see the issue. If you add a sleep 60 to the step that follows the exit 78, you should be able to reproduce it too. There is definitely an issue with exit 78 and services: when using both the Docker and Kubernetes runners.

bradrydzewski · October 15, 2020, 6:15pm

Thanks, I was able to hunt down the root cause, which was a flaw in how we calculated whether or not a step was cancelled. This issue only manifested when pipeline services were running, although the root cause was not related to services themselves. The way we calculate and subsequently skip steps was improved, and the runners were updated accordingly. Download the :latest versions of the runner images to get this fix.

webvictim · October 15, 2020, 6:16pm

Thanks! I’ll try it out.

bradrydzewski · October 15, 2020, 6:17pm

The images are still building in CI, but should be published to DockerHub in the next 5-10 minutes.

masterkain · October 15, 2020, 6:28pm

I can only test with uniquely tagged releases. Will there be a beta5? Thanks!

webvictim · October 15, 2020, 7:33pm

This feature works exactly as it should for me now - thanks for taking a look and fixing it. Much appreciated!
