When using the Kubernetes runner, it seems that if you use exit code 78 to exit a pipeline early (as described here), the pipeline doesn’t actually stop running or exit early at all. Here are some screenshots with timings:
~11 minutes is the normal time for this pipeline to run when steps aren’t being skipped. It seems that the total time for the pipeline to complete is exactly the same, regardless of whether the pipeline exits early with exit code 78 or whether steps run to completion.
My expectation would be that using exit code 78 to exit early would exit the pipeline with success immediately and prevent any future steps from being executed.
My reason for using it here is to make the CI process quicker by not running expensive tests if no changes have been made to the code which require them to run - unfortunately this doesn’t seem to currently be achievable with Drone. This looks like a bug.
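For context, a minimal sketch of the kind of guard step I mean is below. The step name, the `alpine/git` image, the `src/` path, and the `HEAD~1` comparison are all placeholders for illustration, not my real config, and it assumes the clone has enough history for `HEAD~1` to resolve:

```yaml
kind: pipeline
type: kubernetes
name: default

steps:
# Hypothetical guard step: exit with code 78 when nothing relevant changed,
# which is meant to stop the pipeline early and mark it as successful.
- name: check-changes
  image: alpine/git
  commands:
  - |
    if git diff --quiet HEAD~1 HEAD -- src/; then
      echo "no relevant changes, skipping remaining steps"
      exit 78
    fi

# Placeholder for the expensive tests that should only run after real changes.
- name: expensive-tests
  image: alpine
  commands:
  - echo "running expensive tests"
```

With early exit working as expected, `expensive-tests` should never start when `check-changes` exits with 78.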
Looks like I was on drone/drone-runner-kube:1.0.0-beta.1. I’ve updated to drone/drone-runner-kube:1.0.0-beta.4 (which is the latest from the Helm chart), but the issue persists.
I think this is related to the use of a services: entry. I can reproduce the issue with a simple variation on the .drone.yml you used:
You can see from this screenshot that instead of exiting the whole pipeline early after test exits with code 78, the test2 step still runs and sleeps for the full 60 seconds before exiting. You can even see the output of the job in the log window.
My expected behaviour here would be that as soon as test2 is skipped, the service should shut down and the pipeline should end immediately without any subsequent steps running.
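For anyone who wants to try this, a pipeline of roughly the following shape should be enough to reproduce it. This is a sketch based on the description above rather than the exact file: the `alpine` image, the service name, and `privileged: true` (which I'm assuming `docker:dind` needs, and which requires a trusted repository) are my own choices.

```yaml
kind: pipeline
type: kubernetes
name: default

steps:
# First step exits with code 78, which should end the pipeline early.
- name: test
  image: alpine
  commands:
  - exit 78

# Second step should be skipped; the sleep makes it obvious if it still runs.
- name: test2
  image: alpine
  commands:
  - sleep 60
  - echo "test2 ran to completion"

# The issue only seems to appear when a services: entry is present.
services:
- name: docker
  image: docker:dind
  privileged: true
```

Removing the `services:` block is a quick way to compare: without it, I'd expect the pipeline to finish almost immediately after `test` exits.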
FYI I tried this test case again just now using drone/drone-runner-kube:latest and the same issue I reported previously still exists when using a services: declaration. It seems this particular bug wasn’t fixed by any changes to the execer.
As before, the steps are marked as skipped in the UI, but they still continue to run in the background and the entire pipeline does not return a success/failure status until all the subsequent steps have executed in full. It seems like the skip status is not being propagated correctly when a services: declaration is used.
Exit code 78 from test after 8 seconds, but the full pipeline still takes 1m12 to execute:
Logs from docker:dind showing it doesn’t exit until 71 seconds (1m11) in:
I think we can rule out issues with the execer code, since it is shared by all runners and we cannot reproduce the problem with the docker runner [1][2]. This seems to imply the issue is isolated to the kubernetes runner. As the kubernetes runner is currently in beta, we are asking that individuals who can consistently reproduce issues consider sending pull requests for improvements. We provide a contributing guide here.
I just tested with the Docker runner and it does exactly the same thing as the Kubernetes runner does - i.e. it fails to exit early on code 78 when you use a services: declaration.
I think that your test pipeline steps ran too quickly to see the issue. If you add a sleep 60 into the second step, you should be able to reproduce it too. There is definitely an issue with exit 78 and services: when using both the Docker and Kubernetes runners.
Thanks, I was able to hunt down the root cause, which was a flaw in how we calculated whether or not a step was cancelled. The issue only manifested when pipeline services were running, although the root cause was not related to services themselves. The way we calculate and subsequently skip steps was improved, and the runners were updated accordingly. Download the :latest versions of the runner images to get this fix.