After upgrading Drone-runner-kube to 1.0.0-beta.12, the pipeline gets stuck at a step that fails, even though no other steps depend on it. The step gets an Error 1 but keeps running.
Expected behavior with Drone-runner-kube 1.0.0-beta.9 was:
As a bonus, after manually stopping the build, the pod stays around and doesn’t get terminated.
Please enable trace level logging and then post the runner logs for the drone developers to analyze (you can post to gist / pastebin / etc and provide a link in this thread) as well as a yaml file that can be used to reproduce.
Here’s what happened after I manually canceled the build. This time the pod was terminated, but before it wasn’t.
Yet the build still got stuck at the same step.
In the runner’s log, I found that the step “check-dependencies” (container=drone-3e4r5l615rtj4qstaa6x) never finished. It was launched at “2021-09-15T21:03:05Z” and successfully started two seconds later, but a Kubernetes event for the step termination never arrived. From the screenshot I can see the step was still running at “2021-09-15T21:14:25Z” when the pod was deleted (after you canceled the build).
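To put a number on that window, here is a minimal Python sketch that computes how long the step had been running when the pod was deleted, using only the two timestamps quoted from the log. The helper name `elapsed_seconds` is made up for illustration.

```python
from datetime import datetime

# Timestamp layout used in the runner log lines quoted above.
FMT = "%Y-%m-%dT%H:%M:%SZ"


def elapsed_seconds(start: str, end: str) -> int:
    """Return the whole seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


launched = "2021-09-15T21:03:05Z"     # step launched
pod_deleted = "2021-09-15T21:14:25Z"  # pod deleted after the build was canceled

print(elapsed_seconds(launched, pod_deleted))  # 680, i.e. 11 min 20 s
```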
What was the step doing? Is it possible that the step itself got stuck, and not the runner? The log looks normal. Remember, there’s no timeout for execution of steps.
Also, it’s weird that the first time the runner failed to delete a pod. What version of Kubernetes are you using?
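On the point that steps have no execution timeout: if a step's own command might hang, one way to rule that out is to have the step enforce a deadline itself. The following is only a sketch, not something the runner provides. The function name `run_with_timeout` and the exit code 124 (borrowed from the convention of the GNU `timeout` tool) are assumptions chosen for illustration.

```python
import subprocess

# Exit code reported when the deadline is hit (assumed convention, see above).
TIMEOUT_EXIT_CODE = 124


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return its exit code, or TIMEOUT_EXIT_CODE on timeout.

    subprocess.run kills the child process when the timeout expires and
    then raises TimeoutExpired, so a hung command cannot block forever.
    """
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        return TIMEOUT_EXIT_CODE
```

A step script could call `sys.exit(run_with_timeout([...], 300))` so that a command which fails still reports its real exit code, while a command which hangs is killed and reported as a failure after five minutes.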
Sorry, now I see I blacked out all the lines. The step did finish with Error 1. It is meant to fail; it’s just there to display indirect dependencies for our devs. The expected behaviour is for it to fail and for the pipeline to continue with the next step.
With beta.12 it stays stuck in this state indefinitely.
Okay, it looks like the steps that finish with an error fail successfully.
However, we have now discovered a different bug with a step that fails: at the end, the Drone UI displays this error message:
During the run, the logs are displayed as usual, but once the run has finished the UI displays this message.
I was monitoring the container logs on k8s during the build, and they end with these lines:
We’d expect these lines to be displayed in the Drone server UI, but unfortunately we only get the error message shown in the picture above.
Note: The pod IDs don’t match because the screenshot with the logs was taken on a pod triggered with kube-runner beta.12 while I was testing the previous behavior. In any case, the logs were the same on the pod that produced the error in the first picture. This behavior is happening with drone-runner-kube:latest.
Failing to display the logs could be an issue with the user interface as opposed to an issue with the runner. Can you see the logs if you manually try to access the log endpoint in your browser? The log endpoint follows this pattern: