Latest drone-runner-kube fails successful steps and does not display output

Using the new runner and Drone 2.0.0 (the proper tagged version), I’m having an issue where any pipeline that has failed steps does not show the log output in the UI. I can inspect the XHR request and see that it successfully returns JSON with the log lines, but the UI only displays the red error banner on all steps. EDIT: looking a bit closer, these steps actually appear to have succeeded, judging from the output.

I’d also like to provide all the requested information for debugging, but I can’t share full logs publicly. Is there an alternative way I can provide that information?

@mattlqx thanks for moving this to a separate thread. Please send me a private message with the build and step data in JSON format and I will pass it along to the team. You can retrieve the build and step data from /api/repos/{organization}/{repo}/builds/{build}. The team may follow up with you tomorrow if they require additional details.

EDIT also thanks for testing the change so quickly and giving feedback, we really appreciate it
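For anyone else gathering the same data, here is a minimal sketch of pulling the build and step JSON from the endpoint mentioned above. The server address, repository names, and the `DRONE_TOKEN` environment variable are placeholders, and it assumes you authenticate with a personal token sent as a Bearer header; adjust to your own setup.

```python
import json
import os
import urllib.request


def build_url(server, organization, repo, build):
    """Return the build endpoint URL for the given server and build number."""
    return f"{server.rstrip('/')}/api/repos/{organization}/{repo}/builds/{build}"


def fetch_build(server, organization, repo, build, token):
    """Fetch the build (including its step data) as a Python dict."""
    request = urllib.request.Request(
        build_url(server, organization, repo, build),
        # Assumption: a personal API token passed as a Bearer token.
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Placeholder values; replace with your own server, repo, and build number.
    data = fetch_build(
        "https://drone.example.com", "my-org", "my-repo", 42,
        os.environ["DRONE_TOKEN"],
    )
    print(json.dumps(data, indent=2))
```

Redact anything sensitive (step names, secrets in commands) before sending the output along.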

Marco, we are seeing an "unknown container" error and I believe this error may be coming from the Kubernetes API. Matt provided an example where we see the step fail just a few seconds after it is started (maybe a problem making an API call to update the image or stream the logs):

                {
                    "id": 1234,
                    "step_id": 5678,
                    "number": 6,
                    "name": "[redacted]",
                    "status": "error",
                    "error": "unknown container",
                    "exit_code": 255,
                    "started": 1623195715,
                    "stopped": 1623195724
                },

Matt also provided an example, pictured in his screenshot above, of a step that runs successfully for 2 minutes and exits with a non-zero exit code, but then still appears to return an unknown container error:

                {
                    "id": 1234,
                    "step_id": 5678,
                    "number": 7,
                    "name": "test-unit",
                    "status": "error",
                    "error": "unknown container",
                    "exit_code": 255,
                    "started": 1623194436,
                    "stopped": 1623194556
                },
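The "few seconds" and "2 minutes" figures can be read straight off the `started` and `stopped` fields, which look like Unix timestamps in seconds. A small sketch using the two excerpts above (the helper name is made up for illustration):

```python
def step_duration_seconds(step):
    """Return how long a step ran, assuming started/stopped are epoch seconds."""
    return step["stopped"] - step["started"]


# The two steps quoted above, trimmed to the fields used here.
quick_failure = {
    "number": 6,
    "status": "error",
    "error": "unknown container",
    "started": 1623195715,
    "stopped": 1623195724,
}
slow_failure = {
    "number": 7,
    "name": "test-unit",
    "status": "error",
    "error": "unknown container",
    "started": 1623194436,
    "stopped": 1623194556,
}

for step in (quick_failure, slow_failure):
    print(step["number"], step["error"], step_duration_seconds(step), "seconds")
# Step 6 ran for 9 seconds, step 7 for 120 seconds.
```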

While testing the latest version of the runner, I’m facing this situation too:

"steps":[
            {
               "id":124861,
               "step_id":22423,
               "number":1,
               "name":"clone",
               "status":"error",
               "error":"unknown container",
               "exit_code":255,
               "started":1623225456,
               "stopped":1623225462,
               "version":4
            },

@marko-gacesa submitted a pull request that was merged to master. An updated image is available for anyone that wants to test the fix:

We are going to be heavily iterating in master so please treat master / latest as an unstable build and don’t hesitate to report issues to Discourse. We will target the week of June 21 to tag our first release candidate so that teams can deploy the runner without having to use the latest image.

Ran a build with the new version (3f8d5c34567e), same issue but slightly different build/step data.

Here’s an excerpt. The error is now “pod is terminated”:

        {
          "id": 415974,
          "step_id": 50444,
          "number": 3,
          "name": "rebase",
          "status": "success",
          "exit_code": 0,
          "started": 1623268649,
          "stopped": 1623268653,
          "version": 4
        },
        {
          "id": 415975,
          "step_id": 50444,
          "number": 4,
          "name": "plan",
          "status": "error",
          "error": "pod is terminated",
          "exit_code": 255,
          "started": 1623268653,
          "stopped": 1623269027,
          "version": 4
        },
        {
          "id": 415976,
          "step_id": 50444,
          "number": 5,
          "name": "infracost",
          "status": "skipped",
          "errignore": true,
          "exit_code": 0,
          "started": 1623269027,
          "stopped": 1623269027,
          "version": 3
        }
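When the build JSON is long, a short script can pick out the steps that need attention. This is only a sketch: it takes a plain list of step objects like the excerpt above, and it assumes that `error` and `failure` are the status values worth flagging. The function name is illustrative, not part of any Drone tooling.

```python
import json

# The excerpt above, wrapped in [] so it parses as a JSON array.
EXCERPT = """
[
  {"id": 415974, "step_id": 50444, "number": 3, "name": "rebase",
   "status": "success", "exit_code": 0,
   "started": 1623268649, "stopped": 1623268653, "version": 4},
  {"id": 415975, "step_id": 50444, "number": 4, "name": "plan",
   "status": "error", "error": "pod is terminated", "exit_code": 255,
   "started": 1623268653, "stopped": 1623269027, "version": 4},
  {"id": 415976, "step_id": 50444, "number": 5, "name": "infracost",
   "status": "skipped", "errignore": true, "exit_code": 0,
   "started": 1623269027, "stopped": 1623269027, "version": 3}
]
"""


def failed_steps(steps):
    """Return (number, name, error) for each step that did not finish cleanly."""
    # Assumption: "error" and "failure" are the statuses of interest.
    return [
        (step["number"], step["name"], step.get("error", ""))
        for step in steps
        if step.get("status") in ("error", "failure")
    ]


for number, name, error in failed_steps(json.loads(EXCERPT)):
    print(f"step {number} ({name}): {error}")
```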

Thanks @mattlqx

@marko-gacesa is continuing to iterate and has pushed a fix: https://github.com/drone-runners/drone-runner-kube/pull/57

We are going to set up a test cluster in Google Cloud (GCP) with a bunch of cron jobs to launch pipelines, so we can hit the Kubernetes runner with a steady volume of builds and see if we can expose any edge cases or race conditions. But in the meantime, please keep the feedback coming :slight_smile:

So far so good on this one. I’ve kicked off around 10 builds and no unexpected failures so far. Though an exact number is hard to get with the Drone 2.0 interface. :wink: (since it buries activity quite a bit compared to 1.x)

Thanks for the support @bradrydzewski and @marko-gacesa.

(I made this post earlier but realized I hadn’t actually been running the new runner, but after putting a few builds through the new one, it is true. :slight_smile: )

Thanks @mattlqx! This is really helpful.