The problem is that this is reported as a status 200 back to GitHub and the run is skipped silently. On our side we see it as the run simply not being triggered (the status check is not created), but the commit has a green check mark from our other actions.
Since the runs perform some critical business functions for us (synchronising Kubernetes manifests to be deployed by Flux), we really can’t have silent failures to execute. We need a way to have these conversion plugin failures surfaced so we can take action. I had two ideas:
Report a non-200 status. (But I’m not sure what GitHub would do with that.)
Report a created build but have it fail with the same error message. In our case we have alerting configured for this already so on-call engineers would be notified of the failure.
I am reading through this thread, and it sounds like Drone is creating an entry for the Build in the Drone database, with an error status, which is visible in the user interface (per the screenshot you provided). A silent failure would generally imply that no build entry is created, and you have no way to know there was an error (other than looking at the logs). Just to ensure I’m not misunderstanding, can you confirm you see a build in the Drone user interface with the relevant error?
If yes, is it fair to define the problem statement as the following: When the extension returns an error, Drone does not create a GitHub status?
That’s right. If you were browsing the web UI at the right time, you would see a failure for the conversion extension error, so it is actually possible to detect it if you are there when it happens, or if you page back and find all the failures for pushes to the main branch. (I didn’t try the CLI.)
The problem is really two things:
No status check is created on GitHub
It’s not reported as a failure as far as our webhook listener is concerned (I didn’t mention this specifically in the OP, apologies)
Ok, sorry for using the word silent then. Here’s where I was coming from: normal failures are deliberately noisy for us because the steps being run are quite crucial. In this case there is no notification, so that’s why I used that word. It’s creating a build, but the normal notification channels (as far as I can see) are not being pinged. So it’s not making any noise, if you see what I mean. But you can go and look, as you say.
Thanks so much for clarifying, this makes perfect sense. After a quick look at the code, I think we just need to send the status and webhook in the createBuildError function (at the link below). We will have our team dig deeper and will report back here if we have any questions.