Drone "running" builds forever

Are you sure? We have an experimental reaper built into the agent to handle this exact situation. It needs to be enabled with a feature flag. We also have an open issue around non-graceful shutdowns (here). I am happy to talk through these issues, but would ask that we avoid speculating about what is or is not the problem without more data.

Is this part of the design open to change? My thought is that it would be nice if drone-server periodically checked whether any builds are ‘running’ and also past the configured ‘timeout’, and then reacted by cancelling them. Come to think of it, this is something I could probably implement outside of Drone core as a cron job that consumes the Drone API.
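
For illustration only, here is a rough sketch of the selection step such a cron job might use. It assumes the build list has already been fetched from the Drone API as a list of dicts with a `status` string and a `started` Unix timestamp; those field names, the `find_stale_builds` helper, and the `timeout_minutes` parameter are all assumptions made for this example, not something confirmed in this thread. The HTTP calls to list and cancel builds are deliberately left out.

```python
import time


def find_stale_builds(builds, timeout_minutes, now=None):
    """Return the builds that are still 'running' past the timeout.

    `builds` is assumed to be a list of dicts with a "status" string and a
    "started" Unix timestamp in seconds (hypothetical field names).
    """
    now = time.time() if now is None else now
    cutoff = now - timeout_minutes * 60
    return [
        b for b in builds
        if b.get("status") == "running"
        and b.get("started", 0) > 0   # skip builds that never recorded a start time
        and b["started"] < cutoff     # started before the cutoff, so past the timeout
    ]


if __name__ == "__main__":
    # Made-up sample data; a real job would fetch this from the API instead.
    sample = [
        {"number": 1, "status": "running", "started": 1000},
        {"number": 2, "status": "running", "started": 4000},
        {"number": 3, "status": "success", "started": 1000},
    ]
    for build in find_stale_builds(sample, timeout_minutes=60, now=5000):
        print("would cancel build", build["number"])
```

A job like this would run on a schedule, call the function on the fetched list, and issue a cancel request for each build it returns. Keeping the selection logic separate from the HTTP calls makes it easy to dry-run before it cancels anything.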

I prefer not to answer this question because we are talking about solutions before we have identified a root cause. I think we need to take a step back and gather more data before we start making assumptions about root causes and required design changes.