[answered] Pipeline keeps running even though Runner is offline

HackHerz · December 5, 2020, 11:09am

I have a Docker runner that is not online 24/7. The server does not notice when the runner shuts down and does not stop the pipeline, even after the 4h timeout has been exceeded (by a lot).

Steps to reproduce:

1. Start a build
2. Shut down the runner

Expected behaviour:
Drone stops the pipeline after the timeout is exceeded.

Actual Behaviour:
The Pipeline is forever displayed as running.

Version:

Drone Server: 1.9
Drone Runner: 1.6

bradrydzewski · December 5, 2020, 6:27pm

Runners must be gracefully terminated; they must not be force-terminated while pipelines are running, otherwise those pipelines are stuck in a running state.

The server does not keep track of runner connectivity for a number of reasons (for example, connections are not persistent: the runners use long polling and frequently connect and disconnect to avoid TCP timeouts, which are common in many corporate networks). If you stop or restart the server while builds are running, or the runner loses connectivity with the server, the runner is able to keep running pipelines and upload the results, retrying with a backoff, once it is able to re-establish a connection. This decentralized design makes the system more resilient to outages and flaky networks, but the tradeoff is that you must not shut down a runner while it is running a pipeline.
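To illustrate the "upload the results using a backoff" idea, here is a minimal Python sketch of retrying with exponential backoff and jitter. It is not Drone's actual code: the `send` callable, the function name, and all parameter defaults are assumptions made up for this example.

```python
import random
import time


def upload_with_backoff(send, payload, base=1.0, cap=60.0, max_attempts=8,
                        sleep=time.sleep):
    """Call send(payload) until it succeeds, waiting longer after each failure.

    `send` is a hypothetical callable that raises ConnectionError while the
    server is unreachable. Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            send(payload)
            return attempt
        except ConnectionError:
            if attempt == max_attempts:
                raise  # give up and let the caller decide what to do
            # Exponential delay (base, 2*base, 4*base, ...) capped at `cap`,
            # with full jitter so many runners do not retry in lockstep.
            delay = min(cap, base * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting; a real runner would also persist the results somewhere while it retries.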

The server scans for stuck jobs every 24 hours and terminates them. If you want to scan more frequently, you can adjust the cleanup interval and deadlines by passing the following environment variables to your Drone server:

DRONE_CLEANUP_INTERVAL=1h
DRONE_CLEANUP_DEADLINE_RUNNING=1h
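As a rough mental model of how an interval and a deadline interact, the Python sketch below marks builds as stuck once they have been running longer than a deadline. It is an illustration under stated assumptions, not the server's implementation: the `Build` type, the `find_stuck` and `reap` helpers, and the `"killed"` status string are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Build:
    number: int
    status: str          # e.g. "pending", "running", "success"
    started: datetime


def find_stuck(builds, now, deadline_running):
    """Return running builds that started more than `deadline_running` ago."""
    cutoff = now - deadline_running
    return [b for b in builds if b.status == "running" and b.started < cutoff]


def reap(builds, now, deadline_running=timedelta(hours=1)):
    """Mark every stuck build as killed; meant to be called once per interval."""
    stuck = find_stuck(builds, now, deadline_running)
    for b in stuck:
        b.status = "killed"
    return [b.number for b in stuck]
```

In this model a stuck build is only noticed on the next scan after its deadline has passed, so the worst-case delay is roughly the interval plus the deadline.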

slinstaedt · October 12, 2021, 12:15pm

Just for clarification: if a (Docker) runner receives a signal to stop gracefully, does it 1. stop accepting (read: polling for) new jobs to execute and 2. block until its currently executing jobs have finished before exiting itself?

Setting the runner’s container grace period longer than the maximum build timeout (default 60min) should do the job, if the OS itself is not reaping processes at some point.
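The behaviour being asked about (stop polling, then drain) can be sketched in Python as follows. This is a toy model of the question, not a description of what the Drone runner actually does: the `Runner` class, its `poll` callable, and the polling pause are assumptions invented for this example.

```python
import threading


class Runner:
    """Toy runner: polls for jobs until asked to stop, then drains."""

    def __init__(self, poll):
        self.poll = poll                  # hypothetical: returns a job callable or None
        self.stopping = threading.Event()
        self.workers = []

    def request_stop(self, *_):
        # Accepts (signum, frame), so it can be registered as a signal handler:
        #   signal.signal(signal.SIGTERM, runner.request_stop)
        self.stopping.set()

    def run(self):
        while not self.stopping.is_set():
            job = self.poll()
            if job is None:
                # Nothing queued; pause briefly before polling again.
                self.stopping.wait(0.01)
                continue
            worker = threading.Thread(target=job)
            worker.start()
            self.workers.append(worker)
        # Stop was requested: accept nothing new, block until running jobs finish.
        for worker in self.workers:
            worker.join()
```

For this to help in a container, the grace period between the stop signal and the forced kill has to exceed the longest possible build, which is the point made above.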

Would it be hard to implement a different, configurable strategy for runners that automatically cancels executing jobs (and notifies the server) before exiting?
