Kubernetes-runner sometimes gets stuck on finished step

The step below finished, but Drone didn't notice, and as a result the other steps are waiting for it:

It fails in roughly 1 out of 50 builds; most of the time it works fine, as below:

Going back to the stuck one, here are its logs:

kubectl --namespace drone-ci-exec logs drone-bgq6ry8sqxq524zjv3am drone-1qljfp58m4cuf0qxndjo
+ waitpostgres $DATA_CONNECTION
psql: error: could not connect to server: Connection refused
	Is the server running on host "postgres" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?
Waiting for postgres...
psql: error: could not connect to server: Connection refused
	Is the server running on host "postgres" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?
Waiting for postgres...
psql: error: FATAL:  database "test" does not exist
Waiting for postgres...
current_date|2021-03-23
+ waitredis $REDIS_HOST
PONG
crictl ps -a|grep drone-1qljfp58m4cuf0qxndjo
7f8d467da4d72       ecr.ad.dice.fm/base@sha256:5973219db525061334ccb4ef74f1e3220ad13f01ad0419599cd5be933bc73d2b               38 minutes ago      Exited              drone-1qljfp58m4cuf0qxndjo   1                   b3785904fbfd7

kubectl describe says that it ended with an error:

  drone-1qljfp58m4cuf0qxndjo:
    Container ID:  cri-o://7f8d467da4d7288d9083887f5539fdac4ba5d6b3967638c8b88b31d2ab1e65f2
    Image:         ecr.ad.dice.fm/base:waitdbs-latest
    Image ID:      ecr.ad.dice.fm/base@sha256:5973219db525061334ccb4ef74f1e3220ad13f01ad0419599cd5be933bc73d2b
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/sh
      -c
    Args:
      echo "$DRONE_SCRIPT" | /bin/sh
    State:          Running
      Started:      Tue, 23 Mar 2021 17:15:21 +0000
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 23 Mar 2021 17:14:54 +0000
      Finished:     Tue, 23 Mar 2021 17:15:20 +0000
    Ready:          True
    Restart Count:  1
crictl logs 7f8d467da4d72
+ waitpostgres $DATA_CONNECTION
psql: error: could not connect to server: Connection refused
	Is the server running on host "postgres" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?
Waiting for postgres...
psql: error: could not connect to server: Connection refused
	Is the server running on host "postgres" (127.0.0.1) and accepting
	TCP/IP connections on port 5432?
Waiting for postgres...
psql: error: FATAL:  database "test" does not exist
Waiting for postgres...
current_date|2021-03-23
+ waitredis $REDIS_HOST
PONG
{
  "status": {
    "id": "7f8d467da4d7288d9083887f5539fdac4ba5d6b3967638c8b88b31d2ab1e65f2",
    "metadata": {
      "attempt": 1,
      "name": "drone-1qljfp58m4cuf0qxndjo"
    },
    "state": "CONTAINER_EXITED",
    "createdAt": "2021-03-23T17:15:21.459089592Z",
    "startedAt": "2021-03-23T17:15:21.479616027Z",
    "finishedAt": "2021-03-23T17:15:24.646643722Z",
    "exitCode": 0,
    "image": {
      "image": "ecr.ad.dice.fm/base:waitdbs-latest"
    },
    "imageRef": "ecr.ad.dice.fm/base@sha256:5973219db525061334ccb4ef74f1e3220ad13f01ad0419599cd5be933bc73d2b",
    "reason": "Completed",
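
Worth noting: the two views above disagree. kubectl describe shows the container as Running with a restart count of 1, while the runtime reports attempt 1 as CONTAINER_EXITED with exit code 0. Here is a rough Python sketch for pulling those fields out of the runtime's JSON so they are easier to compare. The `summarize` helper is my own name, and the sample is a trimmed copy of the output above, closed by hand so that it parses.

```python
import json


def summarize(inspect_json):
    """Pick the fields that matter for a stuck step out of the inspect JSON."""
    status = json.loads(inspect_json)["status"]
    return {
        "name": status["metadata"]["name"],
        "attempt": status["metadata"]["attempt"],
        "state": status["state"],
        "exit_code": status["exitCode"],
        "reason": status.get("reason"),  # may be absent, so no hard lookup
    }


# Trimmed copy of the output above; the closing braces were added by hand
# because the pasted output is cut off.
sample = """
{
  "status": {
    "metadata": {"attempt": 1, "name": "drone-1qljfp58m4cuf0qxndjo"},
    "state": "CONTAINER_EXITED",
    "finishedAt": "2021-03-23T17:15:24.646643722Z",
    "exitCode": 0,
    "reason": "Completed"
  }
}
"""

summary = summarize(sample)
for key, value in summary.items():
    print(f"{key}: {value}")
```

If the runtime says the container exited but the pod status still says Running, that points at the status not being picked up rather than at the step itself.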

Drone runner log:

kubectl --namespace drone-ci logs drone-runner-kube-69d76cddbd-2dvns|grep 1032
time="2021-03-23T17:15:59Z" level=debug msg="stage details fetched" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41
time="2021-03-23T17:15:59Z" level=debug msg="updated stage to running" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41
time="2021-03-23T17:17:00Z" level=debug msg="received exit code 0" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 step.name=clone thread=41
time="2021-03-23T17:17:07Z" level=debug msg="received exit code 0" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 step.name=db-connections thread=41
time="2021-03-23T17:21:02Z" level=debug msg="received exit code 0" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 step.name=dam-build-and-test thread=41
time="2021-03-23T17:22:39Z" level=debug msg="received exit code 0" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 step.name=prefect-build-and-test thread=41
time="2021-03-23T17:24:33Z" level=debug msg="received exit code 2" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 step.name=tournesol-build-and-test thread=41
time="2021-03-23T17:24:34Z" level=debug msg="destroying the pipeline environment" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41
time="2021-03-23T17:24:39Z" level=debug msg="successfully destroyed the pipeline environment" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41
time="2021-03-23T17:24:39Z" level=debug msg="updated stage to complete" build.id=1278 build.number=1032 duration=514 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41
time="2021-03-23T17:24:39Z" level=debug msg="done listening for cancellations" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41
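
To see at a glance which steps the runner recorded an exit code for, log lines like the ones above can be parsed. This is a minimal Python sketch and assumes the key=value layout shown in the log; `parse_line` and `exit_codes` are names I made up, not part of Drone, and the sample lines are shortened copies of the log above.

```python
import re
import shlex

EXIT_RE = re.compile(r"received exit code (-?\d+)")


def parse_line(line):
    """Split one key=value log line into a dict; quoted values are unquoted."""
    fields = {}
    try:
        tokens = shlex.split(line)
    except ValueError:
        # Unbalanced quotes: skip the line rather than abort the whole run.
        return fields
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def exit_codes(lines):
    """Map step name -> exit code for every 'received exit code' message."""
    result = {}
    for line in lines:
        fields = parse_line(line)
        match = EXIT_RE.search(fields.get("msg", ""))
        if match and "step.name" in fields:
            result[fields["step.name"]] = int(match.group(1))
    return result


# Shortened copies of lines from the runner log above (some fields dropped).
sample = [
    'time="2021-03-23T17:15:59Z" level=debug msg="stage details fetched" build.number=1032 thread=41',
    'time="2021-03-23T17:17:00Z" level=debug msg="received exit code 0" build.number=1032 step.name=clone thread=41',
    'time="2021-03-23T17:24:33Z" level=debug msg="received exit code 2" build.number=1032 step.name=tournesol-build-and-test thread=41',
]

codes = exit_codes(sample)
for step, code in codes.items():
    print(f"{step}: exit code {code}")
```

A step that is defined in the pipeline but never shows up in this mapping is a candidate for the "finished but not noticed" case.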

The Kubernetes runner is currently in beta and is a community development effort. If you are able to consistently reproduce this issue, we kindly ask that you review our development guide and help us identify the offending code that requires improvement (and perhaps submit a pull request). See Contributing to Drone for Kubernetes.

It seems that pull request #37 in drone-runners/drone-runner-kube ("Fix build steps being stuck 'running'" by smira) fixed it.
I ran over 100 builds yesterday with this change and not a single one of them got stuck. Before, it was about every fifth or tenth build.

Moving discussion to this thread.