The step below finished, but Drone didn't notice, and as a result the other steps are waiting for it:
This happens roughly 1 time in 50; most of the time it works fine, as shown below:
Going back to the stuck step, here are its logs:
kubectl --namespace drone-ci-exec logs drone-bgq6ry8sqxq524zjv3am drone-1qljfp58m4cuf0qxndjo
+ waitpostgres $DATA_CONNECTION
psql: error: could not connect to server: Connection refused
Is the server running on host "postgres" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
Waiting for postgres...
psql: error: could not connect to server: Connection refused
Is the server running on host "postgres" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
Waiting for postgres...
psql: error: FATAL: database "test" does not exist
Waiting for postgres...
current_date|2021-03-23
+ waitredis $REDIS_HOST
PONG
crictl ps -a|grep drone-1qljfp58m4cuf0qxndjo
7f8d467da4d72 ecr.ad.dice.fm/base@sha256:5973219db525061334ccb4ef74f1e3220ad13f01ad0419599cd5be933bc73d2b 38 minutes ago Exited drone-1qljfp58m4cuf0qxndjo 1 b3785904fbfd7
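kubectl describe shows that the previous attempt ended with an error (exit code 2) and that the container was restarted once; the restarted attempt is still reported as Running: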
drone-1qljfp58m4cuf0qxndjo:
Container ID: cri-o://7f8d467da4d7288d9083887f5539fdac4ba5d6b3967638c8b88b31d2ab1e65f2
Image: ecr.ad.dice.fm/base:waitdbs-latest
Image ID: ecr.ad.dice.fm/base@sha256:5973219db525061334ccb4ef74f1e3220ad13f01ad0419599cd5be933bc73d2b
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
Args:
echo "$DRONE_SCRIPT" | /bin/sh
State: Running
Started: Tue, 23 Mar 2021 17:15:21 +0000
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Tue, 23 Mar 2021 17:14:54 +0000
Finished: Tue, 23 Mar 2021 17:15:20 +0000
Ready: True
Restart Count: 1
crictl logs 7f8d467da4d72
+ waitpostgres $DATA_CONNECTION
psql: error: could not connect to server: Connection refused
Is the server running on host "postgres" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
Waiting for postgres...
psql: error: could not connect to server: Connection refused
Is the server running on host "postgres" (127.0.0.1) and accepting
TCP/IP connections on port 5432?
Waiting for postgres...
psql: error: FATAL: database "test" does not exist
Waiting for postgres...
current_date|2021-03-23
+ waitredis $REDIS_HOST
PONG
{
"status": {
"id": "7f8d467da4d7288d9083887f5539fdac4ba5d6b3967638c8b88b31d2ab1e65f2",
"metadata": {
"attempt": 1,
"name": "drone-1qljfp58m4cuf0qxndjo"
},
"state": "CONTAINER_EXITED",
"createdAt": "2021-03-23T17:15:21.459089592Z",
"startedAt": "2021-03-23T17:15:21.479616027Z",
"finishedAt": "2021-03-23T17:15:24.646643722Z",
"exitCode": 0,
"image": {
"image": "ecr.ad.dice.fm/base:waitdbs-latest"
},
"imageRef": "ecr.ad.dice.fm/base@sha256:5973219db525061334ccb4ef74f1e3220ad13f01ad0419599cd5be933bc73d2b",
"reason": "Completed",
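The mismatch between the two outputs above can be checked mechanically. Below is a minimal Python sketch, not part of Drone or Kubernetes: the `kubelet` dict is a hand-transcribed summary of the `kubectl describe` output (its key names are made up for illustration), and `cri` holds the relevant fields of the `status` object from the runtime. It only flags the case seen here, where the same container ID is Running for the kubelet but already exited for CRI-O.

```python
def normalize_id(container_id):
    """Strip a runtime prefix such as 'cri-o://' from a container ID."""
    return container_id.split("://", 1)[-1]


def kubelet_view_is_stale(kubelet, cri):
    """Return True when the kubelet still reports Running for a container
    that the container runtime already reports as exited."""
    same_container = normalize_id(kubelet["container_id"]) == cri["id"]
    return (
        same_container
        and kubelet["state"] == "Running"
        and cri["state"] == "CONTAINER_EXITED"
    )


# Values copied from the outputs above; the kubelet key names are hypothetical.
kubelet = {
    "container_id": "cri-o://7f8d467da4d7288d9083887f5539fdac4ba5d6b3967638c8b88b31d2ab1e65f2",
    "state": "Running",
}
cri = {
    "id": "7f8d467da4d7288d9083887f5539fdac4ba5d6b3967638c8b88b31d2ab1e65f2",
    "state": "CONTAINER_EXITED",
    "exitCode": 0,
}

print(kubelet_view_is_stale(kubelet, cri))
```

With the values from this report the check is true: the runtime says the restarted attempt completed with exit code 0, while the pod status still shows it as Running.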
Drone runner log:
kubectl --namespace drone-ci logs drone-runner-kube-69d76cddbd-2dvns|grep 1032
time="2021-03-23T17:15:59Z" level=debug msg="stage details fetched" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41
time="2021-03-23T17:15:59Z" level=debug msg="updated stage to running" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41
time="2021-03-23T17:17:00Z" level=debug msg="received exit code 0" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 step.name=clone thread=41
time="2021-03-23T17:17:07Z" level=debug msg="received exit code 0" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 step.name=db-connections thread=41
time="2021-03-23T17:21:02Z" level=debug msg="received exit code 0" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 step.name=dam-build-and-test thread=41
time="2021-03-23T17:22:39Z" level=debug msg="received exit code 0" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 step.name=prefect-build-and-test thread=41
time="2021-03-23T17:24:33Z" level=debug msg="received exit code 2" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 step.name=tournesol-build-and-test thread=41
time="2021-03-23T17:24:34Z" level=debug msg="destroying the pipeline environment" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41
time="2021-03-23T17:24:39Z" level=debug msg="successfully destroyed the pipeline environment" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41
time="2021-03-23T17:24:39Z" level=debug msg="updated stage to complete" build.id=1278 build.number=1032 duration=514 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41
time="2021-03-23T17:24:39Z" level=debug msg="done listening for cancellations" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41
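To make runner logs like the ones above easier to scan, here is a small Python sketch that pulls out the time, step name, and exit code from each "received exit code" line. It assumes the `key=value` text format shown above, with double quotes around values that contain spaces; the function names are made up for this example and are not part of Drone.

```python
import shlex


def parse_logfmt(line):
    """Parse one key=value log line into a dict; quoted values are unquoted."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def step_exit_codes(lines):
    """Return (time, step name, exit code) for every 'received exit code' line."""
    prefix = "received exit code "
    results = []
    for line in lines:
        fields = parse_logfmt(line)
        msg = fields.get("msg", "")
        if msg.startswith(prefix) and "step.name" in fields:
            results.append((fields["time"], fields["step.name"], int(msg[len(prefix):])))
    return results


# Three lines copied verbatim from the runner log above.
SAMPLE = [
    'time="2021-03-23T17:15:59Z" level=debug msg="updated stage to running" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 thread=41',
    'time="2021-03-23T17:17:07Z" level=debug msg="received exit code 0" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 step.name=db-connections thread=41',
    'time="2021-03-23T17:24:33Z" level=debug msg="received exit code 2" build.id=1278 build.number=1032 repo.id=106 repo.name=Tournesol repo.namespace=dicefm stage.id=1612 stage.name=default stage.number=1 step.name=tournesol-build-and-test thread=41',
]

for when, step, code in step_exit_codes(SAMPLE):
    print(when, step, code)
```

Piping the full `kubectl logs` output through something like this gives one line per step, which makes it easier to spot a step that never reported an exit code.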