Drone agent errors: context deadline exceeded

I have several drone agents running on Kubernetes, but some of them keep spawning these errors:

2018/07/30 03:54:41 grpc error: wait(): code: DeadlineExceeded: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2018/07/30 03:54:42 grpc error: wait(): code: DeadlineExceeded: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2018/07/30 03:54:43 grpc error: wait(): code: DeadlineExceeded: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2018/07/30 03:54:44 grpc error: wait(): code: DeadlineExceeded: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2018/07/30 03:54:45 grpc error: wait(): code: DeadlineExceeded: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2018/07/30 03:54:46 grpc error: wait(): code: DeadlineExceeded: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2018/07/30 03:54:47 grpc error: wait(): code: DeadlineExceeded: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2018/07/30 03:54:48 grpc error: wait(): code: DeadlineExceeded: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2018/07/30 03:54:49 grpc error: wait(): code: DeadlineExceeded: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2018/07/30 03:54:50 grpc error: wait(): code: DeadlineExceeded: rpc error: code = DeadlineExceeded desc = context deadline exceeded

When that happens, the only thing that stops the errors is restarting the agents.
Agents in this state don't seem to run any pipeline steps.
How can I avoid these errors?

I remember two other individuals had a similar issue. The docker logs command was freezing (a problem with the docker daemon, not Drone), which resulted in builds hanging and exceeding their deadlines. In one case the individual restarted the machine and/or the docker daemon and the issue was solved. In the other case, the individual upgraded docker and it was solved.
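A quick way to test for that symptom (the container ID is a placeholder for one of your running build containers):

docker logs --tail 5 <container-id>

This should return almost immediately; if it hangs, the daemon's log stream is stuck, which is the docker-side condition described above.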

So is it normal that when this error happens it never stops, and no new builds are assigned to these drone agents?

It is not normal, but in this case it would be a docker bug, so there is nothing we can do about it in Drone. There were (are?) multiple open issues in the moby issue tracker related to logs freezing. As I mentioned, another team had a similar issue and resolved it by either upgrading or downgrading docker (I am not sure which).

I upgraded Docker to the current latest stable version (18.06.0-ce) on Ubuntu, but the problem still exists.
Btw, I don't see how this is related to logs freezing: I don't use that command, so it shouldn't be the cause of my problem.

@minhdanh can you share your setup? If you are using docker swarm, make sure to set the endpoint mode to dnsrr.
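In a compose/stack file that is the deploy setting, along these lines (v3 syntax; service name illustrative):

services:
  drone-server:
    deploy:
      endpoint_mode: dnsrr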

Sure. I have a drone server running as a docker container, started with docker-compose on an Ubuntu server:

version: '2'   # compose file format version (assumed v2)

services:
  drone-server:
    container_name: drone-server
    image: drone/drone:0.8.5
    restart: unless-stopped
    ports:
      - "9000:9000"   # gRPC port the agents connect to
On another Kubernetes cluster I have several drone agents (version 0.8.5, installed with this Helm chart: https://github.com/helm/charts/tree/master/stable/drone) configured to connect to that drone server using a URL like https://drone.example.com:9000.

Docker version on the K8s cluster is 18.06.0-ce.

When the agents start on K8s they work fine for a while; then some of them misbehave and keep spawning the error I described. The CI jobs started by those agents appear stuck in the running state and never finish, and the only fix is to terminate the affected agent pods so that they are recreated.
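Since the chart runs the agents as a Deployment, deleting the pod is enough to get a fresh one (pod name is a placeholder):

kubectl delete pod <drone-agent-pod-name>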

Hi all,

is there any update or fix for this issue? I am having the same issue too.

I tried docker versions from 17.03 through 18.06, and all of them have the same problem. I also tried moving the OS from Ubuntu 18.04 to Ubuntu 16.04 and to CentOS 7, but the problem stays the same.

I used docker-compose to start the drone server & agent in a single VM.
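My compose file is essentially the standard two-service layout from the 0.8 docs, along these lines (illustrative; host, secret, and git provider settings trimmed):

version: '2'

services:
  drone-server:
    image: drone/drone:0.8
    restart: always
    ports:
      - "80:8000"
      - "9000:9000"
    volumes:
      - /var/lib/drone:/var/lib/drone
    environment:
      - DRONE_HOST=${DRONE_HOST}
      - DRONE_SECRET=${DRONE_SECRET}

  drone-agent:
    image: drone/agent:0.8
    restart: always
    depends_on:
      - drone-server
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    environment:
      - DRONE_SERVER=drone-server:9000   # gRPC endpoint of the server
      - DRONE_SECRET=${DRONE_SECRET}     # must match the server's secret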

Thanks!

Hi all,

I've found that the problem was caused by the pipeline execution time exceeding the default timeout setting (60 minutes). Once that happens, the rpc error message starts spawning once per second and never stops until the agent container is restarted.

I solved this problem by extending the timeout setting as an admin user. @minhdanh, if you still have the same issue, you can try whether this helps.
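In 0.8 the timeout is a per-repository setting that only admins can change, either in the repository settings UI or via the CLI. From memory the CLI call is shaped roughly like this (flag name and duration format unverified, so check drone repo update --help; the repository name is a placeholder):

drone repo update --timeout=90m your-org/your-repo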