Long running Kubernetes pipeline fails with unable to sync caches

mattlqx · February 27, 2020, 4:12pm

I’m running Drone 1.6.4 using drone-runner-kube. I have a long-running build that kicks off a job on an external service and waits to get its results back. The job ran for 1h11m and, judging by its output, it succeeded. Drone marked it as failed and the error it shows in the red box is UntilWithSync: unable to sync caches: context deadline exceeded. Is this related to a timeout somewhere? Any insight is appreciated.

bradrydzewski · February 27, 2020, 4:22pm

Do you have a reverse proxy or load balancer with a timeout? If so, you may want to increase the request timeout to prevent it from killing long-running HTTP requests. Alternatively, you can pull the :latest server image, which patches the sync behavior and changes the HTTP request from a blocking request to an async non-blocking request. That should also solve this problem.

mattlqx · February 27, 2020, 4:55pm

If I understand what you linked correctly, that’s for syncing repositories from source control right? The problem I’m seeing is an error on a pipeline run, see screenshot. Maybe you’re saying it’s still related? I did notice that the committer didn’t have his email configured in GitHub, so his commits weren’t recognized by Drone, resulting in the broken avatar.

As for the load balancer, I assume you mean the one servicing the Drone UI. It’s a simple AWS ALB connecting to an NGINX Kubernetes ingress. Would that affect the result of the job itself? All Drone services are run inside the same Kubernetes cluster and access each other directly.

bradrydzewski · February 27, 2020, 5:07pm

If I understand what you linked correctly, that’s for syncing repositories from source control right? The problem I’m seeing is an error on a pipeline run, see screenshot. Maybe you’re saying it’s still related?

Sorry I read the issue quickly and misunderstood. They are not related.

Drone marked it as failed and the error it shows in the red box is UntilWithSync: unable to sync caches: context deadline exceeded

I can see this error coming from kubernetes [1] but I will need to do some research to understand why. I am guessing we will need some sort of backoff to gracefully handle this error and retry, but I would like to see if I can better understand the root cause before we try to write any code.

[1] until.go - kubernetes/client-go - Sourcegraph

mattlqx · February 27, 2020, 5:08pm

No problem. Thanks Brad!

ihakimi · July 14, 2020, 8:26am

Hi @bradrydzewski

I got this error too. After the build step had run for 1:20:00, it failed with:
general-2: UntilWithSync: unable to sync caches: context deadline exceeded
I am running on Kubernetes. Is there any update regarding this?

ashwilliams1 · July 14, 2020, 12:33pm

@ihakimi do you need to increase your repository timeout in the user interface?

ihakimi · July 14, 2020, 2:36pm

Hi @ashwilliams1,

Are you asking me to do that?
I also saw that after the timeout is exceeded, the build doesn’t stop on kube. It looks like kube ignores it.

bradrydzewski · July 14, 2020, 2:53pm

When the timeout is reached, it may produce a context error (similar to the one you pasted). Have you tried to increase the timeout for this build?

ihakimi · July 14, 2020, 5:44pm

Yes, I increased the timeout and it seems to be working. The timeout was 60 min but the build ran for 85 min, so it looks like the timeout is not accurate for pipelines with kube-runner.
