Long running Kubernetes pipeline fails with unable to sync caches

I’m running Drone 1.6.4 using drone-runner-kube. I have a long-running build that kicks off a job on an external service and waits to get its results back. The job ran for 1h11m and, judging from its output, it was successful. Drone marked it as failed and the error it shows in the red box is UntilWithSync: unable to sync caches: context deadline exceeded. Is this related to a timeout somewhere? Any insight is appreciated.

Do you have a reverse proxy or load balancer with a timeout? If so, you may want to increase the request timeout to prevent it from killing long-running HTTP requests. Alternatively, you can pull the :latest server image, which patches the sync behavior and changes the HTTP request from a blocking request to an async, non-blocking request. That should also solve this problem.

If I understand what you linked correctly, that’s for syncing repositories from source control right? The problem I’m seeing is an error on a pipeline run, see screenshot. Maybe you’re saying it’s still related? I did notice that the committer didn’t have his email configured in GitHub, so his commits weren’t recognized by Drone, resulting in the broken avatar.

As for the load balancer, I assume you mean the one servicing the Drone UI. It’s a simple AWS ALB connecting to an NGINX Kubernetes ingress. Would that affect the result of the job itself? All Drone services are run inside the same Kubernetes cluster and access each other directly.

If I understand what you linked correctly, that’s for syncing repositories from source control right? The problem I’m seeing is an error on a pipeline run, see screenshot. Maybe you’re saying it’s still related?

Sorry I read the issue quickly and misunderstood. They are not related.

Drone marked it as failed and the error it shows in the red box is UntilWithSync: unable to sync caches: context deadline exceeded

I can see this error coming from kubernetes [1] but I will need to do some research to understand why. I am guessing we will need some sort of backoff to gracefully handle this error and retry, but I would like to see if I can better understand the root cause before we try to write any code.

[1] until.go - kubernetes/client-go - Sourcegraph
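
For illustration only, the "backoff and retry" idea mentioned above could look roughly like the Go sketch below. This is not code from Drone or client-go: `retryWithBackoff` and `flakyOp` are hypothetical names, and `flakyOp` merely stands in for whatever call returns the deadline error. Whether retrying is the right fix depends on the root cause, which is still unknown at this point in the thread.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// retryWithBackoff (hypothetical helper) calls op up to `attempts` times,
// doubling the wait between tries. It retries only when the error is a
// context deadline error; any other error is returned immediately.
func retryWithBackoff(attempts int, initial time.Duration, op func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	wait := initial
	var err error
	for i := 0; i < attempts; i++ {
		if err = op(); err == nil {
			return nil
		}
		if !errors.Is(err, context.DeadlineExceeded) {
			return err // not a timeout, so do not retry
		}
		if i < attempts-1 {
			time.Sleep(wait)
			wait *= 2
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// flakyOp (hypothetical stand-in for the failing call) returns an operation
// that fails with context.DeadlineExceeded `failures` times, then succeeds.
func flakyOp(failures int) func() error {
	calls := 0
	return func() error {
		calls++
		if calls <= failures {
			return context.DeadlineExceeded
		}
		return nil
	}
}

func main() {
	// Two simulated timeouts, then success on the third call.
	err := retryWithBackoff(5, 10*time.Millisecond, flakyOp(2))
	fmt.Println(err) // prints "<nil>"
}
```

Retrying only on `context.DeadlineExceeded` keeps genuine failures from being masked, which is one reason to understand the root cause before adding a retry.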

No problem. Thanks Brad!

Hi @bradrydzewski

I got this error too. After the build step had been running for 1:20:00, it failed with:
general-2: UntilWithSync: unable to sync caches: context deadline exceeded
I am running on Kubernetes. Is there any update on this?

@ihakimi do you need to increase your repository timeout in the user interface?

Hi @ashwilliams1,

Are you asking me to do that?
I also noticed that after the timeout is exceeded, the build doesn’t stop on Kubernetes. It looks like Kubernetes ignores it.

When the timeout is reached, it may produce a context error (similar to the one you pasted). Have you tried increasing the timeout for this build?
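
As a rough illustration of how an expired timeout turns into a "context deadline exceeded" error, here is a minimal Go sketch. It does not reproduce Drone's or client-go's actual code path: `runStep` and `timedOut` are hypothetical names, the "step aborted" prefix is made up, and the durations are shrunk to milliseconds so the example finishes quickly.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runStep (hypothetical) simulates a step that needs `work` to finish
// but is only allowed to run for `limit`.
func runStep(limit, work time.Duration) error {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()

	select {
	case <-time.After(work):
		return nil // the work finished within the limit
	case <-ctx.Done():
		// ctx.Err() is context.DeadlineExceeded once the limit has passed.
		return fmt.Errorf("step aborted: %w", ctx.Err())
	}
}

// timedOut reports whether err was caused by an expired deadline.
func timedOut(err error) bool {
	return errors.Is(err, context.DeadlineExceeded)
}

func main() {
	// The limit (20ms) is shorter than the work (200ms), much like a
	// 60 minute timeout on a build that needs 85 minutes.
	err := runStep(20*time.Millisecond, 200*time.Millisecond)
	fmt.Println(err)           // prints "step aborted: context deadline exceeded"
	fmt.Println(timedOut(err)) // prints "true"
}
```

In the sketch, the work itself may be perfectly healthy. The error only says that the allowed time ran out, which fits a build whose output looks successful but which is still marked as failed.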

Yes, I increased the timeout and it seems to be working. The timeout was 60 min but the build ran for 85 min, so it looks like the timeout is not enforced accurately for pipelines using the kube runner.