Since 1-rc4 we have had the following occurring at the clone stage:
error: copy-fd: write returned: No space left on device
fatal: cannot copy '/usr/share/git-core/templates/hooks/pre-push.sample' to '/drone/src/.git/hooks/pre-push.sample': No space left on device
fatal: Not a git repository (or any parent up to mount point /drone)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
+ git fetch origin +refs/heads/master:
fatal: Not a git repository (or any parent up to mount point /drone)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
error: copy-fd: write returned: No space left on device
I think my first question is whether you have checked your node to see if it has run out of disk space. If it has, then I would follow up and ask whether you can run some sort of disk usage analysis to see if Drone is responsible for filling up the disk (and if so, where those files are stored) and then report back (see the sketch after the questions below).
If the node has not run out of disk space, then I would be interested in the following:
Does this happen on every node, or just some?
Does this happen for every repository, or just some? If just some, can you provide a sample yaml?
If this happens for every build, how were you able to successfully test this fix, and what might have changed since your last successful test?
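For the disk-usage analysis mentioned above, something along these lines on the affected node should show whether Drone's workspaces are what is consuming the space (the /tmp paths are an assumption based on the default host mount):

```
# Overall usage per filesystem on the node
df -h

# Largest directories under /tmp (where the experimental runtime keeps build
# workspaces) and /var/lib; adjust paths to match your node layout
sudo du -xh --max-depth=2 /tmp /var/lib 2>/dev/null | sort -rh | head -n 20
```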
There is no disk pressure. There are 4 nodes in the cluster and each has 100 GB of allocatable storage; currently only 4 GB, 7 GB, 6 GB, and 2 GB are in use. Each node also reports KubeletHasNoDiskPressure.
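For anyone reproducing the check, the per-node disk-pressure condition and allocatable ephemeral storage can be read with something like this (a rough sketch, not exact output):

```
# Disk-pressure condition per node
kubectl describe nodes | grep -E '^Name:|DiskPressure'

# Allocatable ephemeral storage per node
kubectl get nodes -o custom-columns=NAME:.metadata.name,EPHEMERAL:.status.allocatable.ephemeral-storage
```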
Does this occur on each build?
No, not on all builds. As you have noticed, we were able to test a fix for the volume mount issues in this issue.
Does this occur on each node?
All builds were running on the same node. To test whether it was a node-dependent issue, we ran kubectl cordon node-id-xxxx and then ran the build again so it was scheduled on a different node; this solved the issue and everything was fine. No storage issues.
The original node definitely has physical space available, though; we can use that space from other pods on the same node without issue.
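For reference, the node-isolation test above was roughly the following (the node name is a placeholder):

```
# Mark the suspect node unschedulable so the next build is placed elsewhere
kubectl cordon node-id-xxxx

# ...trigger the Drone build again, then restore scheduling
kubectl uncordon node-id-xxxx
```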
Drone does not yet support persistent volume claims, and as a temporary workaround creates a host machine mount at /tmp (tracked at https://github.com/drone/drone-runtime/issues/19). Perhaps /tmp is mounted on a local block device with limited disk space, while the standard Kubernetes storage is mounted on another block device with more disk space?
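One way to check that theory, assuming shell access to the node, is to compare the mount backing /tmp with the one backing the kubelet's storage (devices and paths will vary per distro):

```
# Filesystem containing /tmp versus the one backing kubelet storage
findmnt -T /tmp
findmnt -T /var/lib/kubelet
df -h /tmp /var/lib/kubelet
```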
This seems like a serious issue. @bradrydzewski it doesn’t appear the /tmp directory gets cleaned after builds; is it supposed to be?
/tmp/ just keeps growing until the node breaks. In our case, Drone filled the entire disk and we didn’t even have enough room to transfer an SSH key to fix it.
I’m going through our other nodes and making sure it doesn’t break those too. What’s the correct way to handle this? Is it to regularly go onto the machine and run rm -rf /tmp/.drone?
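In case it helps others hitting this, one stopgap (purely a sketch, not an official recommendation; the /tmp/.drone* pattern is an assumption, so verify what the runtime actually creates on your nodes) is a root cron entry on each node that prunes old build workspaces:

```
# /etc/cron.d/drone-tmp-cleanup
# Hourly: remove Drone workspace directories under /tmp untouched for >24h
0 * * * * root find /tmp -maxdepth 1 -name '.drone*' -mmin +1440 -exec rm -rf {} +
```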
The experimental Kubernetes runtime (discussed in this thread) was deprecated in April. We launched a second iteration of a Kubernetes runner a few weeks ago, which you can find at https://docs.drone.io/runner/kubernetes/overview/