Hi everyone again.
We are using Drone 1.0.1 to run tests in parallel.
The tests run in ~20 steps, grouped in drone.yml. Each step runs an rspec command in a different folder, and all steps run in parallel on the same machine (an r5.xlarge instance).
The problem is that some steps fail randomly with a “Failure” error. That is all the error message says.
I also checked the meaning of 255: it indicates that docker run, docker start or docker logs failed to execute. You might also be able to run docker events on the agent host machine to surface any detailed errors being returned by the Docker daemon.
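If it helps, here is a rough Python sketch of the docker events suggestion. It assumes the docker CLI is on the agent host's PATH and that events are printed as JSON lines via `--format '{{json .}}'`. The helper names are made up for illustration, and the field names (`Action`, `Actor.ID`, `Actor.Attributes.exitCode`) are what I would expect in that JSON output, so please check them against your Docker version.

```python
import json
import subprocess


def events_command(since="30m"):
    # Hypothetical helper: argv that streams container events as JSON lines.
    return [
        "docker", "events",
        "--since", since,
        "--filter", "type=container",
        "--format", "{{json .}}",
    ]


def nonzero_exits(lines):
    # Yield (container id, exit code) for "die" events with a non-zero exit code.
    # Assumes each non-empty line is one JSON-encoded event.
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        actor = event.get("Actor", {})
        code = actor.get("Attributes", {}).get("exitCode")
        if event.get("Action") == "die" and code not in (None, "0"):
            yield actor.get("ID"), int(code)


def watch(since="30m"):
    # Streams until interrupted, because docker events keeps running
    # unless an --until value is given.
    with subprocess.Popen(events_command(since), stdout=subprocess.PIPE, text=True) as proc:
        for container_id, code in nonzero_exits(proc.stdout):
            print(container_id, code)
```

Calling `watch()` on the agent host while a build is running should print the containers that exit with an error. Only `nonzero_exits` contains real logic, so it can be tried on saved output first.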
Hello @bradrydzewski.
I was able to get the real error using docker inspect:
"State": {
"Status": "created",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 128,
"Error": "error setting label on mount source '/var/lib/docker/volumes/9bvrs0ahyxj4o7ttigp9tpt7lp6ofpkq/_data': no such file or directory",
"StartedAt": "0001-01-01T00:00:00Z",
"FinishedAt": "0001-01-01T00:00:00Z"
},
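In case someone else hits this, below is a minimal Python sketch for pulling the State error out of docker inspect for several containers at once. It assumes docker inspect prints a JSON array with one object per container, each with an `Id` and a `State` object like the one above. The function name is made up for this example.

```python
import json


def failed_to_start(inspect_json):
    # inspect_json: the text printed by `docker inspect <container> [<container>...]`.
    # Returns (container id, exit code, error message) for every container
    # whose State carries a non-empty Error field.
    results = []
    for container in json.loads(inspect_json):
        state = container.get("State", {})
        if state.get("Error"):
            results.append((container.get("Id"), state.get("ExitCode"), state["Error"]))
    return results
```

One way to feed it is to save the output of `docker inspect $(docker ps -aq --filter status=created)` on the agent host and pass the text to `failed_to_start`, since the failing containers here stayed in the "created" status.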
We are trying to understand it, because even when one or two steps fail with this error, the others (running in parallel) run normally.
Do you have any idea?
Hello @bradrydzewski.
That was a good point!
We tested the build with the agents running on another box. They were running on CoreOS before. Now they are running on an Amazon Linux Optimized box (we run Docker on ECS), and the build completes without errors.
So it seems to be something related to CoreOS and Docker.