Drone Pipeline Step Skipping

Hi,

If we enable verbose logging within a pipeline step using the command mvn clean install -B -V, the subsequent step, which builds a docker image with docker build ./$${IMAGE_NAME_API_JSON} -t, is skipped. The first step's logs are also not visible in the browser.

However, if mvn clean install -B -V is changed to mvn clean install -B -V 1>/dev/null, the build succeeds.

Can you please provide any information as to why the build does not work with the mvn clean install -B -V command?

Thanks,
Jasdeep Dhadial


Hi,

I am a colleague of Jasdeep and I thought I’d add some detail to this issue.

We have found that a step which generates a large amount of logs can cause the subsequent step to be skipped, even though the first step is marked as successful.

We discovered this by directing stdout to /dev/null - as Jasdeep stated.

Example below: the second step runs in the second pipeline but not in the first.

- name: build-java
  pull: if-not-exists
  image: quay.io/ukhomeofficedigital/java8-mvn:v3.6.0
  commands:
    - mvn clean install -B -V
  volumes:
    - name: dockersock
      path: /var/run
  when:
    branch:
      - master
      - feature/*
    event:
      - push

- name: build-fb-images
  pull: if-not-exists
  image: docker:dind
  commands:
    - docker build ./$${IMAGE_NAME_API_JSON} -t $${DOCKER_REPO}/$${PROJECT}/$${IMAGE_NAME_API_JSON}:$${DRONE_BUILD_NUMBER}
  volumes:
    - name: dockersock
      path: /var/run
  when:
    branch:
      - master
      - feature/*
    event:
      - push
      - pull_request

- name: build-java
  pull: if-not-exists
  image: quay.io/ukhomeofficedigital/java8-mvn:v3.6.0
  commands:
    - mvn clean install -B -V 1>/dev/null
  volumes:
    - name: dockersock
      path: /var/run
  when:
    branch:
      - master
      - feature/*
    event:
      - push

- name: build-fb-images
  pull: if-not-exists
  image: docker:dind
  commands:
    - docker build ./$${IMAGE_NAME_API_JSON} -t $${DOCKER_REPO}/$${PROJECT}/$${IMAGE_NAME_API_JSON}:$${DRONE_BUILD_NUMBER}
  volumes:
    - name: dockersock
      path: /var/run
  when:
    branch:
      - master
      - feature/*
    event:
      - push
      - pull_request

There is nothing in the code that would limit step execution based on the log size of prior steps. However, the pipeline does block when uploading the final logs after a step completes, and will retry on failure using a backoff. We have seen people experience similar problems where a reverse proxy or load balancer fails large file uploads (due to HTTP request size limits), causing the upload to retry with backoff until the pipeline eventually times out and skips the remaining steps. This would be consistent with the behavior you described, and can be fixed by updating your reverse proxy or load balancer configuration.
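The retry-with-backoff behavior described above can be sketched in shell. This is a simplified illustration of the pattern only, not Drone's actual implementation; the function name, attempt count, and delay values are made up for the example:

```shell
#!/bin/sh
# Simplified sketch of retry-with-backoff (illustrative only, not Drone's code).
# Runs the given command, doubling the wait after each failure, up to 3 attempts.
upload_with_backoff() {
  delay=1
  for attempt in 1 2 3; do
    if "$@"; then
      echo "upload ok on attempt $attempt"
      return 0
    fi
    echo "attempt $attempt failed" >&2
    if [ "$attempt" -lt 3 ]; then
      sleep "$delay"
      delay=$((delay * 2))
    fi
  done
  echo "giving up after 3 attempts" >&2
  return 1
}

# Demo: a command that always fails exercises every retry before giving up.
upload_with_backoff false || echo "upload ultimately failed"
```

If the upload target rejects every attempt (as a 413 from a proxy would), all that retrying simply burns time until the pipeline deadline is hit, which is why the symptom appears as skipped steps rather than a visible error.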

Have you enabled trace logging on your runner to see if it provides you with more details of what is happening? Also, have you ensured you are using the latest stable release of the runner to ensure you have all the latest patches and fixes?

Hi Brad,

Thank you for the response.

We are using v1.0.0-beta.4

I will turn on trace logging first to see if that sheds any light on things

Hi Willem, please note that v1.0.0-beta.4 is a very old beta release of the software and is missing nearly two years of bug fixes and improvements. Please upgrade to the latest stable release of Drone (1.9) and the latest stable release of the docker runner (drone/drone-runner-docker:1.5). Once you have the latest stable releases, if the problem persists, we can continue triaging.

Sorry, I should have been more clear: we’re running v1.0.0-beta.4 of drone-runner-kube

For the server we’re running 1.9.0

I’m going to set DRONE_TRACE=true on the runner to see if that reveals anything
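For a Kubernetes-deployed runner, this is typically done by adding the environment variable to the runner's Deployment spec. A sketch of the relevant container env section (the surrounding manifest fields are assumed):

```yaml
# Sketch: container env section of the kube runner Deployment
env:
  - name: DRONE_TRACE
    value: "true"
```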

thanks for clarifying. In that case, you’ll want to upgrade the kubernetes runner to v1.0.0-beta.5 which includes a number of improvements and bug fixes.

Hi Brad, we have found that this issue was caused by an HTTP payload size restriction in the cluster-internal ingress controller; when the runner tried to post the logs to the server via the ingress, the following error was returned:

error="1 error occurred:\n\t* <html>\r\n<head><title>413 Request Entity Too Large</title></head>\r\n<body>\r\n<center><h1>413 Request Entity Too Large</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n\n\n"

413 Request Entity Too Large confirms what I hypothesized in my previous comment

We have seen people experience similar problems where a reverse proxy or load balancer fails large file uploads (due to HTTP request size limits), causing the upload to retry with backoff until the pipeline eventually times out and skips the remaining steps. This would be consistent with the behavior you described, and can be fixed by updating your reverse proxy or load balancer configuration.

You will need to update your ingress configuration to increase the size limits for http requests.
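For example, with the nginx ingress controller this is commonly done via the proxy-body-size annotation on the ingress that fronts the Drone server. A sketch; the 50m value is an arbitrary example and should comfortably exceed your largest step log:

```yaml
# Sketch: ingress metadata fragment for the nginx ingress controller
metadata:
  annotations:
    # raise the HTTP request body limit (nginx defaults to 1m)
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
```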

Also note that v1.0.0-beta.5 surfaces this error in the user interface instead of silently skipping steps, so make sure you are running the latest version of the runner. It will not solve your ingress problem, but it will ensure the user interface displays an error.

Indeed, your hypothesis was proven correct. Thanks; we were wondering about surfacing the issue rather than silently skipping steps. We'll deploy the new version when we can.