Hi Matt,
We added some feature flags to drone/drone-runner-kube:latest that I hope will help:
DRONE_FEATURE_FLAG_RETRY_LOGS=true
configures the system to retry streaming the logs on error. I recommend enabling this flag first and monitoring its behavior. If this flag solves the problem, we can implement a more permanent fix where we automatically retry.
DRONE_FEATURE_FLAG_DELAYED_DELETE=true
delays removing the Pod by 30 seconds. If the underlying root cause is the Pod being deleted before logs are streamed, setting this flag to true should work around the issue by giving the system enough time to stream logs.
DRONE_FEATURE_FLAG_DISABLE_DELETE=true
disables removing the Pod. If enabled, this requires manually removing the resources associated with the pipeline (Pod, ConfigMaps, Secrets, etc.). If streaming the logs fails consistently, we should be able to disable Pod deletion and try streaming logs using kubectl to see if the error is reproducible.