Please post the autoscaler logs with TRACE logging enabled. The logs will help us determine where in the process the autoscaler is waiting, so that we can suggest possible root causes. If the existing logs are insufficient, we can add more logging.
This is not quite how it works. First, the autoscaler provisions the instance, then makes an API call to describe the instance and check its network. Once the instance is successfully provisioned, the status changes from `creating` to `staging`. Next, the autoscaler tries to `docker ping` the instance (using a backoff) to verify that Docker is initialized. Finally, once it is able to ping the instance, it installs and starts the agent (using `docker create` and `docker start`). Once the agent is successfully installed, the status changes from `staging` to `running`.
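To make the order of the status changes easier to follow, here is a minimal sketch of the lifecycle as a small state machine. It is only an illustration, not the autoscaler's actual code: the `Status` enum, the `advance` function, and its flag names are all hypothetical.

```python
from enum import Enum


class Status(Enum):
    # Status names as described in this thread.
    CREATING = "creating"
    STAGING = "staging"
    RUNNING = "running"


def advance(status, provisioned=False, docker_reachable=False, agent_installed=False):
    """Return the next status given which steps have completed (hypothetical helper)."""
    # creating -> staging: the instance was provisioned and verified.
    if status is Status.CREATING and provisioned:
        return Status.STAGING
    # staging -> running: Docker answered the ping and the agent was installed.
    if status is Status.STAGING and docker_reachable and agent_installed:
        return Status.RUNNING
    # Otherwise the instance stays where it is.
    return status
```

In this model, an instance that never leaves `creating` is one for which the provisioning step has not completed, regardless of what happens with Docker or the agent.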
Since you see the runner stuck in a `creating` status, we can narrow this down to a problem with instance creation. It sounds like it might be stuck in the `waitZoneOperation` backoff. The backoff is subject to a 1 hour timeout, which ultimately propagates to the `waitZoneOperation` call using this context. The autoscaler performs this backoff until GCP reports a status of `DONE` or until the API returns an error (which includes a timeout error).
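The wait described here is roughly a poll-with-backoff loop bounded by a deadline. The sketch below shows that general pattern only; it is not the autoscaler's implementation. The function `wait_for_operation`, the `get_status` callback, and the exponential delay schedule are assumptions made for illustration, and only the 1 hour timeout and the `DONE` status come from the description above.

```python
import time


def wait_for_operation(get_status, timeout=3600.0, initial_delay=1.0,
                       max_delay=60.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "DONE" or the timeout expires.

    get_status is a hypothetical callback standing in for the API call;
    any exception it raises (an API error) propagates to the caller.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = get_status()
        if status == "DONE":
            return status
        if clock() >= deadline:
            # A timeout surfaces as an error, just like an API failure.
            raise TimeoutError("operation did not reach DONE before the deadline")
        sleep(delay)
        # Assumed schedule: double the delay up to a cap.
        delay = min(delay * 2, max_delay)
```

If the operation never reports `DONE` and the API never returns an error, a loop like this keeps waiting until the deadline, which would look like an instance stuck in `creating` for up to an hour.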
The agent is ready once the autoscaler has successfully connected to the Docker daemon on the machine, executed a `docker ping`, and installed the agent using `docker create` and `docker start`. But as mentioned above, it sounds like you are not getting past the instance creation and verification step.