Hello,
We’re having issues with the autoscaler not scaling down instances as aggressively as I would expect.
The documentation suggests that the default minimum age for an agent is 1h, and the minimum pool size is 2. I currently have 7 running instances, but can see several hour-long gaps in the CPU utilisation timeline where I would expect some of the agents to be killed, as there are no running builds.
It’s pretty obvious that the agents are spending a lot of time doing nothing, and this has a significant cost, so I’m keen to sort this rather than giving Bezos more money.
From my understanding, I would expect all bar two agents to be terminated between 14:30 and 16:30 in this graph. Further, a maximum of 4 agents have anything actually being run on them in this 12-hour stretch.
The agents are being spun up and down (I think). Two of the agents have only been running for 22 hours. Two others have been running for 16 days, which matches the minimum pool.
I have confirmed the pool min age is not overridden in the autoscaler.
The autoscaler logs repeat the same sequence every five minutes: the autoscaler decides to terminate 5 servers, then aborts the termination.
{"id":"z35a5k3geK9z74fR","level":"debug","msg":"calculate unfinished jobs","time":"2021-07-09T00:53:35Z"}
{"id":"z35a5k3geK9z74fR","level":"debug","msg":"calculate server capacity","time":"2021-07-09T00:53:35Z"}
{"id":"z35a5k3geK9z74fR","level":"debug","max-pool":30,"min-pool":2,"msg":"check capacity","pending-builds":0,"running-builds":0,"server-buffer":0,"server-capacity":7,"server-count":7,"time":"2021-07-09T00:53:35Z"}
{"id":"z35a5k3geK9z74fR","level":"debug","msg":"terminate 5 servers","time":"2021-07-09T00:53:35Z"}
{"id":"z35a5k3geK9z74fR","level":"debug","min-pool":2,"msg":"abort terminating %!d(MISSING) instances to ensure minimum capacity met","servers-running":4,"servers-to-terminate":5,"time":"2021-07-09T00:53:35Z"}
{"id":"z35a5k3geK9z74fR","level":"debug","msg":"check capacity complete","time":"2021-07-09T00:53:35Z"}
{"id":"Btdk7g26mri1np3L","level":"debug","msg":"calculate unfinished jobs","time":"2021-07-09T00:58:35Z"}
{"id":"Btdk7g26mri1np3L","level":"debug","msg":"calculate server capacity","time":"2021-07-09T00:58:35Z"}
{"id":"Btdk7g26mri1np3L","level":"debug","max-pool":30,"min-pool":2,"msg":"check capacity","pending-builds":0,"running-builds":0,"server-buffer":0,"server-capacity":7,"server-count":7,"time":"2021-07-09T00:58:35Z"}
{"id":"Btdk7g26mri1np3L","level":"debug","msg":"terminate 5 servers","time":"2021-07-09T00:58:35Z"}
{"id":"Btdk7g26mri1np3L","level":"debug","min-pool":2,"msg":"abort terminating %!d(MISSING) instances to ensure minimum capacity met","servers-running":4,"servers-to-terminate":5,"time":"2021-07-09T00:58:35Z"}
{"id":"Btdk7g26mri1np3L","level":"debug","msg":"check capacity complete","time":"2021-07-09T00:58:35Z"}
{"id":"RUOF0rnuBhvPm6S0","level":"debug","msg":"calculate unfinished jobs","time":"2021-07-09T01:03:35Z"}
{"id":"RUOF0rnuBhvPm6S0","level":"debug","msg":"calculate server capacity","time":"2021-07-09T01:03:35Z"}
{"id":"RUOF0rnuBhvPm6S0","level":"debug","max-pool":30,"min-pool":2,"msg":"check capacity","pending-builds":0,"running-builds":0,"server-buffer":0,"server-capacity":7,"server-count":7,"time":"2021-07-09T01:03:35Z"}
{"id":"RUOF0rnuBhvPm6S0","level":"debug","msg":"terminate 5 servers","time":"2021-07-09T01:03:35Z"}
{"id":"RUOF0rnuBhvPm6S0","level":"debug","min-pool":2,"msg":"abort terminating %!d(MISSING) instances to ensure minimum capacity met","servers-running":4,"servers-to-terminate":5,"time":"2021-07-09T01:03:35Z"}
{"id":"RUOF0rnuBhvPm6S0","level":"debug","msg":"check capacity complete","time":"2021-07-09T01:03:35Z"}
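
To make the numbers in these log lines easier to compare, here is a rough Python sketch that pairs each "check capacity" entry with the "abort terminating" entry from the same cycle. The field names are taken from the log output above; `summarise_aborts` and `SAMPLE` are hypothetical names used only for illustration, and `remaining_if_terminated` (servers-running minus servers-to-terminate) is my assumption about what the abort is guarding against, not something confirmed from the autoscaler source.

```python
import json


def summarise_aborts(lines):
    """Return one summary dict per aborted scale-down found in JSON log lines."""
    server_counts = {}  # cycle id -> server-count from its "check capacity" entry
    summaries = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        # Pasting logs through a forum can turn straight quotes into curly ones.
        raw = raw.replace("\u201c", '"').replace("\u201d", '"')
        entry = json.loads(raw)
        msg = entry.get("msg", "")
        if msg == "check capacity":
            server_counts[entry["id"]] = entry["server-count"]
        elif msg.startswith("abort terminating"):
            running = entry["servers-running"]
            wanted = entry["servers-to-terminate"]
            summaries.append({
                "time": entry["time"],
                "server_count": server_counts.get(entry["id"]),
                "servers_running": running,
                "servers_to_terminate": wanted,
                "min_pool": entry["min-pool"],
                # Assumption: the abort fires when this would fall below min-pool.
                "remaining_if_terminated": running - wanted,
            })
    return summaries


# Two entries from the first cycle in the logs above.
SAMPLE = [
    '{"id":"z35a5k3geK9z74fR","level":"debug","max-pool":30,"min-pool":2,'
    '"msg":"check capacity","pending-builds":0,"running-builds":0,'
    '"server-buffer":0,"server-capacity":7,"server-count":7,'
    '"time":"2021-07-09T00:53:35Z"}',
    '{"id":"z35a5k3geK9z74fR","level":"debug","min-pool":2,'
    '"msg":"abort terminating %!d(MISSING) instances to ensure minimum capacity met",'
    '"servers-running":4,"servers-to-terminate":5,"time":"2021-07-09T00:53:35Z"}',
]

if __name__ == "__main__":
    for summary in summarise_aborts(SAMPLE):
        print(summary)
```

Run against the entries above, this puts server-count (7) next to servers-running (4) for each cycle. The termination target of 5 appears to come from the first figure, while the abort check uses the second, which may be why every scale-down attempt is abandoned.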