My take on GKE Autopilot mode

Posted on 2021-10-17 in Programming

I recently switched from a standard GKE cluster where I had to manage nodes to an Autopilot one where Google does it for me. I must say the switch was easy (despite an interruption of our services) and it greatly simplified cluster management.

I noticed this simplification when we added a service to the cluster and I had to change the node configuration to fit our new needs because the workload couldn't be deployed. This resulted in a suboptimal configuration for our cluster (too many nodes), and our node auto-scaling configuration didn't seem to optimize it correctly. With Autopilot, everything is much simpler: you specify resources for each pod (within constraints, see here) and GKE handles node provisioning for you automatically. Need more? New, properly sized nodes will appear. Need less? Nodes will disappear.
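
In practice, this just means setting resource requests on your workloads as usual. Here is a minimal sketch of what that could look like (the names and image are made up, and the values are only an illustration):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service                  # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      containers:
        - name: app
          image: gcr.io/my-project/my-service:latest  # hypothetical image
          resources:
            requests:
              cpu: "500m"           # Autopilot bills on what you request here
              memory: "1Gi"
            limits:                 # must match the requests (see the constraints below)
              cpu: "500m"
              memory: "1Gi"
```

Autopilot then provisions nodes sized for the sum of these requests, so there is nothing else to configure on the node side.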

From my tests, it seems the cluster is always optimized for your current workloads (which helped us reduce our costs despite resources being more expensive in Autopilot). This makes sense since in this mode you are charged for the resources you request, not for the nodes you need. So not optimizing would cost Google more.

It also greatly simplifies cluster administration: you handle what you deploy and Google takes care of the rest (mostly node security and scaling). So I do advise using this mode if you are new to Kubernetes or have little time to spend on sysadmin tasks. For more information, see the documentation. Beware though: you cannot easily switch from one mode to another and some features are not yet available. Your cluster will also be less optimized: CPU is allocated in 0.25 vCPU increments and you cannot request less, unlike when you provision nodes yourself. Your resource requests must also equal your resource limits. I guess it's the price to pay to simplify node provisioning. For our use cases, it was roughly what we needed anyway.

Here's how I did our migration (we rely on Helm to deploy our project):

  1. Create the new cluster.
  2. Remove the services, deployments and load balancers associated with the old cluster so we can associate the IP and other resources with the new cluster.
  3. Export all secrets (they are not stored in the repo for security reasons and thus not managed by Helm) and reimport them into the new cluster. To do that, export each secret to a file as YAML, clean the metadata and then run kubectl apply -f secret.yaml (see the sketch after this list).
  4. Redeploy everything.
  5. Delete the old cluster.
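
For step 3, "cleaning the metadata" just means stripping the cluster-specific fields from the exported YAML before applying it to the new cluster. A sketch of what secret.yaml could look like after cleaning (the name and data are placeholders):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-app-secrets        # hypothetical name; keep only the name (and namespace if you use one)
  # removed: uid, resourceVersion, creationTimestamp and last-applied-configuration annotations
type: Opaque
data:
  API_KEY: bXktYXBpLWtleQ==   # base64-encoded value, copied as-is from the export
```

Running kubectl apply -f secret.yaml against the new cluster then recreates the secret there.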

As a bonus, I also configured an ingress BackendConfig to automatically enable the CDN on my resources, as well as a FrontendConfig to select my SSL policy and redirect traffic to HTTPS automatically at the ingress. I discovered these features recently and found them extremely useful, although hard to come across. More details here.
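
For reference, here is roughly what those two objects look like; the names and the SSL policy are placeholders, so check the linked documentation for the exact fields you need:

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: cdn-backend           # hypothetical name
spec:
  cdn:
    enabled: true             # serve this backend through Cloud CDN
---
apiVersion: networking.gke.io/v1beta1
kind: FrontendConfig
metadata:
  name: https-frontend        # hypothetical name
spec:
  sslPolicy: my-ssl-policy    # hypothetical SSL policy created beforehand in GCP
  redirectToHttps:
    enabled: true             # redirect HTTP traffic to HTTPS at the load balancer
```

They are wired up through annotations: cloud.google.com/backend-config: '{"default": "cdn-backend"}' on the Service and networking.gke.io/v1beta1.FrontendConfig: "https-frontend" on the Ingress.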