My take on GKE Autopilot mode

Posted on 2021-10-17 in Programming

I recently switched from a standard GKE cluster, where I had to manage nodes myself, to an Autopilot one, where Google does it for me. I must say the switch was easy (despite an interruption of our services) and greatly simplified the management of the cluster.

I noticed this simplification when we added a service to the cluster: I had to change the node configuration to fit our new needs because the workload couldn't be deployed. This resulted in a suboptimal configuration for our cluster (too many nodes), and our node auto-scaling configuration didn't seem to optimize it correctly. With Autopilot, everything is much simpler: you specify resources for each pod (within constraints, see here) and GKE handles node provisioning for you automatically. Need more? New, properly sized nodes will appear. Need less? Nodes will disappear.

From my tests, the cluster seems to always be optimized for your current workloads (which helped us reduce our costs despite resources being more expensive in Autopilot). This makes sense: in this mode you are charged for the resources you request, not for the nodes you need, so not optimizing would cost Google more.

It also greatly simplifies cluster administration: you handle what you deploy and Google takes care of the rest (mostly node security and scaling). So I do recommend this mode if you are new to Kubernetes or have little time to spend on sysadmin tasks. For more information, see the documentation. Beware though: you cannot easily switch from one mode to the other and some features are not yet available. Your cluster will also be less finely tuned: CPU is requested in 0.25 vCPU increments and you cannot go lower, unlike with custom node provisioning. Your resource requests must also be equal to your resource limits. I guess it's the price to pay to simplify node provisioning. For our use cases, it was roughly what we needed anyway.
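To illustrate those constraints, here is a minimal sketch of what a container spec might look like under Autopilot (the Deployment name and image are made up; note that requests equal limits and the CPU value is a multiple of 0.25 vCPU):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api                 # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-api
  template:
    metadata:
      labels:
        app: my-api
    spec:
      containers:
        - name: my-api
          image: gcr.io/my-project/my-api:latest   # hypothetical image
          resources:
            # Autopilot sizes and bills from these values;
            # requests must equal limits.
            requests:
              cpu: "500m"      # a multiple of 0.25 vCPU
              memory: "512Mi"
            limits:
              cpu: "500m"
              memory: "512Mi"
```

GKE then provisions whatever nodes are needed to run these pods; there is no node pool to configure.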

Here's how I did our migration (we rely on Helm to deploy our project):

  1. Create the new cluster.
  2. Remove the services, deployments and load balancers associated with the old cluster so we could associate the IP and other resources with our new cluster.
  3. Export all secrets (they are not stored in the repo for security reasons and thus not managed by Helm) and reimport them in the new cluster. To do that, you export each secret to a file as YAML, clean the metadata and then run kubectl apply -f secret.yaml.
  4. Redeploy everything.
  5. Delete the old cluster.
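The metadata cleanup in step 3 can be scripted. Here is a minimal sketch assuming the secret was exported as JSON (kubectl get secret my-secret -o json); the metadata field names are standard Kubernetes ones, but the helper itself is mine:

```python
# Strip cluster-specific metadata from an exported Secret so it can be
# re-applied to a different cluster with `kubectl apply`.
FIELDS_TO_DROP = ("uid", "resourceVersion", "creationTimestamp",
                  "selfLink", "managedFields", "ownerReferences")

def clean_secret(secret: dict) -> dict:
    metadata = secret.get("metadata", {})
    for field in FIELDS_TO_DROP:
        metadata.pop(field, None)  # ignore fields that are absent
    # kubectl also records the last applied config in an annotation; drop it.
    metadata.get("annotations", {}).pop(
        "kubectl.kubernetes.io/last-applied-configuration", None)
    return secret
```

You would pipe the exported JSON through this function, save the result, and run kubectl apply -f on it against the new cluster.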

As a bonus, I also configured an ingress BackendConfig to automatically configure the CDN for my resources, as well as a FrontendConfig to select my SSL policy and redirect traffic to HTTPS automatically in the ingress. I discovered this feature recently and found it extremely useful, although hard to come across. More details here.
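For reference, here is roughly what those two objects look like (all names and the SSL policy are placeholders):

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: cdn-backend            # placeholder name
spec:
  cdn:
    enabled: true              # serve this backend through Cloud CDN
---
apiVersion: networking.gke.io/v1beta1
kind: FrontendConfig
metadata:
  name: https-frontend         # placeholder name
spec:
  sslPolicy: my-ssl-policy     # placeholder Cloud SSL policy name
  redirectToHttps:
    enabled: true              # redirect HTTP traffic to HTTPS
```

The BackendConfig is then attached to a Service with the annotation cloud.google.com/backend-config: '{"default": "cdn-backend"}', and the FrontendConfig to the Ingress with networking.gke.io/v1beta1.FrontendConfig: "https-frontend".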