Downtime, Upgrade, Backup, and Restore in Kubernetes 🚀
Downtime
Node Downtime 🛠️
- If a node stays down for more than 5 minutes, its pods are terminated and, if they are managed by a ReplicaSet (or Deployment), recreated on another node; standalone pods are not recreated.
- The default pod eviction timeout is 5 minutes and can be changed in the kube-controller-manager configuration on the master node.
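As a sketch (paths assume a kubeadm-provisioned cluster, where the controller manager runs as a static pod), the timeout can be inspected and changed via the manifest; note that newer Kubernetes releases replace this flag with taint-based evictions (`tolerationSeconds`):

```shell
# Check whether the flag is set (absent means the 5m default applies)
grep pod-eviction-timeout /etc/kubernetes/manifests/kube-controller-manager.yaml

# To change it, add or edit this flag under the container's command list:
#   - --pod-eviction-timeout=10m0s
# The kubelet watches the manifest directory and restarts the pod automatically.
```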
OS Upgrade and Server Reboot 🔄
Commands for Node Management
Drain Node:
- Evicts existing pods from the node so they are rescheduled on other nodes, and marks the node unschedulable, ready for rebooting and maintenance.
kubectl drain node-1
Cordon Node:
- Makes the node unschedulable. New pods won't get scheduled on this node, but existing pods remain.
kubectl cordon node-2
Uncordon Node:
- Makes the node schedulable and active in the cluster again.
kubectl uncordon node-1
kubectl uncordon node-2
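Putting the commands together, a typical node-maintenance cycle looks like this (the node name is illustrative):

```shell
# Evict workloads and mark the node unschedulable.
# --ignore-daemonsets is usually required, since DaemonSet pods
# cannot be moved off the node.
kubectl drain node-1 --ignore-daemonsets

# ... perform the OS upgrade / reboot on node-1 ...

# Bring the node back into scheduling rotation
kubectl uncordon node-1
```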
Kubernetes Version Conflicts
Important Links 📚
- Kubernetes API Concepts
- API Conventions
- API Changes
Version Compatibility
- kubectl can be one minor version higher or lower than the master/Kube API server.
- Other control-plane components (controller-manager, scheduler, kubelet) cannot be a higher version than the Kube API server.
- CoreDNS and ETCD follow their own version numbers, as they are separate projects.
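The current skew can be checked directly (output format varies by release; the old `--short` flag was removed from newer kubectl versions):

```shell
# Compare the client (kubectl) and server (Kube API server) versions
kubectl version

# Show the kubelet version running on each node
kubectl get nodes -o wide
```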
Upgrade Process
- Upgrade can be done one minor version at a time (e.g., from 1.10 to 1.11 to 1.12).
- First, upgrade the master node, then the worker nodes.
Master Node Upgrade:
- Control-plane components are briefly unavailable during the upgrade (kubectl and the API won't respond), but workloads running on the worker nodes are unaffected.
- Example commands:
kubeadm upgrade plan
apt-get install -y kubeadm=1.12.0-00
kubeadm upgrade apply v1.12.0
kubectl get nodes # Still shows the old kubelet version
apt-get install -y kubelet=1.12.0-00
systemctl restart kubelet
kubectl get nodes # Now shows the new kubelet version
Worker Node Upgrade:
- Option 1: upgrade each worker in place, one at a time — drain the node, upgrade kubeadm and kubelet on it, then uncordon it.
- Option 2: add new nodes with the upgraded version to the cluster and remove the older nodes. This process is especially easy to manage in cloud environments.
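A sketch of the in-place worker upgrade (drain/uncordon run from the master, package commands on the worker itself; the node name is illustrative, and the exact kubeadm subcommand varies by release — current versions use `kubeadm upgrade node`):

```shell
# From the master: move workloads off the node
kubectl drain node-1 --ignore-daemonsets

# On the worker node:
apt-get install -y kubeadm=1.12.0-00
kubeadm upgrade node
apt-get install -y kubelet=1.12.0-00
systemctl restart kubelet

# From the master: make the node schedulable again
kubectl uncordon node-1
```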
Backup and Restore
Backup All Resources:
kubectl get all --all-namespaces -o yaml > all-resource-backup.conf.yaml
- Better to store all configs in a version control system like GitHub.
ETCD Backup and Restore
- Kubernetes saves the state of the cluster in ETCD.
Create a Snapshot:
export ETCDCTL_API=3
etcdctl snapshot save <snapshot-path> \
--cacert=<ca-cert> --cert=<client-cert> --key=<client-key> \
--endpoints=https://127.0.0.1:2379
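After saving, the snapshot can be verified; no TLS flags are needed here, since this command reads the local snapshot file rather than connecting to ETCD:

```shell
export ETCDCTL_API=3

# Prints the snapshot's hash, revision, total keys, and size
etcdctl snapshot status <snapshot-path> --write-out=table
```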
Restore a Snapshot:
- Restore the snapshot to a different directory and change the ETCD service path to the new location.
- Commands:
systemctl daemon-reload
systemctl restart etcd
systemctl start kube-apiserver
- ETCD is a TLS-enabled database, so certificates are mandatory when taking a snapshot; restoring reads the snapshot file directly, so the TLS flags are not required for that step.
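A sketch of the full restore flow (the snapshot path and data directory are illustrative):

```shell
export ETCDCTL_API=3

# Restore the snapshot into a NEW data directory (never the live one)
etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir /var/lib/etcd-from-backup

# Point the ETCD service at the new directory (a systemd-managed ETCD is
# shown here; on kubeadm clusters edit the static pod manifest instead):
systemctl daemon-reload
systemctl restart etcd
systemctl start kube-apiserver
```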
Alternative Methods
Update ETCD YAML Manifest:
- Change the --data-dir=<path> in the manifest file.
- If the ETCD pod doesn't start within 2 minutes, delete the pod:
kubectl delete pod etcd-controlplane -n kube-system
- The pod will auto-recreate since it is a static pod.
Switch Context:
kubectl config view
kubectl config use-context <clustername>