7 Cluster Maintenance

Downtime, Upgrade, Backup, and Restore in Kubernetes 🚀

Downtime

Node Downtime 🛠️

  • If a node stays down for more than 5 minutes, its pods are terminated and, if they are managed by a ReplicaSet (or Deployment), recreated on another node.
  • This 5-minute default is the pod eviction timeout; it can be changed via the --pod-eviction-timeout flag of the kube-controller-manager on the master node.
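On a kubeadm cluster, the controller manager runs as a static pod, so the flag is set by editing its manifest. A minimal sketch, assuming the standard kubeadm file path (5m0s shown is the default):

```shell
# The controller manager's static pod manifest on the master node:
vi /etc/kubernetes/manifests/kube-controller-manager.yaml

# Under spec.containers[0].command, add or adjust the flag:
#   - --pod-eviction-timeout=5m0s
# The kubelet detects the manifest change and restarts the pod automatically.
```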

OS Upgrade and Server Reboot 🔄

Commands for Node Management

  1. Drain Node:

    • Safely evicts the pods on the node (ReplicaSet-managed pods are recreated on other nodes) and marks the node unschedulable so it can be rebooted or maintained. DaemonSet pods cannot be evicted, so --ignore-daemonsets is usually required.
    kubectl drain node-1 --ignore-daemonsets
  2. Cordon Node:

    • Makes the node unschedulable. New pods won't get scheduled on this node, but existing pods remain.
    kubectl cordon node-2
  3. Uncordon Node:

    • Makes the node schedulable and active in the cluster.
    kubectl uncordon node-1
    kubectl uncordon node-2
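Put together, a typical OS-patch cycle for a single node looks like this (node-1 is a placeholder name, and the ssh/apt line is an example of whatever OS maintenance you need to run):

```shell
# 1. Evict pods and mark the node unschedulable
kubectl drain node-1 --ignore-daemonsets

# 2. Patch the OS and reboot the node
ssh node-1 'sudo apt-get update && sudo apt-get upgrade -y && sudo reboot'

# 3. Once the node is back up, allow scheduling again
kubectl uncordon node-1

# 4. Verify the node is Ready and schedulable
kubectl get node node-1
```

Uncordon does not move pods back; the node simply becomes eligible for new scheduling.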

Kubernetes Version Conflicts

Important Links 📚

Version Compatibility

  • kubectl can be one minor version higher or lower than the master/Kube API server.
  • The other components (controller-manager, scheduler, kubelet, kube-proxy) cannot be a higher version than the Kube API server.
  • CoreDNS and ETCD are versioned independently, as they are separate projects.

Upgrade Process

  • Upgrade can be done one minor version at a time (e.g., from 1.10 to 1.11 to 1.12).
  • First, upgrade the master node, then the worker nodes.

Master Node Upgrade:

  • The control plane is briefly unavailable during the upgrade (kubectl and scheduling pause), but workloads already running on worker nodes are unaffected.
  • Example commands:
kubeadm upgrade plan
apt-get install -y kubeadm=1.12.0-00
kubeadm upgrade apply v1.12.0
kubectl get nodes  # Still shows the old kubelet version
apt-get install -y kubelet=1.12.0-00
systemctl restart kubelet
kubectl get nodes  # Now shows the new kubelet version

Worker Node Upgrade:

  • One strategy is to add a new node running the upgraded version, drain workloads onto it, and then remove the older node.
  • This node-replacement approach is easiest in cloud environments, where provisioning new nodes is cheap.
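The alternative is to upgrade each worker in place with kubeadm. A sketch of the usual sequence, matching the 1.12 example above (the exact kubeadm subcommand for the node step has varied across releases, so check the docs for your version):

```shell
# From the master: move workloads off the worker
kubectl drain node-1 --ignore-daemonsets

# On the worker node itself:
apt-get install -y kubeadm=1.12.0-00
kubeadm upgrade node               # upgrades the local kubelet configuration
apt-get install -y kubelet=1.12.0-00
systemctl restart kubelet

# Back on the master: return the worker to service
kubectl uncordon node-1
```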

Backup and Restore

Backup All Resources:

kubectl get all --all-namespaces -o yaml > all-resource-backup.conf.yaml
  • Note that kubectl get all only covers a common subset of resource types (ConfigMaps, Secrets, and custom resources are not included).
  • It is better to store all resource definitions in a version control system like GitHub.

ETCD Backup and Restore

  • Kubernetes saves the state of the cluster in ETCD.
Create a Snapshot:
export ETCDCTL_API=3
etcdctl snapshot save <snapshot-path> \
  --cacert=<ca-cert> --cert=<client-cert> --key=<client-key> \
  --endpoints=https://127.0.0.1:2379
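On a kubeadm cluster the certificate placeholders typically resolve to the files under /etc/kubernetes/pki/etcd/ (these are the kubeadm defaults; verify the paths against your etcd manifest, and the snapshot path shown is just an example):

```shell
export ETCDCTL_API=3
etcdctl snapshot save /opt/snapshot.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Inspect the snapshot after taking it
etcdctl snapshot status /opt/snapshot.db
```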
Restore a Snapshot:
  • Restore the snapshot to a new data directory, then point the ETCD service at that location.
  • Commands:
etcdctl snapshot restore <snapshot-path> --data-dir=<new-data-dir>
systemctl daemon-reload
systemctl restart etcd
systemctl start kube-apiserver
  • ETCD is a TLS-enabled database, so certificates are mandatory when taking a snapshot; restoring operates directly on the snapshot file and does not require them.

Alternative Methods

  1. Update ETCD YAML Manifest:

    • Change the --data-dir=<path> argument in the manifest file, along with the matching hostPath volume.
    • If the ETCD pod doesn't start in 2 minutes, delete the pod:
    kubectl delete pod etcd-controlplane -n kube-system
    • The pod will auto-recreate since it is a static pod.
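For a kubeadm cluster, the manifest edit above means updating both the --data-dir argument and the hostPath volume in the etcd static pod manifest. A sketch, assuming the default kubeadm paths and an example restore directory name:

```shell
# After restoring the snapshot to /var/lib/etcd-from-backup,
# edit the etcd static pod manifest:
vi /etc/kubernetes/manifests/etcd.yaml

# Change these fields (old value /var/lib/etcd shown for contrast):
#   - --data-dir=/var/lib/etcd-from-backup
#   volumes:
#   - hostPath:
#       path: /var/lib/etcd-from-backup
# The kubelet notices the manifest change and recreates the etcd pod.
```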
  2. Switch Context:

    kubectl config view
    kubectl config use-context <clustername>

View ETCD Cluster Members:

export ETCDCTL_API=3
etcdctl member list \
  --cacert=<ca-cert> --cert=<client-cert> --key=<client-key> \
  --endpoints=https://127.0.0.1:2379