Downtime, Upgrade, Backup, and Restore in Kubernetes 🚀
Downtime
Node Downtime 🛠️
- If a node stays down for more than 5 minutes, its pods are terminated and, if they are managed by a ReplicaSet (or Deployment), recreated on another node; standalone pods are not recreated.
- The default pod eviction timeout is 5 minutes and can be changed in the kube-controller-manager configuration on the master node.
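As a sketch (paths assume a kubeadm-provisioned cluster, where the controller manager runs as a static pod), the timeout can be inspected and changed via the manifest; note that newer Kubernetes releases replace this flag with taint-based evictions (`tolerationSeconds`):

```shell
# Check whether the flag is set (absent means the 5m default applies)
grep pod-eviction-timeout /etc/kubernetes/manifests/kube-controller-manager.yaml

# To change it, add or edit this flag under the container's command list:
#   - --pod-eviction-timeout=10m0s
# The kubelet watches the manifest directory and restarts the pod automatically.
```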
OS Upgrade and Server Reboot 🔄
Commands for Node Management
Drain Node:
- Evicts existing pods from the node so they are rescheduled on other nodes, and marks the node unschedulable, ready for rebooting and maintenance.
kubectl drain node-1
Cordon Node:
- Makes the node unschedulable. New pods won't get scheduled on this node, but existing pods remain.
kubectl cordon node-2
Uncordon Node:
- Makes the node schedulable and active in the cluster again.
kubectl uncordon node-1
kubectl uncordon node-2
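Putting the commands together, a typical node-maintenance cycle looks like this (the node name is illustrative):

```shell
# Evict workloads and mark the node unschedulable.
# --ignore-daemonsets is usually required, since DaemonSet pods
# cannot be moved off the node.
kubectl drain node-1 --ignore-daemonsets

# ... perform the OS upgrade / reboot on node-1 ...

# Bring the node back into scheduling rotation
kubectl uncordon node-1
```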
Kubernetes Version Conflicts
Important Links 📚
- Kubernetes API Concepts
- API Conventions
- API Changes
Version Compatibility
- kubectl can be one minor version higher or lower than the master/Kube API server.
- Other control-plane components (controller-manager, scheduler, kubelet) cannot be a higher version than the Kube API server.
- CoreDNS and ETCD follow their own version numbers, as they are separate projects.
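The current skew can be checked directly (output format varies by release; the old `--short` flag was removed from newer kubectl versions):

```shell
# Compare the client (kubectl) and server (Kube API server) versions
kubectl version

# Show the kubelet version running on each node
kubectl get nodes -o wide
```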
Upgrade Process
- Upgrade can be done one minor version at a time (e.g., from 1.10 to 1.11 to 1.12).
- First, upgrade the master node, then the worker nodes.
Master Node Upgrade:
- Control-plane components are briefly unavailable during the upgrade (kubectl and the API won't respond), but workloads running on the worker nodes are unaffected.
- Example commands:
kubeadm upgrade plan
apt-get install -y kubeadm=1.12.0-00
kubeadm upgrade apply v1.12.0
kubectl get nodes # Still shows the old kubelet version
apt-get install -y kubelet=1.12.0-00
systemctl restart kubelet
kubectl get nodes # Now shows the new kubelet version
Worker Node Upgrade:
- Option 1: upgrade each worker in place, one at a time — drain the node, upgrade kubeadm and kubelet on it, then uncordon it.
- Option 2: add new nodes with the upgraded version to the cluster and remove the older nodes. This process is especially easy to manage in cloud environments.
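A sketch of the in-place worker upgrade (drain/uncordon run from the master, package commands on the worker itself; the node name is illustrative, and the exact kubeadm subcommand varies by release — current versions use `kubeadm upgrade node`):

```shell
# From the master: move workloads off the node
kubectl drain node-1 --ignore-daemonsets

# On the worker node:
apt-get install -y kubeadm=1.12.0-00
kubeadm upgrade node
apt-get install -y kubelet=1.12.0-00
systemctl restart kubelet

# From the master: make the node schedulable again
kubectl uncordon node-1
```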
Backup and Restore
Backup All Resources:
kubectl get all --all-namespaces -o yaml > all-resource-backup.conf.yaml
- Better to store all configs in a version control system like GitHub.
ETCD Backup and Restore
- Kubernetes saves the state of the cluster in ETCD.
Create a Snapshot:
export ETCDCTL_API=3
etcdctl snapshot save <snapshot-path> \
--cacert=<ca-cert> --cert=<client-cert> --key=<client-key> \
--endpoints=https://127.0.0.1:2379
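After saving, the snapshot can be verified; no TLS flags are needed here, since this command reads the local snapshot file rather than connecting to ETCD:

```shell
export ETCDCTL_API=3

# Prints the snapshot's hash, revision, total keys, and size
etcdctl snapshot status <snapshot-path> --write-out=table
```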
Restore a Snapshot:
- Restore the snapshot to a different directory and change the ETCD service path to the new location.
- Commands:
systemctl daemon-reload
systemctl restart etcd
systemctl start kube-apiserver
- ETCD is a TLS-enabled database, so certificates are mandatory when taking a snapshot; restoring reads the snapshot file directly, so the TLS flags are not required for that step.
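A sketch of the full restore flow (the snapshot path and data directory are illustrative):

```shell
export ETCDCTL_API=3

# Restore the snapshot into a NEW data directory (never the live one)
etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir /var/lib/etcd-from-backup

# Point the ETCD service at the new directory (a systemd-managed ETCD is
# shown here; on kubeadm clusters edit the static pod manifest instead):
systemctl daemon-reload
systemctl restart etcd
systemctl start kube-apiserver
```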
Alternative Methods
Update ETCD YAML Manifest:
- Change the --data-dir=<path> in the manifest file.
- If the ETCD pod doesn't start within 2 minutes, delete the pod:
kubectl delete pod etcd-controlplane -n kube-system
- The pod will auto-recreate since it is a static pod.
Switch Context:
kubectl config view
kubectl config use-context <clustername>