Continuing this series: Part 3 left off with a single-master Kubernetes cluster build. That is fine for quick proof-of-concept builds, learning the technology, or rapid prototyping, but it is not great for real-world implementations. To make Kubernetes reliable, the control plane itself needs to be resilient, which means deploying a multi-master architecture. With k8s distributions like OpenShift or Rancher, or public cloud flavors like AKS, GKE, and EKS, the control plane is already built to be highly available, and I'd strongly recommend one of those options for enterprise deployments. But the purpose of this series is to kick the proverbial tires, so let us delve a little deeper into what makes the control plane highly available.

In the image above, I've tried to capture as much information as is meaningfully possible in a single graphic about how the Kubernetes control plane can be made highly available. The official k8s documentation does a good job of illustrating this; the primary difference between the two deployment options it outlines is a stacked etcd topology versus an external etcd topology. In other words, the etcd ensemble can either run on the Kubernetes control plane nodes themselves or on machines external to the control plane. In this series, I opted for the stacked etcd topology. Furthermore, the "HA" aspect is the result of two things: the etcd component being highly available, and requests to the API server instances being highly available. Running multiple etcd instances ensures the backing store is highly available. To make the API server highly available, we employ a load balancer, which is where HAProxy comes into the picture. But a load balancer shunting traffic to the API server instances on the master nodes is not sufficient on its own: the health of the API endpoints needs to be evaluated to determine whether a particular endpoint is viable, and clients need a stable address to connect to. That is where keepalived comes in: it manages the virtual IP (VIP) that clients use and periodically runs a health-check script against the API server to confirm the endpoint is still viable.
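Before diving into the build, it is handy to know how to verify the stacked etcd ensemble once the cluster is up. A quick sketch, assuming the default kubeadm certificate paths and the etcd pod name used later in this post (etcd-k8s01):
$ kubectl -n kube-system exec etcd-k8s01 -- etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    endpoint health --cluster
All three members should report healthy; if one master is lost, the remaining two still form a quorum.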
Follow the instructions here —
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability
The various modes in which the haproxy and keepalived services can be run to provide the HA functionality are described in this excellent blog —
https://thebsdbox.co.uk/2020/01/02/Designing-Building-HA-bare-metal-Kubernetes-cluster
I opted to run haproxy and keepalived only on the first k8s master node. This required a few prerequisite steps —
- Install HAproxy and keepalived on the first master node (a quick install sketch follows this list)
- Set up a VIP: in my case, my cluster comprises 192.168.7.11-15, so I decided to use 192.168.7.10 as the VIP
- Configure the HAproxy and keepalived configuration files (shown below)
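As mentioned in the first step above, the packages can be installed up front so that the configuration paths referenced below exist on the node. A minimal sketch, assuming CentOS 7 (the OS used on these nodes) and that the host-level services should stay disabled, since haproxy and keepalived will ultimately run as static pods:
$ sudo yum install -y haproxy keepalived
$ sudo systemctl disable --now haproxy keepalived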
For haproxy —
$ cat /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option                  http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 1
    timeout http-request    10s
    timeout queue           20s
    timeout connect         5s
    timeout client          20s
    timeout server          20s
    timeout http-keep-alive 10s
    timeout check           10s
#---------------------------------------------------------------------
# apiserver frontend which proxies to the masters
#---------------------------------------------------------------------
frontend apiserver
    bind *:6443
    mode tcp
    option tcplog
    default_backend apiserver
#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance roundrobin
        server master 192.168.7.11:8443 check
        server backup1 192.168.7.12:8443 check
        server backup2 192.168.7.13:8443 check
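Before handing this file to a static pod, the syntax can be validated; a quick sketch, assuming the haproxy package is installed on the host:
$ haproxy -c -f /etc/haproxy/haproxy.cfg
Note that the backends point at port 8443, because the API server instances will be bound to 8443 while HAProxy owns 6443 on the first master (more on that with the kubeadm init command below).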
And keepalived —
$ cat /etc/keepalived/keepalived.conf
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_apiserver {
    script "/etc/keepalived/check_apiserver.sh"
    interval 3
    weight -2
    fall 10
    rise 2
}
vrrp_instance VI_1 {
    state MASTER
    interface enp0s3
    virtual_router_id 11
    priority 21
    authentication {
        auth_type PASS
        auth_pass 11
    }
    virtual_ipaddress {
        192.168.7.10
    }
    track_script {
        check_apiserver
    }
}
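Once keepalived is up (as a static pod, later in this post), a quick way to confirm it has claimed the VIP on the interface named in the config:
$ ip addr show enp0s3 | grep 192.168.7.10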
And the service check script for keepalived –
$ cat /etc/keepalived/check_apiserver.sh
#!/bin/sh
errorExit() {
    echo "*** $*" 1>&2
    exit 1
}
curl --silent --max-time 2 --insecure https://localhost:6443/ -o /dev/null || errorExit "Error GET https://localhost:6443/"
if ip addr | grep -q 192.168.7.10; then
    curl --silent --max-time 2 --insecure https://192.168.7.10:6443/ -o /dev/null || errorExit "Error GET https://192.168.7.10:6443/"
fi
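The script needs to be executable, and it can be exercised by hand (once an API server is listening) before keepalived starts using it; a small sketch:
$ sudo chmod +x /etc/keepalived/check_apiserver.sh
$ sudo /etc/keepalived/check_apiserver.sh && echo "apiserver OK"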
The idea is for haproxy and keepalived to run as static pods in the kubernetes cluster. To achieve that, we pre-stage the manifest YAML files in /etc/kubernetes/manifests, where the kubelet picks them up.
Haproxy.yaml
$ cat /etc/kubernetes/manifests/haproxy.yaml
apiVersion: v1
kind: Pod
metadata:
  name: haproxy
  namespace: kube-system
spec:
  containers:
  - image: haproxy:2.1.4
    name: haproxy
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: localhost
        path: /healthz
        port: 6443
        scheme: HTTPS
    volumeMounts:
    - mountPath: /usr/local/etc/haproxy/haproxy.cfg
      name: haproxyconf
      readOnly: true
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/haproxy/haproxy.cfg
      type: FileOrCreate
    name: haproxyconf
status: {}
keepalived.yaml
$ cat /etc/kubernetes/manifests/keepalived.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: keepalived
  namespace: kube-system
spec:
  containers:
  - image: osixia/keepalived:1.3.5-1
    name: keepalived
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_BROADCAST
        - NET_RAW
    volumeMounts:
    - mountPath: /usr/local/etc/keepalived/keepalived.conf
      name: config
    - mountPath: /etc/keepalived/check_apiserver.sh
      name: check
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/keepalived/keepalived.conf
    name: config
  - hostPath:
      path: /etc/keepalived/check_apiserver.sh
    name: check
status: {}
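The kubelet only starts watching /etc/kubernetes/manifests once kubeadm configures and starts it, so these two static pods come up as part of the kubeadm init run below. To confirm they were picked up without going through the API server, the container runtime on the node can be checked directly; a sketch, assuming Docker as the runtime (as on these nodes):
$ sudo docker ps --format '{{.Names}}' | grep -E 'haproxy|keepalived'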
At this point, if everything goes well, the cluster should be ready to go once the kubeadm init command is run. Note that the API server is given a different bind port, 8443 in our case, because 6443 is used by HAProxy as the front-end port on the first master node.
$ sudo kubeadm init --control-plane-endpoint "192.168.7.10:6443" --apiserver-bind-port 8443 --pod-network-cidr=10.244.0.0/16 --upload-certs
<snipped for readability>
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
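After copying admin.conf into place as instructed above, it is worth confirming that kubectl is reaching the API through the VIP and HAProxy rather than a node address directly; a quick sketch:
$ kubectl cluster-info
The control plane should be reported at https://192.168.7.10:6443, i.e. the control-plane endpoint we passed to kubeadm init.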
Let’s check if the haproxy and keepalived static pods are running —
$ kubectl get po --all-namespaces
NAMESPACE     NAME                                READY   STATUS    RESTARTS   AGE
default       mysql-1605200795-6949fc588-sps8w    1/1     Running   0          120m
default       ubuntu                              1/1     Running   0          117m
kube-system   coredns-f9fd979d6-j9gqp             1/1     Running   0          2d
kube-system   coredns-f9fd979d6-sh2sg             1/1     Running   0          2d
kube-system   etcd-k8s01                          1/1     Running   0          2d
kube-system   etcd-k8s02                          1/1     Running   0          2d
kube-system   etcd-k8s03                          1/1     Running   0          2d
kube-system   haproxy-k8s01                       1/1     Running   0          2d
kube-system   keepalived-k8s01                    1/1     Running   0          2d
kube-system   kube-apiserver-k8s01                1/1     Running   0          2d
kube-system   kube-apiserver-k8s02                1/1     Running   0          2d
kube-system   kube-apiserver-k8s03                1/1     Running   0          2d
kube-system   kube-controller-manager-k8s01       1/1     Running   1          2d
kube-system   kube-controller-manager-k8s02       1/1     Running   0          2d
kube-system   kube-controller-manager-k8s03       1/1     Running   0          2d
kube-system   kube-flannel-ds-2547z               1/1     Running   1          2d
kube-system   kube-flannel-ds-6ql5v               1/1     Running   0          2d
kube-system   kube-flannel-ds-crzmt               1/1     Running   2          2d
kube-system   kube-flannel-ds-hgwzb               1/1     Running   0          2d
kube-system   kube-flannel-ds-knh9t               1/1     Running   0          2d
kube-system   kube-proxy-np4tp                    1/1     Running   0          2d
kube-system   kube-proxy-pcbhx                    1/1     Running   0          2d
kube-system   kube-proxy-wdmvk                    1/1     Running   0          2d
kube-system   kube-proxy-xzx52                    1/1     Running   0          2d
kube-system   kube-proxy-z75qk                    1/1     Running   0          2d
kube-system   kube-scheduler-k8s01                1/1     Running   1          2d
kube-system   kube-scheduler-k8s02                1/1     Running   0          2d
kube-system   kube-scheduler-k8s03                1/1     Running   0          2d
Network Plugin installation
We installed flannel —
https://coreos.com/flannel/docs/latest/kubernetes.html
Important: for Flannel to work, the pod network CIDR needs to be 10.244.0.0/16, which is why --pod-network-cidr=10.244.0.0/16 was passed to kubeadm init.
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
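To confirm the flannel DaemonSet has rolled a pod out to every node, something like the following can be used (the DaemonSet name matches the pod names in the listing earlier):
$ kubectl -n kube-system get daemonset kube-flannel-ds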
Check if DNS is working —
$ kubectl run -i --tty ubuntu --image=ubuntu:16.04 --restart=Never -- bash -il
# apt-get update && apt-get install dnsutils -y
# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
# nslookup google.com
Server: 10.96.0.10
Address: 10.96.0.10#53
Non-authoritative answer:
Name: google.com
Address: 216.58.192.206
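The throwaway test pod can be removed once the lookups succeed:
$ kubectl delete pod ubuntu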
I then followed through with the rest of the steps from the kubeadm init output, joining the remaining control-plane and worker nodes —
You can now join any number of the control-plane node running the following command on each as root:
kubeadm join 192.168.7.10:6443 --token 78of9v.drvt98ld2swwykk3 \
--discovery-token-ca-cert-hash sha256:490c9c4fcd367e06f0bde7254435efca8fce3fe21d5cc9b4043ab98083baa4f2 \
--control-plane --certificate-key a15a0a2230b2c73a3d7b768e6c0332a427e26b230042a68ef4b4c16e602ad943
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.7.10:6443 --token 78of9v.drvt98ld2swwykk3 \
--discovery-token-ca-cert-hash sha256:490c9c4fcd367e06f0bde7254435efca8fce3fe21d5cc9b4043ab98083baa4f2
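The join token and certificate key both expire (the token after 24 hours by default, the uploaded certs after two hours as noted above). If nodes are joined later, fresh values can be generated on the first master:
$ sudo kubeadm token create --print-join-command
$ sudo kubeadm init phase upload-certs --upload-certs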
Check status —
$ kubectl get no -o wide
NAME    STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s01   Ready    master   47h   v1.19.3   192.168.7.11   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.13
k8s02   Ready    master   47h   v1.19.3   192.168.7.12   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.13
k8s03   Ready    master   47h   v1.19.3   192.168.7.13   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.13
k8s04   Ready    <none>   47h   v1.19.3   192.168.7.14   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.11
k8s05   Ready    <none>   47h   v1.19.3   192.168.7.15   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.11
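As a final check that the control plane is genuinely being served through the VIP, the API can be queried at the load-balanced endpoint; a quick sketch (-k skips certificate verification, which is fine for a reachability test):
$ curl -k https://192.168.7.10:6443/version
This should return the version JSON, since unauthenticated access to /version is allowed by the default kubeadm RBAC rules.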