
Kicking the Tires on Kubernetes – Part 4

Continuing this series: Part 3 left off with a single-master Kubernetes cluster build. That is fine for quick proof-of-concept work, learning the technology, or rapid prototyping, but it is not great for real-world implementations. To make Kubernetes reliable, the control plane itself needs to be resilient, which means deploying a multi-master architecture. With k8s distributions like OpenShift or Rancher, and public cloud flavors like AKS, GKE, and EKS, the control plane is already built to be highly available, and I'd strongly recommend one of those options for enterprise deployments. But the purpose of this series is to kick the proverbial tires, so let us delve a little deeper into what makes the control plane highly available.

In the image above, I've tried to capture as much information as is meaningfully possible in a single graphic about how the Kubernetes control plane can be made highly available. The official k8s documentation does a good job of illustrating this; the primary difference between the two deployments it outlines is a stacked etcd topology versus an external etcd topology. In other words, the etcd ensemble can either run on the Kubernetes control-plane nodes themselves, or on machines outside the control plane. In this series, I opted for the stacked etcd topology.

Furthermore, the "HA" aspect is the result of two things: the etcd component being highly available, and requests to the API server instances being highly available. Running multiple etcd instances ensures the backing store can tolerate the loss of a member. Making the API server highly available requires a load balancer in front of the instances, which is where HAProxy comes into the picture; it is not enough to simply shunt traffic over to the API servers on the master nodes, so HAProxy also health-checks each endpoint to decide whether it is a viable backend. Finally, the load balancer itself must not become a single point of failure, which is where keepalived comes in: it manages a virtual IP (VIP) for the HAProxy frontend and runs a check script against the local API server, so that if the local instance becomes unhealthy the VIP can fail over to another node.
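
A quick way to see which node currently owns the VIP once keepalived is up is to look for it on the interface keepalived is bound to. A minimal check, assuming the enp0s3 interface and the 192.168.7.10 VIP used later in this post:

$ ip -4 addr show dev enp0s3 | grep 192.168.7.10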

Follow the instructions here —

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability

And here —

https://github.com/kubernetes/kubeadm/blob/master/docs/ha-considerations.md#options-for-software-load-balancing

There are various modes in which the haproxy and keepalived services can be run to provide the HA functionality, covered in this excellent blog post —

https://thebsdbox.co.uk/2020/01/02/Designing-Building-HA-bare-metal-Kubernetes-cluster

I opted to run haproxy and keepalived only on the first k8s master node. This required a few prerequisite steps —

  • Install HAproxy and keepalived on the first master node (see the sketch after this list)
  • Set up a VIP — in my case, my cluster comprises 192.168.7.11-15, so I decided to use 192.168.7.10 as the VIP
  • Create the HAproxy and keepalived configuration files —
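
For the install step, a minimal sketch on CentOS 7 (the OS these nodes run; package names may differ on other distributions). Since haproxy and keepalived will later run as static pods, the host packages mostly just provide the standard config locations, and the systemd units can stay disabled so they do not contend for port 6443 or the VIP:

$ sudo yum install -y haproxy keepalived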

For haproxy —

$ cat /etc/haproxy/haproxy.cfg 
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log /dev/log local0
    log /dev/log local1 notice
    daemon

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http
    log                     global
    option                  httplog
    option                  dontlognull
    option http-server-close
    option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 1
    timeout http-request    10s
    timeout queue           20s
    timeout connect         5s
    timeout client          20s
    timeout server          20s
    timeout http-keep-alive 10s
    timeout check           10s

#---------------------------------------------------------------------
# apiserver frontend which proxies to the masters
#---------------------------------------------------------------------
frontend apiserver
    bind *:6443
    mode tcp
    option tcplog
    default_backend apiserver

#---------------------------------------------------------------------
# round robin balancing for apiserver
#---------------------------------------------------------------------
backend apiserver
    option httpchk GET /healthz
    http-check expect status 200
    mode tcp
    option ssl-hello-chk
    balance     roundrobin
        server master 192.168.7.11:8443 check
        server backup1 192.168.7.12:8443 check
        server backup2 192.168.7.13:8443 check
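
Before handing this configuration to the containerized haproxy, it can be syntax-checked with the host binary in check-only mode:

$ haproxy -c -f /etc/haproxy/haproxy.cfg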

And keepalived —

$ cat /etc/keepalived/keepalived.conf 
! /etc/keepalived/keepalived.conf
! Configuration File for keepalived
global_defs {
    router_id LVS_DEVEL
}
vrrp_script check_apiserver {
  script "/etc/keepalived/check_apiserver.sh"
  interval 3
  weight -2
  fall 10
  rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface enp0s3
    virtual_router_id 11
    priority 21
    authentication {
        auth_type PASS
        auth_pass 11
    }
    virtual_ipaddress {
        192.168.7.10
    }
    track_script {
        check_apiserver
    }
}
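
Since keepalived only runs on the first master in my setup, that node always holds the VIP. If you later extend this to the other control-plane nodes, as the official HA guidance suggests, their keepalived.conf would differ only in state and priority; the healthy node with the highest priority claims the VIP. A sketch for a second node (the priority value here is only an illustration):

vrrp_instance VI_1 {
    state BACKUP
    interface enp0s3
    virtual_router_id 11
    priority 20
    ! same authentication, virtual_ipaddress and track_script blocks as above
}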

And the service check script for keepalived —

$ cat /etc/keepalived/check_apiserver.sh 
#!/bin/sh

errorExit() {
    echo "*** $*" 1>&2
    exit 1
}

curl --silent --max-time 2 --insecure https://localhost:6443/ -o /dev/null || errorExit "Error GET https://localhost:6443/"
if ip addr | grep -q 192.168.7.10; then
    curl --silent --max-time 2 --insecure https://192.168.7.10:6443/ -o /dev/null || errorExit "Error GET https://192.168.7.10:6443/"
fi
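
The script can be exercised by hand to confirm keepalived will be able to run it. Note that it exits non-zero by design until something is actually answering on port 6443:

$ sudo sh /etc/keepalived/check_apiserver.sh; echo "exit code: $?"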

The idea is for haproxy and keepalived to run as static pods managed by the kubelet on the first master node. To achieve that, we pre-stage the manifest YAML files in /etc/kubernetes/manifests, the directory the kubelet watches for static pod definitions.
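
The kubelet only does this because kubeadm points its staticPodPath at that directory; a quick way to confirm the directory exists and, once the node has been initialized, that the kubelet is actually configured to watch it:

$ sudo mkdir -p /etc/kubernetes/manifests
$ sudo grep staticPodPath /var/lib/kubelet/config.yaml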

haproxy.yaml

$ cat /etc/kubernetes/manifests/haproxy.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: haproxy
  namespace: kube-system
spec:
  containers:
  - image: haproxy:2.1.4
    name: haproxy
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: localhost
        path: /healthz
        port: 6443
        scheme: HTTPS
    volumeMounts:
    - mountPath: /usr/local/etc/haproxy/haproxy.cfg
      name: haproxyconf
      readOnly: true
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/haproxy/haproxy.cfg
      type: FileOrCreate
    name: haproxyconf
status: {}

keepalived.yaml 

$ cat /etc/kubernetes/manifests/keepalived.yaml 
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  name: keepalived
  namespace: kube-system
spec:
  containers:
  - image: osixia/keepalived:1.3.5-1
    name: keepalived
    resources: {}
    securityContext:
      capabilities:
        add:
        - NET_ADMIN
        - NET_BROADCAST
        - NET_RAW
    volumeMounts:
    - mountPath: /usr/local/etc/keepalived/keepalived.conf
      name: config
    - mountPath: /etc/keepalived/check_apiserver.sh
      name: check
  hostNetwork: true
  volumes:
  - hostPath:
      path: /etc/keepalived/keepalived.conf
    name: config
  - hostPath:
      path: /etc/keepalived/check_apiserver.sh
    name: check
status: {}

At this point, if everything goes well, the cluster should be ready to initialize with kubeadm init. Note that the API server bind port is set to 8443, since 6443 is already taken by HAProxy as the front-end port on the first master node, while the control-plane endpoint is the VIP on port 6443.

$ sudo kubeadm init --control-plane-endpoint "192.168.7.10:6443" --apiserver-bind-port 8443 --pod-network-cidr=10.244.0.0/16 --upload-certs

<snipped for readability>

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
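
Before looking at pods, a quick sanity check that the API server is reachable through the VIP and the HAProxy frontend; the /healthz endpoint should respond, and kubectl (using the admin.conf copied above, which points at 192.168.7.10:6443) should see the cluster:

$ curl -k https://192.168.7.10:6443/healthz
$ kubectl cluster-info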

Let’s check if the haproxy and keepalived static pods are running —


$ kubectl get po --all-namespaces
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
default       mysql-1605200795-6949fc588-sps8w   1/1     Running   0          120m
default       ubuntu                             1/1     Running   0          117m
kube-system   coredns-f9fd979d6-j9gqp            1/1     Running   0          2d
kube-system   coredns-f9fd979d6-sh2sg            1/1     Running   0          2d
kube-system   etcd-k8s01                         1/1     Running   0          2d
kube-system   etcd-k8s02                         1/1     Running   0          2d
kube-system   etcd-k8s03                         1/1     Running   0          2d
kube-system   haproxy-k8s01                      1/1     Running   0          2d
kube-system   keepalived-k8s01                   1/1     Running   0          2d
kube-system   kube-apiserver-k8s01               1/1     Running   0          2d
kube-system   kube-apiserver-k8s02               1/1     Running   0          2d
kube-system   kube-apiserver-k8s03               1/1     Running   0          2d
kube-system   kube-controller-manager-k8s01      1/1     Running   1          2d
kube-system   kube-controller-manager-k8s02      1/1     Running   0          2d
kube-system   kube-controller-manager-k8s03      1/1     Running   0          2d
kube-system   kube-flannel-ds-2547z              1/1     Running   1          2d
kube-system   kube-flannel-ds-6ql5v              1/1     Running   0          2d
kube-system   kube-flannel-ds-crzmt              1/1     Running   2          2d
kube-system   kube-flannel-ds-hgwzb              1/1     Running   0          2d
kube-system   kube-flannel-ds-knh9t              1/1     Running   0          2d
kube-system   kube-proxy-np4tp                   1/1     Running   0          2d
kube-system   kube-proxy-pcbhx                   1/1     Running   0          2d
kube-system   kube-proxy-wdmvk                   1/1     Running   0          2d
kube-system   kube-proxy-xzx52                   1/1     Running   0          2d
kube-system   kube-proxy-z75qk                   1/1     Running   0          2d
kube-system   kube-scheduler-k8s01               1/1     Running   1          2d
kube-system   kube-scheduler-k8s02               1/1     Running   0          2d
kube-system   kube-scheduler-k8s03               1/1     Running   0          2d


Network Plugin installation

We installed flannel —

https://coreos.com/flannel/docs/latest/kubernetes.html

Important: for Flannel to work with its default configuration, the pod network CIDR needs to be 10.244.0.0/16, which is why kubeadm init was run with --pod-network-cidr=10.244.0.0/16 earlier.

$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
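
Once Flannel is applied, each node should have a /24 carved out of 10.244.0.0/16 as its pod CIDR; a quick way to confirm:

$ kubectl get nodes -o custom-columns=NAME:.metadata.name,PODCIDR:.spec.podCIDR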

Check if DNS is working —

$ kubectl run -i --tty ubuntu --image=ubuntu:16.04 --restart=Never -- bash -il
# apt-get update && apt-get install -y dnsutils
# nslookup kubernetes.default
Server:		10.96.0.10
Address:	10.96.0.10#53

Name:	kubernetes.default.svc.cluster.local
Address: 10.96.0.1
# nslookup google.com
Server:		10.96.0.10
Address:	10.96.0.10#53

Non-authoritative answer:
Name:	google.com
Address: 216.58.192.206

And then I followed through with the rest of the join steps from the kubeadm init output —

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join 192.168.7.10:6443 --token 78of9v.drvt98ld2swwykk3 \
    --discovery-token-ca-cert-hash sha256:490c9c4fcd367e06f0bde7254435efca8fce3fe21d5cc9b4043ab98083baa4f2 \
    --control-plane --certificate-key a15a0a2230b2c73a3d7b768e6c0332a427e26b230042a68ef4b4c16e602ad943

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.7.10:6443 --token 78of9v.drvt98ld2swwykk3 \
    --discovery-token-ca-cert-hash sha256:490c9c4fcd367e06f0bde7254435efca8fce3fe21d5cc9b4043ab98083baa4f2 
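
These bootstrap tokens expire after 24 hours, so if you add nodes later you will need a fresh join command, and for additional control-plane nodes a fresh certificate key, as the output above notes:

$ sudo kubeadm token create --print-join-command
$ sudo kubeadm init phase upload-certs --upload-certs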

Check status —

$ kubectl get no -o wide
NAME    STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                CONTAINER-RUNTIME
k8s01   Ready    master   47h   v1.19.3   192.168.7.11   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.13
k8s02   Ready    master   47h   v1.19.3   192.168.7.12   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.13
k8s03   Ready    master   47h   v1.19.3   192.168.7.13   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.13
k8s04   Ready    <none>   47h   v1.19.3   192.168.7.14   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.11
k8s05   Ready    <none>   47h   v1.19.3   192.168.7.15   <none>        CentOS Linux 7 (Core)   3.10.0-1127.19.1.el7.x86_64   docker://19.3.11
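
With the stacked topology, it is also worth confirming that all three etcd members have actually formed a cluster. A sketch using etcdctl inside one of the etcd static pods, with the certificate paths kubeadm uses by default:

$ kubectl -n kube-system exec etcd-k8s01 -- etcdctl \
    --endpoints https://127.0.0.1:2379 \
    --cacert /etc/kubernetes/pki/etcd/ca.crt \
    --cert /etc/kubernetes/pki/etcd/server.crt \
    --key /etc/kubernetes/pki/etcd/server.key \
    member list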
