Setting up a log management system on Kubernetes

A Kubernetes cluster has various components that constantly emit logs: node-level logs, pod-level logs, application log data, and so on. A very standard model for implementing a centralized log management framework is the combination of Elasticsearch, Fluentd and Kibana (aka the EFK stack). There are many good blogs that cover how to do this; nonetheless, I wanted to cover it here in the ongoing Kubernetes series for completeness. I am also creating associated code and configuration snippets, which I will share as I go along on this journey.

In the old world of bare-metal nodes and virtual machines, it was easy enough to centralize and manage log data. A very typical model was to run a centralized syslog instance such as syslog-ng or rsyslog, and configure the managed hosts to write their log data to this central instance. It was also fairly trivial to set up a log indexing system like Splunk and have it index all the log data from this central log server. This is not so easily done in Kubernetes, and therefore requires somewhat more intricate configuration.
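In that older model, the forwarding piece was often a single rsyslog rule on each managed host. A minimal sketch (the central host name here is a placeholder, not a real server):

```
# /etc/rsyslog.d/forward.conf on each managed host
# '@@' forwards over TCP; a single '@' would use UDP.
# 'loghost.example.com' stands in for the central syslog server.
*.* @@loghost.example.com:514
```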

The image above is a high-level logical representation of what an implementation of the EFK stack looks like.

Fluentd is run as a DaemonSet on the Kubernetes cluster and is configured to read log files by specifying the directory paths to include. This way, a single Fluentd pod is scheduled per node, and it constantly reads the files under the defined directory paths (such as /var/log for node-level logs, /var/lib/docker/containers for container logs, and so on). The logs are then streamed to an Elasticsearch cluster running as a StatefulSet within the k8s cluster. Elasticsearch is responsible for indexing the log data as it streams in, thereby making the data searchable.
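As a sketch of what that tail configuration looks like — the paths, tag, and Elasticsearch service name below are illustrative assumptions; the stock fluentd-kubernetes-daemonset images ship something along these lines:

```
# Illustrative fluentd configuration excerpt
<source>
  @type tail
  path /var/log/containers/*.log
  pos_file /var/log/fluentd-containers.log.pos
  tag kubernetes.*
  <parse>
    @type json
  </parse>
</source>

# Forward everything tagged kubernetes.* to Elasticsearch
<match kubernetes.**>
  @type elasticsearch
  host elasticsearch.logging.svc.cluster.local
  port 9200
</match>
```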

Additionally, Kibana is deployed within the k8s cluster and connected to the Elasticsearch cluster using its URL. Kibana provides the user interface to this stack – to set up indexes, query the data, and create dashboards, visualizations etc.
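Wiring Kibana to Elasticsearch is typically just an environment variable on the Kibana Deployment. A minimal sketch, where the service name, namespace and image tag are assumptions for illustration:

```
# Illustrative excerpt from a Kibana Deployment spec
containers:
- name: kibana
  image: docker.elastic.co/kibana/kibana:7.17.0
  env:
  - name: ELASTICSEARCH_HOSTS
    value: "http://elasticsearch.logging.svc.cluster.local:9200"
  ports:
  - containerPort: 5601
```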

For application logs, an established practice is to run a side-car container alongside the main application container in each pod, which reads the application log data and writes it to the container STDOUT. This gets picked up by Fluentd and is indexed as well. This way, a “haystack” combining all the log data in the given environment is built, and the combination of Elasticsearch and Kibana can then be used to search the said haystack.

As an example —

$ cat loggingpattern.yml 
apiVersion: v1
kind: Pod
metadata:
  name: logging-sidecar
  namespace: default
spec:
  volumes:
  - name: shared-data
    emptyDir: {}
  containers:
  - name: busybox1
    image: busybox
    command: ['sh', '-c', 'while true; do echo $RANDOM Logging data > /output/output.log; sleep 5; done']
    volumeMounts:
    - name: shared-data
      mountPath: /output

  - name: sidecar
    image: busybox
    command: ['sh', '-c', 'tail -f /input/output.log']
    volumeMounts:
    - name: shared-data
      mountPath: /input

$ kubectl apply -f loggingpattern.yml 
pod/logging-sidecar created

$ kubectl get po
NAME              READY   STATUS    RESTARTS   AGE
logging-sidecar   2/2     Running   0          9s

And then query for this data using the pod name —
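For instance, the sidecar's output can be checked directly with kubectl, and once indexed, a Kibana search filtered on the pod name should surface the same entries. The `kubernetes.pod_name` field and the index pattern below assume the Fluentd kubernetes metadata filter is in use, so treat this as a sketch:

```
# Verify the sidecar is emitting the log lines
$ kubectl logs logging-sidecar -c sidecar

# In the Kibana search bar (KQL):
kubernetes.pod_name : "logging-sidecar"

# Or query Elasticsearch directly (index pattern is an assumption):
$ curl -s 'http://elasticsearch:9200/logstash-*/_search?q=kubernetes.pod_name:logging-sidecar&size=5'
```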

The associated configuration for this setup can be found here. Feel free to clone/fork/modify.
