Kubernetes Logging Guide:
Advanced Concepts

Arfan Sharif - February 14, 2023

Modern distributed applications are based on a microservices architecture, typically running in Kubernetes-managed containers. This architectural shift has also changed how these applications generate logs. Due to the ephemeral nature of Kubernetes Pods, operations teams may not have consistent access to the containers in those Pods to collect application logs. Application logs are lost whenever a Pod goes down or the orchestrator evicts it. The cluster nodes where the Pods run can also be transient due to the elastic nature of cloud-hosted infrastructure.

In this Kubernetes logging guide, we cover the fundamentals of the Kubernetes logging architecture and some of the use cases. In part one, we covered basic node-level logging and cluster-level logging using a node-level logging agent.

In part two, we will cover cluster-level logging using sidecar patterns and the benefits of centralized logging. We will also introduce Falcon LogScale, a modern log management solution.

What Is Cluster-level Logging?

Node-level logging, with the default logging drivers, redirects the output from an application’s stdout and stderr streams to appropriate log files. These files exist only during the lifetime of the Pod. The data is lost when Kubernetes evicts a Pod or performs garbage collection.
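
For example, with the default setup you read these node-level logs through the kubelet rather than from the node directly. On most current container runtimes the underlying files sit under /var/log/pods, with symlinks in /var/log/containers, although the exact paths depend on the runtime and its logging driver. The <pod-name> below is a placeholder:

kubectl logs <pod-name>              # stream the container's captured stdout/stderr
kubectl logs <pod-name> --previous   # logs from the prior container instance, if still on the node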

Cluster-level logging focuses on aggregating logs using a backend service so that logs exist beyond the lifetime of the Pods and their containers. Kubernetes recommends two methods for implementing cluster-level logging.

The first method uses the DaemonSet pattern with a node-level agent. A single agent Pod runs on each node and is responsible for capturing the logs of every Pod on that node and shipping them to a backend service. This is the method we covered in part one of this guide.
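
For reference, the snippet below is a minimal sketch of what such a node-level agent DaemonSet might look like. It assumes the Fluentd image and Elasticsearch environment variables used later in this article; the object names, namespace, and mounted paths are illustrative, and production details such as RBAC, tolerations, and additional host paths are omitted.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-logging-agent
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: node-logging-agent
  template:
    metadata:
      labels:
        name: node-logging-agent
    spec:
      containers:
      - name: fluentd
        image: fluent/fluentd-kubernetes-daemonset:elasticsearch
        env:
        - name: FLUENT_ELASTICSEARCH_HOST
          value: "<elasticsearch-host-url>"
        - name: FLUENT_ELASTICSEARCH_PORT
          value: "<elasticsearch-host-port>"
        volumeMounts:
        - name: varlog
          mountPath: /var/log      # node-level log directories
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log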

The second recommended method involves the sidecar pattern.

What Is the Sidecar Pattern?

The sidecar pattern attaches a companion container to each application Pod, and this container is responsible for capturing all the logs from that Pod. Because every Pod carries its own logging container, the sidecar pattern uses more resources than the DaemonSet pattern. Despite this, the sidecar pattern is popular because it offers a great solution when the following are true of your environment:

  • Applications don’t log their output to stderr and stdout.
  • Logs from different application Pods need to be stored in separate locations.
  • Application Pods output their logs in different formats.

The sidecar pattern has several advantages over the DaemonSet pattern:

  • There’s no need to enforce a single logging format for application containers when a sidecar container collects the logs.
  • The architecture provides good isolation between different application containers.
  • When a log agent is bundled with the sidecar, logs are shipped directly to the backend instead of being stored on the node, which removes the need for log rotation.

Implementing Cluster-level Logging with the Sidecar Pattern

The sidecar pattern pairs a companion container with the primary application container to collect its log files. There are two ways you can implement this pattern:

  1. Streaming sidecar container
  2. Sidecar with a logging agent

Let’s cover each of these in detail.

Streaming sidecar container

A streaming sidecar has one job: read the logs produced by the application container and stream them to its own stdout, where the container runtime stores them in node-level directories. A separate node-level agent then fetches the logs from those directories and sends them to a logging backend. This method is suitable for applications that use a non-standard logging method or don’t send their logs to stderr or stdout.

If there’s no need for application-level isolation when capturing logs, the logging agent can run at the node level. This arrangement avoids the extra resource overhead of bundling a logging agent into every sidecar.

The snippet below implements a streaming sidecar container.

apiVersion: v1
kind: Pod
metadata:
  name: primary
spec:
  containers:
  - name: primary
    image: busybox:1.28
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/server.log;
        i=$((i+1));
        sleep 5;
      done      
    volumeMounts:
    - name: logdir
      mountPath: /var/log
  - name: secondary
    image: busybox:1.28
    args: [/bin/sh, -c, 'tail -n+1 -F /var/log/server.log']
    volumeMounts:
    - name: logdir
      mountPath: /var/log
  volumes:
  - name: logdir
    emptyDir: {}

In the above configuration, the primary container uses busybox (a lightweight Linux utility image) and runs a simple script that increments a counter every five seconds. The script writes the counter value and a timestamp to a log file called server.log.

Next, we create a sidecar container (named secondary) that tails this log file. The output of the tail command is automatically redirected to stdout, where the default logging driver picks it up and saves it to node-level directories. The node-level logging agent can then access it from those directories and ship it to the logging backend. 
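
Because the sidecar streams the file contents to its own stdout, you can also read them directly with kubectl by naming the sidecar container; the Pod and container names below come from the example manifest above:

kubectl logs primary -c secondary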

The problem with this approach is that it still doesn’t provide enough application-level isolation. It also lacks the flexibility to handle logs from various Pods differently.

Sidecar pattern with logging agent

The alternative option is to use a logging agent, like Fluentd, embedded in the sidecar container. This arrangement ensures that the logging agent runs at the application level and not at the node level. Naturally, the resource usage is higher than the other implementations discussed here.

The snippet below shows a sidecar configuration with the Fluentd collection agent bundled.

apiVersion: v1
kind: Pod
metadata:
  name: fluentd-sidecar
spec:
  containers:
  - name: primary
    image: busybox:1.28
    args:
    - /bin/sh
    - -c
    - >
      i=0;
      while true;
      do
        echo "$i: $(date)" >> /var/log/server.log;
        i=$((i+1));
        sleep 1;
      done      
    volumeMounts:
    - name: logdir
      mountPath: /var/log
  - name: secondary
    image: fluent/fluentd-kubernetes-daemonset:elasticsearch
    env:
    - name: FLUENT_ELASTICSEARCH_HOST
      value: "<elasticsearch-host-url>"
    - name: FLUENT_ELASTICSEARCH_PORT
      value: "<elasticsearch-host-port>"
    volumeMounts:
    - name: logdir
      mountPath: /var/log
  volumes:
  - name: logdir
    emptyDir: {}

After saving the configuration file as fluentd-sidecar.yaml, you can create the Pod with the following command:

kubectl apply -f fluentd-sidecar.yaml
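
Once the Pod is running, you can confirm that both containers started and inspect the agent’s own output; the Pod and container names below match the manifest above:

kubectl get pod fluentd-sidecar
kubectl logs fluentd-sidecar -c secondary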

A sidecar with a bundled logging agent provides better application-level isolation than a streaming sidecar. It’s also flexible enough to handle logs from each application Pod differently.
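
One caveat: the fluentd-kubernetes-daemonset image ships with a default configuration intended for node-level collection, so a sidecar deployment typically overrides it with its own pipeline. The snippet below is a minimal, hypothetical ConfigMap for that purpose, assuming it is mounted over the image’s configuration directory (commonly /fluentd/etc); it tails the shared server.log file and forwards events to Elasticsearch using the environment variables defined above. The ConfigMap name is illustrative.

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-sidecar-config
data:
  fluent.conf: |
    # Tail the log file shared with the primary container via the logdir volume
    <source>
      @type tail
      path /var/log/server.log
      pos_file /var/log/server.log.pos
      tag app.log
      <parse>
        @type none
      </parse>
    </source>
    # Forward everything to the Elasticsearch backend configured via env vars
    <match app.**>
      @type elasticsearch
      host "#{ENV['FLUENT_ELASTICSEARCH_HOST']}"
      port "#{ENV['FLUENT_ELASTICSEARCH_PORT']}"
    </match>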

Benefits of Centralized Logging for Kubernetes Logging Tools

A cluster-level logging setup uses a centralized logging backend to remove the limitations of short-lived container logs. This has clear advantages over the basic node-level logging provided by the default configuration. One of these advantages is that logs are not stored on the node, which removes the need for log rotation.

With centralized logging, container logs live beyond the lifespan of the container, the Pod, and the cluster node. Centralized logging platforms also offer the following benefits:

  • Visualization of logs through charts and dashboards
  • Powerful query engines and custom parsing capabilities
  • Automatic correlation of log events to identify anomalies, patterns, and trends
  • Ability to create alerts based on specific log event criteria

You can use many commercial and open-source frameworks and tools for implementing a centralized logging backend. Such logging platforms can work with both DaemonSet and sidecar patterns.

Conclusion

As we have covered in this series, Kubernetes-based applications require a different log management approach. The short-lived nature of Kubernetes containers means logs get lost when a container crashes, a Pod is evicted, or a node goes down. For this reason, node-level logging (which comes by default with Kubernetes) is not an ideal solution, particularly if you want to save the logs for later analysis.

Cluster-level logging using either the node-level agent DaemonSet pattern or the sidecar pattern solves this problem. Although the sidecar pattern provides better flexibility than the node-level agent pattern, it uses more resources. Depending on the flexibility you need, you can use either pattern. Both patterns work well with centralized logging platforms.

Log your data with CrowdStrike Falcon Next-Gen SIEM

Elevate your cybersecurity with the CrowdStrike Falcon® platform, the premier AI-native platform for SIEM and log management. Experience security logging at a petabyte scale, choosing between cloud-native or self-hosted deployment options. Log your data with a powerful, index-free architecture, without bottlenecks, allowing threat hunting with over 1 PB of data ingestion per day. Ensure real-time search capabilities to outpace adversaries, achieving sub-second latency for complex queries. Benefit from 360-degree visibility, consolidating data to break down silos and enabling security, IT, and DevOps teams to hunt threats, monitor performance, and ensure compliance seamlessly across 3 billion events in less than 1 second.

Schedule Falcon Next-Gen SIEM Demo

GET TO KNOW THE AUTHOR

Arfan Sharif is a product marketing lead for the Observability portfolio at CrowdStrike. He has over 15 years of experience driving Log Management, ITOps, Observability, Security and CX solutions for companies such as Splunk, Genesys and Quest Software. Arfan graduated in Computer Science at Bucks and Chilterns University and has a career spanning Product Marketing and Sales Engineering.