Logging with Kubernetes and Humio
Kubernetes poses an interesting problem when it comes to logging. With containers constantly being created and destroyed, logs become the only dependable window into what’s happening, but working with them becomes significantly more complex.
Humio is all about getting straight to the most important detail in your logs, especially when those logs are generated in huge volumes. That’s why we’ve created an integration between Humio and Kubernetes: kubernetes2humio.
Pulling up our socks
On each Kubernetes node, fluentd forwards both application-level and host-level logs to Humio. You can read more about the details of how it works in our readme.
To show kubernetes2humio in action, I fired up the Sock Shop demo on a Kubernetes cluster hosted in Google Container Engine. If you’ve not used Sock Shop before, it’s a microservices demo app from the people at WeaveWorks and it uses a bunch of containers to run an online sock store. Ideally for our purposes, it also generates a lot of log data.
Here’s Humio’s default search page.
This is the unfiltered view of the logs generated by the Sock Shop containers.
From here, we can write queries to focus on whatever might interest us. When I took this screenshot, three test users were putting load on the sock store, producing around 200,000 log entries in a 15-minute period.
Let’s try a simple query to start with. You might have noticed that each log line starts with a [namespace/container_name] pattern. The field called k8s.container_name contains the container name and lets us write a simple groupby query to discover which containers are responsible for our sock store.
Using the search box, we can enter:
groupby(k8s.container_name) | sort()
That shows us that we have eight separate containers. We can present the log frequency of each one as a graph, to see which is the noisiest. Again, in the search box:
This shows us that the front-end is responsible for around 40% of all the log entries. Note that the chart was configured to show the frequencies stacked on top of each other and normalized to percentages.
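If you want to see what that grouping and normalization amounts to outside Humio, here is a minimal Python sketch. The sample lines are made up for illustration; only the [namespace/container_name] prefix follows the format described above, and the function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample lines. Only the "[namespace/container_name]" prefix
# follows the format described in the post; the rest is invented.
SAMPLE_LINES = [
    "[sock-shop/front-end] GET /catalogue 200",
    "[sock-shop/front-end] GET /login 200",
    "[sock-shop/catalogue] request served",
    "[sock-shop/orders] order created",
    "[sock-shop/front-end] GET /cart 200",
]

# Captures the namespace and container name from the line prefix.
PREFIX = re.compile(r"^\[(?P<namespace>[^/\]]+)/(?P<container>[^\]]+)\]")


def count_by_container(lines):
    """Count log lines per container name, like a group-by with a count."""
    counts = Counter()
    for line in lines:
        match = PREFIX.match(line)
        if match:
            counts[match.group("container")] += 1
    return counts


def as_percentages(counts):
    """Normalize counts so that the shares add up to 100 percent."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


counts = count_by_container(SAMPLE_LINES)
shares = as_percentages(counts)
for name, n in counts.most_common():
    print(f"{name}: {n} entries ({shares[name]:.0f}%)")
```

Humio does all of this for you, per time bucket, when the chart is set to stacked and normalized; the sketch only shows the arithmetic behind the percentages.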
Let’s dive a little deeper and try something that might be useful in a real-world implementation. First, we’ll take a look at the log entries generated by the front-end:
If you already work a lot with logs, you can probably spot something odd: there are terminal color codes in some of those entries. That could mess things up a bit when we’re trying to extract useful information from the logs.
Some search systems would ask us to specify indexes up-front. That means that we’d need to know what we’re looking for in the logs even before the first entry has been written.
With Humio, it’s different. Humio’s querying is genuinely spontaneous: we can generate whatever query we like and Humio will handle it on the fly. So, spotting something unexpected — like terminal color codes — isn’t a big problem; we just need to write our query accordingly.
We’ll use a regular expression to group together the log entries according to which front-end path was called and then plot the maximum response latency for each one.
front-end | regex(" (?<path>/\S+) .0m(?<time>\S+) ms") | timechart(path, function=[max(time)], span=30s)
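To make the mechanics of that query concrete, here is a rough Python equivalent: extract the path and the response time with the same pattern, then keep the maximum latency per path in each 30-second span. This is a sketch under assumptions, not how Humio works internally. The sample events are invented and written so that they match the pattern (the color code appears as the printable remnant [0m), and Python spells named groups as (?P<name>...) rather than (?<name>...).

```python
import re

# Same pattern as the query, using Python's (?P<name>...) group syntax.
# The "." before "0m" absorbs one character of the color code.
PATTERN = re.compile(r" (?P<path>/\S+) .0m(?P<time>\S+) ms")

# Hypothetical (timestamp_in_seconds, line) pairs, invented for this sketch.
SAMPLE_EVENTS = [
    (0, "GET /catalogue [0m12.5 ms"),
    (10, "GET /catalogue [0m48.0 ms"),
    (20, "GET /login [0m350.2 ms"),
    (40, "GET /catalogue [0m9.1 ms"),
    (45, "startup complete"),  # does not match the pattern, so it is skipped
]


def max_latency_per_span(events, span=30):
    """Return {(span_start, path): maximum latency in ms} for matching lines."""
    result = {}
    for timestamp, line in events:
        match = PATTERN.search(line)
        if match is None:
            continue
        try:
            latency = float(match.group("time"))
        except ValueError:
            continue  # the captured text was not a number
        span_start = (timestamp // span) * span
        key = (span_start, match.group("path"))
        result[key] = max(result.get(key, latency), latency)
    return result


for (span_start, path), latency in sorted(max_latency_per_span(SAMPLE_EVENTS).items()):
    print(f"t={span_start:>3}s {path}: max {latency} ms")
```

The point of the comparison is that nothing had to be declared ahead of time: the fields path and time exist only because the regular expression names them at query time.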
Now let’s focus even more and look at how consistent response times are when logging in. Using a percentile function, we can see that there is an order of magnitude difference between the 50th and 99th percentiles:
front-end | regex(" (?<path>/\S+) .0m(?<time>\S+) ms") | timechart(function=percentile(time))
This might indicate an issue, depending on your system, and it would be a good reason to dig further into the logs.
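If the gap between the 50th and 99th percentiles seems abstract, this small Python sketch shows how a handful of slow requests produces it. The latencies are made up, and the nearest-rank method used here is an assumption chosen for simplicity; Humio’s percentile function may estimate percentiles differently.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the data is less than or equal to it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical response times in ms: mostly fast, with a few slow outliers.
latencies = [20.0] * 95 + [25.0] * 3 + [400.0] * 2

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
print(f"p50={p50} ms, p99={p99} ms, ratio={p99 / p50:.0f}x")
```

With this data the median stays at 20 ms while the 99th percentile lands on a 400 ms outlier, which is the kind of spread that an average would hide.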
As you can see, Humio makes it easy to query and make sense of the logs generated from a Kubernetes cluster. Over the coming weeks I’ll be announcing more connectors that’ll let you easily feed logs directly into Humio from many common infrastructure tools.