Mesos and DC/OS logs in Humio

Martin W. Lassen
October 25, 2017

Having focused our efforts increasingly on additional integrations, we’ve released the first beta version of our Mesos framework. For this first iteration, there’s one very clear goal:

Forward all task logs to Humio

This integration comes in addition to our plug-in for Kubernetes and supports both plain Mesos and DC/OS. It’s therefore an important step in realising our goal of providing integrations for the majority of orchestrators.

https://www.youtube.com/watch?v=6hYo_FXMljc

Installation

For the purpose of demonstrating the framework, I’ve installed the Sock Shop Demo into a DC/OS cluster.

To ease the process of setting up the Humio agent, we’ve released it into the DC/OS Universe, which offers a very easy point-and-click wizard. All you need to do is create an account on humio.com with a dataspace and an ingest token.

A feature we’re especially proud of is the framework’s ability to expand and shrink together with the cluster: if you add another node, the Humio agent is installed on it and is ready to start streaming within seconds.

Configuration

When installation is complete, the agent starts streaming all logs immediately, so you can straight away search for something like

groupby(mesos_service_id)

to see all applications.

Currently, all task logs are annotated with the following fields:

  • mesos_framework_id
  • mesos_framework_name
  • mesos_task_id
  • mesos_slave_id

Tasks running in DC/OS clusters are also annotated with:

  • mesos_service_id

Those are all fairly static fields that can’t be changed. Therefore, we’ve added the ability to configure tasks through Mesos Task Labels.

First of all, a task’s logs can be ignored by setting the HUMIO_IGNORE label to true. Secondly, you can change the log type with the HUMIO_TYPE label by setting it to the name of the Humio parser you want to use.

Finally, Humio offers more advanced configuration of multiline fields with the HUMIO_MULTILINE_ labels. Please see the documentation for more details.
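To make the label-based configuration concrete, here is a minimal Python sketch that adds the two labels to a Marathon-style app definition. The app id, command and parser name are hypothetical placeholders, and the sketch assumes your scheduler passes app labels through to the Mesos task labels that the Humio agent reads, so check that for your setup.

```python
import json


def with_humio_labels(app, parser=None, ignore=False):
    """Return a copy of an app definition with Humio task labels added."""
    labels = dict(app.get("labels", {}))
    if ignore:
        # Label values are strings, so "true" rather than a boolean.
        labels["HUMIO_IGNORE"] = "true"
    if parser is not None:
        # Name of the Humio parser to apply to this task's logs.
        labels["HUMIO_TYPE"] = parser
    return {**app, "labels": labels}


# Hypothetical app definition, for illustration only.
app = {"id": "/frontend", "cmd": "./run.sh", "cpus": 0.1, "mem": 64, "instances": 1}

labelled = with_humio_labels(app, parser="my-parser")  # placeholder parser name
muted = with_humio_labels(app, ignore=True)

print(json.dumps(labelled, indent=2))
```

The helper returns a new dictionary rather than changing the one passed in, so the same base definition can be reused for several variants.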

Querying logs

When maintaining a Mesos cluster, it’s useful to know what’s going on with the tasks in your cluster, e.g. whether tasks are failing. The Mesos agent writes a lot of interesting things to its log file. Searching for

@source="/var/log/mesos/mesos-agent.log"

will reveal the whole log on all agents in the cluster, which is still a bit of a needle in a haystack. To find updates on task status, you can search for “Received status update”, pick out the task status, optionally the task name too, and finally plot it in a time chart:

@source="/var/log/mesos/mesos-agent.log" "Received status update"
| regex("update TASK_(?<task_state>\S+?)\s")
| regex("for task\s(?<task_name>\S+?)\s")
| timechart(task_state)
 
Task status update for the last 7 days
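If you want to try the two regular expressions outside Humio, the following Python sketch applies them to a few log lines and counts the task states. The sample lines are made up for illustration and only approximate the shape of a Mesos agent log line. Note that Python writes named groups as (?P<name>...) where the query uses (?<name>...).

```python
import re
from collections import Counter

# Same patterns as in the query, with Python's named-group syntax.
STATE_RE = re.compile(r"update TASK_(?P<task_state>\S+?)\s")
NAME_RE = re.compile(r"for task\s(?P<task_name>\S+?)\s")


def extract(line):
    """Return (task_state, task_name) for a status update line, else None."""
    if "Received status update" not in line:
        return None
    state = STATE_RE.search(line)
    name = NAME_RE.search(line)
    if state is None or name is None:
        return None
    return state.group("task_state"), name.group("task_name")


# Made-up sample lines, not copied from a real cluster.
SAMPLE = [
    "I1025 10:00:00 slave.cpp] Received status update TASK_RUNNING (UUID: 1) for task web.1 of framework fw-1",
    "I1025 10:00:05 slave.cpp] Received status update TASK_FAILED (UUID: 2) for task web.2 of framework fw-1",
    "I1025 10:00:09 slave.cpp] Received status update TASK_FAILED (UUID: 3) for task db.1 of framework fw-1",
    "I1025 10:00:10 slave.cpp] Some unrelated message",
]

extracted = [pair for pair in map(extract, SAMPLE) if pair is not None]
state_counts = Counter(state for state, _ in extracted)
print(state_counts)
```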

It’s definitely very interesting to dig into what happened the other day when roughly 150 tasks failed in a very short time. Just hit “Event list” to reveal the underlying events, find the interesting ones, and then narrow down the time span by selecting a shorter span in the Time Line plot.

Having uptime services that ping you if your system is down is very helpful, but you should also check them. Personally, I have a simple nginx deployment exposed to the public via marathon-lb.

The nginx is deployed as /health, which is conveniently picked up by the Humio agent as the mesos_service_id field. UptimeRobot presents its client as something containing “UptimeRobot”, so combining the two and piping it into a time chart should hopefully reveal something like what you’re seeing below:

mesos_service_id=/health UptimeRobot | timechart(mesos_slave_id)
 
UptimeRobot requests across all Mesos agents
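Conceptually, a time chart split by a field counts events per time bucket, with one series per field value. The Python sketch below illustrates that bucketing with made-up (timestamp, agent id) pairs. It is an approximation of the idea, not Humio's implementation, and the agent ids and bucket size are assumptions.

```python
from collections import Counter, defaultdict


def bucket_counts(events, bucket_seconds=3600):
    """Count events per time bucket, keeping one series per agent id."""
    series = defaultdict(Counter)
    for timestamp, agent_id in events:
        # Round the timestamp down to the start of its bucket.
        bucket = timestamp - timestamp % bucket_seconds
        series[agent_id][bucket] += 1
    return series


# Made-up events: (seconds since start, hypothetical agent id).
events = [
    (0, "agent-a"),
    (10, "agent-b"),
    (3600, "agent-a"),
    (3700, "agent-b"),
    (3800, "agent-a"),
]

series = bucket_counts(events)
totals = {agent: sum(counts.values()) for agent, counts in series.items()}
print(totals)
```

Comparing the per-agent totals gives a quick numeric check of how evenly the requests are spread.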

Although there are a few gaps here and there, it certainly looks like there’s a very even distribution of requests across all nodes in the cluster.

Having run through our newest integration, it’s time to try it out yourself for free. For more information on how to get started, here’s a thorough guide. Humio is available both as an on-premise installation and as a managed cloud service.
