Breaking Google and Keeping it 100

How Humio logs 100TB/Day on only 25 nodes, and what it can mean for your business.

November 21st, 2019

Humio is made to handle endless data coming from across the enterprise. So to put it to the test, we set up Humio to process a massive benchmark of 100TB/day. We pushed Humio to the limit on the public cloud and Google broke before we did. (GCP quotas were all that stopped us from doing even more!)

To learn more about the test and find out how it can increase security and affordability, we talked to the engineer behind the 100TB benchmark, Grant Schofield, Humio’s Director of Infrastructure.

Why we did it

The test was initially intended to show the public how Humio works in the cloud. While most of Humio’s customers are on-prem, more and more are asking about our cloud services. Grant described what our existing customers are doing:

“While they may have an on-prem data center, if they’re doing something new, they’re doing it on major cloud providers, via AWS, Microsoft, or Google… I’ve been to the last few KubeCons and it just keeps getting bigger, and more and more enterprise folks are looking to use Kubernetes.”

The upper limits of max ingest

Humio routinely handles high ingest rates, but no one had deliberately tested its upper limits. It was time for Humio to do a cloud-based max ingest test.

“We have customers running larger systems than we do on a daily basis, but nobody had really pushed it past the 50-60TB/day area.”

After a few weeks of testing and missteps, Humio was able to get astounding results.

“We hit around 100 to 150TB of ingest a day. Depending on the event size, anywhere from 2.5 to 5 million events a second.”
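To put those numbers in perspective, here is a rough back-of-the-envelope sketch of the arithmetic. It is not part of the benchmark itself: it assumes decimal terabytes (10^12 bytes), perfectly steady ingest over 24 hours, and an even spread across the 25 nodes mentioned in the headline. The function names are ours, purely for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # assumption: decimal terabytes, not tebibytes


def bytes_per_second(tb_per_day: float) -> float:
    """Sustained ingest rate in bytes/second for a given daily volume."""
    return tb_per_day * TB / SECONDS_PER_DAY


def avg_event_bytes(tb_per_day: float, events_per_second: float) -> float:
    """Average event size implied by a daily volume and an event rate."""
    return bytes_per_second(tb_per_day) / events_per_second


def per_node_mb_per_second(tb_per_day: float, nodes: int) -> float:
    """Ingest per node in MB/s, assuming load is spread evenly."""
    return bytes_per_second(tb_per_day) / nodes / 10**6


if __name__ == "__main__":
    for tb in (100, 150):
        rate_gb = bytes_per_second(tb) / 10**9
        print(f"{tb} TB/day is about {rate_gb:.2f} GB/s sustained")
        for eps in (2.5e6, 5e6):
            size = avg_event_bytes(tb, eps)
            print(f"  at {eps / 1e6:.1f}M events/s: ~{size:.0f} bytes per event")
    print(f"100 TB/day over 25 nodes: ~{per_node_mb_per_second(100, 25):.1f} MB/s per node")
```

Under those assumptions, 100TB/day works out to a little over 1 GB every second, around the clock, with average event sizes in the low hundreds of bytes.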

We hit our Google quotas!

Humio’s engineers couldn’t help but feel like they left some power on the table though.

“We could take it a little bit further. The reason I had to stop is because we hit our Google (GCP) quotas. So not only on CPUs, you only get 2,000 CPUs and you get by default around 10TB of SSD. I wanted more and they only gave me 50TB. I was limited to those quotas. We’re waiting on Google for additional assets so we can test this on a greater scale.”

That’s right: Humio handled all the ingest our GCP quotas would allow, and it was still hungry for more. This was just the start of Humio’s Kubernetes support story. Since then, Humio has published a Kubernetes Helm chart on its public site. If you want, you can navigate to the notes page and create your own terrifyingly powerful 100TB/day cluster.

“With one or two commands you can set up a 100 TB cluster yourself.”

How Humio adds security and affordability

So the power of Humio was proven, but what are the business applications of having such power? The benefits of ingest with no real upper limits are countless for security and affordability. Grant explains how operating Humio in the Cloud can save users money.

“There are actually people running huge Kubernetes clusters - hundreds of workers, thousands of containers that are actually running on spot instances. If you’re able to take it to that level and be super ephemeral, you can actually experience a huge amount of cost savings. At the end, the Cloud is generally not going to be cheaper than On Prem, but when you start using spot instances or preemptible instances, they become way cheaper than On Prem.”

High costs = security risk

Systems that are priced by ingest volume often force customers to pick and choose what they log, leaving them with incomplete observability and compromised security.

“Do I log this? Do I log that? What are the tradeoffs of not having that? Do I sample the data? These are all questions that we used to really worry about because of the cost.”

Humio is different

“The way Humio works, with the compression and our approach to it rather than a more classic system that uses big B-tree indexes, we really have this ability to log everything. You no longer have to make those tradeoffs because not only can we ingest tens of terabytes a day, but we can allow you to search it really quickly as well.”

The benefits of logging everything

Logging everything has immediate rewards.

“The more we see our customers log everything, they find new things. They find insights sometimes quicker than their security systems find them because they’re actually logging everything.”

Humio’s unlimited ingest capabilities prepare enterprises to handle the logging needs of the future.

“People want to log more things and that isn’t going to change. We want more insights. People are going to want to log more and more.”

To hear more from Grant, listen to the full Humio podcast episode 100TB with Grant Schofield.

We can also be found on Spotify, SoundCloud, Google Play, iTunes, and just about anywhere you get your latest podcasts. Subscribe to get the latest episodes as they are released.

We'd love to know what you think and to get ideas for future episodes! Contact us to share your thoughts, and show you give a hoot!

Find out more by reading the full 100 TB/Day Scalability Report.