
Humio architecture now optimized for Cloud-Native storage

Humio uses bucket storage to make all data live

March 11th, 2020

Infinitely scalable in size, bucket storage lowers costs by making the most of Humio’s capacity for ingesting hundreds of terabytes of data a day and serving as both short-term and long-term storage.

The recent release of Humio adds support for bucket storage, unlocking lower storage costs while fitting neatly into Humio’s index-free architecture.

Unlike file or block storage, bucket storage is object-based, which means a flatter hierarchy and adjustable metadata. This minimal, flexible design supports Humio’s aim of getting data into the system as quickly as possible and providing streaming access to it. Bucket storage is the building block that makes practically unlimited data retention in the cloud possible.

Bucket storage is ideal for Humio because it:

  • Works on cloud or self-hosted installations of Humio

  • Is optimized for write-once/read-many-times

  • Does not depend on editing files in place, so it suits log data that never changes

  • Is appropriate for machine-based searching

  • Is infinitely scalable (buckets can be any size)

  • Supports encryption

  • Allows overcommitment of local disks, saving on hosting costs

  • Makes retention time easy to control centrally

  • Eliminates the need for persistent disks

  • Keeps data safe with built-in redundancies

Works on SaaS or self-hosted installations

Bucket storage delivers the most benefit when it runs on cloud storage and cloud services, but deploying it on S3-compatible hosting or self-hosted infrastructure also saves on storage costs.

Infinite scale data storage

By design, buckets are infinite in the number of files they can contain and the size of those files. The cost varies based on how much is hosted, but the architecture can be stretched indefinitely. Humio makes unlimited retention possible because its index-free structure is designed for streaming data, the same kind of data bucket storage was designed for.

For self-hosted installations, Humio supports using bucket storage for as much data as needed. It can search even months-old data in less than a second, just like it does with real-time streaming data. Using bucket storage, Humio treats all data as live data.

When you run a search, active data is automatically moved up to faster tiers (the NVMe drives, the memory, and the CPU cache) depending on how frequently it is read.
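As a rough illustration of frequency-based promotion, the sketch below models three tiers with plain dictionaries. It is not Humio's implementation: the class name `TieredSegmentStore`, the `hot_threshold` parameter, and the promotion rule are all assumptions made for the example.

```python
from collections import Counter


class TieredSegmentStore:
    """Toy model of bucket -> local disk -> memory promotion (hypothetical)."""

    def __init__(self, bucket, hot_threshold=3):
        self.bucket = dict(bucket)   # authoritative copy of every segment
        self.local_disk = {}         # segments cached on local (e.g. NVMe) disk
        self.memory = {}             # segments read often enough to keep in RAM
        self.reads = Counter()       # number of reads seen per segment
        self.hot_threshold = hot_threshold  # assumed promotion rule

    def read(self, segment_id):
        self.reads[segment_id] += 1
        if segment_id in self.memory:
            return self.memory[segment_id]
        if segment_id not in self.local_disk:
            # Local miss: fetch the segment from bucket storage.
            self.local_disk[segment_id] = self.bucket[segment_id]
        data = self.local_disk[segment_id]
        if self.reads[segment_id] >= self.hot_threshold:
            # Frequently read segments are promoted to memory.
            self.memory[segment_id] = data
        return data

    def tier_of(self, segment_id):
        if segment_id in self.memory:
            return "memory"
        if segment_id in self.local_disk:
            return "local_disk"
        return "bucket"
```

In this model a segment that has never been searched lives only in the bucket, the first search pulls it onto local disk, and repeated searches promote it to memory.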

Secured by encryption

Humio encrypts every copy of data it sends to buckets with AES-256 as part of the upload. Even if read access to the bucket is accidentally allowed, an attacker cannot read any events or other information from the data, whether it is in transit or stored in the bucket. When using a public cloud, no one at the cloud provider can look at the data either.

The encryption key is based on the seed key string set in the configuration, and each file is encrypted with a unique key derived from that seed. The seed key is stored in a global file along with all the other information required to read and decrypt your bucket contents.
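The idea of deriving a unique per-file key from a single seed can be sketched as follows. The article does not say which derivation function Humio uses, so HMAC-SHA256 is only a stand-in here, and `derive_file_key` and the file identifiers are hypothetical names.

```python
import hashlib
import hmac


def derive_file_key(seed: str, file_id: str) -> bytes:
    """Derive a 32-byte key for one file from a shared seed string.

    Assumption: HMAC-SHA256 keyed by the seed, with the file identifier
    as the message. The actual derivation used by Humio is not described
    in the article.
    """
    return hmac.new(
        seed.encode("utf-8"),      # seed key string from the configuration
        file_id.encode("utf-8"),   # hypothetical per-file identifier
        hashlib.sha256,
    ).digest()                     # 32 bytes, the key size AES-256 expects


key_a = derive_file_key("example-seed", "segment-0001")
key_b = derive_file_key("example-seed", "segment-0002")
```

Each file gets a different 256-bit key, yet anyone holding the seed and the file identifier can recompute it, so only the seed needs to be protected. Encrypting the file contents with AES-256 would additionally require a cryptography library, which this sketch leaves out.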

Over-commit for infinite local storage

With the addition of bucket storage, Humio manages which segment files are kept on the local file system based on the amount of disk space used, and deletes local files that also exist in bucket storage. This lets Humio hold more files than the local disk has room for, allowing for infinite storage of events. There are no technical limitations in this over-committing scenario. The only limits are the cost of additional bucket storage and potential transfer costs when the files required for a search are not present locally.
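A minimal sketch of that local-disk policy is shown below, assuming a least-recently-read eviction order and a fixed usage target. Neither detail comes from the article, and `Segment`, `choose_evictions`, and `max_used_fraction` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    name: str
    size_bytes: int
    last_read: float   # timestamp of the most recent read
    in_bucket: bool    # True once a copy exists in bucket storage


def choose_evictions(segments, disk_capacity, max_used_fraction=0.8):
    """Return names of local segment files to delete, oldest-read first.

    Only segments that also exist in bucket storage are candidates, so
    deleting the local copy never loses data.
    """
    used = sum(s.size_bytes for s in segments)
    budget = disk_capacity * max_used_fraction  # assumed usage target
    evict = []
    candidates = sorted(
        (s for s in segments if s.in_bucket), key=lambda s: s.last_read
    )
    for seg in candidates:
        if used <= budget:
            break
        evict.append(seg.name)
        used -= seg.size_bytes
    return evict
```

A segment that has not yet been uploaded is never selected, which is why over-committing the local disk is safe in this model: every deleted file can be fetched again from the bucket when a search needs it.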

Eliminate persistent disks

By cutting out the need for persistent disks, bucket storage opens up bandwidth on the network and contributes to faster overall performance. To run Humio this way, Kafka must be configured to run separately from Humio and with persistent storage of its own.

Back up without separate backups

Bucket storage frees users from maintaining separate backups of their log storage. Cloud storage services typically keep redundant copies of bucket data in multiple locations automatically.

Using bucket storage enables paying for storage just once, and having that data available for queries at any time. With Humio bucket storage, there’s no need to rehydrate or reingest data from backup storage.

The main trade-off is actually an asset

Bucket storage is designed with minimal structure, which makes it less suited to data that needs to be rewritten and edited. In almost all cases, log data isn’t supposed to be rewritten. If someone is rewriting log data, it’s likely because they’re trying to hide illicit activity. The same design that makes editing awkward therefore makes it easier to prevent tampering with log data.

Bucket up!

Bucket storage isn’t the only way Humio is optimized to be the fastest, most efficient, and most affordable way to handle log management. Discover how we’ve optimized our search in order to search up to a petabyte of log data in a single second. Read our TCO report to learn how Humio’s index-free architecture affects budgeting costs including hardware, license, and maintenance.

Sign up for a live demo and see how Humio’s index-free design can unlock new insights into your data, scale for future growth in the cloud, and greatly reduce your operating costs.
