What is Log Aggregation?
Log aggregation is the mechanism for capturing, normalizing, and consolidating logs from different sources to a centralized platform for correlating and analyzing the data. This aggregated data then acts as a single source of truth for different use cases, including troubleshooting application performance issues or errors, identifying infrastructure bottlenecks, or finding the root cause of a cyberattack.
In this article, we will learn about the need for log aggregation, the steps involved in log aggregation, and the types of logs you should collect. We’ll also consider the features you need to look for when choosing a log aggregation and management platform.
Why Log Aggregation?
Log aggregation enables you to gather events from disparate sources into a single place so that you can search, analyze, and make sense of that data. Not only is log aggregation foundational to end-to-end observability, but it is also useful for troubleshooting application errors, finding infrastructure bottlenecks, and investigating security incidents.
What’s Involved in Log Aggregation?
There are several steps involved when aggregating logs from different sources and analyzing them.
Identifying log sources
Modern distributed enterprise applications have many moving pieces, so you need to identify all the components you want to aggregate logs from. To keep logs manageable, you could choose to capture only certain types of events (such as failed login attempts or queries that take longer than a set threshold) or only events at or above a given severity level.
For example, you can choose to collect all failed connection attempts from your network intrusion detection system (NIDS) while only collecting critical error messages about crashing pods from your Kubernetes cluster.
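The selection described above can be written down as a small rule table. The following is a minimal sketch only: the source names, event types, and the `should_collect` helper are hypothetical and do not come from any particular logging platform.

```python
# Numeric ranks so that severity levels can be compared.
SEVERITY = {"DEBUG": 10, "INFO": 20, "WARNING": 30, "ERROR": 40, "CRITICAL": 50}

# Hypothetical rules: source -> (minimum severity, required event type or None for any).
COLLECTION_RULES = {
    "nids": ("INFO", "failed_connection"),  # every failed connection attempt
    "kubernetes": ("CRITICAL", None),       # only critical events, of any type
}


def should_collect(source: str, level: str, event_type: str) -> bool:
    """Return True if the event matches the collection rule for its source."""
    rule = COLLECTION_RULES.get(source)
    if rule is None:
        return False  # sources without a rule are not collected in this sketch
    min_level, required_type = rule
    if SEVERITY[level] < SEVERITY[min_level]:
        return False
    return required_type is None or event_type == required_type
```

With these rules, a WARNING-level `failed_connection` event from `nids` is collected, while an ERROR-level event from `kubernetes` is skipped because it falls below the CRITICAL threshold.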
The next step after identifying log sources is to collect those logs. Log collection should be automatic. There are multiple ways to collect logs, which include the following:
Applications can use standard message logging protocols like Syslog to stream their logs continuously to a centralized system.
You can install custom integrations and collectors (also known as agents) on servers that read logs from the local machine and send them to the logging platform.
Code instrumentation captures messages from specific parts of a program, often only when particular error conditions occur.
Log management systems can directly access source systems and copy log files over the network.
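To make the agent approach more concrete, here is a rough sketch of the core loop a collector might run: read whatever has been appended to a local log file since the last check and hand each complete line to a sender. The function name `forward_new_lines` and the `send` callback are assumptions for illustration; real agents also handle file rotation, batching, and retries in more depth.

```python
import os
from typing import Callable


def forward_new_lines(path: str, offset: int, send: Callable[[str], None]) -> int:
    """Send lines appended to `path` since byte `offset`; return the new offset."""
    if os.path.getsize(path) < offset:
        offset = 0  # the file shrank (truncated or rotated), so start over
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a partial line that is still being written
            send(line.decode("utf-8", errors="replace").rstrip("\r\n"))
            offset = f.tell()
    return offset
```

An agent would call this periodically, persisting the returned offset between runs so that a restart does not resend old lines. For the Syslog approach, Python's standard library offers `logging.handlers.SysLogHandler`, which lets an application stream its log records to a Syslog receiver.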
Logs need to be parsed before they can be used to derive meaningful insights. Parsing is the process of extracting key pieces of information from each logged event and putting them into a common format. These values are then stored for later analysis. Logs can be quite large and contain lots of useless data. Parsing extracts only the relevant pieces of data while discarding the rest.
One example of parsing is converting the original timestamps to a single time zone, such as UTC. Timestamps are critical metadata for an event, and different log sources may record them in different formats and time zones.
A parser can extract other important pieces of information, such as usernames, source and destination IP addresses, the network protocol used, and the actual message of the log. Parsing can also filter the data, for example by keeping only ERROR and WARNING events and excluding anything less severe.
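The parsing steps above can be sketched in a few lines. This example assumes a made-up line format (`<ISO-8601 timestamp with offset> <LEVEL> <user> <source IP> <message>`); the `parse_line` function and field names are illustrative, not a standard.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Assumed line format: "<timestamp> <LEVEL> <user> <src_ip> <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<user>\S+) (?P<src_ip>\S+) (?P<message>.*)$"
)
KEEP_LEVELS = {"WARNING", "ERROR"}


def parse_line(line: str) -> Optional[dict]:
    """Extract fields into a common format, or return None if the line is dropped."""
    match = LINE_RE.match(line)
    if match is None or match["level"] not in KEEP_LEVELS:
        return None  # unparseable, or less severe than we want to keep
    # Normalize the timestamp to UTC. The input is assumed to carry a UTC offset;
    # a timestamp without one would be interpreted as local time.
    ts = datetime.fromisoformat(match["ts"]).astimezone(timezone.utc)
    return {
        "timestamp": ts.isoformat(),
        "level": match["level"],
        "user": match["user"],
        "src_ip": match["src_ip"],
        "message": match["message"],
    }
```

For instance, a line stamped `2024-03-01T09:15:00+05:30` is stored with the timestamp `2024-03-01T03:45:00+00:00`, so events from sources in different time zones sort correctly together.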
After parsing, the aggregation pipeline can apply further processing steps to the data.
Indexing builds a map of the parsed and stored data based on a column, similar to a database index. Indexing makes querying logs easier and faster. Unique indexes can also help eliminate duplicate log entries.
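As a toy illustration of the idea, the sketch below builds an in-memory index over one field of the parsed events. Real platforms use far more sophisticated on-disk structures; `build_index` is a hypothetical helper used only to show why lookups become faster.

```python
from collections import defaultdict


def build_index(events: list[dict], field: str) -> dict[str, list[int]]:
    """Map each value of `field` to the positions of the events that contain it."""
    index = defaultdict(list)
    for position, event in enumerate(events):
        if field in event:
            index[event[field]].append(position)
    return dict(index)
```

Once the index exists, finding every event for a given user is a dictionary lookup rather than a scan over all stored events.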
Data enrichment can also be very helpful for gaining further insight from your logs. Some examples of data enrichment include:
Adding geolocation to your log data from IP addresses
Replacing HTTP status codes with their actual messages
Including operating system and web browser details
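Two of the enrichment examples above can be sketched briefly. The status text comes from Python's standard `http.HTTPStatus`; the `GEO_BY_IP` table is a hypothetical stand-in for a real GeoIP database, and in this sketch the status text is added alongside the code rather than replacing it.

```python
from http import HTTPStatus

# Hypothetical stand-in for a GeoIP database lookup (documentation-range IPs).
GEO_BY_IP = {"203.0.113.9": "AU", "198.51.100.4": "DE"}


def enrich(event: dict) -> dict:
    """Return a copy of the event with a status phrase and country code added."""
    enriched = dict(event)
    try:
        enriched["status_text"] = HTTPStatus(event["status"]).phrase
    except (KeyError, ValueError):
        enriched["status_text"] = "unknown"  # missing or non-standard status code
    enriched["country"] = GEO_BY_IP.get(event.get("src_ip"), "unknown")
    return enriched
```

Operating system and browser details would typically be derived in a similar way, by parsing the User-Agent header of each request.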
Masking redacts sensitive data, such as encryption keys, personal information, or authentication tokens and credentials, from logged messages.
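A simple way to picture masking is a list of patterns applied to each message before it is stored. The patterns below are deliberately simplified assumptions; production rules would need to cover many more formats.

```python
import re

# (pattern, replacement) pairs, applied in order. Both patterns are simplified.
PATTERNS = [
    # key=value or key: value secrets, e.g. "password=hunter2" or "token: abc123"
    (re.compile(r"(?i)\b(password|token|api_key)\b(\s*[=:]\s*)\S+"), r"\1\2[REDACTED]"),
    # email addresses
    (re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"), "[EMAIL]"),
]


def mask(message: str) -> str:
    """Redact secrets and email addresses from a log message."""
    for pattern, replacement in PATTERNS:
        message = pattern.sub(replacement, message)
    return message
```

For example, `mask("login failed for bob@example.com password=hunter2")` returns `"login failed for [EMAIL] password=[REDACTED]"`.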
Most log management platforms compress the parsed, indexed, and enriched logs before storing them. Compression reduces the network bandwidth and storage cost for logs. Depending on the platform, the compressed format may be proprietary.
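To give a feel for why compression pays off, the sketch below packs a batch of events as newline-delimited JSON and compresses it with gzip. gzip is used here only as an open, widely available stand-in; it is an assumption, not the format any specific platform uses.

```python
import gzip
import json


def compress_batch(events: list[dict]) -> bytes:
    """Serialize events as newline-delimited JSON and gzip the result."""
    payload = "\n".join(json.dumps(event, sort_keys=True) for event in events)
    return gzip.compress(payload.encode("utf-8"))


def decompress_batch(blob: bytes) -> list[dict]:
    """Reverse compress_batch: gunzip and parse one JSON object per line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]
```

Log data tends to be highly repetitive (the same hosts, levels, and message templates appear again and again), which is why general-purpose compression usually shrinks it substantially.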
When aggregating logs, you also need to set their retention policies. Retention policies dictate how long logs should be stored. This can depend on multiple factors, such as available storage space, industry requirements, or organizational policies. Additionally, different types of logs can have different retention requirements. After the specified time, old logs can be removed from the system or archived to less expensive storage with higher latency. Both removal and archival improve query performance by reducing the size of hot data, and archived logs remain available for auditing purposes.
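A retention policy like the one described above can be expressed as a small table plus a function that decides what to do with a log of a given age. The log types and day counts below are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical policy: log type -> (days in hot storage, days kept in total).
RETENTION = {
    "application": (7, 30),
    "authentication": (30, 365),
}
DEFAULT_RETENTION = (7, 90)


def retention_action(log_type: str, written_at: datetime, now: datetime) -> str:
    """Return 'keep', 'archive', or 'delete' for a log written at `written_at`."""
    hot_days, total_days = RETENTION.get(log_type, DEFAULT_RETENTION)
    age = now - written_at
    if age > timedelta(days=total_days):
        return "delete"
    if age > timedelta(days=hot_days):
        return "archive"  # move to cheaper storage with higher latency
    return "keep"
```

Under this policy, a 40-day-old application log is deleted, while a 40-day-old authentication log is only archived, reflecting the different retention requirements of the two log types.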
What Types of Logs Should You Aggregate?
The types of logs you should aggregate depend on your use case. This is part of the log identification phase discussed earlier. Although this is not a comprehensive list, here are some recommendations for logs to capture:
System logs generated by Syslog, systemd-journald (read with journalctl), or the Windows Event Log service
Web Server logs
Application logs, including those from microservices
Network flow logs
Firewall, antivirus, and intrusion detection system logs
API Gateway logs
Load balancer logs
Authentication service logs
Proxy server logs
Backup and recovery logs
Based on your requirements, you can exclude some logs like those from successful health checks or logins. You may also consider skipping most logs from components like bastion servers or FTP servers in the DMZ. However, you may still want to capture authentication logs even from those systems.
Features of a Log Aggregation Platform
There are many log aggregation platforms available in the market today. When you are selecting such a platform, consider some of the following factors.
Efficient Data Collection
The log aggregation platform should seamlessly collect logs from various sources, such as application servers, databases, API endpoints, or web servers. Collection can be native to the platform or provided through actively maintained plugins. The platform should also support all major log formats, such as plain text, CSV, JSON, and XML.
The platform must efficiently parse, index, compress, store, and analyze data at enterprise scale. It should also offer an easy and rich query language to search, sort, filter, and analyze logs, along with the capability to create dashboards and reports.
The time the platform needs to ingest, parse, index, compress, and store logs should be short. Users should be able to monitor logs in real time as they are ingested and processed.
The platform should handle a sudden burst of incoming log data and prevent data loss during transmission. Also, the gradual increase of data volume should not degrade search and query performance.
Log data should be encrypted at rest and in transit. This is a mandatory requirement in some industries. The platform should also provide mechanisms like role-based access control to manage user access to the data.
Alerting and Integration
The solution should allow operators to create alerts based on specific criteria in logged events. It should be able to send those alerts to a multitude of communication systems. Integration with third-party tools and platforms is also a nice-to-have feature. For example, some integrations allow the logging solution to create service tickets automatically.
Finally, the log aggregation platform should justify its value through a low Total Cost of Ownership (TCO) and a high Return on Investment (ROI) when you perform a cost-benefit analysis.
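As a rough illustration of criteria-based alerting, the sketch below flags any source that produced at least a threshold number of events at a given level within a batch. The `find_alerts` function, the field names, and the default threshold are assumptions; real platforms typically express such rules in their own query language and evaluate them over time windows.

```python
from collections import Counter


def find_alerts(events: list[dict], level: str = "ERROR", threshold: int = 3) -> list[str]:
    """Return the sources that logged at least `threshold` events at `level`."""
    counts = Counter(
        event.get("source", "unknown")
        for event in events
        if event.get("level") == level
    )
    return sorted(source for source, count in counts.items() if count >= threshold)
```

Each returned source could then be passed to whatever notification or ticketing integration the platform supports.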
Logging with Humio
Humio is a modern log management solution with a unique index-free architecture that makes log data instantly available for real-time searches, dashboard updates, and alerts. Features like near-zero latency from ingest to display, automatic anomaly detection, real-time alerts, a simple query language, and the ability to aggregate millions of log lines and create dashboards make it a modern, complete logging solution. To see how it works, you can try it for free.