Structured, Unstructured and Semi-Structured Logging Explained

Arfan Sharif - December 21, 2022

Structured, semi-structured, and unstructured logging fall along a spectrum, each with its own set of benefits and challenges. Unstructured and semi-structured logs are easy for humans to read but can be tough for machines to parse, while structured logs are easy to parse in your log management system but difficult to use without a log management tool.

What is Structured Logging?

Structured logging formats log data so it can be easily searched, filtered, and processed, which enables more advanced analytics. JSON is the most common format for structured logging, although other formats can be used instead. Best practice is to implement structured logging with a logging framework that can integrate with a log management solution that accepts custom fields.

Differences between Structured, Unstructured, and Semi-structured Logs

Unstructured logs are massive text files made up of strings, which are ordered sequences of characters that are meant to be read by humans. In logs, these strings contain variables, which are placeholders for values that are defined elsewhere. Sometimes the variable is a wildcard, which is a placeholder that represents an unknown value, just like a wild card in poker.

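from datetime import datetime
from pytz import timezone  # third-party pytz package, assumed from the timezone("EST") call
unstructured_data = ["Unstructured message", "Hello Python World", datetime.now(timezone("EST")).isoformat()]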

People can understand variables easily, but that’s not always true for machines. They can’t always tell the difference between a variable in one string and a similar sequence of characters elsewhere in the log file. When that happens, the results can be confusing, leading to lost productivity, more errors, and wasted staff hours and processing cycles.
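
The ambiguity described above can be sketched with a few made-up log lines. The user names and messages here are hypothetical; the point is only that a plain substring search cannot tell where in the line a value appears.

```python
# Hypothetical unstructured log lines; names and messages are illustrative only.
log_lines = [
    "2022-12-21 10:00:01 user admin logged in",
    "2022-12-21 10:00:05 user bob failed to open /home/admin/notes.txt",
    "2022-12-21 10:00:09 user carol contacted admin support",
]

# Naive search: try to find events for the user "admin".
matches = [line for line in log_lines if "admin" in line]

# Every line matches, although only the first is an event for user "admin".
print(len(matches))
```

A human reader sees at once that only the first line concerns the user "admin"; the search has no way to know that.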

Structured logs consist of objects instead of strings. In a log, an object is a set of named fields that can hold values and nested data structures. For instance, an object that’s part of a log message might include information about an app or a platform. The organization can define the fields they wish to include in the object in order to make the logs most useful in meeting their unique needs. This is the “structure” in a structured log.

Here is an example of a structured log:

from datetime import datetime
from pytz import timezone  # third-party pytz package, assumed from the timezone("EST") call

structured_data = [
    {
        "tags": {
            # The quoted values are placeholders for values computed at runtime.
            "host": "str(ip)",
            "host_name": "str(host)",
            "filename": "str(caller.filename)",
            "line": "str(caller.lineno)",
            "error_level": "INFO",
        },
        "events": [
            {
                "timestamp": datetime.now(timezone("EST")).isoformat(),
                "attributes": {
                    "message": "Structured message",
                },
            }
        ],
    }
]

Because structured logs are meant to be read by machines, the machines that read them can perform searches on them faster, produce cleaner output, and deliver consistency across platforms. Humans can still read structured logs, but they are not the primary audience. They are the audience for the output once a machine has finished operating on the data.
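
As a rough illustration of why machines handle structured logs well, the sketch below parses two made-up JSON records and filters them on a named field. The records are hypothetical; the field names `error_level` and `host_name` are borrowed from the example above.

```python
import json

# Hypothetical structured log records, one JSON object per line.
raw_records = [
    '{"timestamp": "2022-12-21T10:00:01-05:00", "error_level": "INFO", '
    '"host_name": "web-1", "message": "Structured message"}',
    '{"timestamp": "2022-12-21T10:00:05-05:00", "error_level": "ERROR", '
    '"host_name": "web-2", "message": "Disk quota exceeded"}',
]

# Each line parses directly into a dictionary of named fields.
records = [json.loads(line) for line in raw_records]

# Filter on a field by name instead of scanning free text.
errors = [r for r in records if r["error_level"] == "ERROR"]
print([r["host_name"] for r in errors])
```

No pattern matching is involved: the query names the field it cares about, and the result is the same regardless of how the message text is worded.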

Semi-structured logs support both machines and humans; the logs consist of strings and objects. These logs usually need to be parsed into tables before they can be analyzed properly. Semi-structured logs haven’t been standardized yet, which makes it harder for programs and systems to identify and categorize them. For example, the quoting rules for values that contain white space are not universally defined. CrowdStrike’s Falcon LogScale has taken steps in the right direction and can adapt to semi-structured logs in your environment.
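
A minimal sketch of parsing one semi-structured line follows. It assumes shell-style quoting for values that contain white space, which is only one of several conventions in use; the log line, the field names, and the `parse_semi_structured` helper are all hypothetical.

```python
import shlex

# Hypothetical semi-structured line: free-text prefix plus key=value pairs.
line = '2022-12-21T10:00:05 WARN quota check user="John Smith" app=billing used_mb=512'


def parse_semi_structured(text):
    """Split a line into its free-text part and a dict of key=value fields."""
    message_parts, fields = [], {}
    # shlex.split honors shell-style quotes, so "John Smith" stays one token.
    for token in shlex.split(text):
        key, sep, value = token.partition("=")
        if sep and key:
            fields[key] = value
        else:
            message_parts.append(token)
    return " ".join(message_parts), fields


message, fields = parse_semi_structured(line)
print(message)
print(fields)
```

A source that escapes white space differently, or does not quote at all, would need a different tokenizer, which is the standardization gap described above.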

Why Use Structured Logging?

Finding an event in an unstructured log can be difficult: a simple query often returns far more information than desired, and not necessarily the information actually wanted. For example, a developer seeking a log event created when a specific application exceeded disk quota by a certain amount may find all disk quota events created by all apps. In an enterprise environment, that’s going to be a big file.

To find the right event, the developer would have to write a complicated regular expression to define the search. And the more specific the event, the more complicated the expression. This approach is computationally expensive at scale because the conditions defined in the match expression have to be compared to every line in the log. If wildcards are used, the computational expense is even higher. And if the format of the log data changes, the match expression won’t work as intended.
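
The disk quota search can be sketched both ways. The message wording, the app names, and the field names (`app`, `event`, `over_mb`) are assumptions made up for this illustration.

```python
import re

# Hypothetical unstructured lines; the message wording is an assumption.
lines = [
    "billing exceeded disk quota by 120 MB",
    "search exceeded disk quota by 15 MB",
    "billing exceeded disk quota by 8 MB",
]

# Regex approach: the pattern is tied to the exact wording of the message
# and must be tested against every line.
pattern = re.compile(r"^billing exceeded disk quota by (\d+) MB$")
regex_hits = [
    m.group(0)
    for m in (pattern.search(line) for line in lines)
    if m and int(m.group(1)) > 100
]

# If the wording changes, the same pattern silently stops matching.
changed = "billing went over its disk quota by 120 MB"
still_matches = pattern.search(changed) is not None

# Structured approach: the same events expressed as named fields.
events = [
    {"app": "billing", "event": "disk_quota_exceeded", "over_mb": 120},
    {"app": "search", "event": "disk_quota_exceeded", "over_mb": 15},
    {"app": "billing", "event": "disk_quota_exceeded", "over_mb": 8},
]
structured_hits = [
    e for e in events if e["app"] == "billing" and e["over_mb"] > 100
]

print(regex_hits, still_matches, structured_hits)
```

Both approaches find the same event here, but only the structured filter keeps working when the human-readable message is reworded.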

In some organizations, the developers write log messages in the form of strings, while Ops teams write code that parses those strings into structured data. This takes more time and increases the computational expense. If a developer or an Ops team member makes an error, the logging process breaks and more time is lost finding the source of the error.

Structured logging eliminates these problems by structuring the data as it’s generated. The organization can choose the format that works best for them, such as fixed columns, key-value pairs, or JSON. Most businesses today choose JSON because it integrates well with automation systems, including alerting systems.
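
One way to structure data as it is generated is a custom formatter for Python's standard `logging` module, sketched below. The `JsonFormatter` class and its field names are hypothetical choices for this example, and it uses the standard library's UTC time zone rather than the pytz zone shown earlier; dedicated structured logging libraries offer more complete implementations.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object (hypothetical field names)."""

    def format(self, record):
        payload = {
            # record.created is a Unix timestamp; convert it to ISO 8601 in UTC.
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "error_level": record.levelname,
            "filename": record.filename,
            "line": record.lineno,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


logger = logging.getLogger("structured_example")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Emits one JSON object per event on stderr.
logger.info("Structured message")
```

Because the structure is applied when the event is logged, no downstream step has to parse strings back into fields.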

Text logs continue to have a place in the enterprise because structured logging has a few drawbacks. Structured logs define data as it is created, so the data can only be used for purposes served by that definition. And if the structured data is stored on-premises or in any data warehouse with a rigid data schema, changes to that schema will require the stored data to be updated, which is a vast and costly endeavor. When deciding on a logging strategy, organizations should consider who will be using the data, what type of data is collected, where and how the data will be stored, and whether the data needs to be prepared before it is stored or can be prepared when it is used.

Falcon LogScale Supports Structured, Semi-structured, and Unstructured Logs

The benefits of structured logging can only be realized with a flexible, scalable log management system that supports development, compliance, and security needs.

CrowdStrike’s Falcon LogScale handles unstructured, semi-structured, and structured messages, works with any data format, and is compatible with the leading open-source data shippers. Custom parsers make it easy to support any text format, so integrating Falcon LogScale is simple and quick.

Most users send structured data to Falcon LogScale as JSON objects. The objects don’t have to be formatted in any special way; they just have to be valid JSON. Time stamps can be sent as part of the log entry, and Falcon LogScale will use your time stamp instead of replacing it with its own. When sending unstructured data, time stamps are generated at the time of ingestion as a long comma-delimited string and do not impact the ingestion time stamp.

GET TO KNOW THE AUTHOR

Arfan Sharif is a product marketing lead for the Observability portfolio at CrowdStrike. He has over 15 years of experience driving Log Management, ITOps, Observability, Security and CX solutions for companies such as Splunk, Genesys and Quest Software. Arfan graduated in Computer Science at Bucks and Chilterns University and has a career spanning Product Marketing and Sales Engineering.