What is an Access Log?

Arfan Sharif - December 21, 2022

An access log is a log file that records all events related to client applications and user access to a resource on a computer. Examples include web server access logs, FTP command logs, and database query logs.

Managing access logs is an important task for system administrators. Software developers, operations engineers, and security analysts use access logs to monitor how their application is performing, who is accessing it, and what’s happening behind the scenes. Access logs can help IT teams discover problems, detect threats, and identify capacity issues.

Access logs typically contain some common information, such as:

The date and time of client access
The client IP address or hostname
Username
The status or criticality of the event
Success or failure of the operation
Any relevant messages

In this article, we’ll consider why access logs are important, different types of access logs and their locations, their contents, and the various configuration parameters involved.

Access Log Types

We can broadly classify access logs into three main categories:

Activity logs
Server access logs
Error logs

Activity logs

An activity log records all the actions performed by a user during a session. Such activities include executing commands, visiting URLs, and accessing files. Some examples of activity logs include:

Server access logs

Server access logs contain information about user connections and their resource requests. Unlike activity logs, these logs don’t contain detailed information about what the user actually did. Examples of server access logs include:

Error logs

Error logs contain diagnostic information about errors encountered during client sessions. These logs are useful for troubleshooting application and system errors. Some examples include:

To keep things simple, we’ll focus on web server access logs in this article. Typically, web server access logs contain all three types of information (user access, user activity, and request errors).

Why Do You Need to Capture Access Logs?

Capturing and analyzing web server access logs benefits system administrators in several ways.

First, access logs show the web application’s availability and health, which speeds up troubleshooting. For example, a high number of HTTP 404 errors means users are requesting one or more pages that don’t exist, or that the site links to the wrong URLs.

An access log also helps troubleshoot critical errors. For example, a high number of 5xx errors indicates that the web server is encountering server-side errors, which often means part of the site is crashing. The web server’s error log can reveal more about the cause.

Access logs are also valuable for digital marketing. Using access log entries, digital marketers can identify the areas of the site where users visit, request data, complete forms, download files, or click links. This information can support fine-tuned user profiling and search engine optimization.
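
The 404 and 5xx checks described above can be sketched in a few lines of Python. This minimal example assumes the log has already been reduced to (resource, status) pairs; the function name error_report and the 5% alert threshold are hypothetical choices for illustration, not values from any web server.

```python
from collections import Counter


def error_report(requests, threshold=0.05):
    """Summarize error responses from (resource, status) pairs.

    `threshold` is a hypothetical alerting cutoff for the share of 5xx responses.
    """
    requests = list(requests)
    total = len(requests)
    if total == 0:
        return {"rate_4xx": 0.0, "rate_5xx": 0.0, "top_404": [], "alert_5xx": False}
    # Group status codes by class: 2 for 2xx, 4 for 4xx, 5 for 5xx, ...
    by_class = Counter(status // 100 for _, status in requests)
    # Missing resources requested most often (candidates for broken links)
    missing = Counter(path for path, status in requests if status == 404)
    rate_5xx = by_class[5] / total
    return {
        "rate_4xx": by_class[4] / total,
        "rate_5xx": rate_5xx,
        "top_404": missing.most_common(3),
        "alert_5xx": rate_5xx > threshold,
    }


sample = [
    ("/", 200), ("/old-page", 404), ("/old-page", 404), ("/api/cart", 500),
    ("/", 200), ("/about", 200), ("/", 200), ("/", 200),
]
print(error_report(sample))
# {'rate_4xx': 0.25, 'rate_5xx': 0.125, 'top_404': [('/old-page', 2)], 'alert_5xx': True}
```

Grouping by status class rather than individual codes keeps the check useful even when a site returns a mix of 500, 502, and 503 responses.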

SecOps engineers use web server access logs to find unusual behaviors or anomalies. One example is an unexpected surge of HTTP GET requests from a specific range of IP addresses, which may signal a DDoS attack from a set of compromised computers. Likewise, if a web server is only supposed to accept HTTP/HTTPS traffic from a web application firewall (WAF), direct HTTP requests from other IP addresses can indicate unauthorized access attempts.
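
As a rough sketch of both checks, the Python below counts GET requests per /24 subnet and lists clients that did not come from an allowed WAF range. It assumes IPv4 addresses and (ip, method) pairs already extracted from the log; the function names, the threshold, and the WAF range are hypothetical, and the sample addresses come from reserved documentation ranges.

```python
import ipaddress
from collections import Counter


def busy_subnets(requests, threshold, prefix=24):
    """Count GET requests per IPv4 subnet; return subnets with at least `threshold` hits."""
    counts = Counter()
    for ip, method in requests:
        if method == "GET":
            # strict=False accepts a host address and returns its enclosing network
            net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
            counts[str(net)] += 1
    return {net: n for net, n in counts.items() if n >= threshold}


def not_from_waf(client_ips, waf_networks):
    """Return client IPs that fall outside every allowed WAF network."""
    allowed = [ipaddress.ip_network(n) for n in waf_networks]
    return sorted(
        {ip for ip in client_ips
         if not any(ipaddress.ip_address(ip) in net for net in allowed)}
    )


log_sample = [
    ("203.0.113.5", "GET"), ("203.0.113.9", "GET"), ("203.0.113.77", "GET"),
    ("198.51.100.4", "GET"), ("198.51.100.4", "POST"),
]
print(busy_subnets(log_sample, threshold=3))  # {'203.0.113.0/24': 3}
print(not_from_waf(["192.0.2.10", "203.0.113.5"], ["192.0.2.0/24"]))  # ['203.0.113.5']
```

A real detector would also count over sliding time windows rather than a single batch, since a "surge" only makes sense relative to a period of time.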

What Does an Access Log Contain?

Typically, a web server access log will contain the following types of information:

Field	Description
Date and time	The date and time the site or page was accessed, recorded in UTC or in the web server’s local time.
Source IP	The client machine’s IP address.
Destination IP	The web server’s IP address.
Destination FQDN	The web server’s fully qualified domain name.
Destination port	The requested port on the web server. This is typically `80` (default for HTTP) or `443` (default for HTTPS), but it can be any port the website runs on.
Protocol	The protocol and version the client used, such as HTTP/1.1.
Username	The user accessing the website (a hyphen denotes an anonymous user).
Resource	The page or element requested.
HTTP method	The HTTP request method (such as `GET`, `POST`, and so on).
HTTP status code	The status code returned by the web server (such as `200` OK, `404` Not Found, and so on).
URI query	The query string sent as part of the requested URI.
HTTP referrer	The URL (which may contain an IP address) of the page that directed the client to this website.
HTTP user agent	The client software, such as the browser type and version.
Bytes received	The number of bytes the web server received from the client.
Bytes sent	The number of bytes the web server sent to the client.

To see what these fields look like, let’s consider the following snippet from an Apache web server access log:

116.35.41.41 - - [21/May/2022:11:22:41 +0000] "GET /aboutus.html HTTP/1.1" 200 6430 "http://34.227.9.153/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.4 Safari/605.1.15"

Here, the access log shows a client request from the IP address 116.35.41.41 on May 21, 2022, at 11:22 a.m. server time (the +0000 offset means the server’s local time is UTC). The two hyphens after the IP address mean no identd logname or authenticated username was recorded. The client requested the aboutus.html page in the website’s root directory over HTTP/1.1. The HTTP status code was 200 (that is, the request was successful), and the referring address was http://34.227.9.153/. The user agent string identifies Apple’s Safari on macOS, and the web server sent 6430 bytes in the response body when it served the page.
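
To pull these fields out programmatically, a regular expression that mirrors the combined format is often enough. The following is a minimal Python sketch, not a complete parser: it assumes well-formed lines and English month abbreviations, it does not handle escaped quotes inside the request or user agent, and parse_combined and COMBINED_RE are hypothetical names.

```python
import re
from datetime import datetime

# Apache "combined" format: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Parse one combined-format line into a dict, or return None if it doesn't match."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # The request line is normally "METHOD resource PROTOCOL"
    parts = entry["request"].split()
    if len(parts) == 3:
        entry["method"], entry["resource"], entry["protocol"] = parts
    # %t looks like "21/May/2022:11:22:41 +0000"
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    # %b is "-" when no response body was sent
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


SAMPLE = (
    '116.35.41.41 - - [21/May/2022:11:22:41 +0000] "GET /aboutus.html HTTP/1.1" '
    '200 6430 "http://34.227.9.153/" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) '
    'AppleWebKit/605.1.15 (KHTML, like Gecko) Version/15.4 Safari/605.1.15"'
)
entry = parse_combined(SAMPLE)
print(entry["host"], entry["method"], entry["resource"], entry["status"], entry["size"])
# 116.35.41.41 GET /aboutus.html 200 6430
```

Returning None for lines that don’t match lets a caller skip malformed entries instead of failing on the whole file.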

By aggregating such information from the access log, you can find the:

Number of unique visitors per page or unique pages per visitor
Geolocations of site visitors
Most commonly accessed parts of a site
Most commonly used client queries
Number of responses for each HTTP status code
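
Once entries are parsed, most of these aggregates are simple counts. The minimal Python sketch below assumes each entry is a dict with host, resource, and status keys (hypothetical names, for example as produced by a regex-based parser) and treats each client IP as one visitor. Geolocation is not covered, since it requires mapping IP addresses through an external GeoIP database.

```python
from collections import Counter, defaultdict


def summarize(entries):
    """Aggregate parsed entries: dicts with hypothetical keys 'host', 'resource', 'status'."""
    status_counts = Counter(e["status"] for e in entries)
    page_hits = Counter(e["resource"] for e in entries)
    visitors = defaultdict(set)
    for e in entries:
        # One client IP counts as one "visitor": a rough proxy, since NAT
        # and proxies can put many people behind one address
        visitors[e["resource"]].add(e["host"])
    unique_visitors = {page: len(ips) for page, ips in visitors.items()}
    return status_counts, page_hits, unique_visitors


entries = [
    {"host": "198.51.100.1", "resource": "/", "status": 200},
    {"host": "198.51.100.2", "resource": "/", "status": 200},
    {"host": "198.51.100.1", "resource": "/", "status": 200},
    {"host": "198.51.100.3", "resource": "/missing", "status": 404},
]
status_counts, page_hits, unique_visitors = summarize(entries)
print(status_counts)             # Counter({200: 3, 404: 1})
print(page_hits.most_common(1))  # [('/', 3)]
print(unique_visitors)           # {'/': 2, '/missing': 1}
```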

How to Find Access Logs

A web server’s access log location depends on the operating system and the web server itself.

For example, the default location of the Apache web server’s access log in RHEL-based systems is /var/log/httpd. In Debian-based systems like Ubuntu, the location is /var/log/apache2.

For Nginx, by default, the access log is in the /var/log/nginx directory in both RHEL and Debian-based systems.

The default access log location for Internet Information Services (IIS) running on Windows Server is %SystemDrive%\inetpub\logs\LogFiles\W3SVC<site_id>, where %SystemDrive% is typically C:\ and site_id is the IIS-hosted website’s ID.

Administrators can read a web server’s access logs in several ways. On Linux-based systems, a site administrator can SSH into the web server and use commands like cat, tail, and grep to read the file. Sometimes, webmasters may have to use the hosting provider’s control panel (such as cPanel) to open and read the access log.
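
For scripted checks, the same kind of filtering can be done from Python. This sketch does roughly what piping grep into tail does: it keeps only the last n lines that match a pattern. The file names in DEFAULT_LOGS are the usual package defaults inside the directories listed above, but they are assumptions your system may not match, and find_access_log and tail_matching are hypothetical helper names.

```python
import os
import re
from collections import deque
from pathlib import Path

# Usual default file names; your distribution or configuration may differ
DEFAULT_LOGS = [
    "/var/log/httpd/access_log",    # Apache on RHEL-based systems
    "/var/log/apache2/access.log",  # Apache on Debian-based systems
    "/var/log/nginx/access.log",    # Nginx
]


def find_access_log(candidates=DEFAULT_LOGS):
    """Return the first candidate that is a readable file, or None."""
    for candidate in candidates:
        path = Path(candidate)
        try:
            if path.is_file() and os.access(path, os.R_OK):
                return path
        except OSError:  # e.g. a parent directory we may not enter
            continue
    return None


def tail_matching(path, pattern, n=10):
    """Return the last n lines matching a regex, similar to grep piped into tail."""
    regex = re.compile(pattern)
    last = deque(maxlen=n)  # only the most recent n matches are kept
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            if regex.search(line):
                last.append(line.rstrip("\n"))
    return list(last)


if __name__ == "__main__":
    log = find_access_log()
    if log is None:
        print("No readable access log in the default locations")
    else:
        for line in tail_matching(log, r'" 404 ', n=5):  # recent 404 responses
            print(line)
```

Reading system log directories usually requires elevated privileges, which is why the lookup checks readability instead of assuming it.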

How to Configure Access Logs

As with most other settings, you can set the properties of a web server’s access log in its configuration file. The location of the main configuration file depends on the web server and the OS:

Web server	OS	Main Configuration File
Apache	RHEL-based	`/etc/httpd/conf/httpd.conf`
Apache	Debian-based	`/etc/apache2/apache2.conf`
Nginx	RHEL-based	`/etc/nginx/nginx.conf`
Nginx	Debian-based	`/etc/nginx/nginx.conf`
IIS	Windows Server	`%WinDir%\System32\Inetsrv\Config\ApplicationHost.config`

Common log settings in most web servers include:

Log location
Log format
Log level
Log rotation

The access log location can be different for each website hosted on the web server. For example, in Apache, the following CustomLog directive sets the server-wide access log location and format:

CustomLog "/var/log/httpd/access_log" common

But this can be overridden for a VirtualHost:

<VirtualHost *:80>
    ServerName www.mysite.com
    ServerAlias test.com
    DocumentRoot /var/www/html/test.com
    ErrorLog /var/log/httpd/mysite.com/error_log
    CustomLog /var/log/httpd/mysite.com/access_log combined
</VirtualHost>

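The log format configuration specifies the fields to include in each log entry. Apache’s default configuration defines two widely used format nicknames, common and combined (combined adds the referrer and user agent fields), and you can define your own with the LogFormat directive. The snippet below shows how the common format is defined: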

LogFormat "%h %l %u %t \"%r\" %>s %b" common

Here:

%h is the remote hostname (or the client IP address when hostname lookups are disabled, which is the default)
%l is the remote logname from identd (if supplied)
%u is the authenticated user ID (if any)
%t is the time the request was received
%r is the first line of the HTTP request
%>s is the final HTTP status code returned by the web server
%b is the size of the response in bytes, excluding HTTP headers
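
Running these directives in reverse helps show what each one produces. The Python sketch below renders a request the way the common format lays it out; format_common is a hypothetical helper, and it simplifies Apache’s behavior (for example, Apache also escapes special characters in %r).

```python
from datetime import datetime, timezone


def format_common(host, request_line, status, size, when, ident=None, user=None):
    """Render one entry in Apache's "common" layout: %h %l %u %t "%r" %>s %b."""

    def dash(value):
        # Apache writes "-" for fields it has no value for
        return "-" if value in (None, "") else str(value)

    stamp = when.strftime("[%d/%b/%Y:%H:%M:%S %z]")  # %t
    size_field = str(size) if size else "-"           # %b: "-" instead of 0
    return (
        f'{dash(host)} {dash(ident)} {dash(user)} {stamp} '
        f'"{request_line}" {status} {size_field}'
    )


when = datetime(2022, 5, 21, 11, 22, 41, tzinfo=timezone.utc)
print(format_common("116.35.41.41", "GET /aboutus.html HTTP/1.1", 200, 6430, when))
# 116.35.41.41 - - [21/May/2022:11:22:41 +0000] "GET /aboutus.html HTTP/1.1" 200 6430
```

The output matches the first part of the sample log line shown earlier; the combined format appends the quoted referrer and user agent after these fields.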

Refer to the Apache mod_log_config documentation for the full list of format strings and for how to define your own access log format.

Other Apache log settings include log level and log rotation. The LogLevel directive sets the minimum severity of messages written to Apache’s error log; it does not filter access log entries. From most to least severe, the levels are emerg, alert, crit, error, warn, notice, info, and debug, followed by trace1 through trace8 for progressively more detail. The less severe the configured level, the more verbose the log. The snippet below records only messages at warn level and above:

LogLevel warn

This can be overridden for Apache VirtualHosts.
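
The “this level and above” rule can be made concrete with a short Python sketch of severity-threshold filtering. The level names and their order follow Apache’s documented list; LEVELS, is_logged, and levels_logged are hypothetical names, and this illustrates the rule rather than reproducing Apache’s code.

```python
# Apache log levels, most severe first; trace1..trace8 are increasingly verbose
LEVELS = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"] + [
    f"trace{i}" for i in range(1, 9)
]
RANK = {name: index for index, name in enumerate(LEVELS)}


def is_logged(message_level, configured_level):
    """True if a message at `message_level` passes a `LogLevel configured_level` threshold."""
    return RANK[message_level] <= RANK[configured_level]


def levels_logged(configured_level):
    """All levels recorded at the given threshold, most severe first."""
    return LEVELS[: RANK[configured_level] + 1]


print(levels_logged("warn"))        # ['emerg', 'alert', 'crit', 'error', 'warn']
print(is_logged("notice", "warn"))  # False
print(is_logged("notice", "debug")) # True
```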

You can rotate Apache logs with the Linux logrotate utility or with Apache’s rotatelogs program.
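
To show what numbered rotation does, here is a simplified Python sketch that shifts access_log to access_log.1, access_log.1 to access_log.2, and so on, keeping a fixed number of old copies. It illustrates the idea under those assumptions rather than how logrotate or rotatelogs is implemented, and rotate is a hypothetical helper.

```python
from pathlib import Path


def rotate(log_path, keep=4):
    """Shift log -> log.1 -> log.2 ..., keeping at most `keep` old copies."""
    log = Path(log_path)

    def numbered(i):
        return log.with_name(f"{log.name}.{i}")

    # Drop the oldest copy so there is room to shift the others up
    if numbered(keep).exists():
        numbered(keep).unlink()
    # Move log.(i) to log.(i+1), starting from the highest number
    for i in range(keep - 1, 0, -1):
        if numbered(i).exists():
            numbered(i).rename(numbered(i + 1))
    # The current log becomes log.1, and a fresh empty log takes its place
    if log.exists():
        log.rename(numbered(1))
    log.touch()
```

Apache keeps writing to the file it already has open, so after a rename like this it must be told to reopen its logs, for example with a graceful restart. Apache logrotate configurations typically handle this with a reload in a postrotate script, while rotatelogs avoids the issue because Apache pipes log entries to it instead of writing the file directly.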


GET TO KNOW THE AUTHOR

Arfan Sharif is a product marketing lead for the Observability portfolio at CrowdStrike. He has over 15 years of experience driving Log Management, ITOps, Observability, Security and CX solutions for companies such as Splunk, Genesys and Quest Software. Arfan graduated in Computer Science from Bucks and Chilterns University and has a career spanning Product Marketing and Sales Engineering.