Log File Analyser
An SEO’s Guide To W3C Log Files
Introduction To W3C Log Files
The W3C Log Format is an official standard, unlike Apache Log Files that happen to follow a standard convention. Like Apache logs, W3C log files are just plain text files, so they can be viewed in a simple text editor like Notepad or TextEdit. This format is used by both IIS and Amazon CloudFront.
The Format
W3C log files all start with a few lines of metadata, each of which is prefixed with a #. Here’s an example from an IIS log file:
#Software: Microsoft Internet Information Services 8.0
#Version: 1.0
#Date: 2017-01-01 00:09:00
#Fields: date time cs-uri-stem cs(User-Agent) sc-status
The #Fields: line is the most important, as it specifies all the fields that are present in the log and the order they appear in. If the fields, or their order, are changed, a new #Fields: line is added to detail the change.
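As a rough illustration of how a reader of the format might follow the #Fields: directive, here is a minimal Python sketch. The function name iter_w3c_records and the sample lines are made up for this example, and skipping lines whose value count doesn't match the field count is an assumption rather than anything the format requires.

```python
from typing import Dict, Iterable, Iterator, List


def iter_w3c_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per log entry, keyed by the currently active #Fields names."""
    fields: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Metadata line; only #Fields changes how later entries are read.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        values = line.split()
        if not fields or len(values) != len(fields):
            continue  # assumption: skip entries we cannot map onto the fields
        yield dict(zip(fields, values))


# Hypothetical sample: the field order changes part way through the file.
sample = [
    "#Software: Microsoft Internet Information Services 8.0",
    "#Version: 1.0",
    "#Date: 2017-01-01 00:09:00",
    "#Fields: date time cs-uri-stem sc-status",
    "2017-01-01 00:09:00 /contact.html 200",
    "#Fields: date time sc-status cs-uri-stem",
    "2017-01-01 00:09:01 404 /missing.html",
]
records = list(iter_w3c_records(sample))
```

Because the field list is replaced every time a new #Fields: line is seen, the second entry is still read correctly even though its columns are in a different order.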
The prefixes on the field names indicate the direction the value was sent:
- cs: Client to Server
- sc: Server to Client
In the example above, the status is sent from the Server to the Client, and the User-Agent is sent from the Client to the Server.
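If you want to label fields by direction programmatically, something along these lines would do it. This is only a sketch: field_direction is a hypothetical helper, and it recognises only the two prefixes listed above, treating everything else as having no direction.

```python
# Only the two prefixes described above are recognised in this sketch.
PREFIX_MEANINGS = {"cs": "client to server", "sc": "server to client"}


def field_direction(field_name: str) -> str:
    """Return the direction implied by a field name's prefix, if any."""
    for index, char in enumerate(field_name):
        if char in "-(":
            # The prefix is whatever comes before the first "-" or "(".
            prefix = field_name[:index]
            return PREFIX_MEANINGS.get(prefix, "no direction prefix")
    return "no direction prefix"


status_direction = field_direction("sc-status")
agent_direction = field_direction("cs(User-Agent)")
date_direction = field_direction("date")
```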
Tokenisation
Field values are separated by whitespace, which works well for values that don’t contain spaces themselves. In an access log the cs(User-Agent) field can contain whitespace, for example Googlebot’s User Agent is:
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
So, how can this be represented without breaking tokenisation? What Microsoft have done in IIS is replace all the spaces with a +, so the User Agent will be:
Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html)
Unfortunately, there is no way of knowing whether each + was originally a space or a genuine +. So, if you upload an IIS log to the Log File Analyser, the User Agent values will contain these extra + characters. If you’re using the Advanced Logging Module, however, the cs(User-Agent) value will be delimited by quotes so it can contain spaces, eg:
"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Mandatory Fields
The following fields are mandatory. Without these present the Log File Analyser will not be able to import a W3C log file.
- date or date-local: Date at which transaction completed, eg 2017-01-01
- time or time-local: Time at which transaction completed, eg 00:09:00
- cs-uri-stem or cs-uri: The uri requested by the client to the server, eg /contact.html or /www.example.com/contact.html
- sc-status: The response code sent from the server to the client, eg 200
- cs(User-Agent): The User-Agent sent from the client to the server, eg Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Here’s an example line and its preceding #Fields directive, containing just the mandatory fields:
#Fields: date time cs-uri-stem cs(User-Agent) sc-status
2017-01-01 00:09:00 /contact.html Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200
Download a 1,000 line example here.
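If you’d like to check a log before uploading it, a quick script can compare the #Fields: line against the list above. The sketch below simply mirrors that list; missing_mandatory is a hypothetical helper and not the Log File Analyser’s own validation.

```python
from typing import List

# Each tuple lists the alternative names that satisfy one mandatory field.
MANDATORY_GROUPS = [
    ("date", "date-local"),
    ("time", "time-local"),
    ("cs-uri-stem", "cs-uri"),
    ("sc-status",),
    ("cs(User-Agent)",),
]


def missing_mandatory(fields_line: str) -> List[str]:
    """Return a description of every mandatory field absent from a #Fields: line."""
    names = fields_line.split()
    if names and names[0] == "#Fields:":
        names = names[1:]
    present = set(names)
    return [
        " or ".join(group)
        for group in MANDATORY_GROUPS
        if not present.intersection(group)
    ]


complete = missing_mandatory("#Fields: date time cs-uri-stem cs(User-Agent) sc-status")
incomplete = missing_mandatory("#Fields: date-local time cs-uri sc-status")
```

An empty list means every mandatory field (or one of its alternatives) is present; otherwise the list names what needs adding to the log configuration.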
Optional Fields
The following fields are optional. Ideally all these will be present in addition to the mandatory fields, which will allow you to get the most out of the Log File Analyser.
- cs-uri-query: The query string at the end of the URL, eg hobby=seo, or - if no query is present
- cs-uri: The full request uri, minus the protocol, eg /www.example.com/?hobby=seo
- cs(Referer): The URI of the referring page, eg http://www.example.com/, or - if not provided
- time-taken: The time taken in milliseconds to serve the request, eg 500 for half a second
- sc-bytes: The number of bytes sent from the server to the client, eg 100
- cs-method: The method used by the client to request the page, eg GET, POST etc
- c-ip: The ip address of the client, eg 192.168.0.10
- cs-host or x-host-header: The host header if provided, eg www.example.com, or - if not
- cs-protocol: The protocol used in the request, eg https
Below is an example of a W3C log file configured to provide all the information required by the Log File Analyser.
#Fields: date time cs-uri-stem cs-uri-query cs(Referer) cs(User-Agent) sc-status time-taken sc-bytes cs-method c-ip cs-host cs-protocol
2017-01-01 00:09:00 /contact.html source=ad http://www.example.com Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 500 100 GET 192.168.0.10 www.example.com https
Download a 1,000 line example here.
Microsoft detail their W3C fields here, and Amazon here.
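To show what the optional fields make possible, the following Python sketch turns the full example entry above into typed values. The summarise helper and the output keys are made up for this illustration, and it assumes every field in the example #Fields: line is present.

```python
from datetime import datetime
from typing import Any, Dict, Optional

FIELDS = (
    "date time cs-uri-stem cs-uri-query cs(Referer) cs(User-Agent) sc-status "
    "time-taken sc-bytes cs-method c-ip cs-host cs-protocol"
).split()
LINE = (
    "2017-01-01 00:09:00 /contact.html source=ad http://www.example.com "
    "Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) "
    "200 500 100 GET 192.168.0.10 www.example.com https"
)


def none_if_dash(value: str) -> Optional[str]:
    """A lone hyphen marks a value that was not provided."""
    return None if value == "-" else value


def summarise(record: Dict[str, str]) -> Dict[str, Any]:
    """Convert the raw strings of one entry into typed values."""
    url = record["cs-protocol"] + "://" + record["cs-host"] + record["cs-uri-stem"]
    query = none_if_dash(record["cs-uri-query"])
    if query:
        url += "?" + query
    return {
        "timestamp": datetime.strptime(
            record["date"] + " " + record["time"], "%Y-%m-%d %H:%M:%S"
        ),
        "url": url,
        "referer": none_if_dash(record["cs(Referer)"]),
        "status": int(record["sc-status"]),
        "seconds": int(record["time-taken"]) / 1000,  # milliseconds to seconds
        "bytes": int(record["sc-bytes"]),
    }


summary = summarise(dict(zip(FIELDS, LINE.split())))
```

With the protocol, host and query string all logged, the full URL can be rebuilt without any guesswork, which is why having these fields present is worthwhile.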
Domains
If a log file does not contain a cs-uri, cs-host or x-host-header field, any of which specifies the domain, you’ll be asked to confirm what it is.
If your logs cover multiple domains/subdomains, this will result in some information loss and all URLs will appear to be from the same domain. Let’s consider a simple log file with 2 lines, one for subdomain1.example.com and one for subdomain2.example.com.
#Fields: date time cs-host cs-uri-stem cs(User-Agent) sc-status cs-protocol
2017-01-01 00:09:00 subdomain1.example.com /contact.html Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 https
2017-01-01 00:09:01 subdomain2.example.com /contact.html Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 https
Here you won’t be asked for a site URL, as the Log File Analyser can read this from the log file. If, however, the log file doesn’t contain the domain, like the following two lines:
#Fields: date time cs-uri-stem cs(User-Agent) sc-status cs-protocol
2017-01-01 00:09:00 /contact.html Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 https
2017-01-01 00:09:01 /contact.html Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200 https
You’ll be asked to specify the domain. As you can see here, if the log covers multiple domains you’ll end up misrepresenting some of your log file lines.
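The information loss is easy to demonstrate with a short Python sketch. Here entry_url is a hypothetical helper that falls back to a supplied domain when no host field is logged; it ignores cs-uri for brevity, and the https default is an assumption.

```python
from typing import Dict


def entry_url(record: Dict[str, str], fallback_domain: str) -> str:
    """Build a full URL, using the fallback domain when no host was logged."""
    host = fallback_domain
    for name in ("cs-host", "x-host-header"):
        value = record.get(name)
        if value and value != "-":
            host = value
            break
    protocol = record.get("cs-protocol", "https")  # assumed default
    return protocol + "://" + host + record["cs-uri-stem"]


with_host = [
    {"cs-host": "subdomain1.example.com", "cs-uri-stem": "/contact.html", "cs-protocol": "https"},
    {"cs-host": "subdomain2.example.com", "cs-uri-stem": "/contact.html", "cs-protocol": "https"},
]
without_host = [
    {"cs-uri-stem": "/contact.html", "cs-protocol": "https"},
    {"cs-uri-stem": "/contact.html", "cs-protocol": "https"},
]

urls_with_host = {entry_url(r, "www.example.com") for r in with_host}
urls_without_host = {entry_url(r, "www.example.com") for r in without_host}
```

The first set contains two distinct URLs, one per subdomain. The second collapses to a single URL on the fallback domain, so requests to different subdomains can no longer be told apart.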
Tuning for SEO
As you can see, it may well be that your role as an SEO won’t only involve using the Log File Analyser to process the files. You may have to request changes in the current log configuration to really enhance your analysis.
If you have any more questions about log files or how to use the Screaming Frog Log File Analyser, then please do get in touch with our support team.