HTTP Logs Analysis using Microsoft Log Parser

While there are several tools freely available on the web to analyze your website traffic and they are doing great at this (Google AnalyticsGoogle Webmaster ToolBing Webmaster tool …). These tools provide great and free value to track your traffic and troubleshoot potential issues on your website. As any tool available they have some limitations and the need to find alternative/complementary solutions becomes necessary.

In this post I will discuss the use of Microsoft Log Parser to analyze “hits” on your web server  Any website of different size or complexity comes to have these different types of problems with time:

1)    Change of URL
2)    Removing old pages
3)    Error pages

To some extend the tools mention above will show you these errors, but they might not be exactly what you seek in a real data analysis perspective. Let’s take for example Error pages, some of your pages crashes sending HTTP 500 Status Code, you might not be able to recover data using the normal Google Analytics Javascript depending of how you are treating these crashes.

One way to get access to these data is to analyze you web server logs (if they are active of course). So as not to get too detailed in the explanation find below some utility code that will help you troubleshoot issues in your application. (After installing Log Parser you will be able to run the below syntax from command line)

HTTP 200 OK from Google Bots
[SQL]
LogParser.exe “SELECT date, count(*) as hit INTO HTTP200.jpg FROM Path\to\Logs\*.log WHERE cs(User-Agent) LIKE ‘%%google%%’ AND sc-status = ‘200’ GROUP BY date ORDER BY date” -i:w3c -groupSize:800×600 -chartType:Area -categories:ON -legend:OFF -fileType:JPG -chartTitle:”HTTP 200 Hits”
[/SQL]

HTTP 301 Permantly Moved Google Bots
[SQL]
LogParser.exe “SELECT date, count(*) as hit INTO HTTP301.jpg FROM Path\to\Logs\*.log WHERE cs(User-Agent) LIKE ‘%%google%%’ AND sc-status = ‘301’ GROUP BY date ORDER BY date” -i:w3c -groupSize:800×600 -chartType:Area -categories:ON -legend:OFF -fileType:JPG -chartTitle:”HTTP 301 Hits”
[/SQL]

HTTP 4xx Not Found / Gone Google Bots
[SQL]
LogParser.exe “SELECT date, count(*) as hit INTO HTTP4xx.jpg FROM Path\to\Logs\*.log WHERE cs(User-Agent) LIKE ‘%%google%%’ AND sc-status >= 400 AND sc-status < 500 GROUP BY date ORDER BY date” -i:w3c -groupSize:800×600 -chartType:Area -categories:ON -legend:OFF -fileType:JPG -chartTitle:”HTTP 4xx Hits”
[/SQL]

These queries will produce nice graphs of how much HTTP 200,301,4xx hits you receive per day while the Google bot is crawling you site.

You can also easily find out the same thing for your users by changing the cs(User-Agent) LIKE ‘%%google%%’ to cs(User-Agent) NOT LIKE ‘%%bot%%’.

Of course these are approximated to a certain level, because not all bots add the keyword “bot” to use user-agent.

Hoping this can come in handy. If you have more queries to share, drop by and put a comment.
Further readings :

http://blogs.iis.net/carlosag/archive/2010/03/25/analyze-your-iis-log-files-favorite-log-parser-queries.aspx

http://logparserplus.com/