HTTP Logs Analysis using Microsoft Log Parser

Several tools are freely available on the web to analyze your website traffic, and they do a great job of it (Google Analytics, Google Webmaster Tools, Bing Webmaster Tools, …). They provide great value at no cost for tracking your traffic and troubleshooting potential issues on your website. Like any tool, however, they have limitations, and at some point you need alternative or complementary solutions.

In this post I will discuss the use of Microsoft Log Parser to analyze “hits” on your web server. Any website, whatever its size or complexity, runs into these kinds of problems over time:

1)    Change of URL
2)    Removing old pages
3)    Error pages

To some extent the tools mentioned above will show you these errors, but they might not give you exactly what you need from a data analysis perspective. Take error pages, for example: if some of your pages crash and send an HTTP 500 status code, you might not be able to capture that data with the normal Google Analytics JavaScript, depending on how you handle these crashes.

One way to get access to this data is to analyze your web server logs (if logging is enabled, of course). Rather than go into a detailed explanation, here are some utility queries that will help you troubleshoot issues in your application. (After installing Log Parser you will be able to run them from the command line.)

HTTP 200 OK from Google Bots
[SQL]
LogParser.exe "SELECT date, COUNT(*) AS hit INTO HTTP200.jpg FROM Path\to\Logs\*.log WHERE cs(User-Agent) LIKE '%%google%%' AND sc-status = 200 GROUP BY date ORDER BY date" -i:w3c -groupSize:800x600 -chartType:Area -categories:ON -legend:OFF -fileType:JPG -chartTitle:"HTTP 200 Hits"
[/SQL]

HTTP 301 Moved Permanently from Google Bots
[SQL]
LogParser.exe "SELECT date, COUNT(*) AS hit INTO HTTP301.jpg FROM Path\to\Logs\*.log WHERE cs(User-Agent) LIKE '%%google%%' AND sc-status = 301 GROUP BY date ORDER BY date" -i:w3c -groupSize:800x600 -chartType:Area -categories:ON -legend:OFF -fileType:JPG -chartTitle:"HTTP 301 Hits"
[/SQL]

HTTP 4xx (Not Found / Gone) from Google Bots
[SQL]
LogParser.exe "SELECT date, COUNT(*) AS hit INTO HTTP4xx.jpg FROM Path\to\Logs\*.log WHERE cs(User-Agent) LIKE '%%google%%' AND sc-status >= 400 AND sc-status < 500 GROUP BY date ORDER BY date" -i:w3c -groupSize:800x600 -chartType:Area -categories:ON -legend:OFF -fileType:JPG -chartTitle:"HTTP 4xx Hits"
[/SQL]

These queries produce nice graphs of how many HTTP 200, 301 and 4xx hits you receive per day while the Google bot is crawling your site.

You can easily get the same figures for your human visitors by changing cs(User-Agent) LIKE '%%google%%' to cs(User-Agent) NOT LIKE '%%bot%%'.

These figures are only approximate, of course, because not all bots include the keyword “bot” in their user-agent string.
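If you want to cross-check the numbers behind the charts, the same grouping can be sketched in a few lines of Python. This is only an illustration and not part of Log Parser: the function names and the sample log lines are made up, and it assumes a W3C extended log whose #Fields: line includes date, cs(User-Agent) and sc-status.

```python
from collections import Counter


def is_google(user_agent):
    """Rough equivalent of: cs(User-Agent) LIKE '%google%'."""
    return "google" in user_agent


def daily_hits(lines, agent_match=is_google, status_min=200, status_max=200):
    """Count hits per date for matching user agents and a status range."""
    fields = []
    hits = Counter()
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The #Fields directive names the columns of the rows that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        try:
            status = int(row.get("sc-status", ""))
        except ValueError:
            continue  # skip rows without a numeric status
        agent = row.get("cs(User-Agent)", "").lower()
        if agent_match(agent) and status_min <= status <= status_max:
            hits[row.get("date", "")] += 1
    return dict(sorted(hits.items()))


# Made-up sample data, for illustration only.
SAMPLE_LOG = """#Fields: date time cs-uri-stem cs(User-Agent) sc-status
2012-03-01 10:00:00 /index.html Mozilla/5.0+(compatible;+Googlebot/2.1) 200
2012-03-01 10:00:05 /old-page Mozilla/5.0+(compatible;+Googlebot/2.1) 404
2012-03-01 10:01:00 /index.html Mozilla/5.0+(Windows+NT+6.1) 200
2012-03-01 10:02:00 /contact Mozilla/5.0+(compatible;+Googlebot/2.1) 200
2012-03-02 09:00:00 /about Mozilla/5.0+(compatible;+Googlebot/2.1) 200
2012-03-02 09:00:09 /gone Mozilla/5.0+(compatible;+Googlebot/2.1) 410
""".splitlines()

print(daily_hits(SAMPLE_LOG))                                  # HTTP 200 from Google
print(daily_hits(SAMPLE_LOG, status_min=400, status_max=499))  # HTTP 4xx from Google
print(daily_hits(SAMPLE_LOG, agent_match=lambda ua: "bot" not in ua))  # visitors
```

The last call mirrors the NOT LIKE '%%bot%%' variant and has the same limitation: a bot that does not announce itself as one is counted as a visitor.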

I hope this comes in handy. If you have more queries to share, drop by and leave a comment.
Further reading:

http://blogs.iis.net/carlosag/archive/2010/03/25/analyze-your-iis-log-files-favorite-log-parser-queries.aspx

http://logparserplus.com/

Using the Windows hosts file

The Windows hosts file, located under “[SystemDriveLetter]:\Windows\System32\drivers\etc”, is very useful when you have to test web applications hosted locally or on a remote server and you do not wish to add them to your DNS.

Let’s take an example where you have a website named http://www.my-simple-web-application.com. You will most likely have 3-4 versions of the application: dev, preprod, test and live (where live would be http://www.my-simple-web-application.com).

To facilitate testing you could come up with a standard way of addressing these environments:

http://dev.my-simple-web-application.com
http://preprod.my-simple-web-application.com
http://test.my-simple-web-application.com

Each of these sub-domains might point to the same or different servers. This is where the hosts file comes in handy. You can configure something like:

127.0.0.1 dev.my-simple-web-application.com
127.0.0.1 preprod.my-simple-web-application.com
127.0.0.1 test.my-simple-web-application.com

In this example all the IP addresses are local; change them as needed. Be aware that this configuration must be placed on each desktop (development and test) from which you want to use these sub-domains.
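Because the same entries have to be repeated on several desktops, it can help to check what a given hosts file actually maps. The following Python sketch parses hosts-file text into a name-to-IP dictionary. The parse_hosts function and the 192.168.1.20 address are hypothetical, and the sketch simply keeps the first entry it sees for a name.

```python
def parse_hosts(text):
    """Return {hostname: ip} for every entry found in hosts-file text."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            # Keep the first entry seen for a given name.
            mapping.setdefault(name.lower(), ip)
    return mapping


# Sample content; the 192.168.1.20 address is made up for illustration.
HOSTS_SAMPLE = """
# test environments
127.0.0.1 dev.my-simple-web-application.com
127.0.0.1 preprod.my-simple-web-application.com   # same machine
192.168.1.20 test.my-simple-web-application.com
"""

entries = parse_hosts(HOSTS_SAMPLE)
for name, ip in entries.items():
    print(name, "->", ip)
```

To inspect a real machine, read the hosts file from the folder mentioned above and pass its content to parse_hosts.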

On another note, this configuration can also be achieved network-wide if you have a configurable router where you can add global hosts.

There are a number of other situations where the hosts file can be helpful:
1) You are migrating your website to a new server. In this case you can add your existing domain name to the hosts file and point it to the IP of the new server.
2) You have multiple web servers hosting the same application and one of them is not working properly. You can target the misbehaving server by changing your hosts file to point only to that server.

CodeIgniter – Extending CI_Model

Today I would like to discuss CodeIgniter (CI) models and how you can achieve code reuse by extending (“overriding”) the base CI_Model class.

First, let’s take a simple example where you have two models in your application: employee and employee_leave.

You would normally create two classes:
[php]
class Employee extends CI_Model {

    public function get_by_id($id)
    {
        // some repository operation
    }

}

class Employee_Leave extends CI_Model {

    public function get_by_employee_id($id)
    {
        // some repository operation
    }

}
[/php]

This is fine. In your controller, if you need to get an employee by id, you would do something like:
[php]
$this->employee->get_by_id(/* id of your employee*/);
[/php]

The same goes for the employee leave:
[php]
$this->employee_leave->get_by_employee_id(/* id of your employee*/);
[/php]

The implementation of get_by_id and get_by_employee_id would be a SELECT query against the underlying data source. Using the Active Record class, it would be:
[php]
$query = $this->db->get_where('employee', array('id' => $id));
[/php]
and
[php]
$query = $this->db->get_where('employee_leave', array('employee_id' => $id));
[/php]
A more elegant way to perform the same operations is to extend the base CI_Model and add generic functions.
[php]
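class MY_Model extends CI_Model {

    // holds the table name of the current instance
    public $tablename = '';

    // child classes pass their table name to this constructor
    public function __construct($tablename)
    {
        parent::__construct();
        $this->tablename = $tablename;
    }

    // the method bodies below are one possible implementation using Active Record

    public function get_all($limit = -1, $offset = 0, $orderby = '')
    {
        return $this->get_where(array(), $limit, $offset, $orderby);
    }

    public function get_total_count()
    {
        return $this->db->count_all($this->tablename);
    }

    public function get_total_count_where($where)
    {
        return $this->db->where($where)->count_all_results($this->tablename);
    }

    public function get_where($where = array(), $limit = 10, $offset = 0, $orderby = '')
    {
        if ($orderby !== '') { $this->db->order_by($orderby); }
        // a negative limit means "no limit"
        if ($limit >= 0) { $this->db->limit($limit, $offset); }
        return $this->db->get_where($this->tablename, $where)->result();
    }

}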
[/php]

By building this custom model with a constructor that takes the table name as a parameter, you simplify the child classes built on top of it. Save it as application/core/MY_Model.php so that CodeIgniter picks it up as an extension of the core model class.

You then need to modify your current models (employee and employee_leave) to make use of the new base model.

[php]
class Employee extends MY_Model {

    public function __construct()
    {
        parent::__construct('employee');
    }

}

class Employee_Leave extends MY_Model {

    public function __construct()
    {
        parent::__construct('employee_leave');
    }

}
[/php]

In your controller the code will now be:

[php]
$this->employee->get_where(/* where condition with id of your employee*/);
[/php]

The same goes for the employee leave:
[php]
$this->employee_leave->get_where(/* where condition with id of your employee*/);
[/php]

This is a very basic example, but it can be extended to build solid and more complex logic around your models so that you don’t repeat the same code for each repository operation.