HTTP Logs Analysis using Microsoft Log Parser

There are several tools freely available on the web to analyze your website traffic, and they do a great job of it (Google Analytics, Google Webmaster Tools, Bing Webmaster Tools, …). These tools provide great value, for free, to track your traffic and troubleshoot potential issues on your website. But like any tool they have their limitations, and the need for alternative or complementary solutions eventually arises.

In this post I will discuss the use of Microsoft Log Parser to analyze “hits” on your web server. Any website, whatever its size or complexity, comes to have these types of problems over time:

1)    Changed URLs
2)    Removed old pages
3)    Error pages

To some extent, the tools mentioned above will show you these errors, but they might not be exactly what you seek from a real data analysis perspective. Take error pages, for example: when some of your pages crash and send an HTTP 500 status code, you might not be able to recover that data using the normal Google Analytics JavaScript, depending on how you handle these crashes.

One way to get access to this data is to analyze your web server logs (if they are active, of course). Without getting too detailed in the explanation, below is some utility code that will help you troubleshoot issues in your application. (After installing Log Parser you will be able to run the syntax below from the command line.)

HTTP 200 OK from Google Bots
[SQL]
LogParser.exe "SELECT date, count(*) as hit INTO HTTP200.jpg FROM Path\to\Logs\*.log WHERE cs(User-Agent) LIKE '%%google%%' AND sc-status = 200 GROUP BY date ORDER BY date" -i:w3c -groupSize:800x600 -chartType:Area -categories:ON -legend:OFF -fileType:JPG -chartTitle:"HTTP 200 Hits"
[/SQL]

HTTP 301 Moved Permanently from Google Bots
[SQL]
LogParser.exe "SELECT date, count(*) as hit INTO HTTP301.jpg FROM Path\to\Logs\*.log WHERE cs(User-Agent) LIKE '%%google%%' AND sc-status = 301 GROUP BY date ORDER BY date" -i:w3c -groupSize:800x600 -chartType:Area -categories:ON -legend:OFF -fileType:JPG -chartTitle:"HTTP 301 Hits"
[/SQL]

HTTP 4xx Not Found / Gone from Google Bots
[SQL]
LogParser.exe "SELECT date, count(*) as hit INTO HTTP4xx.jpg FROM Path\to\Logs\*.log WHERE cs(User-Agent) LIKE '%%google%%' AND sc-status >= 400 AND sc-status < 500 GROUP BY date ORDER BY date" -i:w3c -groupSize:800x600 -chartType:Area -categories:ON -legend:OFF -fileType:JPG -chartTitle:"HTTP 4xx Hits"
[/SQL]
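
The same pattern extends to the server errors mentioned at the start of this post; here is a sketch along the same lines (HTTP5xx.jpg is just an example output name):

HTTP 5xx Server Error from Google Bots
[SQL]
LogParser.exe "SELECT date, count(*) as hit INTO HTTP5xx.jpg FROM Path\to\Logs\*.log WHERE cs(User-Agent) LIKE '%%google%%' AND sc-status >= 500 GROUP BY date ORDER BY date" -i:w3c -groupSize:800x600 -chartType:Area -categories:ON -legend:OFF -fileType:JPG -chartTitle:"HTTP 5xx Hits"
[/SQL]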

These queries will produce nice graphs of how many HTTP 200, 301, 4xx and 5xx hits you receive per day while the Google bot is crawling your site.

You can also easily find out the same thing for your human users by changing the cs(User-Agent) LIKE '%%google%%' condition to cs(User-Agent) NOT LIKE '%%bot%%'.
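
For example, the HTTP 200 query above would become the following (a sketch — HTTP200Users.jpg is just an example output name):

[SQL]
LogParser.exe "SELECT date, count(*) as hit INTO HTTP200Users.jpg FROM Path\to\Logs\*.log WHERE cs(User-Agent) NOT LIKE '%%bot%%' AND sc-status = 200 GROUP BY date ORDER BY date" -i:w3c -groupSize:800x600 -chartType:Area -categories:ON -legend:OFF -fileType:JPG -chartTitle:"HTTP 200 User Hits"
[/SQL]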

Of course these figures are approximations to a certain level, because not all bots add the keyword “bot” to their user-agent.

Hoping this can come in handy. If you have more queries to share, drop by and leave a comment.
Further reading:

http://blogs.iis.net/carlosag/archive/2010/03/25/analyze-your-iis-log-files-favorite-log-parser-queries.aspx

http://logparserplus.com/

CodeIgniter – Pagination SEO Issue

I have recently been working with a PHP MVC framework called CodeIgniter on a complete web application solution. I have tried some major frameworks like CakePHP, Zend and Symfony, which are all very powerful frameworks for MVC and RAD development; the only thing they lacked was a bit more of the flexibility that CodeIgniter offers. I may not have taken enough time to get to know all of the specifics of the other frameworks, but while benchmarking I got acquainted with CodeIgniter much faster.

Even though CodeIgniter is a very flexible framework, it is very lightweight, and some features for web applications have not been taken into account. With that in mind, the people behind EllisLab, Inc. made sure that these small gaps could easily be worked around by allowing complete customization of their libraries.

Here is my original issue:

I have an item listing page with pagination activated, and I wanted the first page to be the root URL of the item page,
e.g. http://www.mysite.com/items

But what the CodeIgniter Pagination library generated for the first page was: http://www.mysite.com/items/1

That is pretty inconvenient for SEO, because crawlers will find two URLs with the same content while crawling the site.

Thus I modified the CI_Pagination library and created MY_Pagination.

First of all, I added a new class variable called first_page_url to the MY_Pagination class:

[php]
class MY_Pagination extends CI_Pagination {

    var $first_page_url = ''; // The first page will have this URL
[/php]

I changed the original Pagination library's rendering of the “First” link from

[php]
// Render the "First" link
if ($this->cur_page > ($this->num_links + 1))
{
    $output .= $this->first_tag_open.'<a href="'.$this->base_url.'">'.$this->first_link.'</a>'.$this->first_tag_close;
}
[/php]

to

[php]
// Render the "First" link, using first_page_url when it is set.
// Note the parentheses around the ternary: without them, PHP's
// operator precedence would concatenate before comparing.
if ($this->cur_page > ($this->num_links + 1))
{
    $output .= $this->first_tag_open.'<a href="'.($this->first_page_url == '' ? $this->base_url : $this->first_page_url).'">'.$this->first_link.'</a>'.$this->first_tag_close;
}
[/php]

This way, if the configuration setting first_page_url is passed when the Pagination class is initialized, it will be used instead of base_url.
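
For illustration, here is a minimal sketch of how the setting could be passed from a controller; the URLs and row counts are hypothetical, taken from the example above:

[php]
// Hypothetical controller code, assuming MY_Pagination is in place
$this->load->library('pagination');

$config['base_url']       = 'http://www.mysite.com/items/';
$config['first_page_url'] = 'http://www.mysite.com/items'; // page 1 maps to the root listing URL
$config['total_rows']     = 200; // example values
$config['per_page']       = 20;

$this->pagination->initialize($config);
echo $this->pagination->create_links();
[/php]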

Some modifications were also made to the digit link generation, from

[php]
// Write the digit links
for ($loop = $start - 1; $loop <= $end; $loop++)
{
    $i = ($loop * $this->per_page) - $this->per_page;

    if ($i >= 0)
    {
        if ($this->cur_page == $loop)
        {
            $output .= $this->cur_tag_open.$loop.$this->cur_tag_close; // Current page
        }
        else
        {
            $n = ($i == 0) ? '' : $i;
            $output .= $this->num_tag_open.'<a href="'.$this->base_url.$n.'">'.$loop.'</a>'.$this->num_tag_close;
        }
    }
}
[/php]

to

[php]
// Write the digit links
for ($loop = $start - 1; $loop <= $end; $loop++)
{
    $i = ($loop * $this->per_page) - $this->per_page;

    if ($i >= 0)
    {
        if ($this->cur_page == $loop)
        {
            $output .= $this->cur_tag_open.$loop.$this->cur_tag_close; // Current page
        }
        else if ($loop == 1 && $this->first_page_url != '')
        {
            // Page 1 links to first_page_url instead of base_url
            $output .= $this->num_tag_open.'<a href="'.$this->first_page_url.'">'.$loop.'</a>'.$this->num_tag_close;
        }
        else
        {
            $n = ($i == 0) ? '' : $i;
            $output .= $this->num_tag_open.'<a href="'.$this->base_url.$n.'">'.$loop.'</a>'.$this->num_tag_close;
        }
    }
}
[/php]

which makes sure that the link for page number 1 uses first_page_url as its href whenever first_page_url is available.
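
With this modification in place, and assuming the hypothetical configuration from the sketch above (per_page of 20), the generated digit links would look something like this — markup simplified, the actual tags depend on your num_tag_open/num_tag_close settings:

<a href="http://www.mysite.com/items">1</a>
<a href="http://www.mysite.com/items/20">2</a>
<a href="http://www.mysite.com/items/40">3</a>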

The complete file can be found here: MY_Pagination

SEO: Bounce rate of a website

Why is my bounce rate so high?

Definition: a bounce occurs when a person leaves your website after reaching your entry page. The cases below can equally be considered bounces from your website.

1) Visitor enters your site and presses back immediately (before or even after the page has loaded).

2) Visitor waits for the page to load, stays on the page for some time, and then presses back or navigates to another site. (In this case the visitor might have found the information and then chosen to navigate elsewhere for supplementary information. Or he/she might not have found it but read a few pieces to see what was there. A third case is that the person simply did not like the website, its content or its colors, and went away.)

Therefore there is a considerable number of aspects to take into consideration before you can ask a more precise version of “Why is my bounce rate so high?”. There is no straightforward answer to this question, but there are many questions that can lead to possible solutions.

When you wonder about your bounce rate, here are the different questions that might come to mind.

User Interface
Is my layout/presentation/design attractive to visitors?
Do my pages load slowly?
Do my pages have appropriate ads? Are these ads non-aggressive towards the user?
Are my pages browser-friendly? (Do they display the same way at any resolution, in any browser?)

Content

Does

IIS: Redirection from non-www to www domain

The problem today is that we have a great ASP.NET website, but search engines are indexing http://greataspnetwebsite.com instead of http://www.greataspnetwebsite.com. This is commonly seen on the web, and there are several ways to achieve a good result when redirecting the non-www to the www domain. This redirection should be a 301 Moved Permanently, otherwise you might lose your search-engine-indexed pages or end up with duplicate content across your non-www and www domains. Here are easy steps to achieve a quick and clean permanent redirection using IIS.

Consider the case where we already have a website in IIS called: greataspnetwebsite.com

  • Go to IIS Manager
  • Create a new website that points to the same directory as your existing one
  • Select the newly created website and open its properties box
  • Change the option button “When connecting to this resource, the content should come from” to “A redirection to a URL”
  • Specify the URL http://www.greataspnetwebsite.com
  • Select the check box that says “A permanent redirection for this resource.”
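
If everything is set up correctly, a request for a page on http://greataspnetwebsite.com should now receive a response along these lines (the exact headers vary with the IIS version):

HTTP/1.1 301 Moved Permanently
Location: http://www.greataspnetwebsite.com/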

Checklist for Search Engine Optimization (SEO)

From Webopedia:

Short for search engine optimization, the process of increasing the amount of visitors to a Web site by ranking high in the search results of a search engine. The higher a Web site ranks in the results of a search, the greater the chance that that site will be visited by a user. It is common practice for Internet users to not click through pages and pages of search results, so where a site ranks in a search is essential for directing more traffic toward the site.

There must be more than a thousand SEO checklists on the web. I have been involved in some SEO work lately and would like to share the experience. This is a quick list that will help you climb the search engine rankings; stick to it and you will most likely end up on the first page of Google.

I'll take the example of a website that produces and sells boat models.

  • Make sure that the pages are well planned and consistent
    • Make use of well-structured sentences that reflect what a user might type in to get to your website
    • Do not spam your pages with keywords
    • Do not put too much content on a page, because users won't read it all
    • If you have a page displaying, e.g., a boat model “The Bounty”, make sure that you mention “The Bounty” as much as you can on the page: in the heading of the page, in the link to the historical background of the boat, in the description of the model itself. This will help you establish that this page is related to “The Bounty”
  • Get backlinked whenever possible
    • Each time you are backlinked, that is, another website is pointing to yours, you gain some influence with the search engines. You should make sure that you “buzz” your website as much as possible. Try forums and so on.
    • Make sure that the websites you are backlinked from are related to your business background. E.g. you won't post your link on a computer programming forum; that does not make any sense. Taking the example of the boat models website, what could make sense are souvenir websites or even collectors' websites to get backlinked from.
  • Make extensive use of the keywords, description and abstract meta tags (see the example after this list)
    • These are very important; make sure that you include them on every page that you want the search engines to reference
    • Also make sure that they reflect the content of your page. Do not use keywords that are not even found in your page; that makes no sense, and the search engine bot might just ignore them. E.g. if you have “boat, model, miniature, ship, bounty” in your keywords meta tag, make sure that these keywords are mentioned in your page content.
  • Make use of URL rewriting (see the sketch after this list)
    • Using URL rewriting will be very beneficial to your website and will boost its referencing. Take the example of the boat models business: a descriptive page for a boat model should have a URL like www.boatmodel.com/The_Bounty_Model.html instead of something like www.boatmodel.com/Product.php?p=10223
    • These URLs will help your website gain in rank. As with meta tags, the keywords in those user-friendly URLs should also be present in your page content.
  • Last but not least, make your website comfortable and easy to use.
    • This is only a short note on website usability. Your website should be as simple as possible, depending on who your audience is. Keep in mind that not everyone is as expert as your developer: normal people don't know that they must double-click on some menu for something to happen, or that they should drag and drop some element in their browser for something else to happen. So place emphasis on the usability of your website first, to make sure that when users come once, they will always want to come back some other time, because the website is friendly and makes them want to return.
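
To illustrate the meta tags point, here is a sketch of what the tags for the hypothetical “The Bounty” page could look like (the content values are made up for this example):

<meta name="keywords" content="boat, model, miniature, ship, bounty" />
<meta name="description" content="A handcrafted miniature model of the ship The Bounty, with historical background." />

And for the URL rewriting point, assuming an Apache server with mod_rewrite enabled and the hypothetical URLs from the example above, an .htaccess rule could map the friendly URL to the real script:

RewriteEngine On
# Serve the friendly URL from the existing product script
RewriteRule ^The_Bounty_Model\.html$ /Product.php?p=10223 [L]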

This is a little of what I have learned about SEO. Hope it can be of help.