CGI and Perl

Hit Counter

Another common use of CGI scripts is the hit counter, which reports how many times your page has been accessed. The counter gets its numbers from the Web server's logs, and Web servers can be configured to perform various levels of logging. Although logging can slow down the server somewhat, it also provides valuable information. Many people want to know how popular their Web sites are, and a hit counter lets a Webmaster show new visitors exactly how popular a site is.

Introduction

Hit counters come in many forms. You can have a plain ASCII text counter, or you can get creative and make a graphical counter. A common approach is to imitate the odometer on a car. I will show you how to obtain the number in this example and give you one example of how to make the display graphical.

The number of accesses is not the only statistic you can provide. You can also find out how many times your page was reached from a link on another page and which types of browsers are accessing your page.

Setting Up the Web Server to Log Access

This section describes how to set up the NCSA httpd Web server for logging access to your Web site. This mechanism also applies to the Apache Web server and a few others that are based on httpd. Windows- and Macintosh-based servers usually provide a GUI front-end to these server options.

The NCSA httpd server has a configuration file called httpd.conf. This is an ASCII text file that is used to configure the server options. Within this configuration file, four directives define where certain logs are kept. ErrorLog defines where the Web server writes its error output (STDERR); TransferLog defines where the page accesses are logged; AgentLog defines where the client (browser) information is logged; and RefererLog defines where the referring pages are logged. You don't need to worry about ErrorLog in this example, although it is a very useful file to be aware of. This example focuses on TransferLog, AgentLog, and RefererLog.
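As a minimal sketch of how these four settings fit together, the following Perl fragment reads a sample configuration and resolves each log path against ServerRoot. The sample values (a ServerRoot of /usr/etc/httpd and the log file names) and the subroutine name logPaths are assumptions made for illustration; check your own httpd.conf for the real settings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample httpd.conf lines. These values are assumptions for illustration.
my $sampleConf = <<'END_CONF';
ServerRoot /usr/etc/httpd
# Log file locations, relative to ServerRoot
ErrorLog logs/error_log
TransferLog logs/access_log
AgentLog logs/agent_log
RefererLog logs/referer_log
END_CONF

# Hypothetical helper: return a hash of log directive => full path.
sub logPaths {
    my($confText) = @_;
    my($root) = "";
    my(%logs);
    foreach my $line (split(/\n/, $confText)) {
        next if $line =~ /^\s*(#|$)/;      # skip comments and blank lines
        if ($line =~ /^\s*ServerRoot\s+(\S+)/) {
            $root = $1;
        } elsif ($line =~ /^\s*(ErrorLog|TransferLog|AgentLog|RefererLog)\s+(\S+)/) {
            $logs{$1} = $2;
        }
    }
    # Paths that do not start with "/" are taken relative to ServerRoot.
    foreach my $name (keys %logs) {
        $logs{$name} = "$root/$logs{$name}" unless $logs{$name} =~ m{^/};
    }
    return %logs;
}

my %logs = logPaths($sampleConf);
foreach my $name (sort keys %logs) {
    print "$name => $logs{$name}\n";
}
```

With the sample values shown, TransferLog resolves to /usr/etc/httpd/logs/access_log.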

Parsing the Access Log

To determine how many times a certain page has been visited, you need to scan the TransferLog. First, find out where your log file is kept. Let's assume that the Web server is installed in /usr/etc/httpd and that you have set TransferLog to logs/access_log. The file that you need to look at is /usr/etc/httpd/logs/access_log. Each line in this file pertains to one hit on a single object in your Web site. By using a Perl regular expression, you can search for the page in question and return the number of times that page has been found in the access log. The following line is an example of a record from the TransferLog.

www-proxy - - [06/Dec/1995:13:40:52 -0800] "GET /index.html HTTP/1.0" 200 638
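To make the layout of that record concrete, here is a rough sketch that splits it into named fields. The field names and the subroutine name parseLogLine are my own labels, not something the server defines, and the pattern assumes only the layout of the sample line above.

```perl
use strict;
use warnings;

# Hypothetical helper: split one access log record into named fields.
# Returns an empty list if the line does not match the expected layout.
sub parseLogLine {
    my($line) = @_;
    if ($line =~ /^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)/) {
        my(%rec) = (host => $1, ident => $2, user => $3, time => $4,
                    request => $5, status => $6, bytes => $7);
        # The request itself is "METHOD path protocol".
        @rec{'method', 'path', 'protocol'} = split(' ', $rec{request});
        return %rec;
    }
    return ();
}

my $sample = 'www-proxy - - [06/Dec/1995:13:40:52 -0800] ' .
             '"GET /index.html HTTP/1.0" 200 638';
my %rec = parseLogLine($sample);
print "$rec{host} requested $rec{path}, status $rec{status}, $rec{bytes} bytes\n";
```

For a hit counter, the fields that matter are the method and the path: each record with a GET for your page counts as one hit.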

The Perl subroutine in Listing 7.5 opens the access log and uses a regular expression to count the records that request a given page. The page to search for is passed in as an argument to the subroutine.

Listing 7.5. Perl subroutine to count the number of hits on a given page.

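sub pageCount {
    my($page) = @_;
    # Prepend the GET method to limit the search scope.
    my($srchStr) = "GET $page";
    open(my $in, "<", "/usr/etc/httpd/logs/access_log") ||
       die "Cannot open access log! $!\n";
    # \Q...\E escapes regex metacharacters such as the "." in index.html;
    # the trailing space stops /index.html from also matching /index.html.old.
    my $count = grep(/\Q$srchStr\E /, <$in>);
    close($in);
    return($count);
}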

This code can be included in your CGI script to display the number of hits on a given page. It can also be used outside of the Web site to provide statistics. $page is defined as a document path, relative to the document root of your server.
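As one way of wiring the count into a page, the sketch below prints a minimal CGI response containing it. To stay self-contained, it uses its own counting helper, countHits, which takes the log path as a parameter. That name, the paths, and the HTML are all illustrative assumptions, not part of the listing above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical variant of the counting routine that takes the log path
# as an argument. Returns undef if the log cannot be opened.
sub countHits {
    my($logFile, $page) = @_;
    open(my $in, '<', $logFile) or return;
    # In scalar context, grep returns the number of matching lines.
    my $count = grep { /GET \Q$page\E / } <$in>;
    close($in);
    return $count;
}

# Assumed locations; adjust both for your own server.
my $logFile = "/usr/etc/httpd/logs/access_log";
my $page    = "/index.html";

my $hits = countHits($logFile, $page);

# A CGI script must send a Content-type header followed by a blank line.
print "Content-type: text/html\n\n";
print "<HTML><BODY>\n";
if (defined $hits) {
    print "<P>This page has been accessed $hits times.</P>\n";
} else {
    print "<P>The hit count is not available.</P>\n";
}
print "</BODY></HTML>\n";
```

Returning undef instead of calling die lets the page still render when the log is missing or unreadable, which is usually kinder to visitors than a server error.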

This routine can also be used in conjunction with some images to display a graphical hit counter. Suppose, for example, you have an image for each digit. You could take the resulting number from this function, treat it as a string, and use each digit value to locate the image associated with the digit, as shown in Listing 7.6.