Most available HTTP servers provide an access log by default, along with some sort of error log. Each of these logs has its own record format, but they share a number of common fields, which lends itself naturally to an object-oriented model for parsing them and producing reports.
We'll be looking at the Logfile module, written by Ulrich Pfeifer, in this section. It lets you subclass the base record object and comes with subclass modules for a number of servers' log files, including NCSA httpd, Apache httpd, CERN httpd, WUFTP, and others. If there isn't a subclass for your particular server, it's pretty easy to write one.
General Issues
An HTTP server implements its logging according to configuration settings, usually within the httpd.conf file. The data you have to analyze depends on which log files you enable in the configuration file or, in the case of the Apache server, at compile time in the server's source. Several logs can be enabled in the configuration, including the access log, error log, referer log, and agent log. Each of these has information that you may need to summarize or analyze.
Logging Connections
There are some security and privacy issues related to logging too much information. Be sure to keep the appropriate permissions on your logfiles to prevent arbitrary snooping or parsing, and truncate them when you've completed the data gathering. See Chapter 3 for more details.
In general, the httpd log file is a text file whose records are lines, each ended by the line terminator appropriate to the architecture on which the server is running. Each record consists of fields, usually separated by whitespace, that hold dates, file paths, hostnames or IP addresses, and other items. Ordinarily, there is one line or record per connection, but some types of transactions generate multiple lines in the log file(s). Take this into account when designing the algorithm and code that parses the log.
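To make the record layout concrete, here is a minimal sketch that splits one access-log line into named fields using plain Perl. It assumes the server writes the Common Log Format (host, ident, user, bracketed date, quoted request, status, bytes); the subroutine name parse_clf_line and the sample record are made up for illustration, and this is not how the Logfile module itself does its parsing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one access-log record, assuming the Common Log Format:
#   host ident authuser [date] "request" status bytes
# Returns a hash reference of named fields, or undef if the line does not match.
# (parse_clf_line is a hypothetical helper, not part of any module.)
sub parse_clf_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # tolerate both Unix and DOS line terminators
    my ($host, $ident, $user, $date, $request, $status, $bytes) =
        $line =~ m/^(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)\s*$/
        or return;
    return {
        host    => $host,
        ident   => $ident,
        user    => $user,
        date    => $date,
        request => $request,
        status  => $status,
        bytes   => ($bytes eq '-' ? 0 : $bytes),   # "-" means no body was sent
    };
}

# A made-up record, for illustration only.
my $sample = 'host.example.com - - [10/Oct/1996:13:55:36 -0700] '
           . '"GET /mall/os HTTP/1.0" 200 2326';
my $rec = parse_clf_line($sample);
print "$rec->{host} requested '$rec->{request}' ($rec->{bytes} bytes)\n" if $rec;
```

Lines that do not match the pattern, such as continuation lines from multi-line transactions, come back as undef, so the caller can decide whether to skip or report them.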
The access log gives you general information about which sites are connecting to your server and which files are being retrieved. The error log records the output sent to the STDERR filehandle from all connections. Both of these, and especially the error log, may need to be parsed every now and then to see what's happening with your server's connections.
Parsing
With the Logfile module, each discrete transaction record is parsed and then abstracted into a Perl object, based on some parameter of the request. The instance variables that the new() method creates during parsing depend on which type of log is being parsed and which fields (Hostname, Date, Path, and so on) from the log file you're interested in summarizing. When parsing is complete, the return value, a blessed reference to the Logfile class, holds a hash whose key/value pairs are the parameters on which you want to gather statistics and the number of times each one was counted. In the simplest case, you write these lines:
    use Logfile::Apache;  # to parse the popular Apache server log

    $l = new Logfile::Apache File  => '/usr/local/etc/httpd/logs/access_log',
                             Group => [qw(Host Domain File)];
This parses your access log and returns the blessed reference.
Reporting and Summaries
After you've invoked the new() method for the Logfile class and passed in the log file to be parsed, you can invoke the report() method on the returned object:
$l->report(Group => File, Sort => Records, Top => 10);
The preceding line produces a report detailing the access counts of the ten most frequently retrieved files in your archive, along with each file's percentage of the total number of retrievals. For the sample Apache access log file included with the Logfile distribution, the results from the report() method look like this:
    File                    Records
    =======================================
    /mall/os                5  35.71%
    /mall/web               3  21.43%
    /~watkins               3  21.43%
    /cgi-bin/mall           1   7.14%
    /graphics/bos-area-map  1   7.14%
    /~rsalz                 1   7.14%
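The counts and percentages in a report like this are simple to reason about. The following sketch tallies a list of requested paths in plain Perl, keeps the top entries, and prints each one's share of the total. It is only an illustration of the arithmetic, not the Logfile module's implementation; the top_counts subroutine, the alphabetical tie-breaking, and the input list (chosen to mirror the sample report above) are all assumptions made for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how often each value occurs and return the top $top entries as
# [value, count, percent-of-total] triples, most frequent first.
# (top_counts is a hypothetical helper, not a Logfile method.)
sub top_counts {
    my ($values, $top) = @_;
    my %count;
    $count{$_}++ for @$values;
    my $total = @$values;
    return () unless $total;

    # Sort by descending count; break ties alphabetically for stable output.
    my @keys = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    splice(@keys, $top) if defined $top && @keys > $top;
    return map { [ $_, $count{$_}, 100 * $count{$_} / $total ] } @keys;
}

# Requested paths, chosen to mirror the sample report.
my @files = (
    ('/mall/os') x 5, ('/mall/web') x 3, ('/~watkins') x 3,
    '/cgi-bin/mall', '/graphics/bos-area-map', '/~rsalz',
);

printf "%-24s %s\n", 'File', 'Records';
print '=' x 39, "\n";
printf "%-24s %d %6.2f%%\n", @$_ for top_counts(\@files, 10);
```

With 14 records in total, five hits on /mall/os work out to 5/14, or 35.71%, which agrees with the first row of the report.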
You can generate many other reports with the Logfile module, including multiple-variable reports, to suit your needs and interests. See the Logfile documentation, embedded as POD in Logfile.pm, for additional information. You can get the Logfile module from CPAN, in Ulrich Pfeifer's author directory:
~/authors/id/ULPFR/
The latest release, as of the writing of this chapter, was 0.113. Have a look, and don't forget to give feedback to the author when you can.
Generating Graphical Data
After you've gotten your reports back from Logfile, you've pretty much exhausted the functionality of the module. To produce an image that illustrates the data, you'll need to resort to other means. Because the report gives essentially two-dimensional data, it's easy to produce a representative image using the GD module, which was introduced in Chapter 12, "Multimedia."
This example provides a module that uses the GD class and supplies one method, to which you pass a Logfile object along with other parameters that specify which field from the log file you wish to graph, the resulting image size, and the font. This method would actually be better placed in the Logfile::Base class, because that's where each of the Logfile subclasses, including the one for Apache log files, derives its base methods. It will be submitted to the author of the Logfile module after some additional testing.
For now, just drop the GD_Logfile.pm file (from Listing 14.4) into the Logfile directory in your @INC. You'll also need to have the GD extension and the Logfile module installed, of course. The GD_Logfile module uses the GD package to produce a GIF image of the graph corresponding to the data from the report() method of the Logfile class. The entire module, including the graph() subroutine, is shown in Listing 14.4.