CGI and Perl

Parsing Netscape History Files

We're going to close this chapter with a few tips for handling the various database files Netscape uses to store its global history and other important records.

Beginning with version 2.0, Netscape uses the Berkeley DB package for the databases it consults at runtime for various lookups. To run the sample code that follows, you'll need to build and install the DB library and its associated include files, which you can get from the CPAN in the misc directory. You'll also need the DB_File module, which ships as a core Perl module. Because DB_File depends on the DB library, install the library and include files first, and then either remake Perl or build the module outside of the Perl distribution.
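Once the DB library and DB_File are installed, one quick way to confirm the setup is to tie a hash to a scratch database and read it back. The following is only a minimal sketch and is not taken from ggh; the open_db helper, the temporary file name, and the example.com URL are made up for illustration.

```perl
#!/usr/bin/perl
# Sketch: tie a Perl hash to a Berkeley DB hash file through DB_File.
# open_db() and the demo data are hypothetical, for illustration only.
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT, O_RDONLY
use DB_File;                    # exports $DB_HASH
use File::Temp qw(tempdir);

# Open (or create) a DB hash file and return a reference to the tied hash.
sub open_db {
    my ($path, $flags) = @_;
    my %db;
    tie(%db, 'DB_File', $path, $flags, 0644, $DB_HASH)
        or die "Cannot open $path: $!\n";
    return \%db;
}

# Work in a scratch directory that is removed when the script exits.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.db";

# Write one record, then close the database.
my $db = open_db($file, O_RDWR | O_CREAT);
$db->{'http://www.example.com/'} = 'visited';
untie %$db;

# Reopen read-only and list what is stored.
$db = open_db($file, O_RDONLY);
print "$_ => $db->{$_}\n" for keys %$db;
untie %$db;
```

If the tie fails with an error about a missing DB_File, the module was most likely not built against the DB library when Perl was installed.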

Tom Christiansen, that old wizard, took the time to figure out how the Netscape global history file was put together and wrote up a nice little tool to operate on it; the tool is called ggh, for Grok Global History. You can get the ggh tool from the CPAN in Tom's authors directory:

~/authors/Tom_Christiansen/scripts/nshist.gz

Let's take a look at how it works. Tom's stuff is usually an exercise in proper Perl coding style.

ggh has several command-line options. They let you grep the history file for URLs of interest, using Perl regular expressions, and convert time formats.

If you can't quite remember the location of a link but do remember the basename of the site, you can use ggh to search your entire history file for anything that matches the basename. For instance, suppose that I wanted to find all the sites related to Perl in my global history. I'd just invoke ggh with a Perl regexp:

% ggh Perl

This gives me the following output from my history file at work:

Sat Sep 14 14:16:11 1996 http://moulon.inra.fr:80/oracle/www_oraPerl_eng.html
Sat Sep 14 14:19:29 1996 http://cs.indiana.edu/Perl-server/intro.html
Sat Sep 14 14:19:30 1996 http://www.cs.indiana.edu/Perl-server/intro.html
Sat Sep 14 14:19:31 1996 http://www.cs.indiana.edu/picons/db/news/comp/lang/Perl/unknown/face.xbm
Wed Aug 28 18:11:00 1996 http://ducks.corp.adobe.com/Perl/authors/
Wed Sep 18 00:55:54 1996 http://www.Perl.com/CPAN/src/latest.tar.gz
Wed Sep 18 00:55:59 1996 ftp://ftp.digital.com/pub/plan/Perl/CPAN/src/latest.tar.gz
Wed Sep 18 00:56:11 1996 http://www.Perl.com
Wed Sep 18 00:56:50 1996 http://www.ora.com/catalog/covers/pPerl2.t.gif
Wed Sep 18 00:58:19 1996 http://www.ee.pdx.edu/~rseymour/Perl/
Wed Sep 18 00:58:26 1996 http://www.eecs.nwu.edu/Perl/Perl.html
Wed Sep 18 00:59:17 1996 http://www.middlebury.edu/~otisg/images/button.Perl.gif
Wed Sep 18 00:59:33 1996 http://www.cis.ohio-state.edu/htbin/info/info/Perl.info
Wed Sep 18 01:00:08 1996 http://www.ics.uci.edu/pub/websoft/libwww-Perl/
Wed Sep 18 01:00:29 1996 http://www.wg.omron.co.jp/~jfriedl/Perl/index.html
Wed Sep 18 01:00:45 1996 http://www.hut.fi/~jhi/Perl5-porters.html
Wed Sep 18 01:01:17 1996 http://homepage.seas.upenn.edu/~mengwong/Perlhtml.html
Wed Sep 18 01:01:45 1996 http://www.khoros.unm.edu/staff/neilb/Perl/www.html
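
The kind of filtering that produces a listing like this can be sketched in a few lines of Perl. This is not Tom's code; it is a rough illustration that works on an in-memory hash rather than the real history file. The grep_history function, the sample URLs, and the record layout (URL as key, visit time packed as a 32-bit little-endian integer) are all assumptions made for the example.

```perl
#!/usr/bin/perl
# Sketch: regex search over URL => packed-timestamp records.
# grep_history() and the record layout are assumptions, not ggh's own code.
use strict;
use warnings;

# Return "date url" lines for every URL matching $pattern, oldest first.
sub grep_history {
    my ($history, $pattern) = @_;
    my @hits;
    while (my ($url, $packed) = each %$history) {
        next unless $url =~ /$pattern/;       # case-sensitive regex match
        my $when = unpack('V', $packed);      # assumed 32-bit little-endian time
        push @hits, [$when, $url];
    }
    # Sort by visit time, then by URL, and render the time in readable form.
    return map  { my ($t, $u) = @$_; scalar(localtime($t)) . " $u" }
           sort { $a->[0] <=> $b->[0] or $a->[1] cmp $b->[1] } @hits;
}

# Hypothetical stand-in for the tied history database.
my %demo = (
    'http://www.example.com/Perl/'  => pack('V', 842_000_000),
    'http://www.example.com/other/' => pack('V', 842_000_100),
);

print "$_\n" for grep_history(\%demo, 'Perl');
```

Because the pattern is an ordinary Perl regular expression, a caller could pass something like '(?i)perl' to ignore case. In a real tool, the hash would be tied to the history file with DB_File instead of being built by hand.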

Once you've used Tom's ggh script for a while, you can modify it, for instance, to use the CGI libraries and to automate the process of keeping it up to date with working URLs. As with many of Tom's scripts, it's completely free, and you can hack at will. Just don't redistribute it without making a note of your changes.

Note:

A bytecode compiler for Perl is currently in development and is targeted for release with the 5.005 version of Perl.