CGI and Perl

GlimpseHTTP

Now you are probably asking how all of this talk about Glimpse relates to Perl. GlimpseHTTP is a collection of Perl scripts that puts the power of Glimpse to work from within Perl. GlimpseHTTP formats search results as HTML based on a template page (ghtemplate.html), which you can modify to customize the output.

GlimpseHTTP was written by Michael Smith, Udi Manber, and Paul Klark. As of this writing, the most current version is 2.0 and is available from: ftp://ftp.cs.arizona.edu/glimpse/glimpseHTTP.2.0.src.tar.Z

Installation of GlimpseHTTP is very straightforward. A step-by-step installation guide can be found at: http://glimpse.cs.arizona.edu/ghttp/install.html

After GlimpseHTTP is installed, the first thing you need to do is make an "archive" using the included makegharc command. Like Glimpse, GlimpseHTTP requires a few additional files to be created to function properly. The makegharc program creates some configuration files, along with the ghindex.html files which contain the search forms. When makegharc is run, it will prompt you for the location of the archive. As we discussed earlier, the location needs to be at the root of the public_html tree on your server, and should not contain images or any other files you do not intend to have publicly available.


Figure 10.1. GlimpseHTTP in action.

To search using GlimpseHTTP, view the ghindex.html file that has been generated in each directory. The ghindex.html page has a search form which you can use to search that particular subdirectory of the archive.

GlimpseHTTP allows you to integrate search with browsing. If you have several nested directories which the user may browse, you can include the Glimpse interface in each document such that only the relevant directories will be included in the search. More details are given below.
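To make the idea of a directory-scoped search concrete, here is a minimal sketch in Perl. It is not taken from the GlimpseHTTP scripts; the files_in_scope subroutine and the archive listing are hypothetical, and the sketch assumes that file names are stored relative to the archive root. It only shows how a search started from one directory can be restricted to the files beneath it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of GlimpseHTTP): keep only the files that
# live under $subdir. Both $subdir and the file names are assumed to be
# relative to the archive root.
sub files_in_scope {
    my ($subdir, @files) = @_;
    $subdir =~ s{/+$}{};                                 # drop trailing slashes
    return @files if $subdir eq '' || $subdir eq '.';    # root: everything is in scope
    my $prefix = "$subdir/";                             # trailing slash keeps "docs" from matching "docsold"
    return grep { index($_, $prefix) == 0 } @files;
}

# Made-up archive listing, for illustration only.
my @archive = (
    'index.html',
    'docs/intro.html',
    'docs/perl/cgi.html',
    'docs/perl/regex.html',
    'docsold/legacy.html',
);

# A search launched from docs/perl should only consider these files.
print "$_\n" for files_in_scope('docs/perl', @archive);
```

Comparing against the directory name plus a trailing slash, rather than the bare name, avoids pulling in sibling directories that merely share a prefix.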

The current version of GlimpseHTTP was tested under the NCSA httpd 1.2 HTTP server, and it also works with Apache and other Web servers.

Some features of GlimpseHTTP include:

  • Combined browsing and searching: first, you browse to the directory where the relevant information is likely to be, then you use search to locate specific files. The result of a search is formatted hypertext with hyperlinks to the matching documents.

  • Easy generation of search pages

  • Configurable search pages

  • Well-documented scripts and complete online documentation

  • Easy installation

  • Non-centralized archive management, allowing separate users to maintain separate archives with no special permissions needed

  • Uses the Glimpse search engine, which provides some unique features

      • Uses a very small index (3 to 5 percent of the total text)

      • Very fast search

      • Searches for approximate match, allowing errors
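The last feature, approximate matching that allows errors, is easiest to appreciate with a small example. The following Perl sketch illustrates the idea only; it uses a plain dynamic-programming edit distance and is not the algorithm Glimpse itself uses. The subroutine names edit_distance and approx_match, the sample lines, and the error limit of 1 are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Levenshtein distance: the number of single-character insertions,
# deletions, and substitutions needed to turn $s into $t.
sub edit_distance {
    my ($s, $t) = @_;
    my @prev = (0 .. length($t));          # distances from the empty prefix of $s
    for my $i (1 .. length($s)) {
        my @cur = ($i);
        for my $j (1 .. length($t)) {
            my $cost = substr($s, $i - 1, 1) eq substr($t, $j - 1, 1) ? 0 : 1;
            push @cur, min($prev[$j] + 1,           # deletion
                           $cur[$j - 1] + 1,        # insertion
                           $prev[$j - 1] + $cost);  # substitution or match
        }
        @prev = @cur;
    }
    return $prev[-1];
}

# Hypothetical helper: true if any word in $line is within
# $max_errors edits of $pattern (case-insensitive).
sub approx_match {
    my ($pattern, $line, $max_errors) = @_;
    for my $word (split /\W+/, lc($line)) {
        next if $word eq '';
        return 1 if edit_distance(lc($pattern), $word) <= $max_errors;
    }
    return 0;
}

# Made-up sample text; the second line contains a misspelling of "running".
my @lines = (
    'Installing GlimpseHTTP on your server',
    'Runing the indexer nightly',
    'Unrelated text',
);

for my $line (@lines) {
    print "$line\n" if approx_match('running', $line, 1);
}
```

With one error allowed, a search for "running" still finds the line containing the misspelled "Runing", which is the kind of tolerance the feature list describes.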