CGI and Perl

Table 5.1. HTTP FullRequest methods.

Method       Description
GET          The GET method is used every time your browser requests a document. Variables may be sent using this method through URI encoding (described in the "GET Method" section).
POST         The POST method is used most prominently to submit URI encoded data or text from an HTML form.
HEAD         The HEAD method is used in the same way as the GET method, but the server returns only the HTTP headers and no document body.
DELETE       This method instructs the server to attempt to delete the data referenced by the specified URI.
PUT          This method instructs the server to store the data in the body section of the request under the specified URI, replacing any existing document at that URI.
LINK         By adding header meta-information to an object, this method can link an object to another specified object.
UNLINK       This method removes the headers (meta-information) specified in the request from the specified object.
SHOWMETHOD   This method allows the client to request interface specifications for methods not covered in the current HTTP specification.
SPACEJUMP    Similar to the TEXTSEARCH method, this method is used to specify the coordinates of a mouse click on a GIF image.
TEXTSEARCH   This method takes the text from the body of the request and instructs the server to perform a simple search for the specified text.
GET Method

The GET method is used for all normal document requests. In other words, a request with a GET method is what your Web browser sends to the Web server to request a document. The GET method is very simple, requiring only a single argument: the URI to get. Here's the syntax of the GET method:
GET http://www.netsite.com/directory/filename CrLf
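Under the hood, the browser writes this request as plain text to a TCP connection with the server. The following minimal sketch only composes that text; the sub name get_request and the User-Agent value are hypothetical. It uses the HTTP/1.0 Full-Request form, which adds the protocol version to the request line and, when talking directly to the origin server, normally gives just the path rather than the full URI:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# get_request is a hypothetical helper that composes an HTTP/1.0 GET request.
# Request line: METHOD SP Request-URI SP HTTP-Version CRLF, then optional
# "Name: value" header lines, then an empty line (a bare CRLF) ending the headers.
sub get_request {
    my ($path, %headers) = @_;
    my $crlf    = "\015\012";    # CR LF as octal escapes, independent of platform "\n"
    my $request = "GET $path HTTP/1.0$crlf";
    for my $name (sort keys %headers) {
        $request .= "$name: $headers{$name}$crlf";
    }
    return $request . $crlf;
}

print get_request('/directory/filename', 'User-Agent' => 'example/0.1');
```

To send it, you could write this string to a socket opened with IO::Socket::INET to www.netsite.com on port 80 and then read the server's response from the same socket.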

The GET method can also be used to send name/value data to the server. The data is URI encoded and appended to the end of the URI. URI encoding refers to the replacement of special characters such as tabs, spaces, question marks, and quotes with a percent sign followed by their two-digit HEX equivalents. All name/value data passed from the browser to the Web server in this way must be URI encoded. URI encoding is discussed in more depth in the "URI Encoding" section later in this chapter. URI encoding the name/value pair data and appending it to the end of the URI is exactly what a browser does when it submits an HTML form that uses the GET method. In other words, a form declared like this

<FORM METHOD="GET" ACTION="/cgi-bin/mycgi.cgi">

will cause the browser to URI encode the name/value pair form data (the names and values of the fields you create in your form) and then append it to the end of the URI. Digital's AltaVista search engine uses the GET method. A search for "the best web site" on AltaVista caused Netscape to generate and send the following URI:

http://altavista.digital.com/cgi-bin/query?pg=q&what=web&fmt=.&q=the+best+web+site

Breaking this URI down into its components, the first part calls a CGI script named query in the cgi-bin/ directory:

http://altavista.digital.com/cgi-bin/query

This is followed by a ?, indicating the presence of URI encoded data, and four URI encoded name/value pairs:

?pg=q&what=web&fmt=.&q=the+best+web+site

At first glance, the URI encoding makes this string look like garbage. URI encoded name/value pairs are separated by the & character, and the + character represents a space. Thus, once URI decoded, this string translates into the following four name/value pairs:

pg = q
what = web
fmt = .
q = the best web site
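CGI.pm performs this decoding for you, but the rules are simple enough to write out by hand. The following is a minimal sketch in plain Perl, with hypothetical sub names uri_decode and parse_query, that splits the string on &, splits each pair on its first =, and then decodes + and %XX escapes:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# uri_decode (hypothetical name): '+' becomes a space, and %XX becomes the
# character whose code is the hex number XX.
sub uri_decode {
    my ($s) = @_;
    $s =~ tr/+/ /;                                  # must run before %XX decoding
    $s =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $s;
}

# parse_query (hypothetical name): split a query string into an ordered
# list of [name, value] pairs, decoding both halves.
sub parse_query {
    my ($query) = @_;
    my @pairs;
    for my $pair (split /&/, $query) {
        my ($name, $value) = split /=/, $pair, 2;   # split on the first '=' only
        $value = '' unless defined $value;
        push @pairs, [ uri_decode($name), uri_decode($value) ];
    }
    return @pairs;
}

my $query = 'pg=q&what=web&fmt=.&q=the+best+web+site';
printf "%s = %s\n", @$_ for parse_query($query);
```

The + translation is done before the %XX substitution so that an encoded plus sign (%2B) survives as a literal + instead of being turned into a space.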

Using the GET method in all cases is not recommended, because servers usually limit the length of URIs. The query string is handed to a CGI script in the QUERY_STRING environment variable, so operating system limits on the environment apply; some UNIX environments, for example, limit it to 1240 bytes. The GET method is useful, however, when you need to pass your script an argument from a URI. Suppose, for example, that you had written a script called classifieds.cgi and wanted it to support multiple product categories. A good way to do this is to send the category to the script as a variable in the URI. An HTML page could have links to the different categories like this:

<a href="http://mysite.com/cgi-bin/classifieds.cgi?category=pets">Pets</a>
<a href="http://mysite.com/cgi-bin/classifieds.cgi?category=stuff">Stuff</a>
<a href="http://mysite.com/cgi-bin/classifieds.cgi?category=cars">Cars</a>

When your script is called, it can use the category variable to produce different dynamic output for each category. Another advantage of this method is that users can bookmark your script; because the variables are part of the URI, they can bookmark individual categories.

Caution:

Be very careful when using GET requests to supply your script with input. A malicious user can easily change the variables in the URI that calls your script. Depending on how you use the variables, this could cause undesirable and possibly dangerous results.
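One defensive pattern is to accept only values you already know about. The following is a minimal sketch of the classifieds.cgi idea under that assumption; the sub names query_param and title_for and the category titles are made up for illustration, and the script reads the query string directly from the QUERY_STRING environment variable instead of using CGI.pm:

```perl
#!/usr/bin/perl
# classifieds.cgi (hypothetical): choose output based on the category
# variable in the query string, accepting only known categories.
use strict;
use warnings;

# Whitelist of categories; the titles are illustrative placeholders.
my %TITLES = (
    pets  => 'Pets for Sale',
    stuff => 'Miscellaneous Stuff',
    cars  => 'Used Cars',
);

# query_param (hypothetical): return the raw value of one variable.
# No %XX decoding is done because valid categories are plain lowercase words.
sub query_param {
    my ($query, $want) = @_;
    for my $pair (split /&/, $query) {
        my ($name, $value) = split /=/, $pair, 2;
        return $value if defined $value && $name eq $want;
    }
    return;
}

# title_for (hypothetical): the page title for a category, or undef if the
# category is not on the whitelist.
sub title_for {
    my ($category) = @_;
    return unless defined $category && exists $TITLES{$category};
    return $TITLES{$category};
}

my $category = query_param($ENV{QUERY_STRING} // '', 'category');
my $title    = title_for($category);

print "Content-type: text/html\n\n";
if (defined $title) {
    print "<h1>$title</h1>\n";
} else {
    # Never echo the unrecognized value back into the page.
    print "<h1>Unknown category</h1>\n";
}
```

Because the script looks the value up in a fixed hash, an edited URI can at worst select the "Unknown category" page; it never reaches a file name, a shell command, or the HTML output.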


POST Method

The POST method also URI encodes your data, but instead of appending the data to the URI, the browser sends it in the body of the request, after all of the request headers, and your script receives it on STDIN. CGI.pm provides transparent access to the data in STDIN generated by a POST method request. For a POST request, the server sets the CONTENT_LENGTH environment variable to the number of bytes of encoded input data available on standard input, and the CGI script reads input until it has read CONTENT_LENGTH bytes. The CONTENT_TYPE environment variable tells the script what kind of data is being sent. For HTML forms returning name/value data with the POST method, the content type should be application/x-www-form-urlencoded.
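Reading a POST body by hand is a matter of checking CONTENT_TYPE and then reading exactly CONTENT_LENGTH bytes. The following minimal sketch assumes a hypothetical sub named read_post_body that takes the filehandle and environment as arguments, so it can be tried without a Web server; it only touches STDIN when GATEWAY_INTERFACE is set, as a CGI-capable server does:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# read_post_body (hypothetical): read the URI encoded body of a POST request
# from $fh, using CONTENT_TYPE and CONTENT_LENGTH from the hash ref $env.
sub read_post_body {
    my ($fh, $env) = @_;
    my $type = $env->{CONTENT_TYPE} // '';
    die "Unexpected content type '$type'\n"
        unless $type =~ m{^application/x-www-form-urlencoded\b}i;

    my $length = $env->{CONTENT_LENGTH} // 0;
    die "Bad CONTENT_LENGTH\n" unless $length =~ /\A\d+\z/;

    binmode $fh;
    my $body = '';
    # read() may return fewer bytes than requested, so keep appending
    # at the current end of $body until CONTENT_LENGTH bytes have arrived.
    while (length($body) < $length) {
        my $got = read($fh, $body, $length - length($body), length($body));
        die "read error: $!\n" unless defined $got;
        last if $got == 0;    # input ended early
    }
    return $body;
}

# Only read STDIN when actually running as a CGI script.
if (defined $ENV{GATEWAY_INTERFACE}) {
    my $body = read_post_body(\*STDIN, \%ENV);
    print "Content-type: text/plain\n\n";
    print "Received ", length($body), " bytes: $body\n";
}
```

The body returned here is still URI encoded; it has the same name=value&name=value form as a GET query string and is decoded the same way.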

To use the POST method in an HTML form, simply specify POST as the method:

<FORM METHOD="POST" ACTION="/cgi-bin/mycgi.cgi">

Later, you will see how CGI.pm or the HTTP::Request module can be used from within your scripts to construct requests that use these methods and to decode URI encoded form data. The ability to easily generate GET, POST, and other requests makes the following tasks possible:

  • Retrieving a document off the Net with a single line of Perl
  • Accessing and controlling form-based CGI applications on other systems from within your Perl script
  • Requesting information about a document on another server without actually retrieving it
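The text names CGI.pm and HTTP::Request for this. As a swapped-in alternative, the following minimal sketch uses HTTP::Tiny, a small client that has shipped with Perl since version 5.14; its get, post_form, and head methods map onto the three tasks above. The sub names are hypothetical, and the URLs are the placeholder addresses used in this chapter:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

my $http = HTTP::Tiny->new(agent => 'cgi-example/0.1');

# fetch_document (hypothetical): retrieve a document with GET;
# returns the body, or undef on failure.
sub fetch_document {
    my ($url) = @_;
    my $res = $http->get($url);
    return $res->{success} ? $res->{content} : undef;
}

# submit_form (hypothetical): POST name/value pairs to a form-based CGI.
# post_form URI encodes the pairs and sends them as
# application/x-www-form-urlencoded data.
sub submit_form {
    my ($url, $pairs) = @_;    # $pairs: array ref of name => value
    return $http->post_form($url, $pairs);
}

# document_headers (hypothetical): HEAD request; returns a hash ref of
# headers (names lowercased by HTTP::Tiny), or undef on failure.
sub document_headers {
    my ($url) = @_;
    my $res = $http->head($url);
    return $res->{success} ? $res->{headers} : undef;
}

# These calls need network access, so they only run with --live.
if (@ARGV && $ARGV[0] eq '--live') {
    my $url  = 'http://www.netsite.com/directory/filename';
    my $page = fetch_document($url);
    print "Fetched ", length($page // ''), " bytes\n";
    my $hdrs = document_headers($url);
    print "Last-Modified: ", ($hdrs->{'last-modified'} // 'unknown'), "\n" if $hdrs;
    my $res = submit_form('http://www.netsite.com/cgi-bin/mycgi.cgi',
                          [ name => 'John Doe', age => 30 ]);
    print "POST status: $res->{status}\n";
}
```

The single-line retrieval from the first bullet looks much the same on the command line: perl -MHTTP::Tiny -e 'print HTTP::Tiny->new->get(shift)->{content}' http://www.netsite.com/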

URI Encoding

Why should you care about URI encoding? Here are four reasons:

  1. All data contained in HTML forms passed from the browser to the Web server (your CGI) is URI encoded. This data must be decoded.
  2. Data sent to your CGI by appending it to the URI (the GET method) must be encoded.
  3. CGI.pm and the CGI::Base module transparently handle URI encoding and decoding.
  4. Because all of the data passed between the browser and the server is encoded and decoded, you should understand how and why.

URI encoded data arises in two situations. First, the browser automatically URI encodes the data in HTML forms (using either the GET or the POST method) before sending it to your script. Second, if you send input to your script by appending it to a URI yourself, you must encode the data, because no browser form is doing it for you.
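When you build such a URI yourself, each name and value must be encoded before it is joined with = and &. The following is a minimal sketch with hypothetical sub names uri_encode and build_uri; it treats letters, digits, and the characters - _ . ~ as safe, which is a conservative choice rather than the only valid one, and it assumes single-byte (ASCII) text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# uri_encode (hypothetical): every character other than letters, digits,
# - _ . ~ and space becomes %XX (its hex code); then spaces become '+'.
# Assumes single-byte characters (ord below 256).
sub uri_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~ ])/sprintf('%%%02X', ord($1))/ge;
    $s =~ tr/ /+/;
    return $s;
}

# build_uri (hypothetical): append "?name=value&name=value" to a script URI,
# keeping the pairs in the order given.
sub build_uri {
    my ($script, @pairs) = @_;
    my @terms;
    while (my ($name, $value) = splice(@pairs, 0, 2)) {
        push @terms, uri_encode($name) . '=' . uri_encode($value);
    }
    return @terms ? "$script?" . join('&', @terms) : $script;
}

print build_uri('http://www.netsite.com/mycgi',
                name => 'John Doe', age => 30, grade => '96%'), "\n";
```

Percent signs are escaped in the first substitution, before spaces are turned into +, so a literal + in the input comes out as %2B and cannot be mistaken for an encoded space.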

URI encoded data is appended to the URI in the following manner:

http://www.netsite.com/mycgi?query_string

where query_string represents a string of URI encoded name/value pairs. Here's a typical query_string:

name=John+Doe&age=30&grade=96%25

Each name/value pair is separated by the ampersand character (&). In this example, %25 represents the % character, and the plus symbol (+) represents a space. When decoded, this example contains the following name/value pairs:

name = John Doe
age = 30
grade = 96%
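An escape such as %25 is a percent sign followed by the character's ASCII code written as two hexadecimal digits: % is code 37 decimal, which is 25 in hex. The following small sketch computes the escape for a few characters mentioned in this section and reverses it; the sub names hex_escape and hex_unescape are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# hex_escape (hypothetical): '%' plus the character's code as two
# uppercase hex digits. Assumes an ASCII character.
sub hex_escape {
    my ($char) = @_;
    return sprintf('%%%02X', ord($char));
}

# hex_unescape (hypothetical): turn "%25" back into the character it names;
# returns undef if the input is not a valid %XX escape.
sub hex_unescape {
    my ($escape) = @_;
    return $escape =~ /^%([0-9A-Fa-f]{2})$/ ? chr(hex($1)) : undef;
}

# Print name, decimal code, and escape for some characters from this section.
my %label = ("\t" => 'tab', ' ' => 'space', '"' => 'quote',
             '%'  => '%',   '&' => '&',     '?' => '?', '+' => '+');
for my $char (sort { ord($a) <=> ord($b) } keys %label) {
    printf "%-6s %3d  %s\n", $label{$char}, ord($char), hex_escape($char);
}
```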

Many ASCII characters must be URI encoded. Use Table 5.2, which follows, to determine the HEX value that represents each of these characters.