CGI and Perl

Revision Control

The process (and rigor) of revision control is often overlooked or even ignored when an administrator manages an archive. However, there are some very good reasons you should use some sort of version control when creating and updating your resources.

A Policy for Change--Description

The process of making your documents available via the Web is really one of publishing. When you, as a representative of a company, make a document available, you're making a statement that represents your company. While some of the issues and legalities are still murky, you should consider the liability that you or your company assumes when making documents available. The information within the documents should be correct and, insofar as possible, verifiable and free of misrepresentations.

Such considerations give rise to the need for a policy for the management of the archive. This policy should be both comprehensive and understandable by anyone who will participate in the creation of the archive or the modification of its contents.

A policy for change can be as formal as you like. In general, the following items should be considered when designing the policy:

  • Creation--The process of creating new elements or the hierarchy (directories) within the archive.

  • Updating--The process of revising the archive's elements or hierarchy.

  • History--Retaining older versions of documents for future reference or consideration, including for the purpose of reverting to a previous version.

  • Accountability--Who installed or updated the document, when did he/she perform this action, and why?

The level of complexity in a policy for change that can arise out of just these four items might surprise you. For instance, when dealing with source code and documentation for a given software product, some organizations implement a multi-tiered structure of committees, forms, and checklists which any change or addition must pass through before being applied to the released document(s) or code. At some point, productivity may suffer if the process becomes too complex. The idea is to find a happy medium between no policy at all and one that bogs you down.

A Policy for Change--Motivation

Plenty of things can go wrong when you're populating an archive or updating its elements. In the best cases, the Webmaster or Web team is immediately made aware of the problem and is able to deal with it. On the other hand, some minor errors in documents or functionality may go unnoticed for a long time, potentially becoming a permanent problem due to the number of copies of the documents that were distributed with the error. With the number of indexers, auto-mirrors, archiving proxies, and other forms of duplication that exist today on the Web, the proliferation of errors can be almost immediate and quite difficult to overcome.

Obviously, the best policy is one in which no documents would be distributed with errors or misrepresentations. However, implementing such a policy is quite difficult, even if you already are using a sound policy for change. If you don't have a change policy, then the difficult becomes the practically impossible. A number of situations can lead to errors; let's consider a few of them.

  • Multiple Versions--A document, perhaps an image, script, or applet, may exist in several locations within the archive. A copy is updated, but the changes may not propagate to the other installed copies. Because no single copy is designated as the master copy, changes also may occur independently to the copies, causing additional problems.

  • Simultaneous Updates--An archive or one of its components may be managed by a group of people. This inevitably leads to simultaneous changes in some element, if some form of revision control is not used. Suppose one person copies a document, starts making changes to it, and before he/she is finished, someone else makes another copy of the same document and starts making changes. The inevitable outcome is that one or the other's work will be lost, depending on who copies the changed document back into the archive last.

  • Security/Access--Some operating systems provide a means to restrict access to files based on the UserID or group. These mechanisms lack the functionality necessary for a dynamic, effective policy for allowing a particular person to perform a particular task on a given element in an archive. Such tasks may need to be performed on a repetitive basis, or possibly only once, by a given Web team member or other individual. A need also may exist to allow certain types of access (for example, reading), but disallow others (such as updating) on a given element in the archive based on the local UserID. Some file systems have this sort of functionality built in, via Access Control Lists (ACLs), but these mechanisms may still be inadequate and are rarely enforceable across networked file systems or different architectures.

  • Accountability/Audits--If more than one person has the ability to make changes to the archive, then tracking the changes and who made them becomes difficult. In case of an error or omission, it may be desirable to learn who made the error. Most ordinary file systems don't give you the ability to track changes and who made them. Ideally, each element should have its own history: a record of the changes made to it during its life in the archive, and of who made them.

  • Creation/Population--If you, as the Webmaster, have carefully thought through the issues outlined in this chapter and have implemented a policy for change, and then someone decides to create a new directory or other element in the archive without being aware of the policy, you might find this action a bit irksome. The lack of consideration of the plan on the part of this person probably means that you will have to go in and fix things to restore the original order. Allowing others to create new elements in the archive should imply that they understand the issues involved and practices/policies for doing so.
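The Simultaneous Updates scenario in the list above can be made concrete with a small sketch. It is written in Python purely for illustration and is not taken from any of the tools discussed in this chapter; the names LockError, LockingArchive, check_out, and check_in are hypothetical. The first function shows how one person's work is lost when two people copy a document back without coordination. The class shows one possible remedy, assuming a simple rule that only one person may hold a document's lock at a time.

```python
class LockError(Exception):
    """Raised when a user acts on a document that someone else has locked."""


def unprotected_update():
    """Two people copy the same document, edit it, and copy it back."""
    archive = {"index.html": "Welcome"}
    first_copy = archive["index.html"]    # first person takes a copy
    second_copy = archive["index.html"]   # second person copies before the first finishes
    archive["index.html"] = first_copy + " to our site"
    archive["index.html"] = second_copy + "!"   # last writer wins
    return archive["index.html"]          # "Welcome!": the first edit is gone


class LockingArchive:
    """Toy in-memory archive (hypothetical) with one lock per document."""

    def __init__(self, documents):
        self.documents = dict(documents)  # document name -> current text
        self.locks = {}                   # document name -> user holding the lock

    def check_out(self, name, user):
        """Hand out a working copy, unless another user holds the lock."""
        holder = self.locks.get(name)
        if holder is not None and holder != user:
            raise LockError(f"{name} is locked by {holder}")
        self.locks[name] = user
        return self.documents[name]

    def check_in(self, name, user, text):
        """Store the new text and release the lock held by this user."""
        if self.locks.get(name) != user:
            raise LockError(f"{user} does not hold the lock on {name}")
        self.documents[name] = text
        del self.locks[name]


if __name__ == "__main__":
    print(unprotected_update())
    archive = LockingArchive({"index.html": "Welcome"})
    text = archive.check_out("index.html", "alice")
    try:
        archive.check_out("index.html", "bob")
    except LockError as err:
        print("refused:", err)
    archive.check_in("index.html", "alice", text + " to our site")
    print(archive.documents["index.html"])
```

With the lock in place, the second person is refused until the first checks the document back in, so the second set of changes starts from the updated text rather than silently replacing it.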

There are other potential problem scenarios I haven't mentioned, but these should give you the general idea. In order to properly maintain an archive, especially as a group or team representing a company, it's essential to use some form of revision control, and to have a well-understood policy for change.

A Policy for Change--Solutions

A variety of tools and systems are available to implement revision control. Some of them are available for free, and others are commercially available and well-supported. An organization may also wish to implement a home-grown solution, perhaps using Perl and some other tool or tools. I'm not going to attempt to implement such a tool here, but the following list should give you an idea of what tools are available. I'm also not going to try to give a comprehensive overview; I'll just cover some of the most popular solutions.

ftp://ftp.cs.purdue.edu/pub/RCS/

RCS/CVS--This toolset is probably the most widely used tool for revision control on UNIX operating systems. It's a GNU tool, originally created at Purdue University. Like other GNU software, RCS/CVS has received contributions, bugfixes, and patches from caring individuals all across the Internet. RCS stands for Revision Control System. CVS is a front-end to RCS that extends its functionality by providing the ability to create a private copy of an entire suite of documents, and then optionally lock, modify, and check in a given document. Each document's changes (deltas) are kept in a storage container corresponding to the name of the document. Ports of RCS/CVS are also available for Macintosh and DOS/Windows. The toolset is freely available and well understood, and help is fairly easy to find via the documentation, Usenet, or mailing lists. It operates primarily on text files. RCS/CVS is available at most standard Usenet sources archive sites and always at Purdue: ftp://ftp.cs.purdue.edu/pub/RCS.
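The idea of keeping each document's deltas in one container, together with a record of who made each change and why, can be sketched in a few lines. This is a toy illustration in Python, not RCS's actual storage format or command set; the Revision and History names and their methods are hypothetical. It relies on the standard difflib module, whose ndiff output keeps unchanged lines as well as changed ones, so unlike a real revision control system it saves no space.

```python
import difflib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Revision:
    number: int       # 1 for the first check-in, 2 for the next, and so on
    author: str       # who checked the change in
    reason: str       # why the change was made
    when: datetime    # when the change was checked in
    delta: list       # ndiff lines from the previous revision to this one


class History:
    """Hypothetical per-document container of revisions."""

    def __init__(self, name):
        self.name = name
        self.revisions = []

    def check_in(self, lines, author, reason):
        """Record a new revision as a delta against the latest one."""
        previous = self.check_out() if self.revisions else []
        delta = list(difflib.ndiff(previous, lines))
        revision = Revision(len(self.revisions) + 1, author, reason,
                            datetime.now(timezone.utc), delta)
        self.revisions.append(revision)
        return revision.number

    def check_out(self, number=None):
        """Rebuild the text of a revision (the latest one by default)."""
        if not self.revisions:
            raise LookupError(f"{self.name} has no revisions")
        if number is None:
            number = len(self.revisions)
        delta = self.revisions[number - 1].delta
        return list(difflib.restore(delta, 2))   # 2 selects the "after" side

    def log(self):
        """Return one summary line per revision for audits."""
        return [f"{r.number} {r.author}: {r.reason}" for r in self.revisions]


if __name__ == "__main__":
    history = History("index.html")
    history.check_in(["<h1>Welcome</h1>", "<p>Old text</p>"], "alice", "initial version")
    history.check_in(["<h1>Welcome</h1>", "<p>New text</p>"], "bob", "update body text")
    print(history.check_out(1))
    print(history.log())
```

Because every revision carries an author, a reason, and a timestamp, the same container answers both the History and the Accountability questions raised earlier: any older version can be rebuilt, and the log shows who changed what and why.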

http://www.atria.com

ClearCase--This toolset, available through Atria, Inc., actually implements a complete file system and is possibly the most powerful, complex, and configurable configuration management tool available. It's primarily used for source code control and software project management but makes a very nice archive management tool as well. ClearCase lacks a Macintosh interface, but it can export its files via NFS. It operates on text files, binaries, images, and even directories, along with any other filetype you wish to configure. It is available through the Pure-Atria sales staff at Pure-Atria, Inc., and through the Web site at http://www.pureatria.com.

http://www.microsoft.com/SSAFE/Default.html

SourceSafe is another toolset available as a commercial product. In terms of functionality, it looks and feels much like CVS, but it implements a database for its internal references to revisions and history and has additional features and user interfaces. SourceSafe operates on text, binaries, and images. I haven't used the SourceSafe toolset in the role of archive management, but it seems to have the necessary functionality. Microsoft also seems to be actively adding functionality and support since it acquired the SourceSafe product. Implementations of SourceSafe are available for UNIX, Macintosh, and Windows.

http://www.mks.com

The MKS Source Integrity toolset is another revision control system. I haven't actually seen this implementation, but because it's from MKS, you can bet it has an implementation for Windows. Contact MKS through its Web page at http://www.mks.com.

Each of these tools has advantages and disadvantages, and there are certainly other tools available that I'm not aware of. Investigate as many systems as you can, then choose one and stick with it. The process of checking the archive's elements in and out each time you wish to update them might seem a bit rigorous at first, especially for those who've never used a revision control system, but in the long run, revision control always pays off, and you'll be glad you took the time to implement it.