
Using an Automated Sitemap Tool

Although the Sitemap protocol is a fairly simple XML language, you're probably thinking that there has to be a better way to create Sitemaps than typing in all those URLs by hand. And in fact there is. Several tools can generate a Sitemap automatically from your web pages. One of them is Google's own Sitemap Generator, which is very powerful but somewhat difficult to use unless you already have experience with Python and are able to run command-line scripts on your web server. If you're already a Python guru and you have server access, feel free to look into the Sitemap Generator at http://www.google.com/webmasters/sitemaps/docs/en/sitemap-generator.html. If not, you'll want to investigate other options.

Python is an interpreted programming language that is similar in some ways to Perl, which is another popular web development language. Google uses Python throughout many of its tools and services.

Short of using Python, I recommend that you consider an online Sitemap generation tool. There are many such tools out there, but one that I've found particularly useful is the XML Sitemap Generator, which is located at http://www.xml-sitemaps.com/. This online tool is very easy to use: you give it a starting URL, and it churns out a Sitemap XML document in return. What's neat is that online Sitemap generation tools such as the XML Sitemap Generator automatically crawl your entire site to discover the URLs of all your pages.
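To give a feel for what such a tool does behind the scenes, here is a minimal Python sketch of the two core steps: pulling same-site links out of a page and writing them into a Sitemap document. It is only an illustration under stated assumptions, not how the XML Sitemap Generator is actually implemented. The names `LinkCollector`, `same_site_links`, and `build_sitemap` are hypothetical, the sample HTML stands in for a page a real crawler would download, and the namespace shown is the sitemaps.org 0.9 one, which may differ from the namespace your Sitemap version requires.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 namespace; adjust to your protocol version.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in one HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return the sorted absolute URLs on this page that share page_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links and drop any #fragment part.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            found.add(absolute)
    return sorted(found)


def build_sitemap(urls):
    """Serialize the URLs as a Sitemap document with one <url><loc> per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical page content standing in for a downloaded home page.
sample_html = """
<a href="/about.html">About</a>
<a href="news/index.html#top">News</a>
<a href="http://other.example.org/">Elsewhere</a>
<a href="mailto:me@example.com">Mail</a>
"""
links = same_site_links("http://www.example.com/", sample_html)
sitemap_xml = build_sitemap(links)
print(sitemap_xml)
```

A real crawler would repeat the link-extraction step for every newly discovered page until no unvisited same-site URLs remain; the sketch processes a single page to keep the idea visible.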

To find out about other Sitemap generation tools, visit Google's list of third-party sitemap resources at http://code.google.com/sm_thirdparty.html.

As an example, I fed the home page of my web site into the XML Sitemap Generator. After churning for a few minutes as it crawled through the pages on my site, it created a Sitemap XML file that I could download and post on my site. What I found particularly interesting is that it counted 299 pages on my site. Because my site is set up through a content management system that generates pages dynamically from a database, I never really had a good feel for how many pages were on it. The XML Sitemap Generator not only answered that question for me but also created a Sitemap document that is ready to feed into Google to ensure that all of my pages get crawled efficiently.

Figure 20.4 shows the XML Sitemap Generator web page as it busily works away crawling my web site and figuring out URLs for the pages within.

Figure 20.4. It's pretty interesting just watching a Sitemap generation tool crawl your site and count the number of pages, along with their sizes.

After the XML Sitemap Generator finishes, it provides you with links to both uncompressed and compressed versions of the resulting Sitemap. Although you can submit either version to Google, you might as well go with the compressed version to reduce the transfer time, unless you want to open the uncompressed version to study the code. In that case, why not download both? Figure 20.5 shows the results of the XML Sitemap Generator tool, including the links to the newly created Sitemap files.
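If you only have the uncompressed Sitemap on hand, you can produce a compressed copy yourself. The following Python sketch assumes the compressed version is an ordinary gzip file, which is the usual convention for Sitemaps; the helper name `compress_sitemap` and the example URLs are hypothetical.

```python
import gzip


def compress_sitemap(xml_text):
    """Return the gzip-compressed bytes of a Sitemap document."""
    return gzip.compress(xml_text.encode("utf-8"))


# Hypothetical Sitemap with many similar URLs, which compress well.
entries = "".join(
    f"<url><loc>http://www.example.com/page{n}.html</loc></url>"
    for n in range(299)
)
sitemap_xml = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    f"{entries}</urlset>"
)

raw = sitemap_xml.encode("utf-8")
packed = compress_sitemap(sitemap_xml)
print(len(raw), "bytes uncompressed,", len(packed), "bytes compressed")

# Writing the compressed bytes to disk would produce the file to upload:
# with open("sitemap.xml.gz", "wb") as f:
#     f.write(packed)
```

Because Sitemap entries repeat the same tags and URL prefix over and over, the compressed file is typically a small fraction of the original size.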

Figure 20.5. When the XML Sitemap Generator tool finishes, it provides you with links to download the new Sitemap in either uncompressed or compressed form.

You now have a Sitemap document that contains a thorough representation of the pages that compose your web site. You can then turn around and feed this Sitemap document into Google Sitemaps to improve the crawling of all the pages.
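Before you submit a generated Sitemap, it can be worth a quick check that it lists the pages you expect. Here is a small Python sketch that reads a Sitemap and counts its URLs. The function name `list_sitemap_urls` and the sample document are hypothetical, and the namespace is assumed to be the sitemaps.org 0.9 one; a Sitemap that declares a different namespace would need the `NS` value changed to match.

```python
import xml.etree.ElementTree as ET

# Assumption: the Sitemap declares the sitemaps.org 0.9 namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_sitemap_urls(xml_bytes):
    """Return the text of every <loc> element inside a <url> entry."""
    root = ET.fromstring(xml_bytes)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", NS)
        if loc.text
    ]


# Hypothetical two-page Sitemap standing in for a downloaded file.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/about.html</loc></url>
</urlset>
"""

urls = list_sitemap_urls(sample)
print(len(urls), "pages listed")
for url in urls:
    print(url)
```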

You may be wondering if it's possible to automate the process of resubmitting a Sitemap when your web site changes. The answer is yes, but you'll have to use a utility on your web server that is capable of running commands in the background at regular intervals. For example, a UNIX utility called cron is perfectly suited for this task. You typically schedule cron jobs by editing a list of recurring commands with the crontab command. Please refer to your specific web server documentation to find out more about setting up recurring commands and scripts.
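As a rough sketch of how a scheduled job might fit together, the following Python script rewrites the Sitemap file only when its content has changed, and reports whether a resubmission would be worthwhile. Everything here is an assumption for illustration: the function name `write_if_changed`, the file paths, and the schedule in the comment are hypothetical, and the actual resubmission step is deliberately left out because it depends on the interface your search engine provides.

```python
import tempfile
from pathlib import Path

# A crontab entry along these lines (hypothetical paths) would run the
# script every day at 3:00 a.m.:
#
#   0 3 * * * /usr/bin/python3 /home/you/update_sitemap.py


def write_if_changed(path, new_text):
    """Write new_text to path only when it differs from the current file.

    Returns True when the file was written, which signals that the
    Sitemap changed and a resubmission would be worthwhile.
    """
    target = Path(path)
    new_bytes = new_text.encode("utf-8")
    if target.exists() and target.read_bytes() == new_bytes:
        return False
    target.write_bytes(new_bytes)
    return True


# Demonstration in a temporary directory rather than a real web root.
with tempfile.TemporaryDirectory() as tmp:
    sitemap_path = Path(tmp) / "sitemap.xml"
    first = write_if_changed(sitemap_path, "<urlset/>")
    second = write_if_changed(sitemap_path, "<urlset/>")
    third = write_if_changed(sitemap_path, "<urlset><url/></urlset>")

print(first, second, third)  # prints: True False True
```

Skipping the write when nothing has changed keeps the job cheap to run often and avoids resubmitting an identical Sitemap.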