XML

Markup Languages

As you saw in tutorial 1, XML is becoming an essential part of the corporate Digital Nervous System (DNS). Microsoft's focus is on using XML to accomplish three goals: creating messages in a standard format (using BizTalk), separating data and presentation when building Web pages (using Microsoft Internet Explorer 5), and calling methods through firewalls and between different platforms (using the Simple Object Access Protocol [SOAP]).

In this tutorial, we will look at some of the reasons XML is better suited to accomplish these goals than other markup language options, such as Hypertext Markup Language (HTML) or Standard Generalized Markup Language (SGML).

A markup language uses special notation to mark the different sections of a document. In HTML documents, for example, angle brackets (<>) are used to mark the different sections of text. In other kinds of documents, you can have comma-delimited text, in which commas are used as special characters. You can even use binary code to mark up the text, as could be done in a Microsoft Office document. For every markup language, software developers can build an application to read documents written in that markup language: Web browsers read HTML documents, and Microsoft Office reads Office documents. Documents written in XML can be read by customized applications using various parsing objects, or they can be combined with Extensible Stylesheet Language (XSL) and presented in a Web browser.

Documents created using a markup language consist of markup characters and text. The markup characters define the way the text should be interpreted by an application reading the document. For example, in HTML, <h1>Introduction</h1> contains the markup characters <h1> and </h1> and the text Introduction. When read by an application that reads HTML (say, a Web browser), the markup characters tell the application that the text Introduction should be displayed as a level 1 heading (h1).
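The split between markup characters and text can be sketched in a few lines of code. The example below is only an illustration, written in Python with the standard library's html.parser module; the class name MarkupSplitter is hypothetical and is not part of any tool discussed in this tutorial. It reads the same <h1>Introduction</h1> fragment and collects the markup and the text separately.

```python
from html.parser import HTMLParser


class MarkupSplitter(HTMLParser):
    """Hypothetical helper that separates markup characters from text."""

    def __init__(self):
        super().__init__()
        self.markup = []  # tags seen, in document order
        self.text = []    # character data found between tags

    def handle_starttag(self, tag, attrs):
        # Called for each opening tag, such as <h1>
        self.markup.append(f"<{tag}>")

    def handle_endtag(self, tag):
        # Called for each closing tag, such as </h1>
        self.markup.append(f"</{tag}>")

    def handle_data(self, data):
        # Called for the text between the tags
        self.text.append(data)


splitter = MarkupSplitter()
splitter.feed("<h1>Introduction</h1>")
splitter.close()

print(splitter.markup)  # ['<h1>', '</h1>']
print(splitter.text)    # ['Introduction']
```

A Web browser does essentially the same separation before deciding how to display the text; here the application simply records what it found.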

Thus, when you are using a markup language, you should consider the following three elements:

  • The markup language, which defines the markup characters
  • The markup document, which uses the markup language and consists of markup characters and text
  • The interpreted document, which is a markup document that has been read and interpreted by an application

In XML, however, the markup language itself is the only one of these elements that is predefined. The designer of an XML document defines the structure of the document and its markup characters, that is, the names of its tags. This feature makes XML flexible and allows the data in the interpreted document to be used for a wide variety of purposes. For example, the data in an XML document could be parsed and then displayed to a user, placed in a database, or used by another application.
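To make this flexibility concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree module. Everything about the sample document is an assumption made for illustration: the tag names order, customer, and item, the attribute names, and the sample values are invented, as are the two helper functions. The point is only that one parsed document can feed more than one use.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: the designer, not XML itself, chose these tag names.
ORDER_XML = """<order id="1001">
  <customer>Contoso</customer>
  <item sku="A-17" quantity="2">Widget</item>
  <item sku="B-42" quantity="1">Gadget</item>
</order>"""

root = ET.fromstring(ORDER_XML)  # parse once


def as_display_lines(order):
    """Use 1: turn the parsed data into lines of text for a user."""
    lines = [f"Order {order.get('id')} for {order.findtext('customer')}"]
    for item in order.findall("item"):
        lines.append(f"  {item.get('quantity')} x {item.text}")
    return lines


def as_database_rows(order):
    """Use 2: turn the same data into (order id, sku, quantity) rows."""
    return [
        (order.get("id"), item.get("sku"), int(item.get("quantity")))
        for item in order.findall("item")
    ]


print("\n".join(as_display_lines(root)))
print(as_database_rows(root))
```

The rows returned by as_database_rows could be handed to any database API, and the same parsed tree could just as easily be passed to another application; neither use requires changing the XML document.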

This tutorial focuses on three markup languages: XML, HTML, and SGML. Let's begin with SGML, the parent language of both HTML and XML.