DTD Basics

Using Attributes with DTDs

Document Type Declaration

DTDs are introduced into XML documents using the document type declaration (DOCTYPE). A document type declaration is placed in the XML document's prolog; it begins with <!DOCTYPE and ends with >. It can point to declarations stored outside the XML document (called the external subset), contain declarations inside the document itself (called the internal subset), or both. For example, an internal subset might look like:
<!DOCTYPE myMessage [
<!ELEMENT myMessage ( #PCDATA )>
]>
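
For context, here is a minimal sketch of a complete document that carries this internal subset in its prolog. The XML declaration and the message text are assumptions added for illustration; only the DOCTYPE itself comes from the example above.

```xml
<?xml version="1.0"?>
<!-- Prolog: the DOCTYPE names the root element and holds the internal subset -->
<!DOCTYPE myMessage [
<!ELEMENT myMessage ( #PCDATA )>
]>
<!-- Root element: must be myMessage and may contain only character data -->
<myMessage>Hello from a document with an internal subset.</myMessage>
```

A validating parser would reject this document if the root element had a different name or contained child elements, because neither would match the single element declaration.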

The first myMessage is the name of the document type; for the document to be valid, it must match the name of the root element. Anything inside the square brackets ([]) constitutes the internal subset. As we will see momentarily, ELEMENT and #PCDATA are used in element declarations. External subsets physically exist in a different file that typically ends with the .dtd extension, although this file extension is not required. External subsets are specified using either the SYSTEM or the PUBLIC keyword. For example, a DOCTYPE that references an external subset might look like:
<!DOCTYPE myMessage SYSTEM "myDTD.dtd">

which points to the myDTD.dtd document. The PUBLIC keyword indicates that the DTD is widely used (e.g., the DTD for HTML documents). Such a DTD may be made available in well-known locations for more efficient downloading.
The DOCTYPE
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

uses the PUBLIC keyword to reference the well-known Strict DTD for HTML version 4.01. The first quoted string is the public identifier; the second is a system identifier (a URL). Parsers that do not have a local copy of the DTD may use the URL provided to download the DTD to perform validation. Both the internal and external subsets may be specified at the same time. For example, the DOCTYPE
<!DOCTYPE myMessage SYSTEM "myDTD.dtd" [
<!ELEMENT myElement ( #PCDATA )>
]>
contains declarations from the myDTD.dtd document as well as an internal declaration.
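
To make the combined form concrete, the sketch below shows one way the two subsets could fit together. The source does not show the contents of myDTD.dtd, so the declaration in it is an assumption, as is the sample text. First, a hypothetical myDTD.dtd (an external subset file holds bare declarations, with no DOCTYPE wrapper):

```xml
<!-- myDTD.dtd (assumed contents): myMessage must contain exactly one myElement -->
<!ELEMENT myMessage ( myElement )>
```

Then a document that references that file and adds its own declaration in the internal subset:

```xml
<?xml version="1.0"?>
<!-- myMessage is declared in myDTD.dtd; myElement is declared here -->
<!DOCTYPE myMessage SYSTEM "myDTD.dtd" [
<!ELEMENT myElement ( #PCDATA )>
]>
<myMessage><myElement>Declared in the internal subset.</myElement></myMessage>
```

Under these assumptions, each element type is declared exactly once across the two subsets. Declaring the same element type in both would be a validity error, so the internal subset is best used for declarations that are specific to one document.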
1) This example provides the machine with specific data.

2) This example provides the machine with the same amount of data, but in a different form. But how do you know when data should be represented as character data between tags or as an attribute value within a tag?

3) A combination of these two examples might be appropriate. It makes sense to render the setup and the punchline of the joke to the user. But should they see that the joke is classified as "stale"? Unless that information is to be presented visually to the user, it might be better to bury it in an attribute for machine reference at a later point. At some point in the future, you may want to search your joke database for a joke that is not "stale".
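
The combined approach can be sketched as follows. The original figures are not reproduced here, so the element names (joke, setup, punchline), the attribute name (status), the value "fresh", and the joke text are all hypothetical choices made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE joke [
<!ELEMENT joke ( setup, punchline )>
<!ELEMENT setup ( #PCDATA )>
<!ELEMENT punchline ( #PCDATA )>
<!-- Hypothetical attribute: the enumerated type limits the allowed values,
     and "fresh" is used when the attribute is omitted -->
<!ATTLIST joke status ( fresh | stale ) "fresh">
]>
<!-- Setup and punchline are element content meant for the reader;
     the status attribute is metadata kept for later machine searches -->
<joke status="stale">
  <setup>Why did the chicken cross the road?</setup>
  <punchline>To get to the other side.</punchline>
</joke>
```

The design choice here is that text intended for display lives between tags, while the classification sits in an attribute whose permitted values the DTD can restrict.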