
Lesson 6 Adding information to XML documents
Objective Comments, CDATA sections, and encoding to add clarity

Adding Comments to XML documents

Use comments, CDATA sections, and encoding to add clarity and information to XML documents.
To make your documents as understandable to humans as they are to machines, consider adding comments to your XML documents.
Use the same syntax you would for an HTML comment:

<!-- comment text here --> 

Here is an example of a comment in an XML document:
<?xml version="1.0"?>
<inventory-items>
<!--Begin definition of inventory items-->
<item>
 <item-name>Computer Monitor</item-name>
 <item-serial-number>981</item-serial-number>
 <units-on-hand>50</units-on-hand>
</item>
</inventory-items>
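Comments are meant for human readers, and many parsers drop them before handing data to an application. As a minimal sketch, assuming Python's standard-library xml.etree.ElementTree parser (the lesson itself does not name a parser), the inventory example above can be read like this:

```python
import xml.etree.ElementTree as ET

# The inventory document from the lesson, unchanged.
doc = """<?xml version="1.0"?>
<inventory-items>
<!--Begin definition of inventory items-->
<item>
 <item-name>Computer Monitor</item-name>
 <item-serial-number>981</item-serial-number>
 <units-on-hand>50</units-on-hand>
</item>
</inventory-items>"""

root = ET.fromstring(doc)

# The default parser discards comments, so only the <item> element remains.
child_tags = [child.tag for child in root]
print(child_tags)                          # ['item']
print(root.find("item/item-name").text)    # Computer Monitor
```

The comment stays in the file for people who read the source, but it does not change the data the program sees.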

Rules for comments

There are some rules that you need to follow when creating comments.
The following table lists the rules and shows examples of correct and incorrect comment usage in an XML document.
XML rules with respect to comments
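One way to see such rules in action is to hand a few comment placements to a parser and note which ones it rejects. The sketch below assumes Python's standard-library xml.etree.ElementTree; the helper name is_well_formed and the sample strings are illustrative, not taken from the lesson. The cases reflect XML 1.0 restrictions: a comment may not precede the XML declaration, may not contain a double hyphen, may not appear inside a tag, and may not be nested.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts xml_text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples: one correct placement and four incorrect ones.
samples = {
    "comment inside the root element": "<?xml version='1.0'?><a><!-- ok --><b/></a>",
    "comment before the XML declaration": "<!-- bad --><?xml version='1.0'?><a/>",
    "double hyphen inside a comment": "<a><!-- bad -- comment --></a>",
    "comment inside a tag": "<a <!-- bad --> ></a>",
    "nested comment": "<a><!-- outer <!-- inner --> still outer --></a>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the first sample is accepted; the other four are rejected as not well-formed.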

CDATA sections

In XML documents, you may need to use special characters, such as < and &. Because these characters are part of XML markup, XML processors treat them as the start of a tag or an entity reference when reading the document content. If only a few such characters appear and you do not want them treated as markup, you may escape them by using &lt; for < and &amp; for &. This works, but the text becomes awkward and hard to read if you use many such characters.
As an alternative, you can use a CDATA section. A CDATA section tells the XML processor that its content is character data, not markup, and should not be parsed as markup. As a result, you may include almost any text in a CDATA section, including the special characters < and &.
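The two approaches give the application the same text; they differ only in how readable the source is. The following sketch, assuming Python's standard-library xml.etree.ElementTree and a made-up note element, compares an escaped version with a CDATA version and shows that unescaped markup characters are rejected:

```python
import xml.etree.ElementTree as ET

# Same content written two ways: with entity escapes and with a CDATA section.
escaped = "<note>if a &lt; b &amp;&amp; b &lt; c</note>"
cdata = "<note><![CDATA[if a < b && b < c]]></note>"

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text

print(escaped_text)                  # if a < b && b < c
print(escaped_text == cdata_text)    # True

# Without escaping or CDATA, the same characters are read as broken markup.
try:
    ET.fromstring("<note>if a < b && b < c</note>")
    unescaped_accepted = True
except ET.ParseError:
    unescaped_accepted = False

print(unescaped_accepted)            # False
```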

CDATA syntax

A CDATA section begins with <![CDATA[ and ends with ]]>.
When the XML processor encounters the markup <![CDATA[, it searches for ]]> to find the end of the section. As a result, you cannot include the sequence ]]> anywhere else in a CDATA section. Also, CDATA sections may not be nested.
Here is an example of a CDATA section in an XML document:

<?xml version="1.0"?>
<inventory-items>
 <item>
  <item-name>Computer Monitor</item-name>
  <item-serial-number>981</item-serial-number>
 <![CDATA[ 
  all data included in this section is preserved
  as text. You may include the special characters
  < or > or elements such as <TEST>value</TEST>.
 ]]>
  <units-on-hand>50</units-on-hand>
 </item>
</inventory-items>
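Because ]]> always ends a CDATA section, text that happens to contain that sequence needs special handling. A common workaround, which is not part of this lesson, is to split the sequence across two adjacent CDATA sections. The sketch below assumes Python's standard-library xml.etree.ElementTree; the helper name to_cdata and the code element are hypothetical.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' across two adjacent sections."""
    # ']]>' becomes ']]' (end of one section) followed by '>' in a new section.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Sample text that contains both markup characters and the ']]>' sequence.
raw = "if (a[b[0]]> 1) { x = '<TEST>value</TEST>'; }"

doc = "<code>" + to_cdata(raw) + "</code>"
parsed = ET.fromstring(doc).text

print(doc)
print(parsed == raw)    # True
```

The parser joins the character data of the two adjacent sections, so the application receives the original text unchanged.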

To embed script (ECMAScript or any other scripting language) within an XML document, you should enclose the script in a CDATA section so that characters such as < and & in the script are not treated as markup:
<SCRIPT><![CDATA[Script statements here]]></SCRIPT>

Encoding and web balkanization

Because the Internet is a global technology, you should state in the XML declaration which character encoding (the scheme that maps characters to the bytes stored in the file) you used to create an XML file.

The following XML declaration includes encoding information:
<?xml version="1.0" encoding="UTF-8"?>
  1. The version attribute states the XML version, here 1.0.
  2. You can include encoding information within the XML declaration by adding the encoding attribute (written in lowercase) along with the appropriate value.
  3. When creating an XML document, it is useful for people in other countries or using other encoding schemes to know that our data uses the standard English character set, ASCII, which is a subset of UTF-8. UTF stands for UCS (Universal Character Set) Transformation Format. ASCII is a 7-bit character set covering the characters 0 through 127; UTF-8 stores those characters as single bytes and uses two to four bytes for every other character.
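To make the relationship between ASCII, UTF-8, and the encoding declaration concrete, here is a small sketch assuming Python's standard-library xml.etree.ElementTree. The sample name "Café Monitor" and the choice of ISO-8859-1 as the second encoding are illustrative assumptions, not part of the lesson.

```python
import xml.etree.ElementTree as ET

# ASCII characters (0 through 127) occupy one byte each in UTF-8,
# so pure-ASCII text is byte-for-byte identical in both encodings.
print("Monitor".encode("utf-8") == "Monitor".encode("ascii"))   # True

# Characters outside ASCII need more than one byte in UTF-8.
print(len("é".encode("utf-8")))    # 2
print(len("€".encode("utf-8")))    # 3

# The same document saved in two encodings, each declared in the XML declaration.
name = "Café Monitor"
utf8_doc = ('<?xml version="1.0" encoding="UTF-8"?>'
            f"<item-name>{name}</item-name>").encode("utf-8")
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
              f"<item-name>{name}</item-name>").encode("iso-8859-1")

# The bytes on disk differ, but the declaration lets the parser decode both.
print(utf8_doc == latin1_doc)                    # False
print(ET.fromstring(utf8_doc).text == name)      # True
print(ET.fromstring(latin1_doc).text == name)    # True
```

The declaration is what allows a reader using a different default encoding to recover the same characters from the file.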

Adding Information to XML Documents
The next lesson concludes this module. The following exercise checks your understanding of creating a well-formed document.
Adding XML Documents - Exercise