2.4. Writing in DocBook XML

While tools for writing XML are not as developed as those for SGML, there are a few reasons why you may want to start writing in XML:

  1. Libraries for handling XML files are developing at a rapid pace. These utilities may make it easier for new authoring tools to become available.

  2. Many popular word processing programs are now creating XML output. While it may not be DocBook XML, this does make it easier for application writers to either add DocBook XML support, or provide some method of translating between their format and DocBook XML.

  3. Everyone else is doing it. While this might not be a real reason, it allows the LDP to keep up-to-date with similar projects. Tools, procedures, and issues can be worked out in a common framework.

The real intent of this section is to familiarize you with the differences between writing in earlier versions of DocBook SGML and writing in DocBook XML. Because the LDP supports DocBook SGML 3.1 and later (much of this Guide is written against 3.1) as well as DocBook XML 4.1 and later, you will run into a few differences.

In the following sections, if you see DocBook followed by XML or SGML, it refers to the XML or SGML version of DocBook. If DocBook is followed by a version number, it refers to both the XML and SGML versions of DocBook.

2.4.1. Differences between XML and SGML

There are a few differences between writing XML and writing SGML. Handling them should be relatively easy for most small documents, and many authors will not need to change anything except the XML declaration and the DocBook document type declaration at the start of their document.
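
As a rough sketch of the two declarations mentioned above, a DocBook XML document might begin as follows. The article root element and the choice of version 4.1.2 are assumptions for illustration; substitute the root element and DocBook version your document actually uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The XML declaration above is new; SGML documents do not have one. -->
<!-- An SGML document would typically start with only:
     <!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V4.1//EN"> -->
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
```

Note that the XML document type declaration names both a public identifier and a system identifier (the DTD location), whereas the SGML form usually gives only the public identifier.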

For others, here is a list of what you should keep in mind when writing.

  • Tags are case-sensitive: a start tag and its end tag must use the same case, and DocBook XML element names are lowercase. That is, you do not want to have code like this:

    
<para>This part will fail XML validation</PARA>
    
  • That said, XML declaration keywords (like ENTITY) have to be in all capital letters; a lowercase entity declaration will not work.

  • All attribute values have to be in quotes. These can be either single (') or double (") quotes, but reverse quotes (`) and smart quotes are not allowed.

  • Empty tags, which have no closing tag (like xref), have to have a trailing / as part of the tag (<xref/>).

  • Processing instructions that are passed to the DSSSL stylesheets (like <?dbhtml>) have to have a question mark at the end of the tag. The new tag looks like this:

    
<?dbhtml filename="foo"?>
    
  • If you're converting to XML, be sure file names refer to .xml files instead of .sgml. Some tools may get confused if a .sgml file contains XML.

  • Tag minimizations (</>) are not supported in XML. Their use is already discouraged in DocBook SGML.
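
The rules in the list above can be seen together in one short document. This is only a minimal sketch: the entity name howto, the id values, and the section titles are hypothetical, and DocBook XML 4.1.2 is assumed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN"
    "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" [
  <!-- Declaration keywords such as DOCTYPE and ENTITY are uppercase. -->
  <!ENTITY howto "Sample HOWTO">
]>
<!-- Element names are lowercase, and start and end tags match in case. -->
<article id="sample">
  <title>&howto;</title>
  <sect1 id="intro">
    <!-- Processing instructions end with a question mark. -->
    <?dbhtml filename="foo"?>
    <title>Introduction</title>
    <!-- Attribute values are quoted (single or double quotes both work),
         and the empty xref element carries a trailing slash. -->
    <para>More information is in <xref linkend='details'/>.</para>
  </sect1>
  <sect1 id="details">
    <title>Details</title>
    <!-- Every element is closed explicitly; no tag minimization. -->
    <para>This paragraph opens and closes in the same case.</para>
  </sect1>
</article>
```

Saving a file like this with a .xml extension, as the list recommends, helps tools pick the right parser.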

2.4.2. Differences between DocBook 3.x and DocBook 4.x

The big changes between DocBook 3.x and 4.x involve deprecated tags, changed tags, and new tags. Almost all authors will run into a changed or deprecated tag when moving to DocBook 4.x. All tags that have been deprecated or changed for 4.x are listed in DocBook: The Definitive Guide, published by O'Reilly and Associates.

  • The artheader tag has been changed to articleinfo. Most other header tags have likewise been renamed so that their names end in info.

  • The graphic tag is being deprecated in DocBook 5.x. To prepare for this, you can use the mediaobject tag instead. See Section 7.1 for details on using mediaobject.

  • The file format given in the format attribute of imagedata has to be in capital letters. If you use a lowercase or mixed-case spelling for the file format, validation will fail.

    Valid:

    
<imagedata format="EPS" fileref="foo.eps">
    

    Invalid:

    
<imagedata format="eps" fileref="foo.eps">
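
Putting the 4.x changes above together, a converted article might look roughly like the sketch below. The title, the id value, and the textobject description are hypothetical, and the fragment is written in XML syntax, so the empty imagedata tag carries a trailing slash.

```xml
<article>
  <!-- DocBook 3.x used artheader here. -->
  <articleinfo>
    <title>Sample HOWTO</title>
  </articleinfo>
  <sect1 id="images">
    <title>Images</title>
    <!-- mediaobject takes the place of a bare graphic tag. -->
    <mediaobject>
      <imageobject>
        <!-- The format value is in capital letters. -->
        <imagedata format="EPS" fileref="foo.eps"/>
      </imageobject>
      <textobject>
        <phrase>Description of the image</phrase>
      </textobject>
    </mediaobject>
  </sect1>
</article>
```

The textobject is optional, but it gives text-only output formats something to show in place of the image.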