
******************************************************************************
Extensions of the XML specification
******************************************************************************


==============================================================================
This document
==============================================================================

PXP has some options that extend the XML specification. This document 
explains these options. 

==============================================================================
Optional declarations instead of mandatory declarations
==============================================================================

For a document to be valid, the XML specification requires that all elements, 
notations, and attributes be declared. However, there are situations where a 
different rule would be better: If there is a declaration, the actual instance 
of the element type, notation reference, or attribute must match the pattern 
of the declaration; but if the declaration is missing, a reasonable default 
declaration is assumed.

One example that seems to be typical is the inclusion of HTML in a meta 
language. Imagine you have defined some type of "generator" or other tool 
working with HTML fragments, and your document contains two types of elements: 
the generating elements (with a name like "gen:xxx"), and the object elements, 
which are HTML. As HTML is still evolving, you do not want to declare the HTML 
elements; the HTML fragments should be treated as well-formed XML fragments. 
In contrast, the elements of the generator should be declared and validated, 
because this makes errors easier to detect.

The following two processing instructions can be included in the DTD:

-  
   <?pxp:dtd optional-element-and-notation-declarations?>
   
   References to unknown element types and notations no longer cause an error. 
   Such an element may contain anything, but it must still be well-formed. It 
   may have arbitrary attributes, and every attribute is treated as an 
   #IMPLIED CDATA attribute.
   
-  
   <?pxp:dtd optional-attribute-declarations elements="x y ..."?>
   
   References to unknown attributes inside one of the enumerated elements no 
   longer cause an error. Such an attribute is treated as an #IMPLIED CDATA 
   attribute. 
   If there are several "optional-attribute-declarations" PIs, they are all 
   interpreted (implicitly merged).
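
To make the generator scenario concrete, the following sketch shows how both 
PIs might be combined in an internal DTD subset. The element and attribute 
names (gen:page, gen:include, href, cache) are hypothetical and chosen only 
for illustration. The sketch also assumes that namespace processing is turned 
off, so that "gen:" is simply part of the element name.

```xml
<!DOCTYPE gen:page [
  <!-- Undeclared elements and notations are accepted as well-formed XML. -->
  <?pxp:dtd optional-element-and-notation-declarations?>
  <!-- Undeclared attributes are accepted on the listed elements only. -->
  <?pxp:dtd optional-attribute-declarations elements="gen:page gen:include"?>

  <!-- The (hypothetical) generator elements are declared and validated. -->
  <!ELEMENT gen:page ANY>
  <!ELEMENT gen:include EMPTY>
  <!ATTLIST gen:include href CDATA #REQUIRED>
]>
<gen:page>
  <!-- "href" is validated; "cache" is undeclared, so #IMPLIED CDATA. -->
  <gen:include href="header.html" cache="yes"/>
  <!-- HTML is undeclared: any content and attributes, if well-formed. -->
  <div class="note"><p>Some <b>HTML</b> text</p></div>
</gen:page>
```

Under these assumptions, omitting href on gen:include should still be reported 
as a validation error, because that attribute is declared as #REQUIRED.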
   
==============================================================================
Normalized namespace prefixes
==============================================================================

The XML standard refers to names within namespaces as expanded names. This is 
simply the pair (namespace_uri, localname); the namespace prefix is not 
included in the expanded name.

PXP does not support expanded names directly, but it does support namespaces. 
However, it uses a model that is slightly different from the usual 
representation of names in namespaces: Instead of removing the namespace 
prefixes and converting the names into expanded names, PXP normalizes the 
namespace prefixes used in a document, i.e. the prefixes are transformed such 
that each prefix refers to exactly one namespace.

The following text is well-formed XML: 

<x:a xmlns:x="namespace1">
  <x:a xmlns:x="namespace2">
  </x:a>
</x:a>

The first element has the expanded name (namespace1,a) while the second element 
has the expanded name (namespace2,a); so the elements have different types. As 
already pointed out, PXP does not support expanded names directly (there is 
some support for them in elements, but not in attributes). Instead, the XML 
text is transformed while it is being parsed such that the prefixes become 
unique. In this example, the transformed text would read: 

<x:a xmlns:x="namespace1">
  <x1:a xmlns:x1="namespace2">
  </x1:a>
</x:a>

From a programmer's point of view, this transformation has the advantage that 
you need not deal with pairs when comparing names, as all names are still 
simple strings: here, "x:a" and "x1:a". However, the transformation seems to 
be a bit arbitrary. Why not "y:a" instead of "x1:a"? The answer is that PXP 
allows the programmer to control the transformation: You can simply demand 
that namespace1 must use the normalized prefix "x", and namespace2 must use 
"y". The normalized prefix to use can be set programmatically (by setting the 
namespace_manager object), and it can also be declared in the DTD: 

<?pxp:dtd namespace prefix="x" uri="namespace1"?>
<?pxp:dtd namespace prefix="y" uri="namespace2"?>
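
As an illustration, with these two declarations in effect the nested example 
from the beginning of this section would presumably be normalized as follows, 
so that the inner element gets the prefix "y" instead of a generated one such 
as "x1":

```xml
<x:a xmlns:x="namespace1">
  <y:a xmlns:y="namespace2">
  </y:a>
</x:a>
```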

There is another advantage of using normalized prefixes: You can safely refer 
to them in DTDs. For example, you could declare the two elements as 

<!ELEMENT x:a (y:a)>
<!ELEMENT y:a ANY>

These declarations are applicable even if the XML text uses different prefixes, 
because PXP normalizes any prefixes for namespace1 or namespace2 to the 
preferred prefixes "x" and "y". 
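
For instance, assuming the namespace PIs and the two element declarations 
shown above are in effect, a document might use entirely different prefixes 
(the prefixes "p" and "q" are hypothetical and chosen only for illustration):

```xml
<p:a xmlns:p="namespace1">
  <q:a xmlns:q="namespace2"/>
</p:a>
```

PXP would then be expected to treat this text as if it had been written with 
the preferred prefixes, so that the declarations for x:a and y:a apply:

```xml
<x:a xmlns:x="namespace1">
  <y:a xmlns:y="namespace2"/>
</x:a>
```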


This web site is published by Informatikbüro Gerd Stolpmann