module Pxp_tree_parser:sig
..end
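The functions of this module return the parsed XML text as a tree,
i.e. as Pxp_document.node or Pxp_document.document.
ID indices record the elements that carry an attribute of type ID,
which is needed to check the uniqueness of IDs. Of course, the indices
can also be used to quickly look up such elements.
exception ID_not_unique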
Raised inside Pxp_tree_parser.index to indicate that the same ID is
attached to several nodes.
class type [< clone : 'a; node : 'a Pxp_document.node;
set_node : 'a Pxp_document.node -> unit; .. >
as 'a] index = object
..end
class [< clone : 'a; node : 'a Pxp_document.node;
set_node : 'a Pxp_document.node -> unit; .. >
as 'a] hash_index : object
..end
An implementation of Pxp_tree_parser.index using a hash table.
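To make the idea of an ID index concrete, here is a minimal sketch that uses only the OCaml standard library. It is not the PXP implementation: the class name toy_hash_index and the exception Duplicate_id are hypothetical stand-ins for Pxp_tree_parser.hash_index and ID_not_unique, and the stored values are plain integers instead of document nodes.

```ocaml
(* Hypothetical stand-in for ID_not_unique. *)
exception Duplicate_id

(* A toy index mapping ID strings to values of any type 'node.
   It only illustrates the technique; it is not PXP's class. *)
class ['node] toy_hash_index =
  object
    val table : (string, 'node) Hashtbl.t = Hashtbl.create 16

    (* Register a value under an ID; refuse IDs that are already taken. *)
    method add (id : string) (n : 'node) : unit =
      if Hashtbl.mem table id then raise Duplicate_id;
      Hashtbl.add table id n

    (* Look up the value for an ID; raises Not_found if it is absent. *)
    method find (id : string) : 'node = Hashtbl.find table id
  end

let idx = new toy_hash_index

let () =
  idx#add "a" 1;
  idx#add "b" 2

(* Adding "a" a second time must be rejected. *)
let duplicate_rejected =
  try idx#add "a" 3; false with Duplicate_id -> true
```

The same structure serves both purposes mentioned above: the add method detects duplicate IDs, and the find method gives fast lookup by ID.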
An external entity is a file referenced by another XML text. For example, this document includes "file.xml" as an external entity:
<?xml version="1.0"?>
<!DOCTYPE root [
<!ENTITY extref SYSTEM "file.xml">
]>
<root>
&extref;
</root>
(In contrast to this, an internal entity would give the definition
text immediately, e.g. <!ENTITY intref "This is the entity text">.)
Of course, it does not make sense that the external entity has
another DOCTYPE definition, and hence it is forbidden to use this
feature in "file.xml".
There is no function that parses a file like "file.xml" exactly
as if it were included in a bigger document. The closest behavior is
shown by Pxp_tree_parser.parse_content_entity and
Pxp_tree_parser.parse_wfcontent_entity. They implement the
additional constraint that the file has to have a single top-most element.
The following functions also distinguish between validating and well-formedness mode. In the latter mode, many formal document constraints are not enforced. For instance, elements and attributes need not be declared.
There are, unfortunately, a number of myths about well-formed XML
documents. One says that the declarations are completely
ignored. This is of course not true. For example, the example shown
above includes the external XML entity "file.xml" by reference.
The <!ENTITY> declaration is respected no matter in which mode
the parser is run. Also, it is not true that the presence of
DOCTYPE indicates validating mode and its absence well-formedness
mode. The presence of DOCTYPE is perfectly compatible with
well-formedness mode; the declarations are merely interpreted
in a different way.
If you try to parse a document in validating mode, but the
DOCTYPE is missing, the parser will fail when the root element
is parsed, because its declaration is missing. This conforms to the
XML standard, and also follows the logic that the program calling
the parser is written in the expectation that the parsed file is
validated. If this validation is missing, the program can run into
failed assertions (or worse).
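The difference between the two modes can be tried out with a document that has no DOCTYPE. The following is a sketch only: it assumes the PXP library is installed (for example as the ocamlfind package "pxp") and uses Pxp_types.default_config, Pxp_types.from_string and Pxp_types.string_of_exn from that library. The names text, wf_ok and validating_ok are made up for this illustration.

```ocaml
(* Requires the PXP library; this is a sketch, not part of PXP itself. *)

(* A document without DOCTYPE. *)
let text = "<?xml version=\"1.0\"?><root><item>hello</item></root>"

let config = Pxp_types.default_config

(* Well-formedness mode: missing declarations are not an error. *)
let wf_ok =
  match
    Pxp_tree_parser.parse_wfdocument_entity
      config (Pxp_types.from_string text) Pxp_tree_parser.default_spec
  with
  | doc -> doc#root#node_type = Pxp_document.T_element "root"
  | exception e ->
      prerr_endline (Pxp_types.string_of_exn e);
      false

(* Validating mode: expected to fail at the root element, because
   there is no declaration for it. *)
let validating_ok =
  match
    Pxp_tree_parser.parse_document_entity
      config (Pxp_types.from_string text) Pxp_tree_parser.default_spec
  with
  | _ -> true
  | exception e ->
      prerr_endline (Pxp_types.string_of_exn e);
      false
```

The exact exception raised in validating mode is deliberately not matched here; the sketch only reports it as text.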
val parse_document_entity : ?transform_dtd:(Pxp_dtd.dtd -> Pxp_dtd.dtd) ->
?id_index:(< clone : 'a; node : 'a Pxp_document.node;
set_node : 'a Pxp_document.node -> unit; .. >
as 'a)
index ->
Pxp_types.config ->
Pxp_types.source -> 'a Pxp_document.spec -> 'a Pxp_document.document
If the optional argument transform_dtd is passed, the following
modification applies: After the DTD (both the internal and external
subsets) has been read, the function transform_dtd is called,
and the resulting DTD is actually used to validate the document.
This makes it possible, for example, to check which DTD is used
(e.g. by comparing Pxp_dtd.dtd.id with a list of allowed IDs).
If transform_dtd is missing, the parser behaves in the same way as
if the identity were passed as transform_dtd, i.e. the DTD is left
unmodified.
If the optional argument id_index is present, the parser adds
any ID attribute to the passed index. An index is required to detect
violations of the uniqueness of IDs.
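Both optional arguments can be combined in one call. The sketch below assumes the PXP library is installed; the helpers describe_dtd_id and inspect_dtd, the counter transform_calls and the sample document are hypothetical. The transform_dtd function here only inspects the DTD identifier and returns the DTD unchanged, and the index is afterwards used to look up an element by its ID.

```ocaml
(* Requires the PXP library; names other than the Pxp_* ones are made up. *)

let text =
  "<?xml version=\"1.0\"?>\n\
   <!DOCTYPE list [\n\
   <!ELEMENT list (item*)>\n\
   <!ELEMENT item (#PCDATA)>\n\
   <!ATTLIST item key ID #REQUIRED>\n\
   ]>\n\
   <list><item key=\"a\">first</item><item key=\"b\">second</item></list>"

(* Hypothetical helper: render the identifier of a DTD for logging. *)
let describe_dtd_id (dtd : Pxp_dtd.dtd) : string =
  match dtd#id with
  | None -> "no identifier"
  | Some Pxp_types.Internal -> "internal"
  | Some (Pxp_types.External _) -> "external"
  | Some (Pxp_types.Derived _) -> "derived from an external DTD"

let transform_calls = ref 0

(* Called after the DTD has been read; a real check could reject
   DTDs that are not on a list of allowed identifiers. *)
let inspect_dtd (dtd : Pxp_dtd.dtd) : Pxp_dtd.dtd =
  incr transform_calls;
  print_endline ("DTD in use: " ^ describe_dtd_id dtd);
  dtd

(* Parse in validating mode, filling the index with all ID attributes,
   then return the text of the element whose ID is "b". *)
let lookup_b : string =
  let index = new Pxp_tree_parser.hash_index in
  let _doc =
    Pxp_tree_parser.parse_document_entity
      ~transform_dtd:inspect_dtd
      ~id_index:(index :> _ Pxp_tree_parser.index)
      Pxp_types.default_config
      (Pxp_types.from_string text)
      Pxp_tree_parser.default_spec
  in
  (index#find "b")#data
```

The coercion to Pxp_tree_parser.index is needed because hash_index offers more methods than the index class type requires.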
val parse_wfdocument_entity : ?transform_dtd:(Pxp_dtd.dtd -> Pxp_dtd.dtd) ->
Pxp_types.config ->
Pxp_types.source ->
(< clone : 'a; node : 'a Pxp_document.node;
set_node : 'a Pxp_document.node -> unit; .. >
as 'a)
Pxp_document.spec -> 'a Pxp_document.document
The option transform_dtd works as for parse_document_entity,
but the resulting DTD is not used for validation. It is just
included in the returned document (e.g. useful to get entity
declarations).
val parse_content_entity : ?id_index:(< clone : 'a; node : 'a Pxp_document.node;
set_node : 'a Pxp_document.node -> unit; .. >
as 'a)
index ->
Pxp_types.config ->
Pxp_types.source ->
Pxp_dtd.dtd -> 'a Pxp_document.spec -> 'a Pxp_document.node
The parsed entity must consist of a single element (i.e. something
like <a>...</a>; not a sequence like <a>...</a><b>...</b>). The
element is validated against the passed DTD, but it is not checked
whether the element is the root element specified in the DTD. This
function is almost always the wrong one to call. Consider
Pxp_tree_parser.parse_document_entity instead.
Despite its name, this function cannot parse the content
production defined in the XML specification! This is a misnomer
I'm sorry about. The content production would allow parsing
a list of elements and other node kinds. Also, this function
corresponds to the event entry point `Entry_element_content and
not `Entry_content.
If the optional argument id_index is present, the parser adds
any ID attribute to the passed index. An index is required to detect
violations of the uniqueness of IDs.
val parse_wfcontent_entity : Pxp_types.config ->
Pxp_types.source ->
(< clone : 'a; node : 'a Pxp_document.node;
set_node : 'a Pxp_document.node -> unit; .. >
as 'a)
Pxp_document.spec -> 'a Pxp_document.node
This is the well-formedness counterpart of
Pxp_tree_parser.parse_content_entity; the notes given there apply
here as well.
val default_extension : 'a Pxp_document.node Pxp_document.extension as 'a
val default_spec : ('a Pxp_document.node Pxp_document.extension as 'a) Pxp_document.spec
val default_namespace_spec : ('a Pxp_document.node Pxp_document.extension as 'a) Pxp_document.spec
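As an illustration of how default_spec is typically passed to the parsing functions, the following sketch parses a single element in well-formedness mode. It assumes the PXP library is installed and uses Pxp_types.default_config and Pxp_types.from_string from it; the names fragment and children are made up for this example.

```ocaml
(* Requires the PXP library; a sketch under the assumptions stated above. *)

(* Parse one top-most element; no DTD is needed in well-formedness mode. *)
let fragment =
  Pxp_tree_parser.parse_wfcontent_entity
    Pxp_types.default_config
    (Pxp_types.from_string "<a>one<b>two</b></a>")
    Pxp_tree_parser.default_spec

(* The direct children of <a>: a data node and the element <b>. *)
let children = fragment#sub_nodes

let () =
  List.iter
    (fun n ->
       match n#node_type with
       | Pxp_document.T_element name -> print_endline ("element " ^ name)
       | Pxp_document.T_data -> print_endline ("data " ^ n#data)
       | _ -> print_endline "other node")
    children
```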