3.2. The class type node

Signature: From Pxp_document:


class type [ 'ext ] node =
  object ('self)
    constraint 'ext = 'ext node #extension

    (* General observers *)

    method extension : 'ext
    method dtd : dtd
    method encoding : Pxp_types.rep_encoding
    method parent : 'ext node
    method root : 'ext node
    method sub_nodes : 'ext node list
    method iter_nodes : ('ext node -> unit) -> unit
    method iter_nodes_sibl : 
           ('ext node option -> 'ext node -> 'ext node option -> unit) -> unit
    method previous_node : 'ext node
    method next_node : 'ext node
    method nth_node : int -> 'ext node
    method node_type : node_type
    method node_position : int
    method node_path : int list
    method data : string
    method position : (string * int * int)
    method comment : string option
    method pinstr : string -> proc_instruction list
    method pinstr_names : string list
    method write : Pxp_types.output_stream -> Pxp_types.encoding -> unit

    (* Attribute observers *)

    method attribute : string -> Pxp_types.att_value
    method required_string_attribute : string -> string
    method optional_string_attribute : string -> string option
    method required_list_attribute : string -> string list
    method optional_list_attribute : string -> string list
    method attribute_names : string list
    method attribute_type : string -> Pxp_types.att_type
    method attributes : (string * Pxp_types.att_value) list
    method id_attribute_name : string
    method id_attribute_value : string
    method idref_attribute_names : string list
    method attributes_as_nodes : 'ext node list

    (* Mutating methods *)

    method append_node : 'ext node -> unit
    method insert_nodes : ?pos:int -> 'ext node list -> unit
    method delete : unit
    method delete_nodes : ?pos:int -> ?len:int -> unit -> unit
    method set_nodes : 'ext node list -> unit
    method set_attributes : (string * Pxp_types.att_value) list -> unit
    method set_attribute : (string * Pxp_types.att_value) list -> unit
    method reset_attribute : string -> unit
    method set_comment : string option -> unit
    method set_data : string -> unit
    method add_pinstr : proc_instruction -> unit
    (* DEPRECATED: add_node, quick_set_attributes *)

    (* Cloning methods *)

    method orphaned_clone : 'self
    method orphaned_flat_clone : 'self
    method create_element : 
              ?name_pool_for_attribute_values:Pxp_types.pool ->
              ?position:(string * int * int) ->
              ?valcheck:bool ->
              ?att_values:((string * Pxp_types.att_value) list) ->
              dtd -> node_type -> (string * string) list ->
                  'ext node
    method create_data : dtd -> string -> 'ext node
    (* REMOVED: keep_always_whitespace_mode *)

    (* Validating methods *)

    method classify_data_node : 'ext node -> data_node_classification
    method validate_contents : 
              ?use_dfa:bool -> ?check_data_nodes:bool -> unit -> unit
    method validate_attlist : unit -> unit
    method validate : unit -> unit
    method complement_attlist : unit -> unit

    (* DEPRECATED: local_validate *)


    (* Namespace methods *)

    method normprefix : string
    method localname : string
    method namespace_uri : string
    method namespace_manager : namespace_manager
    method namespace_info : 'ext namespace_info

    (* ... Internal methods are undocumented. *)

  end
;;

3.2.1. The principal structure of document trees

In a document parsed with the default parser settings every node represents either an element or a character data section. There are two classes implementing the two aspects of nodes: element_impl and data_impl. There are configurations which allow more node types to be created, in particular processing instruction nodes, comment nodes, and super root nodes, but these are discussed later. Note that you can always add these extra node types yourself to the node tree no matter what the parser configuration specifies.

The following figure (A tree with element nodes, data nodes, and attributes) shows an example of how a tree is constructed from element and data nodes. The circular areas represent element nodes whereas the ovals denote data nodes. Only elements may have subnodes; data nodes are always leaves of the tree. The subnodes of an element can be either element or data nodes; in both cases the O'Caml objects storing the nodes have the class type node.

Attributes (the clouds in the picture) are not directly integrated into the tree; there is always an extra link to the attribute list. This is also true for processing instructions (not shown in the picture). This means that there are separate access methods for attributes and processing instructions contained in the element node. In particular, you can call the attribute method on an element node to get the value of an attribute, and you can call the pinstr method to get the value of a processing instruction. Note that there are also attribute nodes and processing instruction nodes; these are extra node types modifying the basic model discussed here; see below for details.

Only elements and data sections, and, if configured, processing instructions and comments, can occur in the document tree. It is impossible to add entity references to the tree; if the parser finds such a reference, it includes not the reference as such but the referenced text (i.e. the tree representing the structured text) in the tree.

Note that the parser collapses as much data material as possible into one data node, so that there are normally never two adjacent data nodes. This invariant is enforced even if data material is included by entity references or CDATA sections, or if a data sequence is interrupted by comments. So a &amp; b <!-- comment --> c <![CDATA[ <> d]]> is represented by only one data node, for instance. However, you can create document trees manually which break this invariant; it only describes the way the parser forms the tree.
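
The collapsing rule can be sketched with a small standalone model. This is not PXP code: the type item and the function collapse are hypothetical names introduced only for illustration, and the model assumes that comment nodes are not enabled, so comments simply disappear.

```ocaml
(* Standalone model, not PXP code: a flat sequence of parsed items. *)
type item =
  | Data of string     (* character data, expanded entity text, or CDATA text *)
  | Comment of string  (* dropped here: comment nodes are assumed disabled *)
  | Element of string  (* an element, identified only by its name *)

(* Merge runs of character data into single Data items.  Comments are
   skipped, so data on both sides of a comment ends up in one item. *)
let collapse (items : item list) : item list =
  let buf = Buffer.create 16 in
  (* Emit the pending data, if any, in front of the (reversed) result. *)
  let flush acc =
    if Buffer.length buf = 0 then acc
    else begin
      let s = Buffer.contents buf in
      Buffer.clear buf;
      Data s :: acc
    end
  in
  let acc =
    List.fold_left
      (fun acc it ->
         match it with
         | Data s -> Buffer.add_string buf s; acc
         | Comment _ -> acc
         | Element _ -> it :: flush acc)
      [] items
  in
  List.rev (flush acc)
```

Applied to the pieces of the example above, the model yields a single Data item, while an element in the middle of the sequence still separates the data before it from the data after it.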

The attributes of elements are not part of the document tree, i.e. the sub_nodes method never returns attribute nodes. Normally, attributes are not represented as nodes, but as pairs string * att_value of names and values. Here, att_value is a conventional variant type. There are lots of access methods for attributes, see below. It is optionally possible to wrap the attributes as nodes (method attributes_as_nodes), but even in this case the attributes are outside the regular document tree.

Normally, the processing instructions are also not included into the document tree. They are considered as an extra property of the element containing them, and can be retrieved by the pinstr method of the element node. If this way of handling processing instructions is not exact enough, the parser can optionally create processing instruction nodes that are regular members of the document tree.

The node tree has links in both directions: every node has a link to its parent (if any), and it has links to its subnodes (see figure Nodes are doubly linked trees). This doubly linked structure simplifies navigation in the tree, but it also has some consequences for the possible operations on trees.

Because every node must have at most one parent node, operations are illegal if they violate this condition. The following figure (A node can only be added if it is a root) shows on the left side that node y is added to x as new subnode which is allowed because y does not have a parent yet. The right side of the picture illustrates what would happen if y had a parent node; this is illegal because y would have two parents after the operation.

The "remove" operation simply removes the links between two nodes. In the picture (A removed node becomes the root of the subtree) the node x is deleted from the list of subnodes of y. After that, x becomes the root of the subtree starting at this node.

It is also possible to make a clone of a subtree; this is illustrated in the figure The clone of a subtree. In this case, the clone is a copy of the original subtree except that it is no longer a subnode. Because cloning never keeps the connection to the parent, the clones are called orphaned.
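
The three operations described in this section (add, remove, clone) can be illustrated with a minimal model of a doubly linked tree. This is only a sketch under simplifying assumptions: the record type and the Failure exception are inventions of this example, and only the function names echo the methods append_node, delete, and orphaned_clone of the class type.

```ocaml
(* Standalone model of a doubly linked tree; not PXP's implementation. *)
type node = {
  name : string;
  mutable parent : node option;   (* None for a root *)
  mutable children : node list;
}

let make name = { name; parent = None; children = [] }

(* Add y as the last subnode of x.  This is only legal if y is a root,
   because otherwise y would have two parents afterwards. *)
let append_node x y =
  match y.parent with
  | Some _ -> failwith "append_node: node already has a parent"
  | None ->
      y.parent <- Some x;
      x.children <- x.children @ [y]

(* Remove the links between x and its parent; x becomes the root of
   its own subtree.  Calling this on a root is a no-op in this model. *)
let delete x =
  match x.parent with
  | None -> ()
  | Some p ->
      (* Physical inequality: the structure is cyclic, so (<>) must not be used. *)
      p.children <- List.filter (fun c -> c != x) p.children;
      x.parent <- None

(* Copy the subtree starting at x.  The copy never keeps the link to
   the parent of x, hence "orphaned". *)
let rec orphaned_clone x =
  let copy = make x.name in
  List.iter (fun c -> append_node copy (orphaned_clone c)) x.children;
  copy
```

Note that the model compares nodes by physical identity only; structural comparison would follow the parent links and might not terminate.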

3.2.2. Optional features

Parser configuration: As already pointed out, the parser creates only element and data nodes by default. The configuration of the parser can be controlled by the config record (see the module Pxp_yacc). There are a number of optional features that change the way the document trees are formed.

Note that the parser configuration only controls the parser. If you create trees of your own, you can simply add all the additional node types to the tree without needing to enable these features.

The so-called super root node is an extra node at the top of the parsed tree. Normally, the top node of the tree is the outermost element. By setting the option enable_super_root_node the parser creates trees with an artificial top node, the super root node; the outermost element of the document is one of the children of this node. The other children are the comments and processing instructions at top level (if such node types are to be created, too).

The option enable_comment_nodes lets the parser add comment nodes when it parses comments. By default, the parser behaves as if the comments were non-existent. The contents of comments can be queried using the comment method.

The option enable_pinstr_nodes changes the way processing instructions are added to the document. Instead of appending such instructions to their containing elements as additional properties, this mode forces the parser to create real nodes for them. (...)

By default, the parser does not create data nodes for ignorable whitespace. The XML standard allows elements to contain whitespace characters even if they are declared not to contain character data. Because of this, the parser considers such whitespace an ignorable detail of the XML instance, and drops the characters silently. You can change this by setting drop_ignorable_whitespace to false; in this case, every character of the XML instance will be accepted by the parser and will be added to a data node of the document tree.

By default, the parser creates elements with an annotation about the location in the XML source file. You can query this location by calling the method position. As this requires a lot of memory, it is possible to turn this off by setting store_element_positions to false.
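
The options discussed so far are fields of a record, so a configuration is usually derived from a default value with the record update syntax. The following sketch uses a local stand-in record in order to be self-contained: the type tree_options and the values default_options and keep_everything are hypothetical, and the real config record defined by PXP contains further fields. The defaults are the ones stated in the text above.

```ocaml
(* Local stand-in for the tree-shaping options named above.  The real
   config record is defined by PXP and has more fields than this. *)
type tree_options = {
  enable_super_root_node : bool;
  enable_comment_nodes : bool;
  enable_pinstr_nodes : bool;
  drop_ignorable_whitespace : bool;
  store_element_positions : bool;
}

(* Defaults as described in the text: only element and data nodes,
   ignorable whitespace dropped, element positions stored. *)
let default_options = {
  enable_super_root_node = false;
  enable_comment_nodes = false;
  enable_pinstr_nodes = false;
  drop_ignorable_whitespace = true;
  store_element_positions = true;
}

(* A configuration that keeps comments, processing instructions and
   all whitespace as nodes below a super root node. *)
let keep_everything = {
  default_options with
  enable_super_root_node = true;
  enable_comment_nodes = true;
  enable_pinstr_nodes = true;
  drop_ignorable_whitespace = false;
}
```

Deriving the configuration from a default value has the advantage that the code keeps compiling when new fields are added to the record.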

There are a number of further configuration options; however, these options do not change the structure of the document tree. See XXX.

Optional features of nodes: Another optional feature is the creation of attribute nodes. This may be required if you want to have data structures that contain attributes together with other types of nodes. The method attributes_as_nodes returns the attributes wrapped into node objects. Note that these nodes are read-only.

Validation options: The document nodes contain the routines validating the document body. Of course, the validation checks depend on what is stored in the DTD object. Note that you always need a DTD object, even if you run the parser in well-formedness mode; this mode is simply a property of the DTD object.

In particular, the DTD object contains the declarations of elements, attribute lists, entities, and notations. Furthermore, the DTD knows whether the document is flagged as "standalone". As a real extension to classic XML processing, the DTD may specify a mix mode between "validating mode" and "well-formedness mode". It is possible to allow non-declared elements in the document, but to check declared elements against their declaration at the same time. Moreover, there is a similar feature for attribute lists; you can allow non-declared attributes and check declared attributes. (Well, the whole truth is that the parser always works in this mix mode, and that the "validating mode" and the "well-formedness mode" are only the extremes of the mix mode.)

3.2.3. Namespaces

Since version 1.1, PXP supports namespaces. In order to simplify the handling of namespace-aware documents, PXP applies a transformation to the document which is called "prefix normalization". This transformation ensures that every namespace prefix uniquely identifies a namespace throughout the whole document. For an introduction to this transformation, see XXX.

The important thing here is that the impact of namespaces on the representation of documents is minimal. For elements contained in namespaces, the method node_type will return a type T_element "normprefix:localname" where normprefix is the normalized prefix and localname is the local name of the element type within the namespace. As the normalized prefix identifies the namespace, it is often not necessary to determine the namespace URI of the namespace; the element types are still simple strings which simplifies programming a lot.

The same applies to attributes. Attribute names are now either strings containing a colon like "normprefix:localname", or they are strings without colons. In the first case, the attribute belongs to the namespace identified by the normalized prefix, and in the latter case, the attribute is locally defined.
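
Because the names are plain strings, they can be taken apart with ordinary string functions. The helper split_name below is a hypothetical function written for this example and is not part of PXP; it assumes that the first colon separates the normalized prefix from the local name.

```ocaml
(* Split a name such as "normprefix:localname" at the first colon.
   Names without a colon have no prefix.  Hypothetical helper, not a
   PXP function. *)
let split_name (name : string) : string option * string =
  match String.index_opt name ':' with
  | None -> (None, name)
  | Some i ->
      (Some (String.sub name 0 i),
       String.sub name (i + 1) (String.length name - i - 1))
```

For example, split_name "html:body" separates the normalized prefix "html" from the local name "body", while a locally defined attribute name such as "id" is returned without a prefix.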

Note that the prefix normalization is a parser option which can be switched on by setting enable_namespace_processing to true. If you create XML trees on your own, it is your task to ensure that the prefixes are normalized.

There are special namespace-aware implementations of the node class type which define additional namespace methods like namespace_uri. It is recommended to use these implementations although it is not strictly necessary.

Furthermore, there is a rather experimental feature enable_namespace_info. This makes the parser store information about the namespace declarations into the namespace_info object. However, these data are incompatible with the rest of the document representation, and it is likely that this feature will be modified in the future. (It is possible to let the namespace_info object create namespace nodes for the containing element. Currently, these namespace nodes reflect the declaration of namespaces before the prefix normalization was applied; I am going to change this such that they reflect the declaration after the normalization.)

3.2.4. Types

  • Type: node_type

    Description: This type enumerates the possible node types:

    • T_element name : The node is an element and has element type name

    • T_data : The node is a data node

    • T_super_root : The node is a super root node

    • T_pinstr name : The node contains a processing instruction with target name

    • T_comment : The node is a comment

    • T_attribute name : The node contains an attribute called name

    • T_namespace prefix : The node identifies a namespace for the prefix

    • T_none : This is a "bottom value" used if there is no reasonable type.

  • Type: data_node_classification

    Description: This type enumerates the result values of the method classify_data_node . See the description of this method.

  • Type: att_value

    Description: Enumerates the possible values of an attribute:

    • Value s : The attribute is declared as a non-list type, or the attribute is undeclared; and the attribute is either defined with value "s" , or it is missing but has the default value s .

    • Valuelist [s1;...;sk ]: The attribute is declared as a list type, and the attribute is either defined with value "s1 ... sk" , or it is missing but has the default value "s1 ... sk" . The components of the list must be separated by whitespace.

    • Implied_value : The attribute is declared without default value, and there is no definition for the attribute.

  • Type: 'ext node

    Signature:

    class type 'ext node = object ... end

    Description: This is the common class type of all classes representing nodes.

    Not all classes implement all methods. As the type system of O'Caml demands that there must always be a method definition for all methods of the type, methods will raise the exception Method_not_applicable if they are called on a class not supporting them. The exception Namespace_method_not_applicable is reserved for the special case that a namespace method is invoked on a class that does not support namespaces.
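
The convention of raising Method_not_applicable can be illustrated with a reduced model. Everything in this sketch is defined locally and is an assumption of the example: the class type mini_node, the two classes, the helper try_set, and also the string argument of the exception (taken here to be the name of the method). The read-only attribute class mirrors the statement that attribute nodes cannot be modified.

```ocaml
(* Local stand-ins; the real exception and class type belong to PXP.
   The string payload (the method name) is an assumption of this model. *)
exception Method_not_applicable of string

(* One class type shared by all node classes. *)
class type mini_node = object
  method data : string
  method set_data : string -> unit
end

(* A data node supports both methods. *)
class data_node (init : string) : mini_node = object
  val mutable text = init
  method data = text
  method set_data s = text <- s
end

(* A read-only node must still define set_data to satisfy the class
   type, so the method raises an exception instead. *)
class attribute_node (value : string) : mini_node = object
  method data = value
  method set_data (_ : string) : unit =
    raise (Method_not_applicable "set_data")
end

(* Try to modify a node; report whether the method was applicable. *)
let try_set (n : mini_node) (s : string) : bool =
  try n#set_data s; true with Method_not_applicable _ -> false
```

Code that works on arbitrary nodes can therefore either check node_type first, or catch the exception as try_set does.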

3.2.5. The methods of the class type node

General observers

  • Method: obj # extension

    Description: Returns the extension object of the node object obj .

    Domain: Applicable to element, data, comment, processing instruction, and super root nodes.

  • Method: obj # dtd

    Description: Returns the DTD.

    Domain: All node types. Note (1) that exemplars need not have an associated DTD, in which case this method fails. (2) Even in well-formedness mode every node has a DTD object; this object specifies well-formedness mode.

  • Method: obj # encoding

    Description: Get the encoding which is always the same as the encoding of the DTD. See also method dtd . (Note: This method fails, too, if no DTD is present.)

    Domain: All node types. Note that exemplars need not have an associated DTD, in which case this method fails.

  • Method: obj # parent

    Description: Get the parent node, or raise Not_found if this node is a root node. For attribute and namespace nodes, the parent is artificially defined as the element to which these nodes apply.

    Domain: All node types.

  • Method: obj # root

    Description: Gets the root node of the tree. Every node is contained in a tree with a root, so this method always succeeds. Note that this method searches the root, which costs time proportional to the length of the path to the root.

    Domain: All node types.

  • Method: obj # sub_nodes

    Description: Returns the regular children of the node as a list. Only elements, data nodes, comments, and processing instructions can occur in this list; attributes and namespace nodes are not considered as regular nodes, and super root nodes can only be root nodes and will never be children of another node. The returned list is always empty if obj is a data node, comment, processing instruction, attribute, or namespace.

    Domain: All node types.

  • Method: obj # iter_nodes f

    Description: Iterates over the regular children of obj , and calls the function f for every child ch: f ch . The regular children are the nodes returned by sub_nodes , see there for an explanation.

    Domain: All node types.

    See also: Iterators

  • Method: obj # iter_nodes_sibl f

    Description: Iterates over the regular children of obj , and calls the function f for every child: f pred ch succ .

    • ch is the child

    • pred is None if the child is the first in the list, and Some p otherwise; p is the predecessor of ch

    • succ is None if the child is the last in the list, and Some s otherwise; s is the successor of ch

    The regular children are the nodes returned by sub_nodes , see there for an explanation.

    Domain: All node types.

    See also: Iterators

  • Method: obj # previous_node

    Description: Returns the predecessor of obj in the list of regular children of the parent, or raises Not_found if this node is the first child. This is equivalent to obj # parent # nth_node (obj # node_position - 1) .

    Domain: All node types.

  • Method: obj # next_node

    Description: Returns the successor of obj in the list of regular children of the parent, or raises Not_found if this node is the last child. This is equivalent to obj # parent # nth_node (obj # node_position + 1) .

    Domain: All node types.

  • Method: obj # nth_node n

    Description: Returns the n-th regular child of obj , n >= 0 . Raises Not_found if the index n is out of the valid range.

    Domain: All node types.

  • Method: obj # node_type

    Description: Returns the type of obj :

    • T_element t : The node is an element with type t

    • T_data : The node is a data node

    • T_comment : The node is a comment node

    • T_pinstr n : The node is a processing instruction with target n

    • T_super_root : The node is a super root node

    • T_attribute n : The node is an attribute with name n

    • T_namespace p : The node is a namespace with prefix p

    Domain: All node types.

  • Method: obj # node_position

    Description: Returns the position of obj among all children of the parent node. Positions are counted from 0. There are several cases:

    • The regular nodes get positions from 0 to l-1 where l is the length of the list of regular children.

    • Attribute nodes and namespace nodes are irregular nodes, which means here that their positions are counted separately. All attribute nodes have positions from 0 to m-1; all namespace nodes have positions from 0 to n-1.

    • If obj is a root, this method raises Not_found

    Domain: All node types.

  • Method: obj # node_path

    Description: Returns the list of node positions describing the location of this node in the whole tree. The list describes the path from the root node down to this node; the first path element is the index of the child of the root, the second path element is the index of the child of the child, and so on, and the last path element is the index of this node. The method returns [ ] if this node is the root node.

    Attribute and namespace nodes are not part of the regular tree, so there is a special rule for them. Attribute nodes of an element node x have the node path x # node_path @ [-1; p ] where p is the position of the attribute node. Namespace nodes of an element node x have the node path x # node_path @ [-2; p ] where p is the position of the namespace node. (This definition respects the document order.)

    Domain: All node types.

  • Method: obj # data

    Description: This method returns what is considered as the data of the node which depends on the node type:

    • Data nodes: the method returns the character string the node represents

    • Element nodes, super root nodes: the method returns the concatenated character strings of all (direct or indirect) data nodes below obj

    • Comment nodes: the method returns the comment string (without delimiters), or it raises Not_found if the comment string is not set

    • Processing instructions: the method returns the data part of the instruction, or "" if the data part is missing

    • Attribute nodes: the method returns the attribute value as string, or it raises Not_found if the attribute is implied.

    • Namespace nodes: the method returns the namespace URI

    Domain: All node types.

  • Method: obj # position

    Description: Returns a triple (entity,line,pos) describing the location of the element in the original XML text. This triple is only available for elements, and only if the parser has been configured to store positions (see parser option store_element_positions ). If available, entity describes the entity where the element occurred, line is the line number >= 1 , and pos is the byte position of the first character of the element in the line.

    If unavailable, the method will return the triple ("?",0,0) .

    Domain: All node types. Note that the method will always return ("?",0,0) for non-element nodes.

  • Method: obj # comment

    Description: Returns Some text if the node is a comment node and if text is the comment string (without the delimiters <!-- and --> ). Otherwise, None is passed back.

    Note: The data method also returns the comment string, but it raises Not_found if the string is not available.

    Domain: All node types. Note that the method will always return None for non-comment nodes.

  • Method: obj # pinstr n

    Description: Returns all processing instructions that are directly contained in obj and that have a target specification of n .

    Domain: All node types. However, this method is only reasonable for processing instruction nodes, and for elements; for all other node types the method will return the empty list. Note that the parser can be configured to create processing instruction nodes or not; if it creates them, only the processing instruction nodes contain processing instructions, otherwise only the elements enclosing the instructions contain them.

  • Method: obj # pinstr_names

    Description: Returns the targets of all processing instructions that are directly contained in obj .

    Domain: All node types. However, this method is only reasonable for processing instruction nodes, and for elements; for all other node types the method will return the empty list. Note that the parser can be configured to create processing instruction nodes or not; if it creates them, only the processing instruction nodes contain processing instructions, otherwise only the elements enclosing the instructions contain them.

  • Method: obj # write ~prefixes stream enc

    Description: Write the contents of this node and the subtrees to the passed stream encoded as enc . The generated output is again XML. The output style is rather compact and should not be considered as "pretty printing".

    Option ~prefixes : The class namespace_element_impl interprets this option and passes it recursively to subordinate invocations of write . The meaning is that the normprefixes enumerated by this list have already been declared by surrounding elements. The option defaults to [] forcing the method to output all necessary prefix declarations.

    Option ~default : Specifies the normprefix that becomes the default namespace in the output.

    KNOWN BUG: comment nodes are not printed.

    Domain: All regular node types (elements, data nodes, comments, processing instructions, super root nodes).
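
The definitions of node_position and node_path given above can be made concrete with a small model tree. The record type pnode and the functions below are written for this example only and do not reflect how PXP stores nodes; attribute_path merely applies the special rule for attribute nodes to a given element and position.

```ocaml
(* Standalone model of the regular tree; not PXP's data structure. *)
type pnode = {
  label : string;
  mutable parent : pnode option;
  mutable children : pnode list;
}

(* Create a node and link the given children to it. *)
let make label kids =
  let n = { label; parent = None; children = kids } in
  List.iter (fun k -> k.parent <- Some n) kids;
  n

(* Index of n among the children of its parent, counted from 0.
   Raises Not_found if n is a root. *)
let node_position n =
  match n.parent with
  | None -> raise Not_found
  | Some p ->
      let rec find i = function
        | [] -> raise Not_found
        | c :: rest -> if c == n then i else find (i + 1) rest
      in
      find 0 p.children

(* Positions on the way from the root down to n; [] for the root. *)
let rec node_path n =
  match n.parent with
  | None -> []
  | Some p -> node_path p @ [node_position n]

(* Path of the p-th attribute node of element x: the path of x
   followed by the marker -1 and the attribute position. *)
let attribute_path x p = node_path x @ [-1; p]
```

Because -1 is smaller than every regular position, comparing such paths lexicographically places the attribute nodes of an element before its regular children.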

Attribute observers

  • Method: obj # attribute name

    Description: Returns the value of the attribute name .

    If the parser is in validating mode, the method is able to return values for declared attributes, and it raises Not_found for any undeclared attribute. Note that it even returns a value if the attribute is actually missing but is declared as #IMPLIED or has a default value.

    If the parser (more precisely, the DTD object) is in well-formedness mode, the method is able to return values for defined attributes, and it raises Not_found for any unknown attribute name.

    Possible return values are:

    • Implied_value : The attribute has been declared with the keyword #IMPLIED , and the attribute definition is missing in the attribute list of the element.

    • Value s : The attribute has been declared as type CDATA , as ID , as IDREF , as ENTITY , or as NMTOKEN , or as enumeration or notation, and one of the two conditions holds: (1) The attribute value is defined in the attribute list in which case this value is returned in the string s . (2) The attribute has been omitted, and the DTD declares the attribute with a default value. The default value is returned in s .

      Summarized, Value s is returned for non-implied, non-list attribute values.

      Furthermore, Value s is returned for non-declared attributes if the DTD object allows this, for instance, if the DTD object specifies well-formedness mode.

    • Valuelist l : The attribute has been declared as type IDREFS , as ENTITIES , or NMTOKENS , and one of the two conditions holds: (1) The attribute value is defined in the attribute list in which case the space-separated tokens of the value are returned in the string list l . (2) The attribute has been omitted, and the DTD declares the attribute with a default value. The default value is returned in l .

      Summarized, Valuelist l is returned for all list-type attribute values.

    Note that before the attribute value is returned, the value is normalized. This means that newlines are converted to spaces, and that references to character entities (i.e. &#n; ) and general entities (i.e. &name; ) are expanded; if necessary, the expansion is performed recursively.

    Domain: All node types. However, only elements and attribute nodes will return values, all other node types always raise Not_found .

  • Method: obj # required_string_attribute name

    Description: Returns the value of the attribute name as string, i.e. if the value of the attribute is Value s , this method will return simply s , and if the value is Valuelist l , this method will return the elements of l separated by spaces. If the attribute value is Implied_value , the method will fail.

    Domain: All node types. However, only elements and attribute nodes will return values, all other node types always fail.

  • Method: obj # optional_string_attribute name

    Description: Returns the value of the attribute name as optional string, i.e. if the value of the attribute is Value s , this method will return Some s , and if the value is Valuelist l , this method will return Some s where s consists of the concatenated elements of l separated by spaces. If the attribute value is Implied_value , the method will return None .

    Domain: All node types. However, only elements and attribute nodes will return Some values, all other node types always return None .

  • Method: obj # required_list_attribute name

    Description: Returns the value of the attribute name as string list, i.e. if the value of the attribute is Valuelist l , this method will return simply l , and if the value is Value s , this method will return the one-element list [s ]. If the attribute value is Implied_value , the method will fail.

    Domain: All node types. However, only elements and attribute nodes will return values, all other node types always fail.

  • Method: obj # optional_list_attribute name

    Description: Returns the value of the attribute name as string list, i.e. if the value of the attribute is Valuelist l , this method will return simply l , and if the value is Value s , this method will return the one-element list [s ]. If the attribute value is Implied_value , the method will return the empty list [ ].

    Domain: All node types. However, only elements and attribute nodes will return non-empty values, all other node types always return the empty list.

  • Method: obj # attribute_names

    Description: Returns the list of all attribute names of this element. In validating mode, this list is simply the list of declared attributes. In well-formedness mode, this list is the list of defined attributes.

    Domain: All node types. However, only elements and attribute nodes will return a non-empty list, all other node types always return the empty list.

  • Method: obj # attribute_type name

    Description: Returns the type of the attribute name . If the attribute is declared, the declared type is returned. If the attribute is defined but undeclared, the type A_cdata will be returned. (The module Pxp_types contains the Caml type of attribute types.) This method raises Not_found if the attribute is unknown.

    Domain: All node types. However, only elements and attribute nodes will return values, all other node types always raise Not_found .

  • Method: obj # attributes

    Description: Returns the list of (name,value) pairs describing all attributes (declared attributes plus defined attributes).

    Domain: All node types. However, only elements and attribute nodes will return non-empty values, all other node types always return the empty list.

  • Method: obj # id_attribute_name

    Description: Returns the name of the (at most one) attribute being declared as type ID . The method raises Not_found if there is no declared ID attribute for the element type.

    Domain: All node types. However, only elements and attribute nodes will return names, all other node types always raise Not_found .

  • Method: obj # id_attribute_value

    Description: Returns the string value of the (at most one) attribute being declared as type ID . The method raises Not_found if there is no declared ID attribute for the element type.

    Domain: All node types. However, only elements and attribute nodes will return values; all other node types always raise Not_found .

  • Method: obj # idref_attribute_names

    Description: Returns the names of the attributes being declared as type IDREF or IDREFS .

    Domain: All node types. However, only elements and attribute nodes will return names, all other node types always return the empty list.

  • Method: obj # attributes_as_nodes

    Description: Returns all attributes (i.e. declared plus defined attributes) as a list of attribute nodes with node type T_attribute name .

    This method should be used if typing requires that the attributes also have the type node . A common example is sets that may contain both elements and attributes, as used in the XPath language.

    The attribute nodes are read-only; any call to a method modifying their contents will raise Method_not_applicable . In order to get the value of such an attribute node anode , one can invoke the method attribute :

    anode # attribute name

    where name is the name of the attribute represented by anode . This will return the attribute value as att_value . Of course, the other attribute observers can be applied as well. Furthermore, the method data will return the attribute value as string. However, every attribute node only contains the value of the one attribute it represents, and it does not make sense to pass names of other attributes to the observer methods.

    The attribute nodes live outside of the regular XML tree, and they are not considered as children of the element node. However, the element node is the parent node of the attribute nodes (i.e. the children/parent relationship is asymmetric).

    The method attributes_as_nodes computes the list of attribute nodes when it is first invoked, and it will return the same list again in subsequent invocations.

    Domain: This method is only applicable to elements.

Mutating methods

The following methods may be used to modify the XML tree. These methods do not validate the changes, and the result may be invalid.

  • Method: obj # append_node n

    Description: Adds the node n to the list of children of obj . The method expects that n is a root, and it requires that n and obj share the same DTD.

    Note: This method does not check whether the modified XML tree is still valid.

    Domain: This method is only applicable to element nodes.

  • Method: obj # insert_nodes ~pos nl

    Description: Inserts the list of nodes nl in-place into the list of children of obj . The insertion is performed at position pos , i.e. in the modified list of children, the first element of nl will have position pos . If the optional argument pos is not passed to the method, the list nl is appended to the list of children.

    The method requires that all elements of the list nl are roots, and that all elements and obj share the same DTD.

    Note: This method does not check whether the modified XML tree is still valid.

    Domain: Elements.
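
    The position rule of insert_nodes can be modelled on plain lists. The following is a minimal sketch, not PXP code: insert_at and split_at are hypothetical helpers, children are represented by an ordinary list, and raising Invalid_argument for an out-of-range position is an assumption of the model, not documented behaviour of insert_nodes.

    ```ocaml
    (* Model only: the children are a plain list, not PXP nodes. *)

    (* Split l into the first p elements and the rest. *)
    let rec split_at p l =
      if p = 0 then ([], l)
      else
        match l with
        | [] -> invalid_arg "split_at: position out of range"
        | x :: rest ->
            let (before, after) = split_at (p - 1) rest in
            (x :: before, after)

    (* Insert nl so that its first element ends up at position pos.
       Without ~pos, nl is appended to the children. *)
    let insert_at ?pos nl children =
      match pos with
      | None -> children @ nl
      | Some p ->
          if p < 0 then invalid_arg "insert_at: negative position";
          let (before, after) = split_at p children in
          before @ nl @ after
    ```

    With children ["a"; "b"; "c"], inserting ["x"; "y"] at position 1 places "x" at index 1 of the result, which is the property stated in the description.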

  • Method: obj # remove ()

    Description: Removes obj from the tree. After this operation, obj is no longer a child of its former parent node: it no longer occurs in the former parent's list of children, and the former parent is no longer the parent of obj . The node obj becomes orphaned.

    If obj is already a root, remove does nothing.

    Note: This method does not check whether the modified XML tree is still valid.

    Domain: Elements, comments, processing instructions, data nodes, super root nodes.

  • Method: obj # remove_nodes ~pos ~len ()

    Description: Removes the specified nodes from the list of children of obj . The method deletes the nodes from position pos to pos+len-1 . The optional argument pos defaults to 0. The optional argument len defaults to the length of the children list.

    Note: This method does not check whether the modified XML tree is still valid.

    Domain: Elements.

  • Method: obj # set_nodes l

    Description: Sets the list of children to l . It is required that every member of l is either a root or was already a child of this node before the method call, and it is required that all members and the current object share the same DTD.

    Former children which are not members of l are removed from the tree and get orphaned (see method remove ).

    Note: This method does not check whether the modified XML tree is still valid.

    Domain: Elements.

  • Method: obj # set_attributes al

    Description: Sets the attributes of this element to al .

    Note that this method does not add missing attributes that are declared in the DTD. It also never rejects undeclared attributes. The passed values are not checked.

    Note: This method does not check whether the modified XML tree is still valid.

    Domain: Elements.

  • Method: obj # set_attribute ~force n v

    Description: Sets the attribute n of this element to the value v . By default, it is required that the attribute n already has a value. If you pass ~force:true, the attribute is added to the attribute list if it is missing.

    Note: This method does not check whether the modified XML tree is still valid.

    Domain: Elements.

  • Method: obj # reset_attribute n

    Description: If the attribute n is a declared attribute, it is set to its default value, or to Implied_value if there is no default (the latter is performed even if the attribute is #REQUIRED ). If the attribute is an undeclared attribute, it is removed from the attribute list.

    The idea of this method is to simulate what had happened if n had not been defined in the attribute list of the XML element. In validating mode, the parser would have chosen the default value if possible, or Implied_value otherwise, and in well-formedness mode, the attribute would be simply missing in the attribute list.

    Note: It is intentionally not possible to remove a declared attribute. (However, you can remove it by calling set_attributes, but this would be very inefficient.)

    Note: This method does not check whether the modified XML tree is still valid.

    Domain: Elements.
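
    The three cases of reset_attribute can be sketched as a function on association lists. This is a model under stated assumptions, not the PXP implementation: att_value is re-declared locally so that the sketch compiles without PXP, and the declarations list is a hypothetical stand-in for the DTD that maps each declared attribute name to its optional default value.

    ```ocaml
    (* Local copy of the attribute value type, for a self-contained sketch. *)
    type att_value =
      | Value of string
      | Valuelist of string list
      | Implied_value

    (* Hypothetical stand-in for the DTD: declared names with optional defaults. *)
    type declarations = (string * att_value option) list

    let reset_attribute (decls : declarations) name atts =
      let others = List.remove_assoc name atts in
      match List.assoc_opt name decls with
      | Some (Some default) -> (name, default) :: others   (* declared, default *)
      | Some None -> (name, Implied_value) :: others       (* declared, no default *)
      | None -> others                                     (* undeclared: removed *)
    ```

    Note that the model moves the reset attribute to the front of the list; the order of attributes is not meant to be significant here.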

  • Method: obj # set_comment c

    Description: Sets the comment string contained in comment nodes, if c = Some s . Otherwise, this method removes the comment string ( c = None ).

    Note that the comment string must not include the delimiters <!-- and --> . Furthermore, it must not contain any character or character sequence that is forbidden in comments, such as "--" . However, this method does not check this condition.

    Domain: Comment nodes.

  • Method: obj # set_data s

    Description: This method sets the character string contained in data nodes.

    Note: This method does not check whether the modified XML tree is still valid.

    Domain: Data nodes.

  • Method: obj # add_pinstr pi

    Description: Adds the processing instruction pi to the set of processing instructions contained in obj . If obj is an element node, you can add any number of processing instructions. If obj is a processing instruction node, you can put at most one processing instruction into this node.

    Domain: Elements, and processing instruction nodes.

Cloning methods

  • Method: obj # orphaned_clone

    Description: Returns a clone of the node and the complete tree below this node (deep clone). The clone does not have a parent (i.e. the reference to the parent node is not cloned). Strings are not copied while the subtree is cloned; normally the original tree and the copy share strings. Extension objects are cloned by invoking the clone method on the original objects; how much of the extension objects is cloned depends on the implementation of this method.

    Domain: All node types.

  • Method: obj # orphaned_flat_clone

    Description: Returns a clone of this element where all subnodes are omitted. The type of the node and the attributes are the same as in the original node. The clone has no parent.

    Domain: All node types.

  • Method: obj # create_element ~name_pool_for_attribute_values ~position ~valcheck ~att_values dtd ntype att_list

    Description: Returns a flat copy of this element node with the following modifications:

    • The DTD is set to dtd

    • The node type is set to ntype (which must be T_element name )

    • The attribute list is set to the concatenation of att_list and att_values ; att_list passes attribute values as strings while att_values passes attribute values as type att_value

    • The copy does not have children nor a parent

    • The copy does not contain processing instructions.

    • The position triple is set to position

    Note that the extension object is copied, too.

    If valcheck = true (the default), it is checked whether the element type exists and whether the passed attributes match the declared attribute list. Missing attributes are automatically added, if possible. If valcheck = false , any element type and any attributes are accepted.

    If a name_pool_for_attribute_values is passed, the attribute values in att_list are put into this pool.

    The optional arguments have the following defaults:

    • ~name_pool_for_attribute_values : No pool is used

    • ~position : The position is not available in the copy

    • ~valcheck : true

    • ~att_values : empty

    Domain: Elements.

  • Method: obj # create_data dtd cdata

    Description: Returns a flat copy of this data node with the following modifications:

    • The DTD is set to dtd

    • The character string is set to cdata

    Note that the extension object is copied, too.

    Domain: Data nodes.

  • Method: obj # create_other ~position dtd ntype

    Description: Returns a flat copy of this node with the following modification:

    • The DTD is set to dtd

    • The position triple is set to position

    Note that the extension object is copied, too.

    The passed node type ntype must match the node type of obj .

    Domain: Super root nodes, processing instruction nodes, comment nodes

Validating methods

The following methods validate nodes.

  • Method: obj # classify_data_node n

    Description: Classifies the passed data node n , and returns whether it is reasonable to append the data node to the list of subnodes (using append_node ). The following return values are possible:

    • CD_normal : Adding n does not violate any validation constraint

    • CD_other : n is not a data node

    • CD_empty : The element obj is declared as EMPTY , and n contains the empty string. It is allowed to append n but it does not make sense

    • CD_ignorable : The element obj is declared such that it is forbidden to put character data into it. However, the node n only contains white space which is allowed as an exception to this rule. This means that it is allowed to append n but n would not contain any information except formatting hints.

    • CD_error e : It is an error to append n . The exception e , usually a Validation_error , contains details about the problem.

    Note that the method always returns and never raises an exception.

    Domain: Elements.
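
    Because classify_data_node always returns a value, the caller typically dispatches on the result. The sketch below shows one possible policy. The classification type is re-declared locally so that the code compiles without PXP, and the decision type, the function decide, and the choice to drop empty and ignorable nodes are assumptions made for illustration only.

    ```ocaml
    (* Local copy of the classification constructors listed above. *)
    type data_node_classification =
      | CD_normal
      | CD_other
      | CD_empty
      | CD_ignorable
      | CD_error of exn

    (* Hypothetical outcome of the policy. *)
    type decision =
      | Append            (* append the node to the element *)
      | Drop              (* allowed, but the node carries no information *)
      | Reject of string  (* appending would be an error *)

    let decide = function
      | CD_normal -> Append
      | CD_empty | CD_ignorable -> Drop
      | CD_other -> Reject "not a data node"
      | CD_error e -> Reject (Printexc.to_string e)
    ```

    An application that wants to preserve formatting white space would map CD_ignorable to Append instead; both choices are permitted by the description.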

  • Method: obj # validate_contents ?use_dfa ?check_data_nodes ()

    Description: Checks that the subnodes of this element match the declared content model of this element. The method returns () if the element is okay, and it raises an exception if an error is found (in most cases Validation_error ).

    This check is always performed by the parser, so software that only reads parsed XML trees need not call this method. However, if software modifies the tree itself, an invocation of this method ensures that the validation constraints about content models are fulfilled.

    Note that the check is not performed recursively.

    • Option ~use_dfa : If true, the deterministic finite automaton of regexp content models is used for validation, if available. Defaults to false.

    • Option ~check_data_nodes : If true, it is checked whether data nodes only occur at valid positions. If false, these checks are left out. Defaults to true. (Usually, the parser turns this feature off because the parser already performs a similar check.)

      See classify_data_node for details about what is checked.

    In previous releases of PXP, this method was called local_validate .

    Domain: All node types. However, there are only real checks for elements; for other nodes, this method is a no-op.

  • Method: obj # validate_attlist ()

    Description: Checks whether the attribute list of the element obj matches the declared attribute list. The method returns () if the attribute list is formed correctly, and it raises an exception (usually a Validation_error ) if there is an error.

    This check is implicitly performed by create_element unless the option ~valcheck:false has been passed. This means that it is usually not necessary to call this method; however, if the attribute list has been changed by set_attributes or if ~valcheck:false is in effect, the invocation of this method ensures the validity of the attribute list.

    Note that the method complains about missing attributes even if these attributes have been declared with a default value or as being #IMPLIED ; this method only checks the attributes but does not modify the attribute list. If you know that attributes are missing and you want to add them automatically just as create_element does, you can call complement_attlist before doing this check.

    Domain: All node types. However, for non-element nodes this check is a no-op.

  • Method: obj # validate ()

    Description: Calls validate_contents and validate_attlist , and ensures that this element is locally valid. The method returns () if the element is valid, and raises an exception otherwise.

    Domain: All node types. However, for non-element nodes this check is a no-op.

  • Method: obj # complement_attlist ()

    Description: Adds attributes that are declared in the DTD but are currently missing: #IMPLIED attributes are added with Implied_value , and if there is a default value for an attribute, this value is added. #REQUIRED attributes are set to Implied_value , too.

    It is only necessary to call this method if the element is created with ~valcheck:false, or the attribute list has been modified, and the element must be validated.

    Domain: Elements.
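
    The completion rule can be modelled on association lists. This is a sketch under stated assumptions, not PXP code: att_value is re-declared locally, and decl_default is a hypothetical, simplified stand-in for the default declarations stored in the DTD.

    ```ocaml
    (* Local copy of the attribute value type, for a self-contained sketch. *)
    type att_value =
      | Value of string
      | Valuelist of string list
      | Implied_value

    (* Hypothetical, simplified view of an attribute's default declaration. *)
    type decl_default =
      | Required
      | Implied
      | Default of att_value

    (* Add every declared attribute that is missing from atts. *)
    let complement decls atts =
      let missing =
        List.filter (fun (name, _) -> not (List.mem_assoc name atts)) decls in
      let added =
        List.map
          (fun (name, d) ->
             match d with
             | Default v -> (name, v)
             | Implied | Required -> (name, Implied_value))
          missing in
      atts @ added
    ```

    In the model, a missing required attribute is completed with Implied_value, as the description states; a subsequent validation step is still expected to report it.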

Namespace methods

The following methods are about namespaces. They are only available if the class supports namespaces; otherwise you get a Namespace_method_not_applicable exception. See XXX.

  • Method: obj # normprefix

    Description: For namespace-aware implementations of the node class, this method returns the normalized prefix of the element or attribute. If the object does not have a prefix, "" will be passed back.

    The normalized prefix is the part of the name before the colon. It is normalized because the parser ensures that every prefix corresponds only to one namespace. Note that the prefix can be different than in the parsed XML source because the normalization step needs to change the prefix to avoid prefix conflicts.

    Domain: Elements and attributes supporting namespaces.

  • Method: obj # localname

    Description: For namespace-aware implementations of the node class, this method returns the local part of the name of the element or attribute.

    The local name is the part of the name after the colon, or the whole name if there is no colon.

    Domain: Elements and attributes supporting namespaces.

  • Method: obj # namespace_uri

    Description: For namespace-aware implementations of the node class, this method returns the namespace URI of the element, attribute or namespace. It is required that a namespace manager is available.

    If the node does not have a namespace prefix, and there is no default namespace, this method returns "".

    The namespace URI is the unique name of the namespace.

    Domain: Elements and attributes supporting namespaces; furthermore namespace nodes.

  • Method: obj # namespace_manager

    Description: For namespace-aware implementations of the node class, this method returns the namespace manager. If the namespace manager has not been set, the exception Not_found is raised.

    The namespace manager is an object that holds the mapping from namespace prefixes to namespace URIs, and vice versa. It is contained in the DTD.

    Domain: Elements and attributes supporting namespaces; furthermore namespace nodes.

  • Method: obj # namespace_info

    Description: Returns additional information about the namespace prefixes in the parsed XML source. This method has been added for better XPath conformance. Note that it is still experimental and it is likely that it will be changed.

    This record is only available if the parser has been configured to support namespaces, and if the parser has been configured to set this record (requires a lot of memory). Furthermore, only the implementation namespace_element_impl supports this method.

    This method raises Not_found if the namespace_info field has not been set.

    Domain: Elements supporting namespaces.
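
The relationship between a full name, its prefix, and its local name, as described for normprefix and localname, can be sketched on plain strings. The function split_name is a hypothetical helper written for this illustration; it is not part of PXP and performs no prefix normalization.

```ocaml
(* Split "prefix:local" at the first colon.
   A name without a colon has the empty prefix "". *)
let split_name name =
  match String.index_opt name ':' with
  | None -> ("", name)
  | Some i ->
      let prefix = String.sub name 0 i in
      let local = String.sub name (i + 1) (String.length name - i - 1) in
      (prefix, local)
```

For a node obj of a namespace-aware class, the pair (obj # normprefix, obj # localname) is expected to correspond to this split applied to the normalized element name.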

3.2.6. The class element_impl

Class: 'ext element_impl

Description: This class is an implementation of node which realizes element nodes. You can create a new object by

let exemplar = new element_impl ext_obj

which creates a special form of empty element which already contains a reference to the ext_obj , but is otherwise empty. This special form is called an element exemplar. In order to get a working element that can be used in a node tree it is required to apply the method create_element on the exemplar object.

Note that the class element_impl is not namespace-aware.

Example

First, create an exemplar by

let exemplar = new element_impl ext_obj in

The exemplar is not used in node trees, but only as a pattern when the element nodes are created:

let element = exemplar # create_element dtd (T_element name) attlist

The element is a copy of exemplar (even the extension ext_obj has been copied), which ensures that element and its extension are objects of the same classes as the exemplar and its extension; note that you need not pass a class name or other meta information. The copy is initially connected with the dtd, it gets a node type, and the attribute list is filled. The element is now fully functional; it can be added to another element as child, and it can contain references to subnodes.

3.2.7. The class data_impl

Class: 'ext data_impl

Description: This class is an implementation of node which realizes data nodes. You can create a new object by

let exemplar = new data_impl ext_obj

which creates a special form of empty data node which already contains a reference to the ext_obj , but is otherwise empty. This special form is called a data exemplar. In order to get a working data node that can be used in a node tree it is required to apply the method create_data on the exemplar object.

Example

First, create an exemplar by

let exemplar = new data_impl ext_obj in

The exemplar is not used in node trees, but only as a pattern when the data nodes are created:

let data_node = exemplar # create_data dtd "The characters contained in the data node"

The data_node is a copy of exemplar. The copy is initially connected with the dtd, and it is filled with character material. The data_node is now fully functional; it can be added to an element as child.

3.2.8. The classes super_root_impl, pinstr_impl, and comment_impl

Class: 'ext super_root_impl

Description: This class is an implementation of node which realizes super root nodes. You can create a new object by

let exemplar = new super_root_impl ext_obj

which creates a special form of empty super root which already contains a reference to the ext_obj , but is otherwise empty. This special form is called a super root exemplar. In order to get a working node that can be used in a node tree it is required to apply the method create_other on the exemplar object, e.g.

let root = exemplar # create_other dtd

Class: 'ext pinstr_impl

Description: This class is an implementation of node which realizes processing instruction nodes. You can create a new object by

let exemplar = new pinstr_impl ext_obj

which creates a special form of empty node which already contains a reference to the ext_obj , but is otherwise empty. This special form is called a processing instruction exemplar. In order to get a working node that can be used in a node tree it is required to apply the method create_other on the exemplar object, e.g.

let pi = exemplar # create_other dtd

Class: 'ext comment_impl

Description: This class is an implementation of node which realizes comment nodes. You can create a new object by

let exemplar = new comment_impl ext_obj

which creates a special form of empty comment node which already contains a reference to the ext_obj , but is otherwise empty. This special form is called a comment exemplar. In order to get a working node that can be used in a node tree it is required to apply the method create_other on the exemplar object, e.g.

let comment = exemplar # create_other dtd

3.2.9. Examples: Creating and accessing nodes

Building trees. Here is the piece of code that creates the tree of the figure A tree with element nodes, data nodes, and attributes. The extension object and the DTD are beyond the scope of this example.

let exemplar_ext = ... (* some extension *) in
let dtd = ... (* some DTD *) in

let element_exemplar = new element_impl exemplar_ext in
let data_exemplar    = new data_impl    exemplar_ext in

let a1 = element_exemplar # create_element dtd (T_element "a") ["att", "apple"]
and b1 = element_exemplar # create_element dtd (T_element "b") []
and c1 = element_exemplar # create_element dtd (T_element "c") []
and a2 = element_exemplar # create_element dtd (T_element "a") ["att", "orange"]
in

let cherries = data_exemplar # create_data dtd "Cherries" in
let orange   = data_exemplar # create_data dtd "An orange" in

a1 # append_node b1;
a1 # append_node c1;
b1 # append_node a2;
b1 # append_node cherries;
a2 # append_node orange;

Alternatively, the last block of statements could also be written as:

a1 # set_nodes [b1; c1];
b1 # set_nodes [a2; cherries];
a2 # set_nodes [orange];

The root of the tree is a1, i.e. it is true that

x # root == a1

for every x from { a1, a2, b1, c1, cherries, orange }.

Furthermore, the following properties hold:

  a1 # attribute "att" = Value "apple"
& a2 # attribute "att" = Value "orange"

& cherries # data = "Cherries"
&   orange # data = "An orange"
&       a1 # data = "An orangeCherries"

&       a1 # node_type = T_element "a"
&       a2 # node_type = T_element "a"
&       b1 # node_type = T_element "b"
&       c1 # node_type = T_element "c"
& cherries # node_type = T_data
&   orange # node_type = T_data

&       a1 # sub_nodes = [ b1; c1 ]
&       a2 # sub_nodes = [ orange ]
&       b1 # sub_nodes = [ a2; cherries ]
&       c1 # sub_nodes = []
& cherries # sub_nodes = []
&   orange # sub_nodes = []

&       a2 # parent == b1
&       b1 # parent == a1
&       c1 # parent == a1
& cherries # parent == b1
&   orange # parent == a2

Searching nodes. The following function searches all nodes of a tree for which a certain condition holds:

let rec search p t =
  if p t then
    t :: search_list p (t # sub_nodes)
  else
    search_list p (t # sub_nodes)

and search_list p l =
  match l with
    []      -> []
  | t :: l' -> (search p t) @ (search_list p l')
;;

For example, if you want to search all elements of a certain type et, the function search can be applied as follows:

let search_element_type et t =
  search (fun x -> x # node_type = T_element et) t
;;
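
The function search only uses the method sub_nodes, so its behaviour can be tried out without PXP. The following sketch applies it to toy objects; the type toy and the constructor function toy are hypothetical stand-ins for real nodes, and the method name replaces the node_type test of the example above.

```ocaml
(* Hypothetical stand-in for PXP nodes: a name and a list of children. *)
type toy = < name : string; sub_nodes : toy list >

let toy name children : toy =
  object
    method name = name
    method sub_nodes = children
  end

(* The search function from the text, unchanged apart from layout. *)
let rec search p t =
  if p t then t :: search_list p (t # sub_nodes)
  else search_list p (t # sub_nodes)

and search_list p l =
  match l with
  | [] -> []
  | t :: l' -> search p t @ search_list p l'

(* Same element structure as the example tree, without the data nodes. *)
let a2 = toy "a" []
let b1 = toy "b" [a2]
let c1 = toy "c" []
let a1 = toy "a" [b1; c1]

(* All nodes named "a", in the order in which search visits them. *)
let hits = search (fun x -> x # name = "a") a1
```

The result contains the start node itself when the predicate holds for it, followed by the matching descendants in depth-first order.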

Getting attribute values. Suppose we have the declaration:

<!ATTLIST e a CDATA #REQUIRED
            b CDATA #IMPLIED
            c CDATA "12345">

In this case, every element e must have an attribute a, otherwise the parser would indicate an error. If the O'Caml variable n holds the node of the tree corresponding to the element, you can get the value of the attribute a by

let value_of_a = n # required_string_attribute "a"

which is more or less an abbreviation for

let value_of_a = 
  match n # attribute "a" with
    Value s -> s
  | _       -> assert false

As the attribute is required, the attribute method always returns a Value.

In contrast to this, the attribute b can be omitted. In this case, the method required_string_attribute works only if the attribute is there, and the method will fail if the attribute is missing. To get the value, you can apply the method optional_string_attribute:

let value_of_b = n # optional_string_attribute "b"

Now, value_of_b is of type string option, and None represents the omitted attribute. Alternatively, you could also use attribute:

let value_of_b = 
  match n # attribute "b" with
    Value s       -> Some s
  | Implied_value -> None
  | _             -> assert false

The attribute c behaves much like a, because it has always a value. If the attribute is omitted, the default, here "12345", will be returned instead. Because of this, you can again use required_string_attribute to get the value.

The type CDATA is the most general string type. The types NMTOKEN, ID, IDREF, ENTITY, and all enumerators and notations are special forms of string types that restrict the possible values. From O'Caml, they behave like CDATA, i.e. you can use the methods required_string_attribute and optional_string_attribute, too.

In contrast to this, the types NMTOKENS, IDREFS, and ENTITIES mean lists of strings. Suppose we have the declaration:

<!ATTLIST f d NMTOKENS #REQUIRED
            e NMTOKENS #IMPLIED>

The type NMTOKENS stands for lists of space-separated tokens; for example the value "1 abc 23ef" means the list ["1"; "abc"; "23ef"]. (Again, IDREFS and ENTITIES have more restricted values.) To get the value of attribute d, one can use

let value_of_d = n # required_list_attribute "d"

or

let value_of_d = 
  match n # attribute "d" with
    Valuelist l -> l
  | _           -> assert false

As d is required, the attribute cannot be omitted, and the attribute method always returns a Valuelist.

For optional attributes like e, apply

let value_of_e = n # optional_list_attribute "e"

or

let value_of_e = 
  match n # attribute "e" with
    Valuelist l   -> l
  | Implied_value -> []
  | _             -> assert false

Here, a missing attribute is treated like the empty list.
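
The correspondence between a space-separated attribute value and the resulting string list can be illustrated with a few lines of standard OCaml. This is only a sketch of the mapping: tokens_of_string is a hypothetical helper, the parser performs this conversion itself, and the sketch assumes that the tokens are separated by plain space characters.

```ocaml
(* Split a space-separated value into its tokens, ignoring the empty
   strings that arise from leading, trailing, or repeated spaces. *)
let tokens_of_string s =
  String.split_on_char ' ' s
  |> List.filter (fun tok -> tok <> "")

(* The example value from the text. *)
let example = tokens_of_string "1 abc 23ef"
```

An empty attribute value yields the empty list in this model.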

3.2.10. The type spec

  • Type: 'ext spec

    Description: The abstract data type specifying which objects are actually created by the parser.

  • Function: make_spec_from_mapping ~super_root_exemplar ~comment_exemplar ~default_pinstr_exemplar ~pinstr_mapping ~data_exemplar ~default_element_exemplar ~element_mapping ()

    Description: Creates a spec from the arguments. Some arguments are optional, some arguments are mandatory.

    • ~super_root_exemplar : Specifies the exemplar to be used for new super root nodes. This exemplar is optional.

    • ~comment_exemplar : Specifies the exemplar to be used for new comment nodes. This exemplar is optional.

    • ~pinstr_mapping : Specifies the exemplar to be used for new processing instruction nodes by a hashtable mapping target names to exemplars. This hashtable is optional.

    • ~default_pinstr_exemplar : Specifies the exemplar to be used for new processing instruction nodes. This exemplar will be used for targets that are not contained in the ~pinstr_mapping hashtable. This exemplar is optional.

    • ~data_exemplar : Specifies the exemplar to be used for new data nodes. This exemplar is mandatory.

    • ~element_mapping : Specifies the exemplar to be used for new element nodes by a hashtable mapping element types to exemplars. This hashtable is mandatory (but may be empty).

    • ~default_element_exemplar : Specifies the exemplar to be used for new element nodes. This exemplar will be used for element types that are not contained in the ~element_mapping hashtable. This exemplar is mandatory.

  • Function: make_spec_from_alist ~super_root_exemplar ~comment_exemplar ~default_pinstr_exemplar ~pinstr_alist ~data_exemplar ~default_element_exemplar ~element_alist ()

    Description: Creates a spec from the arguments. This is a convenience function for make_spec_from_mapping ; instead of requiring hashtables, the function accepts association lists.
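
    The way an element exemplar is selected, i.e. by element type with a fallback to the default exemplar, can be sketched with an association list. This is a model of the lookup rule only: exemplar_for is a hypothetical function, and exemplars are represented by arbitrary values (strings in the usage below) instead of node objects.

    ```ocaml
    (* Look up the exemplar for eltype; fall back to the default exemplar
       when the element type has no entry of its own. *)
    let exemplar_for ~default_element_exemplar ~element_alist eltype =
      match List.assoc_opt eltype element_alist with
      | Some exemplar -> exemplar
      | None -> default_element_exemplar

    (* Usage with strings standing in for exemplar objects. *)
    let for_a =
      exemplar_for ~default_element_exemplar:"generic"
        ~element_alist:["a", "special"] "a"
    let for_b =
      exemplar_for ~default_element_exemplar:"generic"
        ~element_alist:["a", "special"] "b"
    ```

    With an empty ~element_alist, every element type is created from the default exemplar.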

  • Function: create_data_node spec dtd datastring

    Description: Creates a new data node from the exemplar contained in spec . The new node contains datastring and is connected with the dtd .

  • Function: create_element_node ~name_pool_for_attribute_values ~position ~valcheck ~att_values spec dtd eltype att_list

    Description: Creates a new element node from the exemplar(s) contained in spec :

    • The new node will be connected to the passed dtd .

    • The new node will have the element type eltype .

    • The attributes of the new node will be the concatenation of att_list and att_values ; att_list passes attribute values as strings while att_values passes attribute values as type att_value

    • The source position is set to ~position (if passed)

    • The ~name_pool_for_attribute_values will be used, if passed.

    • If ~valcheck = true (the default), the attribute list is immediately validated. If ~valcheck = false , the validation is left out; in this case you can pass any element type and any attributes, and it does not matter whether and how they are declared.

  • Function: create_super_root_node ~position spec dtd

    Description: Creates a new super root node from the exemplar contained in spec . The new node is connected to dtd , and the position triple is set to ~position .

    The function fails if there is no super root exemplar in spec .

  • Function: create_pinstr_node ~position spec dtd pi

    Description: Creates a new processing instruction node from the exemplar contained in spec . The new node is connected to dtd , and the position triple is set to ~position . The contents of the node are set to pi .

    The function fails if there is no processing instruction exemplar in spec .

  • Function: create_comment_node ~position spec dtd commentstring

    Description: Creates a new comment node from the exemplar contained in spec . The new node is connected to dtd , and the position triple is set to ~position . The contents of the node are set to commentstring .

    The function fails if there is no comment exemplar in spec .

3.2.11. Examples: Creating nodes using spec

Building trees. Here is again a piece of code that creates the tree of the figure A tree with element nodes, data nodes, and attributes. Now the type spec is used to encapsulate the exemplar objects.

let exemplar_ext = ... (* some extension *) in
let dtd = ... (* some DTD *) in

let element_exemplar = new element_impl exemplar_ext in
let data_exemplar    = new data_impl    exemplar_ext in

let spec = make_spec_from_alist 
             ~data_exemplar:data_exemplar
             ~default_element_exemplar:element_exemplar
             ~element_alist:[]
             () in

let a1 = create_element_node spec dtd "a" ["att", "apple"]
and b1 = create_element_node spec dtd "b" []
and c1 = create_element_node spec dtd "c" []
and a2 = create_element_node spec dtd "a" ["att", "orange"]
in

let cherries = create_data_node spec dtd "Cherries" in
let orange   = create_data_node spec dtd "An orange" in

a1 # append_node b1;
a1 # append_node c1;
b1 # append_node a2;
b1 # append_node cherries;
a2 # append_node orange;

The type spec is useful as a container for the exemplars. Moreover, the type spec makes it possible to control which extension object is used for which element type, which is explained in the section The class type extension.

3.2.12. Iterators

There are also several iterators in Pxp_document. You can find examples for them in the "simple_transformation" directory.

  • Function: find ~deeply f startnode

    Description: Searches for the first node in the tree below startnode for which the predicate f is true, and returns it. Raises Not_found if there is no such node.

    By default, ~deeply=false . In this case, only the children of startnode are searched.

    If passing ~deeply=true , the children are searched recursively (depth-first search). Note that even in this case startnode itself is not checked.

    Attribute and namespace nodes are ignored.

  • Function: find_all ~deeply f startnode

    Description: Searches for all nodes in the tree below startnode for which the predicate f is true, and returns them.

    By default, ~deeply:false is in effect. In this case, only the children of startnode are searched.

    If ~deeply:true is passed, the children are searched recursively (depth-first search). Note that even in this case startnode itself is not checked.

    Attribute and namespace nodes are ignored.

  • Function: find_element ~deeply eltype startnode

    Description: Searches for the first element in the tree below startnode that has the element type eltype, and returns it. Raises Not_found if there is no such node.

    By default, ~deeply:false is in effect. In this case, only the children of startnode are searched.

    If ~deeply:true is passed, the children are searched recursively (depth-first search). Note that even in this case startnode itself is not checked.

  • Function: find_all_elements ~deeply eltype startnode

    Description: Searches for all elements in the tree below startnode that have the element type eltype, and returns them.

    By default, ~deeply:false is in effect. In this case, only the children of startnode are searched.

    If ~deeply:true is passed, the children are searched recursively (depth-first search). Note that even in this case startnode itself is not checked.

  • Exception: Skip

    Description: This exception can be raised in the functions passed to map_tree, map_tree_sibl, iter_tree, and iter_tree_sibl to skip the current node and proceed with the next node. See these functions for details.

  • Function: map_tree ~pre ~post startnode

    Description: Maps the tree beginning at startnode to a second tree using the following algorithm.

    startnode and the whole tree below it are recursively traversed. After entering a node, the function ~pre is called. The result of this function must be a new node; it must have neither children nor a parent. For example, you can pass ~pre:(fun n -> n # orphaned_flat_clone) to copy the original node. After that, the children are processed in the same way (from left to right), resulting in a list of mapped children. These are added to the mapped node as its children.

    Now the ~post function is invoked with the mapped node as argument, and the value it returns is the result of the mapping. (~post should return a root node, too; if ~post is not specified, the identity function is used.)

    Both ~pre and ~post may raise Skip, which causes the node to be left out (i.e. the mapped tree contains neither the node nor any of its children). If the top node is skipped, the exception Not_found is raised.

    For example, the following piece of code duplicates a tree, but removes all comment nodes:

    map_tree ~pre:(fun n -> if n # node_type = T_comment then raise Skip else n # orphaned_flat_clone) startnode

    Attribute and namespace nodes are ignored.

  • Function: map_tree_sibl ~pre ~post startnode

    Description: Maps the tree beginning at startnode to a second tree using the following algorithm.

    startnode and the whole tree below it are recursively traversed. After entering a node, the function ~pre is called with three arguments: the previous node (if any), the current node, and the next node (if any). The previous or the next node may not exist because the current node is the first or the last in the current list of nodes. In this case, None is passed as previous or next node, respectively. The result of this function invocation must be a new node; it must have neither children nor a parent. For example, you can pass ~pre:(fun prev n next -> n # orphaned_flat_clone) to copy the original node. After that, the children are processed in the same way (from left to right), resulting in a list of mapped children.

    Now the ~post function is applied to the list of mapped children, resulting in a list of postprocessed children. (Note: this part works rather differently from map_tree.) ~post has three arguments: the previous child (if any), the current child, and the next child (if any). The previous and the next child are None if they do not exist. The postprocessed children are appended to the mapped node, resulting in the mapped tree.

    Both ~pre and ~post may raise Skip, which causes the node to be left out (i.e. the mapped tree contains neither the node nor any of its children). If the top node is skipped, the exception Not_found is raised.

    Attribute and namespace nodes are ignored.

  • Function: iter_tree ~pre ~post startnode

    Description: Iterates over the tree beginning at startnode using the following algorithm.

    startnode and the whole tree below it are recursively traversed. After entering a node, the function ~pre is called. Now, the children are processed recursively. Finally, the ~post function is invoked.

    The ~pre function may raise Skip, in which case the children and the invocation of the ~post function are skipped. If the ~post function raises Skip, nothing special happens.

    Attribute and namespace nodes are ignored.

  • Function: iter_tree_sibl ~pre ~post startnode

    Description: Iterates over the tree beginning at startnode using the following algorithm.

    startnode and the whole tree below it are recursively traversed. After entering a node, the function ~pre is called with three arguments: some previous node, the current node, and some next node. The previous and the next node may be None if non-existing. Now, the children are processed recursively. Finally, the ~post function is invoked with the same three arguments.

    The ~pre function may raise Skip, in which case the children and the invocation of the ~post function are skipped. If the ~post function raises Skip, nothing special happens.

    Attribute and namespace nodes are ignored.
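
The search and mapping rules described above can be tried out on a toy tree. The following sketch is not PXP code: the type tree and the functions label, find_model, and map_tree_model are hypothetical stand-ins that use only the OCaml standard library. They merely mirror the documented behaviour, namely that the search never tests startnode itself, and that Skip drops a node together with its subtree.

```ocaml
(* Toy model only: [tree], [find_model] and [map_tree_model] are
   hypothetical stand-ins, not part of Pxp_document. *)
type tree = Node of string * tree list

exception Skip

let label (Node (l, _)) = l

(* Mirrors the description of find: the start node itself is never
   tested. With ~deeply:false only the direct children are examined;
   with ~deeply:true the search continues depth-first. *)
let find_model ?(deeply = false) f (Node (_, children)) =
  let rec search = function
    | [] -> raise Not_found
    | (Node (_, sub) as c) :: rest ->
        if f c then c
        else if deeply then (try search sub with Not_found -> search rest)
        else search rest
  in
  search children

(* Mirrors the description of map_tree: ~pre yields a childless copy
   (any children it returns are discarded here), the mapped children are
   attached from left to right, then ~post is applied. Skip raised by
   either function drops the node and its subtree; skipping the top
   node is reported as Not_found. *)
let map_tree_model ~pre ?(post = fun n -> n) start =
  let rec map_node (Node (_, children) as n) =
    let (Node (l, _)) = pre n in
    let mapped =
      List.filter_map
        (fun c -> try Some (map_node c) with Skip -> None)
        children
    in
    post (Node (l, mapped))
  in
  try map_node start with Skip -> raise Not_found

(* A small sample tree with one node standing in for a comment. *)
let sample =
  Node ("a", [ Node ("b", [ Node ("a2", []); Node ("#comment", []) ]);
               Node ("c", []) ])

(* Copy the tree but leave out the comment stand-in. *)
let without_comments =
  map_tree_model
    ~pre:(fun n ->
      if label n = "#comment" then raise Skip else Node (label n, []))
    sample
```

In this model, searching sample for the label "a2" fails with Not_found unless ~deeply:true is passed, and searching for "a" always fails because the start node is not a candidate.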

3.2.13. Stripping whitespace

Type: stripping_mode

Description: The different ways to strip whitespace from a single data node:

  • `Strip_one_lf : If there is a linefeed character at the beginning/at the end, it will be removed. If there are more linefeed characters, only the first/the last is removed. (This is the SGML rule to strip whitespace.)

  • `Strip_one : If there is a whitespace character at the beginning/at the end, it will be removed. If there are more whitespace characters, only the first/the last is removed. Whitespace characters are space, newline, carriage return, tab.

  • `Strip_seq : All whitespace characters at the beginning/at the end are removed.

  • `Disabled : Do not strip whitespace.
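
The four modes can be illustrated on plain strings. This is only a sketch under the assumption that a data node's contents behave like an OCaml string; the type mode and the helpers is_ws, strip_left, strip_right, and strip_model are hypothetical names, not functions of Pxp_document.

```ocaml
(* Toy model only: these helpers are hypothetical and operate on plain
   strings, whereas Pxp_document applies the modes to data nodes. *)
type mode = [ `Strip_one_lf | `Strip_one | `Strip_seq | `Disabled ]

(* Whitespace as listed above: space, newline, carriage return, tab. *)
let is_ws c = c = ' ' || c = '\n' || c = '\r' || c = '\t'

(* Number of leading characters of [s] that mode [m] removes. *)
let strip_count (m : mode) s =
  let len = String.length s in
  match m with
  | `Disabled -> 0
  | `Strip_one_lf -> if len > 0 && s.[0] = '\n' then 1 else 0
  | `Strip_one -> if len > 0 && is_ws s.[0] then 1 else 0
  | `Strip_seq ->
      let rec go i = if i < len && is_ws s.[i] then go (i + 1) else i in
      go 0

let strip_left m s =
  let k = strip_count m s in
  String.sub s k (String.length s - k)

(* Stripping at the end is stripping at the beginning of the reversed
   string. *)
let rev s =
  let len = String.length s in
  String.init len (fun i -> s.[len - 1 - i])

let strip_right m s = rev (strip_left m (rev s))

(* Apply one mode to the beginning and another to the end. *)
let strip_model ~left ~right s = strip_right right (strip_left left s)
```

For instance, strip_model ~left:`Strip_one_lf ~right:`Strip_one_lf removes at most one linefeed from each end and leaves spaces and tabs alone, while `Strip_seq on both sides removes all leading and trailing whitespace.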

Function: strip_whitespace ~force ~left ~right ~delete_empty_nodes startnode

Description: Modifies the passed tree in-place by the following rules:

  • In general, whitespace stripping is not applied to nodes inside an xml:space="preserve" region, unless ~force:true is passed to the function (default is ~force:false). The following rules are carried out only if whitespace stripping is allowed. Note that the detection of regions with preserved whitespace takes the parent nodes of the passed startnode into account.

  • If applied to a data node, whitespace at the beginning of the node is removed according to ~left , and whitespace at the end of the node is removed according to ~right .

  • If applied to an element, whitespace at the beginning of the first data subnode is removed according to ~left , and whitespace at the end of the last data subnode is removed according to ~right . Furthermore, these rules are recursively applied to all subelements (but not to other node types).

  • If applied to the super root node, this node is treated as if it were an element.

  • Whitespace of other node types is left as is, as is whitespace occurring in attributes.

  • Option ~delete_empty_nodes (default true): If data nodes become empty after removal of whitespace, they are deleted from the XML tree.

Defaults:

  • ~force:false

  • ~left:`Disabled

  • ~right:`Disabled

  • ~delete_empty_nodes:true

Examples:

strip_whitespace ~left:`Strip_one_lf ~right:`Strip_one_lf startnode
Strips LF characters according to the SGML rules: One LF is stripped after the start tag, and one before the end tag. xml:space is respected.

iter_tree
  ~pre:(fun n -> if n # node_type = T_data then 
                   strip_whitespace 
                     ~force:true ~left:`Strip_seq ~right:`Strip_seq n
       )
  startnode
Strips all leading and trailing whitespace characters from every data node individually.

Traps: In order to work properly, this function expects a normalized XML tree (no consecutive text nodes, no empty text nodes). If the tree is not normalized, the semantics of strip_whitespace is well-defined, but the function may not do what is expected. In particular, whitespace is not stripped across text nodes. E.g. if the spaces in

<A>  </A>
are stored in two nodes, and ~left:`Strip_seq is requested, the function will only remove the first space.
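
The trap can be reproduced in a toy model in which the data children of <A> are a list of strings. Everything here is hypothetical and only approximates the documented rules: strip_left_seq stands in for ~left:`Strip_seq, strip_first for the effect on the first data subnode with ~delete_empty_nodes:true, and normalize_model for merging adjacent data nodes.

```ocaml
(* Toy model only: the data children of <A> are a list of strings, and
   all functions here are hypothetical, not part of Pxp_document. *)
let is_ws c = c = ' ' || c = '\n' || c = '\r' || c = '\t'

(* Remove all leading whitespace from one string. *)
let strip_left_seq s =
  let len = String.length s in
  let rec go i = if i < len && is_ws s.[i] then go (i + 1) else i in
  let k = go 0 in
  String.sub s k (len - k)

(* Only the first data node is touched; a node that becomes empty is
   dropped, as with ~delete_empty_nodes:true. *)
let strip_first = function
  | [] -> []
  | first :: rest ->
      let stripped = strip_left_seq first in
      if stripped = "" then rest else stripped :: rest

(* Merge adjacent data nodes and drop an empty result. *)
let normalize_model chunks =
  match String.concat "" chunks with "" -> [] | s -> [ s ]

(* Two one-space nodes: stripping alone leaves one space behind. *)
let not_normalized = strip_first [ " "; " " ]

(* Normalizing first lets the stripping remove both spaces. *)
let normalized_first = strip_first (normalize_model [ " "; " " ])
```

In this model, one space survives in not_normalized, whereas normalized_first is empty. This suggests normalizing the tree before stripping whenever text may be split across several data nodes.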

3.2.14. Document order

The functions compare and ord_compare implement the so-called "document order". The basic principle is that the nodes are linearly ordered by their occurrence in the textual XML representation of the tree. While this is clear for element nodes, data nodes, comments, and processing instructions, a more detailed definition is necessary for the other node types. In particular, attribute nodes of an element node occur before any regular subnode of the element, and namespace nodes of that element occur even before the attribute nodes. So the order of nodes of

<sample a1="5" a2="6"><subnode/></sample> 
is

  1. element "sample"

  2. attribute "a1"

  3. attribute "a2"

  4. element "subnode"

Note that the order of the attributes of the same element is unspecified, so "a2" may alternatively be ordered before "a1". If there were namespace nodes, they would occur between 1 and 2.

If there is a super root node, it will be handled as the very first node.

  • Function: compare n1 n2

    Description: Returns -1 if n1 occurs before n2 , or +1 if n1 occurs after n2 , or 0 if both nodes are identical. If the nodes are unrelated (do not have a common ancestor), the result is undefined (Note: this case is different from ord_compare ). This test is rather slow, but it works even if the XML tree changes dynamically (in contrast to ord_compare below).

  • Type: 'ext ord_index

    Description: The type of ordinal indexes.

  • Function: create_ord_index startnode

    Description: Creates an ordinal index for the subtree starting at startnode. This index assigns to every node an ordinal number (beginning with 0) such that nodes are numbered in the order of their first characters in the XML representation (document order). Note that the index is not automatically updated when the tree is modified.

  • Function: ord_compare idx n1 n2

    Description: Compares two nodes like compare: Returns -1 if n1 occurs before n2, or +1 if n1 occurs after n2, or 0 if both nodes are identical. If one of the nodes does not occur in the ordinal index, Not_found is raised. (Note that this behaviour differs from that of compare.)

    This test is much faster than compare .
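
The idea of an ordinal index can be sketched with a hash table. This is a toy model, not the PXP implementation: nodes are identified by unique strings, the type tree and the functions doc_order, create_index, and ord_compare_model are hypothetical, and attributes are simply listed in the order given (PXP leaves the order of attributes of the same element unspecified).

```ocaml
(* Toy model only: nodes are identified by unique strings; the type and
   functions below are hypothetical, not part of Pxp_document. *)

(* Elem (name, attribute names, children) *)
type tree = Elem of string * string list * tree list

(* Document order: the element, then its attributes, then its regular
   subnodes. *)
let rec doc_order (Elem (name, atts, children)) =
  (name :: List.map (fun a -> name ^ "/@" ^ a) atts)
  @ List.concat_map doc_order children

(* Assign ordinals beginning with 0, in document order. *)
let create_index start =
  let idx = Hashtbl.create 16 in
  List.iteri (fun i key -> Hashtbl.add idx key i) (doc_order start);
  idx

(* Hashtbl.find raises Not_found for a node missing from the index. *)
let ord_compare_model idx n1 n2 =
  compare (Hashtbl.find idx n1) (Hashtbl.find idx n2)

(* The sample document from the beginning of this section. *)
let sample = Elem ("sample", [ "a1"; "a2" ], [ Elem ("subnode", [], []) ])

let idx = create_index sample
```

Because the index is computed once, a later change to the toy tree would not be reflected in idx, which corresponds to the remark that the index is not updated automatically.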

3.2.15. Functions

  • Function: normalize startnode

    Description: Normalizes the tree denoted by startnode such that neither empty data nodes nor adjacent data nodes exist. Normalization works in-place.

  • Function: validate startnode

    Description: Validates the tree denoted by startnode . In contrast to startnode # validate() this function validates recursively.

  • Function: pinstr n

    Description: Returns the processing instruction contained in a processing instruction node. This function raises Invalid_argument if invoked for a different node type than T_pinstr.

  • Function: attribute_name n

    Description: Returns the name of the attribute contained in an attribute node. Raises Invalid_argument if n does not have node type T_attribute .

  • Function: attribute_value n

    Description: Returns the value of the attribute contained in an attribute node. Raises Invalid_argument if n does not have node type T_attribute .

  • Function: attribute_string_value n

    Description: Returns the string value of the attribute contained in an attribute node. Raises Invalid_argument if n does not have node type T_attribute .

This web site is published by Informatikbüro Gerd Stolpmann