******************************************************************************
Development of PXP
******************************************************************************


==============================================================================
PXP
==============================================================================

PXP is a validating parser for XML-1.0 which has been written entirely in 
Objective Caml. This page contains development information for PXP; if you are 
looking for the stable distribution, please go here [1]. 

==============================================================================
Download
==============================================================================

-  Current stable release: 1.1.6 [2]
   
-  Current development version: 1.2.0test1 [3] 
   
==============================================================================
Version History
==============================================================================

-  1.2.0test*: New ~minimization option for the [write] and [display] methods 
   (user wish).
   Improvement: Better control over what is printed as the DTD for 
   document#write and #display.
   Fix: [Pxp_document.liquefy] now terminates when invoked on only a subtree 
   of a document.
   Cleaned up the code a bit so fewer warnings are emitted in the build.
   Ported pxp-pp to O'Caml 3.10.
   
-  1.1.96: Now works with O'Caml 3.09, too.
   Fix: The "root element check" is disabled in Pxp_dtd. It did not work 
   together with namespaces.
   Pxp_validate: Fix for namespace mode.
   
-  1.1.95: Addition of ulex lexing.
   Fix in Pxp_reader.combine.
   Revised namespace handling: There are now namespace_scope objects keeping 
   the scoping structure of the namespaces. The namespace_info stuff has been 
   removed. The "display" methods can print XML while respecting the scoping 
   structure.
   New exceptions Namespace_not_managed, Namespace_prefix_not_managed, and 
   Namespace_not_in_scope (all replacing Not_found). Methods of 
   namespace_manager may raise these exceptions.
   The event-based representation of XML is now symmetrical to the tree-based 
   representation, such that it is possible to convert one representation into 
   the other without loss. The type of events had to be changed to achieve 
   this effect.
   The new module Pxp_event contains functions for the event-based 
   representation.
   Addition of pxp-pp, the PXP preprocessor.
   This release requires Ocamlnet 0.98. You should also install ulex. There 
   are no longer precompiled wlex lexers (use ulex instead).
   
-  1.1.94.2: Further fixes for the combination of O'Caml 3.07 and wlex.
   
-  1.1.94.1: Fixes for O'Caml 3.07 concerning the pregenerated wlex lexers.
   New: Pxp_document.build_node_tree
   
-  1.1.94: The Pxp_reader module has been completely rewritten. This fixes some 
   problems with relative URLs.
   Pxp_yacc has been split up into four modules: Pxp_tree_parser now contains 
   the parser API returning object trees, Pxp_dtd_parser is the parser API 
   returning DTDs, Pxp_ev_parser is the event-based API, and Pxp_core_parser 
   is the core of the parser. Pxp_yacc is still available as a compatibility 
   API. As part of the module redesign, Pxp_types now includes parts of its 
   interface from Pxp_core_types_type. I hope this style of programming is 
   comprehensible.
   I think PXP can now be compiled with CVS releases of O'Caml.
   It is now possible to turn warnings into errors.
   The event-based parser can now preprocess namespaces. Furthermore, there 
   are normalization filters.
   
-  1.1.93: This is a bugfix release. Sometimes files were not closed in 
   previous versions, but now they are. There were debug statements in the pull 
   parser code; I have removed them. Finally, some errors in the Makefiles have 
   been corrected.
   
-  1.1.92: The lexing code has been completely restructured. There is a new 
   tool, lexpp, that generates the lexers from only five files. Furthermore, 
   many more 8-bit character sets are now supported as internal encodings. In 
   previous versions of PXP, the internal representation of the XML trees was 
   restricted to either UTF-8 or ISO-8859-1. Now, a number of additional 
   encodings are supported, including the whole ISO-8859 series.
   Bugfix: If the processing instruction <?xml...?> occurred in the middle of 
   the XML document, version 1.1.91 stopped parsing immediately and ignored 
   the rest of the file. This is now fixed.
   
-  1.1.91: Curly braces can now also be used inside attributes, where they 
   escape from normal XML parsing as well.
   There is a new entry point Entry_expr for event-based parsing that expects 
   either a single element, a single processing instruction, a single 
   comment, or whitespace. This allows more fine-grained control of what is 
   parsed.
   There is now a "pull parser". In contrast to the "push parser" introduced in 
   1.1.90, the calling order of parser and parser user has been inverted, i.e. 
   the user calls the parser to get ("pull") the next event instead of letting 
   the parser call back a user function ("push"). An interesting application is 
   that O'Caml's lazy streams can be used to analyze events. An example can be 
   found in examples/pullparser.
   Pull parsing is not yet well tested!
   
-  1.1.90: This version introduces a new event-based interface in Pxp_yacc. For 
   start tags, end tags, data strings, and several other things that are found 
   in the XML source, so-called events are generated, and a user function is 
   called for every event. See the directory examples/eventparser for examples.
   Another innovation is support for curly braces as escape characters. Inside 
   elements, the left curly brace escapes from XML parsing and starts a foreign 
   parser until the matching right curly brace is found: 
   
   <element> ... { foreign syntax } ... </element>
   
   The curly braces are borrowed from the XQuery draft standard. They cannot 
   yet be used inside attribute values. Curly braces are mostly useful in 
   conjunction with event-based parsing, because it is not yet possible to 
   include the "value" of the curly brace expression into XML trees.
   It is even possible to call the XML parser from the foreign parser as a 
   subparser. However, there are not yet enough entry points for the 
   event-based parser (e.g. you cannot parse just the following processing 
   instruction; only misc* element misc* or whole documents are possible).
   A long-standing bug has been found in the entity layer. When an external 
   entity A opened an external entity B, and B opened C, relative paths of C 
   were interpreted incorrectly.
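
The difference between the "push" interface of 1.1.90 and the "pull" interface 
of 1.1.91 can be illustrated with a minimal, self-contained sketch. It does not 
use PXP's own API: the event type and the functions push_parse and 
make_pull_parser are hypothetical stand-ins, and the "parser" simply replays a 
fixed list of events instead of reading XML. Only the direction of control is 
meant to match the descriptions above.

```ocaml
(* Hypothetical event type: a stand-in for illustration, not PXP's own. *)
type event =
  | Start_tag of string
  | End_tag of string
  | Data of string

(* A fixed event sequence standing in for what a parser might produce
   for <a>hello</a>. No real parsing happens in this sketch. *)
let sample_events = [ Start_tag "a"; Data "hello"; End_tag "a" ]

(* Push style: the parser drives and calls the user function once
   per event. *)
let push_parse (on_event : event -> unit) : unit =
  List.iter on_event sample_events

(* Pull style: the user drives and asks for the next event.
   [None] signals the end of the input. *)
let make_pull_parser () : unit -> event option =
  let remaining = ref sample_events in
  fun () ->
    match !remaining with
    | [] -> None
    | ev :: rest ->
        remaining := rest;
        Some ev

(* Count start tags in push style: state lives in a mutable reference
   because the callback cannot return a result to the parser loop. *)
let count_push () =
  let n = ref 0 in
  push_parse (function Start_tag _ -> incr n | _ -> ());
  !n

(* Count start tags in pull style: the user's own recursion carries
   the state. *)
let count_pull () =
  let next = make_pull_parser () in
  let rec loop n =
    match next () with
    | None -> n
    | Some (Start_tag _) -> loop (n + 1)
    | Some _ -> loop n
  in
  loop 0

let () =
  Printf.printf "push: %d start tag(s), pull: %d start tag(s)\n"
    (count_push ()) (count_pull ())
```

In the pull version the user code decides when the next event is consumed, 
which is what makes it straightforward to feed events into a lazily evaluated 
sequence or a hand-written recursive-descent analyzer.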
   

--------------------------

[1]   see http://www.ocaml-programming.de/packages/documentation/pxp

[2]   see http://www.ocaml-programming.de/packages/pxp-1.1.6.tar.gz

[3]   see http://www.ocaml-programming.de/packages/pxp-1.2.0test1.tar.gz




This web site is published by Informatikbüro Gerd Stolpmann