
******************************************************************************
INSTALL - PXP, the XML parser for O'Caml
******************************************************************************


==============================================================================
The "pxp" package
==============================================================================

------------------------------------------------------------------------------
Prerequisites
------------------------------------------------------------------------------

PXP requires that the ocamlnet library [1] is already installed (version 4.1 
required), and it works only with O'Caml >= 4.01. The installation procedure 
defined in the Makefile requires findlib [2] [3]. Optionally, PXP can be 
compiled with support for "wlex" [4], Alain Frisch's patch of ocamllex. There 
is also support for ulex, Alain's Unicode-aware replacement for ocamllex (same 
link), which is simpler to build and is now highly recommended. 

------------------------------------------------------------------------------
Configuration
------------------------------------------------------------------------------

Beginning with PXP 1.1 it is necessary to configure the parser! 

Configuration is very simple; in almost all cases it is sufficient to run 

./configure

in the top-level directory of the distribution. Individual options can be 
turned on or off with the -with-xxx and -without-xxx arguments. To get a list 
of them, invoke ./configure -help: 

-  -with-lex
   Enables the lexical analyzers ("lexers") generated by the ocamllex tool. You 
   can specify which ocamllex-based lexers are created with the -lexlist option 
   (see below).
   
-  -with-wlex
   Enables the lexical analyzer that works for UTF-8 as internal encoding, and 
   that is based on Alain Frisch's wlex tool. It is relatively small and almost 
   as fast as the ocamllex-based lexers. I recommend it if it is ok to install 
   another library (wlex).
   
-  -with-wlex-compat
   Creates a compatibility package pxp-wlex that includes lexers for UTF-8 and 
   ISO-8859-1 (may be required to build old software).
   
-  -with-ulex
   Enables the lexical analyzer that works for UTF-8 as internal encoding, and 
   that is based on Alain Frisch's ulex tool. It is relatively small, but a bit 
   slower than the ocamllex-based lexers. ulex will supersede wlex soon. ulex 
   is required for the preprocessor (see below).
   
-  -with-pp
   Enables the PXP preprocessor (installed as package pxp-pp). See the file 
   PREPROCESSOR for details. The preprocessor requires ulex.
   
-  -lexlist <list-of-encodings>
   Specifies the character encodings to be supported by the ocamllex-based 
   lexers. You need only the encodings you are going to use for the internal 
   representation of the XML data in memory. It is not necessary to mention a 
   character set here if you only want to read an external file. Note that 
   utf8 is also provided by both -with-wlex and -with-ulex, and it is 
   reasonable to omit it here if one of these options is in effect.
   
Note that you need at least one lexical analyzer to use PXP as a parser.
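
As an illustration of the options above, the following sketch shows one 
possible invocation. It assumes that ulex is installed and that ISO-8859-1 is 
wanted as an additional internal encoding. The encoding name "iso88591" passed 
to -lexlist is an assumption modeled on the package name pxp-lex-iso88591; 
check ./configure -help for the exact spelling.

```shell
# Hypothetical option set: UTF-8 via ulex, the preprocessor, and one
# ocamllex-based lexer. The encoding name "iso88591" is an assumption.
CONFIGURE_ARGS="-with-ulex -with-pp -with-lex -lexlist iso88591"

if [ -x ./configure ]; then
    # Word splitting of $CONFIGURE_ARGS is intended here.
    ./configure $CONFIGURE_ARGS
else
    echo "not in the top-level directory; would run: ./configure $CONFIGURE_ARGS"
fi
```

Since utf8 is already covered by -with-ulex in this sketch, it is left out of 
the -lexlist argument.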

------------------------------------------------------------------------------
Compilation
------------------------------------------------------------------------------

The Makefile defines the following goals: 

-  make all
   compiles with the bytecode compiler and creates various bytecode archives 
   (suffix .cma) and some bytecode objects (suffix .cmo) in the directories 
   below src.
   
-  make opt
   compiles with the native compiler and creates various native archives 
   (suffixes .cmxa and .a) and some native objects (suffixes .cmx and .o) in 
   the directories below src.
   
------------------------------------------------------------------------------
Installation
------------------------------------------------------------------------------

The Makefile defines the following goals:

-  make install
   installs the bytecode archives, the interface definitions, and if present, 
   the native archives in the default location of findlib. Up to five packages 
   may be installed: pxp, pxp-engine, pxp-lex-iso88591, pxp-lex-utf8, pxp-wlex. 
   
-  make uninstall
   removes any of the mentioned packages
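
Put together, configuration, compilation, and installation might look like the 
following sketch. The function name build_pxp is hypothetical and exists only 
to group the documented steps; whether "make opt" can be run depends on a 
native compiler being available, and "make install" may need write permission 
for the findlib location.

```shell
# Run the documented steps in order and stop at the first failure.
# build_pxp is a hypothetical helper, not part of the distribution.
build_pxp() {
    ./configure "$@" &&   # extra arguments are passed through to configure
    make all &&           # bytecode archives (.cma) and objects (.cmo)
    make opt &&           # native archives (.cmxa, .a) and objects (.cmx, .o)
    make install          # installs into the default location of findlib
}

# Usage, from the top-level directory of the distribution:
#   build_pxp -with-ulex
```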
   
Note: Previous versions of PXP had a compatibility API for the old "markup" 
distribution. This API is no longer supported. Upgrading to the PXP API is not 
very difficult.

------------------------------------------------------------------------------
Usage with the help of "findlib"
------------------------------------------------------------------------------

You can refer to the parser as the findlib package "pxp": 

ocamlfind ocamlc -package pxp ...

Using "pxp" includes as many features of the parser as are available, i.e. 
everything that has been configured. This may result in large executables. 

One way to reduce the code size is to specify Netstring-related predicates 
(e.g. netstring_only_iso); see the documentation of Netstring. Note that these 
predicates reduce the number of conversion tables for character encodings, 
which means that not every character encoding of external files can be 
processed. 

Another way of reducing the size of executables is to link only selected parts 
of PXP. It is possible to specify which PXP subpackage is linked; for example, 

ocamlfind ocamlc -package pxp-engine,pxp-lex-iso88591 ...

will only use "pxp-engine" (the core package) and the lexical analyzer 
"pxp-lex-iso88591", even if you have installed more PXP packages.
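
The two ways of reducing the executable size can be sketched as complete 
command lines. The source file prog.ml and the output names are hypothetical; 
-linkpkg and -predicates are standard ocamlfind options, and 
netstring_only_iso is the predicate mentioned above.

```shell
SRC=prog.ml                                  # hypothetical source file
SUBPACKAGES="pxp-engine,pxp-lex-iso88591"    # core package plus one lexer

if command -v ocamlfind >/dev/null 2>&1 && [ -f "$SRC" ]; then
    # Variant 1: full "pxp" package, fewer Netstring conversion tables.
    ocamlfind ocamlc -package pxp -predicates netstring_only_iso \
        -linkpkg -o prog_full "$SRC"
    # Variant 2: only the selected subpackages.
    ocamlfind ocamlc -package "$SUBPACKAGES" -linkpkg -o prog_small "$SRC"
else
    echo "ocamlfind or $SRC not found; nothing compiled"
fi
```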

------------------------------------------------------------------------------
Linking with the archives directly
------------------------------------------------------------------------------

The following archives and objects may be used:

-  pxp_engine.cma: The core of PXP (always needed)
   
-  pxp_lex_iso88591.cma: The ocamllex-based lexical analyzer if you want to 
   internally represent texts as ISO-8859-1.
   
-  pxp_lex_link_iso88591.cmo: Registers pxp_lex_iso88591 as lexer.
   
-  pxp_lex_utf8.cma: The ocamllex-based lexical analyzer if you want to 
   internally represent texts as UTF-8.
   
-  pxp_lex_link_utf8.cmo: Registers pxp_lex_utf8 as lexer.
   
-  pxp_wlex.cma: The wlex-based lexical analyzer that works for both ISO-8859-1 
   and UTF-8 and results in smaller executables (but needs wlex).
   
-  pxp_wlex_link.cmo: Registers pxp_wlex as lexer.
   
-  pxp_top.cmo: Loading this module into the toploop installs several printers 
   for PXP types.
   
Note that you need at least one of the lexical analyzers if you want to parse 
XML texts. You do not need them if your program uses other features of PXP but 
not parsing. The archives containing the lexers are only linked into your 
executable if you also link the corresponding "register module".
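
Because a lexer archive is only linked when its register module follows it, 
the order on the command line matters. The helper below is a hypothetical 
sketch that prints the PXP archives for one lexer in a workable order. It 
deliberately leaves out the ocamlnet archives (and, for wlex, the wlex 
archive) that must precede them, since their names are not given here.

```shell
# pxp_link_args is a hypothetical helper: it prints the core archive, the
# lexer archive, and the matching register module, in that order.
pxp_link_args() {
    case "$1" in
        iso88591) echo "pxp_engine.cma pxp_lex_iso88591.cma pxp_lex_link_iso88591.cmo" ;;
        utf8)     echo "pxp_engine.cma pxp_lex_utf8.cma pxp_lex_link_utf8.cmo" ;;
        wlex)     echo "pxp_engine.cma pxp_wlex.cma pxp_wlex_link.cmo" ;;
        *)        echo "unknown lexer: $1" >&2; return 1 ;;
    esac
}

# Usage (prog.ml is a hypothetical source file; the archives of ocamlnet
# that PXP depends on must be added before the PXP archives):
#   ocamlc -I <pxp-directory> <ocamlnet archives> $(pxp_link_args utf8) prog.ml
```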

==============================================================================
The examples
==============================================================================

In the "examples" directory you will find several applications of PXP. They 
require that PXP has been installed using findlib. See the Makefiles in the 
individual directories for descriptions of the "make" goals. 

==============================================================================
Troubleshooting
==============================================================================

------------------------------------------------------------------------------
Solaris
------------------------------------------------------------------------------

The "make" utility of Solaris does not work properly: a bug prevents the 
so-called suffix rules from being recognized. There are two solutions:

-  Install GNU make and use it instead of Solaris make. This is the recommended 
   way to solve the problem, as GNU make can process almost every Makefile from 
   open source projects, and you will never have problems with building 
   software again.
   
-  Add the following lines to Makefile.rules (each command line must begin 
   with a tab character):
   
   %.cmx: %.ml
           $(OCAMLOPT) -c $<
   
   %.cmo: %.ml
           $(OCAMLC) -c $<
   
   %.cmi: %.mli
           $(OCAMLC) -c $<
   
   %.ml: %.mll
           ocamllex $<
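
For the first solution, a small sketch of how a build script could pick GNU 
make automatically. It assumes that GNU make, if present, is installed under 
the name "gmake", which is common on Solaris but not guaranteed.

```shell
# Prefer GNU make when it is installed as "gmake" (an assumption about
# the local system); otherwise fall back to plain "make".
if command -v gmake >/dev/null 2>&1; then
    MAKE=gmake
else
    MAKE=make
fi
echo "building with: $MAKE"

# Then, in the top-level directory of the distribution:
#   $MAKE all && $MAKE opt
```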
   
   
   

--------------------------

[1]   see /projects/ocamlnet.html

[2]   see /projects/findlib.html

[3]   Findlib is a package manager, see the file ABOUT-FINDLIB.

[4]   see http://www.eleves.ens.fr:8080/home/frisch/soft



