Module Pxp_reader

module Pxp_reader: sig .. end
Resolving identifiers and associating resources

Purpose of this module: The Pxp_reader module allows you to exactly specify how external identifiers (SYSTEM or PUBLIC) are mapped to files or channels. This is normally only necessary for advanced configurations, as the built-in functions Pxp_types.from_file, Pxp_types.from_channel, and Pxp_types.from_string often suffice.

There are two ways to use this module. First, you can compose the desired behaviour by combining several predefined resolver objects or functions. See the example section at the end of the file. Second, you can inherit from the classes (or define a resolver class from scratch). I hope this is seldom necessary as this way is much more complicated; however it allows you to implement any required magic.

Types and exceptions

exception Not_competent
Raised by the open_in method if the object does not know how to handle the passed external ID.
exception Not_resolvable of exn
Indicates that the resolver was competent, but that an error occurred while resolving the external ID. The passed exception explains the reason. Not_resolvable(Not_found) serves as an indicator of an unknown reason.
type lexer_source = {
   lsrc_lexbuf : Lexing.lexbuf Lazy.t;
   lsrc_unicode_lexbuf : Netulex.ULB.unicode_lexbuf Lazy.t;
}
The parser chooses one of these ways of lexing the input into tokens.

The resolver class type

The class type resolver is the official type of all "resolvers". Resolvers take file names (or better, external identifiers) and return lexbufs, scanning the file for tokens. Resolvers may be cloned, and clones can interpret relative file names relative to their creator.

Example of cloning:

Suppose the resolver r reads the following text from file:/dir/f1.xml:

 <tag>some XML text &e; </tag> 

The task is to switch to a resolver for reading from the entity e (which is referenced by &e;), and to switch back to the original resolver when the parser is done with e. Let us assume that e has the SYSTEM ID subdir/f2.xml. Our approach is to first create a clone of the original resolver so that we can do the switch to e in a copy. That means switching back is easy: We give up the cloned resolver, and continue with the original, unmodified resolver. This gives us the freedom to modify the clone in order to switch to e. We do this by changing the input file:

  • Step 1: let r' = <create clone of r>
  • Step 2: <direct r' to open the file subdir/f2.xml>
r' must still know the directory of the file r is reading, otherwise it would not be able to resolve subdir/f2.xml, which expands to file:/dir/subdir/f2.xml.

Actually, this example can be coded as:

 let r = new resolve_as_file () in
 let lbuf = r # open_in (Pxp_types.System "file:/dir/f1.xml") in
 (* ... read from lbuf ... *)
 let r' = r # clone in
 let lbuf' = r' # open_in (Pxp_types.System "subdir/f2.xml") in
 (* ... read from lbuf' ... *)
 r' # close_in;
 (* ... read from lbuf ... *)
 r # close_in
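
To make the role of the clone concrete, here is a toy model that uses only the standard library. The class toy_resolver and its methods are hypothetical stand-ins, not the PXP resolver class type: identifiers are plain strings, and "opening" merely computes the absolute name. The sketch only illustrates that a clone starts with the base directory of its creator, and that opening a file in the clone leaves the creator unmodified.

```ocaml
(* Toy model, NOT the PXP API: a "resolver" that only tracks names. *)
class toy_resolver =
  object
    (* Directory part of the most recently opened file, e.g. "file:/dir/" *)
    val mutable base = ""

    method base = base

    (* Resolve [id] against [base], remember the new base,
       and return the absolute name. *)
    method open_in (id : string) : string =
      let is_absolute =
        String.length id >= 5 && String.sub id 0 5 = "file:" in
      let abs = if is_absolute then id else base ^ id in
      (match String.rindex_opt abs '/' with
       | Some i -> base <- String.sub abs 0 (i + 1)
       | None -> ());
      abs

    (* A clone is an independent copy that inherits the creator's base. *)
    method clone = {< base = base >}
  end

let () =
  let r = new toy_resolver in
  let f1 = r#open_in "file:/dir/f1.xml" in
  let r' = r#clone in
  let f2 = r'#open_in "subdir/f2.xml" in
  (* r' resolved the relative name against the directory of f1 ... *)
  Printf.printf "%s\n%s\n" f1 f2;
  (* ... and r still points at its own directory. *)
  Printf.printf "%s\n" r#base
```

Giving up r' after reading e costs nothing here, because r never changed; this is the property the cloning approach relies on.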

class type resolver = object .. end
type accepted_id = Netchannels.in_obj_channel * Pxp_types.encoding option *
Pxp_types.resolver_id option
When a resolver accepts an ID, this triple specifies how to proceed. The in_obj_channel is the channel to read data from, the encoding option may enforce a certain character encoding, and the resolver_id option may detail the ID (this ID will be returned by active_id).

If None is passed as encoding option, the standard autodetection of the encoding is performed.

If None is passed as resolver_id option, the original ID is taken unchanged.

Base resolvers

class resolve_to_this_obj_channel : ?id:Pxp_types.ext_id -> ?rid:Pxp_types.resolver_id -> ?fixenc:Pxp_types.encoding -> ?close:(Netchannels.in_obj_channel -> unit) -> Netchannels.in_obj_channel -> resolver
Reads from the passed in_obj_channel.
class resolve_to_any_obj_channel : ?close:(Netchannels.in_obj_channel -> unit) -> channel_of_id:(Pxp_types.resolver_id -> accepted_id) -> unit -> resolver
This resolver calls the function channel_of_id to open a new channel for the passed resolver_id.
class resolve_to_url_obj_channel : ?close:(Netchannels.in_obj_channel -> unit) -> url_of_id:(Pxp_types.resolver_id -> Neturl.url) -> base_url_of_id:(Pxp_types.resolver_id -> Neturl.url) -> channel_of_url:(Pxp_types.resolver_id -> Neturl.url -> accepted_id) -> unit -> resolver
When this resolver gets an ID to read from, it calls the function url_of_id to get the corresponding URL (such IDs are normally system IDs, but it is also possible to map other kinds of IDs to URLs).
class resolve_as_file : ?file_prefix:[ `Allowed | `Not_recognized | `Required ] -> ?host_prefix:[ `Allowed | `Not_recognized | `Required ] -> ?system_encoding:Pxp_types.encoding -> ?map_private_id:(Pxp_types.private_id -> Neturl.url) -> ?open_private_id:(Pxp_types.private_id ->
Pervasives.in_channel * Pxp_types.encoding option) -> ?base_url_defaults_to_cwd:bool -> ?not_resolvable_if_not_found:bool -> unit -> resolver
Reads from the local file system.
val make_file_url : ?system_encoding:Pxp_types.encoding ->
?enc:Pxp_types.encoding -> string -> Neturl.url
This is a convenience function to create a file URL (for localhost). The argument is the file name encoded in the character set enc. Relative file names are automatically converted to absolute names by prepending Sys.getcwd() to the passed file name.

system_encoding: Specifies the encoding of file names of the local file system. Default: UTF-8. (This argument is necessary to interpret Sys.getcwd() correctly.)

enc: The encoding of the passed string. Default: `Enc_utf8.

Note: To get a string representation of the URL, apply Neturl.string_of_url to the result.
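
The conversion of relative names can be sketched with the standard library alone. The helper absolute_name below is hypothetical and covers only the "prepend Sys.getcwd()" step described above; it does not build a Neturl.url and ignores the system_encoding and enc arguments.

```ocaml
(* Stdlib-only sketch of the "make the file name absolute" step.
   [absolute_name] is a hypothetical helper, not part of PXP. *)
let absolute_name (name : string) : string =
  if Filename.is_relative name
  then Filename.concat (Sys.getcwd ()) name   (* relative: prepend the cwd *)
  else name                                   (* absolute: keep unchanged *)

let () =
  print_endline (absolute_name "doc.xml");
  print_endline (absolute_name "/etc/doc.xml")
```

With PXP installed, the corresponding call would presumably be Neturl.string_of_url (Pxp_reader.make_file_url "doc.xml").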

Catalog resolvers

class lookup_id : (Pxp_types.ext_id * resolver) list -> resolver
The general catalog class.
class lookup_id_as_file : ?fixenc:Pxp_types.encoding -> (Pxp_types.ext_id * string) list -> resolver
The list (catalog) argument specifies pairs (xid,file) mapping external IDs xid to files.
class lookup_id_as_string : ?fixenc:Pxp_types.encoding -> (Pxp_types.ext_id * string) list -> resolver
The list (catalog) argument specifies pairs (xid,s) mapping external IDs xid to strings s.
class lookup_public_id : (string * resolver) list -> resolver
This is the generic builder for PUBLIC id catalog resolvers: The list (catalog) argument specifies pairs (pubid, r) mapping PUBLIC identifiers to subresolvers.
class lookup_public_id_as_file : ?fixenc:Pxp_types.encoding -> (string * string) list -> resolver
Makes a resolver for PUBLIC identifiers.
class lookup_public_id_as_string : ?fixenc:Pxp_types.encoding -> (string * string) list -> resolver
Makes a resolver for PUBLIC identifiers.
class lookup_system_id : (string * resolver) list -> resolver
This is the generic builder for URL-based catalog resolvers: The catalog argument specifies pairs (url, r) mapping URLs to subresolvers.
class lookup_system_id_as_file : ?fixenc:Pxp_types.encoding -> (string * string) list -> resolver
Looks up resolvers for URL identifiers: The catalog argument specifies pairs (url, filename) mapping URLs to filenames.
class lookup_system_id_as_string : ?fixenc:Pxp_types.encoding -> (string * string) list -> resolver
Looks up resolvers for URL identifiers: The catalog argument specifies pairs (url, text) mapping URLs to XML text (which must begin with <?xml ...?>).
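
The catalog idea shared by these classes can be modelled in a few lines of plain OCaml. This is a toy model, not the PXP classes: toy_lookup_system_id is a hypothetical function, identifiers are plain URL strings, and the exception is redeclared locally. It assumes that an ID missing from the catalog is reported with Not_competent, so that another resolver may get a chance.

```ocaml
(* Toy model, NOT the PXP API. *)
exception Not_competent

(* Look up [url] in a catalog of (url, text) pairs and return the text.
   Assumption: an unknown URL means "not my business", not an error. *)
let toy_lookup_system_id (catalog : (string * string) list) (url : string)
    : string =
  match List.assoc_opt url catalog with
  | Some text -> text
  | None -> raise Not_competent

(* Example catalog; the URL and the text are made up for illustration. *)
let catalog =
  [ "http://example.org/entities/e.xml",
    "<?xml version=\"1.0\" encoding=\"UTF-8\"?>some XML text" ]

let () =
  print_endline (toy_lookup_system_id catalog "http://example.org/entities/e.xml")
```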

System ID normalization

class norm_system_id : resolver -> resolver
Normalizes URLs, and forwards the open request to the passed resolver.

ID rewriting

class rewrite_system_id : ?forward_unmatching_urls:bool -> (string * string) list -> resolver -> resolver
Rewrites the URLs according to the list of pairs.
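
As a rough illustration of rewriting, the following stdlib-only sketch treats each pair as (pattern, substitute). It rests on assumptions that are not stated above: a pattern ending in a slash is matched as a prefix, so that a whole directory hierarchy is rewritten, while any other pattern must match the URL exactly. The functions starts_with and rewrite are hypothetical and not part of PXP.

```ocaml
(* Toy model, NOT the PXP API. *)
let starts_with ~prefix s =
  String.length s >= String.length prefix
  && String.sub s 0 (String.length prefix) = prefix

(* Apply the first matching (pattern, substitute) rule to [url].
   Assumption: a pattern ending in '/' is a prefix match,
   any other pattern is an exact match. *)
let rec rewrite (rules : (string * string) list) (url : string)
    : string option =
  match rules with
  | [] -> None
  | (pattern, subst) :: rest ->
      let n = String.length pattern in
      if n > 0 && pattern.[n - 1] = '/' && starts_with ~prefix:pattern url
      then Some (subst ^ String.sub url n (String.length url - n))
      else if pattern = url then Some subst
      else rewrite rest url

let () =
  let rules = [ "http://host/foo/", "file:///dir/" ] in
  match rewrite rules "http://host/foo/x/y.dtd" with
  | Some u -> print_endline u
  | None -> print_endline "(no rule matched)"
```

What happens to a URL that no rule matches (None in this sketch) is presumably what the forward_unmatching_urls argument controls.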

Resolver construction

type combination_mode = 
| Public_before_system
| System_before_public
class combine : ?mode:combination_mode -> resolver list -> resolver
Combines several resolver objects.
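
The following toy model sketches how such a combination can work, assuming (as the description of Not_competent suggests) that the member resolvers are tried in order until one of them accepts the ID. It is not the PXP class: resolvers are modelled as plain functions from an ID string to content, the exceptions are redeclared locally, and the mode argument is not modelled.

```ocaml
(* Toy model, NOT the PXP API. *)
exception Not_competent
exception Not_resolvable of exn

(* A toy resolver maps an ID to content or raises one of the exceptions. *)
type toy_resolver = string -> string

(* Try each resolver in turn; Not_competent passes the ID on to the next
   one, any other exception (e.g. Not_resolvable) is propagated. *)
let rec combine (rs : toy_resolver list) : toy_resolver =
  fun id ->
    match rs with
    | [] -> raise Not_competent
    | r :: rest -> (try r id with Not_competent -> combine rest id)

(* Knows only the ID "a". *)
let only_a id =
  if id = "a" then "content of a" else raise Not_competent

(* Is competent for "b", but fails while resolving it. *)
let broken_b id =
  if id = "b" then raise (Not_resolvable Not_found) else raise Not_competent

let r = combine [ only_a; broken_b ]

let () = print_endline (r "a")
```

In this model, r "b" raises Not_resolvable Not_found because broken_b accepted the ID and then failed, whereas r "c" raises Not_competent because no member resolver accepted it.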
val set_debug_mode : bool -> unit
This web site is published by Informatikbüro Gerd Stolpmann