class type resolver = object .. end
method init_rep_encoding : Pxp_core_types.I.rep_encoding -> unit
A resolver opens a character source and returns it as a Lexing.lexbuf (and as its advanced version, Netulex.ULB.unicode_lexbuf).
After creating a resolver, one must invoke the two methods init_rep_encoding and init_warner to set the internal encoding of strings and the warner object, respectively. This is normally already done by the parsing core.
It is not necessary to invoke these two methods for a fresh clone.
It is possible that the character encoding of the source and the internal encoding of the parser are different. To cope with this, one of the tasks of the resolver is to recode the characters of the input source into the internal character encoding.
Note that there are several ways of determining the encoding of the input: (1) the transport protocol (e.g. HTTP) may transmit the encoding, and (2) one can inspect the beginning of the file. In case (2), the first bytes indicate whether UTF-16 is used; otherwise an ASCII-compatible encoding can be assumed, and the XML declaration <?xml ... encoding="xyz" ...?> can be read. The encoding found there is to be used.
A resolver is like a file: it must be opened before one can work with it, and it should be closed after all operations on it have been done. The method open_rid is called with the resolver ID as argument, and it must return the lexbuf reading from the external resource. (There is also the old method open_in that expects an ext_id as argument. It is less powerful and should not be used any longer.) The method close_in does not require an argument.
It is allowed to re-open a resolver after it has been closed. It is forbidden to open a resolver again while it is open.
It is allowed to close a resolver several times: if close_in is invoked while the resolver is already closed, nothing happens.
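The open/close rules above can be pictured as a two-state machine. The following is a minimal sketch using only the standard library; toy_resolver, its string IDs, and the is_open method are hypothetical stand-ins and not part of PXP.

```ocaml
(* Toy model of the open/close rules; not the PXP implementation. *)
type state = Closed | Open of string   (* the ID currently being read *)

class toy_resolver =
  object
    val mutable state = Closed

    (* Opening while already open is an error; opening after a close is fine. *)
    method open_rid (id : string) : unit =
      match state with
      | Open _ -> failwith "toy_resolver: already open"
      | Closed -> state <- Open id

    (* Closing is idempotent: a second close_in does nothing. *)
    method close_in : unit =
      state <- Closed

    (* Hypothetical helper, added only so the state can be observed. *)
    method is_open : bool =
      match state with Open _ -> true | Closed -> false
  end

let () =
  let r = new toy_resolver in
  r#open_rid "a.xml";
  r#close_in;
  r#close_in;            (* closing twice is harmless *)
  r#open_rid "b.xml";    (* re-opening after a close is allowed *)
  r#close_in
```

How a real resolver reports a forbidden second open is not specified in the text; the Failure used here is only an assumption of the sketch.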
The method open_rid may raise Not_competent to indicate that this resolver is not able to open this type of ID.
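One use of such an exception is to let several openers be tried in turn. The sketch below illustrates that idea with a locally defined Not_competent and two hypothetical openers (open_file, open_http) that return strings instead of lexbufs; none of these definitions are taken from PXP.

```ocaml
(* Local stand-in for the exception named in the text. *)
exception Not_competent

let has_prefix p s =
  String.length s >= String.length p
  && String.sub s 0 (String.length p) = p

(* Hypothetical openers: each handles one kind of ID and declines the rest. *)
let open_file id =
  if has_prefix "file:" id then "contents of " ^ id else raise Not_competent

let open_http id =
  if has_prefix "http:" id then "response for " ^ id else raise Not_competent

(* Try each opener in turn; the first competent one wins.
   If nobody is competent, the combination is not competent either. *)
let rec open_first openers id =
  match openers with
  | [] -> raise Not_competent
  | f :: rest -> (try f id with Not_competent -> open_first rest id)
```

For example, open_first [open_file; open_http] "http://host/doc.xml" falls through open_file and is answered by open_http.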
If open_rid gets a PUBLIC ID, it can be assumed that the string is already normalized (concerning whitespace).
The method change_encoding is called from the parser after the analysis of case (2) has been done; the argument is either the string name of the encoding, or the empty string to indicate that no XML declaration was found. It is guaranteed that change_encoding is invoked after only a few tokens of the file. The resolver should react as follows:
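- If case (1) applies: ignore the encoding passed to change_encoding.
- If the first bytes of the input indicated UTF-16: the encoding passed to change_encoding must be compatible with UTF-16. This should be checked, and violations should be reported.
- Otherwise: if the passed encoding is the empty string, assume UTF-8; else assume the passed encoding.
The following rule helps to synchronize the lexbuf with the encoding: if the resolver has been opened, but change_encoding has not yet been invoked, the lexbuf contains at most one character (which may be represented by multiple bytes); i.e. the lexbuf is created by Lexing.from_function, and the function puts only one character into the buffer at once.
After change_encoding has been invoked, there is no longer a limit on the lexbuf size.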
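How a resolver might react to change_encoding can be sketched as a small decision function. The hint type, is_utf16_compatible, and effective_encoding are hypothetical names. The sketch also makes three assumptions: an encoding supplied by the transport protocol takes precedence, the names UTF-16, UTF-16BE and UTF-16LE count as compatible with UTF-16, and a missing declaration means UTF-8 (the XML default).

```ocaml
(* What the resolver learned when it opened the source (hypothetical type). *)
type hint =
  | From_transport of string   (* case (1): e.g. an HTTP charset parameter *)
  | Utf16_detected             (* the first bytes indicated UTF-16 *)
  | No_hint

(* Assumption: these three names are treated as compatible with UTF-16. *)
let is_utf16_compatible name =
  match String.uppercase_ascii name with
  | "UTF-16" | "UTF-16BE" | "UTF-16LE" -> true
  | _ -> false

(* [declared] plays the role of the argument of change_encoding;
   the empty string means that no XML declaration was found. *)
let effective_encoding hint declared =
  match hint with
  | From_transport enc -> enc          (* the declared name is ignored *)
  | Utf16_detected ->
      if declared = "" || is_utf16_compatible declared then "UTF-16"
      else failwith ("declared encoding " ^ declared ^ " conflicts with UTF-16")
  | No_hint -> if declared = "" then "UTF-8" else declared
```

Reporting the violation with failwith is a placeholder; a real resolver would presumably use the parser's own error reporting.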
The reason for this rule is that you know exactly the character where the encoding changes to the encoding passed by change_encoding.
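The one-character rule can be illustrated with Lexing.from_function from the standard library. The sketch below is not the PXP implementation: make_lexbuf and unlimit are hypothetical names, and it assumes that every character is a single byte so that the refill function stays short.

```ocaml
(* Build a lexbuf over [src] together with a switch that lifts the limit.
   Assumption: one character = one byte, to keep the sketch short. *)
let make_lexbuf (src : string) =
  let pos = ref 0 in
  let limited = ref true in
  (* Refill function in the shape expected by Lexing.from_function:
     write at most [n] bytes into [buf] and return how many were written. *)
  let refill (buf : bytes) (n : int) : int =
    let avail = String.length src - !pos in
    let want = if !limited then 1 else n in
    let k = min avail (min want n) in
    Bytes.blit_string src !pos buf 0 k;
    pos := !pos + k;
    k                                   (* 0 signals end of input *)
  in
  (* To be called once the encoding is known, i.e. where a resolver
     would handle change_encoding. *)
  let unlimit () = limited := false in
  (Lexing.from_function refill, refill, unlimit)
```

The refill function is returned alongside the lexbuf only so that its behaviour can be observed directly; a resolver would keep it private. Because each refill delivers a single character until unlimit is called, the decoder can be swapped between two refills without any already-buffered input having been decoded in the wrong encoding.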
The method clone may be invoked for open or closed resolvers. Basically, clone returns a new resolver which is always closed. If the original resolver is already closed, the clone is simply a clone.
If the original resolver is open at the moment of cloning, this applies: if the clone is later opened for a relative system ID (i.e. relative URL), the clone must interpret this ID relative to the ID of the original resolver.
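The cloning rule can be sketched with a toy class that remembers the ID of the resolver it was cloned from. The names cloning_resolver and resolve are hypothetical, IDs are plain strings rather than resolver_id values, and the URL handling is deliberately simplified: an ID containing ':' is treated as absolute, and a relative ID replaces everything after the last '/' of the base.

```ocaml
(* Simplified relative resolution; ignores "..", queries and fragments. *)
let resolve ~base id =
  if String.contains id ':' then id          (* assumed to be absolute *)
  else
    match String.rindex_opt base '/' with
    | Some i -> String.sub base 0 (i + 1) ^ id
    | None -> id

class cloning_resolver (base : string option) =
  object
    val mutable active : string option = None

    (* Returns the absolute ID that would be opened. *)
    method open_rid (id : string) : string =
      let abs = match base with Some b -> resolve ~base:b id | None -> id in
      active <- Some abs;
      abs

    method close_in : unit = active <- None

    method active_id : string option = active

    (* The clone starts closed. If the original is open, the clone uses the
       original's active ID as its base; otherwise it keeps the same base. *)
    method clone =
      new cloning_resolver (match active with Some _ -> active | None -> base)
  end
```

For instance, if a resolver is open for http://host/dir/doc.xml, a clone that is later opened for doc.dtd reads http://host/dir/doc.dtd in this model.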
method init_warner : Pxp_core_types.I.symbolic_warnings option ->
Pxp_core_types.I.collect_warnings -> unit
method rep_encoding : Pxp_core_types.I.rep_encoding
Returns the internal encoding that was set by init_rep_encoding.
method open_in : Pxp_core_types.I.ext_id -> lexer_source
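This is the old method for opening a resolver; it has been superseded by open_rid.
This method may raise Not_competent if the object does not know how to handle this ext_id.
method open_rid : Pxp_core_types.I.resolver_id -> lexer_source
This method takes a resolver_id instead of an ext_id but works in the same way.
method close_in : unit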
method change_encoding : string -> unit
method clone : resolver
method active_id : Pxp_core_types.I.resolver_id
Returns the resolver ID that is actually in use: the ID passed to open_rid where unused components have been set to None. The resolver ID returned by active_id plays an important role when expanding relative URLs.