module Html: sig .. end
Encoding and decoding of HTML entity references. For example, "<" is converted to "&lt;".
As the entities may be named, there is a dependency on the character set.
val encode_from_latin1 : string -> string
val decode_to_latin1 : string -> string
val unsafe_chars_html4 : string
val encode : in_enc:Netconversion.encoding ->
?out_enc:Netconversion.encoding ->
?prefer_name:bool -> ?unsafe_chars:string -> unit -> string -> string
The input string, which is encoded as in_enc, is recoded to out_enc, and the following characters are encoded as HTML entities (&name; or &#num;):
- The ASCII characters contained in unsafe_chars
- The characters that cannot be represented in out_enc. By default (out_enc=`Enc_usascii), only ASCII characters can be represented, and thus all code points >= 128 are encoded as HTML entities. If you pass out_enc=`Enc_utf8, all characters can be represented.
For example, the string "(a<b) & (c>d)" is encoded as "(a&lt;b) &amp; (c&gt;d)".
It is required that out_enc is an ASCII-compatible encoding.
The option prefer_name selects whether named entities (e.g. &lt;) or numeric entities (e.g. &#60;) are preferred.
The efficiency of the function can be improved when the same encoding is applied to several strings. Create a specialized encoding function by passing all arguments up to the unit argument, and apply this function several times. For example:
let my_enc = encode ~in_enc:`Enc_utf8 () in
let s1' = my_enc s1 in
let s2' = my_enc s2 in ...
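As a minimal sketch of the options described above, the following builds two specialized encoders, one with the default preference for named entities and one with prefer_name set to false. It assumes Ocamlnet's netstring library is installed and linked (for example via ocamlfind with the netstring package); the names enc_named and enc_numeric are illustrative only.

```ocaml
(* Assumes Ocamlnet's netstring library is available, e.g.
   ocamlfind ocamlopt -package netstring -linkpkg example.ml *)

(* Specialized encoder: UTF-8 input, default ASCII output, named entities. *)
let enc_named = Netencoding.Html.encode ~in_enc:`Enc_utf8 ()

(* Same, but asking for numeric entities instead of named ones. *)
let enc_numeric =
  Netencoding.Html.encode ~in_enc:`Enc_utf8 ~prefer_name:false ()

let () =
  print_endline (enc_named "(a<b) & (c>d)");
  print_endline (enc_numeric "a<b")
```

Both encoders are created once and can then be applied to any number of strings.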
type entity_set = [ `Empty | `Html | `Xml ]
val decode : in_enc:Netconversion.encoding ->
out_enc:Netconversion.encoding ->
?lookup:(string -> string) ->
?subst:(int -> string) ->
?entity_base:entity_set -> unit -> string -> string
The input string is recoded from in_enc to out_enc, and HTML entities (&name; or &#num;) are resolved. The input encoding in_enc must be ASCII-compatible.
By default, the function knows all entities defined for HTML 4 (this can be changed using entity_base, see below). If other entities occur, the function lookup is called and the name of the entity is passed as input string to the function. It is expected that lookup returns the value of the entity, and that this value is already encoded as out_enc. By default, lookup raises a Failure exception.
If a character cannot be represented in the output encoding, the function subst is called. subst must return a substitute string for the character. By default, subst raises a Failure exception.
The option entity_base determines which set of entities is considered known, i.e. can be decoded without help from the lookup function: `Html selects all entities defined for HTML 4, `Xml selects only &lt;, &gt;, &amp;, &quot;, and &apos;, and `Empty selects the empty set (i.e. lookup is always called).
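The interplay of entity_base and lookup can be sketched as follows. This is a hedged illustration, assuming Ocamlnet's netstring library is installed and linked; decode_html and decode_xml are illustrative names, and replacing unknown entities with "?" is a hypothetical policy chosen only for the example.

```ocaml
(* Assumes Ocamlnet's netstring library is available, e.g.
   ocamlfind ocamlopt -package netstring -linkpkg example.ml *)

(* Default entity_base: all HTML 4 entities are known. *)
let decode_html =
  Netencoding.Html.decode ~in_enc:`Enc_utf8 ~out_enc:`Enc_utf8 ()

(* Only the five XML entities are known; every other named entity is
   handed to lookup, which here returns a fixed placeholder
   (hypothetical policy, not a library default). *)
let decode_xml =
  Netencoding.Html.decode ~in_enc:`Enc_utf8 ~out_enc:`Enc_utf8
    ~entity_base:`Xml
    ~lookup:(fun _name -> "?")
    ()

let () =
  print_endline (decode_html "(a&lt;b) &amp; (c&gt;d)");
  print_endline (decode_xml "x&nbsp;y &lt; z")
```

Without the ~lookup argument, the second call would raise Failure, since &nbsp; is not in the `Xml set.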