"<"is converted to
"<". As the entities may be named, there is a dependency on the character set.
val encode :
in_enc:Netconversion.encoding ->
?out_enc:Netconversion.encoding ->
?prefer_name:bool -> ?unsafe_chars:string -> unit -> string -> string
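The input string, which is encoded as in_enc, is recoded to out_enc, and the following characters are encoded as HTML entities (&name; or &#num;): the ASCII characters contained in unsafe_chars, and the characters that cannot be represented in out_enc. By default (out_enc=`Enc_usascii), only ASCII characters can be represented, and thus all code points >= 128 are encoded as HTML entities. If you pass out_enc=`Enc_utf8, all characters can be represented.
For example, the string "(a<b) & (c>d)" is encoded as "(a&lt;b) &amp; (c&gt;d)".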
It is required that out_enc is an ASCII-compatible encoding.
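The encoding step can be sketched as follows. This is a minimal example, assuming that encode is the function exposed as Netencoding.Html.encode by OCamlnet's netstring library; the module path and the build command in the comment are assumptions that this text does not state.

```ocaml
(* Assumption: OCamlnet is installed and the program is built against
   the "netstring" package, for example:
     ocamlfind ocamlopt -package netstring -linkpkg example.ml *)

(* The input is UTF-8. out_enc and unsafe_chars keep their defaults,
   so the result contains only ASCII characters. *)
let encode_ascii = Netencoding.Html.encode ~in_enc:`Enc_utf8 ()

let encoded = encode_ascii "(a<b) & (c>d)"

let () = print_endline encoded
```

The printed string should be the encoded form of the example string given in the text.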
prefer_name selects whether named entities (e.g. &lt;) or numeric entities (e.g. &#60;) are preferred.
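To compare the two settings, one possible sketch builds one encoder per value of prefer_name and prints both results. It assumes the function is available as Netencoding.Html.encode from OCamlnet's netstring library; the names with_names and with_numbers are hypothetical.

```ocaml
(* Assumption: built against OCamlnet's "netstring" package. *)

(* Named entities are preferred where a name exists. *)
let with_names =
  Netencoding.Html.encode ~in_enc:`Enc_utf8 ~prefer_name:true ()

(* Numeric entities are preferred. *)
let with_numbers =
  Netencoding.Html.encode ~in_enc:`Enc_utf8 ~prefer_name:false ()

let () =
  let s = "a < b" in
  Printf.printf "named:   %s\n" (with_names s);
  Printf.printf "numeric: %s\n" (with_numbers s)
```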
The efficiency of the function can be improved when the same encoding is applied to several strings. Create a specialized encoding function by passing all arguments up to the unit argument, and apply this function several times. For example:
let my_enc = encode ~in_enc:`Enc_utf8 () in
let s1' = my_enc s1 in
let s2' = my_enc s2 in ...
val decode :
in_enc:Netconversion.encoding ->
out_enc:Netconversion.encoding ->
?lookup:(string -> string) ->
?subst:(int -> string) ->
?entity_base:entity_set -> unit -> string -> string
The input string is recoded from in_enc to out_enc, and HTML entities (&name; or &#num;) are resolved. The input encoding in_enc must be ASCII-compatible.
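A basic decoding call might look like the following sketch. It assumes the function is available as Netencoding.Html.decode from OCamlnet's netstring library and that both in_enc and out_enc are passed as labeled arguments; decode_utf8 is a hypothetical name.

```ocaml
(* Assumption: built against OCamlnet's "netstring" package. *)

(* Both encodings are UTF-8, so every resolved entity is representable. *)
let decode_utf8 =
  Netencoding.Html.decode ~in_enc:`Enc_utf8 ~out_enc:`Enc_utf8 ()

(* The input mixes named entities with a numeric one. *)
let decoded = decode_utf8 "(a&lt;b) &amp; (c&#62;d)"

let () = print_endline decoded
```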
By default, the function knows all entities defined for HTML 4 (this can be changed using entity_base, see below). If other entities occur, the function lookup is called and the name of the entity is passed as input string to the function. It is expected that lookup returns the value of the entity, and that this value is already encoded as out_enc. By default, lookup raises a Failure exception.
If a character cannot be represented in the output encoding, the function subst is called with the code point of that character. subst must return a substitute string for the character. By default, subst raises a Failure exception.
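The two hooks can be combined into a decoder that does not raise on unknown or unrepresentable input. The sketch below assumes the function is available as Netencoding.Html.decode from OCamlnet's netstring library. The policies chosen here (re-emitting unknown references and substituting "?") are illustrative choices, and decode_lenient is a hypothetical name.

```ocaml
(* Assumption: built against OCamlnet's "netstring" package. *)

let decode_lenient =
  Netencoding.Html.decode
    ~in_enc:`Enc_usascii
    ~out_enc:`Enc_usascii
    (* Unknown entity name: put the reference back unchanged.
       The returned string must already be encoded as out_enc. *)
    ~lookup:(fun name -> "&" ^ name ^ ";")
    (* Code point not representable in out_enc: use a placeholder. *)
    ~subst:(fun _code_point -> "?")
    ()

(* &eacute; is a known HTML 4 entity but is not representable in ASCII;
   &nosuchentity; is not a known entity; &lt; decodes normally. *)
let lenient = decode_lenient "caf&eacute; &nosuchentity; &lt;"

let () = print_endline lenient
```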
entity_base determines which set of entities is considered known, i.e. which entities can be decoded without help from the lookup function: `Html selects all entities defined for HTML 4, `Xml selects only &lt;, &gt;, &amp;, &quot;, and &apos;, and `Empty selects the empty set (i.e. lookup is always called).
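The effect of restricting the entity set can be sketched as follows, again assuming the function is available as Netencoding.Html.decode from OCamlnet's netstring library. The bracketed marker returned by lookup is only an illustrative way to make the fallback visible, and decode_xml_only is a hypothetical name.

```ocaml
(* Assumption: built against OCamlnet's "netstring" package. *)

let decode_xml_only =
  Netencoding.Html.decode
    ~in_enc:`Enc_utf8
    ~out_enc:`Enc_utf8
    (* Only the XML entities are treated as known. *)
    ~entity_base:`Xml
    (* Any other entity name is handed to lookup; mark it visibly. *)
    ~lookup:(fun name -> "[" ^ name ^ "]")
    ()

(* &lt; and &gt; are in the `Xml set; &eacute; is not, so it goes
   through lookup. *)
let xml_only = decode_xml_only "&lt;b&gt; &eacute;"

let () = print_endline xml_only
```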