
(* $Id: pxp_ev_parser.mli 707 2004-09-04 17:25:35Z gerd $
 * ----------------------------------------------------------------------
 * PXP: The polymorphic XML parser for Objective Caml.
 * Copyright by Gerd Stolpmann. See LICENSE for details.
 *)

open Pxp_types
open Pxp_dtd

(******************* Event-based interface **************************)

val create_entity_manager :
      ?is_document:bool ->       (* default: true *)
      config -> 
      source -> 
        Pxp_entity_manager.entity_manager
  (* Creates an entity manager that is initialized with the toplevel
   * entity referenced by the source argument. The entity manager
   * can be passed to [process_entity] and the other parsing functions
   * below.
   *
   * The following configuration options are used:
   * - warner
   * - encoding
   * - debugging_mode
   *
   * ~is_document: Pass [true] if the entity to read is a complete
   *   document, and [false] otherwise. The value [true] enforces
   *   several restrictions that apply to document entities, e.g. that
   *   the conditional sections <![INCLUDE[...]]> and <![IGNORE[...]]>
   *   are not allowed, and that parameter entities must obey additional
   *   nesting rules.
   *)



val process_entity :
      config -> 
      entry ->
      Pxp_entity_manager.entity_manager ->
      (event -> unit) ->
        unit
  (* Parses a document or a document fragment. Well-formedness is always
   * checked; the flags of the [entry] argument may request further
   * checks.
   *
   * While parsing, events are generated, and the passed function is
   * called for every event. The text is read from the current entity
   * of the entity manager, which may be either open or closed when
   * this function is called.
   * 
   * The [entry] argument selects the entry point into the parsing rules.
   * Notes on the entry points:
   * - `Entry_document:
   *   The first generated event is always E_start_doc, which contains
   *   the whole DTD as an object (no events are generated while the DTD
   *   is parsed; only the result is passed back). The events for the
   *   contents follow, terminated by E_end_doc and E_end_of_stream.
   * - `Entry_content:
   *   Only events for contents are generated. They are terminated
   *   by E_end_of_stream.
   * - `Entry_declaration:
   *   Currently not supported. (But see Pxp_dtd_parser for functions
   *   parsing DTDs.)
   *
   * The entry points take options; see [Pxp_types] for explanations.
   *
   * The generated events are not normalized with respect to:
   * - character data: several E_char_data events may be generated
   *   for the same character data section.
   * Filter functions that normalize the events are available in
   * Pxp_event.
   *
   * Only the following config options have an effect:
   * - warner
   * - encoding
   * - enable_pinstr_nodes
   * - enable_comment_nodes
   * - enable_super_root_node
   * - store_element_positions
   * - name_pool and all name pool options
   * - enable_namespace_processing
   *
   * If an error happens, the callback function is invoked exactly once
   * with the E_error event. The error is additionally passed to the caller
   * by letting the exception fall through to the caller. It is not possible
   * to resume parsing after an error.
   *
   * The idea behind this special error handling is that the callback
   * function should always be notified when the parser stops, whether
   * or not parsing succeeded. So the last event passed to the callback
   * function is either E_end_of_stream or E_error. Conceptually,
   * process_entity follows this scheme:
   *
   * try
   *   "parse";
   *   eh E_end_of_stream           (* eh is the callback function *)
   * with
   *   error ->
   *     "cleanup";
   *     let pos = ... in
   *     let e = At(pos, error) in
   *     eh (E_error e); 
   *     raise e
   *
   * Note that the exception is always an At(_,_) exception that wraps
   * the exception that originally occurred. This style of exception
   * handling applies both to exceptions generated by the parser and to
   * exceptions raised by the callback function.
   *)
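
(* Sketch (not part of PXP): the error-handling scheme described above,
 * written out as runnable OCaml. The [event] type and the [At] exception
 * are simplified stand-ins for the PXP definitions, and [process] is a
 * hypothetical driver; only its control flow is meant to mirror
 * [process_entity].

```ocaml
(* Simplified stand-in for the PXP event type. *)
type event =
  | E_char_data of string
  | E_end_of_stream
  | E_error of exn

(* Stand-in for the PXP At exception; the position is just a string here. *)
exception At of string * exn

(* Hypothetical driver: [parse] produces events through the callback [eh].
   Any exception, raised by the parser or by [eh] itself, is reported to
   [eh] once as E_error and then re-raised, wrapped in At. *)
let process (parse : (event -> unit) -> unit) (eh : event -> unit) : unit =
  try
    parse eh;
    eh E_end_of_stream
  with error ->
    let e = At ("<position>", error) in
    eh (E_error e);
    raise e

(* Usage: a parser that emits one event and then fails. *)
let seen = ref []
let failing_parse (eh : event -> unit) : unit =
  eh (E_char_data "x");
  failwith "bad token"
let outcome =
  try process failing_parse (fun ev -> seen := ev :: !seen); "ok"
  with At (_, Failure msg) -> msg
```

 * Here [outcome] is "bad token", and the events recorded in [seen] are,
 * in order, E_char_data "x" followed by E_error.
 *)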


val process_expr :
      ?first_token: Pxp_lexer_types.token ->
      ?following_token: Pxp_lexer_types.token ref ->
      config -> 
      Pxp_entity_manager.entity_manager ->
      (event -> unit) ->
        unit
  (* This is a special parsing function that corresponds to the entry
   * Entry_expr, i.e. it parses a single element, processing instruction,
   * or comment. In contrast to [process_entity], the current entity
   * is not opened; it is expected to be open already. Likewise, the
   * entity is not closed after parsing (unless an error happens).
   *
   * ~first_token: If passed, this token is prepended to the tokens read
   *    from the entity manager.
   * ~following_token: If passed, the token following the last parsed
   *    token is stored into this variable.
   *    Note: By design the parser _always_ reads the following token.
   *    I know that this may lead to serious problems when one tries to
   *    integrate this parser with another parser, but it is currently
   *    hard to change.
   *)
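
(* Sketch (not part of PXP): how [~first_token] and [~following_token]
 * can be chained when a caller parses several items in a row. The
 * [token] type and [parse_element] are hypothetical, heavily simplified
 * stand-ins (flat elements only, no nesting); the real PXP tokens are
 * defined in Pxp_lexer_types.

```ocaml
(* Hypothetical token type, much simpler than Pxp_lexer_types.token. *)
type token = Tag_start of string | Tag_end of string | Text of string | Eof

(* Parses one flat element and returns its text content. Like
   process_expr, it always reads one token past the element and, if
   asked, stores that token in [following_token]. *)
let parse_element ?first_token ?following_token (next_tok : unit -> token) =
  let pushed = ref first_token in
  let read () =
    match !pushed with
    | Some t -> pushed := None; t
    | None -> next_tok ()
  in
  let name =
    match read () with
    | Tag_start n -> n
    | _ -> failwith "element expected"
  in
  let buf = Buffer.create 16 in
  let rec body () =
    match read () with
    | Text s -> Buffer.add_string buf s; body ()
    | Tag_end n when n = name -> ()
    | _ -> failwith "malformed element"
  in
  body ();
  let lookahead = read () in  (* the extra token that is always read *)
  (match following_token with
   | Some r -> r := lookahead
   | None -> ());
  Buffer.contents buf

(* Usage: two consecutive elements read from one token source. *)
let tokens =
  ref [Tag_start "a"; Text "1"; Tag_end "a";
       Tag_start "b"; Text "2"; Tag_end "b"; Eof]
let next_tok () =
  match !tokens with
  | [] -> Eof
  | t :: rest -> tokens := rest; t
let follow = ref Eof
let a = parse_element ~following_token:follow next_tok
let b = parse_element ~first_token:(!follow) ~following_token:follow next_tok
```

 * Because the first call already consumed Tag_start "b", the caller must
 * hand it back as [~first_token] to the second call; here [a] is "1",
 * [b] is "2", and [!follow] ends up as Eof.
 *)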

val create_pull_parser :
      config -> 
      entry ->
      Pxp_entity_manager.entity_manager ->
        (unit -> event option)
  (* let next_event = create_pull_parser cfg entry mng in
   * let ev = next_event()
   *
   * This function parses the XML document in "pull mode". [next_event]
   * should be invoked repeatedly until it returns [None], indicating the
   * end of the document. Each event is returned as [Some ev].
   *
   * The events are exactly the same as those generated by [process_entity].
   *
   * In contrast to [process_entity], no exception is raised when an
   * error happens; instead, E_error is generated as the last event.
   *
   * To create a stream of events, just do:
   * let next = create_pull_parser cfg entry mng in
   * let stream = Stream.from(fun _ -> next())
   *)
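
(* Sketch (not part of PXP): the [Stream] module used above was
 * deprecated in OCaml 4.14 and removed from the standard library in
 * OCaml 5.0 (it lives on in the separate camlp-streams package). With a
 * current compiler, the pull function can instead be wrapped as a
 * standard [Seq.t]. The helper [seq_of_pull] is hypothetical; it is
 * polymorphic, so it should accept the function returned by
 * [create_pull_parser] as well as the toy one below.

```ocaml
(* Wraps a pull function as a Seq.t. The sequence is ephemeral: each
   element is produced by one call to [next], so it must be traversed
   only once. *)
let seq_of_pull (next : unit -> 'a option) : 'a Seq.t =
  let rec go () =
    match next () with
    | None -> Seq.Nil
    | Some x -> Seq.Cons (x, go)
  in
  go

(* Usage with a toy pull function that yields 1, 2, 3 and then None. *)
let remaining = ref [1; 2; 3]
let next () =
  match !remaining with
  | [] -> None
  | x :: rest -> remaining := rest; Some x
let collected = List.of_seq (seq_of_pull next)
```

 * With PXP, [seq_of_pull (create_pull_parser cfg entry mng)] would then
 * be a sequence of the same events, ending once [next_event] returns
 * [None].
 *)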


(* Filters have been moved to Pxp_event! *)
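
(* Sketch (not part of PXP): a normalizing filter in the pull style used
 * by [create_pull_parser], merging runs of adjacent E_char_data events
 * into one event. The [event] type and [norm_char_data] are simplified,
 * hypothetical stand-ins; the real filters are the ones in Pxp_event.

```ocaml
(* Simplified stand-in for the PXP event type. *)
type event =
  | E_start_tag of string
  | E_end_tag of string
  | E_char_data of string
  | E_end_of_stream

(* Returns a pull function that yields the events of [next], except that
   adjacent E_char_data events are concatenated. One event (or the end
   marker None) of lookahead is kept in [pending]. *)
let norm_char_data (next : unit -> event option) : unit -> event option =
  let pending = ref None in
  let pull () =
    match !pending with
    | Some ev -> pending := None; ev
    | None -> next ()
  in
  fun () ->
    match pull () with
    | Some (E_char_data s) ->
        let buf = Buffer.create 64 in
        Buffer.add_string buf s;
        let rec absorb () =
          match next () with
          | Some (E_char_data more) -> Buffer.add_string buf more; absorb ()
          | other -> pending := Some other  (* push back for the next call *)
        in
        absorb ();
        Some (E_char_data (Buffer.contents buf))
    | other -> other

(* Usage: feed a list of events through the filter and collect the output. *)
let of_list events =
  let rest = ref events in
  fun () ->
    match !rest with
    | [] -> None
    | ev :: tl -> rest := tl; Some ev
let rec to_list next =
  match next () with
  | None -> []
  | Some ev -> ev :: to_list next
let normalized =
  to_list (norm_char_data (of_list
    [E_start_tag "p"; E_char_data "a"; E_char_data "b"; E_end_tag "p";
     E_end_of_stream]))
```

 * Here [normalized] contains a single E_char_data "ab" between the start
 * and end tags; a Buffer is used so that long runs are not concatenated
 * repeatedly.
 *)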

(* For conversions from trees to event streams, and vice versa,
 * see Pxp_document.
 *)
