Plasma GitLab Archive
Projects Blog Knowledge

2.4. Example: An HTML backend for the readme DTD

The converter from readme documents to HTML documents follows strictly the approach to define one class per element type. The HTML code is similar to the readme source, because of this most elements can be converted in the following way: Given the input element

<e>content</e>
the conversion text is the concatenation of a computed prefix, the recursively converted content, and a computed suffix.

Only one element type cannot be handled by this scheme: footnote. Footnotes are collected while they are found in the input text, and they are printed after the main text has been converted and printed.

2.4.1. Header

open Pxp_types
open Pxp_document
open Pxp_dtd.Entity

2.4.2. Type declarations

class type footnote_printer =
  object
    method footnote_to_html : store_type -> out_channel -> unit
  end

and store_type =
  object
    method alloc_footnote : footnote_printer -> int
    method print_footnotes : out_channel -> unit
  end
;;

2.4.3. Class store

The store is a container for footnotes. You can add a footnote by invoking alloc_footnote; the argument is an object of the class footnote_printer, the method returns the number of the footnote. The interesting property of a footnote is that it can be converted to HTML, so a footnote_printer is an object with a method footnote_to_html. The class footnote which is defined below has a compatible method footnote_to_html such that objects created from it can be used as footnote_printers.

The other method, print_footnotes prints the footnotes as definition list, and is typically invoked after the main material of the page has already been printed. Every item of the list is printed by footnote_to_html.

class store =
  object (self)

    val mutable footnotes = ( [] : (int * footnote_printer) list )
    val mutable next_footnote_number = 1

    method alloc_footnote n =
      let number = next_footnote_number in
      next_footnote_number <- number+1;
      footnotes <- footnotes @ [ number, n ];
      number

    method print_footnotes ch =
      if footnotes <> [] then begin
	output_string ch "<hr align=left noshade=noshade width=\"30%\">\n";
	output_string ch "<dl>\n";
	List.iter
	  (fun (_,n) -> 
	     n # footnote_to_html (self : #store_type :> store_type) ch)
	  footnotes;
	output_string ch "</dl>\n";
      end

  end
;;

2.4.4. Function escape_html

This function converts the characters <, >, &, and " to their HTML representation. For example, escape_html "<>" = "&lt;&gt;". Other characters are left unchanged.

let escape_html s =
  Str.global_substitute
    (Str.regexp "<\\|>\\|&\\|\"")
    (fun s ->
      match Str.matched_string s with
        "<" -> "&lt;"
      | ">" -> "&gt;"
      | "&" -> "&amp;"
      | "\"" -> "&quot;"
      | _ -> assert false)
    s
;;

2.4.5. Virtual class shared

This virtual class is the abstract superclass of the extension classes shown below. It defines the standard methods clone, node, and set_node, and declares the type of the virtual method to_html. This method recursively traverses the whole element tree, and prints the converted HTML code to the output channel passed as second argument. The first argument is the reference to the global store object which collects the footnotes.

class virtual shared =
  object (self)

    (* --- default_ext --- *)

    val mutable node = (None : shared node option)

    method clone = {< >} 
    method node =
      match node with
          None ->
            assert false
        | Some n -> n
    method set_node n =
      node <- Some n

    (* --- virtual --- *)

    method virtual to_html : store -> out_channel -> unit

  end
;;

2.4.6. Class only_data

This class defines to_html such that the character data of the current node is converted to HTML. Note that self is an extension object, self # node is the node object, and self # node # data returns the character data of the node.

class only_data =
  object (self)
    inherit shared

    method to_html store ch =
      output_string ch (escape_html (self # node # data))
  end
;;

2.4.7. Class readme

This class converts elements of type readme to HTML. Such an element is (by definition) always the root element of the document. First, the HTML header is printed; the title attribute of the element determines the title of the HTML page. Some aspects of the HTML page can be configured by setting certain parameter entities, for example the background color, the text color, and link colors. After the header, the body tag, and the headline have been printed, the contents of the page are converted by invoking to_html on all children of the current node (which is the root node). Then, the footnotes are appended to this by telling the global store object to print the footnotes. Finally, the end tags of the HTML pages are printed.

This class is an example how to access the value of an attribute: The value is determined by invoking self # node # attribute "title". As this attribute has been declared as CDATA and as being required, the value has always the form Value s where s is the string value of the attribute.

You can also see how entity contents can be accessed. A parameter entity object can be looked up by self # node # dtd # par_entity "name", and by invoking replacement_text the value of the entity is returned after inner parameter and character entities have been processed. Note that you must use gen_entity instead of par_entity to access general entities.

class readme =
  object (self)
    inherit shared

    method to_html store ch =
      (* output header *)
      output_string 
	ch "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">";
      output_string
	ch "<!-- WARNING! This is a generated file, do not edit! -->\n";
      let title = 
	match self # node # attribute "title" with
	    Value s -> s
	  | _ -> assert false
      in
      let html_header =
	try replacement_text
	    (self # node # dtd # par_entity "readme:html:header") 
	with WF_error _ -> "" in
      let html_trailer =
	try replacement_text
	    (self # node # dtd # par_entity "readme:html:trailer")
	with WF_error _ -> "" in
      let html_bgcolor =
	try replacement_text
	    (self # node # dtd # par_entity "readme:html:bgcolor")
	with WF_error _ -> "white" in
      let html_textcolor =
	try replacement_text
	    (self # node # dtd # par_entity "readme:html:textcolor")
	with WF_error _ -> "" in
      let html_alinkcolor =
	try replacement_text
	    (self # node # dtd # par_entity "readme:html:alinkcolor")
	with WF_error _ -> "" in
      let html_vlinkcolor =
	try replacement_text
	    (self # node # dtd # par_entity "readme:html:vlinkcolor")
	with WF_error _ -> "" in
      let html_linkcolor =
	try replacement_text 
	    (self # node # dtd # par_entity "readme:html:linkcolor")
	with WF_error _ -> "" in
      let html_background =
	try replacement_text
	    (self # node # dtd # par_entity "readme:html:background")
	with WF_error _ -> "" in

      output_string ch "<html><header><title>\n";
      output_string ch (escape_html title);
      output_string ch "</title></header>\n";
      output_string ch "<body ";
      List.iter
	(fun (name,value) ->
	   if value <> "" then 
	     output_string ch (name ^ "=\"" ^ escape_html value ^ "\" "))
	[ "bgcolor",    html_bgcolor;
	  "text",       html_textcolor;
	  "link",       html_linkcolor;
	  "alink",      html_alinkcolor;
	  "vlink",      html_vlinkcolor;
	];
      output_string ch ">\n";
      output_string ch html_header;
      output_string ch "<h1>";
      output_string ch (escape_html title);
      output_string ch "</h1>\n";
      (* process main content: *)
      List.iter
	(fun n -> n # extension # to_html store ch)
	(self # node # sub_nodes);
      (* now process footnotes *)
      store # print_footnotes ch;
      (* trailer *)
      output_string ch html_trailer;
      output_string ch "</html>\n";

  end
;;

2.4.8. Classes section, sect1, sect2, and sect3

As the conversion process is very similar, the conversion classes of the three section levels are derived from the more general section class. The HTML code of the section levels only differs in the type of the headline, and because of this the classes describing the section levels can be computed by replacing the class argument the_tag of section by the HTML name of the headline tag.

Section elements are converted to HTML by printing a headline and then converting the contents of the element recursively. More precisely, the first sub-element is always a title element, and the other elements are the contents of the section. This structure is declared in the DTD, and it is guaranteed that the document matches the DTD. Because of this the title node can be separated from the rest without any checks.

Both the title node, and the body nodes are then converted to HTML by calling to_html on them.

class section the_tag =
  object (self)
    inherit shared

    val tag = the_tag

    method to_html store ch =
      let sub_nodes = self # node # sub_nodes in
      match sub_nodes with
	  title_node :: rest ->
	    output_string ch ("<" ^ tag ^ ">\n");
	    title_node # extension # to_html store ch;
	    output_string ch ("\n</" ^ tag ^ ">");
	    List.iter
	      (fun n -> n # extension # to_html store ch)
	      rest
	| _ ->
	    assert false
  end
;;

class sect1 = section "h1";;
class sect2 = section "h3";;
class sect3 = section "h4";;

2.4.9. Classes map_tag, p, em, ul, li

Several element types are converted to HTML by simply mapping them to corresponding HTML element types. The class map_tag implements this, and the class argument the_target_tag determines the tag name to map to. The output consists of the start tag, the recursively converted inner elements, and the end tag.

class map_tag the_target_tag =
  object (self)
    inherit shared

    val target_tag = the_target_tag

    method to_html store ch =
      output_string ch ("<" ^ target_tag ^ ">\n");
      List.iter
	(fun n -> n # extension # to_html store ch)
	(self # node # sub_nodes);
      output_string ch ("\n</" ^ target_tag ^ ">");
  end
;;

class p = map_tag "p";;
class em = map_tag "b";;
class ul = map_tag "ul";;
class li = map_tag "li";;

2.4.10. Class br

Element of type br are mapped to the same HTML type. Note that HTML forbids the end tag of br.

class br =
  object (self)
    inherit shared

    method to_html store ch =
      output_string ch "<br>\n";
      List.iter
	(fun n -> n # extension # to_html store ch)
	(self # node # sub_nodes);
  end
;;

2.4.11. Class code

The code type is converted to a pre section (preformatted text). As the meaning of tabs is unspecified in HTML, tabs are expanded to spaces.

class code =
  object (self)
    inherit shared

    method to_html store ch =
      let data = self # node # data in
      (* convert tabs *)
      let l = String.length data in
      let rec preprocess i column =
	(* this is very ineffective but comprehensive: *)
	if i < l then
	  match data.[i] with
	      '\t' ->
		let n = 8 - (column mod 8) in
		String.make n ' ' ^ preprocess (i+1) (column + n)
	    | '\n' ->
		"\n" ^ preprocess (i+1) 0
	    | c ->
		String.make 1 c ^ preprocess (i+1) (column + 1)
	else
	  ""
      in
      output_string ch "<p><pre>";
      output_string ch (escape_html (preprocess 0 0));
      output_string ch "</pre></p>";

  end
;;

2.4.12. Class a

Hyperlinks, expressed by the a element type, are converted to the HTML a type. If the target of the hyperlink is given by href, the URL of this attribute can be used directly. Alternatively, the target can be given by readmeref in which case the ".html" suffix must be added to the file name.

Note that within a only #PCDATA is allowed, so the contents can be converted directly by applying escape_html to the character data contents.

class a =
  object (self)
    inherit shared

    method to_html store ch =
      output_string ch "<a ";
      let href =
	match self # node # attribute "href" with
	    Value v -> escape_html v
	  | Valuelist _ -> assert false
	  | Implied_value ->
	      begin match self # node # attribute "readmeref" with
		  Value v -> escape_html v ^ ".html"
		| Valuelist _ -> assert false
		| Implied_value ->
		    ""
	      end
      in
      if href <> "" then
	output_string ch ("href=\""  ^ href ^ "\"");
      output_string ch ">";
      output_string ch (escape_html (self # node # data));
      output_string ch "</a>";
	
  end
;;

2.4.13. Class footnote

The footnote class has two methods: to_html to convert the footnote reference to HTML, and footnote_to_html to convert the footnote text itself.

The footnote reference is converted to a local hyperlink; more precisely, to two anchor tags which are connected with each other. The text anchor points to the footnote anchor, and the footnote anchor points to the text anchor.

The footnote must be allocated in the store object. By allocating the footnote, you get the number of the footnote, and the text of the footnote is stored until the end of the HTML page is reached when the footnotes can be printed. The to_html method stores simply the object itself, such that the footnote_to_html method is invoked on the same object that encountered the footnote.

The to_html only allocates the footnote, and prints the reference anchor, but it does not print nor convert the contents of the note. This is deferred until the footnotes actually get printed, i.e. the recursive call of to_html on the sub nodes is done by footnote_to_html.

Note that this technique does not work if you make another footnote within a footnote; the second footnote gets allocated but not printed.

class footnote =
  object (self)
    inherit shared

    val mutable footnote_number = 0

    method to_html store ch =
      let number = 
	store # alloc_footnote (self : #shared :> footnote_printer) in
      let foot_anchor = 
	"footnote" ^ string_of_int number in
      let text_anchor =
	"textnote" ^ string_of_int number in
      footnote_number <- number;
      output_string ch ( "<a name=\"" ^ text_anchor ^ "\" href=\"#" ^ 
			 foot_anchor ^ "\">[" ^ string_of_int number ^ 
			 "]</a>" )

    method footnote_to_html store ch =
      (* prerequisite: we are in a definition list <dl>...</dl> *)
      let foot_anchor = 
	"footnote" ^ string_of_int footnote_number in
      let text_anchor =
	"textnote" ^ string_of_int footnote_number in
      output_string ch ("<dt><a name=\"" ^ foot_anchor ^ "\" href=\"#" ^ 
			text_anchor ^ "\">[" ^ string_of_int footnote_number ^ 
			"]</a></dt>\n<dd>");
      List.iter
	(fun n -> n # extension # to_html store ch)
	(self # node # sub_nodes);
      output_string ch ("\n</dd>")
 
  end
;;

2.4.14. The specification of the document model

This code sets up the hash table that connects element types with the exemplars of the extension classes that convert the elements to HTML.

open Pxp_yacc

let tag_map =
  make_spec_from_alist
    ~data_exemplar:(new data_impl (new only_data))
    ~default_element_exemplar:(new element_impl (new no_markup))
    ~element_alist:
      [ "readme", (new element_impl (new readme));
	"sect1",  (new element_impl (new sect1));
	"sect2",  (new element_impl (new sect2));
	"sect3",  (new element_impl (new sect3));
	"title",  (new element_impl (new no_markup));
	"p",      (new element_impl (new p));
	"br",     (new element_impl (new br));
	"code",   (new element_impl (new code));
	"em",     (new element_impl (new em));
	"ul",     (new element_impl (new ul));
	"li",     (new element_impl (new li));
	"footnote", (new element_impl (new footnote : #shared :> shared));
	"a",      (new element_impl (new a));
      ]
    ()
;;
This web site is published by Informatikbüro Gerd Stolpmann
Powered by Caml