Plasma GitLab Archive
Projects Blog Knowledge

{1:tutorial Netmime Tutorial} 

{2 Structure of Mail Messages}

Nowadays mail messages are in MIME format. This format allows us to
attach files to messages, and to encode the main text in markup
languages like HTML. In principle, mail messages have only one header
block (with fields like "Subject", sender and receiver addresses, etc.)
and one body block. However, this is only one view on the mail format,
e.g. as seen by MTAs (mail transfer agents). The MIME format adds the
possibility to structure the body block into "parts" by additional
encoding sequences. The MTAs can simply ignore this additional
stuff, but software creating and analyzing mails can usually not. In
[Netmime], one can control whether one wants to see the parts or
not.

Logically, the parts of the mail body are small mail messages themselves.
This means that every part has again a header and a body. The header
can, in principal, contain any number of fields, and any kind of field,
but in practice only a small subset of the possible fields are used,
in particular only those fields that are necessary to describe the body of the 
part. The body can be a normal text or data block, but it is explicitly
also allowed that the body is again structured into a sequence of parts.
Thus complex mail messages are recursive data structures (to be exact,
they are trees).

For example, a message with two attachments usually looks like:
{[
  (mail_header, mail_body)
                 |
                 +-- (main_text_header, main_text_body)
                 +-- (att1_header, att1_body)
                 +-- (att2_header, att2_body)
]}

The headers contains two crucial fields that control the structure of
the message:

- The [Content-type] describes the kind of data found in the body,
  e.g. "text/html". When the [Content-type] has the major type
  "multipart" (e.g. "multipart/mixed"), the body is composed of 
  subparts. For all other types, the body is a leaf of the message
  tree. (To be exact, there is another major type that opens a further
  dimension of "message-in-message" composition: "message". This type
  is usually used when it is not clear whether the inner message is
  syntactically correct. [Netmime] handles this type always as
  leaf, but users of [Netmime] can try to parse these inner messages
  themselves.)
- The [Content-transfer-encoding] describes how the body data is
  encoded as ASCII text. It is usually only set for leaves.
  Recommended values are ["quoted-printable"] for bodies that
  contain some kind of ASCII text, and ["base64"] for binary
  data.

{2 Messages in [Netmime]}

In [Netmime], the types of mail headers and mail bodies are defined 
before and independent of their implementations: We have the 
types

- [class type mime_header]: Specification of possible header implementations
- [class type mime_body]: Specification of possible body implementations
- [type complex_mime_message]: The type of a message tree

and the implementations

- [class basic_mime_header]: A basic header implementation
- [class memory_mime_body]: A body implementation storing the contents
  in an O'Caml string in-memory
- [class file_mime_body]: A second body implementation storing the
  contents in an external file

Of course, the implementation classes fulfill the specifications of
the corresponding class types. For completeness, there are also
reduced read-only class types that maybe helpful for signatures
to indicate that a function does not modify a header or body.
In principal, one can also define further implementations provided
they fit to the class types.

The type [complex_mime_message] represents the message as a tree.
We have:
{[
type complex_mime_message = mime_header * complex_mime_body
and complex_mime_body =
  [ `Body of mime_body
  | `Parts of complex_mime_message list
  ]
]}
For example, the above mentioned mail with two attachments has
the following representation:

{[
let tree =
  (mail_header, `Parts [ (main_text_header, `Body main_text_body);
                         (att1_header, `Body att1_body);
                         (att2_header, `Body att2_body) ] )
]}

Here, [*_header] are objects of type [mime_header], and 
[*_body] are objects of type [mime_body]. It is obvious how to
create the tree once one has these objects: Just use the
syntax in this expression. Beginners of  O'Caml should recall
that it is as easy to decompose such structured values by using
the pattern matching feature of the language. For example, to get
the [main_text_header] of [tree], use

{[
let main_text_header =
  match tree with
      (_, `Parts ( (mth, _) :: _ )) -> mth
    | _ -> failwith "Message has unexpected structure"
]}

(Note that [ [x1;x2;...] ] is just an abbreviation for
[ x1 :: x2 :: ... :: [] ]; by switching to the "::" syntax
the message may have any number of parts in order to be
matching.) At the first glance, it looks a bit strange to
access the inner parts of mail messages in this way, but
pattern matching is a very powerful sword once one gets
accustomed to it.

Another hint: Because [complex_mime_message] is a quite
challanging type for the compiler, it is often necessary to
give type annotations, such as

[ (tree : complex_mime_message) ]

before passing such values to functions, otherwise you get compiler
errors.

{2 Accessing Headers}

It is easy to get and set the fields of headers, e.g.
[ mail_header # field "subject" ] returns the "Subject"
header field as string (or raises [Not_found]). The names of
header fields are case-insensitive. To set a field, use
[update_field], e.g.
[ mail_header # update_field "subject" "Ocamlnet is great" ].

The methods [field] and [update_field] process the field value
as unparsed string (the parsers do only very little preprocessing,
e.g. one can configure to remove all linefeeds). The module
{!Netmime_string} has a lot functions to parse and generate field
values with a certain syntax. For example, "Subject" may contain
so-called encoded words to express text written in a character
set other than ASCII. To parse this, use

{[
let subject = mail_header # field "subject" in
let word_list = Netmime_string.scan_encoded_text_value subject in
]}
Now, the words contained in [word_list] can be accessed with
a number of functions, e.g.
{[
let word_val = Netmime_string.get_decoded_word word in
let word_cset = Netmime_string.get_charset word
]}
Here, the string [word_val] is the word written in the character set
[word_cset].

For example, for the "Subject" field 

[=?iso-8859-1?q?this=20is=20some=20text?=]

this method returns a [word_list] with one word, and for this word
[word_val = "this is some text"] and [word_cset = "iso-8859-1"].

To create such structured header values, there is the function [write_value] 
in {!Netmime_string}. This function requires some more background beyond the
scope of this tutorial. As this function also supports folding of header
fields, we explain only this particular application.

Folding means that long header values must be split into several lines.
There is a soft limit of 78 bytes and a hard limit of 998 bytes
(not counting the end-of-line sequence). The soft limit only ensures that
values can be displayed in usual terminals or windows without needing horizontal
scrolling. Values exceeding the hard limit may be truncated in mail transport,
however. To fold a string [s] composed of words, first split it into its
[words], make atoms of them, format them with [write_value], and put the result into
the header field (note: this example can be programmed better, see below):

{[
let name = "Subject" in
let words = Str.split (Str.regexp "[ \t]+") s in
let atoms = List.map (fun w -> Netmime_string.Atom w) in
let buf = Buffer.create 100 in
let ch = new Netchannels.output_buffer buf in
Netmime_string.write_value 
  ~maxlen1:(78 - String.length name - 2)
  ~maxlen:78
  ~hardmaxlen1:(998 - String.length name - 2)
  ~hardmaxlen:998
  ch;
mail_header # update_field name (Buffer.contents buf)
]}

Unfortunately, there is no general method that can fold any kind
of string. The problem is that folding is only allowed at certain
places in the string, but this depends on the type of the header
field. The shown method works only for informational texts like
"Subject". For other fields, like "Received", the method would
have to be varied, especially how the list [atoms] is determined.
The syntax of the field must be known to compute [atoms].

In the module {!Netsendmail} you can find formatting and
folding functions for informational texts like "Subject",
and for mail addresses. With these functions, the "Subject"
field could also be set by

{[
let atoms = Netsendmail.create_text_tokens s in
mail_header # update_field 
  name (Netsendmail.format_field_value name atoms)
]}

{2 Accessing Bodies}

Both types of bodies (in-memory, and file) support the following two
ways of accessing:
- Get/set the value as O'Caml string
- Read/write the value as object channel (see {!Netchannels})

Note that when the value of a file-based body is changed, the file is
overwritten, independently of which of the two ways is taken.

The [string] access is very simple: To get the value, just call
[value]:

[ let s = body # value ]

To set the value, just call [set_value]:

[ body # set_value s ]

The string returned by  [value] is not transfer-encoded, or better,
all such encodings (e.g. BASE-64) are decoded. Of course, 
[set_value] expects that the passed string is not decoded, too.

Note that using [value] may be dangerous (or even fail) when the body
is stored in a file and is very large. [value] forces that the file
is completely read into memory. You may run into serious problems when
there is not enough memory, or when the value is larger than
[Sys.max_string_length] (16MB on 32 bit platforms).

Fortunately, there is the channel-based access method. It does not
need much memory, even when large bodies are accessed. However, one
does not get access to the completely body at once, but only chunk
by chunk. For example, to read a body line by line, use:

{[
let ch = body # open_value_rd() in
let line1 = ch # input_line() in
let line2 = ch # input_line() in
...
ch # close_in()
]}

As for [value], there are no transfer encodings in the returned lines.

The channel [ch] can be used whereever an Ocamlnet function allows it,
i.e. it is a full implementation. For example, one can pass it to the
HTML parser:

{[
let ch = body # open_value_rd() in
let html_doc = Nethtml.parse ch in
ch # close_in()
]}

To set the value using a channel, a body can also be opened for writing:

{[ 
let ch = body # open_value_wr() in
ch # output_string "First line\n";
ch # output_string "Second line\n";
...
ch # close_out()
]}

{2 Parsing mail messages}

The message to parse must be available as an object channel. Recall that
you can create an object channel from a string with

[ let ch = new Netchannels.input_string s ]

and from a file with

[ let ch = new Netchannels.input_channel (open_in "filename") ]

so one can parse mail messages coming from any source. As only sequential
access is needed, it is even possible to read directly from a Unix pipe.

Now, it is required to create a so-called netstream from [ch]:

[ let nstr = new Netstream.input_stream ch ]

A netstream is an object channel with additional look-ahead features.
We need it here because the parser can then recognize certain patterns in
the message in a simpler manner, for example the escape sequences
separating the parts of a structured body.

Finally, one can invoke the parser:

[ let tree = Netmime_channels.read_mime_message nstr ]

There are a number of optional arguments for this function that can
modify the way the message tree is generated. By default, all bodies
are created in memory, and the tree is deeply parsed (i.e. inner
multipart bodies are represented in tree form).

When bodies should be written to disk, the argument [storage_style]
can be passed: It is a function that is called whenever a header
has been parsed, but before the corresponding body. The function must
return the body object for representation and the output channel 
connected to the body object. For example, to write the bodies
into numbered files:

{[
let n = ref 1
let ext_storage_style header =
  let body = new file_mime_body ("file" ^ string_of_int !n) in
  incr n;
  (body, body#open_out_wr())
let tree = read_mime_message ~storage_style:ext_storage_style nstr 
]}

There is also the auxiliary function [storage] to create such a
storage style argument.

The [header] can be used to generate the file name from it. Often,
the [filename] argument of the [Content-disposition] field is the
original file name before the attachment was appended to the
mail message. To get this name:

{[
let filename =
  try
    let disp, disp_params = header # content_disposition() in
    (* disp is usually "attachment", but we don't check *)
    List.assoc "filename" disp_params
  with
    Not_found ->
       ...  (* No such paramater, use other method to gen filename *)
]}

It is usually a good idea to check for dangerous characters in this name
("/", "..") before constructing the name of the disk file.

A final remark: Don't forget to close [nstr] after parsing (this implicitly
closes [ch]).

{2 Creating Mail Messages}

For simple applications, the {!Netsendmail} module has a
{!Netsendmail.compose} function.
It can create a mail message with attachments, and performs all the
encoding details. This function is well explained in its module mli.

Of course, you can also do this yourself: Create the required headers
and bodies, and put them together to the resulting [tree].

Example:
{[ 
let date =
  Netdate.mk_mail_date ~zone:Netdate.localzone (Unix.time()) in
let mail_header =
  new basic_mime_header [ "MIME-version", "1.0";
                          "Subject", "Sample mail";
                          "To", "recipient\@domain.net";
                          "From", "sender\@domain.net";
                          "Date", date;
                          "Content-type", "multipart/mixed" ] in
let main_text_header =
  new basic_mime_header [ "Content-type", "text/plain;charset=ISO-8859-1";
                          "Content-transfer-encoding", "quoted-printable";
                        ] in
let main_text_body =
  new memory_mime_body "Hello world!\nThis is a sample mail.\n" in
let att_header =
  new basic_mime_header [ "Content-type", "image/jpeg";
                          "Content-transfer-encoding", "base64";
                          "Content-disposition", "inline;description=\"My photo\"";
                        ] in
let att_body =
  new file_mime_body "photo.jpeg" in
let tree =
  (mail_header, `Parts [ (main_text_header, `Body main_text_body);
                         (att_header, `Body att_body) ] )
]}

{2 Printing Mail Messages}

In order to print [tree] to the object channel [ch], simply call

[ Netmime_channels.write_mime_message ch tree ]

Before invoking this function, ensure the following:
- The [Content-type] field of all leaves should be set
- The [Content-transfer-encoding] field of all leaves should be set
  (in doubt use "base64"; if missing, the default is "7bit" -
  probably not what you want)
- The [Content-type] field of multipart nodes should be set (it 
  defaults to "multipart/mixed" if missing)
- The [Content-transfer-encoding] fields of multipart nodes should
  {b not} be set - this is done by the function

If the [boundary] parameter is missing, the function will invent one;
you don't need to deal with this.

The MIME message is written according to the found transfer encodings
and the multi-part boundaries.

Don't forget to close [ch] after writing!


This web site is published by Informatikbüro Gerd Stolpmann
Powered by Caml