Structure of Mail Messages
Nowadays mail messages are in MIME format. This format allows us to
attach files to messages, and to encode the main text in markup
languages like HTML. In principle, mail messages have only one header
block (with fields like "Subject", sender and receiver addresses, etc.)
and one body block. However, this is only one view on the mail format,
e.g. as seen by MTAs (mail transfer agents). The MIME format adds the
possibility to structure the body block into "parts" by additional
encoding sequences. The MTAs can simply ignore this additional
stuff, but software creating and analyzing mails can usually not. In
Netmime, one can control whether one wants to see the parts or
Logically, the parts of the mail body are small mail messages themselves. This means that every part has again a header and a body. The header can, in principal, contain any number of fields, and any kind of field, but in practice only a small subset of the possible fields are used, in particular only those fields that are necessary to describe the body of the part. The body can be a normal text or data block, but it is explicitly also allowed that the body is again structured into a sequence of parts. Thus complex mail messages are recursive data structures (to be exact, they are trees).
For example, a message with two attachments usually looks like:
(mail_header, mail_body) | +-- (main_text_header, main_text_body) +-- (att1_header, att1_body) +-- (att2_header, att2_body)
The headers contains two crucial fields that control the structure of the message:
Content-typedescribes the kind of data found in the body, e.g. "text/html". When the
Content-typehas the major type "multipart" (e.g. "multipart/mixed"), the body is composed of subparts. For all other types, the body is a leaf of the message tree. (To be exact, there is another major type that opens a further dimension of "message-in-message" composition: "message". This type is usually used when it is not clear whether the inner message is syntactically correct.
Netmimehandles this type always as leaf, but users of
Netmimecan try to parse these inner messages themselves.)
Content-transfer-encodingdescribes how the body data is encoded as ASCII text. It is usually only set for leaves. Recommended values are
"quoted-printable"for bodies that contain some kind of ASCII text, and
"base64"for binary data.
Netmime, the types of mail headers and mail bodies are defined
before and independent of their implementations: We have the
class type mime_header: Specification of possible header implementations
class type mime_body: Specification of possible body implementations
type complex_mime_message: The type of a message tree
class basic_mime_header: A basic header implementation
class memory_mime_body: A body implementation storing the contents in an O'Caml string in-memory
class file_mime_body: A second body implementation storing the contents in an external file
complex_mime_message represents the message as a tree.
For example, the above mentioned mail with two attachments has the following representation:
type complex_mime_message = mime_header * complex_mime_body and complex_mime_body = [ `Body of mime_body | `Parts of complex_mime_message list ]
let tree = (mail_header, `Parts [ (main_text_header, `Body main_text_body); (att1_header, `Body att1_body); (att2_header, `Body att2_body) ] )
*_header are objects of type
*_body are objects of type
mime_body. It is obvious how to
create the tree once one has these objects: Just use the
syntax in this expression. Beginners of O'Caml should recall
that it is as easy to decompose such structured values by using
the pattern matching feature of the language. For example, to get
let main_text_header = match tree with (_, `Parts ( (mth, _) :: _ )) -> mth | _ -> failwith "Message has unexpected structure"
[x1;x2;...] is just an abbreviation for
x1 :: x2 :: ... ::  ; by switching to the "::" syntax
the message may have any number of parts in order to be
matching.) At the first glance, it looks a bit strange to
access the inner parts of mail messages in this way, but
pattern matching is a very powerful sword once one gets
accustomed to it.
Another hint: Because
complex_mime_message is a quite
challanging type for the compiler, it is often necessary to
give type annotations, such as
(tree : complex_mime_message)
before passing such values to functions, otherwise you get compiler errors.
It is easy to get and set the fields of headers, e.g.
mail_header # field "subject" returns the "Subject"
header field as string (or raises
Not_found). The names of
header fields are case-insensitive. To set a field, use
mail_header # update_field "subject" "Ocamlnet is great" .
update_field process the field value
as unparsed string (the parsers do only very little preprocessing,
e.g. one can configure to remove all linefeeds). The module
Mimestring has a lot functions to parse and generate field
values with a certain syntax. For example, "Subject" may contain
so-called encoded words to express text written in a character
set other than ASCII. To parse this, use
Now, the words contained in
let subject = mail_header # field "subject" in let word_list = Mimestring.scan_encoded_text_value subject in
word_listcan be accessed with a number of functions, e.g.
Here, the string
let word_val = Mimestring.get_decoded_word word in let word_cset = Mimestring.get_charset word
word_valis the word written in the character set
For example, for the "Subject" field
this method returns a
word_list with one word, and for this word
word_val = "this is some text" and
word_cset = "iso-8859-1".
To create such structured header values, there is the function
Mimestring. This function requires some more background beyond the
scope of this tutorial. As this function also supports folding of header
fields, we explain only this particular application.
Folding means that long header values must be split into several lines.
There is a soft limit of 78 bytes and a hard limit of 998 bytes
(not counting the end-of-line sequence). The soft limit only ensures that
values can be displayed in usual terminals or windows without needing horizontal
scrolling. Values exceeding the hard limit may be truncated in mail transport,
however. To fold a string
s composed of words, first split it into its
words, make atoms of them, format them with
write_value, and put the result into
the header field (note: this example can be programmed better, see below):
let name = "Subject" in let words = Str.split (Str.regexp "[ \t]+") s in let atoms = List.map (fun w -> Mimestring.Atom w) in let buf = Buffer.create 100 in let ch = new Netchannels.output_buffer buf in Mimestring.write_value ~maxlen1:(78 - String.length name - 2) ~maxlen:78 ~hardmaxlen1:(998 - String.length name - 2) ~hardmaxlen:998 ch; mail_header # update_field name (Buffer.contents buf)
Unfortunately, there is no general method that can fold any kind
of string. The problem is that folding is only allowed at certain
places in the string, but this depends on the type of the header
field. The shown method works only for informational texts like
"Subject". For other fields, like "Received", the method would
have to be varied, especially how the list
atoms is determined.
The syntax of the field must be known to compute
In the module
Netsendmail you can find formatting and
folding functions for informational texts like "Subject",
and for mail addresses. With these functions, the "Subject"
field could also be set by
let atoms = Netsendmail.create_text_tokens s in mail_header # update_field name (Netsendmail.format_field_value name atoms)
Both types of bodies (in-memory, and file) support the following two ways of accessing:
string access is very simple: To get the value, just call
let s = body # value
To set the value, just call
body # set_value s
The string returned by
value is not transfer-encoded, or better,
all such encodings (e.g. BASE-64) are decoded. Of course,
set_value expects that the passed string is not decoded, too.
Note that using
value may be dangerous (or even fail) when the body
is stored in a file and is very large.
value forces that the file
is completely read into memory. You may run into serious problems when
there is not enough memory, or when the value is larger than
Sys.max_string_length (16MB on 32 bit platforms).
Fortunately, there is the channel-based access method. It does not need much memory, even when large bodies are accessed. However, one does not get access to the completely body at once, but only chunk by chunk. For example, to read a body line by line, use:
let ch = body # open_value_rd() in let line1 = ch # input_line() in let line2 = ch # input_line() in ... ch # close_in()
value, there are no transfer encodings in the returned lines.
ch can be used whereever an Ocamlnet function allows it,
i.e. it is a full implementation. For example, one can pass it to the
let ch = body # open_value_rd() in let html_doc = Nethtml.parse ch in ch # close_in()
To set the value using a channel, a body can also be opened for writing:
let ch = body # open_value_wr() in ch # output_string "First line\n"; ch # output_string "Second line\n"; ... ch # close_out()
Parsing mail messages
The message to parse must be available as an object channel. Recall that you can create an object channel from a string with
let ch = new Netchannels.input_string s
and from a file with
let ch = new Netchannels.input_channel (open_in "filename")
so one can parse mail messages coming from any source. As only sequential access is needed, it is even possible to read directly from a Unix pipe.
Now, it is required to create a so-called netstream from
let nstr = new Netstream.input_stream ch
A netstream is an object channel with additional look-ahead features. We need it here because the parser can then recognize certain patterns in the message in a simpler manner, for example the escape sequences separating the parts of a structured body.
Finally, one can invoke the parser:
let tree = read_mime_message nstr
There are a number of optional arguments for this function that can modify the way the message tree is generated. By default, all bodies are created in memory, and the tree is deeply parsed (i.e. inner multipart bodies are represented in tree form).
When bodies should be written to disk, the argument
can be passed: It is a function that is called whenever a header
has been parsed, but before the corresponding body. The function must
return the body object for representation and the output channel
connected to the body object. For example, to write the bodies
into numbered files:
let n = ref 1 let ext_storage_style header = let body = new file_mime_body ("file" ^ string_of_int !n) in incr n; (body, body#open_out_wr()) let tree = read_mime_message ~storage_style:ext_storage_style nstr
There is also the auxiliary function
storage to create such a
storage style argument.
header can be used to generate the file name from it. Often,
filename argument of the
Content-disposition field is the
original file name before the attachment was appended to the
mail message. To get this name:
let filename = try let disp, disp_params = header # content_disposition() in (* disp is usually "attachment", but we don't check *) List.assoc "filename" disp_params with Not_found -> ... (* No such paramater, use other method to gen filename *)
It is usually a good idea to check for dangerous characters in this name ("/", "..") before constructing the name of the disk file.
A final remark: Don't forget to close
nstr after parsing (this implicitly
Creating Mail Messages
For simple applications, the
Netsendmail module has a
It can create a mail message with attachments, and performs all the
encoding details. This function is well explained in its module mli.
Of course, you can also do this yourself: Create the required headers
and bodies, and put them together to the resulting
let date = Netdate.mk_mail_date ~zone:Netdate.localzone (Unix.time()) in let mail_header = new basic_mime_header [ "MIME-version", "1.0"; "Subject", "Sample mail"; "To", "recipient\@domain.net"; "From", "sender\@domain.net"; "Date", date; "Content-type", "multipart/mixed" ] in let main_text_header = new basic_mime_header [ "Content-type", "text/plain;charset=ISO-8859-1"; "Content-transfer-encoding", "quoted-printable"; ] in let main_text_body = new memory_mime_body "Hello world!\nThis is a sample mail.\n" in let att_header = new basic_mime_header [ "Content-type", "image/jpeg"; "Content-transfer-encoding", "base64"; "Content-disposition", "inline;description=\"My photo\""; ] in let att_body = new file_mime_body "photo.jpeg" in let tree = (mail_header, `Parts [ (main_text_header, `Body main_text_body); (att_header, `Body att_body) ] )
Printing Mail Messages
In order to print
tree to the object channel
ch, simply call
write_mime_message ch tree
Before invoking this function, ensure the following:
Content-typefield of all leaves should be set
Content-transfer-encodingfield of all leaves should be set (in doubt use "base64"; if missing, the default is "7bit" - probably not what you want)
Content-typefield of multipart nodes should be set (it defaults to "multipart/mixed" if missing)
Content-transfer-encodingfields of multipart nodes should not be set - this is done by the function
boundaryparameter is missing, the function will invent one; you don't need to deal with this.
The MIME message is written according to the found transfer encodings and the multi-part boundaries.
Don't forget to close
ch after writing!