module Xdr_mstring: sig .. end
Managed Strings
A managed string ms is declared in the XDR file as in
    typedef _managed string ms<>;
In the encoded XDR stream there is no difference between strings and managed strings, i.e. the wire representation is identical. Only the Ocaml type to which the managed string is mapped differs. This type is Xdr_mstring.mstring (described below).
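To make the statement about the wire format concrete, here is a sketch of how XDR (RFC 4506) encodes a variable-length string; an ordinary string and a managed string such as ms both end up as these bytes. The function name xdr_encode_string is hypothetical and is not part of Xdr_mstring or any other Ocamlnet module.

```ocaml
(* Sketch of the XDR wire format for strings (RFC 4506): a 4-byte
   big-endian length, the bytes themselves, then zero padding up to a
   multiple of 4. Hypothetical helper, not an Ocamlnet function. *)
let xdr_encode_string (s : string) : string =
  let n = String.length s in
  let pad = (4 - n mod 4) mod 4 in             (* 0..3 padding bytes *)
  let b = Bytes.make (4 + n + pad) '\000' in   (* padding stays zero *)
  Bytes.set_int32_be b 0 (Int32.of_int n);     (* length prefix *)
  Bytes.blit_string s 0 b 4 n;                 (* payload *)
  Bytes.to_string b
```

For example, "hello" (5 bytes) is encoded as 12 bytes: the length 5 as a 4-byte big-endian integer, the 5 characters, and 3 zero bytes of padding.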
In the RPC context there is often the problem that the I/O backend would profit from a different string representation than the user of the RPC layer. To bridge this gap, managed strings were introduced. Generally, the user can determine how to represent strings (usually either as an Ocaml string, or as memory), and the I/O backend can request a transformation to a different representation when this leads to an improvement (i.e. when copy operations can be saved).
Only large managed strings (at least several kilobytes) result in a noticeable speedup of the program.
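The idea of one logical value with two possible representations can be sketched in plain OCaml. This is a simplified, hypothetical model: the type repr and the functions length, to_memory and to_string are illustrative names, not the Xdr_mstring API, and memory merely mirrors Netsys_mem.memory, a bigarray of chars. A backend that prefers memory calls to_memory, which copies only when the value is not memory already.

```ocaml
(* Hypothetical model of the two representations of a managed string.
   NOT the Xdr_mstring.mstring class type. *)
type memory =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type repr =
  | String_repr of string   (* what most user code holds *)
  | Memory_repr of memory   (* what an I/O backend may prefer *)

let length = function
  | String_repr s -> String.length s
  | Memory_repr m -> Bigarray.Array1.dim m

(* Convert to memory on request. A value that already is memory is
   returned unchanged, so no copy happens in that case. *)
let to_memory = function
  | Memory_repr m -> m
  | String_repr s ->
      let m =
        Bigarray.Array1.create Bigarray.char Bigarray.c_layout
          (String.length s)
      in
      String.iteri (fun i c -> Bigarray.Array1.set m i c) s;
      m

(* The contents, independent of the representation. *)
let to_string = function
  | String_repr s -> s
  | Memory_repr m ->
      String.init (Bigarray.Array1.dim m) (Bigarray.Array1.get m)
```

The copy inside to_memory is the cost that choosing the right representation up front avoids; for small values it is negligible, which fits the remark that only large managed strings pay off.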
How to practically use managed strings
There are two cases: the encoding case and the decoding case. In the encoding case the mstring object is created by the user and passed to the RPC library. This happens when a client prepares an argument for calling a remote procedure, or when the server sends a response back to the caller. In the decoding case the client analyzes the response from an RPC call, or the server looks at the arguments of an RPC invocation. The difference is that in the encoding case user code can directly create mstring objects by calling functions of this module, whereas in the decoding case the RPC library creates the mstring objects.
For simplicity, let us only look at this problem from the perspective of an RPC client.
Encoding. Imagine a client wants to call an RPC, and one of the arguments is a managed string. This means we eventually need an mstring object that can be put into the argument list of the call.
This library supports two string representations specially: the normal Ocaml string type, and Netsys_mem.memory, which is actually just a bigarray of chars. There is a factory fac for each representation, and both can be used to create the mstring to pass to the RPC layer. It should be noted that this layer can process the memory representation a bit better. Still, the factory should match the original data: if the data value is a string, the factory for string should be used, and if it is a char bigarray, the factory for memory should be used.
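A minimal sketch of why the factory should match the data, assuming a simplified model in which a factory is a record of two creation functions and every change of representation costs one copy (counted in conversions). The record type factory and the values string_factory and memory_factory are hypothetical; the real factories are objects of class type mstring_factory, and the pos, len and copy_flag arguments are left out here.

```ocaml
(* Hypothetical model, not the real Xdr_mstring.mstring_factory. *)
type memory =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type repr = String_repr of string | Memory_repr of memory

type factory = {
  create_from_string : string -> repr;
  create_from_memory : memory -> repr;
}

(* Counts representation changes, to make their cost visible. *)
let conversions = ref 0

let memory_of_string s =
  incr conversions;
  let m =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (String.length s)
  in
  String.iteri (fun i c -> Bigarray.Array1.set m i c) s;
  m

let string_of_memory m =
  incr conversions;
  String.init (Bigarray.Array1.dim m) (Bigarray.Array1.get m)

(* String factory: keeps strings as they are, converts memory. *)
let string_factory = {
  create_from_string = (fun s -> String_repr s);
  create_from_memory = (fun m -> String_repr (string_of_memory m));
}

(* Memory factory: keeps memory as it is, converts strings. *)
let memory_factory = {
  create_from_string = (fun s -> Memory_repr (memory_of_string s));
  create_from_memory = (fun m -> Memory_repr m);
}
```

In this model, string_factory.create_from_string and memory_factory.create_from_memory never increment the counter, while the mismatched combinations always do.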
Now, the mstring object is created by
    let mstring = fac # create_from_string data pos len copy_flag
or by
    let mstring = fac # create_from_memory data pos len copy_flag
If fac is the factory for strings, the create_from_string method works better, and if fac is for memory, the create_from_memory method works better. pos and len can select a substring of data.
If copy_flag is false, the mstring object does not copy the data if possible, but just keeps a reference to data until it is accessed; otherwise, if copy_flag is true, a copy is made immediately. Of course, delaying the copy is better, but this requires that data is not modified until the RPC call is completed.
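The pos, len and copy_flag semantics can be modelled with a mutable Bytes buffer standing in for data. This is a hypothetical sketch: the record mstring_model and the functions create_from_bytes and contents are illustrative names, not the library's create_from_string. It shows that without a copy, later changes to data become visible when the value is read, which is why data must stay unmodified until the call completes.

```ocaml
(* Hypothetical model of pos/len/copy_flag; not the Xdr_mstring code. *)
type mstring_model = {
  source : Bytes.t;  (* the caller's buffer, or a private copy *)
  pos : int;
  len : int;
}

let create_from_bytes data pos len copy_flag =
  if pos < 0 || len < 0 || pos > Bytes.length data - len then
    invalid_arg "create_from_bytes";
  if copy_flag then
    (* copy now: later changes to [data] are not seen *)
    { source = Bytes.sub data pos len; pos = 0; len }
  else
    (* keep a reference: [data] must not change until the call is done *)
    { source = data; pos; len }

(* Reads the selected region at access time. *)
let contents ms = Bytes.sub_string ms.source ms.pos ms.len
```

With data = "hello world", pos = 6 and len = 5, both variants initially read "world"; after overwriting index 6 of data with 'W', the shared one reads "World" and the copied one still reads "world".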
Decoding. Now the call is done, and the client looks at the result. There is also an mstring object in the result. As noted above, this mstring object was already created by the RPC library (and currently this library prefers string-based objects if not told otherwise). The user code can now access this mstring object with the access methods of the mstring class (see below). As these methods are quite limited, it usually only makes sense to output the mstring contents to a file descriptor.
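Writing the contents to an output is the typical use on the decoding side. The sketch below assumes a memory-backed value and streams it through a small bytes buffer to a caller-supplied write function; output_memory and its chunk parameter are hypothetical names. This portable version copies every byte once more, which is the kind of copy that Netsys_mem.mem_write is meant to avoid for memory-typed data.

```ocaml
type memory =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical helper: streams a memory-backed value through a bytes
   buffer of size [chunk] to [write], one chunk at a time. *)
let output_memory ?(chunk = 4096) (write : bytes -> int -> int -> unit)
    (m : memory) : unit =
  if chunk <= 0 then invalid_arg "output_memory: chunk must be positive";
  let buf = Bytes.create chunk in
  let total = Bigarray.Array1.dim m in
  let rec loop pos =
    if pos < total then begin
      let n = min chunk (total - pos) in
      (* copy the next slice of the bigarray into the bytes buffer *)
      for i = 0 to n - 1 do
        Bytes.set buf i (Bigarray.Array1.get m (pos + i))
      done;
      write buf 0 n;
      loop (pos + n)
    end
  in
  loop 0
```

In a real program, write could wrap a file descriptor, for instance (fun b ofs len -> ignore (Unix.write fd b ofs len)).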
The user can request a different factory for managed strings. The function Rpc_client.set_mstring_factories can be used for this purpose. Similar mechanisms exist for managed clients and for RPC servers.
Potential. Before introducing managed strings, a careful analysis was done of how many copy operations can be avoided by using this technique. Example: the first N bytes of a file are taken as the argument of an RPC call. Instead of reading these bytes into a normal Ocaml string, an optimal implementation now uses a memory buffer for this purpose. With this setup the data is copied twice: the first copy reads it from the file into the buffer (a memory value), and the second copy writes the data into the socket. Part of the saving is that Unix.read and Unix.write do a completely avoidable copy of the data, which is prevented by switching to Netsys_mem.mem_read and Netsys_mem.mem_write, respectively. The latter two functions exploit an optimization that is only possible when the data is memory-typed.
The possible optimizations on the decoding side are slightly less impressive, but still worthwhile.
Interface
class type mstring = object .. end
class type mstring_factory = object .. end
  Creates new mstring objects.
val string_based_mstrings : mstring_factory
val memory_based_mstrings : mstring_factory
  The memory is allocated with Bigarray.Array1.create.
val paligned_memory_based_mstrings : mstring_factory
  The memory is allocated with Netsys_mem.alloc_memory_pages if available, and with Bigarray.Array1.create if not.
val memory_pool_based_mstrings : Netsys_mem.memory_pool -> mstring_factory
val length_mstrings : mstring list -> int
val concat_mstrings : mstring list -> string
val prefix_mstrings : mstring list -> int -> string
  prefix_mstrings l n: returns the first n chars of the concatenated mstrings l as a single string
val blit_mstrings_to_memory : mstring list -> Netsys_mem.memory -> unit
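The list helpers have simple semantics that can be illustrated with plain strings standing in for mstring values. These are hypothetical re-implementations for illustration only; the bounds checks (raising Invalid_argument when n is out of range, or when the target memory is too short) are assumptions of this sketch, not documented behaviour of Xdr_mstring.

```ocaml
type memory =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical stand-ins: strings play the role of mstring values. *)
let length_mstrings (l : string list) =
  List.fold_left (fun acc s -> acc + String.length s) 0 l

let concat_mstrings (l : string list) = String.concat "" l

(* First [n] chars of the concatenation, without building it whole. *)
let prefix_mstrings (l : string list) n =
  if n < 0 || n > length_mstrings l then invalid_arg "prefix_mstrings";
  let buf = Buffer.create n in
  let rec take rest = function
    | [] -> ()
    | _ when rest = 0 -> ()
    | s :: tl ->
        let k = min rest (String.length s) in
        Buffer.add_substring buf s 0 k;
        take (rest - k) tl
  in
  take n l;
  Buffer.contents buf

(* Copies the concatenation into [mem], starting at index 0. *)
let blit_mstrings_to_memory (l : string list) (mem : memory) =
  if length_mstrings l > Bigarray.Array1.dim mem then
    invalid_arg "blit_mstrings_to_memory";
  let pos = ref 0 in
  List.iter
    (fun s ->
      String.iteri (fun i c -> Bigarray.Array1.set mem (!pos + i) c) s;
      pos := !pos + String.length s)
    l
```

prefix_mstrings stops as soon as n characters are collected, so for a small n it does not touch the later strings in the list.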
type named_mstring_factories = (string, mstring_factory) Hashtbl.t
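named_mstring_factories is an ordinary hash table from names to factories, presumably the kind of value passed to Rpc_client.set_mstring_factories. The sketch below uses the same table shape with a stand-in factory type and falls back to string-based factories when a name is missing, matching the library's stated preference for string-based objects if not told otherwise. What the keys denote (here, a managed-string type name such as ms) and the lookup rule in find_factory are assumptions for illustration.

```ocaml
(* Hypothetical stand-in for mstring_factory: just a label here. *)
type factory = String_based | Memory_based

(* Same shape as Xdr_mstring.named_mstring_factories. *)
type named_factories = (string, factory) Hashtbl.t

(* Assumed lookup rule: use the entry for [name] if present, else a
   default that mirrors the library's string-based preference. *)
let find_factory ?(default = String_based) (tbl : named_factories) name =
  match Hashtbl.find_opt tbl name with
  | Some f -> f
  | None -> default
```

Registering Memory_based under "ms" makes only that name use memory; every other name keeps the string-based default.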