module Plasma_client: sig .. end
Client access to the Plasma Filesystem
Plasmafs_protocol explains all background concepts of the PlasmaFS protocol.
Most functions come in two variants. The variant whose name ends in _e
returns an engine computing the result. There is always a
"normal", i.e. synchronous, variant that returns the result directly
instead of an engine. The engines make
it possible to send queries asynchronously. For more information
about engines, see the module Uq_engines of Ocamlnet.
Some of the types used below are defined in Plasma_rpcapi_aux.
type plasma_cluster
type plasma_trans
type inode = int64
type errno = [ `eaccess
| `econflict
| `ecoord
| `eexist
| `efailed
| `efailedcommit
| `efbig
| `efhier
| `einval
| `eio
| `elongtrans
| `enametoolong
| `enoent
| `enoient
| `enonode
| `enospc
| `enotrans
| `eperm
| `erofs
| `etbusy ]
type topology = [ `Chain | `Star ]
See copy_in.
exception Plasma_error of errno
exception Cluster_down of string
val open_cluster : string ->
(string * int) list -> Unixqueue.event_system -> plasma_cluster
open_cluster name namenodes: Opens the cluster with these namenodes
(given as (hostname,port) pairs). The client automatically
determines which is the coordinator.
val open_cluster_cc : Plasma_client_config.client_config ->
Unixqueue.event_system -> plasma_cluster
Opens the cluster described by a Plasma_client_config.client_config object, which
can in turn be obtained via Plasma_client_config.get_config.
val buffer_stats : plasma_cluster -> string
val close_cluster : plasma_cluster -> unit
val abort_cluster : plasma_cluster -> unit
val cluster_name : plasma_cluster -> string
val cluster_namenodes : plasma_cluster -> (string * int) list
See open_cluster.
val configure_buffer : plasma_cluster -> int -> unit
configure_buffer c n: Configures the cluster to use n buffers. Each buffer
is one block. These buffers are only used for buffered I/O, i.e.
for Plasma_client.read and Plasma_client.write, but not for
Plasma_client.copy_in and Plasma_client.copy_out.
val configure_pref_nodes : plasma_cluster -> string list -> unit
Configures the preferred data nodes, given by their identities (see
local_identities below), i.e. for enforcing that blocks
are allocated on the same machine, as far as possible.
val configure_shm_manager : plasma_cluster -> Plasma_shm.shm_manager -> unit
val shm_manager : plasma_cluster -> Plasma_shm.shm_manager
val blocksize_e : plasma_cluster -> int Uq_engines.engine
val blocksize : plasma_cluster -> int
val fsstat_e : plasma_cluster -> Plasma_rpcapi_aux.fsstat Uq_engines.engine
val fsstat : plasma_cluster -> Plasma_rpcapi_aux.fsstat
val local_identities_e : plasma_cluster -> string list Uq_engines.engine
val local_identities : plasma_cluster -> string list
(See configure_pref_nodes.)
Functions that take a plasma_trans value as argument must be
run inside a transaction. This means one first has to call start
to open the transaction, then call the functions covered by the
transaction, and finally either commit or abort.
It is allowed to open several transactions simultaneously.
If you use the engine-based interface, it is important to
ensure that the next function in a transaction is only
called after the current function has returned its result.
This restriction applies only within the same transaction;
other transactions are totally independent in this respect.
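The start/commit/abort discipline described above can be wrapped in a small helper. The following is only a sketch: with_trans is a hypothetical name that is not part of Plasma_client, and the three labeled arguments are assumed to be Plasma_client.start, Plasma_client.commit and Plasma_client.abort (any functions of the same shape will do, which keeps the sketch testable without a cluster).

```ocaml
(* Hypothetical helper, not part of Plasma_client.
   Intended use: ~start:Plasma_client.start ~commit:Plasma_client.commit
   ~abort:Plasma_client.abort, with [cluster] a plasma_cluster. *)
let with_trans ~start ~commit ~abort cluster f =
  (* Open the transaction first. *)
  let t = start cluster in
  match f t with
  | result ->
      (* The body succeeded: commit and hand back its result. *)
      commit t;
      result
  | exception e ->
      (* The body failed: abort, ignoring errors raised by abort itself,
         then re-raise the original exception. *)
      (try abort t with _ -> ());
      raise e
```

With the synchronous interface, the body f simply calls the transactional functions one after another on t, so the ordering restriction within a transaction is satisfied automatically.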
val start_e : plasma_cluster -> plasma_trans Uq_engines.engine
val start : plasma_cluster -> plasma_trans
val commit_e : plasma_trans -> unit Uq_engines.engine
val commit : plasma_trans -> unit
val abort_e : plasma_trans -> unit Uq_engines.engine
val abort : plasma_trans -> unit
val cluster : plasma_trans -> plasma_cluster
val create_inode_e : plasma_trans ->
Plasma_rpcapi_aux.inodeinfo -> inode Uq_engines.engine
val create_inode : plasma_trans ->
Plasma_rpcapi_aux.inodeinfo -> inode
Inodes that do not have a name at the end of the transaction are
automatically deleted. Use link_e to assign names (below).
See also Plasma_client.create_file below, which immediately
links the inode to a name. See also Plasma_client.regular_ii,
Plasma_client.dir_ii, and Plasma_client.symlink_ii for
how to create inodeinfo values.
val delete_inode_e : plasma_trans -> inode -> unit Uq_engines.engine
val delete_inode : plasma_trans -> inode -> unit
val get_inodeinfo_e : plasma_trans ->
inode -> Plasma_rpcapi_aux.inodeinfo Uq_engines.engine
val get_inodeinfo : plasma_trans ->
inode -> Plasma_rpcapi_aux.inodeinfo
val set_inodeinfo_e : plasma_trans ->
inode -> Plasma_rpcapi_aux.inodeinfo -> unit Uq_engines.engine
val set_inodeinfo : plasma_trans ->
inode -> Plasma_rpcapi_aux.inodeinfo -> unit
copy_in writes a local file to the cluster. copy_out
reads a file from the cluster and copies it into a local file.
Note that copy_in in particular works only in units of whole blocks. The
function never reads a block from the filesystem, modifies it,
and writes it back. Instead, it writes the block with the data it
has, and if there is still space to fill, it pads the block with
zero bytes. If you need to access only parts of a block,
use the buffered access below instead.
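The zero padding of a partially filled block can be pictured with a few lines of plain OCaml. This is only an illustration of the effect on the block contents, not the code path used by copy_in; pad_block is a hypothetical name, and the block size would in practice come from Plasma_client.blocksize.

```ocaml
(* Hypothetical illustration: what a block looks like after copy_in has
   written [data] (at most one block long) and padded the rest. *)
let pad_block ~blocksize data =
  let len = String.length data in
  if len > blocksize then invalid_arg "pad_block: data longer than a block";
  (* Start from a block of zero bytes and copy the data to its front. *)
  let block = Bytes.make blocksize '\000' in
  Bytes.blit_string data 0 block 0 len;
  Bytes.to_string block
```

Anything that was stored in the tail of that block before the write is therefore replaced by zero bytes.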
The local file must be a seekable file (pipes etc. are not supported).
It is possible to pass descriptors for shared memory files, though.
This way you can write data from RAM, or read into RAM.
val copy_in_e : plasma_cluster ->
inode ->
int64 ->
Unix.file_descr -> int64 -> topology -> unit Uq_engines.engine
val copy_in : plasma_cluster ->
inode ->
int64 -> Unix.file_descr -> int64 -> topology -> unit
copy_in_e c inode pos fd len topology: Copies the data from the file descriptor
fd to the file given by inode. The data is taken from the current
position of the descriptor. Exactly len bytes are copied. The data
is written to position pos of the file referenced by the inode. If
it is written past the EOF position, the EOF position is advanced.
fd must be seekable.
topology says how to transfer data from the client to the data nodes.
`Star means the client organizes the writes to the data nodes as
independent streams. `Chain means that the data is first written to
one of the data nodes, and the replicas are transferred from there to
the next data node.
Limitation: pos must be a multiple of the blocksize. The file
is written in units of the blocksize.
copy_in always performs its operations in separate transactions.
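Since pos must be a multiple of the blocksize, a caller may want to check the position before calling copy_in or copy_out. The helpers below are a sketch in plain OCaml; is_block_aligned and blocks_written are hypothetical names, and blocksize is assumed to be the value returned by Plasma_client.blocksize.

```ocaml
(* True if [pos] is a multiple of the block size, i.e. acceptable as the
   pos argument of copy_in and copy_out. *)
let is_block_aligned ~blocksize pos =
  Int64.rem pos (Int64.of_int blocksize) = 0L

(* Number of whole blocks written when [len] bytes are copied in from an
   aligned position (the last block may be partially filled). *)
let blocks_written ~blocksize len =
  let bs = Int64.of_int blocksize in
  Int64.div (Int64.add len (Int64.pred bs)) bs
```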
val copy_out_e : plasma_cluster ->
inode ->
int64 -> Unix.file_descr -> int64 -> unit Uq_engines.engine
val copy_out : plasma_cluster ->
inode -> int64 -> Unix.file_descr -> int64 -> unit
copy_out_e c inode pos fd len: Copies the data from the file referenced
by inode to file descriptor fd. The data is taken from position
pos to pos+len-1 of the file, and it is written to the current
position of fd.
fd must be seekable.
Limitation: pos must be a multiple of the blocksize.
copy_out always performs its operations in separate transactions.
The number of buffers used for buffered I/O is set with Plasma_client.configure_buffer.
type strmem = [ `Memory of Netsys_mem.memory | `String of string ]
The user buffer for read and write can be given as string or as bigarray
(memory). The latter is very advantageous: often, copying from
and to the buffer can be completely avoided, as only the page
table of the OS is modified ("copy-on-write" optimization).
However, the bigarray must start at a page boundary for getting
this effect (using map_file or one of the functions in
Ocamlnet's Netsys_mem). Also, the buffer should be quickly
discarded after the read or write, and it must not be reused for
further reads or writes (otherwise the copy is only delayed but
not avoided). Not meeting these requirements is not an error,
though - the only downside is that the read/write is slightly more
costly.
val read_e : plasma_cluster ->
inode ->
int64 ->
strmem ->
int -> int -> (int * bool * Plasma_rpcapi_aux.inodeinfo) Uq_engines.engine
val read : plasma_cluster ->
inode ->
int64 ->
strmem ->
int -> int -> int * bool * Plasma_rpcapi_aux.inodeinfo
read_e c inode pos s spos len: Reads data from inode, and returns
(n,eof,ii) where n is the number of read bytes, and eof the indicator
that EOF was reached. This number n may be less than len only
if EOF is reached. ii is the current inodeinfo.
Before a read is answered from a clean buffer, the client checks whether
the buffer is still up to date.
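The (n,eof,ii) result lends itself to a simple loop that reads a file chunk by chunk until EOF. The sketch below is written against a hypothetical wrapper read_chunk pos buf spos len that returns only (n, eof); it is assumed to call Plasma_client.read for a fixed cluster and inode and to drop the inodeinfo. The names read_all and read_chunk are not part of the library.

```ocaml
(* Hypothetical helper: read everything from position 0 to EOF.
   [read_chunk pos buf spos len] is an assumed wrapper around
   Plasma_client.read that returns (bytes_read, eof). *)
let read_all ~read_chunk ~chunk_size =
  let buf = Bytes.create chunk_size in
  let out = Buffer.create chunk_size in
  let rec loop pos =
    let n, eof = read_chunk pos buf 0 chunk_size in
    (* Keep the n bytes that were actually read. *)
    Buffer.add_subbytes out buf 0 n;
    (* Stop at EOF; the n = 0 test is only a guard against looping forever. *)
    if eof || n = 0 then Buffer.contents out
    else loop (Int64.add pos (Int64.of_int n))
  in
  loop 0L
```

Choosing chunk_size as a multiple of the blocksize keeps every read aligned to block boundaries.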
val write_e : plasma_cluster ->
inode ->
int64 -> strmem -> int -> int -> int Uq_engines.engine
val write : plasma_cluster ->
inode -> int64 -> strmem -> int -> int -> int
write_e c inode pos s spos len: Writes data to inode and returns
the number of written bytes. This number n may be less than len for
arbitrary reasons (unlike read - to be fixed).
A write that is not aligned to a block implies that the old version
of the block is read first (if not available in a buffer). This is
a big performance penalty, and best avoided.
It is not ensured that the write is completed when the return value
becomes available. The write is actually done in the background,
and can be explicitly triggered with the flush_e operation. Also,
note that the write happens in a separate transaction. (With
"background" we do not mean a separate kernel thread, but an
execution thread modeled with engines.)
Writing also ensures that the EOF position is at least the position after the last written byte. However, this only happens when the blocks are flushed in the background.
As writing happens in the background, special attention has to be
paid to the way errors are reported. At the first error the write thread
stops, and an error code is set. This code is reported at the next
write or flush. After being reported, the code is cleared again.
Writing is not automatically resumed - only further write and
flush invocations will restart the writing thread. Also, the
data buffers are kept intact after errors - so the client will
try to write everything again (which may run into the same error).
The function drop_inode can be invoked to drop all dirty buffers
of the inode in the near future.
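Because write may accept fewer than len bytes, and because the data is only written in the background, callers typically loop until everything is accepted and then flush. The following is a sketch under assumptions: write_chunk pos s spos len and flush pos len are hypothetical wrappers around Plasma_client.write and Plasma_client.flush for a fixed cluster and inode, and write_fully is not a library function.

```ocaml
(* Hypothetical helper: write all of [data] at [pos], then flush the range.
   [write_chunk] returns the number of bytes it accepted, which may be
   less than requested. *)
let write_fully ~write_chunk ~flush pos data =
  let total = String.length data in
  let rec loop written =
    if written < total then begin
      let n =
        write_chunk
          (Int64.add pos (Int64.of_int written)) data written (total - written)
      in
      (* Defensive: refuse to spin if nothing was accepted. *)
      if n <= 0 then failwith "write_fully: no progress";
      loop (written + n)
    end
  in
  loop 0;
  (* A length of 0 would mean "to the end of the file", so skip the flush
     for empty data. *)
  if total > 0 then flush pos (Int64.of_int total)
```

An error from the background writer would surface as an exception from one of the wrapped calls, at the next write or flush after it occurred.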
val flush_e : plasma_cluster ->
inode -> int64 -> int64 -> unit Uq_engines.engine
val flush : plasma_cluster -> inode -> int64 -> int64 -> unit
flush_e c inode pos len: Flushes all buffered data of inode from
pos to pos+len-1, or to the end of the file if len=0. This
ensures that data is really written.
val drop_inode : plasma_cluster -> inode -> unit
val flush_all_e : plasma_cluster -> unit Uq_engines.engine
val flush_all : plasma_cluster -> unit
val lookup_e : plasma_trans -> string -> inode Uq_engines.engine
val lookup : plasma_trans -> string -> inode
Inside a transaction the filename is read-locked: a competing transaction
cannot delete/replace it while the transaction exists.
val dir_lookup_e : plasma_trans ->
inode -> string -> inode Uq_engines.engine
val dir_lookup : plasma_trans ->
inode -> string -> inode
This also places a read-lock on the filename like lookup.
val rev_lookup_e : plasma_trans ->
inode -> string list Uq_engines.engine
val rev_lookup : plasma_trans -> inode -> string list
val link_count_e : plasma_trans -> inode -> int Uq_engines.engine
val link_count : plasma_trans -> inode -> int
val link_e : plasma_trans ->
string -> inode -> unit Uq_engines.engine
val link : plasma_trans -> string -> inode -> unit
For directories there is the restriction that at most one name
may be linked with the inode.
val unlink_e : plasma_trans -> string -> unit Uq_engines.engine
val unlink : plasma_trans -> string -> unit
This also works for directories! (They must be empty, of course.)
val list_inode_e : plasma_trans ->
inode -> (string * inode) list Uq_engines.engine
val list_inode : plasma_trans ->
inode -> (string * inode) list
Note that this operation can result in `econflict (although it is read-only).
val list_e : plasma_trans ->
string -> (string * inode) list Uq_engines.engine
val list : plasma_trans -> string -> (string * inode) list
Note that this operation can result in `econflict (although it is read-only).
val create_file_e : plasma_trans ->
string ->
Plasma_rpcapi_aux.inodeinfo -> inode Uq_engines.engine
val create_file : plasma_trans ->
string -> Plasma_rpcapi_aux.inodeinfo -> inode
The file type given in the inodeinfo must be `ftype_regular or `ftype_symlink.
val mkdir_e : plasma_trans ->
string ->
Plasma_rpcapi_aux.inodeinfo -> inode Uq_engines.engine
val mkdir : plasma_trans ->
string -> Plasma_rpcapi_aux.inodeinfo -> inode
val regular_ii : plasma_cluster -> int -> Plasma_rpcapi_aux.inodeinfo
regular_ii c mode: Creates an inodeinfo record for a new empty
regular file, where the mode field is set to mode modulo
the current mask.
val symlink_ii : plasma_cluster -> string -> Plasma_rpcapi_aux.inodeinfo
symlink_ii c target: Creates an inodeinfo record for a symlink
pointing to target.
val dir_ii : plasma_cluster -> int -> Plasma_rpcapi_aux.inodeinfo
dir_ii c mode: Creates an inodeinfo record for a new
directory, where the mode field is set to mode modulo
the current mask.
val get_blocklist_e : plasma_trans ->
inode ->
int64 -> int -> Plasma_rpcapi_aux.blockinfo list Uq_engines.engine
val get_blocklist : plasma_trans ->
inode -> int64 -> int -> Plasma_rpcapi_aux.blockinfo list
get_blocklist_e t inode block n: Returns the list of blocks for
blocks block to block+n-1. This is useful for analyzing where
the blocks are actually physically stored.
val retry : string -> int -> ('a -> 'b) -> 'a -> 'b
retry name n f arg: Executes f arg and returns the result.
If a Plasma_error occurs the execution is repeated, up to n
times.
Errors are logged (Netlog). name is used in log output.
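The behaviour of retry can be pictured with a small stand-alone sketch. It is not the library's implementation: Transient_error stands in for Plasma_error, logging goes to stderr instead of Netlog, and n is assumed to count the total number of attempts (the text above leaves open whether it counts attempts or repetitions).

```ocaml
(* Stand-in for Plasma_error in this sketch. *)
exception Transient_error of string

(* Hypothetical re-implementation of the retry idea: run [f arg] and try
   again when it raises Transient_error, with at most [n] attempts in total
   (an assumption). The last failure is propagated to the caller. *)
let retry_sketch name n f arg =
  let rec attempt k =
    try f arg
    with Transient_error msg when k < n ->
      (* Log the failed attempt; [name] identifies the operation. *)
      prerr_endline (Printf.sprintf "%s: attempt %d failed: %s" name k msg);
      attempt (k + 1)
  in
  attempt 1
```

A typical use of the real function is to pass a closure that runs a complete transaction, so that an `econflict error restarts the whole transaction rather than a single call.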