module Mapred_io: sig .. end
Utility library for record-based I/O
The record reader can be used to iterate over a whole file, or only a part. For the latter, it is assumed that the file is processed block by block. Of course, it is then possible that the lines do not end at block boundaries. However, it must be determined whether the block reader for block N or the block reader for block N+1 processes such lines. The following rules do that:
class type record_reader = object .. end
class type record_writer = object .. end
val read_file : Plasma_client.plasma_cluster ->
string -> int64 -> int64 -> record_reader
read_file c name block len: Reads from name, starting at block and ending at block+len-1. Reading is done in a separate transaction. Note that len>=1 is required.
The function configures the number of buffers of c.
The cluster c is set to aborted state when not used. Note that this also affects all transactions unrelated to read_file, so it is best to create a separate plasma_cluster object for reading.
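The block and len arguments of read_file describe a contiguous block range. As an illustration only, the following self-contained sketch splits a file of a given total block count (for example, a value obtained from file_blocks) into consecutive ranges that each satisfy len>=1. The helper name block_ranges and the chunk parameter are hypothetical and not part of Mapred_io, and the sketch assumes that blocks are numbered from 0.

```ocaml
(* Hypothetical helper, not part of Mapred_io.
   Split [total] blocks into consecutive (block, len) ranges of at most
   [chunk] blocks each. Assumes block numbering starts at 0.
   Every returned range has len >= 1, as read_file requires. *)
let block_ranges (total : int64) (chunk : int64) : (int64 * int64) list =
  if chunk < 1L then invalid_arg "block_ranges: chunk must be >= 1";
  let rec loop block acc =
    if block >= total then List.rev acc
    else
      (* The last range may be shorter than [chunk]. *)
      let len = min chunk (Int64.sub total block) in
      loop (Int64.add block len) ((block, len) :: acc)
  in
  loop 0L []

let () =
  (* A 10-block file split into ranges of at most 4 blocks. *)
  List.iter
    (fun (block, len) ->
       Printf.printf "blocks %Ld..%Ld (len %Ld)\n"
         block (Int64.pred (Int64.add block len)) len)
    (block_ranges 10L 4L)
```

Each resulting pair could then be passed as the block and len arguments of a separate read_file call, one per range.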
val read_multiple : (unit -> record_reader) list -> record_reader
val write_file : Plasma_client.plasma_cluster -> string -> record_writer
write_file c name: Appends records to this file (which must already exist). Writing is done in separate transactions.
The function configures the number of buffers of c.
As with read_file, the cluster c is set to aborted state when not used. Note that this also affects all transactions unrelated to write_file, so it is best to create a separate plasma_cluster object for writing.
val write_multiple : Plasma_client.plasma_cluster ->
string -> int64 -> (string -> int -> string) -> record_writer
write_multiple c prefix limit create: Writes into a sequence of files whose names are composed of prefix followed by an integer k. The files are created by calling create prefix k. A new file is started when the current file reaches the size limit (in bytes).
val create_file : ?repl:int -> Plasma_client.plasma_cluster -> string -> unit
create_file c name: Creates this file exclusively. repl is the replication factor, 0 by default (i.e. use the server default).
val delete_file : Plasma_client.plasma_cluster -> string -> unit
val file_blocks : Plasma_client.plasma_cluster -> string -> int64
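The size-based rotation described for write_multiple can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the library's implementation: it does not touch a cluster, the type rotating and the functions make_rotating, output_record and files are hypothetical, the limit is a plain int rather than int64, file names are assumed to be prefix followed by the decimal form of k starting at 0, and the limit is assumed to be checked before each record is written.

```ocaml
(* In-memory model of size-based file rotation. All names here are
   hypothetical; nothing in this block is part of Mapred_io. *)
type rotating = {
  prefix : string;
  limit : int;                              (* size limit in bytes *)
  mutable k : int;                          (* index of the current file *)
  mutable current : Buffer.t;               (* contents of the current file *)
  mutable closed : (string * string) list;  (* finished files, newest first *)
}

let make_rotating prefix limit =
  { prefix; limit; k = 0; current = Buffer.create 64; closed = [] }

(* Assumed naming scheme: prefix followed by the decimal form of k. *)
let file_name r = r.prefix ^ string_of_int r.k

(* Append one record. If the current file has already reached the limit,
   close it and start the next file first. *)
let output_record r record =
  if Buffer.length r.current >= r.limit then begin
    r.closed <- (file_name r, Buffer.contents r.current) :: r.closed;
    r.k <- r.k + 1;
    r.current <- Buffer.create 64
  end;
  Buffer.add_string r.current record

(* All files written so far, oldest first, as (name, contents) pairs. *)
let files r =
  List.rev ((file_name r, Buffer.contents r.current) :: r.closed)

let () =
  let r = make_rotating "part-" 8 in
  List.iter (output_record r) ["aaaa"; "bbbb"; "cccc"];
  List.iter (fun (name, data) -> Printf.printf "%s: %s\n" name data) (files r)
```

Because the check happens before each write in this model, a file can exceed the limit by less than one record. Whether write_multiple behaves the same way at the boundary is not stated in this documentation.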