
Module Mapred_io


module Mapred_io: sig .. end
Utility library for record-based I/O


In this module, a file consists of lines. Each line is terminated by LF and is treated as one record.

The record reader can iterate over a whole file or over only a part of it. In the latter case, the file is assumed to be processed block by block. Lines do not necessarily end at block boundaries, so for a line that spans two blocks it must be decided whether the block reader for block N or the one for block N+1 processes it. The following rules settle this:

  • The first block does not have this problem. Its first line is always processed by the block reader for block 0.
  • For every other block, the first processed line is the line starting after the first LF character in the block.
  • For every block, the last processed line is the line following the last LF character in the block. This line may be stored partly or even entirely in the next block, so the block reader has to read into the next block to complete it.
  • A block must contain at least one LF character (and hence has at least one line to process).
For best efficiency, the block reader should not be used for reading individual blocks, but for contiguous ranges of blocks.
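The rules above can be illustrated with a small stand-alone model. The sketch below is not part of Mapred_io: it assumes the file is an in-memory string and that block k is a fixed-size byte range, and the names records_of_blocks and block_size are hypothetical. It returns the records that a reader for the blocks first_block .. first_block+n_blocks-1 would process under these rules.

```ocaml
(* Illustrative model only, not the Mapred_io implementation.
   [data] stands in for the file contents, and block k is assumed to be
   the byte range [k * block_size, (k+1) * block_size). *)
let records_of_blocks ~block_size data first_block n_blocks =
  let len = String.length data in
  let range_start = first_block * block_size in
  (* Exclusive end of the block range, clipped to the end of the file. *)
  let range_end = min len ((first_block + n_blocks) * block_size) in
  if n_blocks < 1 || range_start >= len then []
  else
    (* Last LF inside the range; search backwards from the last byte. *)
    match String.rindex_from_opt data (range_end - 1) '\n' with
    | Some last_lf when last_lf >= range_start ->
      (* Block 0 starts at offset 0; any other block starts right after
         the first LF at or after its beginning. *)
      let start =
        if first_block = 0 then 0
        else
          match String.index_from_opt data range_start '\n' with
          | Some first_lf -> first_lf + 1
          | None -> len (* unreachable here, since last_lf exists *)
      in
      (* The line after the last LF still belongs to this range, even if
         it continues into the following block. *)
      let stop =
        match String.index_from_opt data (last_lf + 1) '\n' with
        | Some e -> e + 1
        | None -> len
      in
      let rec collect pos acc =
        if pos >= stop then List.rev acc
        else
          match String.index_from_opt data pos '\n' with
          | Some e when e < stop ->
            collect (e + 1) (String.sub data pos (e - pos) :: acc)
          | _ ->
            (* Final line without a terminating LF. *)
            List.rev (String.sub data pos (stop - pos) :: acc)
      in
      collect start []
    | _ -> [] (* no LF anywhere in the range: treated as empty in this model *)
```

With the data "aa\nbbbb\ncc\ndddd\n" and a block size of 4, block 0 yields aa and bbbb, blocks 1 to 2 yield cc and dddd, and block 3 yields nothing, so every line is processed exactly once.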
class type record_reader = object .. end
class type record_writer = object .. end
val read_file : Plasma_client.plasma_cluster ->
string -> int64 -> int64 -> record_reader
read_file c name block len: Reads from the file name, starting at block number block and ending at block number block+len-1. Reading is done in a separate transaction.

Note that len>=1 is a requirement here.

The function configures the number of buffers of c.

The cluster c is set to the aborted state when it is not used. Note that this also affects all transactions unrelated to read_file, so it is best to create a separate plasma_cluster object for reading.
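Since read_file takes a start block and a length with len>=1, a file is naturally split into contiguous block ranges before reading. The following is a minimal sketch of such a split; the helper block_ranges is hypothetical and not part of Mapred_io. It only computes (block, len) pairs and never touches a cluster.

```ocaml
(* Hypothetical helper: split [total] blocks into at most [parts]
   contiguous (block, len) ranges, each with len >= 1.
   [total] could come from file_blocks c name, and each pair could then
   be passed to read_file c name block len. *)
let block_ranges (total : int64) (parts : int) : (int64 * int64) list =
  if total <= 0L || parts <= 0 then []
  else
    let parts = Int64.of_int parts in
    (* Never create more ranges than there are blocks. *)
    let parts = if parts > total then total else parts in
    let base = Int64.div total parts in
    let extra = Int64.rem total parts in
    let rec go k start acc =
      if k >= parts then List.rev acc
      else
        (* The first [extra] ranges get one additional block. *)
        let len = if k < extra then Int64.succ base else base in
        go (Int64.succ k) (Int64.add start len) ((start, len) :: acc)
    in
    go 0L 0L []
```

For example, 10 blocks split into 3 parts give the ranges (0, 4), (4, 3) and (7, 3).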

val read_multiple : (unit -> record_reader) list -> record_reader
Constructs a record reader that reads from the input readers one after the other.
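The argument of read_multiple is a list of functions returning readers, not a list of readers. As an analogy only, the sketch below chains a list of such functions over the standard Seq type instead of record_reader, whose methods are not shown here. It assumes that each function is called only when the previous source is exhausted; whether read_multiple behaves this way is an assumption. The name chain is hypothetical.

```ocaml
(* Analogy using Stdlib.Seq (Seq.append needs OCaml >= 4.11); this is not
   the record_reader interface. Each source is opened by calling its
   function, and only once all earlier sources have been consumed. *)
let rec chain (sources : (unit -> 'a Seq.t) list) : 'a Seq.t =
 fun () ->
  match sources with
  | [] -> Seq.Nil
  | open_source :: rest -> Seq.append (open_source ()) (chain rest) ()
```

Nothing is opened when chain is called; the first function runs when the first element is requested.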
val write_file : Plasma_client.plasma_cluster -> string -> record_writer
write_file c name: Appends records to the file name (which must already exist). Writing is done in separate transactions.

The function configures the number of buffers of c.

As with read_file, the cluster c is set to the aborted state when it is not used. Note that this also affects all transactions unrelated to write_file, so it is best to create a separate plasma_cluster object for writing.

val write_multiple : Plasma_client.plasma_cluster ->
string -> int64 -> (string -> int -> string) -> record_writer
write_multiple c prefix limit create: Writes into a sequence of files whose names are composed of prefix followed by an integer k. The files are created by calling create prefix k. A new file is started when the current file reaches the size limit (in bytes).
val create_file : ?repl:int -> Plasma_client.plasma_cluster -> string -> unit
create_file c name: Creates this file exclusively. repl is the replication factor, 0 by default (i.e. use server default).
val delete_file : Plasma_client.plasma_cluster -> string -> unit
delete_file c name: Deletes this file.
val file_blocks : Plasma_client.plasma_cluster -> string -> int64
file_blocks c name: Returns the length of the file in blocks.
This web site is published by Informatikbüro Gerd Stolpmann