Plasma GitLab Archive

Module Pkv_api

module Pkv_api: sig .. end
Plasma Key Value API

PKV provides easy-to-use key/value database files that live in PlasmaFS, with the following properties:

  • ACID: Full ACID support in transactional mode. All modifications are atomic and leave consistent db files behind (even after program or system crashes). Integrity and durability are ensured by PlasmaFS.
  • B-tree index: PKV maintains a B-tree index containing all keys. This ensures fast lookups.
  • Size: PKV databases are practically unlimited in size (all relevant file pointers are 64 bits). The only limit is that the transactional mode costs RAM, both in the PlasmaFS server and in the client (in total a few dozen bytes per data block). In non-transactional mode there is no RAM limit.
  • Network: PlasmaFS is accessible over the network, and hence PKV databases, too.
  • Replication: PKV files can have arbitrary replication factors (number of copies). This not only improves data safety but also increases the bandwidth available for read accesses.
  • Concurrency: There can be any number of readers, and at most one writer at a time. The readers do not lock the writer out: while the writer modifies the db, the readers still see the consistent version of the db from before the writer started, and can switch to the new version only once the writer has finished.
  • Namenode not needed in query path: After the database has been opened in transactional mode for reading, namenode operations are no longer issued. This means that the load induced on the namenode is low, and is never a bottleneck when scaling the db up.
PKV is just a library, not a server.

There are also limitations:

  • The key length is limited, and the index preallocates space for keys.
  • There can only be one writer at a time. The functions raise Plasma_client.Plasma_error `econflict when another writer is currently active.
type db 

An open database
type openflag = [ `Create of int | `Transactional ] 
val opendb : Plasma_client.plasma_cluster -> string -> openflag list -> db
opendb c basename flags: Opens the database using the client c. basename must be an absolute PlasmaFS path without suffix. The actual files have names:

  • This file contains the key/value pairs in sequential order
  • basename.del: This file points to key/value pairs that are considered as deleted, but that have not yet been physically eliminated.
  • basename.idx: This is the index making lookups by key possible.

flags can contain:

  • `Create n: Create the db if the files are missing. The number n is the maximum key length.
  • `Transactional: Open the db in transactional mode. Only in this mode ACID properties are guaranteed. Opening takes longer because more metadata needs to be retrieved from the PlasmaFS server.
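A minimal sketch of a bulk load into a db returned by opendb (for example with the flags [`Create 64; `Transactional], where 64 is an arbitrary example key length). To keep it independent of a running PlasmaFS cluster, it is written as a functor over the few Pkv_api values it uses; the module type PKV_WRITE and the functor Bulk_load are hypothetical names for this example, and Pkv_api itself should satisfy PKV_WRITE, so Bulk_load (Pkv_api) would instantiate it.

```ocaml
(* Hypothetical subset of the Pkv_api signature; Pkv_api should match it. *)
module type PKV_WRITE = sig
  type db
  val insert : db -> string -> string -> unit
  val commit : db -> unit
  val abort : db -> unit
end

module Bulk_load (P : PKV_WRITE) = struct
  (* Insert all pairs as one transaction. If an insert fails, abort
     (which closes the db and, in transactional mode, undoes the inserts)
     and re-raise; otherwise commit, which also closes the db. *)
  let load (db : P.db) (pairs : (string * string) list) : unit =
    (try List.iter (fun (key, value) -> P.insert db key value) pairs
     with e -> P.abort db; raise e);
    P.commit db
end
```

Only in transactional mode does the abort branch guarantee that the partial batch is undone.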

type openflag_e = [ `Transactional ] 
val opendb_e : Plasma_client.plasma_cluster ->
string -> openflag_e list -> db Uq_engines.engine
Asynchronous version of opendb. Unlike opendb, it cannot create new databases (openflag_e has no `Create flag).
val max_key_size : db -> int
Returns the maximum key size, i.e. the maximum key length n given with `Create n when the db was created.
val insert : db -> string -> string -> unit
insert db key value: Inserts a new entry, or replaces an existing entry with the same key.
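A small sketch combining insert with max_key_size: it rejects over-long keys before they reach insert, since this page does not say what insert does with such keys. It assumes that a key of exactly max_key_size bytes is still accepted. PKV_INSERT and Checked_insert are hypothetical names; the functor form only serves to keep the example runnable without PlasmaFS.

```ocaml
(* Hypothetical subset of the Pkv_api signature; Pkv_api should match it. *)
module type PKV_INSERT = sig
  type db
  val max_key_size : db -> int
  val insert : db -> string -> string -> unit
end

module Checked_insert (P : PKV_INSERT) = struct
  (* Fail early with Invalid_argument on over-long keys; otherwise insert,
     which replaces any existing entry with the same key. *)
  let insert (db : P.db) (key : string) (value : string) : unit =
    let limit = P.max_key_size db in
    if String.length key > limit then
      invalid_arg
        (Printf.sprintf "key of %d bytes exceeds max_key_size %d"
           (String.length key) limit)
    else P.insert db key value
end
```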
val insert_large : db -> string -> int64 -> Netchannels.in_obj_channel -> unit
insert_large db key size ch: Inserts a new entry, or replaces an existing entry with the same key. The size of the value must be known and be passed as size. The value is obtained by reading from ch.
val insert_channel : db -> string -> int64 -> Netchannels.out_obj_channel
insert_channel db key size: Returns a channel into which the value to insert must be written. The value must have exactly size bytes, and the channel must be closed afterwards.

It is invalid to do any other modification to the db while the returned channel is open.

val delete : db -> string -> unit
delete db key: Deletes the entry with this key, if it exists.
val lookup : db -> string -> string
lookup db key: Looks up this key and returns the associated value. Raises Not_found if the key does not exist.
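Since lookup signals a missing key with Not_found, callers often prefer an option result. A minimal sketch; PKV_LOOKUP and Lookup_opt are hypothetical names, and the functor can be instantiated as Lookup_opt (Pkv_api).

```ocaml
(* Hypothetical subset of the Pkv_api signature; Pkv_api should match it. *)
module type PKV_LOOKUP = sig
  type db
  val lookup : db -> string -> string
end

module Lookup_opt (P : PKV_LOOKUP) = struct
  (* lookup raises Not_found for a missing key; turn that into None. *)
  let find_opt (db : P.db) (key : string) : string option =
    match P.lookup db key with
    | value -> Some value
    | exception Not_found -> None
end
```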
val lookup_large : db ->
string -> string -> (int64 -> Netchannels.out_obj_channel) -> unit
lookup_large db key buf f: If the key is found, the function f is called with the size of the value. f must return an object channel into which the value is written.

buf is a string used as a temporary buffer. It should not be too small; 64K is a reasonable size. buf can be reused in the next lookup_large call.

If the key is not found, the function raises Not_found.

val lookup_large_e : db ->
string -> string -> (int64 -> Uq_io.out_device) -> unit Uq_engines.engine
Asynchronous version of lookup_large. Here f returns a Uq_io.out_device instead of an object channel.
val iterate : db -> (string -> unit) -> unit
iterate db f: Calls the function f with all keys, in index order.
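A sketch that turns iterate into lists. iterate passes only keys, so the values are fetched with lookup after the iteration has returned; whether lookup may be called from inside the iterate callback is not stated here, so the sketch avoids it. PKV_SCAN and Dump are hypothetical names; Dump (Pkv_api) would instantiate it.

```ocaml
(* Hypothetical subset of the Pkv_api signature; Pkv_api should match it. *)
module type PKV_SCAN = sig
  type db
  val iterate : db -> (string -> unit) -> unit
  val lookup : db -> string -> string
end

module Dump (P : PKV_SCAN) = struct
  (* All keys, in the order iterate delivers them (index order). *)
  let keys (db : P.db) : string list =
    let acc = ref [] in
    P.iterate db (fun key -> acc := key :: !acc);
    List.rev !acc

  (* Key/value pairs; each value costs one extra lookup. *)
  let bindings (db : P.db) : (string * string) list =
    List.map (fun key -> (key, P.lookup db key)) (keys db)
end
```

For large databases, building the full list of bindings in memory may not be appropriate; keys alone is cheaper.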
val vacuum : db -> unit
Garbage-collects the db. This works by iterating over the whole db and writing a new version. The new version becomes visible when the vacuum run finishes (no commit necessary).

vacuum can run in parallel to other read accesses. Other writes are locked out.

vacuum implies a commit, so the db is closed afterwards (see commit below).
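A sketch of purging entries and then compacting with vacuum. It commits the deletions and reopens before calling vacuum, which is an assumption on the safe side: this page does not say whether vacuum may be called with uncommitted changes. Keys are collected first and deleted afterwards, because modifying the db from inside the iterate callback is not covered here either. PKV_PURGE and Purge are hypothetical names; Purge (Pkv_api) would instantiate it.

```ocaml
(* Hypothetical subset of the Pkv_api signature; Pkv_api should match it. *)
module type PKV_PURGE = sig
  type db
  val iterate : db -> (string -> unit) -> unit
  val delete : db -> string -> unit
  val commit : db -> unit
  val reopen : db -> unit
  val vacuum : db -> unit
end

module Purge (P : PKV_PURGE) = struct
  (* Delete every key satisfying [pred], then vacuum so the deleted
     entries are physically removed. Returns the number of deleted keys.
     The db is closed afterwards, because vacuum implies a commit. *)
  let purge (db : P.db) (pred : string -> bool) : int =
    let doomed = ref [] in
    P.iterate db (fun key -> if pred key then doomed := key :: !doomed);
    List.iter (P.delete db) !doomed;
    P.commit db;  (* make the deletions durable; closes the db *)
    P.reopen db;
    P.vacuum db;  (* rewrite the files without the deleted entries *)
    List.length !doomed
end
```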

val newer_version_available : db -> bool
In transactional mode, this function checks whether a concurrent writer has committed a new version in the meantime. The user can then re-open the database to roll forward to the newer version.

In non-transactional mode, this function returns true only if a vacuum run has made it necessary to re-open the database.

This function can raise Plasma_client.Plasma_error exceptions, especially when the database files are deleted or inaccessible.
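A sketch of how a long-running reader might roll forward. It assumes that closing a read-only handle with abort (there are no changes to lose) and then calling reopen is the intended way to re-open, since reopen is documented as working after commit or abort. newer_version_available presumably has to consult PlasmaFS, so calling it occasionally rather than before every lookup seems preferable. PKV_REFRESH and Refresh are hypothetical names; Refresh (Pkv_api) would instantiate it.

```ocaml
(* Hypothetical subset of the Pkv_api signature; Pkv_api should match it. *)
module type PKV_REFRESH = sig
  type db
  val newer_version_available : db -> bool
  val abort : db -> unit
  val reopen : db -> unit
end

module Refresh (P : PKV_REFRESH) = struct
  (* If a newer committed version exists, close the current view with
     abort and reopen to see the new version.
     Returns true if the handle was re-opened. *)
  let refresh (db : P.db) : bool =
    if P.newer_version_available db then begin
      P.abort db;
      P.reopen db;
      true
    end
    else false
end
```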

val newer_version_available_e : db -> bool Uq_engines.engine
Asynchronous version of newer_version_available.
val commit : db -> unit
Commits changes, and closes the db.
val abort : db -> unit
Throws away the changes and closes the db. Only in transactional mode is it guaranteed that all changes are undone.
val abort_e : db -> unit Uq_engines.engine
Asynchronous version of abort.
val reopen : db -> unit
Re-opens the db after commit or abort.
val reopen_e : db -> unit Uq_engines.engine
Asynchronous version of reopen.
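Because commit closes the db, a writer that wants several smaller transactions has to call reopen after each commit. A minimal sketch under that reading; it commits the last partial batch, or calls abort to close the handle when nothing is pending. Error handling is omitted. PKV_BATCH, Batched_writer and batch_size are hypothetical names; Batched_writer (Pkv_api) would instantiate it.

```ocaml
(* Hypothetical subset of the Pkv_api signature; Pkv_api should match it. *)
module type PKV_BATCH = sig
  type db
  val insert : db -> string -> string -> unit
  val commit : db -> unit
  val abort : db -> unit
  val reopen : db -> unit
end

module Batched_writer (P : PKV_BATCH) = struct
  (* Write pairs in transactions of at most batch_size inserts.
     commit closes the db, so reopen starts the next transaction.
     The handle is closed at the end in either case. *)
  let write (db : P.db) (batch_size : int) (pairs : (string * string) list) : unit =
    if batch_size <= 0 then invalid_arg "Batched_writer.write: batch_size";
    let pending = ref 0 in
    List.iter
      (fun (key, value) ->
        P.insert db key value;
        incr pending;
        if !pending = batch_size then begin
          P.commit db;
          P.reopen db;
          pending := 0
        end)
      pairs;
    (* Commit leftover inserts; with nothing pending, abort just closes. *)
    if !pending > 0 then P.commit db else P.abort db
end
```

Smaller batches shorten each transaction, which matters because only one writer can be active at a time; the trade-off is more commits.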
This web site is published by Informatikbüro Gerd Stolpmann