This is version 0.6 "Rechenknecht". This is a beta release intended for broader testing.
Changes
Changed in 0.6.1
Fixes:
Invalid_argument("fd_layer_of_buf ...")
Invalid_argument("Plasma_util.RangeMap.sub")
New features in PlasmaFS:
plasma utility when a lot of files are removed. This is no longer done in a single transaction but in several, and the number of files that can be removed in one transaction is restricted.
discovery { addr="<ip>" } entries in the datanodes section of the namenode configuration.
Plasmamr_toolkit. The toolkit is an abstract layer on top of the job execution engine. It allows map/reduce jobs to be formulated in a functional way, and complex jobs to be composed from elementary ones.
Mapred_fs
Plasmamr_file_formats
sorter in Mapred_def.mapred_job
combiner in Mapred_def.mapred_job
Mapred_fields for easy parsing of fields.
Mapred_io can now read the following blocks in a separate kernel thread while the current blocks are still being processed, resulting in better resource utilization.
start_task_servers automatically kills the still-running old instance of the task server.
Generally, PlasmaFS works as described in the documentation. Crashes have not been observed for quite some time now, but occasionally one might see critical exceptions in the log file.
PlasmaFS has so far only been tested on 64-bit machines, and only with Linux as the operating system. There are known issues on 32-bit machines: in particular, the blocksize must not be larger than 4M, and certain buffers are restricted to 16M.
Data safety: Cannot be guaranteed. Storing valuable data in PlasmaFS is not recommended.
Known problems:
plasma put -stdin is very slow if the input pipe is frequently flushed. Workaround: Use "dd bs=1M iflag=fullblock" to write to plasma in chunks of 1M, e.g.
cat file | dd bs=1M iflag=fullblock | plasma put -stdin /file
(Actually, this is not a bug in Plasma but in Ocamlnet 3.4.1, and will be resolved there.)
ECONFLICT errors. (This has been improved in 0.2, though.)
Generally, Plasma MapReduce works as described in the documentation.
Not implemented features:
Jobs with reduce but no map cannot be supported due to the task scheme. (Reason: Input files for sort tasks must not exceed sort_limit.) Workaround: Use the identity as map.
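The identity-map workaround can be pictured with a toy in-memory model. This is only a sketch of the idea, not the Plasma API: `identity_map`, `sum_reduce` and `run_job` are hypothetical names, records are plain (key, value) pairs, and the real job definition in Mapred_def.mapred_job works on files rather than lists.

```ocaml
(* Toy in-memory model of a map/reduce job. NOT the Plasma API:
   all names here are hypothetical and exist only for illustration. *)

(* The identity map passes every record through unchanged. *)
let identity_map (records : (string * int) list) : (string * int) list =
  records

(* Hypothetical reduce: sum the values of each key.
   Expects input sorted by key, as a sort phase would deliver it. *)
let sum_reduce (sorted : (string * int) list) : (string * int) list =
  let rec go acc = function
    | [] -> List.rev acc
    | (k, v) :: rest ->
      (match acc with
       | (k', total) :: acc' when k' = k -> go ((k, total + v) :: acc') rest
       | _ -> go ((k, v) :: acc) rest)
  in
  go [] sorted

(* map -> sort -> reduce: the map phase is always present,
   even if it does nothing. *)
let run_job ~map ~reduce input =
  input |> map |> List.sort compare |> reduce

(* A "reduce-only" job, expressed with the identity as map. *)
let result =
  run_job ~map:identity_map ~reduce:sum_reduce
    [ ("b", 2); ("a", 1); ("b", 3); ("a", 4) ]
```

In this model the job still passes through a map phase, so the pipeline keeps its usual shape while the records reach the reduce step unmodified.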