Release Notes For Plasma
This is version 0.4 "Zauberwald", an alpha release intended to make
Plasma known to interested developers.
Changes
Changed in 0.4
New features in PlasmaFS:
- Added a security system: clients now have to authenticate, and it is
  possible to encrypt all network communication. (Note that this only
  holds for the PlasmaFS part of the system, not yet for PlasmaMR.)
- Added access-control checks: when files are accessed, it is now
  actually checked whether the client is authorized
- Various new subcommands in the plasma command-line utility
  (e.g. chmod and params)
- The NFS bridge can translate between user names and numeric user IDs
- Datanodes are discovered and monitored with multicast messages
- Dead datanodes are recognized
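The discovery and dead-node recognition mentioned above can be sketched with a simple heartbeat table. This is an illustrative Python sketch under assumed semantics, not Plasma's actual implementation; in particular, the multicast transport that carries the announcements is omitted, and the class and method names are hypothetical.

```python
import time

class DatanodeMonitor:
    """Tracks datanode liveness from periodic announcements.
    A node that has not been heard from within `timeout` seconds
    is considered dead. (Illustrative sketch only; Plasma actually
    delivers the announcements via multicast messages.)"""

    def __init__(self, timeout=10.0):
        self.timeout = timeout
        self.last_seen = {}          # node id -> time of last announcement

    def heartbeat(self, node_id, now=None):
        # Record (or refresh) the node; this also discovers new nodes.
        self.last_seen[node_id] = time.time() if now is None else now

    def dead_nodes(self, now=None):
        # Every node whose last announcement is older than the timeout.
        t = time.time() if now is None else now
        return [n for n, seen in self.last_seen.items()
                if t - seen > self.timeout]

mon = DatanodeMonitor(timeout=10.0)
mon.heartbeat("dn1", now=0.0)
mon.heartbeat("dn2", now=0.0)
mon.heartbeat("dn1", now=8.0)    # dn1 refreshes, dn2 stays silent
dead = mon.dead_nodes(now=12.0)  # dn2 exceeded the 10 s timeout
```

The same table serves both purposes: a first heartbeat from an unknown node id discovers it, and the periodic scan recognizes nodes whose announcements have stopped.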
Implementation improvements in PlasmaFS:
- The namenode can now run in true multi-process mode, and thus use
several CPU cores of the machine
- Improved caches for inodes. New cache for directory entries.
- Improved scheme for managing datanode tickets.
- Improved random number generation.
- Fix: improved the performance of the cp subcommand of the plasma
  utility
New features in the map/reduce framework:
- Added a new task type, Emap
Implementation improvements in the map/reduce framework:
- Automatic setting of memory limits, especially for shared memory.
- It is no longer required that the file buffers have at least the
size of a bigblock.
Compatibility:
- Existing PlasmaFS filesystems are incompatible (db schema changes)
- There are incompatible protocol changes
What is working and not working in PlasmaFS
Generally, PlasmaFS works as described in the documentation. Crashes
have not been observed for quite some time now, but occasionally one
might see critical exceptions in the log file.
PlasmaFS has so far only been tested on 64-bit machines, and only with
Linux as the operating system. There are known issues on 32-bit
machines; in particular, the blocksize must not be larger than 4M.
Data safety: cannot be guaranteed. It is not recommended to put valuable
data into PlasmaFS.
Known problems:
- It is still unclear whether the timeout settings are acceptable.
- There might be name clashes for generated file names. Right now it is
  assumed that the random number generator returns unique names, but this
  is certainly not guaranteed.
- The generated inode numbers are not necessarily unique after namenode
restarts.
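The name-clash risk noted above can be quantified with the standard birthday bound. The following is an illustrative Python sketch, not Plasma code; the name-space sizes (64 and 32 random bits per name) are assumptions chosen only to show the scale of the effect.

```python
import math

def collision_probability(n, space):
    """Approximate probability that n names drawn uniformly at random
    from `space` possibilities contain at least one duplicate
    (birthday bound: 1 - exp(-n*(n-1) / (2*space)))."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))

# Assuming 64 random bits per name, a million files are still safe ...
p64 = collision_probability(10**6, 2**64)
# ... but with only 32 random bits, a collision is almost certain.
p32 = collision_probability(10**6, 2**32)
```

This is why "the generator returns unique names" is not a safe assumption: uniqueness must either be enforced (retry on clash) or the name space must be made large enough that the bound above is negligible.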
Not implemented features:
- There are too many hard-coded constants.
- The file name read/lookup functions should never return ECONFLICT
  errors. (This has been improved in 0.2, though.)
- Support for checksums
- Support for "host groups", so that it is easier to control which machines
  may store which blocks. The semantics have yet to be specified.
- Define how blocks are handled that are allocated but never written.
- Recognition of the death of the coordinator, and restart of the
election algorithm.
- Lock manager (avoid that clients have to busy wait on locks)
- Restoration of missing replicas
- Rebalancing of the cluster
- Automated copying of the namenode database to freshly added namenode slaves
What is working and not working in Plasma MapReduce
Not implemented features:
- Task servers should be able to provide several kinds of jobs
- Think about dynamically extensible task servers
- Run jobs only defining map but no reduce.
- Support for combining (an additional fold function run after each
shuffle task to reduce the amount of data)
- A nicer web interface
- Support for user counters as in Hadoop
- Restart/relocation of failed tasks
- Recompute intermediate files that are no longer accessible due to node
  failures
- Speculative execution of tasks
- Support job management (remember which jobs have been run etc.)
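The combining step listed above (a local fold run after each map, before the shuffle) can be sketched generically. This is an illustrative Python sketch of the technique, not Plasma's API; the word-count semantics (summing per key) are an assumed example.

```python
from collections import defaultdict

def combine(pairs):
    """Local fold over map output: merge the values per key before
    shuffling, so that less data is moved between nodes. Here the
    fold is summation, as in word counting (an assumed example)."""
    acc = defaultdict(int)
    for key, value in pairs:
        acc[key] += value
    return sorted(acc.items())

# Map output before combining: 5 pairs ...
map_output = [("a", 1), ("b", 1), ("a", 1), ("a", 1), ("b", 1)]
# ... and after combining, only 2 pairs leave the node.
combined = combine(map_output)   # [("a", 3), ("b", 2)]
```

The point of the feature is exactly this data reduction: the fold runs where the map output is produced, so the shuffle transfers the folded pairs instead of the raw ones.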
What we will never implement:
- Jobs only consisting of reduce but no map cannot be supported
  due to the task scheme. (Reason: input files for sort tasks must
  not exceed sort_limit.)