


Feature comparison of PlasmaFS and HDFS

HDFS is another user-space filesystem that was originally developed for map/reduce.

Feature				PlasmaFS		HDFS
---------------------------------------------------------------------------
Supported blocksizes		any			any
				recommended 64K-1M	recommended >= 64M

Blocksize can be set for
  each file separately		no			yes

System can allocate blocks
  in contiguous regions		yes (all blocks are	no (each block is a
     				stored in a single	separate file)
				file)

Number of datanodes is
  limited by RAM in namenode	yes			yes

Number of files is limited
  by RAM in namenode		no			yes

Replication can be set for
  each file separately		yes			yes

Client communicates directly
  with datanodes		yes			yes

Block checksums			no			yes

Random read access to files	yes			yes

Random write access to files	yes			no
				(blocks can only be	(at most, files can
				replaced but not	be appended to after
				overwritten) 		creation, and are other-
							wise immutable)

Directory hierarchy		yes			yes

Symbolic links			yes			yes

POSIX file semantics		yes (few exceptions,	no
      	   			see [1] below)

Authentication system		yes			partially

Encrypted data communication	optional		partially

Authorization system		yes			yes

Several namenode operations
  can be bundled in an
  atomic transaction		yes			no

Accesses to file contents
  can have ACID semantics	yes			no

Namenode crashes can lead to
  data loss			no (2-phase commit)	yes
       	     	       		
Datanode crashes are handled
  automatically (fail-over)	yes			yes

Namenode crashes are handled
  automatically (fail-over)	not yet			no
  				(but planned)
				so far: auto-selection
				of live coordinator at
				startup time

Datanode configuration can
  be changed w/o restart
  (e.g. add node, del node)	yes			no

Namenodes profit from SSDs	high			low

Filesystem can be mounted	yes (NFS bridge)	no
							(FUSE? unclear)

Rebalancing    	  		not yet			yes
				(but planned)

Communication to local
  datanode servers via shared
  memory			yes			no

Primary access method		SunRPC from any		ad-hoc protocol
	       			language    		(undocumented)

Clients available		OCaml			Java
				Access from any
				language via NFS


[1] POSIX semantics: PlasmaFS supports not only random reads and writes, but also more complicated aspects of POSIX. In particular:

  • It is possible to delete file names while the files are still being accessed. A file without a name can exist as long as there is still a transaction accessing it by inode number. As the user has full control over when to start and finish a transaction (comparable to opening and closing a file), this state can last for long periods of time.
  • Many operations are atomic, e.g. renames. It is even possible to make almost any sequence of file operations atomic by running them in the same transaction.
  • Holes in files are supported. It is even possible to detect file holes, and to cut new holes into files at any block range.
  • Files can be created exclusively, i.e. creation fails if the name already exists.
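
For readers who want to see the POSIX behaviours listed above in action, here is a minimal sketch in Python. It runs against an ordinary local POSIX filesystem, not against PlasmaFS, and uses only the standard os module. The function name posix_demo and the file names are made up for this illustration.

```python
import os
import tempfile


def posix_demo(directory):
    """Exercise four POSIX behaviours on a local filesystem (not PlasmaFS)."""
    results = {}
    path = os.path.join(directory, "demo.dat")

    # Exclusive creation: O_EXCL makes open() fail if the name already exists.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
    try:
        os.close(os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600))
        results["second_exclusive_create_failed"] = False
    except FileExistsError:
        results["second_exclusive_create_failed"] = True

    # Hole: write one byte at offset 4096; the skipped range reads as zeros.
    os.pwrite(fd, b"x", 4096)
    results["size"] = os.fstat(fd).st_size
    results["hole_is_zero"] = os.pread(fd, 4096, 0) == b"\0" * 4096

    # Atomic rename: the old name disappears and the new one appears in one step.
    renamed = os.path.join(directory, "renamed.dat")
    os.rename(path, renamed)
    results["old_name_gone"] = not os.path.exists(path)

    # Delete the name while the file is still open: the data stays
    # reachable through the open descriptor until it is closed.
    os.unlink(renamed)
    results["name_gone"] = not os.path.exists(renamed)
    results["still_readable"] = os.pread(fd, 1, 4096) == b"x"
    os.close(fd)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for key, value in posix_demo(tmp).items():
            print(key, value)
```

In PlasmaFS the role of the open descriptor is played by a transaction that refers to the file by inode number, as described in the first bullet.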
There are, of course, also deviations and weaknesses:
  • The access rights for directories are incompletely implemented. In particular, the "x" bit in the file mode is ignored. This corresponds to the fact that files can be opened by inode number, defeating the purpose of the "x" bit.
  • No support for BSD groups ("s" mode bit for directories)
  • Appends are possible but only poorly supported. A file is exclusively locked while it is write-accessed, so an append operation is always atomic. However, several concurrent appenders are likely to lock each other out, which makes appends slow.
  • Random writes are implemented by allocating replacement blocks for the modified blocks. Of course, this leads to a non-contiguous block allocation pattern.
  • No support for special files (fifos, devices, sockets)
  • No support for user-driven locks (i.e. lockf API)
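
The effect of replacement blocks on the allocation pattern can be shown with a toy model. This is only a sketch under simple assumptions: blocks are numbered slots handed out in ascending order, and a file is a list of block numbers. The classes ToyBlockStore and ToyFile are hypothetical and do not reflect the real PlasmaFS allocator.

```python
class ToyBlockStore:
    """Toy model: blocks are numbered slots, allocated in ascending order."""

    def __init__(self):
        self.blocks = {}      # block number -> bytes
        self.next_free = 0

    def allocate(self, data):
        number = self.next_free
        self.next_free += 1
        self.blocks[number] = data
        return number


class ToyFile:
    """A file is an ordered list of block numbers in the store."""

    def __init__(self, store, chunks):
        self.store = store
        self.block_list = [store.allocate(chunk) for chunk in chunks]

    def replace_block(self, index, data):
        # A random write never overwrites a block in place: a fresh block
        # is allocated and swapped into the block list.
        old = self.block_list[index]
        self.block_list[index] = self.store.allocate(data)
        return old  # in this toy the old block is simply left in the store

    def is_contiguous(self):
        pairs = zip(self.block_list, self.block_list[1:])
        return all(b - a == 1 for a, b in pairs)

    def read(self):
        return b"".join(self.store.blocks[n] for n in self.block_list)


if __name__ == "__main__":
    store = ToyBlockStore()
    f = ToyFile(store, [b"aa", b"bb", b"cc"])
    print(f.block_list, f.is_contiguous())
    f.replace_block(1, b"XX")
    print(f.block_list, f.is_contiguous(), f.read())
```

After the write, the middle block of the file lives behind the last one in the store, so the file is no longer stored as one contiguous run.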
There are also some points where PlasmaFS implements much more than POSIX demands:
  • Accesses to files can be isolated from each other such that a transaction is bound to the snapshot taken at the beginning of the transaction. This leads to interesting concurrency schemes, comparable to SQL database accesses.
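
The snapshot idea can be sketched with a small in-memory model. It is an illustration only: ToySnapshotStore and ToyTransaction are hypothetical names, the model keeps a full copy of the state per version, and it does no conflict detection between writers. It is not how PlasmaFS is implemented.

```python
class ToySnapshotStore:
    """Toy versioned store: every commit creates a new immutable version."""

    def __init__(self):
        self.version = 0
        self.history = {0: {}}   # version -> {file name: contents}

    def begin(self):
        # A transaction is bound to the version that is current right now.
        return ToyTransaction(self, self.version)

    def commit(self, changes):
        new_state = dict(self.history[self.version])
        new_state.update(changes)
        self.version += 1
        self.history[self.version] = new_state


class ToyTransaction:
    def __init__(self, store, version):
        self.store = store
        self.snapshot = store.history[version]
        self.writes = {}

    def read(self, name):
        # Own uncommitted writes are visible; everything else comes
        # from the snapshot taken when the transaction began.
        if name in self.writes:
            return self.writes[name]
        return self.snapshot.get(name)

    def write(self, name, data):
        self.writes[name] = data

    def commit(self):
        self.store.commit(self.writes)


if __name__ == "__main__":
    store = ToySnapshotStore()
    setup = store.begin()
    setup.write("f", b"v1")
    setup.commit()

    reader = store.begin()
    writer = store.begin()
    writer.write("f", b"v2")
    writer.commit()

    print(reader.read("f"))         # the reader still sees its snapshot
    print(store.begin().read("f"))  # a new transaction sees the commit
```

The reader that started before the writer committed keeps seeing the old contents, which is the behaviour comparable to snapshot-based SQL database accesses.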

This web site is published by Informatikbüro Gerd Stolpmann