PlasmaFS allows datanodes to be hot-added and hot-removed without
interrupting the service. This can be done with a few commands,
explained here. We assume that the sysop works from the
deployment directory clusterconfig.
Concepts
Each datanode volume is uniquely identified by a random string, called the identity of the volume. It is absolutely required that the identity remains unique at all times.
Right now there can only be one volume on a node (this might change in the future). In the data directory you can find two files:
$ ls -l /data/plasma/data/
total 268435472
-rw-r--r-- 1 gerd gerd 41 2011-10-05 22:51 config
-rw-r--r-- 1 gerd gerd 274877906944 2011-10-11 17:20 data
The data file contains the blocks. The config file contains
the identity and the blocksize.
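The config file is small enough to inspect directly. The exact contents shown here are illustrative; only the fact that the file stores the identity and the blocksize is documented:
$ cat /data/plasma/data/config
f4f605ff5c7c09f40f395a4696087512 1048576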
Caveat: It is possible to copy these files to another node in order to relocate the datanode volume. However, please be very careful! It must never happen that datanode servers are started on both nodes, as this would make both copies of the volume available to the system. If you move data, make the original version inaccessible, e.g. by renaming the directory.
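A minimal sketch of such a relocation, assuming the data directory layout shown above (the host names are illustrative). The original directory is renamed first, so the old node can never serve the volume again by accident:
$ ssh old.example.com mv /data/plasma/data /data/plasma/data.retired
$ ssh new.example.com rsync -a old.example.com:/data/plasma/data.retired/ /data/plasma/data/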
A volume can be in one of three states:

disabled: This is the state just after creation, and it is intended
for maintenance. The volume is not available to the system.

alive: The datanode server is up, and serves requests for this volume.

dead: The datanode server is down (unexpectedly). A dead volume will
automatically be set to alive when the datanode server is back.
Listing volumes
The managedn_inst.sh script can be used to list the datanodes, together
with their states and how full they are:
$ ./managedn_inst.sh list ssd2
f4f605ff5c7c09f40f395a4696087512 alive 3% 192.168.5.30:2728
a20b364f71009d51d4f1ed2687083e48 alive 3% 192.168.5.40:2728
4a7bcc383b30de2472490c121bc165e1 alive 0% 192.168.5.10:2728
OK
The "ssd2" argument is the name of the instance.
The columns are: the identity of the volume, its state, the fill grade in percent, and the address (IP and port) of the serving datanode.
With managedn_inst.sh one can also disable volumes:
$ ./managedn_inst.sh disable ssd2 4a7bcc383b30de2472490c121bc165e1
OK
A disabled volume is no longer considered by the namenode for file operations. The datanode server remains up, though, and continues to respond to requests.
Note that it may take some time until no more requests are emitted for the volume. All currently running transactions are allowed to finish; depending on the transactions, this can take a few seconds to a few minutes.
Disabling a volume is a safe way of removing the serving datanode from the system. Once the number of requests for this volume drops to zero, the datanode server can be turned off.
It is also recommended to disable dead volumes. This prevents requests from being served by accident, e.g. when the crashed node is rebooted for maintenance.
Volumes remain disabled after a namenode restart.
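Putting this together, a safe decommissioning sequence could look as follows. This is a sketch: the identity and address are taken from the listing above, and the stop argument to rc_dn.ssh is an assumption mirroring the start command used later in this document.
$ ./managedn_inst.sh disable ssd2 4a7bcc383b30de2472490c121bc165e1
OK
(wait until the request count for the volume drops to zero)
$ ssh 192.168.5.10 <dir>/etc/rc_dn.ssh stop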
Enabling volumes
With managedn_inst.sh one can also enable volumes:
$ ./managedn_inst.sh enable ssd2 4a7bcc383b30de2472490c121bc165e1 192.168.5.10:2728
OK
Here, one has to pass an additional argument saying where the volume can be found. The datanode server must be running on this machine. It is checked whether the server actually has the right volume.
After a volume has been enabled once, the system will search for it on the network whenever PlasmaFS starts up. (This search is done with multicast messages.)
Adding a new data node to the system
First, add the node to the file instances/<inst>/datanode.hosts.
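Assuming datanode.hosts simply lists one host name per line (an assumption; compare with the entries already in the file), this can be done with:
$ echo 192.168.5.50 >> instances/ssd2/datanode.hosts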
Install the PlasmaFS binaries:
$ ./deploy_inst.sh -only-host <host> <inst>
Replace <host> with the host name, and <inst> with the instance name.
Initialize and start the datanode:
$ ./initdn_inst.sh <inst> <size> <host>
(where <size> is the amount of data in bytes; a K, M, or G suffix can be used).
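As a concrete sketch, adding a 256G datanode on host 192.168.5.50 to the "ssd2" instance would look like this (host name and size are illustrative):
$ ./deploy_inst.sh -only-host 192.168.5.50 ssd2
$ ./initdn_inst.sh ssd2 256G 192.168.5.50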
That's it!
Moving data disks from one node to another
Imagine the machine breaks, but the disks remain intact. You want to put the disks into a different machine. How to proceed?
First: Disable the volume! (See above.) Once you have done this, you can put the disks into the new machine.
Second: Add the new node to instances/<inst>/datanode.hosts. Remove the broken node.
Third: Install the PlasmaFS binaries on the new node:
$ ./deploy_inst.sh -only-host <host> <inst>
Replace <host> with the host name, and <inst> with the instance name.
Fourth: Start the datanode server:
$ ssh <host> <dir>/etc/rc_dn.ssh start
(Here, <dir> is the installation prefix.)
Fifth: Enable the volume again (see above).
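Taken together, the whole move could look like this sketch (the host addresses, the identity, and the installation prefix /opt/plasma are illustrative assumptions):
$ ./managedn_inst.sh disable ssd2 4a7bcc383b30de2472490c121bc165e1
OK
(move the disks to 192.168.5.50; update instances/ssd2/datanode.hosts)
$ ./deploy_inst.sh -only-host 192.168.5.50 ssd2
$ ssh 192.168.5.50 /opt/plasma/etc/rc_dn.ssh start
$ ./managedn_inst.sh enable ssd2 4a7bcc383b30de2472490c121bc165e1 192.168.5.50:2728
OK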