
Cmd_plasma_admin



plasma_admin - managing namenodes

Synopsis

plasma_admin add_datanode      <nn_options> -size <blocks> <identity> ...
plasma_admin enable_datanode   <nn_options> <identity> <dn_host>:<dn_port>
plasma_admin disable_datanode  <nn_options> <identity>
plasma_admin list_datanodes    <nn_options>
plasma_admin destroy_datanode  <nn_options> <identity>
plasma_admin fsck              -conf <namenode_config_file>

where <nn_options>:

-namenode <host>:<port> -cluster <name>

Description

plasma_admin is used for making administrative changes to namenodes. Right now, there are two types of operations:

  • Managing the datanodes that are connected with the namenode cluster
  • Doing a consistency check of the namenode database

Managing datanodes

Every datanode has a unique identity which is created when the datanode is initialized (via Cmd_plasma_datanode_init). The identity is primarily stored on the disk of the datanode. The namenodes maintain a table of known datanode identities. Each identity is in one of three states: disabled, enabled, or enabled and connected to a running datanode:

  • For a disabled identity it is not known on which machine the datanode server runs or might be running. Files can have blocks that are stored on a disabled identity, but these blocks are inaccessible as long as the identity remains disabled. The namenode never tries to allocate new blocks for disabled identities. This state is intended for temporarily removing a datanode server from the PlasmaFS cluster, e.g. for machine maintenance.
  • An enabled identity is usually connected with a running datanode, but this is not guaranteed. An enabled identity that is not associated with a server indicates an error condition, typically because the datanode has crashed or is otherwise unavailable. Operationally, this state is handled in the same way as a disabled identity. The difference is that the namenode tries to reconnect to the datanode once it is back up. (This function is not yet available in early PlasmaFS releases, though.)
  • An enabled and connected identity is backed by a running datanode server. It is fully operational.

The identity of a datanode is the permanent identifier that is stored in the namenode database. While the cluster is running, the namenode also knows the hostname of the machine serving each connected identity, but this hostname is not stored on disk. Because of this, it is possible to relocate a datanode at runtime by disabling the identity, moving the files that store the data for the identity to a different machine, and re-enabling the identity there.
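The relocation steps can be sketched as a small shell script. This is only an illustration under assumptions: the namenode address, cluster name, identity, and new datanode address are hypothetical, and the RUN variable is a dry-run convention of this sketch, not a feature of plasma_admin. Moving the files is site-specific and is left as a comment.

```shell
#!/bin/sh
# Dry run by default: commands are printed instead of executed.
# Start the script with RUN set to the empty string to execute them.
RUN=${RUN-echo}

# Hypothetical namenode address and cluster name (left unquoted below so
# that the string splits into separate arguments).
NN_OPTS="-namenode nn1.example.net:2730 -cluster mycluster"

# relocate_begin <identity>
# Disables the identity; its blocks are inaccessible until it is re-enabled.
relocate_begin() {
    $RUN plasma_admin disable_datanode $NN_OPTS "$1"
}

# relocate_finish <identity> <new_dn_host> <new_dn_port>
# Re-enables the identity and connects it to the datanode at its new location.
relocate_finish() {
    $RUN plasma_admin enable_datanode $NN_OPTS "$1" "$2:$3"
}

ID=2c9c71b888e367ba75a36e6a6c46e2a8   # hypothetical identity
relocate_begin "$ID"
# Not shown: stop the datanode server on the old machine, move its files
# to the new machine, and start the datanode server there.
relocate_finish "$ID" 192.168.5.50 2728
```

The procedure is split into two functions because the file move has to be complete before the identity is enabled again.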

Note that the namenode config file also enumerates datanodes. This list of nodes, given only as host and port (without identity strings), is used solely for the automatic discovery of datanodes at cluster startup time. When the namenode server is started, it connects to the datanode servers listening on these ports. Each identity it finds there is automatically marked as connected, provided that the identity is present in the database and is enabled.

The following subcommands all require that the namenodes are up and running.

add_datanode subcommand

The add_datanode subcommand adds the identity given on the command line to the database. It also records the size of the datanode, given as a number of blocks, in the database. This size should match the size of the data file. However, this cannot be checked at add_datanode time, so be careful to pass the right size.
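Because the size is not verified, it can help to derive the block count instead of typing it by hand. The following sketch assumes that the data file holds exactly size blocks of a fixed block size, so that the block count is the file size divided by the block size. That relationship, the example sizes, the namenode address, the cluster name, and the identity are all assumptions of this sketch. The RUN variable is a dry-run convention used here only.

```shell
#!/bin/sh
# Dry run by default: the command is printed instead of executed.
RUN=${RUN-echo}

# Hypothetical namenode address and cluster name.
NN_OPTS="-namenode nn1.example.net:2730 -cluster mycluster"

# blocks_for <file_size_in_bytes> <blocksize_in_bytes>
# Prints the number of blocks, or fails if the file size is not a whole
# multiple of the block size (a hint that one of the two numbers is wrong).
blocks_for() {
    if [ $(( $1 % $2 )) -ne 0 ]; then
        echo "size $1 is not a multiple of block size $2" >&2
        return 1
    fi
    echo $(( $1 / $2 ))
}

# Example values (assumed): a 1 GiB data file with 1 MiB blocks.
BLOCKS=$(blocks_for 1073741824 1048576) || exit 1
$RUN plasma_admin add_datanode $NN_OPTS -size "$BLOCKS" 20a7df016a330c6d5c5459e409edc14b
```

With the example values the script passes -size 1024.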

enable_datanode subcommand

The enable_datanode subcommand sets the identity to enabled, and tries to connect it to the datanode server listening on <dn_host>:<dn_port>. If the connection cannot be established, the identity is nevertheless enabled, although it remains unconnected to a server.
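Since enabling succeeds even when the connection fails, it is worth checking the result afterwards. The sketch below enables an identity and then looks it up in a list_datanodes listing. It assumes the listing has the three-column layout of the sample output on this page. The helper connection_of, the addresses, the cluster name, and the RUN dry-run variable are hypothetical.

```shell
#!/bin/sh
# Dry run by default: the command is printed instead of executed.
RUN=${RUN-echo}

# Hypothetical namenode address and cluster name.
NN_OPTS="-namenode nn1.example.net:2730 -cluster mycluster"

# connection_of <identity>
# Reads a list_datanodes listing on stdin and prints the datanode address
# of the identity, or "unconnected" if it is not associated to a datanode.
connection_of() {
    awk -v id="$1" '$1 == id { print ($3 == "not" ? "unconnected" : $3) }'
}

ID=5df72474031271d41cf15c1ca6ef6a4f
$RUN plasma_admin enable_datanode $NN_OPTS "$ID" 192.168.5.31:2728

# A captured listing stands in for the live command here. In a real run:
#   plasma_admin list_datanodes $NN_OPTS | connection_of "$ID"
SAMPLE='5df72474031271d41cf15c1ca6ef6a4f  enabled     not associated to datanode
2c9c71b888e367ba75a36e6a6c46e2a8  enabled     192.168.5.30:2728
OK'
STATUS=$(printf '%s\n' "$SAMPLE" | connection_of "$ID")
echo "$ID: $STATUS"
```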

disable_datanode subcommand

The disable_datanode subcommand disables the identity given on the command line.

list_datanodes subcommand

Lists the known identities and the associated states. Sample output:

20a7df016a330c6d5c5459e409edc14b  disabled    not associated to datanode
5df72474031271d41cf15c1ca6ef6a4f  enabled     not associated to datanode
47aa6893145cb75ebe3fab17d0a6521f  enabled     not associated to datanode
2c9c71b888e367ba75a36e6a6c46e2a8  enabled     192.168.5.30:2728
3347e7d4719a0a1333313a736621123a  enabled     192.168.5.40:2728
OK
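The listing maps directly onto the three identity states described under "Managing datanodes". As an illustration, the following sketch counts the identities per state with awk. It assumes the column layout of the sample above, and the function name count_states is hypothetical.

```shell
#!/bin/sh
# count_states: reads a list_datanodes listing on stdin and prints how many
# identities are disabled, enabled but unconnected, or connected.
# Lines with fewer than three fields (such as the final "OK") are skipped.
count_states() {
    awk '
        NF >= 3 {
            if ($2 == "disabled")  d++
            else if ($3 == "not")  u++   # "not associated to datanode"
            else                   c++   # third column is <host>:<port>
        }
        END { printf "disabled=%d unconnected=%d connected=%d\n", d, u, c }
    '
}

# The sample output from above stands in for the live command here.
# In a real run: plasma_admin list_datanodes <nn_options> | count_states
LISTING='20a7df016a330c6d5c5459e409edc14b  disabled    not associated to datanode
5df72474031271d41cf15c1ca6ef6a4f  enabled     not associated to datanode
47aa6893145cb75ebe3fab17d0a6521f  enabled     not associated to datanode
2c9c71b888e367ba75a36e6a6c46e2a8  enabled     192.168.5.30:2728
3347e7d4719a0a1333313a736621123a  enabled     192.168.5.40:2728
OK'
SUMMARY=$(printf '%s\n' "$LISTING" | count_states)
echo "$SUMMARY"
```

For the sample listing this prints disabled=1 unconnected=2 connected=2. A non-zero unconnected count points at identities in the error condition.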

destroy_datanode subcommand

The destroy_datanode subcommand removes the identity entirely from the namenode database. This includes the block lists of the files, i.e. all information about the blocks stored for the identity is lost.

Managing the namenode database

fsck subcommand

This command checks the namenode database for inconsistencies. This especially includes the correctness of the blockmaps, i.e. the tables that store whether a block is free or allocated. The blockmaps are compared with the block lists attached to the inodes.

The namenode servers must be down while fsck runs (i.e. they must not access the database at the same time).

The fsck subcommand does not try to repair the blockmaps.

Options:

  • -conf file: The file must be the config file of the namenode server. Effectively, only the parameters from the database section are interpreted.
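Because fsck must not run while the namenode servers access the database, a wrapper can ask for an explicit confirmation first. This is a minimal sketch: the config file path is hypothetical, and both the NAMENODES_DOWN variable and the RUN dry-run variable are conventions of this sketch only, not part of plasma_admin.

```shell
#!/bin/sh
# Dry run by default: the command is printed instead of executed.
RUN=${RUN-echo}

# run_fsck <namenode_config_file>
# Refuses to run unless the caller has confirmed that the namenode servers
# are down by setting NAMENODES_DOWN=yes.
run_fsck() {
    if [ "${NAMENODES_DOWN-}" != "yes" ]; then
        echo "refusing: stop the namenode servers, then set NAMENODES_DOWN=yes" >&2
        return 1
    fi
    $RUN plasma_admin fsck -conf "$1"
}

NAMENODES_DOWN=yes
run_fsck /etc/plasma/namenode.conf   # hypothetical path
```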

This web site is published by Informatikbüro Gerd Stolpmann