
Cmd_plasmad


plasmad - daemon for datanodes and namenodes

Synopsis

plasmad -conf file [-fg] [-pid file]

Description

This is the daemon implementing datanode and namenode services. plasmad is a collection of services which can be selectively enabled from the configuration file. By choosing certain sets of services, one gets either a datanode or a namenode server.

The configuration file uses Netplex syntax and relies on many features of this framework. See the Netplex documentation, which is available as part of the Ocamlnet library package. A working subset is described below.

Options

  • -conf file: Reads the configuration from this file. See below for details.
  • -fg: Keeps the daemon in the foreground, i.e. prevents it from detaching from the terminal and putting itself into the background.
  • -pid file: Writes the pid file to this path once the service process has been forked.

General configuration file layout

A config file generally looks like:

netplex {
  controller {
    socket_directory = "<socket_directory>";
    max_level = "debug";    (* Log level, also "info", "notice", "err", ... *)
    logging {
      ...
    }
  };
  service {
    name = "<service_name>";
    ...
  };
  <custom_section> {
    ...
  };
}

Without going too much into detail:

  • Sections have the form
     <name> { ... } 
  • Parameters have the form
     <name> = <value> 
  • Sequences of sections/parameters are delimited with ";"
  • Comments are between (* and *) (as in Ocaml)
  • Parameter values can be "strings", or integers (123), or floats (123.4), or bools (true/false)

The <socket_directory> is a place where the daemon puts runtime files like Unix Domain sockets. Each instance of a daemon must have a separate socket directory.
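
The syntax rules above can be illustrated with a small generator. This is only a sketch in Python, not part of Plasma or Ocamlnet: the functions render_value and render_section and the path /tmp/sock are made up for illustration, and string escaping is deliberately not handled.

```python
def render_value(value):
    """Render a parameter value following the rules listed above."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Assumption: the string contains no characters that need escaping.
        return '"' + value + '"'
    raise TypeError("unsupported parameter value: %r" % (value,))


def render_section(name, items, indent=0):
    """Render a section; items is a list of (name, value) pairs.

    A value that is itself a list is rendered as a nested section.
    """
    pad = "  " * indent
    lines = [pad + name + " {"]
    for key, value in items:
        if isinstance(value, list):
            lines.append(render_section(key, value, indent + 1) + ";")
        else:
            lines.append(pad + "  " + key + " = " + render_value(value) + ";")
    lines.append(pad + "}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_section("netplex", [
        ("controller", [
            ("socket_directory", "/tmp/sock"),  # hypothetical path
            ("max_level", "debug"),
            ("logging", [("type", "stderr")]),
        ]),
    ]))
```

Every element is followed by ";" here, which is one way to satisfy the delimiter rule. The sketch does not check whether the result is a meaningful plasmad configuration.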

Logging

Log messages can go to stderr, to files, or to syslog. Please see the documentation in Netplex_log for details. A simple logging specification would be:

  logging { type = "stderr" }

Config file for datanodes

For a datanode the config file looks like:

netplex {
  controller {
    ... (* see above *)
  };
  datanode {
    clustername = "<name>";
    directory = "<data_dir>";
    blocksize = <blocksize>;  (* int *)
    io_processes = <p>;       (* int *)
    shm_queue_length = <q>;   (* int *)
    sync_period = <s>;        (* float *)
  };
  access {
    min_level = "auth";
    user { name = "proot"; password_file = "password_proot" };
    user { name = "pnobody"; password_file = "password_pnobody" };
  };
  service {
    name = "Dn_manager";
    protocol {
      name = "RPC";
      address {
        type = "internet";
        bind = "0.0.0.0:2728"
      };
      address {
        type = "local";
        path = "<rpc_socket>";
      }
    };
    processor {
      type = "dn_manager";
    };
    workload_manager {
      type = "constant";
      threads = 1;
    };
  };
}

Parameters:

  • clustername is the name of the PlasmaFS cluster. All namenode and datanode daemons must be configured for the same name.
  • data_dir is a local directory where the datanode can store blocks. The daemon expects two files in this directory: config and data. These files can be created with the utility Cmd_plasma_datanode_init.
  • blocksize is the block size in bytes. It should be in the range 65536 (64K) to 67108864 (64M) and must be divisible by the page size (4096). The block size of all datanodes must be the same.
  • io_processes is the number of I/O processes to start. Effectively, this is the number of parallel I/O requests the datanode server can submit to the kernel at the same time. A low number like 8 or 16 suffices in typical deployments.
  • shm_queue_length is the number of blocks the datanode server can buffer in shared memory. These buffers speed up the communication between the main datanode process and the I/O processes. A small multiple of io_processes should be good.
  • sync_period is the number of seconds after which written blocks are synced to disk. The higher the value, the more efficient the sync, but the longer clients have to wait until the sync is done. Values between 0.1 and 1.0 seem to be good.
  • rpc_socket: The path to a Unix Domain socket where the datanode can also be contacted in addition to the internet socket. The socket can live in the socket directory.
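
The numeric constraints on the datanode parameters can be checked before the daemon is started. The following Python sketch is not part of Plasma; the function name check_datanode_params is hypothetical, and the sync_period check only encodes the recommendation given above, not a hard limit.

```python
MIN_BLOCKSIZE = 65536       # 64K
MAX_BLOCKSIZE = 67108864    # 64M
PAGE_SIZE = 4096


def check_datanode_params(blocksize, sync_period):
    """Return a list of human-readable problems (empty if none found)."""
    problems = []
    if not MIN_BLOCKSIZE <= blocksize <= MAX_BLOCKSIZE:
        problems.append("blocksize should be between 65536 and 67108864")
    if blocksize % PAGE_SIZE != 0:
        problems.append("blocksize must be divisible by the page size 4096")
    # Advisory only: the text above says values in this range seem good.
    if not 0.1 <= sync_period <= 1.0:
        problems.append("sync_period is outside the suggested 0.1 to 1.0")
    return problems


if __name__ == "__main__":
    print(check_datanode_params(1048576, 0.5))   # 1M blocks
    print(check_datanode_params(100000, 5.0))
```

The check that all datanodes use the same block size is not covered, since it needs the configurations of all nodes.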

Parameters in access:

  • min_level: sets the security level for incoming connections. "none" means that no authentication is required. "auth" means that one of the listed users must connect. "int" means that additionally the integrity of the messages is protected with digital signatures. "priv" means that additionally the messages are encrypted. Note that this parameter only sets the minimum level the datanode requires. The level that is actually in effect is determined by the client.
  • user: the names and passwords of the users that can connect
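
The relation between min_level and the level chosen by the client can be sketched as follows. This is an illustration in Python, not the actual server code. It assumes that the four levels form a strict order in which each level includes the guarantees of the previous one, as the description above suggests; the function name level_acceptable is made up.

```python
# Ordered from weakest to strongest, following the description above.
LEVELS = ["none", "auth", "int", "priv"]


def level_acceptable(min_level, client_level):
    """True if the level chosen by the client meets the configured minimum."""
    return LEVELS.index(client_level) >= LEVELS.index(min_level)


if __name__ == "__main__":
    print(level_acceptable("auth", "priv"))
    print(level_acceptable("priv", "auth"))
```

With min_level = "auth", a client may therefore still connect at "int" or "priv".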

Config files for namenodes

For a namenode the config file looks like:

netplex {
  controller {
    ... (* see above *)
  };
  database {
    dbname = "<name_of_postgresql_database>";
    (* maybe more options, see below *)
  };
  namenodes {
    clustername="<cluster_name>";
    node_list = "<nn_list>";
    port = 2730;
    rank_script = "ip addr show label 'eth*' | grep link/ether | awk '{print $2}'"; (* see below *)
    inodecache { port = 2740 };
  };
  datanodes {
    node_list = "<dn_list>";
    port = 2728;
    blocksize = <blocksize>;
  };
  access {
    min_level = "priv";
    client_level = "priv";
    user { name = "proot"; password_file = "password_proot" };
    user { name = "pnobody"; password_file = "password_pnobody" };
  };
  service {
    name = "Nn_manager";
    protocol {
      name = "RPC";
      address {
        type = "internet";
        bind = "0.0.0.0:2730"
      };
      address {
        type = "local";
        path = "<manager_socket>";
      };
    };
    processor {
      type = "nn_manager";
    };
    workload_manager {
      type = "constant";
      threads = 1;
    };
  };
  service {
    name = "Nn_inodecache";
    protocol {
      name = "RPC";
      address {
        type = "internet";
        bind = "0.0.0.0:2740"
      };
      address {
        type = "container";
      };
    };
    processor {
      type = "nn_inodecache";
    };
    workload_manager {
      type = "constant";
      threads = 1;
    };
  };
}

Parameters in database:

  • dbname is the name of the PostgreSQL database.

Parameters in namenodes:

  • clustername is the name of the PlasmaFS cluster. All namenode and datanode daemons must be configured for the same name.
  • nn_list is a text file containing the names of the namenodes, one hostname per line.
  • rank_script needs some explanation. Either rank or rank_script must be specified: rank is simply a string, and rank_script is a script that writes this string to stdout. Every namenode instance must be configured with a different rank string; if two instances have the same string, the cluster will not start up. The script above is for Linux and extracts the MAC addresses of all eth* network interfaces. The rank string is used in the coordinator election algorithm: the node with the lexicographically smallest string wins.
  • A complete list of parameters can be found in Nn_config.extract_node_config.
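
The decision rule for rank strings can be sketched in a few lines. This Python code only illustrates the rule as stated here (unique ranks, smallest string wins); it is not the election protocol the namenodes actually run, and the function name and the host names in the example are hypothetical. Python compares strings by code point, which matches lexicographic order for plain ASCII ranks such as MAC addresses.

```python
def elect_coordinator(ranks):
    """ranks maps a namenode host name to its rank string.

    Returns the host with the smallest rank string and raises
    ValueError if two hosts share the same rank.
    """
    host_by_rank = {}
    for host, rank in ranks.items():
        if rank in host_by_rank:
            raise ValueError(
                "duplicate rank %r for %s and %s"
                % (rank, host_by_rank[rank], host))
        host_by_rank[rank] = host
    return host_by_rank[min(host_by_rank)]


if __name__ == "__main__":
    print(elect_coordinator({
        "nn1": "00:1b:21:aa:00:02",   # hypothetical hosts and addresses
        "nn2": "00:1b:21:aa:00:01",
    }))
```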

Parameters in datanodes:

  • dn_list is a text file containing the names of the datanodes, one hostname per line. These datanodes are auto-discovered at cluster startup.
  • blocksize is the block size in bytes. It should be in the range 65536 (64K) to 67108864 (64M) and must be divisible by the page size (4096). The block size of all nodes must be the same.
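
Node list files such as nn_list and dn_list are simple enough to read with a one-line parser. The following Python sketch assumes that surrounding whitespace and empty lines can be ignored, which is not stated above; the function name parse_node_list is made up.

```python
def parse_node_list(text):
    """Return the host names in text, one per line, skipping empty lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    print(parse_node_list("dn1.example.net\ndn2.example.net\n"))
```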

Parameters in access:

  • min_level: sets the security level for incoming connections. "none" means that no authentication is required. "auth" means that one of the listed users must connect. "int" means that additionally the integrity of the messages is protected with digital signatures. "priv" means that additionally the messages are encrypted. Note that this parameter only sets the minimum level the namenode requires. The level that is actually in effect is determined by the client.
  • client_level: sets the security level for outgoing connections (i.e. between PlasmaFS daemons)
  • user: the names and passwords of the users (for both incoming and outgoing connections)

It is strongly advised to leave the security level at "priv" (i.e. maximum).

Other:

  • manager_socket: The path to a Unix Domain socket where the namenode can also be contacted in addition to the internet socket. The socket can live in the socket directory.

How to shut down the daemon

The orderly way to shut down the daemon is the command

netplex-admin -sockdir <socket_directory> -shutdown

netplex-admin is part of the Ocamlnet distribution. The socket directory must be the configured socket directory.

It is also possible to do a hard shutdown by sending SIGTERM to the process group whose ID is written to the pid file. Because of the transactional design, there is no risk of data loss in the server. However, clients may well be confused when their connections simply break.
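
Both shutdown variants can be scripted. The Python sketch below is an illustration, not a tool shipped with Plasma: the function names are made up, and it assumes that the pid file contains nothing but the process group ID as a decimal number. The killpg parameter exists only so that the function can be tried out without sending a real signal.

```python
import os
import signal


def orderly_shutdown_argv(socket_directory):
    """Build the netplex-admin command line shown above."""
    return ["netplex-admin", "-sockdir", socket_directory, "-shutdown"]


def hard_shutdown(pid_file, killpg=None):
    """Send SIGTERM to the process group whose ID is stored in pid_file."""
    if killpg is None:
        killpg = os.killpg  # available on Unix only
    with open(pid_file) as f:
        # Assumption: the file holds a single decimal number.
        pgid = int(f.read().strip())
    killpg(pgid, signal.SIGTERM)
    return pgid
```

The list returned by orderly_shutdown_argv can be passed to subprocess.run, with the configured socket directory as argument.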

This web site is published by Informatikbüro Gerd Stolpmann