Cmd_nfs3d

nfs3d - daemon for the NFS bridge

Synopsis

nfs3d -conf file [-fg] [-pid file]

Description

This is the daemon acting as an NFS server and forwarding requests to the PlasmaFS cluster (namenodes and datanodes). The daemon implements the nfs and mountd programs of NFS version 3. There is no support for the nlockmgr protocol yet.

For security reasons, this daemon should only be bound to the local loopback interface (127.0.0.1). NFSv3 is inherently insecure, as there are no authentication verifiers. It is possible and recommended to run the daemon on every machine that wants to mount the filesystem. This avoids the security problems, because the unprotected data exchange is then confined to the local machine.

An instance of the NFS bridge can only connect to a single PlasmaFS cluster.

The NFS bridge can only be contacted over TCP. UDP is not supported, and there are no plans to add it. NFS runs well over TCP.

NFS clients can mount PlasmaFS volumes as in (Linux syntax):

mount -o intr,port=2801,mountport=2800,nolock <host>:/<clustername> /mnt

Here, <host> is to be replaced by the machine running the NFS bridge (normally localhost). <clustername> is the name of the cluster. The port numbers may need to be adjusted; the command assumes the same numbers as in the example configuration below.
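As an illustration only, the mount invocation can be assembled programmatically. The following Python sketch builds the argument list for the Linux command shown above; the function name and the cluster name "testcluster" are hypothetical, and the default ports are simply those of the example.

```python
def plasma_mount_argv(host, clustername, mountpoint,
                      nfs_port=2801, mount_port=2800):
    """Return the argv list for the Linux mount command shown in the text.

    The defaults mirror the example; adjust them to the ports the
    bridge is actually configured with.
    """
    options = f"intr,port={nfs_port},mountport={mount_port},nolock"
    return ["mount", "-o", options, f"{host}:/{clustername}", mountpoint]
```

The resulting list can be passed to subprocess.run (as root); for example, plasma_mount_argv("localhost", "testcluster", "/mnt") corresponds to the command line above with <host> and <clustername> filled in.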

NFS (version 3) only implements weak cache consistency: An NFS client usually caches data as long as nothing is known about a possible modification, and modifications can only be recognized by changed metadata (i.e. the mtime in the inode is changed after a write). Although NFS clients typically query metadata often, it is possible that data modifications remain unnoticed. This is a problem in the NFS protocol, not in the bridge.

The PlasmaFS protocol has better cache consistency semantics; in particular, it ensures that a change of data is also represented as an update of the metadata. However, the different semantics may nevertheless cause incompatibilities. For example, a PlasmaFS client is allowed to change data without changing the mtime in the inode. Within the PlasmaFS system this is not a big problem, because there are other means to reliably detect the change. An NFS client connected via this bridge might not see the update, though, and may continue to assume that its cached version is up to date. All in all, these problems are expected to be mostly theoretical and will usually not occur in practice.
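The effect can be made concrete with a toy model. The following Python sketch is not how any real NFS client is implemented; it only assumes a client that revalidates its cache by comparing the mtime, and shows that a data change without an mtime change goes unnoticed. All class and function names are made up for the illustration.

```python
from dataclasses import dataclass


@dataclass
class ServerFile:
    """Stand-in for a file on the server: contents plus mtime."""
    data: bytes
    mtime: int


class MtimeCache:
    """Toy client cache that trusts its copy while the mtime is unchanged."""

    def __init__(self):
        self.data = None
        self.mtime = None

    def read(self, f):
        # Refetch only if the metadata differs from what was cached.
        if self.mtime != f.mtime:
            self.data, self.mtime = f.data, f.mtime
        return self.data


def demo():
    f = ServerFile(b"old", mtime=100)
    cache = MtimeCache()
    first = cache.read(f)    # initial fetch
    f.data = b"new"          # data changes, mtime is left untouched
    stale = cache.read(f)    # the cache still returns the old contents
    f.mtime = 101            # the metadata changes as well
    fresh = cache.read(f)    # now the change is noticed
    return first, stale, fresh
```

In this model, the second read returns the outdated contents, and only the mtime update makes the new data visible.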

NFS version 3 can deal with large blocks in the protocol, and some client implementations also support that. For example, the Linux client supports block sizes up to 1M automatically, i.e. this is the maximum transmission unit for reads and writes. Independently of the client support, the NFS bridge translates the sizes of the data blocks used in the NFS protocol to what the PlasmaFS protocol requires. This means that the NFS bridge can handle the case that the client uses data sizes smaller than the PlasmaFS block size. There is a performance loss, though.
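The size translation is essentially range arithmetic. As a minimal sketch (not the bridge's actual code), the following Python function splits the byte range of an NFS request at PlasmaFS block boundaries; the function name and the tuple layout are assumptions made for the illustration.

```python
def split_into_blocks(offset, length, blocksize):
    """Split the byte range [offset, offset+length) at block boundaries.

    Returns a list of (block_index, offset_in_block, nbytes) tuples,
    one per block touched by the request.
    """
    if blocksize <= 0 or offset < 0 or length < 0:
        raise ValueError("blocksize must be positive, offset/length non-negative")
    pieces = []
    pos = offset
    end = offset + length
    while pos < end:
        block = pos // blocksize            # index of the block containing pos
        start = pos % blocksize             # position inside that block
        n = min(blocksize - start, end - pos)
        pieces.append((block, start, n))
        pos += n
    return pieces
```

With a block size of 1M (1048576 bytes), a 32K request at offset 0 yields a single piece covering only part of block 0, while an 8K request starting 4K before the first block boundary yields two pieces, one in each adjacent block.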

Especially for write accesses, the PlasmaFS blocksize should not be larger than the maximum block size the NFS client supports. Otherwise there might be an extreme performance loss. (This is actually a problem in NFS clients and cannot be worked around in the server.)

PlasmaFS does not keep the count of the hard links a file has. Because of this, the NFS bridge always reports this count as 1.

Mapping principals

NFS uses numeric UIDs and GIDs to identify users and groups, while PlasmaFS prefers names. Because of this, the numeric identifiers need to be mapped to names (and vice versa).

The daemon just consults the local /etc/passwd and /etc/group files to do the mapping. This is correct if the filesystem is mounted via the local loopback network (i.e. for the recommended configuration), and it is better than any alternative if the filesystem is mounted over a real network.
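To illustrate the kind of lookup involved, the following Python sketch parses text in the format of /etc/passwd or /etc/group (in both files the name is the first field and the numeric ID the third) into two dictionaries. This is an assumption-laden illustration, not the daemon's code: the function name is hypothetical, and how the daemon treats IDs or names without an entry is not covered here.

```python
def parse_id_file(text):
    """Parse passwd(5)/group(5)-style text.

    Returns (id_to_name, name_to_id). Lines that do not have a numeric
    third field are skipped; if an ID or name occurs twice, the first
    entry wins (a choice made for this sketch).
    """
    id_to_name, name_to_id = {}, {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        name, ident = fields[0], int(fields[2])
        id_to_name.setdefault(ident, name)
        name_to_id.setdefault(name, ident)
    return id_to_name, name_to_id


# Sample data; the UID 1001 for "proot" is made up for the example.
SAMPLE_PASSWD = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "proot:x:1001:1001::/home/proot:/bin/sh\n"
)
```

Applied to SAMPLE_PASSWD, the first dictionary maps the UID from an NFS request to a name, and the second maps a PlasmaFS name back to a UID. The same function works for group data.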

For simplicity, the daemon just believes the group memberships the NFS client claims, i.e. the memberships are not verified with the PlasmaFS namenode. Because of this, it is possible to have different group memberships via NFS than when using the PlasmaFS protocol directly. (This might be fixed in a future release.)

Persistent mounts

The NFS bridge stores mounts in a PlasmaFS file under /.plasma/var/lib/nfs3. Because of this, the mounts survive restarts of the bridge (or other PlasmaFS components).

Options

-conf file: Reads the configuration from this file. See below for details.
-fg: Keeps the daemon in the foreground, i.e. it does not detach from the terminal.
-pid file: Writes this pid file once the service process is forked.

Configuration

The configuration file is in Netplex syntax and uses many features of this framework. See the Netplex documentation, which is available as part of the Ocamlnet library package. Some explanations can also be found in Cmd_plasmad.

The config file looks like:

netplex {
  controller {
    ... (* see plasmad documentation *)
  };
  namenodes {
    clustername = "<name>";
    node_list = "<nn_list>";
    port = 2730;
    buffer_memory = 134217728;
  };
  access {
    user { name = "proot"; password_file = "password_proot" };
    user { name = "pnobody"; password_file = "password_pnobody" };
  };
  service {
    name = "Nfs3";
    protocol {
      name = "mount3";
      address {
        type = "internet";
        bind = "127.0.0.1:2800"
      }
    };
    protocol {
      name = "nfs3";
      address {
        type = "internet";
        bind = "127.0.0.1:2801"
      }
    };
    processor {
      type = "nfs";
      nfs3 { };
      mount3 { };
    };
    workload_manager {
      type = "constant";
      threads = 1;
    };
  };
}

Parameters:

clustername is the name of the PlasmaFS cluster.
node_list is a text file containing the names of the namenodes, one hostname per line.
buffer_memory configures how large the internal buffer is that the NFS bridge uses. Bigger buffers improve performance.

It is not advisable to use the official NFS ports, or to register the NFS ports with portmapper.
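The recommendations about loopback binding and unofficial ports can be checked mechanically. The following Python sketch scans configuration text for bind = "host:port" settings with a regular expression; it is not a Netplex parser, the function name is hypothetical, and it only recognizes literal IP addresses (a hostname is reported because it cannot be classified without resolving it). Port 2049 is the port officially assigned to NFS.

```python
import ipaddress
import re

# Matches settings of the form: bind = "127.0.0.1:2800"
BIND_RE = re.compile(r'bind\s*=\s*"([^"]+)"')


def check_binds(config_text):
    """Return a list of warnings for the bind addresses in config_text."""
    warnings = []
    for bind in BIND_RE.findall(config_text):
        host, _, port = bind.rpartition(":")
        try:
            loopback = ipaddress.ip_address(host.strip("[]")).is_loopback
        except ValueError:
            loopback = False  # not a literal IP address
        if not loopback:
            warnings.append(f"{bind}: not a literal loopback address")
        if port == "2049":
            warnings.append(f"{bind}: uses the official NFS port 2049")
    return warnings
```

For the example configuration above, check_binds returns an empty list, since both protocols are bound to 127.0.0.1 on ports 2800 and 2801.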

How to shut down the daemon

First, the filesystem should be unmounted on all NFS clients. There is no way for an NFS server to enforce unmounts (i.e. to make clients write all unsaved data).

The orderly way for shutting down the daemon is the command

netplex-admin -sockdir <socket_directory> -shutdown

netplex-admin is part of the Ocamlnet distribution. The socket directory must be the configured socket directory.

It is also allowed to do a hard shutdown by sending SIGTERM to the process group whose ID is written to the pid file. There is no risk of data loss in the server because of the transactional design. However, clients may well be confused when their connections are simply dropped.
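A hard shutdown along these lines could be scripted as follows. This Python sketch assumes that the pid file contains nothing but a single decimal number (the process group ID); the function names are hypothetical, and the check against IDs of 1 or less is a safety measure added for the sketch.

```python
import os
import signal


def read_pgid(pid_file):
    """Read the process group ID from the pid file (assumed: one integer)."""
    with open(pid_file) as f:
        return int(f.read().strip())


def hard_shutdown(pid_file):
    """Send SIGTERM to the process group named in the pid file."""
    pgid = read_pgid(pid_file)
    if pgid <= 1:
        # Refuse to signal init or a nonsensical group ID.
        raise ValueError(f"refusing to signal process group {pgid}")
    os.killpg(pgid, signal.SIGTERM)
    return pgid
```

The orderly shutdown via netplex-admin remains preferable; a script like this is only a fallback for the case that the admin socket cannot be reached.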

This web site is published by Informatikbüro Gerd Stolpmann