Remember that you can have several namenodes. In this case, one of the nodes becomes the "master" (or coordinator), and the other nodes are kept synchronized with it. Because of the two-phase commit, the replica nodes never lag behind: the database is either modified on all namenodes or on none (in the case of an error). This is different from other replication techniques where the replicas are not immediately updated, so that data can be lost when the master crashes.
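To make the all-or-nothing property concrete, here is a minimal, generic sketch of two-phase commit in OCaml. This is not PlasmaFS source code; the node type and the prepare/commit/abort stubs are invented for illustration. A change takes effect only if every node first votes yes:

(* Generic two-phase commit sketch - illustration only. *)
type node = { name : string; mutable value : int }

(* Stub vote: a real implementation would check whether the node
   can durably apply the change. *)
let prepare _node _v = true
let commit node v = node.value <- v
let abort _node _v = ()

let two_phase_commit nodes v =
  if List.for_all (fun n -> prepare n v) nodes then begin
    List.iter (fun n -> commit n v) nodes;   (* applied everywhere *)
    true
  end else begin
    List.iter (fun n -> abort n v) nodes;    (* applied nowhere *)
    false
  end

let () =
  let nodes = [ { name = "nn1"; value = 0 }; { name = "nn2"; value = 0 } ] in
  Printf.printf "committed: %b\n" (two_phase_commit nodes 42)

Because a node that could not prepare forces an abort everywhere, no replica can ever hold a version of the data that the others lack.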
In PlasmaFS all namenodes are created equal: because the nodes are exactly synchronous with each other, every node can become the master at any time. Right now, PlasmaFS selects the master only at cluster startup, but this might change in the future. This means that PlasmaFS handles crashes of the master node gracefully: just restart the cluster, and a different node becomes the master.
There are some configuration parameters in namenode.conf that are important here:
alive_min
: This is the minimum number of namenodes that must be alive. If further nodes die, the cluster is halted. This can be an integer number or "max", which means that all configured namenodes must be up.

alive_min_startup
: Also the minimum number of live namenodes. However, this value is only checked at cluster startup time.

The configuration is read in by Nn_config and explained in Nn_config.nn_node_config.
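For illustration, in a cluster with three configured namenodes one could set (the exact file syntax is documented in Nn_config.nn_node_config; this fragment is a sketch, not a complete file):

alive_min = 2;
alive_min_startup = "max";

With these values the running cluster survives the loss of one namenode, but a startup is refused unless all three nodes are reachable.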
To add a further namenode to an existing cluster, do the following:

1. Stop the cluster (from the clusterconfig directory):
$ ./rc_all.sh stop <inst>

2. Add the new node to instances/<inst>/namenode.hosts. Check whether settings need to be changed in instances/<inst>/namenode.conf.

3. Deploy the updated configuration (from clusterconfig):
$ ./deploy_inst.sh <inst>

4. Start the cluster again (from clusterconfig):
$ ./rc_all.sh start <inst>
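For example, with a made-up instance name test and a new namenode host node4, and assuming namenode.hosts simply lists one host name per line, the whole sequence run from clusterconfig is:

$ ./rc_all.sh stop test
$ echo node4 >> instances/test/namenode.hosts
$ ./deploy_inst.sh test
$ ./rc_all.sh start test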
Note: There is no performance benefit. You "only" get additional data safety.
The following steps move a namenode database to a different machine; you can also use them to replace a damaged namenode.
1. Stop the cluster (from clusterconfig):
$ ./rc_all.sh stop <inst>

2. Dump the namenode database (on the old node):
$ pg_dump -Fc -f data.dump plasma_<inst>

3. Transfer data.dump to the new node and restore it there. If a database plasma_<inst> already exists on the new node, drop it first (dropdb):
$ dropdb plasma_<inst>
$ pg_restore -C -d template1 data.dump
The -C option makes pg_restore recreate the database; -d template1 only names the database to connect to for doing so. (Of course, this assumes that PostgreSQL is running on the new node. Also, you might have to configure it first, at least run createuser for the database user.)

4. Update instances/<inst>/namenode.hosts so that it names the new node. Check whether settings need to be changed in instances/<inst>/namenode.conf.

5. Deploy the updated configuration (from clusterconfig):
$ ./deploy_inst.sh <inst>

6. Start the cluster again (from clusterconfig):
$ ./rc_all.sh start <inst>
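As a concrete sketch (the instance name test and the host prompts are made up, and scp stands in for any file transfer method), the database part of the move looks like this:

old$ pg_dump -Fc -f data.dump plasma_test
old$ scp data.dump new:
new$ dropdb plasma_test
new$ pg_restore -C -d template1 data.dump

The dropdb step is only needed if a database of that name already exists on the new node.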
It should be noted that PostgreSQL already implements another replication method, namely log shipping, which can be enabled in the DBMS configuration. Log shipping differs from the two-phase commit that PlasmaFS provides in that the replicas lag behind the master. However, the replicas still satisfy good integrity conditions, so they can be used as an alternative (and there is some chance that this results in a speedup).
Because of the lag, however, it can happen that a replica is no longer synchronous with the datanodes, i.e. the files appear to contain wrong data.
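For illustration, WAL archiving on the primary is enabled with settings along the following lines in postgresql.conf (a sketch only: parameter names and recommended setups vary between PostgreSQL versions, and the archive directory is made up):

wal_level = replica          # called 'archive' or 'hot_standby' in older releases
archive_mode = on
archive_command = 'cp %p /var/lib/postgresql/wal_archive/%f'

The standby is then set up to restore the shipped WAL segments; the details are in the PostgreSQL chapter cited below.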
See the chapter about "High Availability, Load Balancing, and
Replication" in the PostgreSQL documentation.