So let me try to explain the mechanism Sun Cluster uses to prevent both
Amnesia and Split Brain.
This is a majority algorithm: only a cluster node, or a subset of
cluster nodes, that holds a majority of the possible votes can start up
(in the case of amnesia) or continue (in the case of split brain)
cluster operation. All other partitions must leave the cluster.
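
To make the majority rule concrete, here is a minimal sketch in Python. The function names are my own for illustration and are not part of Sun Cluster; the only assumption is that "majority" means strictly more than half of the possible votes.

```python
def votes_needed(possible_votes: int) -> int:
    """Smallest vote count that is strictly more than half of all possible votes."""
    return possible_votes // 2 + 1


def has_quorum(partition_votes: int, possible_votes: int) -> bool:
    """True if a partition holding partition_votes may start or continue the cluster."""
    return partition_votes >= votes_needed(possible_votes)
```

With 2 possible votes a single node needs 2 votes and so can never win on its own; with 3 possible votes, 2 are enough.
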
So let us first discuss the Split Brain scenario: a node cannot
communicate with the other node over the private interconnect, but both
nodes are fine. As discussed before, we must not allow both nodes to
continue cluster operation, so one has to leave. Each node has a vote,
but in a 2-node cluster this would mean that in case of a split brain
neither node holds a majority (1 vote out of 2), so nobody would
continue cluster operation.
So in a 2-node cluster we assign a quorum device: a LUN in shared
storage that also has a vote. That gives 3 possible votes in the
cluster, and a majority of 3 is 2 votes. Once a split brain occurs, both
nodes race for the quorum device: the one that gets there first gets its
vote. The other one notices that it is too late and panics with a 'Lost
Operational Quorum' message. The mechanism for reserving quorum devices
is SCSI reservations, which we will discuss in 2 weeks.
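
The race for the quorum device can be sketched as a toy simulation. This is only an illustration under simplifying assumptions: the `QuorumDevice` class and `resolve_split_brain` function are hypothetical names, and a plain "first caller wins" flag stands in for the real SCSI reservation.

```python
class QuorumDevice:
    """Hypothetical stand-in for the shared LUN: the first reservation wins."""

    def __init__(self):
        self.owner = None

    def try_reserve(self, node):
        # Only the first node to arrive becomes the owner.
        if self.owner is None:
            self.owner = node
        return self.owner == node


def resolve_split_brain(arrival_order, device, possible_votes=3):
    """Return what each node does, given the order in which nodes reach the device."""
    needed = possible_votes // 2 + 1
    outcome = {}
    for node in arrival_order:
        # Each node has its own vote, plus the device's vote if it wins the race.
        votes = 1 + (1 if device.try_reserve(node) else 0)
        if votes >= needed:
            outcome[node] = "continues"
        else:
            outcome[node] = "panics: Lost Operational Quorum"
    return outcome
```

Calling `resolve_split_brain(["node-b", "node-a"], QuorumDevice())` lets node-b continue with 2 of 3 votes, while node-a is left with 1 vote and panics.
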
Now how can the quorum mechanism prevent amnesia?
To prevent amnesia we must allow only the node that was last to leave
the cluster to start the cluster up again. Same story: when a node
leaves the cluster, the other node(s) make sure that it cannot acquire
the quorum disk when it starts up. Only the last node in the cluster
will be able to do so. So when the node that left the cluster first
tries to start up, it has 1 vote of its own and knows that there are 3
possible votes in the cluster, but it cannot get the vote of the quorum
device: it waits for the other node to form the cluster first, with a
message 'waiting for operational quorum'.
When the node that left the cluster last starts up, it gets the vote of
the quorum disk, starts talking to the waiting node and passes the
latest cluster database to that waiting node, so that this node is up to
date with all information that may have changed while it was down.
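
The amnesia case can be sketched in the same toy style. This post does not describe how a departed node is kept away from the quorum disk, so the allow-list below is purely an assumption for illustration, and `QuorumDisk`, `fence` and `boot` are hypothetical names.

```python
class QuorumDisk:
    """Hypothetical model: the disk only grants its vote to nodes still allowed."""

    def __init__(self, nodes):
        self.allowed = set(nodes)

    def fence(self, departed_node):
        # Called by the remaining node(s) when another node leaves the cluster.
        self.allowed.discard(departed_node)

    def grant_vote(self, node):
        return node in self.allowed


def boot(node, disk, possible_votes=3):
    """Decide what a node does when it boots while no cluster is running."""
    needed = possible_votes // 2 + 1
    votes = 1 + (1 if disk.grant_vote(node) else 0)
    if votes >= needed:
        return "forms cluster"
    return "waiting for operational quorum"
```

If node-a leaves first and node-b fences it with `disk.fence("node-a")`, then `boot("node-a", disk)` waits for operational quorum, while `boot("node-b", disk)` forms the cluster.
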
I realise there is a lot more to be said about this, and there are a
lot more scenarios when we add more nodes. However, it is the end of my
day, the weather is beautiful and warm (27 degrees C), and it is time to
take a nice walk with my dog Lukka, followed by a nice glass of cool
white wine...