Kristien's Weblog
Wednesday 25 May 2005
Sun Cluster 3.x Quorum algorithm
So let me try to explain the mechanism Sun Cluster uses to prevent both Amnesia and Split Brain. It is a majority algorithm: only a cluster node, or a subset of cluster nodes, that holds a majority of the possible votes can start up (in the case of amnesia) or continue (in the case of split brain) cluster operation. The other partitions must leave the cluster.
So let us first discuss the Split Brain scenario: the nodes can no longer communicate with each other over the private interconnect, but both nodes are otherwise fine. As discussed before, we must not allow both nodes to continue cluster operation, so one has to leave. Each node has one vote, but in a two-node cluster that alone would not work: there would be 2 possible votes, a majority of 2 is 2, and each isolated node holds only its own vote, so in case of a split brain nobody would continue cluster operation. That is why in a two-node cluster we assign a quorum device: a LUN in shared storage that also has a vote. Now there are 3 possible votes in the cluster, and a majority of 3 is 2 votes. Once a split brain occurs, both nodes race for the quorum device: the one that is fastest gets the device's vote and continues with 2 votes. The other one notices that it is too late and panics with a 'Lost Operational Quorum' message. The mechanism for reserving quorum devices is SCSI reservations, which we will discuss in 2 weeks.
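
To make the vote arithmetic concrete, here is a small Python sketch of the split-brain case described above. It is a toy model, not Sun Cluster code: the function names (`majority`, `resolve_split_brain`), the node names, and the `race_winner` parameter are all made up for illustration, and it assumes every isolated node sits alone in its own partition with one vote.

```python
def majority(possible_votes: int) -> int:
    """Smallest vote count that is strictly more than half of all possible votes."""
    return possible_votes // 2 + 1


def resolve_split_brain(nodes, quorum_device_votes, race_winner=None):
    """Decide which isolated nodes keep running after a split brain.

    nodes: names of the nodes, each alone in its partition, one vote each (assumption).
    quorum_device_votes: 0 if no quorum device is configured, otherwise its vote count.
    race_winner: hypothetical name of the node that reserved the quorum device first.
    """
    possible = len(nodes) + quorum_device_votes
    needed = majority(possible)
    outcome = {}
    for node in nodes:
        # A node always has its own vote; only the race winner adds the device's vote.
        votes = 1 + (quorum_device_votes if node == race_winner else 0)
        outcome[node] = "continues" if votes >= needed else "panics"
    return outcome


if __name__ == "__main__":
    # Two nodes, no quorum device: 2 possible votes, 2 needed, so nobody survives.
    print(resolve_split_brain(["node1", "node2"], quorum_device_votes=0))
    # Two nodes plus a quorum device: 3 possible votes, 2 needed, the race winner survives.
    print(resolve_split_brain(["node1", "node2"], quorum_device_votes=1, race_winner="node2"))
```

In the first call both nodes end up as "panics", which is exactly the problem the quorum device solves. In the second call only the hypothetical race winner `node2` reaches 2 of 3 votes and continues.
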
Now how can the quorum mechanism prevent amnesia? To prevent amnesia we must allow only the node that left the cluster last to start up the cluster. Same story: when a node leaves the cluster, the other node(s) make sure that it cannot acquire the quorum disk when it starts up. Only the last node to have left the cluster is able to do so. So when the first node to have left the cluster tries to start up, it has 1 vote of its own and knows that there are 3 possible votes in the cluster, but it cannot get the vote of the quorum device: it waits, with the message 'waiting for operational quorum', for the other node to form the cluster first. The last node that left the cluster starts up, gets the vote of the quorum disk, starts talking to the waiting node and passes the latest cluster database to it, so that the waiting node is up to date with everything that may have changed while it was down.
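
The start-up order described above can be sketched in Python as well. Again this is only a toy model under assumptions: the `Cluster` class, its `may_acquire` set (standing in for whatever the remaining nodes do to keep a departed node away from the quorum disk) and the node names are invented for illustration, and the hand-off of the cluster database is not modelled.

```python
def majority(possible_votes):
    """Smallest vote count that is strictly more than half of all possible votes."""
    return possible_votes // 2 + 1


class Cluster:
    """Toy cluster: one vote per node plus one quorum device vote (assumption)."""

    def __init__(self, nodes):
        self.possible_votes = len(nodes) + 1
        self.members = set(nodes)        # nodes currently in the cluster
        self.may_acquire = set(nodes)    # nodes the quorum device will still accept

    def leave(self, node):
        self.members.discard(node)
        if self.members:
            # The remaining members bar the departed node from the quorum device.
            self.may_acquire.discard(node)

    def boot(self, node):
        votes = 1 + len(self.members)    # own vote plus the votes of running members
        if self.members or node in self.may_acquire:
            votes += 1                   # device vote: held by the members, or acquired now
        if votes < majority(self.possible_votes):
            return "waiting for operational quorum"
        self.members.add(node)
        self.may_acquire.add(node)       # a current member may use the device again
        return "in cluster"


if __name__ == "__main__":
    cluster = Cluster(["node1", "node2"])
    cluster.leave("node1")               # node1 leaves first and is barred from the device
    cluster.leave("node2")               # node2 leaves last and keeps its access
    print("node1:", cluster.boot("node1"))
    print("node2:", cluster.boot("node2"))
    print("node1:", cluster.boot("node1"))
```

In this run `node1` first has to wait with only 1 of 3 votes, `node2` forms the cluster with its own vote plus the device vote, and the second attempt by `node1` succeeds because it can now reach a running member.
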

I realise there is a lot more to be said about this, and there are many more scenarios once we add more nodes. However, it is the end of my day, the weather is beautiful and warm (27 degrees C), and it is time to take a nice walk with my dog Lukka, followed by a nice glass of cool white wine...


25 May 2005, 18:14:58 MEST Permalink Comments [2]

Trackback URL: http://blogs.sun.com/kristien/entry/sun_cluster_3_x_quorum
Comments:

Kristien, I have found something fantastic. You should put the following code in your template; I put it somewhere almost at the very bottom and it worked. I don't know whether you already know this, but this code lets you see everyone who has linked to your blog. In other words, you see which unknown people or services "know" you without you having known anything about it until now. Here it is: Who Links Here. Have fun with it. I'll go and check again whether it still works.

Added by thomas on 31 May 2005 at 15:43 MEST #

Very nice :) I've been to Provence, and this reminded me of that great time.

Added by tony : frosty on 15 June 2005 at 19:33 MEST #
