Friday 24 June 2005
SCSI reservations in Sun Cluster 3.x
I promised some time ago to write something about the mechanisms that
Sun Cluster uses to prevent split brain and amnesia. As mentioned
before, in a two-node cluster a node can get the vote count from the
quorum device by 'reserving' the quorum device, or by making sure that
the other node cannot reserve it. We also discussed that reserving
quorum devices is not enough: you must also make sure that a node that
has to leave the cluster is fenced off from all shared disks. This is
called disk fencing. SCSI reservations are used for both the quorum
disk and all the other disks.
You have probably heard of SCSI-2 versus SCSI-3. When Sun Cluster 3.x
was designed, the engineers reckoned all disks would understand SCSI-3
by the time Sun Cluster was released, but unfortunately this turned out
not to be true. So they decided to have Sun Cluster use either SCSI-2
or SCSI-3. Big question: when does it use which? And why not use SCSI-2
all the time? Let's first try to answer the last question. A SCSI-2
reservation is exclusive: only one node can own the disk. All other
nodes are then unable to reserve the disk, and they will panic. That is
not so handy when you have a 4-node cluster and you want to kick out
only one node. SCSI-3 is a group reservation: every node has a key in a
dedicated area on the disk, and when a node has to leave, another node
simply removes that node's key.
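
To make the difference concrete, here is a small toy model in Python. It
is only a sketch of the two behaviours described above, not of how Sun
Cluster or any disk firmware implements them; the class and method names
(Scsi2Disk, Scsi3Disk, reserve, register, preempt) are made up for this
illustration.

```python
class Scsi2Disk:
    """Toy model: one exclusive owner, all-or-nothing."""

    def __init__(self):
        self.owner = None

    def reserve(self, node):
        # Taking the reservation shuts out every other node at once.
        self.owner = node

    def can_access(self, node):
        return self.owner is None or self.owner == node


class Scsi3Disk:
    """Toy model: every member registers a key; keys are removed one by one."""

    def __init__(self):
        self.keys = set()

    def register(self, node):
        self.keys.add(node)

    def preempt(self, by, victim):
        # Only a node that still has its own key may remove another key.
        if by not in self.keys:
            raise PermissionError(f"{by} has no key on this disk")
        self.keys.discard(victim)

    def can_access(self, node):
        return node in self.keys


nodes = ["node1", "node2", "node3", "node4"]

# SCSI-2: node1 reserves the disk, so node2, node3 and node4 all lose access.
scsi2 = Scsi2Disk()
scsi2.reserve("node1")
scsi2_survivors = [n for n in nodes if scsi2.can_access(n)]

# SCSI-3: all four nodes register, then node1 removes only node4's key.
scsi3 = Scsi3Disk()
for n in nodes:
    scsi3.register(n)
scsi3.preempt(by="node1", victim="node4")
scsi3_survivors = [n for n in nodes if scsi3.can_access(n)]

print(scsi2_survivors)
print(scsi3_survivors)
```

With the exclusive model only node1 keeps access, while with the group
model the three remaining members keep access and only node4 is shut
out.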
The next question, when Sun Cluster uses SCSI-2 and when SCSI-3, is an
easy one to answer, but there are lots of misunderstandings about it.
Sun Cluster does not 'test' whether the disk understands SCSI-2 or
SCSI-3. The reason is that we use a specific functionality of SCSI-3
called Persistent (Group) Reservation (PGR), which is optional in the
specs. So it is perfectly possible that a disk understands SCSI-3 but
does not have PGR functionality enabled. Instead, Sun Cluster decides
which mechanism to use based on the number of paths to the disk
cluster-wide. You can check this in the output of scdidadm -L.
An example in a 2-node cluster:
14 moon1:/dev/rdsk/c1t2d0 /dev/did/rdsk/d14
14 moon2:/dev/rdsk/c1t2d0 /dev/did/rdsk/d14
--> Here we see that there is one path from moon1 to
/dev/did/rdsk/d14 and one path from moon2. That makes two paths
cluster-wide --> hence SCSI-2 will be used.
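
The path-counting rule can be sketched in a few lines of Python. This is
only an illustration of the rule as described in this post, applied to
the two sample lines above; the function names count_paths and mechanism
are invented here, and the parser assumes the three-column layout shown
in the example (instance number, node:path, DID device).

```python
from collections import defaultdict

# The two sample lines from the scdidadm -L output shown above.
SAMPLE = """\
14 moon1:/dev/rdsk/c1t2d0 /dev/did/rdsk/d14
14 moon2:/dev/rdsk/c1t2d0 /dev/did/rdsk/d14
"""


def count_paths(scdidadm_output):
    """Return {did_device: number of distinct node paths}."""
    paths = defaultdict(set)
    for line in scdidadm_output.splitlines():
        fields = line.split()
        if len(fields) != 3 or ":" not in fields[1]:
            continue  # skip blank or unexpected lines
        _instance, node_path, did_device = fields
        paths[did_device].add(node_path)
    return {did: len(p) for did, p in paths.items()}


def mechanism(path_count):
    """Rule from this post: two paths -> SCSI-2, more than two -> SCSI-3."""
    if path_count < 2:
        return None  # not a shared disk; the post states no rule for it
    return "SCSI-3" if path_count > 2 else "SCSI-2"


for did, n in count_paths(SAMPLE).items():
    print(did, n, mechanism(n))
```

On a real cluster you would feed the full scdidadm -L output to
count_paths instead of the SAMPLE string.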
The next thing we need to do is discuss the difference between the
SCSI reservations used for the quorum device and the ones used for disk
fencing. There is no overlap: the disk fencing code issues SCSI
reservations on all shared disks except the quorum disk.
Let us first start with the SCSI mechanism used by disk fencing (i.e.
the protection of disks against 'rogue' nodes that have unexpectedly
left the cluster). As said, SCSI-2 is used when there are two paths to
the disk cluster-wide, as in a 2-node cluster, and SCSI-3 when there are
more than two paths to the disk cluster-wide. SCSI-3 is needed in that
case because of what we discussed before: we need more granularity than
the all-or-nothing 'kick everyone out' of SCSI-2. The SCSI-2
reservations used are the typical MHIOCTKOWN and MHIOCRELEASE ioctls.
For the quorum device it is not as straightforward. As said, the quorum
rule is used to protect against amnesia. This implies that any
reservation of the quorum device must be able to persist across reboots
of the storage. This is true for SCSI-3 (hence the Persistent in PGR)
but not for SCSI-2. Therefore, Sun invented a mechanism it has called
SCSI-2 PGRE (Persistent Group Reservation Emulation). This is an
emulation of the SCSI-3 mechanism using SCSI-2 ioctls: keys are put in
a designated area on the disk. These keys are able to survive a power
cycle of the disk subsystem. One additional remark: putting your key on
a disk, or kicking another node's key off the disk, has to be an atomic
operation, but the SCSI-2 emulation consists of many commands.
Therefore a traditional SCSI-2 MHIOCTKOWN is still used to ensure
atomicity.
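
The idea can be illustrated with another toy model in Python. It is a
sketch of the principle only: the real on-disk layout and command
sequence of PGRE are not described in this post, and everything here
(the EmulatedPgrDisk class, update_keys, the use of a Python set as the
key area) is an assumption made for illustration. take_ownership and
release are simplified stand-ins for MHIOCTKOWN and MHIOCRELEASE.

```python
class EmulatedPgrDisk:
    """Toy disk: a SCSI-2 style owner plus a persistent key area."""

    def __init__(self):
        self.owner = None      # the reservation, lost on a power cycle
        self.key_area = set()  # stands in for the designated on-disk area

    def take_ownership(self, node):
        # Simplified stand-in for MHIOCTKOWN: caller becomes sole owner.
        self.owner = node

    def release(self, node):
        # Simplified stand-in for MHIOCRELEASE.
        if self.owner == node:
            self.owner = None

    def _check(self, node):
        if self.owner != node:
            raise PermissionError(f"{node} does not own the disk")

    def read_keys(self, node):
        self._check(node)
        return set(self.key_area)

    def write_keys(self, node, keys):
        self._check(node)
        self.key_area = set(keys)

    def power_cycle(self):
        self.owner = None  # the reservation is gone, the keys are not


def update_keys(disk, node, add=(), remove=()):
    """Multi-step key update, made atomic by holding the disk exclusively."""
    disk.take_ownership(node)
    try:
        keys = disk.read_keys(node)   # step 1: read the key area
        keys |= set(add)              # step 2: modify
        keys -= set(remove)
        disk.write_keys(node, keys)   # step 3: write it back
    finally:
        disk.release(node)


disk = EmulatedPgrDisk()
update_keys(disk, "moon1", add=["moon1"])
update_keys(disk, "moon2", add=["moon2"])
update_keys(disk, "moon1", remove=["moon2"])  # moon1 kicks moon2's key off
disk.power_cycle()
print(sorted(disk.key_area))
```

After the simulated power cycle the ownership is gone, but moon1's key
is still in the key area, which is the property the quorum device needs.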
Oh: both SCSI-3 and SCSI-2 keys are invisible and are not placed in a
specific partition. SCSI-2 keys are in a designated area on the disk or
LUN, and the location of SCSI-3 keys is implementation-dependent. A
quorum disk can still be used to store whatever data you want. I will
show in a later post how you can see these mysterious keys.
Trackback URL: http://blogs.sun.com/kristien/entry/scsi_reservations_in_sun_cluster
Posted by Leon Koll on 17 December 2005 at 21:58 MET #
Posted by Prasad Joshi on 13 March 2006 at 08:16 MET #
Posted by ocnsss on 30 March 2007 at 05:35 MEST #
Thanks for this post, Kristien.
I was wondering on which part of the disk these reservation keys are stored. I have read in multiple documents that the PGRE keys are stored in the private cylinders of the disk. However, I couldn't find an answer yet on where exactly the SCSI-3 and SCSI-2 reservation keys are stored. (One is persistent by design and the other is not. Do the disks or storage controllers have registers for storing the keys?)
Posted by Abhilash V M on 11 January 2008 at 13:15 MET #