Kristien's Weblog
Monday, 26 September 2005
The DID database
A long time ago I blogged about the difference between /dev/did and /dev/global device names in Sun Cluster 3.x.
This time I'd like to discuss some of the files in the Cluster Configuration Repository. The Cluster Configuration Repository (CCR) is the cluster database that holds the information about the current cluster setup. Changes to this setup are saved across reboots in files in the directory /etc/cluster/ccr, which is replicated on each node.
One of the things kept in this database is the DID database. Just take a look at the file /etc/cluster/ccr/did_instances, and you will see it looks as follows:

ccr_gennum      3
ccr_checksum    282788695BAD93E939748ECE92B52B4B
19      disk|DEVID_SCSI_SERIAL|SEAGATE ST39102LCSUN9.0GLJW992510000U0010JDH|5345414741544520535433393130324c4353554e392e30474c4a5739393235313030303055303031304a4448|2:/dev/rdsk/c0t1d0
20      disk||||2:/dev/rdsk/c0t6d0
1       disk|DEVID_SCSI_SERIAL|SEAGATE ST39102LCSUN9.0GLJW8793900001001J327|5345414741544520535433393130324c4353554e392e30474c4a57383739333930303030313030314a333237|1:/dev/rdsk/c0t1d0
2       disk||||1:/dev/rdsk/c0t6d0
3       disk|DEVID_SCSI3_WWN| |200000203714ce27|2:/dev/rdsk/c1t21d0|1:/dev/rdsk/c1t21d0
4       disk|DEVID_SCSI3_WWN| |20000020370d3f7d|2:/dev/rdsk/c1t16d0|1:/dev/rdsk/c1t16d0
5       disk|DEVID_SCSI3_WWN| |20000020370d3f5f|2:/dev/rdsk/c1t0d0|1:/dev/rdsk/c1t0d0
6       disk|DEVID_SCSI3_WWN| |20000020370d3f03|2:/dev/rdsk/c1t3d0|1:/dev/rdsk/c1t3d0
7       disk|DEVID_SCSI3_WWN| |20000020370d3590|2:/dev/rdsk/c1t17d0|1:/dev/rdsk/c1t17d0
8       disk|DEVID_SCSI3_WWN| |200000203714ca15|2:/dev/rdsk/c1t22d0|1:/dev/rdsk/c1t22d0
9       disk|DEVID_SCSI3_WWN| |20000020370d3d6d|2:/dev/rdsk/c1t4d0|1:/dev/rdsk/c1t4d0
10      disk|DEVID_SCSI3_WWN| |20000020370a2b24|2:/dev/rdsk/c1t1d0|1:/dev/rdsk/c1t1d0
11      disk|DEVID_SCSI3_WWN| |20000020370dc6ac|2:/dev/rdsk/c1t19d0|1:/dev/rdsk/c1t19d0
12      disk|DEVID_SCSI3_WWN| |200000203714c427|2:/dev/rdsk/c1t20d0|1:/dev/rdsk/c1t20d0
13      disk|DEVID_SCSI3_WWN| |20000020370d10e2|2:/dev/rdsk/c1t5d0|1:/dev/rdsk/c1t5d0
14      disk|DEVID_SCSI3_WWN| |20000020370d4094|2:/dev/rdsk/c1t2d0|1:/dev/rdsk/c1t2d0
15      disk|DEVID_SCSI3_WWN| |20000020370d3ed9|2:/dev/rdsk/c1t6d0|1:/dev/rdsk/c1t6d0
16      disk|DEVID_SCSI3_WWN| |20000020370d4039|2:/dev/rdsk/c1t18d0|1:/dev/rdsk/c1t18d0
17      disk|DEVID_SCSI_SERIAL|IBM     DNES30917SUN9.0G1QK087          |49424d2020202020444e4553333039313753554e392e304731514b30383720202020202020202020|1:/dev/rdsk/c2t10d0
18      disk|DEVID_SCSI_SERIAL|IBM     DNES30917SUN9.0G1QM765          |49424d2020202020444e4553333039313753554e392e304731514d37363520202020202020202020|1:/dev/rdsk/c2t11d0
8191    tape||||1:/dev/rmt/0

All CCR files start with a gennum (generation number) on the first line and a checksum on the second line. Because these files are checksum protected, they should NOT be edited manually. Editing them is possible in some situations, but if that is required you will need to contact Sun for assistance.
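As a read-only illustration of that layout, here is a minimal Python sketch that separates the two header lines from the data lines. The function name split_ccr_file is hypothetical, and the assumption that the header is exactly a ccr_gennum line followed by a ccr_checksum line, each separated from its value by whitespace, is taken only from the listing above. The sketch does not verify the checksum and never writes to the file.

```python
def split_ccr_file(text):
    """Split the text of a CCR file into (gennum, checksum, data_lines).

    Assumption: the first two non-empty lines are 'ccr_gennum <n>' and
    'ccr_checksum <hex>', as in the did_instances listing.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    key1, gennum = lines[0].split(None, 1)
    key2, checksum = lines[1].split(None, 1)
    if key1 != "ccr_gennum" or key2 != "ccr_checksum":
        raise ValueError("unexpected CCR header")
    return int(gennum), checksum.strip(), lines[2:]


# Header and last entry copied from the listing above.
sample = """ccr_gennum      3
ccr_checksum    282788695BAD93E939748ECE92B52B4B
8191    tape||||1:/dev/rmt/0
"""
gennum, checksum, entries = split_ccr_file(sample)
print(gennum, checksum, len(entries))
```

On a real node you would pass in the contents of a copy of /etc/cluster/ccr/did_instances rather than the sample string.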
Let us look at one of these lines:

13      disk|DEVID_SCSI3_WWN| |20000020370d10e2|2:/dev/rdsk/c1t5d0|1:/dev/rdsk/c1t5d0

This is the entry for DID 13, which we can see in the output of scdidadm -L as follows:

13       moon1:/dev/rdsk/c1t5d0         /dev/did/rdsk/d13
13       moon2:/dev/rdsk/c1t5d0         /dev/did/rdsk/d13

The first field, 13, is the DID instance number.
The second field is 'disk'. This is the type of DID device. These types are defined in the file /etc/cluster/ccr/did_types; at the moment the possible values are 'disk' and 'tape'.
The third field is 'DEVID_SCSI3_WWN'. This defines the type of device ID that this device provides. Each DID device is normally identified by a unique ID, such as a serial number or a WWN.
The fourth field is blank for this disk. For the DEVID_SCSI_SERIAL entries in the listing above it appears to hold the readable text of the device ID.
The actual device ID is in the fifth field, in this case 20000020370d10e2. Because this ID identifies the disk uniquely in the DID database, we cannot simply replace the disk with another one without telling the cluster about it.
The sixth field is 2:/dev/rdsk/c1t5d0. This means that on the node with nodeid 2, this disk is referred to as 'c1t5d0'.
The seventh field is 1:/dev/rdsk/c1t5d0. This means that on the node with nodeid 1, this disk is also referred to as 'c1t5d0'. Please be aware that these names may differ between nodes. The DID layer uses the device ID to make sure we are talking about the same disk, even if it has different Solaris names on the different nodes.
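To make the field layout concrete, here is a small Python sketch that splits one did_instances line into its parts. It is for reading a copy of the file only, never for editing it. The names DidEntry and parse_did_line are hypothetical, and the layout it assumes (instance number, whitespace, then '|'-separated fields: device type, ID type, a text field that is blank for WWN entries, the hex device ID, and one nodeid:path pair per node) is inferred purely from the listing above.

```python
from typing import Dict, NamedTuple


class DidEntry(NamedTuple):
    instance: int          # DID instance number, e.g. 13 for d13
    dev_type: str          # 'disk' or 'tape'
    id_type: str           # e.g. 'DEVID_SCSI3_WWN' (may be empty)
    id_text: str           # readable ID text (blank for WWN entries)
    device_id: str         # hex device ID (may be empty)
    paths: Dict[int, str]  # nodeid -> Solaris device path on that node


def parse_did_line(line):
    """Parse one data line of did_instances (assumed layout, see above)."""
    instance, rest = line.rstrip("\n").split(None, 1)
    fields = rest.split("|")
    dev_type, id_type, id_text, device_id = fields[:4]
    paths = {}
    for item in fields[4:]:
        nodeid, path = item.split(":", 1)
        paths[int(nodeid)] = path
    return DidEntry(int(instance), dev_type, id_type, id_text, device_id, paths)


entry = parse_did_line(
    "13      disk|DEVID_SCSI3_WWN| |20000020370d10e2"
    "|2:/dev/rdsk/c1t5d0|1:/dev/rdsk/c1t5d0"
)
print(entry.instance, entry.device_id, entry.paths)
```

Keeping the paths in a dictionary keyed by nodeid makes it easy to spot a disk whose Solaris name differs from one node to the other.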

If you are seeing error messages about DID devices, it may be that you have replaced a disk without following the official procedure for replacing a disk in the cluster. Let us say you replaced the disk on c1t5d0 with another one. You must now tell the cluster to update the DID database entry for DID number 13 with the new device ID. You can do that as follows:
# scdidadm -R c1t5d0
OR:
# scdidadm -R 13

Sometimes the DID configuration is completely messed up, for example because you have been switching cables or controllers without following the correct procedure. To check this, compare the entries in the did_instances file with, for example, the 'diskinfo' output in the explorer. To fix this, contact your Sun Resolution Center.



 



26 Sep 2005, 11:56:46 MEST Permalink Comments [3]

Trackback URL: http://blogs.sun.com/kristien/entry/the_did_database
Comments:

Kristien, thank you for very informative and interesting posts on Sun Cluster issues. Could you help me to find information on NFS failover configuration of SC? Thanks again, -- Leon Koll leonkoll@lk.net

Posted by Leon Koll on 06 November 2005 at 00:14 MET #

Hi, Thanks for your comment! You can find the documentation on how to setup HANFS here: http://docs.sun.com/app/docs/doc/817-4646 Did you know there is also a discussion forum on Sun Cluster? You can find it here: http://forum.sun.com/forum.jspa?forumID=1 Kristien

Posted by kristien on 07 November 2005 at 09:34 MET #

Thanks a lot.

Posted by Leon Koll on 10 December 2005 at 21:10 MET #
