A long time ago I blogged about
the difference between /dev/did and /dev/global device names in Sun Cluster 3.x.
This time I'd like to discuss some of the files in the Cluster Configuration
Repository. The Cluster Configuration Repository (CCR) is the cluster
database that holds the information about the current cluster setup.
Changes to this setup are saved across reboots in files in the
directory /etc/cluster/ccr, which is replicated on each node.
One of the things kept in this database is the DID database. Just take
a look at the file /etc/cluster/ccr/did_instances, and you will see it
looks as follows:
ccr_gennum 3
ccr_checksum 282788695BAD93E939748ECE92B52B4B
19 disk|DEVID_SCSI_SERIAL|SEAGATE ST39102LCSUN9.0GLJW992510000U0010JDH|5345414741544520535433393130324c4353554e392e30474c4a5739393235313030303055303031304a4448|2:/dev/rdsk/c0t1d0
20 disk||||2:/dev/rdsk/c0t6d0
1 disk|DEVID_SCSI_SERIAL|SEAGATE ST39102LCSUN9.0GLJW8793900001001J327|5345414741544520535433393130324c4353554e392e30474c4a57383739333930303030313030314a333237|1:/dev/rdsk/c0t1d0
2 disk||||1:/dev/rdsk/c0t6d0
3 disk|DEVID_SCSI3_WWN| |200000203714ce27|2:/dev/rdsk/c1t21d0|1:/dev/rdsk/c1t21d0
4 disk|DEVID_SCSI3_WWN| |20000020370d3f7d|2:/dev/rdsk/c1t16d0|1:/dev/rdsk/c1t16d0
5 disk|DEVID_SCSI3_WWN| |20000020370d3f5f|2:/dev/rdsk/c1t0d0|1:/dev/rdsk/c1t0d0
6 disk|DEVID_SCSI3_WWN| |20000020370d3f03|2:/dev/rdsk/c1t3d0|1:/dev/rdsk/c1t3d0
7 disk|DEVID_SCSI3_WWN| |20000020370d3590|2:/dev/rdsk/c1t17d0|1:/dev/rdsk/c1t17d0
8 disk|DEVID_SCSI3_WWN| |200000203714ca15|2:/dev/rdsk/c1t22d0|1:/dev/rdsk/c1t22d0
9 disk|DEVID_SCSI3_WWN| |20000020370d3d6d|2:/dev/rdsk/c1t4d0|1:/dev/rdsk/c1t4d0
10 disk|DEVID_SCSI3_WWN| |20000020370a2b24|2:/dev/rdsk/c1t1d0|1:/dev/rdsk/c1t1d0
11 disk|DEVID_SCSI3_WWN| |20000020370dc6ac|2:/dev/rdsk/c1t19d0|1:/dev/rdsk/c1t19d0
12 disk|DEVID_SCSI3_WWN| |200000203714c427|2:/dev/rdsk/c1t20d0|1:/dev/rdsk/c1t20d0
13 disk|DEVID_SCSI3_WWN| |20000020370d10e2|2:/dev/rdsk/c1t5d0|1:/dev/rdsk/c1t5d0
14 disk|DEVID_SCSI3_WWN| |20000020370d4094|2:/dev/rdsk/c1t2d0|1:/dev/rdsk/c1t2d0
15 disk|DEVID_SCSI3_WWN| |20000020370d3ed9|2:/dev/rdsk/c1t6d0|1:/dev/rdsk/c1t6d0
16 disk|DEVID_SCSI3_WWN| |20000020370d4039|2:/dev/rdsk/c1t18d0|1:/dev/rdsk/c1t18d0
17 disk|DEVID_SCSI_SERIAL|IBM DNES30917SUN9.0G1QK087 |49424d2020202020444e4553333039313753554e392e304731514b30383720202020202020202020|1:/dev/rdsk/c2t10d0
18 disk|DEVID_SCSI_SERIAL|IBM DNES30917SUN9.0G1QM765 |49424d2020202020444e4553333039313753554e392e304731514d37363520202020202020202020|1:/dev/rdsk/c2t11d0
8191 tape||||1:/dev/rmt/0
All CCR files start with a gennum (generation number) on the first line
and a checksum on the second. These files are checksum-protected and
should NOT be edited manually. Editing them IS possible on some occasions,
but if that is required you will need to contact Sun for assistance.
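To make the header layout concrete, here is a minimal read-only sketch in Python that separates the two header lines from the records. The function name split_ccr is made up for this post, and the layout is inferred only from the sample file above. It does not verify the checksum (how the checksum is computed is not covered here), and it should only ever be pointed at a copy of the file, never used to write one back.

```python
def split_ccr(text):
    """Split the text of a CCR file into (gennum, checksum, record_lines).

    Layout assumed from the sample above: a 'ccr_gennum N' line, a
    'ccr_checksum HEX' line, then one record per line.
    """
    gennum = None
    checksum = None
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        key, _, value = line.partition(" ")
        if key == "ccr_gennum":
            gennum = int(value)
        elif key == "ccr_checksum":
            checksum = value.strip()
        else:
            records.append(line)
    return gennum, checksum, records


# A shortened copy of the listing above, used purely as sample input.
sample = """ccr_gennum 3
ccr_checksum 282788695BAD93E939748ECE92B52B4B
2 disk||||1:/dev/rdsk/c0t6d0
8191 tape||||1:/dev/rmt/0
"""

gennum, checksum, records = split_ccr(sample)
print("generation:", gennum)
print("checksum:  ", checksum)
print("records:   ", len(records))
```

Comparing the generation number and checksum of the same CCR file copied from each node is one quick way to see whether the nodes hold the same version of it.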
Let us look at one of these lines:
13 disk|DEVID_SCSI3_WWN| |20000020370d10e2|2:/dev/rdsk/c1t5d0|1:/dev/rdsk/c1t5d0
This is the entry for DID 13, which we can see in the output of scdidadm -L as follows:
13 moon1:/dev/rdsk/c1t5d0 /dev/did/rdsk/d13
13 moon2:/dev/rdsk/c1t5d0 /dev/did/rdsk/d13
The second field is 'disk'. This is the type of DID device. These types
are defined in the file /etc/cluster/ccr/did_types and right now you
have 'disk' and 'tape'.
The third field is 'DEVID_SCSI3_WWN'. This defines the type of device
ID that this device provides. Each DID device is normally identified by
a unique ID, such as a serial number or a WWN.
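The actual device ID is in the fourth field, in this case:
20000020370d10e2. (This count skips the blank field just before it. In
the DEVID_SCSI_SERIAL entries that field holds the printable form of the
ID, and the hex string after it is its ASCII encoding.) This also means
that this disk is uniquely identified in the DID database and we cannot
just replace it with another disk without telling the cluster about it.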
The fifth field is 2:/dev/rdsk/c1t5d0. This means that on the node with nodeid 2, this disk is referred to as 'c1t5d0'.
The sixth field is 1:/dev/rdsk/c1t5d0. This means that on the node with
nodeid 1, this disk is referred to as 'c1t5d0'. Please be aware that
these names may differ on different nodes. The DID layer uses the
device ID to make sure we are talking about the same disk, even if it
has different Solaris names on the different nodes.
If you are seeing error messages about DID devices, it may be that you
have replaced a disk without following the official procedure for
replacing a disk in the cluster. Let us say you replaced the disk at
c1t5d0 with another one. You must now tell the cluster to update the
DID database entry for DID number 13 with the new device ID. You
can do that as follows:
# scdidadm -R c1t5d0
OR:
# scdidadm -R 13
Sometimes the DID configuration gets completely messed up, for example
because you have been switching cables/controllers without following
the correct procedure. To check this, compare the entries in the
did_instances file against, for example, the 'diskinfo' output in the
Explorer output. To fix it, contact your Sun Resolution Center.