There is plenty of documentation on how to change a disk in Sun Cluster,
including everything that you need to do on the volume manager layer etc.
Just check out the cluster collection on http://docs.sun.com
I just want to point out some of the common mistakes.
Especially, what NOT to do when replacing a disk in Sun Cluster.
First of all, if you are replacing a disk in a Hardware RAID box there
is **absolutely nothing** you need to do on the cluster or OS layer.
Just follow the instructions of the box. No need to run any commands, as
the LUNs that the OS (and hence the cluster) sees do not change.
Do **not** start to run scdidadm -C, scdidadm -r etc. I repeat: Do NOT
start to run scdidadm -C or scdidadm -r. It is completely useless.
If you are replacing a physical disk or an entire LUN, and the
WWN/DiskID of the disk changes and hence the way the OS sees the disk
changes, you have to make Sun Cluster aware of that, so that it can
update its DID database.
But again DO NOT run scdidadm -C!
I have seen over the last few months quite a few escalations where
someone just happily ran scdidadm -C as part of a disk replacement
procedure. And it screwed up the DID database. No problem for me as it
is always fun to fix but not fun for the owners of that cluster as in
many cases this means downtime.
Now what is the so-feared scdidadm -C for? It 'clears out' the DID
database. Which means, if you permanently (and here I say
**permanently**, i.e. not as part of a disk replacement procedure) remove a
disk you may run it to free up the DID it was using. But this is a very
unlikely situation. Let me sketch one thing that can go wrong when you
run it out of the blue because you think that is the right thing to do.
Let's say you had a disk represented by the DID number d12. There are
some problems on the fabric and for some reason the disk is temporarily
unavailable. You see errors and you think 'oh let's run scdidadm -C,
that'll fix it'. The command will not find the disk associated with d12
and will free up that DID number. Next time you reboot or run scdidadm -r or
whatever, the disk is back but may be associated with a different DID
number. Which, of course, can cause many problems.
So: scdidadm -C: Don't do it. Unless you really have to. But not as part of a disk replacement procedure.
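If you want to check whether a disk has ended up on a different DID number, one option is to compare two saved listings of scdidadm -L. Here is a rough sketch of that. The function name did_changes is made up for illustration, and the script assumes the usual three-column output (instance number, node:/dev/rdsk path, /dev/did/rdsk name).

```shell
#!/bin/sh
# Sketch only: did_changes is a hypothetical helper, not part of Sun Cluster.
# It compares two files that each hold saved output of "scdidadm -L".
# Assumed line format:  <instance>  <node>:/dev/rdsk/<disk>  /dev/did/rdsk/d<N>
# Note: on Solaris the default awk may be too old for this; nawk or
# /usr/xpg4/bin/awk may be needed instead.

# Usage: did_changes BEFORE_FILE AFTER_FILE
# Prints one line per device path whose DID name differs between the files.
did_changes() {
    awk 'NR == FNR { before[$2] = $3; next }      # first file: remember path -> DID
         ($2 in before) && before[$2] != $3 {      # second file: report differences
             print $2 ": " before[$2] " -> " $3
         }' "$1" "$2"
}
```

For example, save scdidadm -L > /var/tmp/did.before while the cluster is healthy, save a second listing later, and run did_changes /var/tmp/did.before /var/tmp/did.after. No output means no path has moved to a different DID.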
The thing you need to do if you replace a physical disk is write
down the DID with which it is associated. When you insert the new
replacement disk you type
scdidadm -R d#, where d# is the DID number. This will update the DID
database with the information (i.e. DiskID) of the new disk and everybody
will live happily ever after.
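The two steps can be wrapped in a couple of small shell functions. This is only a sketch: show_did, update_did and the SCDIDADM variable are names made up here, and the d# argument form is taken from the text above. Note there is deliberately no scdidadm -C anywhere.

```shell
#!/bin/sh
# Sketch only: show_did, update_did and SCDIDADM are hypothetical names.
# SCDIDADM lets you dry-run the functions on a machine without
# Sun Cluster, e.g. by setting SCDIDADM=echo.
SCDIDADM="${SCDIDADM:-scdidadm}"

# Step 1 (before pulling the disk): list the DID mapping for a device
# so you can write the DID number down.
# Usage: show_did c1t3d0
show_did() {
    "$SCDIDADM" -L | grep -- "$1"
}

# Step 2 (after inserting the replacement disk): update the DID
# database entry for that DID.
# Usage: update_did d12
update_did() {
    case "$1" in
        d[0-9]*) "$SCDIDADM" -R "$1" ;;
        *) echo "expected a DID name such as d12, got '$1'" >&2
           return 1 ;;
    esac
}
```

Typical use would be show_did c1t3d0 before the swap, then update_did d12 once the new disk is in.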