Kristien's Weblog
Friday, 05 May 2006
Replacing a disk in Sun Cluster
There is plenty of documentation on how to replace a disk in Sun Cluster, including everything you need to do at the volume manager layer. Just check out the cluster collection on http://docs.sun.com

I just want to point out some of the common mistakes.
Especially, what NOT to do when replacing a disk in Sun Cluster.

First of all, if you are replacing a disk inside a hardware RAID box, there is **absolutely nothing** you need to do on the cluster or OS layer. Just follow the instructions for the box. There is no need to run any commands, as the LUNs that the OS (and hence the cluster) sees do not change.

Do **not** start running scdidadm -C, scdidadm -r etc. I repeat: do NOT start running scdidadm -C or scdidadm -r. It is completely useless here.

If you are replacing a physical disk or an entire LUN, and the WWN/DiskID of the disk changes and hence the way the OS sees the disk changes, you have to make Sun Cluster aware of that, so that it can update its DID database.

But again, DO NOT run scdidadm -C!
Over the last few months I have seen quite a few escalations where someone happily ran scdidadm -C as part of a disk replacement procedure, and it screwed up the DID database. That is no problem for me, as it is always fun to fix, but it is no fun for the owners of the cluster, because in many cases it means downtime.
Now what is the much-feared scdidadm -C for? It 'clears out' the DID database: it frees the DID numbers of disks it can no longer find. So if you remove a disk permanently (and I do mean **permanently**, i.e. not as part of a disk replacement procedure), you may run it to free up the DID that disk was using. But this is a very unlikely situation. Let me sketch one thing that can go wrong when you run it out of the blue because you think it is the right thing to do. Say you have a disk represented by DID number d12. There are some problems on the fabric and the disk is temporarily unavailable. You see errors and think 'oh, let's run scdidadm -C, that'll fix it'. The command will not find the disk associated with d12 and will free up that DID number. The next time you reboot, or run scdidadm -r or whatever, the disk is back but may be associated with a different DID number. Which, of course, can cause many problems.
So: scdidadm -C: don't do it. Unless you really have to. But not as part of a disk replacement procedure.
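If you want a belt-and-braces guard against someone typing it by reflex, something along these lines might help. This is only a sketch: the function name safe_scdidadm and the ALLOW_DID_CLEAR variable are made up for illustration and are not part of Sun Cluster, and it only catches -C given as a separate argument.

```shell
# Hypothetical wrapper, not a Sun Cluster tool: it refuses to pass -C
# through to scdidadm unless ALLOW_DID_CLEAR=1 is set explicitly.
safe_scdidadm() {
    for arg in "$@"; do
        case "$arg" in
            -C)
                if [ "${ALLOW_DID_CLEAR:-0}" != "1" ]; then
                    echo "refusing scdidadm -C: only for permanently removed disks" >&2
                    return 1
                fi
                ;;
        esac
    done
    # Every other invocation is handed to the real command unchanged.
    scdidadm "$@"
}
```

With this in a shell profile, `safe_scdidadm -C` fails with a message, while `ALLOW_DID_CLEAR=1 safe_scdidadm -C` goes through for the rare case where a disk really is gone for good.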
What you need to do when you replace a physical disk is write down the DID it is associated with. After you insert the replacement disk, you type
scdidadm -R d# where d# is the DID number. This updates the DID database with the information (i.e. the disk ID) of the new disk, and everybody lives happily ever after.
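Put together, the replacement steps could look roughly like the sketch below. The DID number d12 is just the example from earlier, and the run helper is a made-up dry-run wrapper so that nothing touches a real cluster unless RUN=1 is set. Check the scdidadm man page for your release before relying on the exact argument form.

```shell
#!/bin/sh
# Sketch of the disk replacement steps. DID=d12 is an example value;
# substitute the DID you wrote down for your own disk.
DID="d12"

# Hypothetical dry-run helper: prints the command by default,
# executes it only when RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# 1. Before pulling the disk, list the DID mappings and note the DID
#    that belongs to the disk you are about to replace.
run scdidadm -L

# 2. Physically replace the disk, following the hardware and
#    volume manager documentation.

# 3. Update the DID database entry for that DID with the new disk's ID.
run scdidadm -R "$DID"
```

Note that there is no scdidadm -C anywhere in there.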







05 May 2006, 15:59:27 MEST

Comments:

Thanks for the tip!

Posted by Leon Koll on 07 May 2006 at 23:14 MEST
