Kristien's Weblog
Wednesday 06 July 2005
Show me the Key!
As promised in a previous blog entry, I will now show you the commands to look at SCSI-2 PGRE or SCSI-3 PGR keys.
As discussed, such keys are used on the quorum disk because they are persistent, and persistence is what you need if you want to avoid amnesia. We also discussed that SCSI-2 PGRE keys are an emulation of SCSI-3 keys. They were invented by Sun Cluster engineering, whereas SCSI-3 PGRs are part of the SCSI-3 specification.

The first cluster is a 2-node cluster with 2 paths cluster-wide to the quorum device:

# scdidadm -L d4
4        node1:/dev/rdsk/c2t0d0     /dev/did/rdsk/d4    
4        node2:/dev/rdsk/c2t0d0     /dev/did/rdsk/d4 

So the keys used will be PGRE keys. The command to use is, guess what, pgre:
# /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
key[0]=0x42b2c3e500000001.
key[1]=0x42b2c3e500000002.

The second cluster is a 3 node cluster. It has more than 2 paths to the quorum disk:
# scdidadm -L d5
5        node1:/dev/rdsk/c3t50020F2300002A89d0 /dev/did/rdsk/d5    
5        node2:/dev/rdsk/c3t50020F2300002A89d0 /dev/did/rdsk/d5    
5        node3:/dev/rdsk/c3t50020F2300002A89d0 /dev/did/rdsk/d5 

So the keys used will be SCSI-3 PGR keys. The command to use is scsi:
# /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d5s2
Reservation keys(3):
0x4225e25100000003
0x4225e25100000001
0x4225e25100000002

Both the pgre and scsi commands have options to scrub the keys off the disk. Please don't ever do that unless instructed to by authorised Sun personnel! Sometimes this is mistakenly done to solve amnesia, but unfortunately it only makes things worse: the cure for amnesia is to get that node's key onto the disk, not to remove all the other nodes' keys!
One case where scrubbing the keys is useful is when you are getting reservation conflict panics because a disk was previously used in another cluster and still has that other cluster's keys. But again, this has to be diagnosed first before any action is taken.



06 Jul 2005, 15:39:23 MEST Permalink Comments [0]

Tuesday 05 July 2005
Fell in love this weekend
This weekend, while my David was away on a mountainbiking weekend, I fell in love with another man.
He is a slightly older guy, and has fathered 17 children who are currently in foster care. He himself is in prison but was allowed out for the Sunday afternoon. We went for a walk with my dog Lukka. He was shy at first, but when we sat down in the grass at the end of the walk he gave me a kiss on my nose. We went to my brother's, and both my brother and my little niece immediately liked him. I took him home where, after some minutes of restlessness, he sat down quietly beside me. Too soon he had to return to where he came from, and it was with a broken heart that I dropped him off. He didn't want to leave me, so the guards had to push him back to his cage. Next week I shall visit him again, together with David, so that he also gets a chance to meet this wonderful guy and to consider allowing a second man to live in our house.
His name is Madde. You can see his picture here.

05 Jul 2005, 09:52:53 MEST Permalink Comments [3]

Sunday 03 July 2005
Blake's 7 Season 3
The BBC has just released Season 3 of Blake's 7 on DVD. For those of you who do not know it, this was an SF series from the 70s (aired in Belgium at the beginning of the 80s). Far less posh than Star Trek, but with good dialogue and a less straightforward goodies vs baddies setup. Blake, for example, is a freedom fighter who is willing to risk his life and that of his crew for his cause. His crew are criminals: hackers, smugglers and petty thieves.

Here is my first impression of Season 3:

-Blake and Jenna are no longer part of the crew. Dayna and Tarrant have joined. Also, the Federation is reduced to a mere shadow of what it was before, which means that the original idea (Blake vs the Federation) is far less prevalent. However, this also means that the current crew is far more relaxed: they play a future version of Monopoly and eat snacks.
-This season is far more sexy than Season 1 and 2:
*Avon kisses Servalan in episode 1
*Dayna kisses Avon in episode 1
*Servalan kisses a guy called Jarvik in episode 5. She is impressed by his virility and ancient Greek values (I guess)
* Vila kisses a girl (and it is hinted that they even make love but this is not shown) in episode 6.
-So far the individual stories were very good. Better than in Season 2.

Let us hope the BBC, which is doing a great job releasing its old series, releases the final Season 4 on DVD in the near future.

Oh, please also go and sign this petition, to have the entire Twin Peaks series released on DVD at last!



03 Jul 2005, 08:49:27 MEST Permalink Comments [0]

Friday 01 July 2005
What to do when your cluster doesn't start up your application
Don't panic! The first thing you need to find out is whether the application itself may be at fault.

Here is a scenario:

Let us say you have a resource group called myapp-rg. Inside myapp-rg you have three resources:
A LogicalHostname resource named loghost-rs
A HAStoragePlus resource named haplus-rs
A home-brewed application resource named myapp-rs

One day you stop the resource group and try to start it up again. However, the application resource does not come online and you see a message like this, or something similar:
resource myapp-rs status on node1 change to R_FM_FAULTED
Your first thought may be that there is something wrong with the cluster. My experience, however, is that in 99% of cases it is the application itself that is not able to start up in a timely manner.
Here is how you can check this:

1) Switch the resource group offline:

# scswitch -F -g myapp-rg

2) Disable the application resource:

# scswitch -n -j myapp-rs

3) Start up the resource group. Use scswitch -z:

# scswitch -z -h node1 -g myapp-rg

After this has been done, use the start script of myapp-rs (you can find it by grepping for START_COMMAND in the output of scrgadm -pvv) to launch the application manually.
If the application fails to come online, you know that it is the application which is at fault, and that it should be fixed by checking the application logs and contacting the appropriate vendor.
If the application comes online, but it takes longer than the START_TIMEOUT value of the application resource (again, find this by grepping for it in the output of scrgadm -pvv), you should increase that value:

# scrgadm -c -j myapp-rs -y START_TIMEOUT=<appropriate value>
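
If you want to put a number on "takes longer than START_TIMEOUT", a small wrapper like the one below can time the manual start for you. This is only a sketch: timed_start is a name I made up, the timeout you pass in is whatever you read from scrgadm -pvv, and it assumes a date command that understands +%s (older Solaris date may not, so check first).

```shell
#!/bin/sh
# timed_start TIMEOUT CMD...: hypothetical helper, not part of Sun Cluster.
# Runs CMD, measures wall-clock seconds and compares them with TIMEOUT,
# which stands in for the resource's START_TIMEOUT value.
timed_start() {
    timeout=$1; shift
    begin=$(date +%s)              # assumption: date supports +%s
    "$@"
    rc=$?
    elapsed=$(( $(date +%s) - begin ))
    if [ "$rc" -ne 0 ]; then
        echo "start failed (exit $rc): check the application"
    elif [ "$elapsed" -gt "$timeout" ]; then
        echo "started in ${elapsed}s, over the ${timeout}s limit: raise START_TIMEOUT"
    else
        echo "started in ${elapsed}s, within the ${timeout}s limit"
    fi
}

# Stand-in commands so the sketch runs anywhere; on a real node you would
# pass the START_COMMAND of myapp-rs instead of true/false.
timed_start 5 true
timed_start 5 false
```

Run it with the resource disabled, as in the steps above, so the cluster is not trying to start the application at the same time.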



01 Jul 2005, 14:20:27 MEST Permalink Comments [0]

Tuesday 28 June 2005
And here is another one
For all you people working from home and desperately needing a break...

Go HERE, push PLAY and sing along:

(DISCLAIMER: For those who wonder, this is a link to the official INXS site so it is legal)

All veils and misty
Streets of blue
Almond looks
That chill divine
Some silken moment
Goes on forever
And we're leaving broken hearts behind

Mystify
Mystify me
Mystify
Mystify me

I need perfection
Some twisted selection
That tangles me
To keep me alive

In all that exists
None have your beauty
I see your face
I will survive

Mystify
Mystify me
Mystify
Mystify me

Eternally wild with the power
To make every moment come alive
All those stars that shine upon you
Will kiss you every night

All veils and misty
Streets of blue
Almond looks
That chill divine
Some silken moment
Goes on forever
And we're leaving
Yeah we're leaving broken hearts behind

Mystify
Mystify me
Mystify
Mystify me

You're eternally wild with the power
To make every moment come alive
All those stars that shine upon you
And they'll kiss you every night

Mystify
Mystify me
Mystify
Mystify me
Mystify
Mystify me
Mystify


28 Jun 2005, 11:38:25 MEST Permalink Comments [0]

An honest mistake


One of the nice things about working from home, plus the existence of the Internet, is that when there is a nice song on the radio you can look up the lyrics and sing along out loud!!

So Push Here and Here We Go:

THE BRAVERY LYRICS

An Honest Mistake


People
They don't mean a thing to you
They move right through you
Just like your breath
But sometimes
I still think of you
And I just wanted to
Just wanted you to know
My old friend...
I swear I never meant for this
I never meant...

Don't look at me that way
It was an honest mistake
Don't look at me that way
It was an honest mistake
An honest mistake

Sometimes
I forget I'm still awake
I fuck up and say these things out loud

My old friend...
I swear I never meant for this
I never meant...

Don't look at me that way
It was an honest mistake
Don't look at me that way
It was an honest mistake
An honest mistake

Don't look at me that way
It was an honest mistake
Don't look at me that way
It was an honest mistake
An honest mistake

28 Jun 2005, 11:19:49 MEST Permalink Comments [1]

Blogging Belgians
Found that my colleague from Belgium, David Delabassee, is blogging too. He writes interesting stuff about Java... I am still looking for other blogging Belgians to link to...

Oh, best song that is currently around is playing right NOW: THE BRAVERY: "Honest mistake". The Eighties revival is the best trend in music of the last couple of years...


28 Jun 2005, 11:04:58 MEST Permalink Comments [4]

Friday 24 June 2005
Interesting SVM Blog!
Found Sanjay Nadkarni's weblog, which discusses some interesting implementation details of Solaris Volume Manager, which as you know is the absolute best Volume Manager to walk the face of this earth...
Keep it coming Sanjay!


24 Jun 2005, 17:06:08 MEST Permalink Comments [0]

SCSI reservations in Sun Cluster 3.x
I promised some time ago to write something about the mechanisms that Sun Cluster uses to prevent split brain and amnesia. As said, in a two node cluster, a node can get the vote count from the quorum device by 'reserving' the quorum device or making sure that the other node cannot reserve it. We also discussed that reserving quorum devices is not enough: you should also make sure that all disks are fenced out from a node that has to leave the cluster. This is called disk fencing.  SCSI reservations are used for both the quorum disk and all the other disks.

You have probably heard of SCSI-2 versus SCSI-3. When Sun Cluster 3.x was designed, the engineers reckoned all disks would understand SCSI-3 by the time Sun Cluster was released, but unfortunately this turned out not to be true. So they decided to have Sun Cluster use either SCSI-2 or SCSI-3. Big question: when does it use which? And why not use SCSI-2 all the time? Let's first try to answer the last question: SCSI-2 is an exclusive reservation, which means that only one node can own the disk. The other nodes will not be able to reserve the disk, and they will panic. Not so handy when you have a 4 node cluster and you want to kick out only one node. SCSI-3 is a group reservation: every node has a key in a dedicated area on the disk, and when a node has to leave, another node just removes its key.

The next question, when Sun Cluster uses SCSI-2 and when SCSI-3, is an easy one to answer, but there are lots of misunderstandings about it. Sun Cluster will not 'test' whether the disk understands SCSI-2 or SCSI-3. The reason is that we use a specific SCSI-3 feature called Persistent (Group) Reservation (PGR), which is optional in the specs. So it is perfectly possible that a disk understands SCSI-3 but does not have PGR functionality enabled. Instead, Sun Cluster decides which mechanism to use based on the number of paths to the disk cluster-wide. You can check this in the output of scdidadm -L.
An example in a 2-node cluster:

14       moon1:/dev/rdsk/c1t2d0         /dev/did/rdsk/d14
14       moon2:/dev/rdsk/c1t2d0         /dev/did/rdsk/d14

-->  Here we see that there is one path from moon1 to /dev/did/rdsk/d14 and one path from moon2, so 2 paths cluster-wide --> hence SCSI-2 will be used.
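
If you do not feel like counting lines by eye, the rule can be sketched in a few lines of shell. This is just an illustration of the path-count rule described above, not something that ships with Sun Cluster: reservation_type is a name I invented, and it only reads scdidadm -L style text on stdin, so it can be tried on pasted output without a cluster.

```shell
#!/bin/sh
# reservation_type: hypothetical helper. Reads `scdidadm -L` style lines on
# stdin (DID instance, node:path, DID path) and applies the path-count rule:
# 2 paths cluster-wide -> SCSI-2, more than 2 paths -> SCSI-3.
reservation_type() {
    awk '
        NF >= 3 { paths[$1]++ }      # column 1 is the DID instance number
        END {
            for (did in paths) {
                if (paths[did] > 2)       kind = "SCSI-3"
                else if (paths[did] == 2) kind = "SCSI-2"
                else                      kind = "not-shared"  # single path: rule does not apply
                printf "d%s paths=%d %s\n", did, paths[did], kind
            }
        }'
}

# Sample input copied from the listing above.
printf '%s\n' \
    '14       moon1:/dev/rdsk/c1t2d0         /dev/did/rdsk/d14' \
    '14       moon2:/dev/rdsk/c1t2d0         /dev/did/rdsk/d14' |
    reservation_type
# prints: d14 paths=2 SCSI-2
```

On a live node you would pipe the real command into it, for example scdidadm -L d14 | reservation_type.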

The next thing we need to do is discuss the difference between the SCSI reservations used for the quorum device and the ones used for disk fencing. There is no overlap: the disk fencing code will issue SCSI reservations on all shared disks except the quorum disk.
Let us first start with the SCSI mechanism used by disk fencing (i.e. the protection of disks against 'rogue' nodes that have unexpectedly left the cluster). As said, SCSI-2 will be used when there are 2 paths to the disk cluster-wide (as in a 2-node cluster), SCSI-3 when there are more than 2 paths. SCSI-3 is needed in that case because of what we discussed before: we need more granularity than the all-or-nothing 'kick everyone out' of SCSI-2. The SCSI-2 reservations used are the typical MHIOCTKOWN and MHIOCRELEASE ioctls.

For the quorum device it is not as straightforward. As said, the quorum rule is used to protect against amnesia. This implies that any reservation of the quorum device should persist across reboots of the storage. This is true for SCSI-3 (hence the Persistent in PGR) but not for SCSI-2. Therefore, Sun invented a mechanism it has called SCSI-2 PGRE (Persistent Group Reservation Emulation). This is an emulation of the SCSI-3 mechanism using SCSI-2 ioctls: keys are put in a designated area on the disk. These keys are able to survive a power cycle of the disk subsystem. One additional remark: putting your key on a disk or kicking another node's key off the disk has to be an atomic operation, but the SCSI-2 emulation consists of many commands. Therefore a traditional SCSI-2 MHIOCTKOWN is still used to ensure atomicity.

Oh: both SCSI-3 and SCSI-2 keys are invisible and are not placed in a specific partition. SCSI-2 keys are in a designated area on the disk or LUN, and the location of SCSI-3 keys is implementation-dependent. You can still use a quorum disk to store whatever data you want. I will show in a later post how you can see these mysterious keys.






24 Jun 2005, 09:59:50 MEST Permalink Comments [4]

Thursday 16 June 2005
Jambers on people who like animals
There is a guy called Paul Jambers who is rather famous in Belgium, as he makes 'documentaries' about all kinds of people: older farmers without a wife, people who like SM, people who were born a different gender...
Yesterday the show was about people who like animals. Paul Jambers surely is not one of them.
I only saw the last woman interviewed. She was an older lady living in a big house in a big park. This house had belonged to her parents, who were now long dead. Her father was a lawyer and they were very worldly people: having cocktail parties and playing tennis in the garden. She never really liked that kind of life and hated all the fuss, and that was why she lived a totally different life now: she had 15 dogs, quite a few horses and donkeys, chickens and some cats. Most of these animals were rescue animals. Needless to say, the house was not quite as posh as before the 15 dogs, but the lady didn't really mind. In fact I thought she was quite a clever person, who had deliberately chosen the way she lived now, unmarried and with lots of animals to care for...
But Paul Jambers didn't understand. He kept going on and on, asking her whether she would not rather be married, showing pictures and films of the house as it was when her parents were still alive (admittedly very chic), repeating sentences like 'This woman, who used to be so pretty, prefers to live with 15 dogs ...' Nothing really about the animals themselves and what her day with them was like.
When she told him where her dogs slept, you could sense his disgust through the screen: "are they actually sleeping in your bedroom??? Isn't that unhygienic???" Well Paul, MY dog is sleeping in my bedroom too! Many other dog owners have their dogs sleeping in their bedrooms. Come and make a freak show of all of us!
The animals looked very well taken care of, and the woman looked happy with who she was now.
I think TV shows about people who love animals should not be made by people who obviously don't.


16 Jun 2005, 11:20:39 MEST Permalink Comments [0]

Wednesday 15 June 2005
A Good Day for the Roses
Or so it seems ...



15 Jun 2005, 14:18:15 MEST Permalink Comments [0]

Monday 13 June 2005
Walking the Sentier Martel with your Dog

One of the most famous walks in Europe is the Sentier Martel through the Gorges du Verdon. It is a 14 kilometer walk: you typically leave your car at the end point (the 'Couloir Samson') and take a cab up to the starting point. Since it is a must-do when you are in the region, we had it on our to-do list for our last vacation. One problem: can you take a dog? In the middle of the walk you have to descend the Breche Imbert, a steep staircase of 150 steps. Officially dogs are forbidden on the stairs unless you carry them. I could not find anything on the internet about whether it was feasible to carry a 15kg dog, or whether it was advisable to do the walk at all. So we decided to do the following 12 kilometer alternative:

-Park your car at the official end point: the 'Couloir Samson'.
-Do the Sentier Martel backwards up till the Breche Imbert and then go back to Couloir Samson.

This was still a very nice and beautiful walk, and you get to do the 'fun part' of the Sentier Martel (the pitch black 1 km long tunnels) twice. When we got to the Breche Imbert we saw that this was indeed nearly impossible to do with a dog: you need your hands to descend safely yourself, so only a dog that can be carried in a rucksack would be OK. Still, even the alternative walk that we did is not advisable for very big dogs, as you will have to help them and carry them over a couple of steps. Also, your dog has to have some experience walking in stony, mountainous terrain. For humans it is a fairly easy walk, except for the last 1 kilometer before the Breche Imbert, which is very steep. Also, take good hiking shoes as there are lots of loose stones on the trail.


13 Jun 2005, 12:23:17 MEST Permalink Comments [0]

Friday 10 June 2005
Contractor trouble
We are getting our kitchen refurbished. We have a very old kitchen with an adjacent, very old bathroom, now used to store pet food and toys. There is a shabby toilet in the middle. This has to be transformed into one big kitchen area with glass doors looking out onto our terrace. We will have a new toilet for guests opening onto the ground floor corridor (and not into the kitchen, as is the case now :o( )
We already bought all kitchen furniture and equipment in January. They will start placing it when the construction works have finished. And here's where the misery starts.
We were looking for a contractor who is willing to supervise the entire work: electricity, construction, heating...
But it seems like most Belgian contractors will not come out to play unless they are asked to do entire houses or apartments. At last there were two who were willing to come and take a look. The first one came and promised to get back to me after 10 days. Two months later: still nothing. So I called him and he said he would show up in 2 weeks, which he did. He was going to send us a quote in 10 days. It is now 2 months later and still we haven't heard anything.
The second one we really liked, and this one looked very professional. However: it is very hard getting hold of him as well.
-We had the first appointment on a Wednesday at 7pm. At 8pm he wasn't there so we called him and he said he was very busy and could he come the next day at 8pm.
-Next day he came! YES!! Very nice guy, we really wanted him. He said he would send us a quote in 2 weeks.
-3 weeks later we were getting nervous and called him. He said it would take him another week. The next week he called to make an appointment 2 weeks later.
-This appointment was last Tuesday 7pm. When he hadn't shown up at 8pm, we called him: "Could he postpone it till tomorrow" he asked, because he had lots of phone calls to make. Oh well...
-Wednesday 8pm: got a phone call from him that his wife had had an accident. Nothing serious, he just couldn't come. Now he will definitely come next Sunday at 11am...
I don't want to accuse anyone of lying: everyone deserves the benefit of the doubt, and I hope his wife, if she had an accident, was fine. But I found it striking that the "wife's had an accident, but she and the car are OK" excuse has been used by different contractors with friends and family. If they are all telling the truth, there should be an investigation into the relation between being a contractor's wife and being more prone to non-severe car accidents.
The kitchen has to be finished before the end of October because afterwards it will be too cold to do the construction works.
Sometimes I think we are just too naive to deal with this kind of stuff, or we do not play it hard enough. I hope we only have to deal with contractors a very few times in our lives, so there is not much opportunity to practice...


10 Jun 2005, 10:18:17 MEST Permalink Comments [5]

Thursday 09 June 2005
Provence Blues
Last week has proven again that one week is just too short for a vacation :o( Especially when you are in the South of France, in a rented house with beautiful weather...
So here is a short recap:
Friday: We drove up to Macon in the Bourgogne where we stayed in a nice hotel and enjoyed a nice dinner. Dogs allowed in both hotel room and restaurant!
Saturday: Continued our journey to the Gorges du Verdon where we had rented a gite 7 km from La Palud sur Verdon.
Sunday: Did an 8 kilometer walk. Very hot. Had a sunstroke in the evening. Had simple but good dinner in restaurant La Provence in La Palud. Fortunately in the evening David told me that we did not **have** to go for a walk every day: every other day would be fine...
Monday: Visited Moustiers-Sainte-Marie. Touristy but beautiful. Climbed all the stairs up to the Chapel of Notre Dame de Beauvoir (Simone?). At least Lukka enjoyed the climb, as you can see on the picture.
Tuesday: Another walk. This time along the very nice river Siagne in the Var region with a romantic bridge (Pont des Tuves) and cute little Canal de Siagne. Great walk except for the steep climb at the end :o)
Wednesday: Stayed at our gite the whole day. I have 2 exams in June so I spent this day studying in the sun. In the evening we did our first Vegetarian Barbecue, and even my meat-eating David and dog did not miss the meat :o) Suggestion: artichokes on the barbecue dipped in aioli.
Thursday: Did the famous Sentier Martel, or at least a dog friendly variant of it. More about this in another post. We also had dinner in a superb place called Le Refuge, run by a Belgian lady.
Friday: We rented an electric boat to explore parts of the Gorges du Verdon from the Lac de St Croix.
Saturday: Back home... Back to life and reality :o(((


09 Jun 2005, 11:06:40 MEST Permalink Comments [0]

Wednesday 25 May 2005
Sun Cluster 3.x Quorum algorithm
So let me try to explain the mechanism Sun Cluster uses to prevent both amnesia and split brain. This is a majority algorithm: only a cluster node, or a subset of cluster nodes, that holds a majority of the possible votes can start up (in the case of amnesia) or continue (in the case of split brain) cluster operation. The other partitions must leave the cluster. So let us first discuss the split brain scenario: a node cannot communicate with the other node over the private interconnect, but both nodes are fine. As discussed before, we must not allow both nodes to continue cluster operation, so one has to leave. Each node has a vote, but in a 2 node cluster this would mean that in case of a split brain nobody would continue cluster operation: each node would hold 1 of the 2 votes, which is not a majority. So in a 2 node cluster we assign a quorum device: a LUN in shared storage that also has a vote. That makes 3 possible votes in the cluster, and a majority of 3 is 2 votes. Once a split brain occurs, both nodes race for the quorum device: the one that is fastest gets its vote. The other one notices that it is too late and panics with a 'Lost Operational Quorum' message. Quorum devices are reserved through SCSI reservations, which we will discuss in 2 weeks.
Now how can the quorum mechanism prevent amnesia? To prevent amnesia we must only allow the last node to have left the cluster to start up the cluster. Same story: when a node leaves the cluster, the other node(s) will make sure that it cannot acquire the quorum disk when it starts up. Only the last node in the cluster will be able to do so. So when the first node to have left the cluster tries to start up, it has 1 vote of its own and knows that there are 3 possible votes in the cluster, but it cannot get the vote of the quorum device: it waits for the other node to form the cluster first, with a message 'waiting for operational quorum'. The last node to have left the cluster starts up, gets the vote of the quorum disk, starts talking to the waiting node and passes the latest cluster database to it, so that this node is up to date with all information that may have changed while it was down.
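
For those who like to see the arithmetic, here is the majority rule as a tiny shell sketch. The function names votes_needed and has_quorum are mine, made up for illustration; this only does the vote counting described above and has nothing to do with how Sun Cluster implements it internally.

```shell
#!/bin/sh
# votes_needed TOTAL: smallest vote count that is a strict majority of TOTAL.
votes_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum HELD TOTAL: succeeds when HELD votes are a majority of TOTAL.
has_quorum() {
    [ "$1" -ge "$(votes_needed "$2")" ]
}

# Two nodes, no quorum device: 2 possible votes, each partition holds 1.
has_quorum 1 2 && echo "continue" || echo "leave cluster"   # leave cluster
# Two nodes plus a quorum device: 3 possible votes, the race winner holds 2.
has_quorum 2 3 && echo "continue" || echo "leave cluster"   # continue
# The loser of the race only has its own vote.
has_quorum 1 3 && echo "continue" || echo "leave cluster"   # leave cluster
```

The same check covers the amnesia case: the node that left first comes up with 1 of 3 votes and has to wait.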

I realise there is a lot more to be said about this, and there are a lot more scenarios when we add more nodes. However, it is the end of my day, the weather is beautiful and warm (27 degrees C), and it is time to take a nice walk with my dog Lukka, followed by a nice glass of cool white wine...


25 May 2005, 18:14:58 MEST Permalink Comments [2]