Availability Engineering
Sun Cluster Oasis
Monday Nov 06, 2006
Taming a Runaway Resource Group

In Sun Cluster, data services are represented by resources contained in resource groups. The resource group is the unit of failover: all of the resources in a group are started together on a given node or zone. The component of Sun Cluster that is responsible for managing resource groups is, naturally enough, called the Resource Group Manager, or "RGM" for short.

As with any high-availability solution, the main objective of the RGM is to maintain service availability. If a resource group goes offline, the RGM will attempt to bring it online somewhere else. The RGM's actions are designed for a "lights out" or unattended mode, in which attempts to recover service are carried out automatically with no user intervention. In some cases, you might want to interrupt the normal automated actions of the RGM in order to assert manual control over the application.

For example, if a data service is misconfigured such that it cannot start successfully on any node or zone, you might observe "ping-pong" behavior, where the resource group successively tries to start on each node in the cluster. The RGM implements a ping-pong prevention algorithm, in which the resource group attempts to start on each node only twice, then gives up and remains offline.

What if the resource's Start method doesn't fail outright, but just hangs? When it exceeds its configured timeout, the Start method will be killed. Suppose I have a four-node cluster and the Start_timeout is set to five minutes. The RGM will attempt to start the resource group on each node, twice. Each start attempt hangs for five minutes before it times out and is killed. This means that the resource group will be continuously in Pending_online state, on one node or another, for about 40 minutes (four nodes, two attempts each, five minutes per attempt). The resource state of the faulted resource will show as "Starting".
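The 40-minute figure is simply nodes times attempts times timeout. As a minimal sketch of that arithmetic, assuming every attempt runs to its full timeout and ignoring any time spent between attempts, the helper below (worst_case_minutes is a made-up name, not a cluster command) reproduces the number:

```shell
# Worst-case time a resource group spends in Pending_online when every
# Start attempt hangs until it times out.
# Hypothetical helper for illustration; it does not talk to the cluster.
worst_case_minutes() {
    nodes=$1          # nodes (or zones) on which the group can try to start
    attempts=$2       # start attempts per node (two, per the text above)
    timeout_min=$3    # Start_timeout, in minutes
    echo $(( nodes * attempts * timeout_min ))
}

worst_case_minutes 4 2 5    # the four-node, five-minute example: prints 40
```

Raising Start_timeout or adding nodes lengthens the wait proportionally, which is why a hung Start method is more painful on larger clusters.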

How can you interrupt this process? If you just try to execute a command -- for example "clresourcegroup offline" to take the resource group offline, or "clresource disable" to disable the resource -- you will get an error, saying that "the resource group is undergoing a reconfiguration, please try again later."

In the earliest releases of Sun Cluster, there was no way out of this impasse other than waiting the 40 minutes, or manually killing the currently executing Start method. However, in recent Sun Cluster releases, there is a "resource group quiesce" command that you can use to interrupt the failing resource group and regain control over it. In the new command line interface of Sun Cluster 3.2, the command is "clresourcegroup quiesce". In the old command line interface, the equivalent command would be "scswitch -Q".

The quiesce command will wait for the currently executing Start method to exit or time out. If the method exceeds its timeout, it is killed. The RGM then stops trying to start the resource group, and you can fix whatever problem is causing the Start method to hang. Note that the quiesce command blocks until the resource group has reached a terminal, or quiescent, state. If a resource group is quiesced during the stopping phase of a switchover, the resource group will end up offline.
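A typical sequence might look like the sketch below: quiesce the group, then check where it ended up before fixing the underlying problem. The group name rg-nfs is invented for illustration, and the run wrapper is a hypothetical convenience that only prints the commands unless DRY_RUN=0 is set, so the snippet can be tried on a machine that has no cluster software installed.

```shell
# Dry-run wrapper: print each command instead of executing it,
# unless DRY_RUN=0 is set in the environment.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Quiesce one resource group, then show its status.
# The group name passed in is a placeholder, not a real configuration.
quiesce_and_inspect() {
    rg=$1
    run clresourcegroup quiesce "$rg"   # blocks until the group is quiescent
    run clresourcegroup status "$rg"    # see which state the group settled in
}

quiesce_and_inspect rg-nfs
```

On a release that has only the old command line interface, the first command would be the "scswitch -Q" form instead; check the man pages for the exact options on your release.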

What if you don't even want to wait for the currently executing method to exit? There is a "kill method" option, specified by "clresourcegroup quiesce -k" or "scswitch -Q -k", which kills the currently executing method immediately rather than waiting for it to exit or time out. Using the -k flag might leave the resource in an error state: START_FAILED if a start method is killed, or STOP_FAILED if a stop method is killed. If no start or stop method is currently running, then the -k flag has no effect. A START_FAILED state is easily cleared by switching the resource group offline or disabling the resource. A STOP_FAILED error state can be cleared by using the "clresource clear" command, or "scswitch -c -f stop_failed" in the old command line interface.
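The cleanup rules above can be summarized as a small lookup. This is only a sketch: cleanup_hint is a hypothetical helper that prints a suggested command rather than running it, rg-nfs and res-nfs are made-up names, and the "-f STOP_FAILED" spelling of the clear command is my understanding of the Sun Cluster 3.2 syntax, so confirm it against the clresource man page on your release.

```shell
# Print the cleanup command suggested for a resource left in an error
# state after "quiesce -k". Hypothetical helper; nothing is executed.
cleanup_hint() {
    state=$1    # resource error state
    rg=$2       # resource group name (placeholder)
    res=$3      # resource name (placeholder)
    case "$state" in
        START_FAILED)
            # Cleared by taking the group offline (disabling the resource also works).
            echo "clresourcegroup offline $rg" ;;
        STOP_FAILED)
            # Assumed flag spelling; see the lead-in above.
            echo "clresource clear -f STOP_FAILED $res" ;;
        *)
            echo "no cleanup needed for state: $state" ;;
    esac
}

cleanup_hint START_FAILED rg-nfs res-nfs
cleanup_hint STOP_FAILED rg-nfs res-nfs
```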

In summary, the quiesce command interrupts a resource group switchover in much the same way that typing Control-C in a Unix shell interrupts a currently executing command.

Marty Rattner
Sun Cluster Engineering

Posted at 12:00AM Nov 06, 2006 in Sun  |  Comments[2]

Comments:

How can I clear the "quiesce" state from a group? Even removing and recreating the group does not help: "Cluster.RGM.rgmd: Execution of method <hafoip_prenet_start> on resource <resource-nfs-lhn>, node <xxxx.yyyyy.zz> skipped to achieve user-initiated fast quiesce of the resource group <group-nfs>."

Posted by d.s.ivanov on February 12, 2008 at 08:06 PM PST #

To d.s.ivanov:

If your problem was encountered after a node reboot that occurred during execution of a "suspend" or "quiesce" command, then you most likely have encountered a known bug; the Sun CR number for this bug is 6612744. It has already been fixed in the next update or patch release of Solaris Cluster.

Here is the work-around:
The quiesce command can be re-executed after the node has rebooted and all RGs are in quiescent state. Execute:
# clrg quiesce -k <rglist>
or
# clrg quiesce <rglist>

according to whether the original quiesce command was run with or without the -k option, respectively. If it is not known whether the -k flag was used, try executing both forms of the command. <rglist> is a list of the same RGs that were originally quiesced or suspended.

If the quiesce command runs successfully, this should remove the problem.

Posted by Martin Rattner on February 13, 2008 at 11:34 AM PST #
