Carlos A. Morillo's Weblog

20060803 Thursday August 03, 2006

Goodbye

Like in HBO's Six Feet Under: ... Everything ends ...
I still remember how thrilled and excited I was 10 years ago
when I got the offer to work for the company where Bill Joy was a founder ...

I'm very happy and proud of my 10 years at Sun ...
I do strongly believe Sun has "State of the Art" technology ...

I do wish you all the best,


Carlos A. Morillo
Carlos.Morillo-AT-GMail-DOT-COM
+1 908 902 0914
Aug 03 2006, 08:20:00 PM EDT Permalink

20060717 Monday July 17, 2006

Sun Cluster Integration with Solaris 10 SMF

The following is one of the topics I covered during my CEC 2005 presentation last February in San Francisco, CA at the Moscone Center, months before SC3.1U4 was released. Here I cover the topic in much more detail, using examples of the Sun Cluster services running in a two-node Solaris 10, SC3.1U4 cluster.

0. Introduction
Among the new features introduced with Solaris 10 is smf(5).

smf(5) is a mechanism to define, deliver, and manage long-running application services for Solaris. Administrators can now:


In order for Sun Cluster 3.1 8/05 (also known as Sun Cluster 3.1 Update 4, or SC3.1U4 for short) to operate properly in Solaris 10, most of the initialization startup scripts needed to be modified or adapted so that SC3.1U4 could be integrated into the Solaris 10 SMF framework. This article describes this integration in detail.

1. SMF Features

Service: A long-lived software object with a well-defined state, error boundary, definition of start and stop, and relationship to other services. A service is often critical to system operation or fulfillment of business objectives.

Service Management:


For example, the sendmail(1M) service may depend upon the syslogd(1M) service to be able to log messages for an administrator as well as on the ability to resolve Internet host names using a name service such as NIS or DNS. The DNS service may in turn depend upon a configured network interface and an Internet routing service in order to perform its duties. And the ability to run any of these commands in turn depends on being able to mount the filesystem on which they reside. Therefore, the process of starting services at boot time relies upon the system's ability to make sense of a very complex web of dependencies between system services.

2. Co-existence with SMF

Sun Cluster /etc/rc scripts which previously depended on the ordering of Solaris boot scripts will have to be converted because these Solaris boot scripts have been converted to SMF services.

Likewise ISVs like Veritas, EMC, Hitachi, etc. will need to recast their boot scripts to SMF services.

Sun Cluster RPC services specified in /etc/inetd.conf will have to be converted to SMF RPC services since the notion of RPC services specified by /etc/inetd.conf has been obsoleted.

Services like Apache and NFS are also managed by the Sun Cluster Framework as Highly Available Data Services. This poses a possible conflict because the same service could be started/stopped by both Frameworks.

3. Sun Cluster/SMF Integration: Scope

The idea is to make Sun Cluster components like /etc/rc scripts and inetd services SMF compatible, ensuring that the Sun Cluster service architecture facilitates integration with ISVs like EMC, Hitachi and Veritas.

4. Sun Cluster FMRI and namespaces

Service Manifest: XML description of a service, or set of services. It is a delivery mechanism for service descriptions, automatically imported into the SMF repository on install, upgrade, boot and pkgadd. It is ignored once it has been imported. The manifests for the Sun Cluster services are located under /var/svc/manifest/system/cluster. The FMRIs for the services have the format

svc:/system/cluster/{service name}

For example:


autana# cat /etc/cluster/release
                     Sun Cluster 3.1u4 for Solaris 10 sparc
           Copyright 2005 Sun Microsystems, Inc. All Rights Reserved.
autana# cd /var/svc/manifest/system/cluster
autana# ls
bootcluster.xml                pnm.xml
cl_ccra.xml                    rgm.xml
cl_event.xml                   rpc_fed.xml
cl_eventlog.xml                rpc_pmf.xml
cl_svc_cluster_milestone.xml   scdpm.xml
cl_svc_enable.xml              scmountdev.xml
clusterdata.xml                scsymon_srv.xml
gdevsync.xml                   scvxinstall.xml
initdid.xml                    spm.xml
mountgfsys.xml
autana#
autana# svcs -a|grep cluster
legacy_run     Feb_07   lrc:/etc/rc2_d/S74xntpd_cluster
online         Feb_07   svc:/network/multipath:cluster
online         Feb_07   svc:/system/cluster/scmountdev:default
online         Feb_07   svc:/system/cluster/bootcluster:default
online         Feb_07   svc:/system/cluster/initdid:default
online         Feb_07   svc:/system/cluster/scvxinstall:default
online         Feb_07   svc:/system/cluster/mountgfsys:default
online         Feb_07   svc:/system/cluster/gdevsync:default
online         Feb_07   svc:/system/cluster/clusterdata:default
online         Feb_07   svc:/system/cluster/cl-svc-enable:default
online         Feb_07   svc:/system/cluster/pnm:default
online         Feb_07   svc:/system/cluster/scdpm:default
online         Feb_07   svc:/system/cluster/cl-event:default
online         Feb_07   svc:/system/cluster/cl-ccra:default
online         Feb_07   svc:/system/cluster/cl-eventlog:default
online         Feb_07   svc:/system/cluster/rpc-fed:default
online         Feb_07   svc:/system/cluster/rpc-pmf:default
online         Feb_07   svc:/system/cluster/rgm:default
online         Feb_07   svc:/system/cluster/spm:default
online         Feb_07   svc:/system/cluster/cl-svc-cluster-milestone:default
maintenance    Feb_07   svc:/system/cluster/scsymon-srv:default
autana#


All Sun Cluster services are single instance.
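
Comparing the two listings above, the manifest file names use underscores while the service names in the FMRIs use hyphens. The following is a minimal Python sketch of that observed naming pattern. The helper name manifest_to_fmri is hypothetical, and the rule is inferred only from the listings in this article; the real FMRI is whatever the manifest itself declares.

```python
from pathlib import PurePosixPath

# Prefix taken from the FMRI format shown above.
FMRI_PREFIX = "svc:/system/cluster/"


def manifest_to_fmri(manifest_path):
    """Guess the FMRI for a Sun Cluster manifest file (hypothetical helper).

    Assumption: the service name is the file name without ".xml", with
    underscores replaced by hyphens, and the single instance is "default".
    """
    stem = PurePosixPath(manifest_path).stem  # "cl_ccra.xml" -> "cl_ccra"
    return f"{FMRI_PREFIX}{stem.replace('_', '-')}:default"


print(manifest_to_fmri("/var/svc/manifest/system/cluster/cl_ccra.xml"))
```

Since all Sun Cluster services are single instance, the sketch always appends the default instance.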

Sun Cluster inetd services will rely on the naming rules implemented by the SMF inetconv utility. This utility is used to convert the Sun Cluster services specified in inetd.conf into SMF services.

5. SMF services for converted Sun Cluster /etc/rc scripts


6. Non Cluster boot "boot -x"

Prior to SMF, Sun Cluster boot scripts returned immediately after detecting that the node was being booted in non-cluster mode, returning 0 (success) without making any distinction between cluster and non-cluster mode.

SMF introduced the notion of service states while also providing a mechanism to explicitly specify strong dependencies between these services. A service "B" dependent on "A" wouldn't be brought online until "A" is brought online first. If "A" is restarted, this would imply the restart of "B" as well.
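
The restart rule described above can be sketched in Python as a walk over the "is required by" edges. This is only an illustration under stated assumptions: the DEPENDENTS table is a small subset of the edges reported by the svcs -D listings in this article, the function name restart_set is hypothetical, and in SMF the actual restart behavior of a dependency is governed by its restart_on setting in the manifest.

```python
from collections import deque

# "key is required by each value"; a small subset of the svcs -D listings
# in this article, chosen for illustration only.
DEPENDENTS = {
    "bootcluster": ["initdid"],
    "mountgfsys": ["gdevsync", "clusterdata"],
    "gdevsync": ["clusterdata"],
    "clusterdata": ["cl-svc-enable"],
}


def restart_set(service, dependents=DEPENDENTS):
    """Return the service plus every service that transitively depends on it."""
    seen = [service]
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.append(child)
                queue.append(child)
    return seen


print(restart_set("mountgfsys"))
```

Under these assumptions, restarting mountgfsys would also touch gdevsync, clusterdata and cl-svc-enable, while restarting initdid would touch nothing else in this subset.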

Milestones corresponding to rc run levels specify their dependencies on various Solaris services and are brought online only after the services they depend upon have started.

Boot scripts intertwined with Solaris boot scripts require conversion because:


This is possible through the "optional_all" dependency specification.

autana# svcs -a|grep cluster
legacy_run     15:42:42 lrc:/etc/rc2_d/S74xntpd_cluster
online         15:41:56 svc:/system/cluster/scmountdev:default
online         15:42:03 svc:/system/cluster/bootcluster:default
online         15:42:04 svc:/system/cluster/scvxinstall:default
offline        15:41:45 svc:/system/cluster/clusterdata:default
offline        15:41:45 svc:/system/cluster/cl-svc-enable:default
offline        15:41:47 svc:/system/cluster/rpc-fed:default
offline        15:41:47 svc:/system/cluster/spm:default
offline        15:41:47 svc:/system/cluster/scdpm:default
offline        15:41:47 svc:/system/cluster/cl-svc-cluster-milestone:default
offline        15:41:47 svc:/system/cluster/rgm:default
offline        15:41:47 svc:/system/cluster/scsymon-srv:default
offline        15:41:47 svc:/system/cluster/pnm:default
offline        15:41:47 svc:/system/cluster/rpc-pmf:default
offline        15:41:47 svc:/system/cluster/cl-event:default
offline        15:41:47 svc:/system/cluster/cl-ccra:default
offline        15:41:47 svc:/system/cluster/cl-eventlog:default
maintenance    15:41:54 svc:/network/multipath:cluster
maintenance    15:42:03 svc:/system/cluster/initdid:default
maintenance    15:42:24 svc:/system/cluster/mountgfsys:default
maintenance    15:42:24 svc:/system/cluster/gdevsync:default
autana#
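
When a node has been booted with "boot -x", it can help to summarize a listing like the one above by state. The following Python sketch groups FMRIs by state from svcs-style text. The function name group_by_state is hypothetical, SAMPLE is a few lines copied from the listing above, and the parser assumes the three-column STATE/STIME/FMRI layout shown in this article.

```python
from collections import defaultdict

# A few lines copied from the non-cluster listing above.
SAMPLE = """\
online         15:41:56 svc:/system/cluster/scmountdev:default
online         15:42:03 svc:/system/cluster/bootcluster:default
offline        15:41:45 svc:/system/cluster/clusterdata:default
maintenance    15:42:03 svc:/system/cluster/initdid:default
maintenance    15:42:24 svc:/system/cluster/mountgfsys:default
"""


def group_by_state(svcs_output):
    """Map each state to the list of FMRIs reported in that state."""
    groups = defaultdict(list)
    for line in svcs_output.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0] == "STATE":
            continue  # skip blank lines, shell prompts and the header row
        state, _stime, fmri = fields
        groups[state].append(fmri)
    return dict(groups)


for state, fmris in group_by_state(SAMPLE).items():
    print(state, len(fmris))
```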


7. The scmountdev service

It replaces /etc/rcS.d/S45scmountdev.sh which had to run between S40standardmounts.sh and S50devfsadm.

Prior to Solaris 10, S50devfsadm ran devfsadm(1M) to populate the /dev/{r}dsk symbolic links with the entries in /global/.devices/node@<nodeid>.

scmountdev is responsible for mounting /global/.devices/node@<nodeid> as a local file system (after mapping the did device name to the cXtYdZ name).

The same devfsadm(1M) invocation is now done by the /system/device/local start method.

scmountdev will be interposed between /system/filesystem/usr and /system/device/local.

autana# svcs -d svc:/system/cluster/scmountdev:default
STATE          STIME    FMRI
online         16:14:18 svc:/system/filesystem/usr:default
autana# svcs -D svc:/system/cluster/scmountdev:default
STATE          STIME    FMRI
online         16:14:19 svc:/system/device/local:default
autana#


It runs identically in cluster and non-cluster mode, so it is never placed in maintenance in non-cluster mode.


8. The bootcluster service

It replaces /etc/rcS.d/S56bootcluster.sh which had to run after the S55fdevattach SAN script so that did operations within bootcluster are aware of SAN based disks.

In cluster mode it initializes did, the ORB, the transport, the CCR, and the CMM/HA framework, and launches clexecd, etc.

Enabling did is done even in non-cluster mode.

bootcluster will not be in maintenance.

S55fdevattach has been replaced in Solaris 10 by /system/device/fc-fabric, and /milestone/devices is dependent on it.

Therefore bootcluster is dependent on /milestone/devices.

Daemons launched by bootcluster are still linked with the failfast driver. They will not be restarted by SMF. bootcluster is also termed a transient service.


autana# svcs -d svc:/system/cluster/bootcluster:default
STATE          STIME    FMRI
online         Feb_15   svc:/network/loopback:default
online         Feb_15   svc:/network/physical:default
online         Feb_15   svc:/milestone/devices:default
online         Feb_15   svc:/system/coreadm:default
autana# svcs -D svc:/system/cluster/bootcluster:default
STATE          STIME    FMRI
online         Feb_15   svc:/system/cluster/initdid:default
autana#


9. The initdid service

It replaces /etc/rcS.d/S65initdid which runs after /etc/rcS.d/S56bootcluster.sh. initdid ensures the registration of new did devices in /global/.devices/node@<nodeid> following a reconfiguration boot.

initdid is dependent on /system/cluster/bootcluster and makes /milestone/single-user dependent on it with optional_all.

initdid will be placed in maintenance during non-cluster mode.


autana# svcs -d svc:/system/cluster/initdid:default
STATE          STIME    FMRI
online         Feb_15   svc:/system/cluster/bootcluster:default
autana# svcs -D svc:/system/cluster/initdid:default
STATE          STIME    FMRI
online         Feb_15   svc:/milestone/single-user:default
autana#


10. The mountgfsys service

It replaces /etc/rc2.d/S75MOUNTGFSYS that runs after /etc/rc2.d/S01MOUNTFSYS.

/etc/rc2.d/S75MOUNTGFSYS runs /usr/cluster/lib/sc/run_reserve to get access to all connected disks, unmounts the local /global/.devices/node@ file system, runs clconfig -g to start active replicas, starts the mount client (which also unmounts stale versions of the global mount) and links into the current PxFS namespace. /global/.devices/node@<nodeid> is mounted globally after all other PxFS file systems in /etc/vfstab. Switchover of the PxFS primary is done at the end if needed. It calls /sbin/sulogin in case of fsck or mount failure.

/sbin/sulogin calls are removed because they might interfere with other consumers of the SMF console login service. SMF svc.startd will start sulogin if a service start method returns an error and when the console is not online yet.

mountgfsys is dependent on /system/filesystem/local and /milestone/single-user. It will be in maintenance in non-cluster mode.


autana# svcs -d svc:/system/cluster/mountgfsys:default
STATE          STIME    FMRI
online         Feb_15   svc:/milestone/single-user:default
online         Feb_15   svc:/system/filesystem/local:default
online         Feb_15   svc:/system/cluster/scvxinstall:default
online         Feb_15   svc:/system/mdmonitor:default
autana# svcs -D svc:/system/cluster/mountgfsys:default
STATE          STIME    FMRI
online         Feb_15   svc:/system/cluster/gdevsync:default
online         Feb_15   svc:/system/cluster/clusterdata:default
online         Feb_15   svc:/milestone/multi-user:default
autana#


11. The gdevsync service

/etc/rc2.d/S76gdevsync script synchronizes the global namespace on remote nodes and has to run before S95svm.sync to allow RPC access to global devices and file systems.

gdevsync service will depend on mountgfsys and make the multi-user milestone dependent on it.

gdevsync will be in maintenance during non-cluster mode.


autana# svcs -d svc:/system/cluster/gdevsync:default
STATE          STIME    FMRI
online         Feb_15   svc:/system/cluster/mountgfsys:default
autana# svcs -D svc:/system/cluster/gdevsync:default
STATE          STIME    FMRI
online         Feb_15   svc:/system/cluster/clusterdata:default
online         Feb_15   svc:/milestone/multi-user:default
autana#


12. The clusterdata service

When online, it indicates that the core cluster components are up and running.

It is dependent on /system/cluster/mountgfsys and /system/cluster/gdevsync.


autana# svcs -d svc:/system/cluster/clusterdata:default
STATE          STIME    FMRI
online         Feb_15   svc:/system/cluster/mountgfsys:default
online         Feb_15   svc:/system/cluster/gdevsync:default
autana# svcs -D svc:/system/cluster/clusterdata:default
STATE          STIME    FMRI
online         Feb_15   svc:/system/cluster/cl-svc-enable:default
autana#
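
The svcs -d listings in sections 9 through 12 can be combined into one start order. The sketch below does this in Python with the standard graphlib module (Python 3.9 or later). The REQUIRES table and the function name start_order are hypothetical; only cluster-to-cluster dependencies from the listings are modelled, and milestones and file system services are left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# service -> services it requires, read from the svcs -d listings above
# (cluster services only; milestones and file systems are omitted).
REQUIRES = {
    "initdid": {"bootcluster"},
    "mountgfsys": {"scvxinstall"},
    "gdevsync": {"mountgfsys"},
    "clusterdata": {"mountgfsys", "gdevsync"},
}


def start_order(requires=REQUIRES):
    """Return one valid order in which the services could come online."""
    return list(TopologicalSorter(requires).static_order())


print(start_order())
```

Several orders are valid, since bootcluster/initdid and scvxinstall/mountgfsys are independent of each other in this subset; the sketch only guarantees that each service appears after everything it requires.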


13. RPC Services


autana# svcs -l svc:/network/rpc/metacld:default
fmri         svc:/network/rpc/metacld:default
name         Sun Cluster service for SVM
enabled      true
state        online
next_state   none
state_time   Wed Feb 15 16:14:56 2006
restarter    svc:/network/inetd:default
dependency   require_all/restart svc:/network/rpc/bind (online)
autana#


autana# svcs -l svc:/network/rpc/metamed:default
fmri         svc:/network/rpc/metamed:default
name         SVM remote mediator services
enabled      true
state        online
next_state   none
state_time   Wed Feb 15 16:14:56 2006
restarter    svc:/network/inetd:default
dependency   require_all/restart svc:/network/rpc/bind (online)
autana#



autana# svcs -l svc:/network/rpc/scadmd:default
fmri         svc:/network/rpc/scadmd:default
name         Sun Cluster administrative service
enabled      true
state        online
next_state   none
state_time   Wed Feb 15 16:14:56 2006
restarter    svc:/network/inetd:default
dependency   require_all/restart svc:/network/rpc/bind (online)
autana#

autana# svcs -l svc:/network/sccheckd:default
fmri         svc:/network/sccheckd:default
name         Sun Cluster configuration checker service
enabled      true
state        online
next_state   none
state_time   Wed Feb 15 16:14:56 2006
restarter    svc:/network/inetd:default
autana#

autana# svcs -l svc:/network/rpc/scrcmd:default
fmri         svc:/network/rpc/scrcmd:default
name         SunCluster scrcmd service
enabled      true
state        online
next_state   none
state_time   Wed Feb 15 16:14:56 2006
restarter    svc:/network/inetd:default
dependency   require_all/restart svc:/network/rpc/bind (online)
autana#


14. Data Services




15. Sun Cluster userland services



Converted Sun Cluster Userland daemons are:



An SMF service for a Sun Cluster userland service includes:
  1. Manifest: declares properties like dependencies and callback methods, located in /var/svc/manifest/system/cluster
  2. stop/start method: minor variations of the rc scripts. The main change is the removal of functionality now taken care of by SMF (start ordering and dependency enforcement).

Effects of disabling a userland service
  1. Disabling rgm, rpc-fed, or rpc-pmf is like killing the daemon, so the node gets rebooted because these daemons are under failfast control.
  2. Disabling any other service kills its daemon and leaves the service in the disabled state until it is enabled again or until the next reboot, which restarts the service.
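
The two cases above can be written down as a small lookup, which is handy as a pre-check before running a disable command by hand. This is a Python sketch only: the names FAILFAST_SERVICES and effect_of_disable are hypothetical, and the result strings simply restate the two rules listed above.

```python
# Services whose daemons are under failfast control, per the list above.
FAILFAST_SERVICES = {"rgm", "rpc-fed", "rpc-pmf"}


def effect_of_disable(service):
    """Describe what disabling a Sun Cluster userland service leads to."""
    if service in FAILFAST_SERVICES:
        return "node reboot (daemon is under failfast control)"
    return "daemon killed; service disabled until enabled or next reboot"


for name in ("rgm", "pnm"):
    print(name, "->", effect_of_disable(name))
```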


16. Minimizing harm by SMF commands


svc:/system/cluster/cl-svc-enable:default is used for two reasons:

  1. This service acts as a bridge between the lower level FMRIs and the Sun Cluster userland FMRIs. Instead of making every userland FMRI individually dependent on the lower level FMRIs, cl-svc-enable is an FMRI that comes online once all the lower level FMRIs are up, and the userland FMRIs depend only on it. If at a later stage the userland FMRIs need to depend on lower level FMRIs other than clusterdata, only cl-svc-enable has to be made dependent on them, instead of changing all the userland services.
  2. While the cluster is up and running, a user might disable a userland cluster FMRI. To make sure the cluster is stable and all the required services are up and running, cl-svc-enable enables all the userland cluster services during the next reboot.
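
The first reason can be illustrated by counting dependency edges with and without the bridge. In this Python sketch, LOWER and USERLAND are taken from the svcs -d and svcs -D listings for cl-svc-enable in this section (leaving out the multi-user-server milestone); the function names are hypothetical and the sketch is only a model of the manifests, not a description of how they are written.

```python
# Taken from the svcs -d / svcs -D listings for cl-svc-enable in this section.
LOWER = ["clusterdata", "milestone/multi-user"]
USERLAND = ["pnm", "cl-event", "scdpm", "cl-ccra", "rpc-fed", "rpc-pmf", "spm"]
BRIDGE = "cl-svc-enable"


def direct_edges(lower=LOWER, userland=USERLAND):
    """Every userland service depends on every lower level service."""
    return [(u, low) for u in userland for low in lower]


def bridged_edges(lower=LOWER, userland=USERLAND, bridge=BRIDGE):
    """The bridge depends on the lower level; userland depends on the bridge."""
    return [(bridge, low) for low in lower] + [(u, bridge) for u in userland]


print(len(direct_edges()), len(bridged_edges()))
```

With seven userland services and two lower level services this is 14 edges without the bridge and 9 with it. More importantly, adding one more lower level dependency changes a single service (cl-svc-enable) instead of seven.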


autana# svcs -d svc:/system/cluster/cl-svc-enable:default
STATE          STIME    FMRI
online         Feb_15   svc:/system/cluster/clusterdata:default
online         Feb_15   svc:/milestone/multi-user:default
autana# svcs -D svc:/system/cluster/cl-svc-enable:default
STATE          STIME    FMRI
online         Feb_15   svc:/system/cluster/pnm:default
online         Feb_15   svc:/system/cluster/cl-event:default
online         Feb_15   svc:/system/cluster/scdpm:default
online         Feb_15   svc:/system/cluster/cl-ccra:default
online         Feb_15   svc:/system/cluster/rpc-fed:default
online         Feb_15   svc:/system/cluster/rpc-pmf:default
online         Feb_15   svc:/milestone/multi-user-server:default
online         Feb_15   svc:/system/cluster/spm:default
autana#


17. cl-svc-cluster-milestone

It is a new service representing the status of the Sun Cluster userland services.


autana# svcs -d svc:/system/cluster/cl-svc-cluster-milestone:default
STATE          STIME    FMRI
online         Feb_15   svc:/system/cluster/pnm:default
online         Feb_15   svc:/system/cluster/cl-event:default
online         Feb_15   svc:/system/cluster/scdpm:default
online         Feb_15   svc:/system/cluster/cl-ccra:default
online         Feb_15   svc:/system/cluster/cl-eventlog:default
online         Feb_15   svc:/system/cluster/rpc-fed:default
online         Feb_15   svc:/system/cluster/rpc-pmf:default
online         Feb_15   svc:/system/cluster/rgm:default
online         Feb_15   svc:/system/cluster/spm:default
autana# svcs -D svc:/system/cluster/cl-svc-cluster-milestone:default
STATE          STIME    FMRI
autana#


18. The Resource Group Manager Daemon

In keeping with the semantics in place before SMF was introduced in Solaris 10, rgm depends on the SMF services equivalent to the daemons it depended on under the /etc/rc scripts framework.

autana# svcs -d svc:/system/cluster/rgm:default
STATE          STIME    FMRI
online         Feb_15   svc:/system/cluster/pnm:default
online         Feb_15   svc:/system/cluster/rpc-fed:default
online         Feb_15   svc:/system/cluster/rpc-pmf:default
autana# svcs -D svc:/system/cluster/rgm:default
STATE          STIME    FMRI
online         Feb_15   svc:/system/cluster/cl-svc-cluster-milestone:default
maintenance    Feb_15   svc:/system/cluster/scsymon-srv:default
autana#

Jul 17 2006, 10:00:00 AM EDT Permalink

20060407 Friday April 07, 2006

Sandy Hook Time Trial

Well, after spending some time this past Winter doing a couple of indoor bike trainer sessions weekly, and in order to keep the momentum and the motivation through the Spring and Summer now that the weather is improving, I did my first bicycling time trial this past Saturday in Sandy Hook, New Jersey.

The weather could not have been better: it was very pleasant, about 69 Fahrenheit (roughly 20 Celsius), sunny, somewhat warm and even a bit humid, despite the forecast for rain.
There was some tail wind heading North, and heading back South to the finish line with the wind in your face was a bit rough.

I'm number 271 wearing a red helmet. Here are some pictures, courtesy of one of my teammates.

I raced as part of the biking team I joined a few weeks ago, 3D Racing Team. I was happy with my time and how strong I felt. I was the last of my team but not the last of the race; some people were 4 or 5 minutes slower than me. Not to make excuses, but I guess that's not bad for riding an old, cheap 1993 Specialized Allez with the gear shift levers on the frame. Bicycle technology has evolved a lot, and road bikes now have the gear shift levers close to the brakes, like mountain bikes. I guess it's time for me to get a new bike if I'm going to continue racing.

This will definitely be a good baseline for how much I improve when I do it again next year!
Apr 07 2006, 02:17:39 PM EDT Permalink

20050520 Friday May 20, 2005

Fencing and MPxIO

A couple of days ago I worked on a pretty interesting problem where we had the following scenario:

A 2 node Sun Cluster 3.1 09/04 (Also known as Sun Cluster 3.1 Update 3) using a SAN as the Shared Storage.

We get a Reservation Conflict panic on the surviving node, the same node that is fencing the shared storage and placing the reservation, with the following symptoms:

Node 2 is unable to take ownership of disk d120.

May 16 02:53:16 orthanc Cluster.CCR: [ID 949565 daemon.warning] reservation error(fence_node) - do_scsi2_tkown() error for disk /dev/did/rdsk/d120s2

Sun Cluster tries it 3 times and then sends the following warning message:

May 16 02:53:10 orthanc Cluster.CCR: [ID 580163 daemon.warning] reservation warning(fence_node) - MHIOCTKOWN error will retry in 2 seconds

May 16 02:53:12 orthanc Cluster.CCR: [ID 580163 daemon.warning] reservation warning(fence_node) - MHIOCTKOWN error will retry in 2 seconds
May 16 02:53:14 orthanc last message repeated 1 time
May 16 02:53:16 orthanc Cluster.CCR: [ID 949565 daemon.warning] reservation error(fence_node) - do_scsi2_tkown() error for disk /dev/did/rdsk/d120s2
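
The "3 times" above can be confirmed from the log itself, remembering that syslog collapses identical lines into "last message repeated N time(s)". The following Python sketch counts the retries that way. The function name count_retries is hypothetical, and LOG is copied from the messages above.

```python
import re

# Copied from the /var/adm/messages excerpt above.
LOG = """\
May 16 02:53:10 orthanc Cluster.CCR: [ID 580163 daemon.warning] reservation warning(fence_node) - MHIOCTKOWN error will retry in 2 seconds
May 16 02:53:12 orthanc Cluster.CCR: [ID 580163 daemon.warning] reservation warning(fence_node) - MHIOCTKOWN error will retry in 2 seconds
May 16 02:53:14 orthanc last message repeated 1 time
May 16 02:53:16 orthanc Cluster.CCR: [ID 949565 daemon.warning] reservation error(fence_node) - do_scsi2_tkown() error for disk /dev/did/rdsk/d120s2
"""

REPEAT = re.compile(r"last message repeated (\d+) times?")


def count_retries(log_text):
    """Count MHIOCTKOWN retry warnings, including syslog 'repeated' lines."""
    retries = 0
    last_was_retry = False
    for line in log_text.splitlines():
        repeated = REPEAT.search(line)
        if repeated:
            # Only count repeats that follow a retry warning.
            if last_was_retry:
                retries += int(repeated.group(1))
            continue
        last_was_retry = "MHIOCTKOWN error will retry" in line
        if last_was_retry:
            retries += 1
    return retries


print(count_retries(LOG))
```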


Disk having the reservation conflict:

SolarisCAT(vmcore.2)> path2inst |grep 197
"/scsi_vhc/ssd@g60060e800427d100000027d100000382" 197
"/ssd@g60060e800427d100000027d100000382" 197
SolarisCAT(vmcore.2)>

      62. c6t60060E800427D100000027D100000382d0 <HITACHI-OPEN-V-SUN-5003 cyl 13651 alt 2 hd 15 sec 512>
          /scsi_vhci/ssd@g60060e800427d100000027d100000382

scdidadm -L shows:

120      baraddur:/dev/rdsk/c6t60060E800427D100000027D100000382d0 /dev/did/rdsk/d120
120      orthanc:/dev/rdsk/c6t60060E800427D100000027D100000382d0 /dev/did/rdsk/d120


In the /var/adm/messages file for node 2 we see some path degraded messages prior to the  panic:

May 16 02:55:14 orthanc mpxio: [ID 669396 kern.info] /scsi_vhci/ssd@g60060e800427d100000027d100000382 (ssd197) multipath status: failed, path /pci@3c,600000/SUNW,qlc@1/fp@0,0 (fp1) to target address: 50060e800427d138,19 is offline. Load balancing: none
May 16 02:57:01 orthanc mpxio: [ID 669396 kern.info] /scsi_vhci/ssd@g60060e800427d100000027d100000387 (ssd192) multipath status: optimal, path /pci@1c,600000/SUNW,qlc@1/fp@0,0 (fp0) to target address: 50060e800427d128,1f is online. Load balancing: none
 

This is the only disk showing a failed path status.

At around the same time that the cluster was trying to place a reservation, a forcelip was issued.

It's possible that this caused the cluster to exceed the amount of time allowed to place the reservation resulting in a "Reservation conflict" panic.

It's also possible that mpxio has become confused and has issued an I/O down a path where it doesn't hold the reservation. Since these disks are using SCSI-2, the reservation is all or nothing. This means that when using mpxio only one path would have placed the reservation and only one path would be useable. If an I/O happened to go down a path that had been "fenced-off" then a reservation conflict would have occurred. This points to an mpxio issue.
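
The "all or nothing" behavior described above can be pictured with a toy model. The Python sketch below is not how ssd, MPxIO or the array implement reservations; the class Scsi2Lun and its methods are hypothetical, and the path names fp0 and fp1 are borrowed from the mpxio messages above purely as labels.

```python
class Scsi2Lun:
    """Toy LUN where a SCSI-2 style reservation belongs to a single path."""

    def __init__(self, paths):
        self.paths = set(paths)
        self.holder = None  # path that placed the reservation, if any

    def _check(self, path):
        if path not in self.paths:
            raise ValueError(f"unknown path: {path}")

    def reserve(self, path):
        """Place the reservation through one path."""
        self._check(path)
        if self.holder not in (None, path):
            return "reservation conflict"
        self.holder = path
        return "ok"

    def io(self, path):
        """Send an I/O down a path; any other path is fenced off."""
        self._check(path)
        if self.holder is not None and path != self.holder:
            return "reservation conflict"
        return "ok"


lun = Scsi2Lun(["fp0", "fp1"])
lun.reserve("fp0")
print(lun.io("fp0"), "/", lun.io("fp1"))
```

In this model an I/O that is rerouted from fp0 to fp1 after the reservation was placed through fp0 comes back as a reservation conflict, which is the situation suspected here.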

At a lower level, what is happening is that orthanc fences the shared storage and some I/O was started around 02:50:00. Meanwhile, baraddur is being rebooted in non-cluster mode around 02:51:00. At this time, MPxIO would enforce a SCSI-2 reservation (at the same time Sun Cluster does), even if we had a SCSI-3 PGR in place before node baraddur went down. At that time, we are generating I/O and exercising the shared storage LUNs by executing commands such as a LIP. During this time, if we submit a command for which MPxIO needs to switch to another path for any reason (as the SCSI-2 reservation is not persistent), the system would panic, because the ssd device driver has a scsi_watch_thread() monitoring the reservation.

In this scenario, what we are doing can lead to a Reservation Conflict panic, specifically when cluster node baraddur is rebooting and we are exercising the remaining cluster node while it is in the process of placing a SCSI-2 reservation by fencing the shared storage.

To summarize, these symptoms had more to do with the timing of the testing sequence within the cluster, which led to this Reservation Conflict panic.

The handling of these types of events, SCSI-2 reservations in particular, is more robust and has been enhanced in the latest SAN 4.4.X patch.


May 20 2005, 08:05:58 AM EDT Permalink

20050518 Wednesday May 18, 2005

Rolling Upgrades

Once in a while we might run into some unusual circumstances
that do not let scconf(1M) add a new adapter.

Personally I prefer to use scsetup(1M) which is a pretty easy
to use interface to scconf(1M), as compared to using scconf(1M) directly.

For example, let's say you recently upgraded from Sun Cluster 3.1 10/03 also known as Sun Cluster 3.1 Update 1 to Sun Cluster 3.1 09/04 also known as Sun Cluster 3.1 Update 3.

# scconf -a -A trtype=dlpi,name=ce8,node=minastirith
scconf: Failed to add cluster transport adapter - cluster upgrade is not committed


What happens is that scconf checks whether the heartbeat is tunable: scconf_add_cltr_adapter() calls tunable_hb_is_available(), which checks the versions and, if it finds a difference, returns SCCONF_VP_MISMATCH as an error. In other words, we run into this problem because the upgrade has not been committed yet. The upgrade is committed by running scversions(1M).

For more reference see:

May 18 2005, 04:51:00 PM EDT Permalink

20050512 Thursday May 12, 2005

Everest Base Camp Trekking

Well, it's been four years since I had the luck and privilege of the magical experience of having Mount Everest in front of me, also known as Sagarmatha in Nepalese, or Chomolungma in Tibetan.

This picture was taken approximately at noon Kathmandu, Nepal Time on Monday April 9th 2001 in Kala Patthar. The altimeter of one of my friends read 5630 meters or 18472 feet. It was a very long day when my friends and I started walking around 05:00 early in the morning and the temperature was -15 Celsius or 5 Fahrenheit. We were back at our camp in Lobuche, which is around 5007 meters or 16428 feet by 15:00-16:00 in the afternoon.

Namaste.

Carlos in Kala Patthar
May 12 2005, 12:09:25 PM EDT Permalink

Software Updates

One pretty cool new feature I ran into while playing with Solaris 10 is smpatch(1M).

For example it can do an automated analysis of the patches you need to install:

[auyantepui]</home/morillo>% su -
Password:
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
Sourcing //.profile-EIS.....
root@auyantepui # csh
auyantepui# smpatch analyze
119015-02 SunOS 5.10: Packaging Commands Patch
118550-01 SunOS 5.10: pcipsy Patch
118371-02 SunOS 5.10: elfsign Patch
119728-01 SunOS 5.10: Fujitsu fmd.conf patch
auyantepui#

Or it can automatically do the analysis, download the patches, and install them for you:

auyantepui# smpatch update -L
119015-02 has been validated.
118371-02 has been validated.
119728-01 has been validated.
Installing patches from /var/sadm/spool...
119015-02 has been applied.
NOTICE: Patch 118550-01 cant be installed because its type is prohibited by policy.
NOTICE: Patch 118371-02 cant be installed because its type is prohibited by policy.
ALERT: Failed to install the patch {0}. 119728-01
/var/sadm/spool/patchpro_dnld_2005.05.12@10:18:30:EDT.txt has been moved to /var/sadm/spool/patchproSequester/patchpro_dnld_2005.05.12@10:18:30:EDT.txt

ID's of the patches that are disallowed by installation policy have been
written to file
        /var/sadm/spool/disallowed_patch_list
Please use
        smpatch add -x idlist=/var/sadm/spool/disallowed_patch_list
to install these patches.
auyantepui#


If we want to verify which of the listed patches have been installed, we can run the analysis again:


auyantepui# smpatch analyze
118550-01 SunOS 5.10: pcipsy Patch
118371-02 SunOS 5.10: elfsign Patch
119728-01 SunOS 5.10: Fujitsu fmd.conf patch
auyantepui# exit
auyantepui# root@auyantepui # exit
[auyantepui]</home/morillo>%
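
Comparing the two smpatch analyze runs by eye works for four patches, but the comparison is easy to script. The following Python sketch extracts the patch IDs from each run and reports the ones that no longer show up. The function names patch_ids and applied_between are hypothetical, and the parser assumes the "ID description" line layout shown in the transcripts above.

```python
import re

# Output of the two `smpatch analyze` runs shown above.
BEFORE = """\
119015-02 SunOS 5.10: Packaging Commands Patch
118550-01 SunOS 5.10: pcipsy Patch
118371-02 SunOS 5.10: elfsign Patch
119728-01 SunOS 5.10: Fujitsu fmd.conf patch
"""
AFTER = """\
118550-01 SunOS 5.10: pcipsy Patch
118371-02 SunOS 5.10: elfsign Patch
119728-01 SunOS 5.10: Fujitsu fmd.conf patch
"""

PATCH_ID = re.compile(r"\d{6}-\d{2}")


def patch_ids(analyze_output):
    """Extract patch IDs (first column) from analyze-style output."""
    ids = []
    for line in analyze_output.splitlines():
        fields = line.split(maxsplit=1)
        if fields and PATCH_ID.fullmatch(fields[0]):
            ids.append(fields[0])
    return ids


def applied_between(before, after):
    """Patches listed in the first run that are gone in the second."""
    still_needed = set(patch_ids(after))
    return [p for p in patch_ids(before) if p not in still_needed]


print(applied_between(BEFORE, AFTER))
```

For the runs above this reports only 119015-02, which matches the "has been applied" message; the other three patches are still listed as needed.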


Keep in mind that the logs indicated by the smpatch(1M) messages above still have to be reviewed carefully, and that for certain patches to take effect, such as a kernel update patch, the system will have to be rebooted. In the example above, some patches might not be relevant to the hardware configuration of this system.

For more reference see:

May 12 2005, 10:32:22 AM EDT Permalink

The Start of the Beginning ...

Hi, my name is Carlos A. Morillo and I am a Member of Technical Staff working for Product Technical Support in the Americas Cluster team. I am based in Somerset, New Jersey, United States and I have been with Sun close to nine years. May 20th is my anniversary, just around the corner Friday next week.

Before joining PTS I held different positions, including Corporate Systems Support Engineer, Consulting Engineer and Systems Support Engineer for Sun Support Services in the US.

Previously I worked for ECCS, Inc. Tinton Falls, New Jersey as a Software Engineer, mostly doing Systems Programming, Graphical User Interface development and Device Drivers Sustaining mainly in Solaris on SPARC using C++, X11/Motif for about four years. ECCS, Inc. also known as "Storage Engine" used to design, build and manufacture Highly Available Storage Solutions for the Open Systems Server Market using RAID technology.

Prior to ECCS, I worked for Hewlett Packard and IBM in Caracas, Venezuela where I worked as a HP/UX-HP Workstations and IBM AIX/RS6000 Technical Support Engineer respectively, and prior to that I worked for the Sun distributor in Caracas, Venezuela for about four years.

I received a Computer Science Engineering degree from "Simon Bolivar" University in Caracas, Venezuela on January 29th 1988.
May 12 2005, 09:20:00 AM EDT Permalink