Friday Sep 18, 2009

The new Patching Pre-flight Checks ('ppc') tool is now available to all customers who have a support contract.

The idea for this tool comes directly from customer feedback.  

The customer wanted to reduce the cost of patching Solaris systems by enabling more junior Sys Admins to successfully patch Solaris 10 zones systems.   Their concern was that potential zones patching issues in versions of Solaris prior to Solaris 10 8/07 (Update 4) meant that they needed to assign senior System Administrators to patch such systems to identify and resolve potential issues.  

Furthermore, the customer was concerned that such issues had the potential to derail planned maintenance windows - for example, if during the patching session an unexpected issue was encountered and the patching session couldn't be completed as planned.

To address these concerns, my colleague, Ronan O'Connor, has written the Patching Pre-flight Checks tool, 'ppc'.  It can be run prior to a planned patching session to check that the target system is in a clean state ready for patching.

It's important to understand the scope of the tool.  It checks a target system (and a patch set, if supplied) for a variety of inconsistencies which could cause problems.

It looks for leftover lock files from previously aborted patching or packaging operations, inconsistencies in the contents database, IDRs installed on the target system, zones "mountability", space issues, etc.  Some of these issues can occur on early versions of Solaris 10, particularly in a Zones environment.  Many of the underlying causes of such issues are fixed in the latest versions of the patch utility patches (119254 SPARC / 119255 x86), which is why we always recommend you apply the latest patch utility patches before applying other patches.

If you have a directory of patches to be applied, 'ppc' checks the integrity of those patches.  It also cross-checks whether any of the patches touch packages which have been locked down by IDRs on the system, and warns if there is a conflict.
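The IDR cross-check described above can be pictured with a minimal sketch.  This is not how 'ppc' itself is implemented; the function name, the data shapes, and the patch and IDR identifiers below are all made up for illustration.

```python
# Illustrative only: ppc's real inputs, logic, and messages are not shown in this post.

def find_idr_conflicts(patch_packages, idr_locked_packages):
    """Return {patch_id: {package: idr_name}} for every package that a patch
    would touch and that an installed IDR has locked down."""
    conflicts = {}
    for patch_id, packages in patch_packages.items():
        hits = {pkg: idr_locked_packages[pkg]
                for pkg in packages if pkg in idr_locked_packages}
        if hits:
            conflicts[patch_id] = hits
    return conflicts


if __name__ == "__main__":
    # Hypothetical patch set and IDR lock list.
    patches = {"123456-01": ["SUNWcsu", "SUNWcsr"], "123457-02": ["SUNWzoneu"]}
    locked = {"SUNWcsr": "IDR000001-01"}
    for patch_id, hits in find_idr_conflicts(patches, locked).items():
        for pkg, idr in sorted(hits.items()):
            print(f"WARNING: {patch_id} patches {pkg}, which is locked by {idr}")
```

A patch with no locked packages simply does not appear in the result, so an empty result means no IDR conflicts were found for the supplied patch set.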

The 'ppc' Release Notes provide information to help interpret the messages produced.

The idea is that 'ppc' can be run by a junior Sys Admin prior to a planned patching session, and any potential issues uncovered can then be analyzed by a more experienced Sys Admin.  This helps avoid nasty surprises during patching sessions and also helps to reduce the level of expertise required to patch Solaris systems, leading to cost savings for customers.

It is outside the scope of the 'ppc' tool to do root cause analysis of why the inconsistency arose or what actions may be needed, if any, to correct the situation.

If 'ppc' returns without noting any problems, you can be pretty confident that the patching session will succeed.  If 'ppc' notes potential issues, they can be investigated prior to the planned maintenance window.

The next version of 'ppc' will include a Zones consistency check to verify that all zones are at a consistent patch level.   It will also contain a more sophisticated space checking algorithm.  There's no planned release date yet for Version "2.0" as we're awaiting feedback on Version 1.0.x first.

Some of the ideas in 'ppc' may find their way back into 'patchadd', although it's probably appropriate to keep 'ppc' as a separate tool.

To download the Patching Pre-flight Checks tool, 'ppc', go to the 'ppc' thread on the Customer Patch Forum.  If you have not accessed the Customer Patch Forum before, please see my blog entry on the initial "secret-handshake" login. 

The Patching Pre-flight Checks tool, 'ppc', and the Customer Patch Forum are only available to customers with a support contract.

We're very interested in your feedback as to the usefulness of this tool and how you'd like to see 'ppc' develop going forward.

Many thanks to Ronan O'Connor for all his work on the tool!

Best Wishes,

Gerry Haskins
Director, Software Patch Services

Friday Aug 14, 2009

My colleague, Ed Clark, has made significant improvements to the Solaris 10 Recommended and Sun Alert patch clusters.  These improvements have just been released and are in the current clusters available to contract customers from the Patch Cluster & Patch Bundle Downloads on SunSolve.

Ed's improvements include:

  • Filtering out "false negatives" from the patch utility return codes, so that if the cluster install script returns "1", you know you've got a real problem which needs investigating.   As you may know, the Solaris patch utility, 'patchadd', can return errors for some acceptable situations - for example, if the patch is already applied to the system, or a later revision of the patch or a patch which obsoletes it is already applied to the system, or none of the packages in the patch are on the target system (e.g. because a reduced Install Metacluster was used to install it or the system has been security hardened by package removal), etc.   Such conditions are acceptable "errors" which do not usually require further investigation by the user.  By filtering these conditions out, if the 'installcluster' script returns "1", you know it isn't because of one of these acceptable "errors", and therefore you need to look at the logfiles to find out what's gone wrong.  For further information, please see the cluster README and Analyzing a patchadd or patchrm Failure in the Solaris OS.
  • The new 'installcluster' script will exit as soon as it encounters an unexpected failure - i.e. not one of the acceptable "errors" mentioned above.  This prevents potentially compounding issues by attempting to apply further patches.
  • The new 'installcluster' script includes context intelligence for patching operations.   It informs the user when zones need to be halted, and it provides phased installation to handle patches which absolutely require an immediate reboot before further patches can be applied.  Such interim reboots are only needed when patching a live boot environment on a system below Kernel patch 118833-36 (SPARC) / 118855-36 (x86), as well as the earlier interim reboot required on x86 related to 'libc.so' patches and Kernel patch 118844-14.  On systems below these patch levels, the 'installcluster' script will stop at the appropriate point when patching the live boot environment, and inform the user to reboot and re-invoke the 'installcluster' script.  (The old cluster install script simply tried to carry on blindly past such interim reboots, spewing out error messages, although code in the relevant patches prevented any harm from being done.)  These interim reboots, when required, are dealt with relatively early in the cluster install sequence so that once completed, the Sys Admin can leave the rest of the installation to finish unattended and move onto other systems.
  • The new 'installcluster' script provides better integration with Solaris Live Upgrade as the user can now specify the Live Upgrade alternate boot environment to patch by name.
  • The new 'installcluster' script performs space checking prior to installing each patch, and will halt if it believes there is insufficient space to complete the installation successfully.  For example, this helps avoid non-global zones getting out of sync regarding patch levels with respect to the global zone.  This is an important enhancement as running out of space during patching can potentially leave the system in an inconsistent state and is to be avoided.  Even removing a patch requires space, so immediate removal of a patch which has failed to apply correctly due to space issues should be avoided until sufficient space is freed up and potential issues caused by its partial installation investigated - for example, was the undo.Z file successfully created to enable backout ? (Tip: It may be better to retry the patch installation once space has been freed up rather than patch removal in such circumstances.  Contact Sun Support for instructions if you encounter such issues.).   The space checking enhancements in the 'installcluster' script are designed to prevent such problems occurring.
  • The messages and log files produced by the 'installcluster' script are clear and well structured.  For example, a "failed" log is created if a patch fails to apply.  See the Cluster README for further information.
  • The 'patch_order' file places patches in an optimal order for installation to avoid known issues - for example, the patch utility patches are installed as early in the sequence as possible to avoid hitting patch installation bugs which are fixed in the patch utility patches, and the Kernel patch procedural script override patch, 125555 (SPARC) / 125556 (x86), is ordered prior to 137137-09 (SPARC) / 137138-09 (x86) to resolve some known issues.  When patching an alternate boot environment (which is recommended), a small sub-set of pre-requisite patches, primarily the patch utility patches, need to be applied to the live boot environment to ensure correct patching operation.  The 'installcluster' script will check for these pre-requisite patches and halt installation if they are not present, advising the user of the 'installcluster' script option to use to install these pre-requisite patches.   Further patches may need to be installed on the live boot environment to support Live Upgrade.  See the cluster README for further information.
  • The patches have been moved to a 'patches' sub-directory, to de-clutter the top level directory of the unzipped cluster.
  • Please see the cluster README file for further information.  Customers should read the cluster README file and look at the Special Install Instructions in the patches within the cluster prior to installation.
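To make the first two improvements concrete, here is a rough sketch of the idea: return codes for the acceptable "errors" are filtered out, and the run stops at the first unexpected failure.  This is not the 'installcluster' source.  The function names are invented, and the numeric return codes in the table are illustrative assumptions which should be checked against the 'patchadd' documentation rather than taken from this post.

```python
# Illustrative values only: verify the real patchadd return codes before relying on them.
ACCEPTABLE_CODES = {
    2: "patch (or a later revision) already applied",
    6: "patch is obsoleted by an installed patch",
    8: "none of the patch's packages are installed",
}


def install_cluster(patch_order, apply_patch, acceptable=ACCEPTABLE_CODES):
    """Apply patches in order and return (exit_status, log).

    apply_patch is a caller-supplied function that takes a patch ID and
    returns the patch utility's return code for it.
    """
    log = []
    for patch_id in patch_order:
        code = apply_patch(patch_id)
        if code == 0:
            log.append((patch_id, "installed"))
        elif code in acceptable:
            # Acceptable "error": note it and carry on with the next patch.
            log.append((patch_id, "skipped: " + acceptable[code]))
        else:
            # Unexpected failure: stop now so later patches cannot compound it.
            log.append((patch_id, f"failed: return code {code}"))
            return 1, log
    return 0, log


if __name__ == "__main__":
    # Simulated return codes for four hypothetical patches.
    simulated = {"A-01": 0, "B-02": 2, "C-03": 5, "D-04": 0}
    status, log = install_cluster(list(simulated), simulated.get)
    for patch_id, outcome in log:
        print(patch_id, outcome)
    print("exit status:", status)
```

In the simulated run, the fourth patch is never attempted because the third one fails with a code that is not in the acceptable table.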

I really want to thank Ed Clark for the enormous amount of thought and effort he has put into improving the cluster installation experience.   The work he's done on the Solaris 10 Recommended and Sun Alert patch cluster is a continuation of his previous work on the Solaris Update Patch Bundles and the Solaris 10 Live Upgrade Zones Starter Patch Bundle.  Nice work, Ed!

While the 'installcluster' script is copyrighted, I am happy for customers to use it, and the 'patch_order' file, as a starting point for their own customized patch bundles, so long as it is for their own use and is not to be given to a 3rd party or used for commercial gain (e.g. by a 3rd party maintainer or 3rd party commercial automation tool).

We have also made significant improvements to the back end processes to ensure higher and more consistent cluster quality. 

Originally, the clusters were created by the Patch Operations and Distribution (POD) team after patch release.  The POD Cluster QA process left a lot to be desired, resulting in inconsistent cluster quality.   To plug this gap, my Patch System Test team have been testing the clusters for several years, but the old process only allowed us to test them in parallel with their release, which meant that we found issues at the same time that early downloaders of the cluster encountered them.  Although we ensured such issues were fixed as quickly as possible, it still obviously compromised our customers' experience.

In the new process, the clusters are routed to Patch System Test (PST) prior to release.  PST run a transformation script on them to optimize the patch installation order, etc.  The clusters will only be released once they have passed PST testing.  This should ensure higher and more consistent quality for customers.  Work is continuing to move the entire patch cluster generation process to PST, although these future backend enhancements in this regard should be invisible to customers.

Thursday Jun 18, 2009

The Solaris 10 5/09 (Update 7) patch bundle is now available for download from the SunSolve Patch Cluster & Patch Bundle Download Page.  Click on the "Solaris Update Patch Bundles" link.

As with previous patch bundles, it contains the patches which are included in the corresponding Solaris Update, in this case Solaris 10 5/09 (Update 7).

This is useful for Sys Admins who wish to bring all their systems up to the same patch level as the Solaris Update without wanting to upgrade to the release - for example, due to change control policy restrictions in their organizations.

See previous blog entries for previous Solaris Update patch bundles for further information.

Wednesday Jun 17, 2009

The Zones Parallel Patching feature is now available in the latest Solaris 10 patch utilities patch, 119254-66 (SPARC) and 119255-66 (x86).

This is available for use on all Solaris 10 systems. 

Simply install this patch, set the maximum number of non-global zones to be patched in parallel in the config file /etc/patch/pdo.conf, and away you go.

Prior to this feature, each non-global zone was patched sequentially, leading to unnecessarily long patching times for zones systems.  (Sequential patching remains the default behavior unless the config file is edited to enable Zones Parallel Patching.)

With this feature invoked, the global zone continues to be patched first, but then the non-global zones can be patched in parallel, leading to significant performance gains in patching operations on Zones systems.

While the performance gain is dependent on a number of factors, including the number of non-global zones, the number of on-line CPUs, the speed of the system, the I/O configuration of the system, etc., a performance gain of ca. 300% can typically be expected for patching the non-global zones - e.g. on a T2000 with 5 sparse root non-global zones.

Here's the relevant note from the patch README file:

NOTE 10: 119255-66 is the first revision of the patch utilities to deliver "zones parallel patching".
          This new functionality allows multiple non-global zones to be patched in parallel by patchadd.
          Prior to revision 66, patchadd would patch all applicable non-global zones sequentially,
          that is one after another. With zones parallel patching, a sysadmin can now set the number
          of zones to patch in parallel in a new configuration file for patchadd called /etc/patch/pdo.conf.

          The two factors that affect the number of non-global zones that can be patched in parallel are
          1. Number of on-line CPUs
          2. The value of num_proc in /etc/patch/pdo.conf

          If the value of num_proc is less than or equal to 1.5 times the number of on line CPUs,
          then patchadd limits the maximum number of non-global zones that will be patched in
          parallel to num_proc. If the value of num_proc is greater than 1.5 times the number of on line CPUs,
          then patchadd limits the maximum number of non-global zones that will be patched in parallel
          to 1.5 times the number of on line CPUs. Note that patchadd will patch all applicable non-global
          zones on a system, the above description outlines only how patchadd determines the
          maximum number of job slots to be used during parallel patching of non-global zones.

          An example of this in operation would be where:
          num_proc=8
          and number of on line CPU's is 4

          In this case the maximum setting for num_proc would be 6, that is the maximum number
          of zones that could be patched in parallel is 6.  If there are more than this number of non-global zones on the
          system, the first 6 will be patched in parallel, then the remaining non-global zones will be patched
          as processes finish patching the first 6 non-global zones.   Only one patch process will be used for each
          non-global zone, so if there are less than 6 non-global zones on the system, then only the number of processes
          equal to the number of non-global zones will be initiated.

          Please see comments in /etc/patch/pdo.conf for more details on setting num_proc.
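The rule in the README note boils down to a small calculation, sketched below.  The function names are my own, and rounding 1.5 times the CPU count down to a whole number is an assumption, since the note only gives an example with an even number of CPUs.

```python
def max_parallel_zones(num_proc, online_cpus):
    """Maximum number of job slots: num_proc, capped at 1.5 x on-line CPUs.

    Rounding the cap down is an assumption; the README does not say how
    a fractional result is handled.
    """
    cap = (3 * online_cpus) // 2
    return min(num_proc, cap)


def patch_processes(num_proc, online_cpus, non_global_zones):
    """Processes started at once: one per zone, never more than the slot limit."""
    return min(max_parallel_zones(num_proc, online_cpus), non_global_zones)


if __name__ == "__main__":
    print(max_parallel_zones(8, 4))    # README example: num_proc=8, 4 CPUs
    print(patch_processes(8, 4, 3))    # fewer zones than slots
    print(patch_processes(8, 4, 10))   # more zones than slots
```

With num_proc=8 and 4 on-line CPUs the limit works out at 6, matching the README example; with only 3 non-global zones on that system, just 3 processes would be started.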

I would like to thank Ed Clark and Enda O'Connor from my own team for all their work in developing and testing Zones Parallel Patching.

I would also like to thank Jon Bowman, Arindam Sarkar, and the rest of the RPE (Sustaining) Install team for all their work in getting this feature integrated into the patch utilities and delivered to production.

I would also like to thank our selected key customers who kindly Beta tested the feature for us.

I believe this feature is an important milestone in improving our customers' patching experience in a Zones environment as it addresses a long standing customer complaint on Zones patching performance.

Enjoy!

Wednesday Feb 11, 2009

Thanks to my colleague Enda O'Connor, who has made p7zip available for Solaris 8 SPARC, it's now possible to upgrade directly from Solaris 8 SPARC to the latest Solaris 10 Update releases such as Solaris 10 5/08 and Solaris 10 10/08.  See http://sunsolve.sun.com/search/document.do?assetkey=1-9-250526-1 and http://sunsolve.sun.com/search/document.do?assetkey=1-61-72099-1 for details.

Previously, due to the lack of p7zip on Solaris 8, customers needed to perform an interim upgrade to Solaris 9 or an earlier Solaris 10 release before upgrading to the latest Solaris 10 release.

Monday Jan 05, 2009

This blog entry expands on a previous blog entry regarding Solaris patch entitlement.  

The Solaris patch entitlement policy is available on http://sunsolve.sun.com/search/document.do?assetkey=1-61-203648-1. "Entitlement" refers to patches which require you to have a valid support contract to access them.

Solaris changed its business model a few years ago from selling Solaris and providing patches for free to a model of giving away the Solaris releases for free and charging for patches.

The Solaris patch entitlement policy applies to all Solaris Operating System patches.  It does not necessarily apply to middleware or application layer product patches which may be installed on top of Solaris, such as SunStudio, Java, etc.

The Solaris patch entitlement policy is that the following Solaris OS patches will remain available irrespective of whether or not you have a valid support contract:

  • the specific patch revisions which introduce all new security fixes
  • the specific patch revisions which introduce certain hardware support
  • all revisions of Solaris patch utility, smpatch, and Update Manager patches to ensure correct patch application
  • the specific pre-requisite patch revisions for Live Upgrade
  • the specific pre-requisite patch revisions for certain Sun software application products
  • all revisions of patches which patch products which are both bundled as part of Solaris and also released as separate products which don't enforce patch entitlement
  • a small number of other specific patch revisions at the discretion of Sun
  • any patch revision explicitly required by any of the above patches

Other Solaris OS patches require that you have a valid support contract to access them.

All fixes will be available for free in the next Solaris 10 Update release, so if you are not willing to pay for a support contract, you can still get the fixes by installing or upgrading to the next Solaris 10 Update release.  You'll just need to wait for it to be released.

The key point is that if you may need timely access to a patch which fixes a critical non-security issue, then you need to have a valid support contract for each system you may wish to patch.  You also need to have a valid support contract in order to get telephone support or fixes coded for any issues which are unique to your environment.

So it's highly advisable for you to have a valid support contract in place for each production system.

If you are a home user for example, and don't want to go to the expense of buying a support contract, using OpenSolaris or waiting for the next Solaris 10 Update release are valid options.

This policy is not changing.

What is changing is the implementation of patch entitlement to ensure it matches the policy.  Currently, circa 60% of Solaris OS patches are available without a support contract, including most of the key patches.  Under the new entitlement implementation, 18% of Solaris OS patches will remain available without a support contract.  The rest will require a valid support contract to access. 

Any of the following support contracts will provide you with access to all Solaris OS patches and patch clusters: a Solaris subscription, a Software Support Contract, a Sun System Service Plan for Solaris, a Sun Spectrum Storage Plan, or a Sun Spectrum Enterprise Service Plan.  Since the names of the support contracts change from time-to-time, this list may change.

If you are running Solaris on Sun Hardware, I suggest you consider purchasing a SunSpectrum System Plan.  This will cover both your HW and OS with one simple support contract.

If you are running Solaris on non-Sun hardware, you should consider a Solaris Subscription Support Plan, which is available on-line from just $324 per year.

Remember, you need a support contract for each system you wish to patch, so if you need more of a site-wide support plan, Solaris Everywhere is a good choice. 

BTW: It's important to remember that hardware warranties do not cover software support or access to Solaris patches.

The new implementation will roll out in phases, starting this week.

You should check that you have valid support contracts in place for each system you may need to patch.  Please do not wait until you need a patch to put the support contract in place. There is a latency of several days between subscribing for a support contract and patch access being granted.  Support for your production Operating System really isn't something you should play "chicken" with.

The new Solaris OS patch entitlement implementation roll-out should be completely transparent if you have a valid support contract for each system you wish to patch.

A PodCast talking about the above and the Solaris 8 Vintage program which commences April 1, 2009 is available here

Wednesday Dec 17, 2008

The following is now available as Infodoc 249046:


What follows is an open letter to customers in response to customer confusion over how to handle the "rebootimmediate" and "reconfigimmediate" flags specified in some patches.

Despite the READMEs of patch clusters which contain such patches clearly stating that during a patching session, a reboot is only required in exceptional and documented circumstances, it has come to my attention that some customers are initiating reboots after applying every single patch in a patch set which specifies such flags.  Not surprisingly, such customers are concerned at the length of time this takes.

Open Letter with definitive interpretation of the "rebootimmediate" and "reconfigimmediate" patch flags

To whom it may concern,

Summary: When patching a live boot environment, it is usually OK to apply any number of patches before performing a single reboot at the end, even if multiple patches specify "rebootimmediate" or "reconfigimmediate".  On the rare occasion when it is found that this is not possible, specifically for 118833-36 (SPARC) and 118855-36 (x86) and 118844-14+ (x86), code will typically be inserted into the relevant patches to prevent the application of further patches which could cause problems.  Use of Live Upgrade to patch an inactive boot environment is recommended as it avoids the need for interim reboots for even these atypical patches.  Details below.

The "reboot" metadata flags which may be contained in the patch 'pkginfo' file(s) have the following meaning:

rebootafter - a reboot is required to activate some of the content delivered in the patch, but the system remains in a consistent state until the reboot is performed.

reconfigafter - a reconfiguration reboot is required to activate some of the content in the patch, but the system remains in a consistent state until the reconfiguration reboot is performed.

rebootimmediate - the system is in a potentially inconsistent state until the system is rebooted.  The objects applied in the patch are potentially inconsistent with processes running in memory.  Normal production must not be resumed until a reboot takes place to bring the system back into a fully consistent state.  However, since the footprint of the patch utilities is relatively small, it is normally OK to continue to apply further patches before initiating the reboot.   In cases where this is not OK, the patch in question will typically contain additional code to prevent further patches from being applied until the reboot takes place*.  Since the system is in a potentially inconsistent state, it's advisable to avoid running any additional processes until the reboot takes place.  If patch automation tools are being used to apply "rebootimmediate" or "reconfigimmediate" patches, it's up to the automation tools' QA to ensure that their additional code footprint does not hit the potential inconsistent system state when applying such patches.

reconfigimmediate - exactly the same as rebootimmediate, except a reconfiguration reboot is required.

*This is the case with Kernel patch 118833-36 (SPARC) / 118855-36 (x86), whose patch scripts replace 'patchadd' with a no-op telling the user to reboot the system.  The only other known reboot required before further patching can be done is specific to x86, and only if the system is running at a Kernel patch level below 118844-14.  A later revision of 118844, e.g. 118844-20, needs to be applied and the system rebooted to ensure the Kernel running in memory is compatible with library changes supplied in the libc patch 121208-02.  The prepatch script in 121208-02 and -03, and 118855-xx which obsoletes it, contains code to ensure 118844-14 or later is installed and active on the system.  (BTW, 118844-14 wasn't released. 118844-20 is recommended to fulfill the libc compatibility requirement.)
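As a rough illustration of the flag definitions above, the sketch below summarizes what a patching session needs once all patches have been applied.  The function and key names are invented.  Treating the final reboot as a reconfiguration reboot whenever any patch asks for one is my assumption, and the special cases in the footnote (where further patching is blocked until a reboot) are deliberately not modelled.

```python
# The four flag names come from the definitions above; everything else is illustrative.
KNOWN_FLAGS = {"rebootafter", "reconfigafter", "rebootimmediate", "reconfigimmediate"}


def reboot_plan(flags):
    """Summarize the reboot needs of a session from the flags of the patches applied."""
    present = set(flags) & KNOWN_FLAGS
    return {
        # Any of the four flags means a reboot is needed at the end of the session.
        "reboot_needed": bool(present),
        # Assumption: one reconfig flag makes the final reboot a reconfiguration reboot.
        "reconfiguration_reboot": bool(present & {"reconfigafter", "reconfigimmediate"}),
        # "immediate" flags: normal production must not resume until the reboot.
        "hold_production_until_reboot": bool(
            present & {"rebootimmediate", "reconfigimmediate"}
        ),
    }


if __name__ == "__main__":
    session_flags = ["rebootafter", "rebootimmediate", "reconfigafter"]
    for key, value in reboot_plan(session_flags).items():
        print(key, value)
```

The point of the sketch is that the flags accumulate into a single reboot decision at the end of the session, rather than one reboot per flagged patch.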

UPDATE, Jan 20, 2009: Murphy's Law strikes again!  There's currently an issue, CR 6704883, with the "Sun Fibre Channel Device Drivers" patches 125184-05, -06, -07, and -08 (SPARC) and 125185-05, -06, -07, and -08 (x86) as described in Sun Alert 238630.  The fix for this issue is in rev-09 of the patches which is currently available as a T-Patch and will be released shortly.  Rev-09 of the patches uses modloading in its prepatch script to avoid the issue.  In the meantime, a workaround is to apply the affected patches last, immediately prior to rebooting the system.  The patches in the Solaris 10 10/08 patch bundle were specifically ordered to avoid this issue.  Where such issues are found, SunAlerts are published and the issue fixed.

Remember, patches can be downloaded and installed individually.  Therefore, each patch which requires a reboot must specify the reboot requirements.  But if patches are installed collectively in the same patching session, for example, as part of a patch cluster, then the install instructions contained in the cluster README file take precedence - e.g. that reboots are only required *during* patching sessions for the specific cases mentioned above.

Since the above patches were created, a significant enhancement has been made to the Solaris patch utilities called Deferred Activation Patching.  This enhancement is not retrospective, so the above historical problematic patches remain.

Deferred Activation Patching

The problem with the above atypical patches is that the new code they deliver may be invoked by the original patchadd code and the utilities it calls *during* patch installation.  A patch may patch many packages.  The packages are applied in alphabetic order.  In a Zones environment, the patch is applied to the global zone first, then to each non-global zone.

In the case of 118833-36 (SPARC) / 118855-36 (x86), the new versions of the libdevinfo.so.1 and libsec.so.1 libraries delivered in the patch could be invoked by patchadd and are potentially incompatible with the processes running in memory.

The solution devised in the patch scripts contained in 118833-36 (SPARC) / 118855-36 (x86) is to overlay mount the old objects on top of the newly laid down objects using the loopback filesystem (lofs).  This ensures that the system remains in a consistent state *during* the patch process as the old library versions which are compatible with what's running in memory will be called.

If further patches which patch the same objects as 118833-36 (SPARC) / 118855-36 (x86) were applied at this point, they would patch the overlay mounted objects instead of the patched objects.  To prevent this, 118833-36 (SPARC) / 118855-36 (x86) replace 'patchadd' with a no-op telling the customer to reboot the system before applying any further patches.

During reboot, the loopback filesystem mounts are torn down exposing the patched objects.  Further patching can now continue as the system is in a fully consistent state.

This loopback filesystem mount solution is the basis of Deferred Activation Patching.  After patch 118833-36 (SPARC) / 118855-36 (x86) was released, the solution was perfected and moved to the patch utilities.  The few patches which require application using Deferred Activation Patching specify the SUNW_PATCH_SAFE_MODE=true flag in their pkginfo files.  The solution was enhanced so that any subsequent patch applied prior to a reboot of the system, which patches the same objects as a patch explicitly specifying Deferred Activation Patching, will itself be automatically applied in Deferred Activation Patching mode.   This is known as implicit Deferred Activation Patching and enables other patches to be applied on top of a patch applied using Deferred Activation Patching without the need for an intervening reboot.  When a patch specifying Deferred Activation Patching mode is applied to a system, the user will see lots of loopback filesystem mounts on the system until such time as the reboot takes place.  Upon reboot, the loopback filesystem mounts are torn down, exposing the newly patched objects.
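The explicit and implicit modes described above can be sketched as a simple decision per patch.  This is only a model of the behaviour as described in this letter, not the patch utilities' code.  The function name, the tuple layout, and the patch labels are invented, and only objects delivered by explicitly flagged patches are tracked, which is a literal reading of the text.

```python
def plan_patch_modes(patches):
    """patches: list of (patch_id, safe_mode_flag, objects), in application order,
    all applied before the next reboot.  Returns [(patch_id, mode)]."""
    deferred_objects = set()   # objects delivered by explicitly flagged patches
    plan = []
    for patch_id, safe_mode, objects in patches:
        objects = set(objects)
        if safe_mode:
            # Patch sets SUNW_PATCH_SAFE_MODE=true in its pkginfo.
            mode = "explicit-deferred"
            deferred_objects |= objects
        elif objects & deferred_objects:
            # Touches an object already patched in deferred mode.
            mode = "implicit-deferred"
        else:
            mode = "normal"
        plan.append((patch_id, mode))
    return plan


if __name__ == "__main__":
    # Hypothetical patches; object names are shortened for readability.
    session = [
        ("kernel-patch", True, ["libdevinfo.so.1", "libsec.so.1"]),
        ("later-patch", False, ["libsec.so.1"]),
        ("unrelated-patch", False, ["some-other-object"]),
    ]
    for patch_id, mode in plan_patch_modes(session):
        print(patch_id, mode)
```

Order matters in this model: a patch applied before the explicitly flagged one is applied normally, even if it touches the same objects.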

Kernel patch 12001[12]-14 which is included in Solaris 10 8/07 (Update 4), Kernel patch 12712[78]-11 which is included in Solaris 10 5/08 (Update 5), and Kernel patch 13713[78]-09 which is included in Solaris 10 10/08 (Update 6), are currently the only patches which specify application in Deferred Activation Patching mode.  Future Kernel patches included in future Solaris 10 Update releases are the likely candidates requiring application using Deferred Activation Patching.

With the introduction of Deferred Activation Patching, it is highly unlikely that future patches will require an interim reboot before further patches can be applied.

The problems with the system getting into an inconsistent state *during* patching (which Deferred Activation Patching resolves) could only occur when patching a live boot environment as it's due to the interaction between newly patched objects which are incompatible with processes running in memory being invoked prior to the system being rebooted.

To avoid this and other issues, Sun strongly recommends the use of Live Upgrade to patch (or upgrade) an inactive boot environment, which dramatically reduces the risk and downtime associated with patching.  For example, even though Deferred Activation Patching resolves the inconsistency issue, patching a live boot environment takes time and the system is out of production.

Using Live Upgrade, the inactive boot environment is patched, potentially while the system is still in production.  Issues such as those described above with Kernel patch 118833-36 (SPARC) / 118855-36 (x86), and 118844-20 (x86) simply don't apply when patching an inactive boot environment as there is no interaction between the objects being patched and the processes running in memory, as all the calls patchadd makes will be to the objects on the live partition, not the patched objects on the inactive partition.  A single reboot is required to boot into the new boot environment.

Another advantage of Live Upgrade is that if a problem arises with the new boot environment for whatever reason, the user can simply reboot back into the old boot environment to enable production to resume and the issues with the now inactive boot environment can be resolved later.

Best Wishes,

Gerry Haskins
Director, Software Patch Services

Thursday Dec 04, 2008

New title, same role, same me

I was promoted to Director, Software Patch Services in September.  The last couple of months have been quite hectic, as I've suddenly got a whole new bunch of buddies in Marketing and elsewhere who want some of my time.  That's a good thing, and I believe it will help me to drive and co-ordinate improvements to the patching experience for you, our customers.

Resources are limited and, as always, I'm interested in getting your thoughts as to what areas I should concentrate on next.  

Some of the stuff we're currently working on is outlined below as well as other information which I hope you will find useful.

Solaris 10 10/08 Patch Bundle

The Solaris 10 10/08 Patch Bundle, which delivers the equivalent set of patches to the Solaris 10 10/08 (Update 6) release image, is now available from SunSolve.  See my blog entry below on the Solaris 10 5/08 (Update 5) Patch Bundle for further information on why we produce it, what it contains, why you might wish to use it, how to download it, etc.

Recommended and Sun Alert patch cluster contents updated

I discussed the purpose of, and difference between, the Solaris Recommended and Sun Alert patch clusters in a previous blog posting. To recap:

The "Recommended" Cluster contains the latest revision of any Solaris OS patch which addresses a Sun Alert issue.  That is, a fix for a Security, Data Corruption, or System Availability issue.  The cluster also contains the latest revision of the patch utility patches to ensure correct patch application and any patch required by any other patch in the cluster.

The Sun Alert Cluster is newer, and contains the minimum revision of any Solaris OS patch which addresses a Sun Alert issue. The cluster also contains the latest revision of the patch utility patches to ensure correct patch application and any patch required by any other patch in the cluster.  Therefore, the Sun Alert Cluster provides the minimum amount of change to fix all Solaris OS Sun Alert issues. 
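
To make the difference concrete, here is a rough Python sketch of the two selection rules.  It assumes each patch revision accumulates all earlier revisions, so the minimum revision needed to fix every Sun Alert issue is the highest revision that introduced a Sun Alert fix.  The PatchIDs and revision numbers are made up for illustration, and the sketch leaves out the patch utility patches and required patches which both clusters also include.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatchHistory:
    base_id: str           # first 6 digits of the PatchID (made-up values below)
    latest_rev: int        # newest released revision
    sun_alert_revs: tuple  # revisions which introduced a Sun Alert fix


def build_clusters(histories):
    """Return (recommended, sun_alert) PatchID lists for the given histories."""
    # Only patches which address at least one Sun Alert issue qualify.
    qualifying = [p for p in histories if p.sun_alert_revs]
    # "Recommended": the latest revision of each qualifying patch.
    recommended = [f"{p.base_id}-{p.latest_rev:02d}" for p in qualifying]
    # Sun Alert: the lowest revision which still contains every Sun Alert fix.
    sun_alert = [f"{p.base_id}-{max(p.sun_alert_revs):02d}" for p in qualifying]
    return recommended, sun_alert


histories = [
    PatchHistory("900001", 7, (2, 4)),  # Sun Alert fixes arrived in -02 and -04
    PatchHistory("900002", 3, ()),      # no Sun Alert fix, so in neither cluster
    PatchHistory("900003", 5, (5,)),    # latest revision is itself the fix
]
recommended, sun_alert = build_clusters(histories)
print(recommended)  # ['900001-07', '900003-05']
print(sun_alert)    # ['900001-04', '900003-05']
```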

Both clusters are updated whenever a new patch meeting their inclusion criteria is released.  The Sun Alert Cluster changes less frequently than the "Recommended" Cluster as it contains only what is really needed to address Sun Alert issues and apply the patches.

One of my team members has been reconciling the cluster contents against the Sun Alert reports and the cluster contents have been updated as a result.  Some issues were found, largely to do with patches for things like GNOME which are also part of the Solaris OS.  A process has been put in place to ensure the cluster contents match the patches specified in the Sun Alert reports.

Keeping as up to date as possible with the Sun Alert or Recommended Cluster contents is advisable.   Remember also to keep firmware up to date.

BTW: The monthly EIS (Enterprise Installation Standards) patch baseline is based upon the Recommended Cluster contents but also includes ca. 150 additional patches to address irritants which are not Sun Alert fixes and includes patches for SunCluster, SunVTS, etc.  The monthly EIS patch baselines are available through xVM Ops Center and Sun Proactive Services.

I am planning to merge the Recommended and Sun Alert patch clusters into a single cluster using the Sun Alert cluster criteria as having two very similar clusters tends to confuse customers unnecessarily.  

I also intend to merge the two cluster pages on SunSolve as one is essentially a better formatted subset of the other.

ZFS and Zones features fully contained in patches

As I've mentioned previously, there's effectively a single customer visible code branch for each Solaris named release.  That means that there's one set of patches for all of Solaris 10, a separate set for Solaris 9, and a separate set for Solaris 8.  Within a named release, e.g. Solaris 10, the same set of patches will apply to any of the Solaris 10 releases, from the original Solaris 10 3/05 release right up to the current Solaris 10 10/08 (Update 6) release.  This simplifies System Administration and enables Sun to provide very long term support at reasonable cost for each Solaris named release. 

A consequence of effectively having a single code branch for each Solaris named release is that any change to pre-existing packages will be delivered in patch format.

New features are typically only added to the current Solaris named release, which is currently Solaris 10.  (They are also available via OpenSolaris.)

This means that if new features don't add any new packages, then the entire feature functionality is fully available in patches.  Customers can utilize the new features by simply applying the appropriate patches to their existing Solaris 10 system.  This is the case with all current Zones and ZFS* functionality, including neat features like ZFS Root, ZFS Boot, and Zones "Update on Attach".

Other features which deliver new packages are only available from the Solaris Update release in which they were first included.  So, for example, if a new package was first delivered in Solaris 10 8/07 (Update 4), then a customer wishing to use that feature would need to install or upgrade to the Solaris 10 8/07 (Update 4) or subsequent update release image.   Such features are not available in patches.

*OK, we cheated with ZFS.  ZFS does deliver new packages, but they are streamed into existence from a patch.  This type of patch is called a "genesis" patch, but they are hard to perfect, so we don't intend to release any more "genesis" patches.

Improving Zones Patching Performance

Zones Parallel Patching

My team has been working with those awfully nice folks in the Sustaining organization to deliver a Zones Parallel Patching enhancement to the patch utilities to dramatically improve Zones patching performance.  We have a fully stable prototype which has been given to selected Beta customers to trial. 

For a simple T2000 with 5 sparse non-global zones, the performance improvement is >3x.  On systems with optimized I/O (as Zones patching is primarily I/O bound), we expect the performance improvement to be even better.  A configuration file will allow users to select how many Zones to patch in parallel.  This will typically equate to the number of processors or threads available on the target system.
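
To illustrate the general idea of capping how many Zones are patched at once, here is a generic Python sketch using a bounded worker pool.  This is not how the patch utilities implement the feature; 'patch_one_zone' is a stand-in for the real per-Zone patching work, and 'max_parallel' plays the role of the value a user would set in the configuration file.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def patch_zones_in_parallel(zones, patch_one_zone, max_parallel):
    """Run patch_one_zone over every zone, at most max_parallel at a time.

    Results come back in the same order as the zones were given.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(patch_one_zone, zones))


def pretend_to_patch(zone):
    # Stand-in for the real, I/O bound patching work on one non-global Zone.
    time.sleep(0.05)
    return f"{zone}: patched"


zones = [f"zone{i}" for i in range(1, 6)]  # 5 made-up non-global Zones
for line in patch_zones_in_parallel(zones, pretend_to_patch, max_parallel=2):
    print(line)
```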

The general release of this feature is planned for April 2009.

Zones "Update on Attach" 

The Kernel patch associated with Solaris 10 10/08 (Update 6), 137137-09 (SPARC) / 137138-09 (x86), contains some cool new features, such as ZFS Root, ZFS Boot, and Zones "Update on Attach".  Beware, installing this patch requires significant free disk space!  See Sun Alert http://sunsolve.sun.com/search/document.do?assetkey=1-66-246207-1

Zones "Update on Attach" is a very cool feature indeed.

For example, if the patch level of non-global Zones is out-of-sync with respect to the global Zone, e.g. because the non-global Zones ran out of disk space during patch application, Zones "Update on Attach" provides a very neat way to bring the Zones back into sync.  Simply detach the affected non-global Zones, apply Kernel patch 137137-09 (SPARC) / 137138-09 (x86) to the global Zone, and reattach the affected non-global Zones using 'zoneadm -z <zone-name> attach -u'.  The non-global Zones will be automagically updated to the same patch level as the global Zone.  Neat!

There are other interesting possibilities.  For example, detach all non-global Zones, apply an arbitrary set of patches to the global Zone (including 13713[78]-09), and reattach the non-global Zones using 'zoneadm -z <zone-name> attach -u'.  Voilà!  The non-global Zones will be automagically updated with all of the patches applied to the global Zone.  Way neat!  And more importantly, way faster than even the Zones Parallel Patching solution we're working on.  And even better, it's available now!  This could be a key solution for customers having difficulty completing patching updates on Zones systems during tight maintenance windows.
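
The detach / patch / reattach sequence can be sketched as a small Python "dry run" planner which only builds the command lines and executes nothing.  It assumes the non-global Zones have already been halted (a running Zone cannot be detached) and leaves booting them afterwards to the administrator.  The Zone names and the patch directory below are made up for the example.

```python
def update_on_attach_plan(zones, patch_dirs):
    """Return the command sequence as argument lists; nothing is executed."""
    plan = []
    # 1. Detach every affected non-global Zone (assumed to be halted already).
    for zone in zones:
        plan.append(["zoneadm", "-z", zone, "detach"])
    # 2. Apply the patches to the global Zone.
    for patch_dir in patch_dirs:
        plan.append(["patchadd", patch_dir])
    # 3. Reattach each Zone with -u so it is updated to the global Zone's level.
    for zone in zones:
        plan.append(["zoneadm", "-z", zone, "attach", "-u"])
    return plan


# Made-up Zone names and patch location, for illustration only.
for command in update_on_attach_plan(["web01", "db01"], ["/var/tmp/137137-09"]):
    print(" ".join(command))
```

Printing the plan first, rather than running it, gives a junior Sys Admin something to review before the maintenance window starts.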

We are working to explore potential caveats.  For example, when a patch is applied using 'patchadd' to a non-global zone, an "Undo.Z" file containing the data necessary to back out the patch is created specifically for each non-global zone to which the patch is applied.   Using Zones "Update on Attach" to patch non-global Zones will cause the "Undo.Z" file from the global Zone to be propagated to the non-global Zones.  This could theoretically cause issues if the patch is subsequently backed out (e.g. data from global Zone config files could potentially be merged into non-global Zone config files during patch backout which could potentially cause issues), although we've never actually encountered such an issue.  BTW: The same caveat applies to creating non-global Zones after the global Zone has been patched.  Again, we have yet to see this causing an actual issue, so it appears to be more of a theoretical caveat than a practical issue.

Improvements to 'smpatch' and Update Manager

The way the PatchPro analysis engine for 'smpatch' and Update Manager used to work was fine in theory, but in practice was what I call "a process with too many moving parts".   Too many steps had to happen correctly for the overall result to be correct.  In Six Sigma terms, there was too much error opportunity.  Occasionally, it would end up recommending a SPARC patch for an x86 system or a Solaris 8 patch for a Solaris 10 system.  Not surprisingly, its reputation suffered.

I'm pleased to say that a major overhaul to dramatically simplify the back end processing of 'smpatch' and Update Manager has just been rolled out by their engineering team.  The way 'smpatch' and Update Manager work is that Realization Detector(s) are associated with each patch.  These Realization Detectors determine whether it's appropriate to recommend a patch for application on a target system.  In the vast majority of cases, the Realization Detectors are simply comparing the packages contained in the patch to the packages installed on the system to see if the patch is applicable.  The enhancement is to replace these myriad Realization Detectors, which could potentially contain coding bugs, with a single Generic Realization Detector to map patch packages to packages on the target system.  It looks at the package name, package version, and package architecture fields (in pkginfo) for each package in the patch, and compares them to the same values for the packages installed on the target system.  If they match, the patch is recommended; otherwise it is not.  Guess what, this is exactly how 'patchadd' decides whether a patch is applicable or not when installing a patch.  It's also how 'pca' determines which patches to apply.
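
The comparison itself is simple enough to sketch in a few lines of Python.  This is only my illustration of the idea, not the actual Generic Realization Detector code.  It assumes that one matching package is enough for the patch to be recommended, and the package version strings below are examples rather than values taken from a real system.

```python
from typing import NamedTuple


class Pkg(NamedTuple):
    name: str     # PKG field in pkginfo
    version: str  # VERSION field in pkginfo
    arch: str     # ARCH field in pkginfo


def patch_is_applicable(patch_pkgs, installed_pkgs):
    """True if any package in the patch matches an installed package exactly.

    Assumption: a single name/version/architecture match is sufficient.
    """
    installed = set(installed_pkgs)
    return any(pkg in installed for pkg in patch_pkgs)


# A made-up SPARC system with two packages installed.
installed = [
    Pkg("SUNWcsu", "11.10.0,REV=2005.01.21.15.53", "sparc"),
    Pkg("SUNWcsr", "11.10.0,REV=2005.01.21.15.53", "sparc"),
]
sparc_patch = [Pkg("SUNWcsu", "11.10.0,REV=2005.01.21.15.53", "sparc")]
x86_patch = [Pkg("SUNWcsu", "11.10.0,REV=2005.01.21.16.34", "i386")]

print(patch_is_applicable(sparc_patch, installed))  # True
print(patch_is_applicable(x86_patch, installed))    # False
```

With one rule applied uniformly, an x86 patch can no longer be recommended for a SPARC system just because one detector among many had a bug.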

A few specialist Realization Detectors remain for a small number of patches which require special handling.

The changes to 'smpatch' and Update Manager should dramatically improve the reliability of these tools and the accuracy of their patching recommendations.

One remaining distinction between 'smpatch' / Update Manager and 'pca' is that 'pca' "knows" about all current Sun patches via the patchdiag.xref file, whereas 'smpatch' / Update Manager "knows" about all patches containing a 'patchinfo' file, including older patch revisions.  All Solaris OS and Java Enterprise System (middleware) patches contain a 'patchinfo' file.  These account for 49% of patches.  For patching the Solaris OS, the tools should produce similar results.  A decision was made not to "auto-include" all other patches for 'smpatch' and Update Manager, as it was felt that the explicit step of the patch creator including a non-blank PATCH_CORRECTS realization detector specification line in the 'patchinfo' file to signal that the patch was suitable for patch automation was potentially useful.  (Don't worry about what value the PATCH_CORRECTS field has.  This is overridden by the Generic Realization Detector in the vast majority of cases.  It has no meaning from a customer perspective.)

This enhancement is not an attempt to undermine 'pca'.  It's simply to improve 'smpatch' and Update Manager.  I will continue to work closely with Martin Paul to give him heads-ups on any initiative which may impact 'pca' and resolve any issues with patchdiag.xref.

One thing I want to do when I can free up some resources is a comparative study of the patching recommendations of the various available patch automation tools: 'smpatch' / Update Manager, 'pca', UCE (a.k.a. Sun Connection Satellite), xVM Ops Center*, and TLP (Traffic Light Patching).  TLP is used by Sun Proactive Services to provide tailored patching solutions for customers in conjunction with SRAS (Sun Risk Analysis Service) and the EIS (Enterprise Installation Standards) methodology.  The aim is to ensure that the patching recommendations of the various tools are coherent and consistent, with the higher value tools providing more sophisticated analysis.  It's part of my efforts to co-ordinate patching improvements to improve our customers' patching experience.

*xVM OC also utilizes the monthly EIS patch "baselines".

Same Patch Entitlement policy, new Patch Entitlement implementation

Solaris changed its business model a few years ago from selling Solaris and providing patches for free to a model of giving away the software releases for free and charging for patches. 

The policy is that patches delivering new security fixes will remain free to all customers, irrespective of whether or not they have a support contract, but most other patches require that customers have a valid support contract to access them.  (See my earlier blog entry on the subject.)

All fixes will be available for free in the next Solaris Update release (and OpenSolaris), so customers not willing to pay for a support contract can still get the fixes by installing or upgrading to the next Solaris Update release.  They'll just need to wait for it to ship.  Alternatively, they can use OpenSolaris.

This policy is not changing.

What is changing is the implementation of patch entitlement to ensure it matches the policy.  Currently, circa 60% of Solaris patches are free, including most of the key patches.  Under the new entitlement implementation, 18% of Solaris patches will remain free, including the specific revision of all Solaris patches which include new security fixes.  The rest will require a valid support contract to access. 

Any of the following support contracts will provide access to all Solaris patches and patch clusters: a Solaris subscription, a Software Support Contract, a Sun System Service Plan for Solaris, a Sun Spectrum Storage Plan, or a Sun Spectrum Enterprise Service Plan.  Since the names of the support contracts change from time-to-time, this list may change.

The new implementation will roll out in Phases, starting this month.  The roll-out should be transparent to customers with valid support contracts.

Patch signing certificate renewal

The signing certificate used to sign Sun patches expires shortly.  A new signing certificate will be rolled out in January and instructions provided on how to adopt it.

Customers who download the unsigned patch versions will not need to take any action.

"Accumulation-only" patches

The "SplitGate" source code management model we first introduced in Solaris 10 8/07 (Update 4) has dramatically improved Solaris 10 patch quality.  A side-effect of the "SplitGate" model is that base PatchIDs (the first 6 digits) change at the end of each Update release.  See my earlier Solaris 10 Kernel PatchID Sequence posting.

In the "SplitGate" model, when building an Update release, we effectively have two parallel source code gates.  One, called the Sustaining Gate, contains just the bug fixes we need to release to customers in patches asynchronous to the Update release.  The other, called the Update Gate, contains a superset of the Sustaining Gate as well as new features and less critical bug fixes which will be released as part of the Update release.

The two gates remain separate (split) for the duration of the Update release build process.  Once the Update release has reached release quality, the Update Gate is promoted to become the new Sustaining Gate and the process repeats.  Since the Update Gate is always a strict superset of the Sustaining Gate, no regressions should result from the promotion of the Update Gate to become the new Sustaining Gate.  Each patch in the old Sustaining Gate is obsoleted by a corresponding patch from the Update Gate which has accumulated its contents.  When the Update is released, these new PatchIDs are released to SunSolve.  This is why you see the base PatchIDs changing after each Update release. 

If the Update Gate patch doesn't contain any additional code changes over the corresponding Sustaining Gate patch, then there's no need for customers to install the new Update Gate patch.  Such patches are called "accumulation-only" patches and can be identified as they have a different base PatchID (the first 6 digits) but don't contain any additional CR numbers over the Sustaining patch which they obsolete.
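
That identification rule is easy to express as a check.  Here is a minimal Python sketch, assuming you already have the CR lists for both patches to hand (e.g. from the patch READMEs).  The PatchIDs and CR numbers are made up for the example.

```python
def base_patch_id(patch_id):
    """Return the base PatchID (the part before the revision), e.g. '900010'."""
    return patch_id.split("-")[0]


def is_accumulation_only(new_id, new_crs, old_id, old_crs):
    """True if new_id obsoletes old_id without adding any CR numbers."""
    different_base = base_patch_id(new_id) != base_patch_id(old_id)
    extra_crs = set(new_crs) - set(old_crs)
    return different_base and not extra_crs


# Made-up PatchIDs and CR numbers, for illustration only.
old_crs = ["6000001", "6000002"]
print(is_accumulation_only("900020-01", ["6000001", "6000002"],
                           "900010-05", old_crs))  # True
print(is_accumulation_only("900020-01", ["6000001", "6000002", "6000003"],
                           "900010-05", old_crs))  # False
```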

Sun releases these "accumulation-only" patches because some customers insist that all of the PatchIDs pre-applied into a Solaris Update release image also be available from SunSolve.

Thursday Mar 06, 2008

You can sign up to receive a weekly notification advising of new and updated Sun Alerts.

Sun Alerts inform customers of the most critical issues affecting Sun's hardware and software.

They cover Security, Data Corruption, and System Availability issues.

Customers with a valid support contract will be able to access all Sun Alerts and patches which fix Sun Alert issues, including the Sun Alert patch clusters available on SunSolve which contain all Solaris OS patches which address Sun Alert issues.

Customers without a valid support contract will be able to access Sun Alerts and patches only for Security related issues when they log onto SunSolve.

This blog copyright 2009 by Gerry Haskins