The view from the Engine Room: Bart Smaalders' weblog
Wednesday Jul 25, 2007
Rethinking patching
As Stephen mentioned recently, several of us have been thinking about revising the way we manage software change on Solaris. I've been particularly focused on the difficulties Sun and its customers have with the patching process, and on the changes we need to make to our technology and development processes as a result.
Today, most customers don't run OpenSolaris; they run a supported version of Solaris such as Solaris 8, 9 or 10. A supported release means that someone will answer the phone, and that patches for problems are available. In Solaris, patches are a software change control mechanism separate from package versions. Each patch may affect portions of several packages; patches are intended to include all the files necessary to fix one or more problems, either directly or by specifying dependencies on other patches. If a patch affects packages that are not installed on a given system (typically because the system has been minimized), those portions of the patch are not installed. If the administrator later adds the missing package, he must remember (good luck) to re-apply the patches, since the packaging code knows nothing of patches.
Customers today are free to install whichever patches they feel are appropriate for their environment, subject to the built-in dependency requirements. This picking and choosing is what I refer to as Dim Sum patching, and it is a major cause of patching difficulties. Many customers pick and choose among the thousands of patches available for Solaris 10, for example; this means that customers are often pioneering new configurations.
Note that each Solaris release consists of a single source base; all Solaris 10 updates, for example, are but snapshots of the same Solaris patch gate at different times. As a result, developers are always working on the cumulative set of all previous changes; when a new patch is created, the files in the patch contain not only the desired fix but all previous fixes as well.
Thus, software change is constructed as a single linear stream, but customers install selected binaries from various builds via patches.
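To make the minimization trap concrete, here is a small Python model of a patch that spans two packages on a system where only one of them is installed. It is a sketch under simplifying assumptions, not a description of how the Solaris patch tools work internally: the `Patch` and `System` classes, the package names `pkgA` and `pkgB`, the patch ID and the file revisions are all hypothetical, and the check assumes no later patch has superseded the files in question.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Patch:
    """Hypothetical patch: replacement file revisions grouped by package."""
    patch_id: str
    files: dict[str, dict[str, str]]  # package -> {path: revision}


@dataclass
class System:
    """Hypothetical system state: installed packages and applied patch IDs."""
    installed: dict[str, dict[str, str]] = field(default_factory=dict)
    applied: list[str] = field(default_factory=list)

    def apply_patch(self, patch: Patch) -> None:
        # Only the portions of the patch for installed packages are applied.
        for pkg, files in patch.files.items():
            if pkg in self.installed:
                self.installed[pkg].update(files)
        self.applied.append(patch.patch_id)

    def add_package(self, pkg: str, base_files: dict[str, str]) -> None:
        # Packaging knows nothing of patches: the package arrives at its base revision.
        self.installed[pkg] = dict(base_files)


def missing_patch_portions(system: System, patches: list[Patch]) -> list[tuple[str, str]]:
    """Return (patch_id, package) pairs where an applied patch's files are absent.

    Assumes no later patch has superseded these files.
    """
    missing = []
    for patch in patches:
        if patch.patch_id not in system.applied:
            continue
        for pkg, files in patch.files.items():
            current = system.installed.get(pkg)
            if current is not None and any(current.get(p) != rev for p, rev in files.items()):
                missing.append((patch.patch_id, pkg))
    return missing


system = System(installed={"pkgA": {"/usr/bin/a": "r1"}})
patch = Patch("100001-01", {"pkgA": {"/usr/bin/a": "r2"}, "pkgB": {"/usr/lib/b.so": "r2"}})
system.apply_patch(patch)                            # only pkgA is updated
system.add_package("pkgB", {"/usr/lib/b.so": "r1"})  # added later from install media
print(missing_patch_portions(system, [patch]))       # [('100001-01', 'pkgB')]
```

The system records the patch as applied, yet `pkgB` is running unpatched binaries; only an explicit re-check (or re-application) catches it.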
When I've discussed the hazards of Dim Sum patching with customers, the reasons they give for it can typically be characterized as:
To these, I reply:
For our new packaging system, there is a powerful incentive to eliminate Dim Sum patching. Since we wish to use a single version numbering space for any package, a package can only be at one point in that space, which makes the package the smallest unit a customer could pick; supporting fine-grained Dim Sum patching would therefore require very small packages, hurting the performance of packaging operations and significantly increasing the workload of OpenSolaris developers. Instead, we can set package boundaries according to what makes sense for minimization purposes. This implies that future (post-Solaris 11) patches will be completely cumulative (aside from some exceptions for urgent security fixes), at least for the core OS. Your system will be able to determine automatically what is needed to bring the installed software up to the desired revision level; needing to pick and choose patches will be a thing of the past.
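A minimal sketch of the kind of computation cumulative versioning allows, assuming each package's versions are dotted integer strings along a single line of versions and that the desired revision level is published as a simple package-to-version map. The function names, package names and version format are hypothetical, not the new packaging system's actual interface. Because each version includes all earlier fixes, bringing a package up to the target needs only a comparison, not a choice among patches.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version such as "1.10" into (1, 10) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def upgrade_plan(installed: dict[str, str], target: dict[str, str]) -> list[tuple[str, str, str]]:
    """List (package, installed version, target version) for packages behind the target.

    Packages that are not installed (the system was minimized) are left alone, and
    installed packages with no entry in the target are not touched.
    """
    plan = []
    for pkg, have in sorted(installed.items()):
        want = target.get(pkg)
        if want is not None and parse_version(have) < parse_version(want):
            plan.append((pkg, have, want))
    return plan


installed = {"pkgA": "1.2", "pkgB": "1.9", "pkgC": "1.3"}
target = {"pkgA": "1.4", "pkgB": "1.10", "pkgC": "1.3", "pkgD": "1.1"}
print(upgrade_plan(installed, target))
# [('pkgA', '1.2', '1.4'), ('pkgB', '1.9', '1.10')]
```

Comparing versions as integer tuples rather than strings matters here: as plain strings, "1.9" would sort after "1.10".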
Posted at 03:56PM Jul 25, 2007 by barts in packaging and patching | Comments [10]
How do you *know* what you can delete? Given a running production system, how do you determine which of the multitude of installed packages can be removed without adverse impact?
Posted by David Bryant on July 25, 2007 at 05:46 PM PDT #
Posted by Matty on July 25, 2007 at 07:31 PM PDT #
More often than not, Sun support recommends individual patches, even after being told that we have an in-house tested patch set largely based on a "recommended and security" or EIS patch cluster that includes the recommended patches. The deviations behind "largely based on" are those required to address Sun Alerts, and they have been reviewed by Sun SSEs or above. Such recommendations commonly come in during support cases involving anyone from front-line support to high-level escalations with PTS, on-site Sun architects, and CxO involvement.
The first step in addressing this problem is getting support to identify the specific bug they think a patch will fix before recommending the patch. In very few cases has a patch Sun recommended addressed the root cause of the problem I was experiencing. By and large, Sun Support's reactive recommendations actually drive variability into the environment and create the Dim Sum patching you describe.
Posted by Mike Gerdts on July 25, 2007 at 07:51 PM PDT #
Long story short: preconfigured, prepatched and stringently tested Solaris images.
This is what system engineering and platform lifecycle management are all about. At their core, these terms come down to standardized Flash(TM) builds: preconfigured, prepatched and rigorously tested Solaris images. No ad-hoc changes are ever allowed in such a setup; configurations are delivered as tightly controlled package payloads, and adding or removing something from the image is a matter of request/bug tracking, a written specification, panel approval and finally, if the request or fix is approved, integration into the next platform release cycle.
Note that I'm not referring to Sun development, although one can surmise they have similar practices.
Also, ad-hoc changes, including patching, are strictly forbidden unless engineering has tested and approved them. For example, an SA would be strictly forbidden to log into a system and start modifying configuration files with `vi` (or `emacs`, or any other editor); changes would come in the form of revised package payloads, so they would be reproducible and uniform across systems.
Posted by UX-admin on July 26, 2007 at 05:46 AM PDT #
Posted by mario on July 26, 2007 at 09:50 AM PDT #
Providing better support for minimization is going to be increasingly important as OpenSolaris continues to grow. Clearly, we need better pkg dependencies, and tools to help maintain them. We also need more automated ways of determining which components may be removed.
Posted by barts on July 26, 2007 at 11:37 AM PDT #
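One naive way to approach David's question, assuming the declared package dependencies are complete and accurate (which, as barts notes, they are not yet): keep everything reachable from the packages you know you need, and treat the rest as removal candidates. The function name, package names and dependency map below are hypothetical. Anything used at run time without a declared dependency (loaded via `dlopen`, invoked from scripts, referenced in configuration files) would be missed, so the output is a list of candidates to review, not a safe-to-delete list.

```python
from __future__ import annotations


def removable_packages(installed: set[str], depends: dict[str, set[str]], required: set[str]) -> list[str]:
    """Installed packages not reachable, via declared dependencies, from the required set."""
    keep: set[str] = set()
    stack = [pkg for pkg in required if pkg in installed]
    while stack:
        pkg = stack.pop()
        if pkg in keep:
            continue
        keep.add(pkg)
        # Follow only declared dependencies that are actually installed.
        stack.extend(dep for dep in depends.get(pkg, ()) if dep in installed)
    return sorted(installed - keep)


installed = {"core", "libc", "net", "x11", "fonts"}
depends = {"core": {"libc"}, "net": {"libc"}, "x11": {"fonts", "libc"}}
print(removable_packages(installed, depends, required={"core", "net"}))
# ['fonts', 'x11']
```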
Posted by PeterC on July 30, 2007 at 02:48 AM PDT #
Posted by Chris Quenelle on July 31, 2007 at 09:10 PM PDT #
Posted by Ceri Davies on August 01, 2007 at 06:49 AM PDT #
We're still exploring what the approach should be for security fixes, particularly those that are more separable. It's clear, however, that the ability to piece a running kernel and set of core libraries together by selecting binaries from all the builds over the last two years is fraught with hazard, and is difficult and expensive for Sun. Much of the reason that patches take so long to deliver is that they need to be tested in so many configurations; restricting the ability to mix and match binaries vastly eases the work and testing required to release a fix.
Posted by barts on August 01, 2007 at 03:15 PM PDT #
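To illustrate why mixing and matching is so expensive to test, here is a rough count under simplified assumptions: a hypothetical history of five builds, each changing some of three hypothetical components. A Dim Sum system may take each component from its base release or from any build that changed it, while a cumulative system can only sit at one of the snapshots. Patch dependency requirements would rule out some combinations, so the Dim Sum figure is an upper bound, and the numbers are illustrative, not Sun's.

```python
from __future__ import annotations

from math import prod


def revisions_per_component(components: list[str], history: list[set[str]]) -> dict[str, int]:
    """Count each component's revisions: the base release plus one per build that changed it."""
    counts = {c: 1 for c in components}
    for changed in history:
        for c in changed:
            counts[c] += 1
    return counts


def dim_sum_configurations(components: list[str], history: list[set[str]]) -> int:
    # Each component may come from any of its revisions independently.
    return prod(revisions_per_component(components, history).values())


def cumulative_configurations(history: list[set[str]]) -> int:
    # The system can only be at the base release or one of the later snapshots.
    return len(history) + 1


components = ["kernel", "libc", "libnsl"]
history = [{"kernel", "libc"}, {"kernel"}, {"libnsl"}, {"kernel", "libc"}, {"kernel"}]
print(dim_sum_configurations(components, history))  # 30
print(cumulative_configurations(history))            # 6
```

The gap grows multiplicatively with every independently patchable component, while the cumulative count grows by one per build.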