Cindi's Weblog: There's no place like /home

Thursday Mar 12, 2009

Today's Series 7000 software update (1.0.4) includes significant fixes for disk hotplug and for cluster rejoin and restart.  See the Fishworks wiki or the Sun Download Center for more details on the specific issues addressed.

Friday Mar 06, 2009

Continuing on our path to perfection, the next software update for the Sun Storage 7000 series was released today.  The update is applicable to all Series 7000 platforms and contains a critical fix for the CIFS server.  Customers using the CIFS feature are strongly encouraged to update their 7110, 7210, and 7410 systems to ak-2008.11.20.1.1.

Downloads are available from the Sun Download Center.  A matrix of our software update releases for the Series 7000 may be found on the Fishworks wiki.  The matrix includes additional release information and a list of bugs fixed in each release.

Wednesday Feb 11, 2009

The second software update for the Sun Storage 7000 series was released today.  The update is applicable to all Series 7000 platforms and contains a critical fix for the IPv6 network stack and addresses some problems in the NDMP back-up service.  Customers using IPv6 for network connectivity or NDMP for back-up are strongly encouraged to update their 7110, 7210, and 7410 systems to ak-2008.11.20.1.0.

Downloads are available from the Sun Download Center.  A matrix of our software update releases for the Series 7000 may be found on the Fishworks wiki.  The matrix includes additional release information and a list of bugs fixed in each release.

Tuesday Jan 13, 2009

The first software update for the Series 7000 was released yesterday.  The update is applicable to all Series 7000 platforms and contains a critical fix for the CIFS server.  Customers using CIFS are strongly encouraged to update their 7110, 7210, and 7410 systems to ak-2008.11.20.0.1.

Downloads are available from the Sun Download Center.  A matrix of our software update releases for the Series 7000 may be found on the Fishworks wiki.  The matrix includes additional release information and a list of bugs fixed in each release.

Tuesday Dec 02, 2008

So what does a project like Fishworks look like? 

I put together a visual representation of the Fishworks project using Code Swarm and captured it in a video. The video shows how the Fishworks team and project evolved, based on changes made to the source code over the course of two and a half years.  The Code Swarm tool uses organic visualization techniques to model the history of a project based on source code files and their relationship to the developers who create and modify them.  It's a very cool tool and a bit addictive.

There are a number of Code Swarm project visualizations available online.  The OpenSolaris and Image Packaging System (IPS) projects have been represented by Code Swarm based on the raw commit data for each project. The OpenSolaris Code Swarm pulses as a single blob of source as developers come and go within the orbit of a vast code base.  In contrast, the Fishworks project shows well-defined orbits surrounding each developer.  This is a testament to the almost constant activity of a small number of developers on a well-partitioned source base.  I have elided gate re-synchronizations to better represent the project and the contributions of each developer.  This avoids single bursts of activity by what seems to be one developer, as seen in the IPS Code Swarm.


Fishworks CodeSwarm from John Danielson on Vimeo

Code Swarm runs natively in Subversion and Mercurial repositories.  The Fishworks project source base was controlled by SCCS with logs created by Teamware.  I converted the Teamware 'putback' logs to the Code Swarm input XML format.  I do wonder how the visualization would change if I accounted for lines of code changed per file.  I might be able to use Eric's code tracking script to generate suitable input.
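For the curious, a conversion like this can be sketched roughly as follows. This is an illustration only, not the script used for the video. The input tuples (timestamp, author, list of files) are a hypothetical stand-in for already-parsed Teamware putback logs, and the output assumes Code Swarm's event format of one `<event>` element per touched file, with `date` given in milliseconds since the epoch.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import quoteattr


def to_codeswarm_xml(putbacks):
    """Render putback records as Code Swarm input XML.

    putbacks: iterable of (iso_timestamp, author, [filenames]) tuples.
    This tuple layout is hypothetical; real putback logs need parsing first.
    Timestamps are treated as UTC.
    """
    lines = ['<?xml version="1.0"?>', "<file_events>"]
    # Code Swarm replays events in order, so sort by timestamp.
    for stamp, author, files in sorted(putbacks, key=lambda p: p[0]):
        when = datetime.fromisoformat(stamp).replace(tzinfo=timezone.utc)
        millis = int(when.timestamp() * 1000)
        # One event per file touched by the putback.
        for name in files:
            lines.append(
                f'  <event date="{millis}" filename={quoteattr(name)} '
                f"author={quoteattr(author)} />"
            )
    lines.append("</file_events>")
    return "\n".join(lines)
```

Accounting for lines changed per file would mean carrying an extra weight on each record, which this sketch does not attempt.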

In the meantime, enjoy the show.

Cindi

Monday Nov 10, 2008

Fishworks is the name of a team of engineers at Sun Microsystems.  The FISH in Fishworks stands for Fully Integrated Software and Hardware.  Fishworks is also the underlying software that unites operating system functionality, a pleasing user interface, and hardware capabilities to create a plug-it-in-and-it-just-works experience for appliances such as those in the new Sun Storage 7XXX product line.

At the top of the Fishworks appliance stack is a new user interface.  A single AJAX-based development environment supports both a browser user interface (BUI) and a scriptable CLI. To the extent possible, functions available in the BUI are mirrored in the CLI and vice versa. In many cases, the same JavaScript is shared between the two.  I think of the UI as the Little Black Dress (LBD) of the user interface world: it's simple, elegant, and looks absolutely fabulous.

In some sense, Fishworks was born about 8 years ago, with key innovations that went into Solaris 10 for storage (ZFS), observability (DTrace), management (SMF), and RAS (FMA). These technologies delivered the right set of abstractions and capabilities necessary to build our appliance software. The Fishworks software stack uses these technologies and other operating system libraries to create the environment in which users and administrators interact via the UI. The control point for appliance operation, configuration, and management is not the operating system but rather the Fishworks appliance software. The operating system can be thought of as the "micro-code" in our software stack: the appliance software controls base operating system functions and hardware for a simple, just-works experience.

As an example, all distinct functionality is expressed as an SMF service in Solaris 10. The appliance software uses SMF libraries to monitor, configure, and restart all services in a system when changes are directed from our UI.  The appliance software also permits other information to be integrated with SMF manifests and methods for a fully integrated experience.  The NFS service, for example, is controlled by traditional SMF methods for starting and stopping NFS daemons, but we also integrate NFS-specific configuration properties that may be updated on the fly. The Fishworks software takes care of updating the new configuration properties and restarting the NFS service.  In Solaris, this task would require the contents of /etc/default/nfs to be modified and the NFS SMF service to be restarted in a multi-step process. In a Fishworks appliance, the same set of tasks is accomplished from a single UI dialog.  SMF is but one of the Solaris 10 technologies we leveraged for the Fishworks appliance software.  ZFS, DTrace, and FMA play key roles in storage, analytics, and system health monitoring.
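To make the multi-step Solaris process concrete, here is a minimal sketch of what a script would have to do by hand. It is an illustration, not Fishworks code: the helper names `set_property` and `reconfigure_nfs` are made up for this example, and it assumes the shell-style `KEY=value` layout of the NFS defaults file and the `svc:/network/nfs/server` service.

```python
import re
import subprocess


def set_property(path, key, value):
    """Set KEY=value in a shell-style defaults file.

    Replaces the first existing (or commented-out) KEY= line,
    or appends a new line if the key is absent.
    """
    with open(path) as f:
        lines = f.read().splitlines()
    pattern = re.compile(rf"^#?\s*{re.escape(key)}=")
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = new_line
            break
    else:
        lines.append(new_line)
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")


def reconfigure_nfs(path, key, value, run=subprocess.run):
    """Step 1: edit the defaults file. Step 2: restart the NFS service.

    `run` is injectable so the sketch can be exercised off-box.
    """
    set_property(path, key, value)
    run(["svcadm", "restart", "svc:/network/nfs/server"], check=True)
```

On a Solaris 10 host this would be invoked as something like `reconfigure_nfs("/etc/default/nfs", "NFSD_SERVERS", "64")`, with root privileges. The appliance collapses both steps, plus the error handling this sketch omits, into one dialog.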

The Fishworks software is designed to be extensible and applicable to other types of appliances.  New appliance prototypes may be created by simply adding a new "class" and the necessary metadata to describe features and purpose. Over the last few months, I've prototyped a couple of non-storage appliances.  I was amazed by how quickly I had a functioning appliance up and running.  You can imagine how powerful this is going forward.  We now have the foundation to rapidly build new fully integrated systems, and I'm really excited to continue work on some new prototypes and look for ways to build on the current developer environment.  As if that weren't enough, I get to work with an incredibly talented bunch of people.

Cindi

Friday Jun 08, 2007

Just today, I posted the first draft of the Sensor Abstraction Layer design document. The project addresses the problem of aggregating and analyzing telemetry exported by disparate sources such that the results may be observed via standard interfaces. The basic design is composed of three distinct sub-layers: a provider layer, a collection layer, and an analyzer layer. At the lowest level, the provider layer exports interfaces to read sensor or statistical values without having to understand the implementation details of the subsystem exporting the telemetry.

Telemetry data is logged according to collection parameters established for a collector. Sensor telemetry is passed from collectors to the analyzer layer for the purpose of online analysis. For example, we may want to collect telemetry for our network subsystem based upon GLD-aware NIC driver kstats, protocol-specific errors, and memory usage as seen in netstat(1M) to help predict unhealthy hardware or software or to ensure QoS guarantees.
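The three sub-layers can be sketched in a few lines. This is a toy model under loud assumptions, not the design in the document: every class and sensor name here is hypothetical, the "collection parameters" are reduced to a sample limit, and the analysis is a simple mean-over-threshold check.

```python
from statistics import mean


class Provider:
    """Provider layer: reads one value, hiding how the subsystem exports it."""

    def __init__(self, name, read_fn):
        self.name = name
        self._read = read_fn

    def read(self):
        return self._read()


class Collector:
    """Collection layer: samples providers and logs a bounded history."""

    def __init__(self, providers, max_samples=100):
        self.providers = providers
        self.max_samples = max_samples
        self.log = {p.name: [] for p in providers}

    def sample(self):
        for p in self.providers:
            series = self.log[p.name]
            series.append(p.read())
            # Keep only the most recent max_samples readings.
            del series[:-self.max_samples]


class ThresholdAnalyzer:
    """Analyzer layer: flags sensors whose logged mean exceeds a limit."""

    def __init__(self, limits):
        self.limits = limits

    def analyze(self, log):
        return sorted(
            name
            for name, series in log.items()
            if series and name in self.limits
            and mean(series) > self.limits[name]
        )
```

A provider for a NIC error counter would wrap whatever reads the underlying kstat, and the collector and analyzer never need to know that detail, which is the point of the layering.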

We can use many of the concepts and the infrastructure developed for the Solaris Fault Manager. For example, telemetry data can be passed as FMA standard events and logged using the Extended Accounting format developed for the errlog and fltlog. We can also leverage the fmd(1M) tool set to observe telemetry logs and analysis results.

Hope to have more details soon...

Cindi

 



Friday May 04, 2007

The Solaris Fault Management Architecture has come a long way since Mike Shapiro and I started talking about it way back in 2001. We started out with a bang as the industry leader in fault management technology:


  • August 10, 2001: First discussions of a new approach to fault management begin at Sun.

  • January 15, 2002: First internal presentation of plans for a Solaris Fault Management Architecture

  • March 18, 2004: FMA integrates into Solaris 10 Build 56, providing CPU/Mem for US-III and IV

  • March 7, 2005: FMA ships to customers as part of Solaris 10 G/A

     

The members of our original development team have changed along the way, but our commitment to improving the architecture and adding new content remains steadfast. Since the introduction of FMA in Solaris 10, additional content has been added to support new platforms and extend FMA concepts into other subsystems. Just look at what we've delivered since S10 was released a short 2 years ago:

  • New for SPARC: US-IV+, US-T1, Niagara & Niagara-2, Fire PCI-E I/O

  • New for x64: CPU/Memory error handling and diagnosis for AMD Opteron and Athlon 64

      • Enables all detector banks and sets all documented MCi_CTL bits

      • Full machine-check and error-poller handling for all error types documented in the BKDG

      • Diagnosis engine rules for all error types

      • Response agent: core offline, page retire

  • New for x64: PCI-Express

      • Diagnostic correlation based on transmit/receiver error information

      • Connections to platform machine-check error handling

      • Connections to FMA-aware leaf drivers for increased availability and diagnosability

      • Diagnosis engine rules for all errors described in the PCI-E Base Specification

  • Generates SNMP traps (notifications) for FMA diagnosis

  • FM MIB permits additional details by UUID

  • Web browsable interface to view

      • 3730 FMA Events

      • 338 FMA Knowledge Articles

  • CLIs to extract event payload and message content

  • New for Developers: Public interfaces for IO FMA

      • Updated WDD chapter for writing FMA-aware drivers

  • Deployment: FMA Demo Package

      • Infrastructure to inject errors in a simulation environment

What's best is that Solaris FMA is getting noticed and showing real benefits. The Sun Service organization estimates that platforms shipping without FMA support can cost $252 per unit per year. Let's do the math: if Sun sells 100,000 units per year, then after 3 years Solaris with FMA is saving Sun $75,600,000.

    100,000 units per year x $252 per unit x 3 years = $75,600,000

I don't know about you, but I wouldn't mind saving $75,000,000. A paper presented by Mike Shapiro and Dong Tang at the Dependable Systems and Networks (DSN) 2006 conference demonstrated a decrease in annual system downtime of 37-54%, using quantitative analysis of the FMA memory retirement capabilities. InfoWorld gave Solaris FMA a nod by awarding our team members its 2005 Innovation of the Year Award.

So, what are we working on now? Well, we are continuing to deliver on the promise of Predictive Self-Healing. Work is ongoing to support out-the-door fault management capabilities for new processors, platforms, and I/O subsystems. With the announced support for Intel on Solaris (or is it Solaris on Intel?), we are busily working on an FMA implementation for Intel processors. Solaris will be the first OS to take full advantage of industry-leading x86 processor error handling features. In the I/O space, we are beefing up leaf drivers, adding FMA error handling and diagnosis for SCSI problems, and using SMART disk data to actively predict impending disk failures for all platforms. The Xen project gives us an opportunity to deploy FMA in a virtualized environment. We'll take some of the infrastructure we delivered for LDOMs and use it to connect hypervisor error handling to a DOM0 diagnosis environment. But that's not all...we are looking at ways to use sensor telemetry to offer better fault prediction and to manage resource guarantees and power budgeting. On the software front, we are adapting the techniques we've used to diagnose hardware problems so that they are useful for software diagnosis. This is a huge under-explored area that will keep Solaris at the forefront with leading-edge availability and serviceability.

Stay tuned, we're not done with FMA just yet.

Cindi