SteveJay's Weblog


Sunday June 19, 2005

The 1394 Software Framework - Attach, Detach, and Events

Continuing my discussion of the Solaris 1394 Software Framework, in this post I'm going to go into some detail on the methods by which a 1394 target driver can register and deregister itself with the core framework. And, in the process, I will also touch on the Solaris event notification mechanisms leveraged by the framework.

t1394_attach()

All 1394 target drivers must register themselves with the framework in order to operate properly. The 1394 framework will make an association between the device driver instance for the target and the corresponding HAL driver instance (called the 'parent') for the adapter to which it is attached.

In addition, the 1394 framework will allocate resources (internally) to track the target driver state (in the target driver 'handle') and return useful information about the current state of the target device on the 1394 bus and DMA and/or interrupt properties of the parent HAL.[1]

/*
 * Function:    t1394_attach()
 * Input(s):    dip                     The dip given to the target driver
 *                                          in its attach() routine
 *              version                 The version of the target driver -
 *                                          T1394_VERSION_V1
 *              flags                   The flags parameter is unused (for now)
 *
 * Output(s):   attachinfo              Used to pass info back to target,
 *                                          including bus generation, local
 *                                          node ID, dma attribute, etc.
 *              t1394_hdl               The target "handle" to be used for
 *                                          all subsequent calls into the
 *                                          1394 Software Framework
 *
 * Description: t1394_attach() registers the target (based on its dip) with
 *              the 1394 Software Framework.  It returns the bus_generation,
 *              local_nodeID, iblock_cookie and other useful information to
 *              the target, as well as a handle (t1394_hdl) that will be used
 *              in all subsequent calls into this framework.
 */
/* ARGSUSED */
int
t1394_attach(
    dev_info_t            *dip,         /* supplied by the target */
    int                   version,      /* supplied by the target */
    uint_t                flags,        /* supplied by the target */
    t1394_attachinfo_t    *attachinfo,  /* filled in by the framework */
    t1394_handle_t        *t1394_hdl)   /* returned to the target */

During a target driver's attach processing, it calls t1394_attach() to register with the 1394 Software Framework. The Framework initializes any necessary internal data structures and returns a t1394_hdl which the target driver uses with all other calls into the Framework. The Framework also returns additional information in attachinfo, which is needed by some target driver implementations.

Context:

Should be called only from base kernel context.

Parameters:

Return Values:

t1394_detach()

Of course, for every registration a 1394 target driver should also deregister (when complete and detaching). This gives the Framework the opportunity to reclaim all the internally allocated resources that had been set aside for tracking the target state.

/*
 * Function:    t1394_detach()
 * Input(s):    t1394_hdl               The target "handle" returned by
 *                                          t1394_attach()
 *              flags                   The flags parameter is unused (for now)
 *
 * Output(s):   DDI_SUCCESS             Target successfully detached
 *              DDI_FAILURE             Target failed to detach
 *
 * Description: t1394_detach() unregisters the target from the 1394 Software
 *              Framework.  t1394_detach() can fail if the target has any
 *              allocated commands that haven't been freed.
 */
/* ARGSUSED */
int
t1394_detach(t1394_handle_t *t1394_hdl, uint_t flags)

The target driver calls t1394_detach() to deregister from the 1394 Software Framework. Typically the target calls this from its detach(9E) routine.
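To make the attach/detach pairing concrete, here is a small userland sketch. It is not the framework's code: everything prefixed mock_ or td_ is a hypothetical stand-in so the example builds outside the kernel, the handle is reduced to a plain pointer, and the outstanding-command counter is only an assumed way to imitate the "detach can fail while commands are still allocated" rule from the comment above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the DDI return codes used by the real interfaces. */
#define MOCK_DDI_SUCCESS 0
#define MOCK_DDI_FAILURE (-1)

/* Hypothetical, simplified view of what gets tracked for one target. */
typedef struct mock_target {
    int registered;        /* nonzero between attach and detach */
    int cmds_allocated;    /* commands the target has not freed yet */
} mock_target_t;

/* Hypothetical handle: in this sketch it is just a pointer to that state. */
typedef mock_target_t *mock_handle_t;

/* Imitates registration: set up tracking state and hand back a handle. */
int
mock_t1394_attach(mock_target_t *state, mock_handle_t *hdl)
{
    assert(state != NULL && hdl != NULL);
    if (state->registered)
        return (MOCK_DDI_FAILURE);
    state->registered = 1;
    state->cmds_allocated = 0;
    *hdl = state;
    return (MOCK_DDI_SUCCESS);
}

/* Imitates deregistration: refuse while commands are still allocated. */
int
mock_t1394_detach(mock_handle_t *hdl)
{
    assert(hdl != NULL);
    if (*hdl == NULL || (*hdl)->cmds_allocated != 0)
        return (MOCK_DDI_FAILURE);
    (*hdl)->registered = 0;
    *hdl = NULL;    /* in this sketch the handle is cleared on success */
    return (MOCK_DDI_SUCCESS);
}

/* What a target's detach routine might do: tear down only on success. */
int
td_detach(mock_handle_t *hdl)
{
    if (mock_t1394_detach(hdl) != MOCK_DDI_SUCCESS)
        return (MOCK_DDI_FAILURE);    /* keep the soft state intact */
    /* ...the target would free its own soft state here... */
    return (MOCK_DDI_SUCCESS);
}
```

The point of the sketch is the ordering in td_detach(): the target checks the deregistration result first and leaves its own state alone if the call fails.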

Context:

Should be called only from base kernel context.

Parameters:

Return Values:

Events

Target drivers may register callbacks for general 1394 Software Framework events by using the Solaris Event Framework. All calls to the Event Framework must be performed before the call to t1394_attach() is performed. For details about the Solaris Event Framework, see:

The following events are supported by the 1394 Framework:

     void (*handler)(dev_info_t  *dip,  ddi_eventcookie_t cookie,  void  *arg,
         void *impl_data);

The impl_data argument passed to the callbacks associated with each of these events is a t1394_localinfo_t *, as described above.

Note: Within an event callback function, a target driver shouldn't invoke any procedure that blocks or sleeps. For example, an event callback function shouldn't issue any outgoing asynch request that has the CMD1394_BLOCKING flag set. (Yep, again... more on this in a later blog entry).
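As a rough illustration of the "don't block in a callback" rule, the sketch below shows a handler that only copies values and sets a flag for later processing. The argument shape follows the handler prototype above, but the types are userland stand-ins: mock_dev_info_t, mock_eventcookie_t, mock_localinfo_t, td_state_t and td_event_cb are all hypothetical names, and the two field names are borrowed from the t1394_attach() comment rather than from the real t1394_localinfo_t definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins so the sketch builds in userland. */
typedef struct mock_dev_info mock_dev_info_t;   /* opaque, like dev_info_t */
typedef void *mock_eventcookie_t;               /* stands in for ddi_eventcookie_t */
typedef struct mock_localinfo {                 /* stands in for t1394_localinfo_t */
    unsigned int bus_generation;
    unsigned int local_nodeID;
} mock_localinfo_t;

/* Hypothetical per-instance state that the target registered as 'arg'. */
typedef struct td_state {
    unsigned int generation;
    unsigned int local_node;
    int          work_pending;   /* handled later by a context that may block */
} td_state_t;

/*
 * Event callback: record the new values and defer everything else.
 * Nothing here sleeps or issues a blocking request.
 */
void
td_event_cb(mock_dev_info_t *dip, mock_eventcookie_t cookie, void *arg,
    void *impl_data)
{
    td_state_t *sp = arg;
    const mock_localinfo_t *li = impl_data;

    (void) dip;
    (void) cookie;
    assert(sp != NULL && li != NULL);

    sp->generation = li->bus_generation;
    sp->local_node = li->local_nodeID;
    sp->work_pending = 1;
}
```

Any follow-up that might sleep would then run from some other context that notices work_pending, which is an assumed design and not something the Framework prescribes.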

And for next time?

OK. So obviously there's more I could say here about the details of the Framework's implementation for tracking and coordinating target drivers and their events, but as I've said before, I want to first go through the Framework at a high level.

Next time, I'm going to go over the outgoing asynch interfaces: command structure allocation, command completion mechanisms (event-driven, blocking, polling), command types (read, write, lock), quadlet requests, block requests, etc. That's a ton of stuff to cover, so maybe it won't all get into a single blog entry, but anyway that's where I'm headed next.


[1]Both t1394_attach() and t1394_detach() can be found in t1394.c
[2]I've worked with Mark for many years. He's a good guy, very bright, and he was one of the original four designers of the Solaris 1394 Software Framework. His contribution was essentially the entire OpenHCI-compliant HAL driver (hci1394).
[3]OK, so... flags. Why do we have it all over the place even though we don't often use it, you might be asking? Well the cynic among you might quote me some Emerson, but in reality this is more about our desire to be able to accommodate backwards compatibility. This goal of being able to have the design move forward and evolve, while still accommodating older versions of the software, is part of Sun culture. It is an imperative of design in Solaris, especially for driver frameworks like these. Of course, without more than one version of a framework like this (this one is essentially unchanged from when it was initially putback), a designer must consult a crystal ball (or draw on experience if you prefer). And sometimes you're gonna come up with flags that don't do anything (yet!)
(2005-06-19 09:16:00.0)

Wednesday June 15, 2005

InfiniBand HCA driver missing from OpenSolaris?

Yeah, that's right. Unfortunately, Sun is not yet able to open up our source code for our Solaris InfiniBand HCA driver. (One of my colleagues, Steve Rust, touches on this in his most recent blog entry.) Although we wrote all the code ourselves, we did it with access to info that we got under NDA. So we're still under obligation not to disclose anything. I sincerely hope we will soon be able to open it up too, because there is some really interesting code in there that Steve R. and I are really proud of. For now, though, I guess it is among those few OpenSolaris drivers which you can get only as a binary.

The driver itself is called tavor and it basically started out as my baby (Steve R. owns it now). After my work on the Solaris 1394 Software Framework (and a handful of aborted or "development only" projects with InfiniBand HCAs), I finally got an opportunity in early 2002 to design and implement my own driver, from the ground up. The driver was to be for the Mellanox InfiniHost MT23108 HCA device, which was going to be the central I/O component in a SPARC-based blade server platform (which we never ultimately shipped).

But although the driver started out life with a very specific purpose for a very specific (and since canceled) platform, we (the engineers) anticipated from the beginning that it would be valuable if it could work well with plug-in cards. And today, a plug-in card is still the primary mechanism for adding InfiniBand to a system.

It took about a year and a half of design/implementation/testing before it was ready for putback into Solaris (Steve Rust's blog says August 6th, 2003 and I'll trust him, since he was our 'gatekeeper' for the entire Solaris InfiniBand Framework putback). Subsequent to that putback, there were bug fixes (obviously), enhancements for x86 and AMD64 support, the userland access support, and (most recently) support for Shared Receive Queues (SRQ) and for the new Mellanox InfiniHost III Ex MT25208 HCA device.

The latter half of that work above was done by Steve R. and was done subsequent to the Solaris 10 release. (I had the "project lead" role, he did all the hard work.) But if you want to check out the fruits of Steve R's latest work - check out Solaris Express 04/05 for the latest 'tavor' bits.

I know we're both extremely proud of this code (and really do wish we could show it off). And it's got some really fun stuff in it: handling userland access to HCA resources (i.e. OS bypass for lower latency), extreme configurability (honestly probably too configurable), a fancy mechanism for keeping track of "Work Request Identifiers" (for which I recently received US Patent #6,901,463), and a cool queue pair number allocation/reuse scheme for which Steve R. and I have a patent pending.

But anyway, I probably sound like a tease, since the driver isn't yet available in source form. But, if you're interested in InfiniBand, there's still plenty of really excellent code in OpenSolaris to check out. (Check Steve Rust's latest blog entry "InfiniBand Support in OpenSolaris" for a good starter.)

And if you've got an InfiniBand HCA card (from any of a number of vendors - Sun, TopSpin, Mellanox, etc.), then you can see this driver attached to your hardware and you can use it. (Matter of fact, like I said, this same 'tavor' driver will also attach to and operate on the latest generation of Mellanox's PCI-Express-capable InfiniHost III Ex MT25208 card. So, if you've got a system with PCI-Express - there are a few out there and I know Sun's got plenty coming - then you can get some really kick-ass performance out of our IB stack.)

Also, if you want to read more about our Solaris IB stuff, here's a few blogs by some other colleagues of mine:

There's a ton of other engineers (dozens, literally) who've contributed to the Solaris InfiniBand Framework. But maybe they're a little shy? I can't seem to find blogs by any of them. Anyway, they should all be proud too. And I'm sure that they are happy to have you folks able to see their code now in OpenSolaris.

Shoot me a comment (below) if you've used our IB software, or if you've been poking through the code. I'm very curious to hear from folks on the other end about what they think of our work.

(2005-06-15 18:14:00.0)

Tuesday June 14, 2005

OpenSolaris - The 1394 Software Framework

When I started at Sun about seven years ago, the Solaris 1394 Software Framework was my first project. A team of four, we designed, implemented, tested and putback to Solaris 8 (Update 2) in about a year and a half. Since then the code has been transitioned to others (like Alan Perry and Artem Kachitchkine), who continue to maintain and extend its functionality even today. Now, almost six years later, I finally get a chance to share this code with the world (and, hopefully, at least some interested readers.)

So I figured I'd start by giving a brief overview of the stack and the features that it provides and then talk a bit about the source files for the modules: how they're organized, what's in them, etc. Then (over the course of several posts) I'll get into the specifics of how to use each of the interfaces, gotchas for potential developers working with the source, some of the little bits of which I'm most proud, and maybe some discussion of what could be improved or extended in the existing code (and I'll be interested to hear what others think). This will not be an intro to the IEEE 1394 specification or the technology, nor will it be an introduction to writing Solaris device drivers (though I'll try to help anyone in any way I can). What follows will assume a certain familiarity with IEEE 1394 and with writing Solaris device drivers.

The Solaris 1394 Software Framework



The framework itself consists of a central module (s1394) called the "1394 Services Layer", an OpenHCI-compliant HAL driver (hci1394), and numerous target drivers (currently, av1394, scsa1394, and dcam1394 - which I'll say more about later.) If you are familiar with Solaris's SCSA framework (for SCSI drivers), you'll recognize this stacking of modules. This arrangement is a typical way to abstract hardware-specific details (below the Services Layer) from the target drivers (above the Services Layer).

The Solaris 1394 Software Framework Device Driver Interface provides a set of kernel interface routines to facilitate access to devices on an IEEE 1394 bus. The interface routines are intended for use by 1394 device drivers, also referred to as target drivers.

There are two kinds of target drivers: class drivers and vendor-specific drivers. Class drivers adhere to a general standard for a particular kind of device and can drive any vendor's device that adheres to the same standard. For example, a class driver for the IEEE 1394 Digital Conferencing Camera specification can drive video conferencing cameras manufactured by a variety of vendors (even though each may have a different set of features). Vendor-specific drivers are built to drive a specialized non-standard device. The 1394 Software Framework supports both kinds of target drivers.

Features of the 1394 Framework

The 1394 Software Framework provides several features to support target drivers using the IEEE 1394 bus.

Asynchronous I/O

There are two sides of asynchronous I/O: issuing outgoing requests and handling incoming requests.

Outgoing Requests - The 1394 Framework provides the ability to send the basic set of IEEE 1394 asynchronous requests: read, write, and lock. In addition to the IEEE 1394 defined set of lock request options, the Framework lock request interface also provides a set of bit and arithmetic functions.

For asynchronous requests, the 1394 Framework automatically determines the device's destination ID, sends the request using the local host's 1394 hardware interface, tracks the status of the request until the transaction completes, and supplies response information as needed.

Target drivers choose whether to: 1) block while waiting for the transaction to complete, 2) poll on the request completion status, or 3) have the 1394 Framework call a specified callback routine when the transaction completes. Using the poll and callback mechanisms, target drivers can issue several outstanding requests and poll or be notified for each completion.

Incoming Requests - The primary role of the 1394 Framework with respect to incoming asynchronous requests is to dispatch the request to the appropriate target driver or to handle the request on behalf of the appropriate target driver.

To support this, the 1394 Framework provides an allocation mechanism that target drivers use to reserve ranges of addresses within the 48-bit local node address space. Target drivers have several options available when allocating 1394 address space including the ability to specify a kernel virtual buffer to map to the allocated space. When the 'destination_offset' of an incoming request falls within an allocated address range, the 1394 Framework fulfills the request if possible, notifies the target driver if desired, and sends the response. Target drivers may also allocate 1394 address space with the characteristic that an incoming request to that space will be handled by hardware. In this case hardware directly accesses host memory bound to that address space, and transmits the appropriate response.
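The address-range check at the heart of that dispatch step can be sketched in a few lines. This is only an illustration under assumptions: addr_range_t, its owner field, and find_range() are hypothetical names, and the real Framework's bookkeeping for allocated address space is certainly richer than a flat table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_48BIT_MAX 0xFFFFFFFFFFFFULL   /* top of the 48-bit node address space */

/* Hypothetical record of one allocated range; not the framework's structure. */
typedef struct addr_range {
    uint64_t base;     /* first offset in the range */
    uint64_t len;      /* length in bytes, nonzero */
    int      owner;    /* illustrative ID of the target that allocated it */
} addr_range_t;

/*
 * Return the range that fully contains [offset, offset + nbytes), or NULL
 * when the request does not fall inside any allocated range.
 */
const addr_range_t *
find_range(const addr_range_t *tbl, size_t n, uint64_t offset, uint64_t nbytes)
{
    assert(tbl != NULL || n == 0);

    /* Reject empty requests and anything that runs past the 48-bit space. */
    if (nbytes == 0 || offset > ADDR_48BIT_MAX ||
        nbytes > ADDR_48BIT_MAX - offset + 1)
        return (NULL);

    for (size_t i = 0; i < n; i++) {
        if (offset >= tbl[i].base &&
            offset - tbl[i].base < tbl[i].len &&
            nbytes <= tbl[i].len - (offset - tbl[i].base))
            return (&tbl[i]);
    }
    return (NULL);
}
```

A request whose 'destination_offset' matches a range would then be fulfilled, reported to the owning target, or both, depending on the options chosen at allocation time; a request that matches nothing gets an error response.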

Isochronous I/O

Due to the potentially large volume of isochronous data and the critical isochronous timing needs, the 1394 Software Framework provides a mechanism designed to reduce call overhead and to maximize throughput.

Before starting isochronous I/O, a target driver sets up the overall sequence and structure of receive or transmit buffers, indicating other needs such as when the 1394 Framework should invoke a target driver's callback. Once isochronous I/O is started, the target driver can focus most of its time on handling the data.

The 1394 Framework mechanism for configuring isochronous I/O is the Isochronous Transfer Language (IXL). The IXL is a hardware-independent set of control blocks that the target driver uses to direct isochronous DMA. The 1394 Software Framework converts the hardware-independent IXL into the appropriate DMA directives for the local host 1394 interface hardware. If that hardware changes, the impact to the target driver is minimal or non-existent.

The 1394 Software Framework also facilitates peer-to-peer communication by tracking all target drivers with an interest in a particular isochronous stream, allocating a channel number and bandwidth as needed, and coordinating the target driver notification of stream starts and stops.

Bus Reset, Isochronous Resource Manager, Bus Manager

In addition to complying with the IEEE 1394 requirements for bus reset processing, such as cancelling pending asynchronous requests, the 1394 Framework provides several bus reset related features.

One of the most severe effects of a bus reset is the re-enumeration of all the nodes on the bus. The 1394 Software Framework assesses the post bus reset topology and determines the new node_IDs for all target driver instances. It can then reissue any uncompleted outgoing asynchronous requests on behalf of the issuing target driver, and each target driver can continue on without concern for its new node number.
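One way to picture the re-mapping is a lookup keyed on something that survives the reset. The sketch below assumes the key is the device's GUID and that the post-reset topology is available as a flat table; node_entry_t and lookup_node_id() are hypothetical names, and none of this is taken from the Framework's actual data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry in a post-reset topology table. */
typedef struct node_entry {
    uint64_t     guid;      /* stable across bus resets */
    unsigned int node_id;   /* reassigned on every bus reset */
} node_entry_t;

/*
 * Find the node number a device ended up with after the reset.
 * Returns 0 and stores the number in *node_id, or returns -1 (leaving
 * *node_id untouched) if the device is no longer on the bus.
 */
int
lookup_node_id(const node_entry_t *topo, size_t n, uint64_t guid,
    unsigned int *node_id)
{
    assert(node_id != NULL);
    assert(topo != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (topo[i].guid == guid) {
            *node_id = topo[i].node_id;
            return (0);
        }
    }
    return (-1);
}
```

With a lookup like this done on the target's behalf, a pending request can be re-addressed and reissued without the target ever seeing the old or new node number.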

As part of the topology evaluation, the 1394 Software Framework also creates a speed map to determine the maximum packet speed between any two nodes. The Framework uses the speed map to select the most efficient speed for target driver outgoing asynchronous requests.

In addition to the topology map and speed map which are part of 1394 bus manager duties, the 1394 Software Framework also contends for isochronous resource manager and bus manager. If it is bus manager, the Framework will ensure that the root is cycle master capable and optimize the gap count.

Hotplug

Another aspect of the Framework's topology evaluation is that it determines which devices, if any, have been removed from the bus and which ones have been added to the bus. For removed devices, the 1394 Software Framework calls into the Solaris Hotplug Framework to notify it that the device is offline. For added devices, the Framework reads the device's configuration ROM to determine the pertinent information, the Global Unique ID and often the Unit_Spec_Id and Unit_Sw_Version, and creates the Solaris "/devices" node using the Solaris Hotplug Framework interfaces.
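The added/removed determination boils down to a set difference between the devices seen before and after the bus reset. Here is a minimal sketch, assuming devices are identified by GUID and held in small unsorted arrays; guid_in_set() and guid_diff() are hypothetical helpers, not Framework routines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return nonzero if 'guid' appears in the first n entries of 'set'. */
static int
guid_in_set(const uint64_t *set, size_t n, uint64_t guid)
{
    for (size_t i = 0; i < n; i++) {
        if (set[i] == guid)
            return (1);
    }
    return (0);
}

/*
 * Copy every GUID that is in 'a' but not in 'b' into 'out' (which must
 * have room for na entries) and return how many were copied.
 *
 *   removed devices = guid_diff(before, after)
 *   added devices   = guid_diff(after, before)
 */
size_t
guid_diff(const uint64_t *a, size_t na, const uint64_t *b, size_t nb,
    uint64_t *out)
{
    size_t count = 0;

    assert(out != NULL);
    assert((a != NULL || na == 0) && (b != NULL || nb == 0));

    for (size_t i = 0; i < na; i++) {
        if (!guid_in_set(b, nb, a[i]))
            out[count++] = a[i];
    }
    return (count);
}
```

Each GUID in the "removed" result would lead to an offline notification, and each one in the "added" result to a configuration ROM read and a new "/devices" node.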

Building a target driver

To ensure that the Solaris 1394 Software Framework is loaded, target drivers must link with a dynamic dependency on the Framework misc module. This is done using the '-N' flag with ld:

        ld -r -dy -Nmisc/s1394 -o target target1.o target2.o

1394 Device /devices pathname

Solaris device entries for IEEE 1394 devices are created based on the device's global unique ID. The format of the name uses a prefix of "unit@" followed by the GUID in hexadecimal. An example /devices pathname for device A above is as follows ("tdA" is target driver A's minor name):

        /devices/pci@1f,4000/firewire@4,2/unit@0800460200000016,0:tdA

Adding a driver

Although the /devices name for the device is based on the GUID, the device driver itself is bound to the device(s) based on the first pair from the following list to exist in the device's configuration ROM:

        1. Unit_Spec_Id, Unit_Sw_Version
        2. Node_Spec_Id, Node_Sw_Version
        3. Node_Vendor_Id, Node_Hw_Version
        4. Module_Spec_Id, Module_Sw_Version
        5. Module_Vendor_Id, Module_Hw_Version
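The "first pair to exist" rule can be written as a short priority scan. In this sketch the enum values are just array indices (not IEEE 1212 key values), rom_info_t is a hypothetical summary of a parsed configuration ROM, and treating a pair as present only when both of its entries exist is my reading of the list above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative indices only; these are not the IEEE 1212 key values. */
enum rom_key {
    UNIT_SPEC_ID, UNIT_SW_VERSION,
    NODE_SPEC_ID, NODE_SW_VERSION,
    NODE_VENDOR_ID, NODE_HW_VERSION,
    MODULE_SPEC_ID, MODULE_SW_VERSION,
    MODULE_VENDOR_ID, MODULE_HW_VERSION,
    ROM_KEY_COUNT
};

/* Hypothetical summary of which entries a parsed config ROM contained. */
typedef struct rom_info {
    int      present[ROM_KEY_COUNT];
    uint32_t value[ROM_KEY_COUNT];
} rom_info_t;

/* The five pairs, in the same priority order as the list above. */
static const enum rom_key bind_order[][2] = {
    { UNIT_SPEC_ID,     UNIT_SW_VERSION },
    { NODE_SPEC_ID,     NODE_SW_VERSION },
    { NODE_VENDOR_ID,   NODE_HW_VERSION },
    { MODULE_SPEC_ID,   MODULE_SW_VERSION },
    { MODULE_VENDOR_ID, MODULE_HW_VERSION }
};

#define BIND_PAIRS (sizeof (bind_order) / sizeof (bind_order[0]))

/*
 * Return the 1-based position of the first pair whose two entries are
 * both present, storing their values; return 0 if no pair is complete.
 */
int
pick_bind_pair(const rom_info_t *rom, uint32_t *first, uint32_t *second)
{
    assert(rom != NULL && first != NULL && second != NULL);

    for (size_t i = 0; i < BIND_PAIRS; i++) {
        enum rom_key a = bind_order[i][0];
        enum rom_key b = bind_order[i][1];

        if (rom->present[a] && rom->present[b]) {
            *first = rom->value[a];
            *second = rom->value[b];
            return ((int)i + 1);
        }
    }
    return (0);
}
```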

For further information on the layout of configuration ROM and the meaning of these values, refer to IEEE 1212-1994 Section 8 and IEEE 1394-1995 Section 8.3.2.5. For specific information about configuration ROM for a particular device class, refer to the device class specification.

After parsing configuration ROM and locating one of the pairs as shown above, the Solaris 1394 Software Framework provides the information to the hotplug framework. If a driver is configured to bind to the designated pair, the /devices and /dev entries are created and the driver's attach() routine is invoked. For example, to add a driver for a video conferencing camera which adheres to the 1394 Digital Camera Draft 1.04 (note that hexadecimal letters must be in lower case):

        add_drv -n -i \"firewire00a02d,000100\" tdA

Where firewire is the hardware interface, 00a02d is the Unit_Spec_Id and 000100 is the Unit_Sw_Version.
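If you need to produce that binding string for your own device, a small formatter avoids the upper-case-hex mistake. This is a sketch with a hypothetical helper name (format_bind_name); it assumes both values are 24-bit configuration ROM quantities and that six zero-padded lower-case hex digits per value is the expected width, as in the example above.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Build "firewire<spec_id>,<sw_version>" using six lower-case hex digits
 * for each value.  Returns 0 on success, or -1 if 'buf' is too small.
 */
int
format_bind_name(char *buf, size_t len, uint32_t spec_id, uint32_t sw_version)
{
    int n;

    assert(buf != NULL);

    /* Mask to 24 bits; the %x conversion always emits lower-case letters. */
    n = snprintf(buf, len, "firewire%06" PRIx32 ",%06" PRIx32,
        spec_id & 0xFFFFFF, sw_version & 0xFFFFFF);

    return ((n >= 0 && (size_t)n < len) ? 0 : -1);
}
```

Calling it with 0x00A02D and 0x000100 fills the buffer with the same name used in the add_drv example.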

How the source is organized

All the source and headers for the Solaris 1394 Framework can be found under:

        usr/src/uts/common/io/1394
        usr/src/uts/common/sys/1394

The files themselves break down this way:

        * t1394.c - 1394 Target Driver Interfaces
        * s1394.c - 1394 Services Layer Initialization and Cleanup Routines
        * s1394_addr.c - 1394 Address Space Routines
        * s1394_asynch.c - 1394 Services Layer Asynch Communications Routines
        * s1394_bus_reset.c - 1394 Services Layer Bus Reset Routines
        * s1394_csr.c - 1394 Services Layer CSR and Config ROM Routines
        * s1394_dev_disc.c - 1394 Services Layer Device Discovery Routines
        * s1394_hotplug.c - 1394 Services Layer Hotplug Routines
        * s1394_isoch.c - 1394 Services Layer Isoch Communications Routines
        * s1394_misc.c - 1394 Services Layer Miscellaneous Routines
        * nx1394.c - 1394 Services Layer Nexus Support Routines
        * h1394.c - 1394 Services Layer HAL Interfaces
        * t1394_errmsg.c - Utility function that targets can use to convert an
                           error code into a printable string.
        * s1394_cmp.c - 1394 Services Layer Connection Management Procedures Support Routines
        * s1394_fa.c  - 1394 Services Layer Fixed Address Support Routines
                        (Currently used only for FCP support)
        * s1394_fcp.c - 1394 Services Layer FCP Support Routines
        * t1394.h
            This is the primary header file for the 1394 Framework and it includes
            all other header files listed below.  In addition, it contains all 1394
            Framework interface routine prototypes as well as all data structure and
            defines beginning with the "t1394_" prefix.  (n.b. This one's pretty
            well-commented, if I say so myself.)
        * cmd1394.h
            This file contains all structures and defines for handling asynchronous
            commands.
        * id1394.h
            This file contains all structures and defines for managing a local
            isochronous DMA resource.
        * ieee1394.h
            This file contains general IEEE 1394 defines.
        * ieee1212.h
            This file contains general IEEE 1212 defines.
        * ixl1394.h
            This file contains all structures and defines for utilizing IXL programs.
        * h1394.h
           This file contains the structure and error codes used to communicate between
           the HAL and the rest of the 1394 Software Framework
        * s1394.h
           This file contains all of the structures used (internally) by the 1394
           Software Framework.
        * s1394_impl.h
           This file contains typedefs and defines used by all 1394 Software Framework
           files.

The source for our OpenHCI-compliant HAL driver can be found in:

        usr/src/uts/common/io/1394/adapters
        usr/src/uts/common/sys/1394/adapters

The files here are numerous, so I will hold off saying more about this driver until some later entries.

        hci1394_extern.c       hci1394_misc.c         hci1394.c
        hci1394_ioctl.c        hci1394_ohci.c         hci1394.conf
        hci1394_async.c        hci1394_isr.c          hci1394_s1394if.c
        hci1394_attach.c       hci1394_ixl_comp.c     hci1394_tlabel.c
        hci1394_buf.c          hci1394_ixl_isr.c      hci1394_tlist.c
        hci1394_csr.c          hci1394_ixl_misc.c     hci1394_vendor.c
        hci1394_detach.c       hci1394_ixl_update.c
        hci1394_isoch.c        hci1394_q.c

        hci1394.h              hci1394_extern.h       hci1394_state.h
        hci1394_async.h        hci1394_ioctl.h        hci1394_tlabel.h
        hci1394_buf.h          hci1394_isoch.h        hci1394_tlist.h
        hci1394_csr.h          hci1394_ixl.h          hci1394_tnf.h
        hci1394_def.h          hci1394_ohci.h         hci1394_vendor.h
        hci1394_descriptors.h  hci1394_q.h            hci1394_drvinfo.h
        hci1394_rio_regs.h

And the source for the existing target drivers (mentioned above) can be found in:

        usr/src/uts/common/io/1394/targets/av1394
        usr/src/uts/common/io/1394/targets/scsa1394

OK... So what's next?

So my basic plan is to continue with some discussion of driver attach() and detach() interfaces (see t1394_attach() and t1394_detach() in the source) and basic 1394 event processing. Then to move on to talk about the asynch interfaces, the isoch interfaces, and finally some of the miscellaneous interface routines. At this point, I thought I'd change the focus from a description of the interfaces and how to use them to a more detailed examination of some specific bits of internals code.

But I'm open to suggestions too. If you've read this far and feel like you may be interested in reading more, going through the code yourself, sharing your thoughts and comments, and end up with something specific you'd like to hear about... lemme know.

(2005-06-14 09:00:00.0)