Welcome to the Event Horizon

Friday Oct 06, 2006

Greetings,

During my recent five month stint at another company, I got into a debate with one of the long-time software engineers about code style. I was in the midst of working on a Linux video driver and had made some changes that I wanted him to have a look at. He made an off-hand comment about how I could feel free to change anything I wanted (he had written the driver I was working on). I responded that, while I'd like to change the way the code is formatted, I didn't see any point in doing so. It was just that the code was difficult to read.

One could hear the proverbial gloved hand smacking the face.

Alright, so it wasn't a knock-down, drag-out WWF style discussion, but I remembered the conversation yesterday while I was doing something I haven't done at Sun for quite some time; looking at code.

This is a great illustration of the topic we discussed (this code is freely available if you know where to look):

   int ret, field, size, count, nbfrs;
   VidDevice *vid;
   VbiDevice *vbi;
   V4LDevice *pp;
   printk(KERN_INFO AMD_VERSION "\n");
   pp = kmalloc(sizeof(*pp),GFP_KERNEL);
   if( pp == NULL ) return -ENOMEM;
   memset(pp,0,sizeof(*pp));
   strncpy(&pp->name[0],DRVNAME,sizeof(pp->name));
   pp->name[sizeof(pp->name)-1] = 0;
   pp->debug_level = debug;
   pp->irq = irq;
   lx_dev = pp;
   DMSG(1,"\n");
   vid = &pp->vid;
#ifdef CONFIG_PROC_FS
   lx_dev->proc = proc_mkdir(PROC_PATH,0);
   if( lx_dev->proc != NULL )
      create_proc_read_entry("video",0,lx_dev->proc,proc_lxv4l2_vid_read,vid);
#endif
   if( (ret=lx_init(pp)) != 0 ) return ret;
   if( (ret=vid_mem_init()) != 0 ) return ret;
   vbi = &pp->vbi;
   spin_lock_init(&vbi->lock);
   spin_lock_init(&vid->lock);
   vid->std_index = STD_SD_NTSC;
   vid->std_num = v4l_std_data[vid->std_index].num;
   vid->std_denom = v4l_std_data[vid->std_index].denom;
   v4l_default_capt_format(vid,&vid->capt);
   vid->capt_state = capt_state_idle;
   size = phys_bfrsz(vid->capt.sizeimage);
   count = v4l_max_buffers(size,vid);
   nbfrs = count - V4L_MIN_BFRS;
   if( nbfrs < V4L_MIN_BFRS ) nbfrs = V4L_MIN_BFRS;

There is nothing functionally wrong with the above code. (Well, maybe there is, but that's neither here nor there). In fact, the engineer who wrote the code was highly respected, even by me. However, just because a piece of code is functional doesn't mean it's "good". I mentioned that the code was difficult to follow. Said engineer responded that code shouldn't read like poetry.

The glove was now on the other hand.

I didn't come right out and disagree, but in my opinion that could not have been more wrong. It goes back to the "functional code != good code" argument. The fact that it took me significantly more time to understand the code due to the formatting was my reason for why it was not good code.

If you had a close enough look at the code above, you undoubtedly noticed many statements written in one line that would traditionally comprise multiple lines (look for the if statements). The other notable "feature" is the utter lack of whitespace. The engineer's rationalization for this type of code "style" (or lack thereof) was that "real estate is king". The more code that can fit on the screen at a time, the better.

I find it difficult to argue with that type of logic. Fundamentally, he and I are on different planes of reality. Anybody that writes code for a living must eventually come to understand that nobody owns a piece of code forever. Somebody else will no doubt be responsible for maintaining or extending your code at some point in the future. Personally, I'd rather not have a curse on my house for generations to come because I wanted to be able to see an extra five lines of code without having to scroll down. It seems to me this argument becomes even more salient when the code you write is visible by the entire world. The number of folks who would love to run into you in a dark alley late at night has now increased by several orders of magnitude.

What is it that makes poetry inherently interesting? It's a combination of the words that are chosen, the way those words are put together, and the underlying message being conveyed. In fact, even the way those words are presented on paper lends itself to the enjoyment of the work.

With all this in mind, how is software different from poetry? Good code is a combination of the same factors. An appropriate choice of constructs, the method used to compile the constructs into something cohesive, and how the way the code is presented allows for a rapid understanding of the problem being solved are higher contributing factors to successful software than how many characters you can squeeze into a window.

Suffice it to say the denouement was less than satisfactory. I am no more likely to convince someone that software is very much like poetry than I am to swim the English Channel. I take solace in the hope that I am not the only one that believes in software as poetry.

Cheers,

David

Sunday Sep 24, 2006

Hey now,

There are countless entries I could create in this, my music category. Unfortunately, one has to start somewhere, and one will start small. All good things in all good time, I suppose.

I am a vinyl junkie. I admit it. There aren't that many of us, but by some accounts our numbers are growing. You can also find us in places such as the Vinyl Asylum.

With that said, you can imagine the strange variety of topics that await discussion here. For today, though, I only wish to leave you with a couple of tidbits.

The first is the recent "release" of Alan Parsons' original 4-channel mix of Pink Floyd's Dark Side Of The Moon. If you are bit-torrent saavy, you may have already seen this flying about. It's a DTS encoded disc, so it's not of much value unless you happen to have a DTS decoder. If you do, though, and you're a Floyd fan, go get this. It should still be easy to find. You won't be disappointed.

The other tidbit I wanted to share with you was the joy I discovered while frequenting my favorite local record store yesterday. I don't know how many of you are familiar with Harvey Danger. Regardless, check this out. This band had the guts to release their latest album, "Little By Little" in its entirety as a free download. It is still available on their web site. If you haven't heard it, I implore you to check it out. If you do, and you enjoy it half as much as I have, I then implore you to buy it. I had been meaning to pick this up on CD ever since the day I downloaded it. I admit it has been quite a while since I did that, but my procrastination paid off yesterday when I found a copy of Little By Little in the new releases bin... on vinyl.

Happy spinning,

David

Friday Sep 22, 2006

Hello again,

In my previous life at Sun, I was a software engineer working on Leadville. More specifically, I spent almost two years working on the FibreChannel stack.

From the time I started working on Leadville, there were numerous recurring panics that all revolved around one central issue. That issue was called the "pd mutex", aptly named after the mutex that is supposed to protect a structure that represents a remote port on the SAN. That structure used to be called the "p"ort "d"evice structure (thus the name pd_mutex), although it has since been appropriately renamed to fc_remote_port_t.

These pd_mutex bugs would pop up on a regular basis because the structures weren't being properly accounted for within the stack.

By the nature of SANs, devices come and go. A host reboots, a user allocates a new LUN on an array, any number of solicited or unsolicited events cause remote ports to appear and disappear. In theory, when a remote port disappears, the structure that represents that remote port also disappears. In reality, it isn't that simple.

I knew there was a fundamental root cause to all the open bugs that were related to "pd_mutex". My goal was to reduce all the open bugs down to that common root cause. The following bug is the one I chose to engineer a solution for:

Bug #4792071

That bug had been open for approximately two years already when I took it over. An initial look at the code made it clear that although there was a field within the remote port structure that was supposed to maintain a reference count, that field was rarely used. I came to the conclusion that the root cause of all the pd_mutex bugs was that there was no adequate reference counting associated with the structure. Thus began two months of in-depth analysis, bug fix development, and testing to ensure that this bug and the other eight or so related bugs no longer exhibited the panics associated with the pd_mutex problem.

After the fix was integrated, the issues slowed to a trickle, but since then there have been several new bugs that have arisen that also point to issues with regard to proper bookkeeping of the remote port structure. One of those recent bugs is the following:

Bug #6255534

Unfortunately, the bug listing in opensolaris isn't very helpful, but this is where talk of threads and synchronization comes into play. This bug is a great example for discussion of fundamental locking methodology in a multi-threaded environment. As the synopsis of the bug indicates, a system panic occurred during Leadville testing while target-side cable pulls were being performed.

A brief code inspection revealed a race condition to be the root cause of the panic. At the time of the panic, there were two threads operating with the same remote port structure. One thread was running at interrupt context and was attempting to destroy a packet that had evidently failed transmission at an earlier time. The second thread was handling a state change notification, presumably one indicating that the remote port had disappeared.

Now, as a bit of background, part of my fix for bug ID 4792071 was to implement a flag in addition to consistent and proper use of a reference counter for each remote port structure. The flag is an indication that the transport layer has notified the ULP that a remote port exists. Thus, there are now two conditions that must be true in order for a remote port structure to be deallocated. First, the reference count must be zero. Second, the flag indicating that ULPs are aware of the remote port must be clear. Then, and only then will a remote port structure be freed.

With that in mind, have a quick look at the relevant piece of code for each thread just prior to the panic. The following code fragments can be seen in the opensolaris NWS source (nws-src-20060724).

Thread 1 (fctl/src/fctl.c:882 in function fc_ulp_uninit_packet):

...
    mutex_enter(&pd->pd_mutex);                                     
                                                                                
    ASSERT(pd->pd_ref_count > 0);                                   
    pd->pd_ref_count--;                                             
                                                                                                                                          
    if (pd->pd_state == PORT_DEVICE_INVALID &&                      
        pd->pd_ref_count == 0) {                                    
            fc_remote_node_t *node = pd->pd_remote_nodep;           
                                                                                
            mutex_exit(&pd->pd_mutex);                              
                                                                                                                             
            if ((fctl_destroy_remote_port(port, pd) == 0) &&        
                (node != NULL)) {                                   
                    fctl_destroy_remote_node(node);                 
            }                                                       
            return (rval);                                          
    }
...

Thread 2 (fctl/src/fctl.c:2907 in function fctl_ulp_statec_cb):

...
    mutex_enter(&pd->pd_mutex);                             
                                                                                
    pd->pd_ref_count--;                                     
    ASSERT(pd->pd_ref_count >= 0);                          
                                                                                
    if (clist->clist_map[count].map_state !=                
        PORT_DEVICE_INVALID) {                                                     
            mutex_exit(&pd->pd_mutex);                      
            continue;                                       
    }                                                       
                                                                                
    node = pd->pd_remote_nodep;                             
    pd->pd_aux_flags &= ~PD_GIVEN_TO_ULPS;                  
                                                                                
    mutex_exit(&pd->pd_mutex);                              
                                                                                                                                    
    if ((fctl_destroy_remote_port(port, pd) == 0) &&        
        (node != NULL)) {                                   
            fctl_destroy_remote_node(node);                 
    }
...

Both threads end up calling fctl_destroy_remote_node, so here is the relevant portion of that function:

int                                                                             
fctl_destroy_remote_port(fc_local_port_t *port, fc_remote_port_t *pd)           
{                                                                               
        fc_remote_node_t        *rnodep;                                        
        int                     rcount = 0;                                     
                                                                                
        mutex_enter(&pd->pd_mutex);                                             
                                                                                                                                                 
        if ((pd->pd_ref_count > 0) ||                                           
            (pd->pd_aux_flags & PD_GIVEN_TO_ULPS)) {                            
                pd->pd_aux_flags |= PD_NEEDS_REMOVAL;                           
                pd->pd_type = PORT_DEVICE_OLD;                                  
                mutex_exit(&pd->pd_mutex);                                      
                return (1);                                                     
        }                                                                       
                                                                                
        pd->pd_type = PORT_DEVICE_OLD;                                          
                                                                                
        rnodep = pd->pd_remote_nodep;                                           
                                                                                
        mutex_exit(&pd->pd_mutex);                                              
                                                                                
        if (rnodep != NULL) {                                                                                                            
            rcount = fctl_unlink_remote_port_from_remote_node(rnodep, pd);  
        }                                                                       
                                                                                
        mutex_enter(&port->fp_mutex);                                           
        mutex_enter(&pd->pd_mutex);                                             
                                                                                
        fctl_delist_did_table(port, pd);                                        
        fctl_delist_pwwn_table(port, pd);                                       
                                                                                
        mutex_exit(&pd->pd_mutex);                                              
        fctl_dealloc_remote_port(pd);                                           
                                                                                
        mutex_exit(&port->fp_mutex);                                            
                                                                                
        return (rcount);                                                        
}

As can be seen above, fctl_destroy_remote_port checks the PD_GIVEN_TO_ULPS flag and the reference count (pd_ref_count) prior to actually deallocating the remote port structure.

In bug ID 6255534, thread 1 caused the deallocation of the remote port structure. Thread 2 caused the panic at fctl_destroy_remote_port+4. If both conditions must be met in order for the remote port structure to be deallocated, how did it happen that thread 2 had a reference to a deallocated structure?

Here is a chronological flow for each thread (based on my analysis):

Thread 1                            Thread 2

Acquire pd_mutex                    .
Decrement pd_ref_count              .
Release pd_mutex                    .
Call fctl_destroy_remote_port       Acquire pd_mutex
.                                   Decrement pd_ref_count
.                                   Clear PD_GIVEN_TO_ULPS flag
.                                   Release pd_mutex
Acquire pd_mutex                    Call fctl_destroy_remote_port
Check if deallocation is ok         .
Release pd_mutex                    .
Deallocate remote port              .
                            Acquire pd_mutex
                                    panic

Once the cause of the panic is evident from an understanding of the above timeline, the solution to the problem should be clear. After thread 2 acquired the pd_mutex, decremented the reference count and cleared the PD_GIVEN_TO_ULPS flag, thread 1 was able to acquire the mutex. At that point, fctl_destroy_remote_port determined (incorrectly) that it was appropriate to deallocate the remote port structure.

As I am no longer responsible for Leadville support, I suggested that the most straight-forward fix to this bug was to require that the pd_mutex be held while calling fctl_destroy_remote_port. If this restriction were put into place, the race condition would be eliminated.

When this suggestion was made, the following question was raised:

"Even if the mutex is held across the call the fctl_destroy_remote_port, wouldn't the panic still occur when the second thread tried to obtain the pd_mutex?"

My response was "There is no longer a second thread".

It is getting pretty late on Friday night, so I suppose this is a good time to wrap this up. I can't believe I spent time on Friday night working on this entry. Perhaps this could shed some light.

Until next time,

David

Thursday Sep 07, 2006

Greetings,

Although I established this blog page quite a while ago, I left Sun before I ever posted anything. Fortunately, the winds of change have blown me back and it seems apropos that, like my manager, I jot down some random thoughts now and again. If you are reading this, well, thank you first :) Second, though, it implies knowledge that Sun encourages this activity. That is the gist of my rambling for the day.

Many months ago, I ran across an article that talked about why ketchup hasn't fundamentally changed for such a long time. Now, I honestly don't ponder things of this nature regularly, but there was something in that article that percolated to the top of my thoughts yesterday as I was contemplating my first blog entry. The article can be found here: The Ketchup Conundrum. It is a lengthy article, but it was section four that struck me (and gave me enough keywords to find the article). That section goes into detail about what makes Heinz ketchup as popular as it is. Their ketchup contains a well engineered combination of ingredients that cause your taste buds to fire on all cylinders as it were.

Now, to bungee back to engineering, let us combine the concept of Heinz ketchup and some of Dan's thoughts about engineering and management, specifically the Rules For Engineering. What Dan gets, as well as the vast majority of folks at Sun, is that there is a set of talents and skills that make for great engineering. In fact, to take it one step further, it is also a combination of engineers, managers, and all the other facets of any engineering company that must be in the right balance in order to attain the proper force multiplier. When this is accomplished, the results can far exceed what other, more mediocre teams or companies can accomplish.

This, my friends, is what Sun gets. Don't get me wrong. It would be far too easy for me, being an employee of Sun, to simply gush on and on about what a fantastic company Sun is. That's not my point, though. There are many companies out there creating outstanding products with teams of extremely talented people. I haven't worked for that many companies over the years, but I've seen enough to know that there are definitely companies that do not grok this concept.

As talented engineers, managers, marketing people, or whatever your individual passions and skill sets are, you are likely challenged to outperform the expectations of your peers, your management, and especially yourself when you see others around you attempting to do the same thing. Sun is changing, just like everybody else. Sun has issues, just like any other company. Sun isn't perfect, but the fact that they do understand these concepts, from Jonathon Schwartz all the way down to those of us in the trenches, is the exact reason why I am pleased to be back.

Cheers,

David