We decided to skip branch merges to Nevada builds 113-116. We had issues with the lab hardware and the vboxes, plus we wanted a stable environment for the BakeAThon. Now I'm having a hard time merging with build 117. One of the issues was a panic on the client (which I could patch) caused by getting NFS4ERR_BADSESSION from a DS. I was able to see the DS replying to the exchange_id request in a packet trace. There are only a handful of places where the server returns this error code, so I was pretty sure I could reuse a DTrace script I had lying around:
:nfssrv:mds_lease_chk:return
/args[1] != 0/
{
    printf("rc1 = %d", args[1]);
}
Until I came across this code snippet:
        *cs->statusp = resp->dsr_status =
            NFS4ERR_UNSAFE_COMPOUND;
        goto final;
        ...
final:
        DTRACE_NFSV4_2(op__destroy__session__done,
            struct compound_state *, cs,
            DESTROY_SESSION4res *, resp);
}
Just great, no return value. How could I catch it?
Well, the answer is staring us right in the face: we can use op__destroy__session__done (DTrace turns the double underscores in the probe name into hyphens). I had to snoop around for examples in a colleague's home directory (which is kinda why I blog about this stuff; it is easier to find), but ended up with this:
:::op-bind-conn-to-session-done
{
    bresp = (BIND_CONN_TO_SESSION4res *)arg1;
    printf("rc1 = %d", bresp->bctsr_status);
}
:::op-destroy-session-done
{
    dresp = (DESTROY_SESSION4res *)arg1;
    printf("rc1 = %d", dresp->dsr_status);
}
Note that I wasn't too worried about being specific as to which module these probes live in; I was betting on the names being unique.
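The first script only printed non-zero return values, while the done probes above print every completion. Here is a rough, untested sketch of the same filter applied to the destroy-session probe. It assumes arg1 is still the result pointer, exactly as in the script above. The cast inside the predicate, the clause-local this->dresp (in place of a global), and the use of probename in the output are illustrative choices, not something taken from the gate:

```
#pragma D option quiet

/* Assumption: arg1 is the DESTROY_SESSION4res pointer, as in the script above. */
:::op-destroy-session-done
/((DESTROY_SESSION4res *)arg1)->dsr_status != 0/
{
    /* Clause-local variable, so concurrent firings do not share state. */
    this->dresp = (DESTROY_SESSION4res *)arg1;
    printf("%s: status = %d\n", probename, this->dresp->dsr_status);
}
```

The same predicate shape should carry over to the bind-conn-to-session probe by swapping in BIND_CONN_TO_SESSION4res and bctsr_status.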
Oh, and I was able to narrow down where the NFS4ERR_BADSESSION was coming from. And then I had to add debug statements to find out why. :-< I bet I could still have done it with DTrace. :->
So we had our first BakeAThon hosted by NetApp. I thought it went very well and it enabled the NetApp folks to see how much work goes into hosting the event.
I can't go into details that involve other vendors because of an NDA, but I can say that we did a lot of testing on new features in both the client and the server. I'm not that familiar with the client-side changes, but I think we had the compound constructor changes by Bob Mastors, and Karen Rochford was testing a rewrite of the layout handling code.
I was more excited about the changes on the server side:
I also brought a major rewrite of the layout handling code (allowing for multiple layouts, etc), device handling (allowing multiple datasets on the same DS) and the integration of the kspe (kernel simple policy engine), but I probably needed another week of development/testing on that code.
The other major development is that we are finally making the switch to using Virtual Boxes for our testing needs. We typically would want 3-4 machines per developer. We can refine this by having a pair of public communities (MDS plus 2 DSes), one as the stable system and the other as a sanity test rig. Then each client developer gets their own machine for a client and each server developer gets a community.
Well, with Virtual Boxes working on a wide range of host OSes, we can have the communities hosted on one or two beefier machines, perhaps sharing room with the build server, and each developer can bring up what they need on their laptops.
I'm going to be giving a presentation on pNFS on July 7th at the Oklahoma City OpenSolaris User Group (OKCOSUG). I'd like to give a live demo using virtual boxes, but I'm not making any promises...