DTrace as a Streams debugger
Since
OpenSolaris is
releasing today, I thought I'd take a few lines to recount how we can
use
DTrace to
snoop in
Streams
data. Here DTrace helped me to quickly and cleanly understand a
specific Streams bug (
6230528).
My usual business is Sustaining Solaris Streams and TCP/IP code, but I
occasionally moonlight supporting legacy communications protocols still
used by the Telco world. In that capacity I was involved in certifying
the legacy
Solstice
X.25 9.2 product as supportable on the (then) upcoming Solaris 10
release.
The X.25 product is highly configurable, and one of it's more esoteric
configs encapsulates the IP protocol over an X.25 link. To do this a
userland daemon, (x25netd) opens a stream to the product's ixe driver,
(a pseudo NIC driver) and uses the Streams I_LINK ioctl to create a
stream to the X.25 driver below. It then uses ifconfig to bring up the
pseudo NIC instance, pushing the IP module onto the stream, and voila
IP over X.25.
Well IP over X.25 functioned just fine, but occasionally when you
brought the X.25 product down, the system paniced in Streams code,
doing putnext() of a Streams message downstream from the ixe driver to
the X.25 layer.
The clear clue was this console message at every close:
WARNING:KERNEL: munlink: could not perform unlink ioctl, closing anyway
(22)
The I_UNLINK ioctl sent downstream by x25netd when it was being shutdown was
failing with EINVAL.
I needed to determine what syndrome caused the stream below ixe to
evaporate leading to the panic, while the I_UNLINK ioctl intended to
safely demolish that stream had failed (?)
DTrace is our friend here, we can use it to slice into the Stream and
select what we want to see, without adding debug and recompiling
multiple Streams modules and drivers, in this case from more than one
code base. DTrace has to be the world's most flexible debugging tool,
Victorinox watch out.
Let's write a D script restricting the stuff we view to the x25netd
process, and look at the munlink() and ioctl activity.
Examining
the code we can see the functions involved in performing a Streams
ioctl, we instrument those, looking for return codes and Streams errors.
Additionally we instrument the general purpose Streams routine for
moving data putnext(), but we restrict what we see to only M_IOCTL
Streams messages and potential replies to M_IOCTL.
#!/usr/sbin/dtrace -s
inline int M_IOCTL = 0x0e; /* ioctl; set/get params */
inline int M_IOCACK = 0x81; /* acknowledge ioctl */
inline int M_IOCNAK = 0x82; /* negative ioctl acknowledge */
fbt::strdoioctl:entry
/ execname == "x25netd" / /* All Streams ioctls sent by x25netd */
{
/* Separator */
printf("\n");
}
fbt::strdoioctl:entry
/ execname == "x25netd" / /* All Streams ioctls sent by x25netd */
{
icp = (struct strioctl *) arg1;
printf("IOCTL %d (0x%x), length %d", icp->ic_cmd, icp->ic_cmd, icp->ic_len);
}
fbt::strdoioctl:return
/ execname == "x25netd" / /* Return code in arg1 */
{
printf("RETURN=%d errno=%d pid=%d", arg1, errno, pid);
}
fbt::munlink:return
/ execname == "x25netd" / /* munlink() returns?? */
{
printf("RETURN=%d errno=%d pid=%d", arg1, errno, pid);
}
fbt::putiocd:return
/ execname == "x25netd" /
{
printf("RETURN=%d errno=%d pid=%d", arg1, errno, pid);
}
fbt::putnext:entry
/ execname == "x25netd" && (((mblk_t *) arg1)->b_datap->db_type == M_IOCTL)/
{
iocp = (struct iocblk *) ((mblk_t *) arg1)->b_rptr;
printf(" putnext msg from: %s:: M_IOCTL ioctl %d (0x%x) count %d",
(stringof((queue_t *) arg0)->q_qinfo->qi_minfo->mi_idname),
iocp->ioc_cmd, iocp->ioc_cmd, iocp->ioc_count);
}
fbt::putnext:entry
/ execname == "x25netd" && (((mblk_t *) arg1)->b_datap->db_type == M_IOCACK)/
{
iocp = (struct iocblk *) ((mblk_t *) arg1)->b_rptr;
printf(" putnext msg from: %s:: M_IOCACK ioctl %d (0x%x) count %d",
(stringof((queue_t *) arg0)->q_qinfo->qi_minfo->mi_idname),
iocp->ioc_cmd, iocp->ioc_cmd, iocp->ioc_count);
}
fbt::putnext:entry
/ execname == "x25netd" && (((mblk_t *) arg1)->b_datap->db_type == M_IOCNAK)/
{
iocp = (struct iocblk *) ((mblk_t *) arg1)->b_rptr;
printf(" putnext msg from: %s:: M_IOCNAK ioctl %d (0x%x) count %d",
(stringof((queue_t *) arg0)->q_qinfo->qi_minfo->mi_idname),
iocp->ioc_cmd, iocp->ioc_cmd, iocp->ioc_count);
}
fbt::strgeterr:return
/ execname == "x25netd" /
{
printf("RETURN=%d errno=%d pid=%d", arg1, errno, pid);
}
fbt:ixe:ixewput:entry /* What M_IOCTLs arrive in ixe */
/ execname == "x25netd" && ((mblk_t *) arg1)->b_datap->db_type == M_IOCTL /
{
iocp = (struct iocblk *) ((mblk_t *) arg1)->b_rptr;
printf(" ixe M_IOCTL %d (0x%x) received, count %d", iocp->ioc_cmd,
iocp->ioc_cmd, iocp->ioc_count);
}
Having started the X.25 product, the associated IP over X.25 NIC was
tested, then I started the D script and waited for all the DTrace
probes to be established. It was time to gather the data, by stopping
the IP over X.25 device.
0 13220 strdoioctl:entry
0 13220 strdoioctl:entry IOCTL 21261 (0x530d), length 24
0 11497 putiocd:return RETURN=0 errno=0 pid=1763
0 3511 putnext:entry putnext msg from: strwhead:: M_IOCTL ioctl 21261 (0x530d) count 24
0 3511 putnext:entry putnext msg from: ip:: M_IOCNAK ioctl 21261 (0x530d) count 0
0 13221 strdoioctl:return RETURN=22 errno=0 pid=1763
0 9815 munlink:return RETURN=0 errno=0 pid=1763
From this it was apparent that the munlink was actually SUCCEEDING, but
that the I_UNLINK ioctl was being NAK'ed by the IP module and never
reaching the ixe driver.
The end result was that the STREAMS framework part of the unlink
processing had occurred, and the linkage below ixe was gone, but
nothing had told ixe that the stream below was no more. If a message
from IP was in flight as the pseudo NIC was ifconfig'ed down, a system
panic was likely.
The munlink code has a special case to handle failure of the I_UNLINK
ioctl
upon closure of the stream.
error = strdoioctl(stp, &strioc, NULL, FNATIVE,
K_TO_K | STR_NOERROR | STR_NOSIG, crp, rvalp);
/*
* If there was an error and this is not called via strclose,
* return to the user. Otherwise, pretend there was no error
* and close the link.
*/
if (error) {
if (flag & LINKCLOSE) {
cmn_err(CE_WARN, "KERNEL: munlink: could not perform "
"unlink ioctl, closing anyway (%d)\n", error);
} else {
link_rempassthru(passq);
mutex_enter(&stpdown->sd_lock);
stpdown->sd_flag &= ~STRPLUMB;
cv_broadcast(&stpdown->sd_monitor);
mutex_exit(&stpdown->sd_lock);
mutex_exit(&muxifier);
return (error);
}
}
This allowed code later in munlink() to destroy the Streams linkage
despite the failure of the ioctl.
The special case doesn't seem like a good idea, however the root cause
of
this problem is elsewhere.
The IP module has no exclusive rights to the I_UNLINK ioctl, therefore
it should be passing it downstream rather than NAK'ing it, and this is
the information which the D script made clear immediately.
In fact it turned out that when the FireEngine code, (TCP/IP merge and
rewrite to use vertical perimeters) was putback in S10, it altered IP's
ioctl
handling, and where IP consumes an ioctl when used as a Streams driver,
it was assuming the right to do so, when pushed as a Streams module, on
another driver's stream!
The fix was to mark the subset of the ioctls consumed by IP as a driver
which are generic Streams ioctls, with a flag to indicate they must be
passed downstream when IP is pushed as a Streams module.
Search the source of
OpenSolaris
for
IPI_PASS_DOWN.
Technorati Tag:
OpenSolaris
Technorati Tag:
Solaris
Technorati Tag:
DTrace