Friday May 05, 2006


Speculative tracing is a useful way to trace code paths in case of errors in the code path. I've also found it useful to trace codepaths, in general (which is useful in understanding the code). For example, if the the data path taken by network packet, as it starts from the top of the network stack (stream head) and all the way down to the driver, is of interest, then a simple script like the one below does the trick:
#!/usr/sbin/dtrace  -Fs
#pragma D option nspec=10

fbt:sockfs:sostream_direct:entry
{
        self->spec = speculation();
        speculate(self->spec);
}

fbt:::
/self->spec/
{
        speculate(self->spec);
}

fbt:ce:ce_wput:entry
/self->spec/
{
        commit(self->spec);
        self->spec = 0;
}

With some basic knowledge of what you want as the entry and exit points of the path, you can create a speculation at the entry point (sostream_direct, a routine in the stream head, in the case above) and commit the speculation at the exit point (ce_wput, the cassini network drivers write route, in the example above). The output for an ftp data packet (i.e the TCP data path) looks like this:
# /d/spec_w.d
dtrace: script '/d/spec_w.d' matched 39427 probes
CPU FUNCTION
  0  -> sostream_direct
  0    -> mcopyinuio
  0      -> allocb_cred
  0        -> allocb
  0          -> kmem_cache_alloc
  0            -> kmem_cache_alloc_debug
  0              -> verify_and_copy_pattern
  0              <- verify_and_copy_pattern
  0              -> dblk_constructor
  0                -> kmem_cache_alloc
  0                  -> kmem_cache_alloc_debug
  0                    -> verify_and_copy_pattern
  0                    <- verify_and_copy_pattern
  0                    -> getpcstack
  0                    <- getpcstack
  0                    -> kmem_log_enter
  0                    <- kmem_log_enter
  0                  <- kmem_cache_alloc_debug
  0                <- kmem_cache_alloc
  0              <- dblk_constructor
  0              -> getpcstack
  0              <- getpcstack
  0              -> kmem_log_enter
  0              <- kmem_log_enter
  0            <- kmem_cache_alloc_debug
  0          <- kmem_cache_alloc
  0        <- allocb
  0        -> crhold
  0        <- crhold
  0      <- allocb_cred
  0      -> uiomove
  0        -> xcopyin_nta
  0        <- xcopyin_nta
  0      <- uiomove
  0    <- mcopyinuio
  0    -> tcp_wput
  0      -> msgdsize
  0      <- msgdsize
  0      -> mutex_owned
  0      <- mutex_owned
  0      -> conn_trace_ref
  0        -> mutex_owned
  0        <- mutex_owned
  0        -> getpcstack
  0        <- getpcstack
  0      <- conn_trace_ref
  0    <- tcp_wput
  0    -> squeue_enter
  0      -> servicing_interrupt
  0      <- servicing_interrupt
  0      -> mutex_owned
  0      <- mutex_owned
  0      -> tcp_output
  0        -> tcp_timeout
  0          -> conn_trace_ref
  0            -> mutex_owned
  0            <- mutex_owned
  0            -> getpcstack
  0            <- getpcstack
  0          <- conn_trace_ref
  0          -> timeout
  0          <- timeout
  0          -> timeout_common
  0            -> mutex_owned
  0            <- mutex_owned
  0          <- timeout_common
  0        <- tcp_timeout
  0        -> dupb
  0          -> kmem_cache_alloc
  0            -> kmem_cache_alloc_debug
  0              -> verify_and_copy_pattern
  0              <- verify_and_copy_pattern
  0              -> getpcstack
  0              <- getpcstack
  0              -> kmem_log_enter
  0              <- kmem_log_enter
  0            <- kmem_cache_alloc_debug
  0          <- kmem_cache_alloc
  0        <- dupb
  0      <- tcp_output
  0      -> tcp_send_data
  0        -> ire_trace_ref
  0          -> th_trace_ire_lookup
  0            -> mutex_owned
  0            <- mutex_owned
  0          <- th_trace_ire_lookup
  0          -> kmem_zalloc
  0            -> kmem_cache_alloc
  0              -> kmem_cache_alloc_debug
  0                -> verify_and_copy_pattern
  0                <- verify_and_copy_pattern
  0                -> getpcstack
  0                <- getpcstack
  0                -> kmem_log_enter
  0                <- kmem_log_enter
  0              <- kmem_cache_alloc_debug
  0            <- kmem_cache_alloc
  0          <- kmem_zalloc
  0          -> th_trace_rrecord
  0            -> getpcstack
  0            <- getpcstack
  0          <- th_trace_rrecord
  0        <- ire_trace_ref
  0        -> ire_to_ill
  0        <- ire_to_ill
  0        -> putnext
  0          -> mutex_owned
  0          <- mutex_owned
  0          -> mutex_owned
  0          <- mutex_owned
  0          -> ce_wput

Similarly,if the inbound data path is of interest, that is, as a packet is recieved on the network interface and travels all the way up to the stack, then reversing the roles in the script as show below does the trick:
#!/usr/sbin/dtrace  -Fs
#pragma D option nspec=10

fbt:ce:ce_intr:entry
{
        self->spec = speculation();
        speculate(self->spec);
}

fbt:::
/self->spec/
{
        speculate(self->spec);
}

fbt:genunix:strrput:entry
/self->spec/
{
        commit(self->spec);
        self->spec = 0;
}

And the output looks like this:
  0                              -> ce_intr
  0                                -> ddi_get_devstate
  0                                <- ddi_get_devstate
  0                                -> pci_dma_sync
  0                                  -> ddi_get_instance
  0                                  <- ddi_get_instance
  0                                  -> ddi_get_soft_state
  0                                  <- ddi_get_soft_state
  0                                  -> ddi_driver_name
  0                                    -> ddi_driver_major
  0                                    <- ddi_driver_major
  0                                    -> ddi_major_to_name
  0                                    <- ddi_major_to_name
  0                                    -> mod_major_to_name
  0                                    <- mod_major_to_name
  0                                  <- ddi_driver_name
  0                                  -> ddi_get_instance
  0                                  <- ddi_get_instance
  0                                  -> pci_debug
  0                                  <- pci_debug
  0                                  -> pci_debug
  0                                  <- pci_debug
  0                                  -> pci_debug
  0                                  <- pci_debug
  0                                  -> pci_pbm_dma_sync
  0                                  <- pci_pbm_dma_sync
  0                                <- pci_dma_sync
  0                                -> ce_dupb
  0                                  -> desballoc
  0                                  <- desballoc
  0                                  -> gesballoc
  0                                    -> kmem_cache_alloc
  0                                      -> kmem_cache_alloc_debug
  0                                        -> verify_and_copy_pattern
  0                                        <- verify_and_copy_pattern
  0                                        -> dblk_esb_constructor
  0                                          -> kmem_cache_alloc
  0                                            -> kmem_cache_alloc_debug
  0                                              -> verify_and_copy_pattern
  0                                              <- verify_and_copy_pattern
  0                                              -> getpcstack
  0                                              <- getpcstack
  0                                              -> kmem_log_enter
  0                                              <- kmem_log_enter
  0                                            <- kmem_cache_alloc_debug
  0                                          <- kmem_cache_alloc
  0                                        <- dblk_esb_constructor
  0                                        -> getpcstack
  0                                        <- getpcstack
  0                                        -> kmem_log_enter
  0                                        <- kmem_log_enter
  0                                      <- kmem_cache_alloc_debug
  0                                    <- kmem_cache_alloc
  0                                  <- gesballoc
  0                                <- ce_dupb
  0                                -> dvma_sync
  0                                <- dvma_sync
  0                                -> pci_fdvma_sync
  0                                  -> pci_debug
  0                                  <- pci_debug
  0                                <- pci_fdvma_sync
  0                                -> pci_dma_sync
  0                                  -> ddi_get_instance
  0                                  <- ddi_get_instance
  0                                  -> ddi_get_soft_state
  0                                  <- ddi_get_soft_state
  0                                  -> ddi_driver_name
  0                                    -> ddi_driver_major
  0                                    <- ddi_driver_major
  0                                    -> ddi_major_to_name
  0                                    <- ddi_major_to_name
  0                                    -> mod_major_to_name
  0                                    <- mod_major_to_name
  0                                  <- ddi_driver_name
  0                                  -> ddi_get_instance
  0                                  <- ddi_get_instance
  0                                  -> pci_debug
  0                                  <- pci_debug
  0                                  -> pci_debug
  0                                  <- pci_debug
  0                                  -> pci_debug
  0                                  <- pci_debug
  0                                  -> pci_pbm_dma_sync
  0                                  <- pci_pbm_dma_sync
  0                                <- pci_dma_sync
  0                                -> hcksum_assoc
  0                                <- hcksum_assoc
  0                                -> canputnext
  0                                <- canputnext
  0                                -> putnext
  0                                  -> mutex_owned
  0                                  <- mutex_owned
  0                                  -> mutex_owned
  0                                  <- mutex_owned
  0                                  -> ip_rput
  0                                  <- ip_rput
  0                                  -> ip_input
  0                                    -> ire_cache_lookup
  0                                      -> ire_trace_ref
  0                                        -> th_trace_ire_lookup
  0                                          -> mutex_owned
  0                                          <- mutex_owned
  0                                        <- th_trace_ire_lookup
  0                                        -> kmem_zalloc
  0                                          -> kmem_cache_alloc
  0                                            -> kmem_cache_alloc_debug
  0                                              -> verify_and_copy_pattern
  0                                              <- verify_and_copy_pattern
  0                                              -> getpcstack
  0                                              <- getpcstack
  0                                              -> kmem_log_enter
  0                                              <- kmem_log_enter
  0                                            <- kmem_cache_alloc_debug
  0                                          <- kmem_cache_alloc
  0                                        <- kmem_zalloc
  0                                        -> th_trace_rrecord
  0                                          -> getpcstack
  0                                          <- getpcstack
  0                                        <- th_trace_rrecord
  0                                      <- ire_trace_ref
  0                                    <- ire_cache_lookup
  0                                    -> ip_tcp_input
  0                                      -> ipcl_classify_v4
  0                                        -> conn_trace_ref
  0                                          -> mutex_owned
  0                                          <- mutex_owned
  0                                          -> getpcstack
  0                                          <- getpcstack
  0                                        <- conn_trace_ref
  0                                      <- ipcl_classify_v4
  0                                    <- ip_tcp_input
  0                                    -> ire_refrele
  0                                      -> ire_untrace_ref
  0                                        -> th_trace_ire_lookup
  0                                          -> mutex_owned
  0                                          <- mutex_owned
  0                                        <- th_trace_ire_lookup
  0                                        -> th_trace_rrecord
  0                                          -> getpcstack
  0                                          <- getpcstack
  0                                        <- th_trace_rrecord
  0                                        -> ire_trace_free
  0                                        <- ire_trace_free
  0                                        -> kmem_free
  0                                        <- kmem_free
  0                                        -> kmem_cache_free
  0                                          -> kmem_cache_free_debug
  0                                            -> kmem_log_enter
  0                                            <- kmem_log_enter
  0                                            -> getpcstack
  0                                            <- getpcstack
  0                                            -> kmem_log_enter
  0                                            <- kmem_log_enter
  0                                            -> copy_pattern
  0                                            <- copy_pattern
  0                                          <- kmem_cache_free_debug
  0                                        <- kmem_cache_free
  0                                      <- ire_untrace_ref
  0                                    <- ire_refrele
  0                                    -> squeue_enter_chain
  0                                      -> servicing_interrupt
  0                                      <- servicing_interrupt
  0                                      -> mutex_owned
  0                                      <- mutex_owned
  0                                      -> tcp_input
  0                                      <- tcp_input
  0                                      -> tcp_rput_data
  0                                        -> tcp_find_pktinfo
  0                                        <- tcp_find_pktinfo
  0                                        -> tcp_process_options
  0                                          -> tcp_parse_options
  0                                          <- tcp_parse_options
  0                                          -> tcp_mss_set
  0                                          <- tcp_mss_set
  0                                          -> tcp_maxpsz_set
  0                                            -> setmaxps
  0                                              -> claimstr
  0                                              <- claimstr
  0                                              -> strqset
  0                                                -> mutex_owned
  0                                                <- mutex_owned
  0                                              <- strqset
  0                                              -> releasestr
  0                                                -> cv_broadcast
  0                                                <- cv_broadcast
  0                                              <- releasestr
  0                                            <- setmaxps
  0                                            -> mi_set_sth_maxblk
  0                                              -> allocb
  0                                                -> kmem_cache_alloc
  0                                                  -> kmem_cache_alloc_debug
  0                                                    -> verify_and_copy_pattern
  0                                                    <- verify_and_copy_pattern
  0                                                    -> dblk_constructor
  0                                                      -> kmem_cache_alloc
  0                                                        -> kmem_cache_alloc_debug
  0                                                          -> verify_and_copy_pattern
  0                                                          <- verify_and_copy_pattern
  0                                                          -> getpcstack
  0                                                          <- getpcstack
  0                                                          -> kmem_log_enter
  0                                                          <- kmem_log_enter
  0                                                        <- kmem_cache_alloc_debug
  0                                                      <- kmem_cache_alloc
  0                                                    <- dblk_constructor
  0                                                    -> getpcstack
  0                                                    <- getpcstack
  0                                                    -> kmem_log_enter
  0                                                    <- kmem_log_enter
  0                                                  <- kmem_cache_alloc_debug
  0                                                <- kmem_cache_alloc
  0                                              <- allocb
  0                                              -> putnext
  0                                                -> mutex_owned
  0                                                <- mutex_owned
  0                                                -> mutex_owned
  0                                                <- mutex_owned
  0                                                -> strrput
 

Notice, how the packet travels up the stack starting from ce (the network driver), to IP (ip_* functions), to TCP (tcp_* functions) before finally landing into the stream head (strrput).

Technorati Tag:
Technorati Tag:
Technorati Tag:

Thursday May 04, 2006

If you have ever used an ifconfig command to plumb a network interface like: >ifconfig ce0 plumb and wondered what the internal STREAMS plumbing looks like, then read on. This blog entry is especially for those who are interested in a simplified view of the network interface plumbing and data flow in Solaris. The plumbing, for those interested in looking at code, happens in plumb_one_device(). First, ifconfig opens the network device: Fig 1: ifconfing opening a stream to the network interface.
It then pushes IP on top of the network device. This forms the stream over which IP packets are sent and recieved: Fig 2: IP pushed on top the network device.
Assuming that a resolver is needed, an ARP module is then pushed on this stream to form the IP-ARP stream. This allows IP to send queries direclty to ARP and process responses from it: Fig 3: ARP pushed on top of IP.
If ARP is needed (as in this example), then another stream to the device is opened: Fig 4: Opening another stream for ARP
And ARP pushed on top of it. This forms the ARP-device stream over which the actual ARP queries are sent out and responses processed from the wire: Fig 5: ARP pushed on top of the network device.
To allow ifconfig to exit, these two streams are PLINKed underneath a multiplexor stream: Fig 6: PLINK below the mux
Since PLINK is used, ifconfig can close the corresponding file descriptors and exit: Fig 7: ifconfig exits
The above diagram describes the "lower half" of setup. To understand how a packet actually travels from the application out to the wire and vice versa, one needs to see the other half (i.e "upper half") of the setup that happens when a socket is opened by the application: Fig 8: Final plumbing showing data path
On the outbound side (packet going from application to the wire), the packet travels upto IP where a routing lookup is done to determine the correct "network interface" (i.e "lower half") to send the packet out on. Accordingly the packet is placed in the correct output queue. On the inbound side (packet coming in from the wire), the packet is "fanned out" to the correct "upper connection" based on five-tuple (src IP addr, dest IP addr, src port, dst port, protocol) match. The bi-directional arrow highlights both the paths.

Technorati Tag:
Technorati Tag: