Interconnectedness of all things

Where does the SLP in prstat -m come from?

Thursday Jun 18, 2009

So we have some prstat output for a single process like this

exdev(5.11)$ prstat -c -m -p 2312 1            
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
  2312 clivek   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   1   0 gnome-netsta/2
Total: 1 processes, 2 lwps, load averages: 8.24, 8.79, 8.84
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
  2312 clivek   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0  19   0 gnome-netsta/2
Total: 1 processes, 2 lwps, load averages: 8.29, 8.80, 8.84
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
  2312 clivek   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 gnome-netsta/2
Total: 1 processes, 2 lwps, load averages: 8.34, 8.80, 8.84
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
  2312 clivek   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 gnome-netsta/2
Total: 1 processes, 2 lwps, load averages: 8.38, 8.81, 8.84
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
  2312 clivek   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 gnome-netsta/2
Total: 1 processes, 2 lwps, load averages: 8.42, 8.82, 8.85
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
  2312 clivek   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 gnome-netsta/2
Total: 1 processes, 2 lwps, load averages: 8.38, 8.81, 8.84
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
  2312 clivek   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   0  19   0 gnome-netsta/2
Total: 1 processes, 2 lwps, load averages: 8.33, 8.80, 8.84
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
  2312 clivek   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 gnome-netsta/2
Total: 1 processes, 2 lwps, load averages: 8.29, 8.79, 8.84
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
  2312 clivek   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 gnome-netsta/2
Total: 1 processes, 2 lwps, load averages: 8.24, 8.78, 8.83
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
  2312 clivek   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 gnome-netsta/2
Total: 1 processes, 2 lwps, load averages: 8.19, 8.77, 8.83
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
  2312 clivek   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 gnome-netsta/2
Total: 1 processes, 2 lwps, load averages: 8.15, 8.75, 8.82
   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP  
  2312 clivek   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   2   1  19   0 gnome-netsta/2
Total: 1 processes, 2 lwps, load averages: 8.10, 8.74, 8.82
exdev(5.11)$ 

If performance is your worry and this process is on your business problem critical path, the obvious question is where is the significant amount of SLP time coming from. SLP, the man page defines as

     SLP

         The percentage of time the process has spent sleeping.

is not the most helpful definition, but means that the process is waiting for events such as network or disk I/O, inside poll or some other waiting state where the cpu has been given up.

So if we really want to know what the process is doing while it is sleeping, we can use this bit of D

#pragma D option quiet 

sched:::off-cpu
/curlwpsinfo->pr_state == SSLEEP && curpsinfo->pr_pid == $1/
{
	self->ts = timestamp;
}

sched:::on-cpu
/self->ts/
{
	@[execname,stack()] = sum(timestamp - self->ts);
	self->ts = 0;
}

tick-60s
{
	normalize(@, 1000000);
	printa("%20s %k %@u\n", @);
	exit(0);
}

which gives us output along the lines of

exdev(5.11)$ pfexec dtrace -s mstate_sleep.d 2312   
    gnome-netstatus- 
genunix`cv_wait_sig_swap_core+0x170
genunix`cv_wait_sig_swap+0x18
fifofs`fifo_read+0xca
genunix`fop_read+0x6b
genunix`read+0x2b8
genunix`read32+0x22
unix`_sys_sysenter_post_swapgs+0x14b
 2
    gnome-netstatus- 
genunix`cv_timedwait_sig_internal+0x1d6
genunix`cv_waituntil_sig+0xba
genunix`poll_common+0x461
genunix`pollsys+0xe4
unix`_sys_sysenter_post_swapgs+0x14b
 55101

exdev(5.11)$ 
So now we have the evidence that the process is waiting in poll. We could have infered the same from truss, but for more complex cases where there are a more diverse set of system events generated by the application, the picture will not be so clear.

Not that in a 60 second sample time, we have only 55101ms of SLP time accounted for. This is because the application (gnome-netstaus-) wakes up from its poll loop every 5 seconds so because of the way the D is constructed, we will miss any period before the 1st off-cpu and after the last on-cpu. Most application, this will be a few ms missed, but worth being aware of.

Off at a complete tangent, I learned this evening that the UK peak power consumption is around 50 Gigawatts. A additional wind farm is proposed in our area to come on stream in 2012 after 2 years of fighting its way through planning. Its output will be 140 Megawatts, so 1/350 of the UK peak requirement. Sounds a bit optimistic in terms of contribution, so going to have to check the numbers out.

Like this post? del.icio.us | furl | slashdot | technorati | digg
Comments:

Post a Comment:
Comments are closed for this entry.