Alan Hargreaves' Weblog
The ramblings of an Australian SaND TSC* Principal Field Technologist
* Solaris and Network Domain Technology Support Centre - The group I work for
Monday Aug 23, 2004
Kprobes in Linux vs Dtrace
An article on osnews points at an article at IBM about Kprobes on Linux.
While this is probably a step in the right direction, I still have some concerns. I would encourage the author to look at adding some more protection, i.e. always practice safe probing.
- I don't see any checking for NULL Pointer dereferences for the printk's. If this is the case, then a poorly written kprobe can still take out a production box. In fact any bad piece of code could take it out.
- It's still rather clunky to get simple probes inserted. Looking through the article shows a lot of work required to get the probes in. The equivalent probe in dtrace would be:
    #!/usr/sbin/dtrace -s

    syscall::fork1:entry,
    syscall::forkall:entry,
    syscall::vfork:entry
    {
        printf("\n\tpid=%d kthread=0x%llx\n", pid, (long long)curthread);
        printf("\tt_state=0x%x cpu=%d\n", curthread->t_state, curthread->t_cpu->cpu_id);
        printf("\n\tCaller program is \"%s\"\n\n", execname);
        printf("\tUser Space stack\n");
        ustack();
        printf("\n\tKernel Space Stack\n");
        stack(10);
    }

Which gives us the following results:

    # ./fork.d
    dtrace: script './fork.d' matched 3 probes
    CPU     ID                    FUNCTION:NAME
      2    207                      vfork:entry
            pid=1443 kthread=0x300056a3c60
            t_state=0x4 cpu=2

            Caller program is "csh"

            User Space stack
                  libc.so.1`vfork+0x20
                  csh`execute+0xcbc
                  csh`process+0x360
                  csh`main+0xe94
                  csh`_start+0x108

            Kernel Space Stack
                  unix`syscall_trap32+0xcc

Alternately, with the knowledge that in Solaris each of these three system calls call cfork() (which you could also determine with dtrace), we could simply do:

    #!/usr/sbin/dtrace -s

    fbt::cfork:entry
    {
        printf("\n\tpid=%d kthread=0x%llx\n", pid, (long long)curthread);
        printf("\tt_state=0x%x cpu=%d\n", curthread->t_state, curthread->t_cpu->cpu_id);
        printf("\n\tCaller program is \"%s\"\n\n", execname);
        printf("\tUser Space stack\n");
        ustack();
        printf("\n\tKernel Space Stack\n");
        stack(10);
    }

Which would give us exactly the same output as the calls to cfork() are done with tail recursion. On an x86 box it would look something like:

    # ./fork.d
    dtrace: script './fork.d' matched 1 probe
    CPU     ID                    FUNCTION:NAME
      0   3882                      cfork:entry
            pid=669 kthread=0xffffffffd5ae6000
            t_state=0x4 cpu=0

            Caller program is "csh"

            User Space stack
                  libc.so.1`vfork+0x45
                  csh`execute+0x12f
                  csh`process+0x24b
                  csh`main+0xa25
                  80580ea

            Kernel Space Stack
                  unix`sys_call+0xda
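The remark that dtrace itself can show which kernel function the fork system calls go through could be checked with something along these lines. This is only a sketch, not a script from the original entry: it uses the flowindent option and a thread-local variable (the name `follow` is made up for illustration) to print the kernel functions entered and returned from while a thread is inside vfork. The same idea should apply to fork1 and forkall by changing the probe name.

```d
#!/usr/sbin/dtrace -s

/* Indent output to reflect function call depth. */
#pragma D option flowindent

/* Start following this thread when it enters the vfork system call. */
syscall::vfork:entry
{
	self->follow = 1;
}

/* While following, record every kernel function entry and return. */
fbt:::entry,
fbt:::return
/self->follow/
{
}

/* Stop following and exit once the system call returns. */
syscall::vfork:return
/self->follow/
{
	self->follow = 0;
	exit(0);
}
```

If the kernel behaves as described above, cfork should show up near the top of the indented call flow. Enabling every fbt probe matches a very large number of probes, so this is best run briefly.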
Now there are also a couple of other nice things to consider here.
- No need to register and unregister the probe. If I'm not running the dtrace script, then the probe does not exist.
- If I want to change the query, I just edit the script.
- This one is actually a pretty basic probe. I can get much more complex information with very little effort, and as I have already stated, it's just a matter of modifying the script and the probe does not exist unless I am running the script.
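As an illustration of how little needs to change to ask a different question, here is a hedged variant of the first script. Instead of printing stacks, it counts fork-family system calls per program and per system call using an aggregation. The aggregation name `forks` is arbitrary and not from the original entry.

```d
#!/usr/sbin/dtrace -s

/* Count fork-family system calls, keyed by program name and syscall name. */
syscall::fork1:entry,
syscall::forkall:entry,
syscall::vfork:entry
{
	@forks[execname, probefunc] = count();
}
```

Run it for a while and interrupt it with Ctrl-C. dtrace prints the aggregation when the script ends, and the probes go away with it.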
But the most important thing to remember is that we have protection against the probe taking out the system. That means that we have no hesitation in running dtrace probes on production boxes, where outage time is measured in thousands of dollars per second (yes we have such customers).
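A small experiment can make this protection concrete. The sketch below is an assumption about how one might demonstrate it, not something from the original entry: it deliberately dereferences address zero inside a probe action.

```d
#!/usr/sbin/dtrace -s

BEGIN
{
	/* Deliberately read through a NULL pointer (illustration only). */
	trace(*(int *)0);
}
```

Rather than panicking the machine, dtrace should report an error on the enabled probe complaining about an invalid address and abandon that action. The system carries on, and the script keeps running until it is interrupted.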
Update
Dan Price made a suggestion which tidies the script up even more, meaning that even if we change the way that we do fork(), the script will remain working. This gives us stability with kernel releases as well. To see the new script, look at the comments for this entry.
Posted at 12:19PM Aug 23, 2004 by Alan Hargreaves in Solaris Express

