I bought myself a USB card from maplin.co.uk: a Velleman k8055 card with 2 analog inputs, 2 analog outputs, 5 digital inputs and 8 digital outputs. I bought it to experiment with libusb.
It took about an hour to solder together, and as soon as I plugged it into my Solaris laptop (nv_b51) it was autodetected and the HID driver attached to it. It is odd for it to export the HID class, but easy to fix.
I took the USB vendor and product IDs from /var/adm/messages and used update_drv -a to force a binding to the ugen driver. After I unplugged and replugged the card, the ugen driver bound to it.
Then via Google I found a k8055 Linux libusb application, which built without error using the Studio 11 compilers (free from www.sun.com). The k8055 works perfectly: I can read the sensors and write values to the ports. That has taken all the fun out of it...
I think I'll turn it into a kernel driver to give me some practice...
So I have upgraded my laptop (Toshiba Satellite A40) from nv build 27 right up to date at b51.
There was a small hiccup with my chipset and some new hardware graphics accelerators, but a bit of /etc/driver_aliases fiddling cured that. It seems much faster, but that could just be the later versions of the software.
I like the new Firefox and the new music player.
I just installed gnokii. It took a bit of hacking, but most of it is working now; the ringtone editor just needs a bit more work.
So the next job is to get the Sun Ray server up to b51 for S10 pre-FCS! Without losing my wife's home directory...
The printa() format conversion for a stack() aggregation key is "%k", as in:
/usr/sbin/dtrace -i 'BEGIN{ @foo[ cpu, stack(3)] = count();} END{ printa("cpu %d stack %k count %@d\n", @foo);}'
Several folks have asked when a program should set the stack size rlimit. Just before exec() is the only sensible point.
Once your process has started up, things have already been mapped just below the reserved stack space, whose size is the value of the stack size resource limit at the time the program assembled its address space (i.e. during exec).
Let's use pmap and have a look.
ulimit -S -s 20000
ulimit -S -s
20000
sleep 20 & pmap $!
2460: sleep 20
00010000 8K r-x-- /usr/bin/sleep
00022000 8K rwx-- /usr/bin/sleep
00024000 8K rwx-- [ heap ]
FE700000 864K r-x-- /lib/libc.so.1
FE7E8000 32K rwx-- /lib/libc.so.1
FE7F0000 8K rwx-- /lib/libc.so.1
FE810000 8K r-x-- /platform/sun4u-us3/lib/libc_psr.so.1
FE820000 24K rwx-- [ anon ]
FE830000 184K r-x-- /lib/ld.so.1
FE86E000 8K rwx-- /lib/ld.so.1
FE870000 8K rwx-- /lib/ld.so.1
FFBFE000 8K rw--- [ stack ]
total 1168K
mdb
> (FFBFE000-FE870000)%0t1024=D
20024
>
So the runtime linker ld.so.1, the first shared object to be mapped, has been placed just below the reserved stack space.
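The mdb expression is just a subtraction followed by a division by decimal 1024 (0t is mdb's prefix for a decimal number). As a cross-check, here is the same arithmetic as a tiny C helper; the name gap_kb() is made up for this sketch, and the addresses are the ones from the pmap listing.

```c
#include <stdint.h>

/* Hypothetical helper: size in KB of the gap between the base of the
 * [ stack ] mapping and the highest ld.so.1 mapping below it. This is the
 * same sum as the mdb expression (FFBFE000-FE870000)%0t1024=D. */
uint32_t gap_kb(uint32_t stack_base, uint32_t highest_lib)
{
    return (stack_base - highest_lib) / 1024;
}
```

So gap_kb(0xFFBFE000, 0xFE870000) gives 20024, a little over the 20000K limit set with ulimit.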
ulimit -S -s 200000
ulimit -S -s
200000
sleep 20 & pmap $!
[1] 2463
2463: sleep 20
00010000 8K r-x-- /usr/bin/sleep
00022000 8K rwx-- /usr/bin/sleep
00024000 8K rwx-- [ heap ]
F3700000 864K r-x-- /lib/libc.so.1
F37E8000 32K rwx-- /lib/libc.so.1
F37F0000 8K rwx-- /lib/libc.so.1
F3840000 8K r-x-- /platform/sun4u-us3/lib/libc_psr.so.1
F3850000 24K rwx-- [ anon ]
F3860000 184K r-x-- /lib/ld.so.1
F389E000 8K rwx-- /lib/ld.so.1
F38A0000 8K rwx-- /lib/ld.so.1
FFBFE000 8K rw--- [ stack ]
total 1168K
mdb
> (FFBFE000-F38A0000)%0t1024=D
200056
>
So if I use setrlimit() to raise the current stack size limit, all future mappings will be pushed down below the larger reserved space, but existing mappings won't move, and if your stack tries to grow over them you will get a SIGSEGV. So you should only ever increase the stack size rlimit just before a call to exec().
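A minimal sketch of doing this safely, assuming a POSIX system: adjust the soft RLIMIT_STACK in the process that is about to exec(), then exec straight away so the new image is laid out with the new reservation. The function names set_stack_soft_limit() and exec_with_stack() are hypothetical, not from any library, and capping the request at the hard limit (rather than failing) is a choice made for this sketch.

```c
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: set the soft stack limit to 'want' bytes, capped at
 * the hard limit. Returns 0 on success, -1 with errno set on failure. */
int set_stack_soft_limit(rlim_t want)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && want > rl.rlim_max)
        want = rl.rlim_max;            /* the soft limit cannot exceed the hard limit */
    rl.rlim_cur = want;
    return setrlimit(RLIMIT_STACK, &rl);
}

/* Hypothetical wrapper: change the limit, then immediately replace this
 * process, so the new address space is assembled with the new reservation. */
int exec_with_stack(rlim_t want, char *const argv[])
{
    if (set_stack_soft_limit(want) != 0)
        return -1;
    execvp(argv[0], argv);
    return -1;                         /* only reached if execvp() failed */
}
```

The calling process keeps its old layout either way; only the exec'd program benefits, which is exactly why the call belongs right before exec().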
This stack size only affects the default stack for the main thread in a process. The stacks for other threads are sized at thread_create() time, using either the default of 1MB or a size the program specifies.
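For those other threads the size is chosen per thread. Here is a sketch using the POSIX pthread_attr_setstacksize() interface; the function names and the 4MB figure used below are arbitrary choices for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical worker: records that it ran. */
static void *worker(void *arg)
{
    *(int *)arg = 1;
    return NULL;
}

/* Hypothetical helper: start one thread with an explicit stack size instead
 * of the default, wait for it, and return 0 or an error number. */
int run_with_stack(size_t stack_bytes, int *ran)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err;

    if ((err = pthread_attr_init(&attr)) != 0)
        return err;
    err = pthread_attr_setstacksize(&attr, stack_bytes);
    if (err == 0)
        err = pthread_create(&tid, &attr, worker, ran);
    pthread_attr_destroy(&attr);       /* safe once the thread is created */
    if (err == 0)
        err = pthread_join(tid, NULL);
    return err;
}
```

Calling run_with_stack(4 * 1024 * 1024, &ran) gives that one thread a 4MB stack, independent of the stack size rlimit.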
My customer was complaining that his server process was running out of memory: malloc() was returning NULL. A pmap of the process showed it was a 32-bit application, so it was limited to a touch less than 4GB of address space. The pmap showed only 600MB of space in use: a small stack section, lots and lots of shared libraries, and a 500MB heap (malloc() memory) that ran right up to the base of the shared libraries and so had no room to grow.
A bit of careful looking revealed a hole of approximately 2GB in the address space, starting at about 2GB. How odd!
After getting a truss of the application starting up, it became obvious: it was calling setrlimit() to set RLIMIT_STACK to RLIM_INFINITY just before the hole appeared. That call sets the stack size to 2GB (the stack starts out way up at the top of the address space, near 4GB, in a 32-bit application). When handing out user address space the kernel has to avoid the area reserved for the stack, so all future mmaps are located below 2GB, halving the process's available address space.
Just as I thought it was cooling down enough for the dogs to start exercising again, it goes and warms up again! Oh well, I can put off getting the kickbike serviced for another couple of weeks.
What we need on the kickbike is a better rear brake. It's got a 10 or 12 inch rear wheel and mountain bike V-brakes. I find I have to squeeze the brakes really hard, and we are getting through brake blocks at an amazing rate. Maybe two Samoyeds are too much.
As the second car sped past me at 60 mph, a few inches from my elbow, I had this thought...
Let's just change the speed limit units and keep all the existing signs. All we would have to do is buy some 180 kph signs from the continent for the motorways and the whole country would be safer: 30 kph is about 20 mph and 60 kph is about 40 mph. Let's do it tomorrow!
Back from a long holiday, a colleague asked me to look at why a small C++ application was dying with SIGFPE on x86 boxes running Solaris 10. They had run dbx and truss and had worked out that it was taking a SIGFPE divide-by-zero trap on an idivl instruction deep in the flush of an I/O stream. The truss showed the fault as:
Incurred fault #8, FLTIZDIV %pc = 0x0805065E
siginfo: SIGFPE FPE_INTDIV addr=0x0805065E
Received signal #8, SIGFPE [default]
siginfo: SIGFPE FPE_INTDIV addr=0x0805065E
So that looks like a divide by zero, but dbx showed that although the instruction was an idivl, the divisor register was not zero!
After a bit of reading of the AMD instruction documents, we see that the idiv instruction can generate a "divide error" exception for two reasons: a divide by zero and an integer overflow. The Solaris kernel maps the "divide error" exception onto the FPE_INTDIV trap that truss reports, but either cause can produce it. In this case we had an integer overflow, as the result exceeded the capacity of a signed int. Now the folks who maintain the library that made the stream know to go and look at their code.
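To make the two causes concrete, here is a small C model of when a 32-bit idivl faults. It is a sketch of the instruction's rules as described above, not code from the library in question: the dividend is the 64-bit value in EDX:EAX, and the quotient must fit in a signed 32-bit EAX. The enum and function names are hypothetical.

```c
#include <stdint.h>

enum idiv_result { IDIV_OK, IDIV_ZERO, IDIV_OVERFLOW };

/* Hypothetical model of idivl: both failure cases raise the same x86
 * "divide error", which Solaris reports as FPE_INTDIV either way. */
enum idiv_result idivl_check(int64_t dividend, int32_t divisor, int32_t *quot)
{
    int64_t q;

    if (divisor == 0)
        return IDIV_ZERO;                 /* a genuine divide by zero */
    if (dividend == INT64_MIN && divisor == -1)
        return IDIV_OVERFLOW;             /* would overflow even in 64 bits */
    q = dividend / divisor;               /* C truncates toward zero, as idiv does */
    if (q < INT32_MIN || q > INT32_MAX)
        return IDIV_OVERFLOW;             /* quotient does not fit in EAX */
    *quot = (int32_t)q;
    return IDIV_OK;
}
```

In plain C code the usual way to reach the overflow case is INT_MIN / -1 (undefined behaviour in C), which compilers typically implement as a sign extension into EDX followed by idivl, so the divisor register holds -1, not zero.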
I thought I was being clever with my algorithm for choosing new passwords. For reasons I can't remember, it put "~" as the first character - BIG mistake. "~" is the escape character for lots of things, like ssh and our service processor to host protocols. Good job I did not have "~#" or "~." as the first two characters! Something best avoided.
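A password generator can simply refuse such candidates. A trivial sketch, with a hypothetical function name, assuming the only rule needed is "no leading ~", since these escapes are recognised at the start of a line:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: reject candidate passwords that begin with '~',
 * which ssh and many console connections treat as an escape character. */
bool password_start_ok(const char *pw)
{
    return pw != NULL && pw[0] != '~';
}
```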
Turning on all of the kernel's TNF probes gathers a big blob of data about what is going on inside the kernel. Prior to Solaris 10 this was the only way to get accurate timing information for system calls. Recently (as in last night) I was trying to work out from one of these blobs why a write into a UFS filesystem might take a long time. I had the pid of the writing process, so I could find all its threads. I could see one of them issue the write() system call, then I kept seeing the thread block and almost wake up before blocking again. It did this a number of times, so it was obviously competing for a resource like a semaphore, a condvar or a mutex and not getting it. All tnfdump gives you is the address of the resource.
Here is what plain tnfdump gives you:
1995.449700 0.010800 10768 3 0x3002ec894a0 4 thread_block reason: 0x3002ec89632 stack:
But if you use tnfdump -rx you see a bit more, including the stack as a list of kernel addresses:
0x63f8e8 : {
tnf_tag 0x22f8 thread_block
tnf_tag_arg 0x63f840
time_delta 0x4fd198
reason 0x3002ec89632
stack 0x63f900
}
0x63f900 : {
tnf_tag 0x2358 tnf_symbols
tnf_self_size 0x38
0 0x1011c800
1 0x100448cc
2 0x1007d5f8
3 0x1007d9e4
4 0x100adaa4
5 0x100adc68
}
So it's time for a modified tnfdump, or another awk script, to glue these things together so we can see why threads might block in the kernel.
More on that when I have it working, but until then, if you have to gather kernel prex data, send in the raw output file from tnfxtract together with a live dump or the nm output of /dev/ksyms, so we don't lose any information from the data.
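Gluing the raw stack addresses back to names is mostly a sorted-table lookup. Here is a minimal C sketch, assuming the nm output for /dev/ksyms has already been parsed into an array sorted by address; the struct, the function and any symbol names used with it are hypothetical. It ignores symbol sizes, so an address past the end of the last function is still attributed to that function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol table entry, built from nm output, sorted by addr. */
struct ksym {
    uint64_t addr;
    const char *name;
};

/* Return the entry with the largest address <= pc (binary search), or NULL
 * if pc lies below the first symbol. On success *off = pc - entry addr. */
const struct ksym *lookup_sym(const struct ksym *tab, size_t n,
                              uint64_t pc, uint64_t *off)
{
    size_t lo = 0, hi = n;            /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    if (lo == 0)
        return NULL;                  /* below every known symbol */
    *off = pc - tab[lo - 1].addr;
    return &tab[lo - 1];
}
```

Each address from the tnf_symbols record can then be printed as name+offset.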
This morning I got squeezed by a new grey Saab estate, with a child on board, approaching the QueensGate roundabout in Farnborough. She moved left into my lane and into my space even though I shouted a warning, and she kept going until I had to brake to avoid a crash. Of course she then had to stop at the traffic ahead, so we had words.
She had to get into my space because she was in the wrong lane ....&**&^^%
Then on the way home, in Ewshot village by the excellent Windmill pub, a 4x4 cut the corner across the junction and nearly hit me head on - thanks.
Oh yes, and before I forget: Hampshire County Council's response to my complaint about the roundabout outside the Sun campus was that it was difficult and they would think about it some more, but in the meantime, if I wore something bright, drivers would notice me better. Very patronising, very helpful. It prompted me to send my MP an email.
I used the "find your MP" page on the House of Commons website, but it errored, most appropriately, with "connection reset by peer". Most amusing.
Spending a lot of time writing test cases to try to reproduce system panics has led me to an interesting(ish) methodology. You stare at the data structures in the dump, then you stare at the code and see if you can work out how to get things into a similar state. Then comes the difficult bit: I've taken to working out which operations are possible from userland code and then randomising them using lrand48().
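The core of the approach is only a few lines. A minimal sketch, with a hypothetical set of operations standing in for whatever the dump suggests; seeding with srand48() and printing the seed means the same sequence of choices can be replayed after a panic.

```c
#include <stdlib.h>

/* Hypothetical operations reachable from userland for the bug in hand. */
enum op { OP_OPEN, OP_POLL, OP_CLOSE, OP_EXIT_THREAD, OP_COUNT };

/* Pick the next operation uniformly at random using lrand48(). */
enum op next_op(void)
{
    return (enum op)(lrand48() % OP_COUNT);
}
```

A driver loop would call srand48(seed) once, then repeatedly switch on next_op() and perform the chosen operation.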
So this week's exercise has been to reproduce a panic in the poll() code that, from code inspection, is impossible. The file table (indexed by file descriptor) is per process, but the poll structures are per lwp, so if any threads are polling on a file entry there is a linked list of the interested threads attached to it. In the dump there were 3 threads chained off one file entry, so we know we have a multithreaded process performing poll() on a single file descriptor from several threads at once. We panicked in close() as we traversed that list, because one of the threads had been reused by a process that does no polling, so its per-thread poll structures were NULL. So now we know that it has threads exiting and threads closing the file that we are polling on.
So I wrote a threaded program that opened a network connection and then went into a loop in which it randomly started new threads. Those threads then possibly polled on that connection, or possibly exited, or possibly closed the connection. The main thread dealt with all of this, reopening the connection if it got closed and starting new threads as old ones exited, all under the control of lrand48().
Did this reproduce it? No. So then I randomised the number and contents of the pollfd array passed to poll(), and suddenly the machine panicked with an identical stack trace to the customer's machine. The power of lrand48()!
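Randomising the pollfd array might look something like this sketch. MAX_PFDS, the function name and the mix of events are assumptions for illustration, not the program used here; an fd of -1 is a legal entry that poll() ignores.

```c
#include <poll.h>
#include <stdlib.h>

#define MAX_PFDS 8   /* hypothetical upper bound chosen for this sketch */

/* Hypothetical helper: fill pfds with 1..MAX_PFDS entries, each naming the
 * shared descriptor fd, some other descriptor, or -1, with random events.
 * Returns the number of entries to pass to poll(). */
nfds_t random_pollfds(struct pollfd pfds[MAX_PFDS], int fd, int other_fd)
{
    static const short ev[] = { POLLIN, POLLOUT, POLLIN | POLLOUT, 0 };
    nfds_t n = 1 + (nfds_t)(lrand48() % MAX_PFDS);

    for (nfds_t i = 0; i < n; i++) {
        long pick = lrand48() % 3;
        pfds[i].fd = pick == 0 ? fd : pick == 1 ? other_fd : -1;
        pfds[i].events = ev[lrand48() % (long)(sizeof ev / sizeof ev[0])];
        pfds[i].revents = 0;
    }
    return n;
}
```

Each polling thread would then call poll(pfds, n, timeout), perhaps with a random timeout as well, so that the same descriptor appears a varying number of times in each thread's poll set.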
The good news is that it is already fixed in Solaris 10...