Andrew Rutz's blog

Friday Jun 29, 2007

Recursive panic

I will dispell any rumor to the contrary: it is possible to call panic() during the execution of panic(). As is true for any (language that supports it and) function that performs it, this is an "act of recursion" !

I recently stumbled across an instance of a recursive-panic in a core file I was analyzing. After attaching mdb to the core file and running ::msgbuf to dump the console messages written by the kernel to its circular, in-memory buffer, I noticed a "curious, single line" buried within its output:

panic sync timeout
An annotated replay of the "tail" of ::msgbuf's output is shown here, along with our "curious, single line":
# ls
bounds    unix.0    vmcore.0
# mdb -k 0
Loading modules: [ unix krtld ... ]
> ::msgbuf
[...]
panic[cpu17]/thread=3000372c680:
BAD TRAP: type=28 rp=2a101871200 addr=124ed4c mmu_fsr=0

event_pvr_exec-5:
integer divide zero trap:
addr=0x124ed4c
pid=3751, pc=0x124ed4c, sp=0x2a101870aa1, tstate=0x80001606, context=0x1d64
g1-g7: ffff, fc00, 4000, 0, 0, 0, 3000372c680
000002a101870f20 unix:die+9c (28, 2a101871200, 124ed4c, 0, 2a101870fe0, 10000)
  %l0-3: 0000000000000000 0000000000000028 000000000000000f 000000000179ed54
  %l4-7: 0000030000074f40 0000000000000061 000000000179ed53 0000000001074000
000002a101871000 unix:trap+560 (2a101871200, 58, 0, 0, 300019aa000, 3000372c680)
  %l0-3: 0000000000000000 0000060012e8efd0 0000000000000028 00000600133be388
  %l4-7: 0000000000000000 000006001f282700 0000000000000001 00000300019aa180
000002a101871150 unix:ktl0+48 (4000, 40, 0, 0, 3002189493c, 0)
  %l0-3: 0000000000000007 0000000000001400 0000000080001606 00000000010195c4
  %l4-7: 0000000000f0007f 0000000000000100 0000000000000000 000002a101871200
000002a1018712a0 ssd:ssd_get_physical_geometry+204 (3001eda2898, 2a10187140c, 0, 0, 0, 60003df3580)
  %l0-3: 0000030021894938 0000000000000024 0000000000000000 000000007fffffff
  %l4-7: 0000000080000000 0000000080000000 000003002189493c 0000000000004000
000002a101871350 ssd:ssd_resync_geom_caches+b4 (60003df3580, ffffffff80000000, 200, 1, 3ec1, ff)
  %l0-3: 00000000000000ff 000000000000003f 0000030023a7b420 000003000e760780
  %l4-7: 00000000003f0000 0000000000ff0000 0000060003de0b00 0000000000020a2a
000002a101871420 ssd:ssd_validate_geometry+b4 (60003df3580, 1, 3000372c838, 1, 7, fc000092)
  %l0-3: 000002a101871484 0000000000000006 0000000000000200 000000000179ed52
  %l4-7: 0000000080000000 000000000000003c 0000060003de0b00 00000000018b8c00
000002a1018714e0 ssd:ssd_ready_and_valid+2cc (60003df3580, 3000372c680, 60003df36c0, 0, 2, 2)
  %l0-3: 0000000000000000 00000000018acd68 0000030000008ff0 0000060003de0b00
  %l4-7: 0000000000000000 0000000000000000 0000060003de0b00 0000060003de0b00
000002a1018715f0 ssd:ssdopen+260 (8, 60003df3580, 3, 18cfb68, 2, 60003df3650)
  %l0-3: 0000000000000000 0000000000000002 000000760000013b 0000000000000000
  %l4-7: 0000060003de0b00 0000000000000001 00000000018c7c00 0000000000000000
000002a1018716a0 specfs:spec_open+438 (2a101871930, 1, 60002c01ee8, 18c88f0, 600138a0de8, 30010464c80)
  %l0-3: 0000000000000001 0000060016670f40 0000000000000000 00000000ffffffff
  %l4-7: 0000000000000000 00000600166b13e0 00000600138a0e70 000000760000013b
000002a101871760 genunix:fop_open+78 (2a101871930, 2, 60002c01ee8, 1, 60016670f40, 1)
  %l0-3: 0000000000000001 00000300000a4dd8 00000600068a23d8 00000600068c9738
  %l4-7: 0000000000000000 00000600068a23f0 00000600068c9738 0000000000000004
000002a101871810 genunix:vn_openat+4d4 (0, 99c, 0, 1, 0, 1)
  %l0-3: 000000007fffffff 0000000000000000 0000000000000000 0000000000000000
  %l4-7: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
000002a1018719d0 genunix:copen+260 (ffffffffffd19553, ffbff60c, 0, 3b99c, 0, 1)
  %l0-3: 00000600133be590 00000600133be5b0 000003501b56956d 0000000000000000
  %l4-7: 00000000018a7c00 000000000000099c 0000000000000012 0000060012e8efd0

syncing file systems...
 
panic[cpu17]/thread=3000372c680:
panic sync timeout

dumping to /dev/md/dsk/d1, offset 64422019072, content: kernel
The characters in green were output by the first call to panic(), which was caused by a division-by-zero trap that occurred in the function ssd_get_physical_geometry(). The output shown in red is the output from the second panic (eg, the panic that occurred while servicing the first panic).

Our "curious, single line" (1), shown below, is found at the articulation between the first and second panics. The first panic calls vfs_syncall() to write any in-memory representation of a filesystem to its associated backing-store (2). As will be seen shortly, this execution times-out, and panic() is called again (3), this time with the argument "panic sync timeout". panic()'s design for re-entrancy ensures that most of its implementation is not repeated. Progress resumes with the writing of the core file to disk (4).


[...]
syncing file systems...             (2)
 
panic[cpu17]/thread=3000372c680:    (3)
panic sync timeout                  (1)

dumping to /dev/md/dsk/d1, offset 64422019072, content: kernel    (4)
panic()'s re-entrancy includes the use of "panic triggers", which are three global booleans which are atomically updated at specific points in panic()'s control flow:
  /*
   * Triggers for panic state transitions:
   */
 int panic_quiesce;                      /* trigger for CALM    -> QUIESCE */
 int panic_sync;                         /* trigger for QUIESCE -> SYNC */
 int panic_dump;                         /* trigger for SYNC    -> DUMP */
panic_trigger() is the function that atomically updates a specified trigger and returns True if the current call to the function caused the first (and only) change from an un-triggered state to a triggered state. If panic_trigger() returns False, then panic() knows that the associated trigger has already been triggered, and that an associated critical section of panic()'s control flow must not be run (again).

The high-level control flow of panic() is:

panic
  vpanic
    panicsys
      ...
      flush_windows()       ;; (on SPARC, write register stack to memory stack
      panic_trigger(&panic_quiesce)  ;; is first time panic has been called ?
      if is-first-panic
        switch thread's stack pointer, %sp, to panic_stack
      save registers to memory stack
      panicsys
        ...
        if (on_panic_stack)     ;; eg, "if is first-panic ..."
          ...
          panic_stopcpus()      ;; idle the current thread on (n-1) cpu's
          ...
          save lots of values in panic_xxx variables   ;; eg, panic_thread
          ...
        if panic_trigger(&panic_sync)    ;; have we not sync'd filesystems ?
          set cpu's interrupt level to ten  ;; CLOCK_LEVEL
          do_polled_io = 1;                 ;; force drivers to use polling vs interrupts
          vfs_syncall()

        if panic_trigger(&panic_dump)   ;; have we not written core file ? 
          set cpu's interrupt level to ten
          do_polled_io = 1;     ;; us polling vs interrupts  
          dumpsys()
        do a reboot    ;;  mdboot(A_REBOOT, ...)
The call-sites to the three triggers are shown in bold characters. As can now be seen, the first panic triggers panic_quiesce and panic_sync, and is in its call to vfs_syncall (to sync all the filesystems). When this sync'ing causes a panic, this second panic does not (re-)trigger either panic_quiesce or panic_sync, thus skipping much of panic's control flow. However, the second panic does trigger panic_dump, causing a call to dumpsys() to create the core file (using polled-IO, vs. interrupts).

"OK, this all 'sounds nice', but how do we know that this is what happened ?"

The trick lies in knowing which stack pointer, %sp, to use when "dumping the stack".

The stack above that is shown in green is the stack at the time the current thread's division-by-zero trap forced control to flow to die()'s call to panic(). The thread that enters panic() (and carries out panic()'s actions) is the same kernel thread (0x3000372c680) that was handling the open() system call (see frame at bottom of stack).

We also know that this same thread was the thread that was executing in vfs_syncall() when it panic'd again, as the same thread address is shown for the second panic:


[...]
syncing file systems...           
 
panic[cpu17]/thread=3000372c680: <----- !!!
panic sync timeout                 

dumping to /dev/md/dsk/d1, offset 64422019072, content: kernel    (4)
We also know that panic() obviously tries hard to ensure that "the dump" shows the state as it existed for any first panic, which means anything in the green part of the stack obviously only represents the first panic.

The trick is to look in what turns out to be the most obvious place, the per-thread state (kthread_t) of the thread that caused the second panic. When a thread is idled, Solaris saves the thread's last PC and stack pointer in kthread_t.t_pcb:

# mdb ...
> 3000372c680::print kthread_t t_pcb
{
    t_pcb.val = [ 0x105bec4, 0x2a100beb2d1 ]
}
> 0x105bec4/ia
panicsys+0x48:  call      -0x1aa88      
panicsys+0x4c:  
If we dump the stack using this stack-pointer, we obtain a lot of insight:
> 3000372c680::print kthread_t t_pcb.val[1] | ::stack
vpanic+0xcc(11d8d20, 2a100bebe78, 22bc8af, 64, 1813c00, 0)
panic+0x1c(11d8d20, 18ab000, 0, 18ab000, 1, 11d8c00)
cyclic_fire+0x64(60003df4eb8, 14b5188803b50, 3b9ac800, 0, 60003dfcb08, 60003e6f6f8)
cbe_level14+0x38(0, 0, e, 300019aa000, 20080, 100b170)
current_thread+0x140(1, 6001708de98, 1, 2710, 10741b8, 183c800)
clnt_cots_kcallit+0x580(6001708de40, 64, 0, 6001708de40, 6001708de60, 6001708deb8)
rfscall+0x58c(60002d52380, 15, 18e9ab0, 3c, 1841dd0, 18371e0)
rfs3call+0x60(60002d52380, 15, 7afa466c, 1842098, 7afa4820, 1842000)
nfs3_commit+0xc0(3001052f500, 1, 30012db13d0, 60002c01ee8, 30016159a38, 300161599c8)
nfs3_dispose+0x2c0(3001052f500, 32300000, 8000, 0, 60002c01ee8, 30016159bd8)
page_release+0xf8(70007de1680, 1, 168, 1836bb8, 1836800, 18a7000)
pvn_getdirty+0x1fc(70007de1680, 400, 55, e1c0fa355, 0, 1)
pvn_vplist_dirty+0x324(0, 400, 0, 0, 10000, 3001052f500)
nfs_putpages+0xcc(3001052f500, 0, 0, 400, 60002c01ee8, 600076241c0)
nfs3_putpage+0xcc(3001052f500, 0, 0, 1, 60002c01ee8, 0)
rflush+0x1c4(30024040000, 60002c01ee8, 30014f97480, 180a9, 7048efec, 3)
nfs3_sync+0x2c(0, 7048f000, 60002c01ee8, 1, 0, 0)
vfs_sync+0x98(18ae560, 185d550, 185dad0, 0, 0, 1)
vfs_syncall+0x48(a, a, 18b3000, 11d8000, 0, 1)
panicsys+0x4e8(3000372c680, 60002c01ee8, 184b800, 1813c00, 1077800, 0)
vpanic+0xcc(10741b8, 2a101870fa8, 2000, 1, 8, 8)
panic+0x1c(10741b8, 28, 2a101871200, 124ed4c, 0, 1c)
die+0x9c(28, 2a101871200, 124ed4c, 0, 2a101870fe0, 10000)
trap+0x560(2a101871200, 58, 0, 0, 300019aa000, 3000372c680)
ktl0+0x48(4000, 40, 0, 0, 3002189493c, 0)
ssd_get_physical_geometry+0x204(3001eda2898, 2a10187140c, 0, 0, 0, 60003df3580)
ssd_resync_geom_caches+0xb4(60003df3580, ffffffff80000000, 200, 1, 3ec1, ff)
ssd_validate_geometry+0xb4(60003df3580, 1, 3000372c838, 1, 7, fc000092)
ssd_ready_and_valid+0x2cc(60003df3580, 3000372c680, 60003df36c0, 0, 2, 2)
ssdopen+0x260(8, 60003df3580, 3, 18cfb68, 2, 60003df3650)
spec_open+0x438(2a101871930, 1, 60002c01ee8, 18c88f0, 600138a0de8, 30010464c80)
fop_open+0x78(2a101871930, 2, 60002c01ee8, 1, 60016670f40, 1)
vn_openat+0x4d4(0, 99c, 0, 1, 0, 1)
copen+0x260(ffffffffffd19553, ffbff60c, 0, 3b99c, 0, 1)
syscall_trap32+0xcc(ffbff60c, 0, 3b99c, 303240ff, 80808080, 1010101)
The green part of stack is the stack we saw when we ran ::msgbuf, and it represents the canonical stack as was present when the first call to panic() was done. The red part of the stack represents where the first call to panic() is within its control-flow. We can see the call to vfs_syncall(), and we can see for this core file that NFS and RFS code are being used to write out pages to one of their filesystems.

We also see current_thread() being called, which is a well-known function within Solaris. It runs as part of executing a high-level interrupt (eg, an interrupt whole PIL (Processor Interrupt Level)) is greater than LOCK_LEVEL (which currently is ten). As an aside, a non-high-level-interrupt's handler will have intr_thread on the stack). current_thread() calls cbe_level14(), which is used in the Cyclics implementation; "level14" gives a clue that this is a handler for a high-level interrupt whose PIL is fourteen.

The blue part of the stack represents the second panic.

To prove that this stack trace generates the "panic sync timeout" message, we disassemble the code at "cyclic_fire+0x64", as this is the last stack frame before the second panic occurs:

> cyclic_fire+0x64
> cyclic_fire+0x64::dis
cyclic_fire+0x3c:               sllx      %g4, 3, %g3
cyclic_fire+0x40:               ldx       [%g3 + %i5], %i1
cyclic_fire+0x44:               cmp       %i1, %l2
cyclic_fire+0x48:               bg,pn     %xcc, +0x80   <cyclic_fire+0xc8>
cyclic_fire+0x4c:               clr       %i3
cyclic_fire+0x50:               sra       %g2, 0, %o1
cyclic_fire+0x54:               sethi     %hi(0x3b9ac800), %i2
cyclic_fire+0x58:               add       %i5, %g3, %l4
cyclic_fire+0x5c:               add       %i2, 0x200, %l3
cyclic_fire+0x60:               mov       %l1, %o0
cyclic_fire+0x64:               call      -0x114        <cyclic_expire>
cyclic_fire+0x68:               mov       %l4, %o2
[...]
We see a call to cyclic_expire(), and from the source code, we know its signature is
static void cyclic_expire(cyc_cpu_t *cpu, cyc_index_t ndx, cyclic_t *cyclic)
From the instruction at cyclic_fire+0x68, we see that register %o2 received its value from local register %l4. If we dump our blue/red/green stack using ::stackregs instead of ::stack, we'll see this stack frame for the call to cyclic_expire:
> 3000372c680::print kthread_t t_pcb.val[1] | ::stackregs
[...]
000002a100beb5f1 cyclic_fire+0x64(60003df4eb8, 14b5188803b50, 3b9ac800, 0, 60003dfcb08, 60003e6f6f8)
  %l0-%l3:      60003c8a810      60003dab8c0    14b5188803da8         3b9aca00
  %l4-%l7:      60003e6f770    14b51995f5d50                5                1
  cbe_level14+0x38:        call      +0xa8338      
[...]
Dumping the memory referenced by %l4 will show us the third argument to cyclic_expire, which is of type cyclic_t:
> 60003e6f770::print -t cyclic_t      
{
    hrtime_t cy_expire = 0x14b5188803b50
    hrtime_t cy_interval = 0x3b9aca00
    int (*)() cy_handler = deadman
    void *cy_arg = 0
    uint32_t cy_pend = 0
    uint16_t cy_flags = 0
    cyc_level_t cy_level = 0x2
}
This cyclic runs the function deadman(), which has a few purposes, one of which is to check if the sync that is done as part of panic() has timed-out. deadman() checks the panic_sync trigger to see if a sync is in progress. If a time-out has occurred, we can see that line 1833 (below) calls panic() with our "curious, single line":
  1794  static void
  1795  deadman(void)
  1796  {
  1797          if (panicstr) {
[...]
  1819
  1820                  /*
  1821                   * If we are generating a crash dump or syncing filesystems and
  1822                   * the corresponding timer is set, decrement it and re-enter
  1823                   * the panic code to abort it and advance to the next state.
  1824                   * The panic states and triggers are explained in panic.c.
  1825                   */
  1826                  if (panic_dump) {
[...]
  1831                  } else if (panic_sync) {
  1832                          if (sync_timeleft && (--sync_timeleft == 0)) {
  1833                                  panic("panic sync timeout");
  1834                                  /*NOTREACHED*/
  1835                          }
  1836                  }
[...]
So, the Big Lesson here is that every line panic() does (and does NOT) print could be important in understanding what really occurred. Read every line of panic's output !!! =:-)

Monday Jun 25, 2007

Who needs a compiler ?

Yes, compilers are useful tools if one is going to run the created executable MANY times, but if the executable is only run a FEW times, then Interpretation is the way to go... or... if you were in the situation I was in, then binary rewriting is the best of both worlds.

I wanted to try a fix to libpicldevtree.so, but I did not have a Solaris 9 development environment at my disposal, and I did not want to wait the several hours it takes to download and build the closure of the files needed to build my lib.

My modification was to the picld binary, which runs as a daemon. I connected mdb to it and disassembled the function-of-interest. Specifically, the instruction at offset 0x28 was the one that needed to be modified:

# ps -ef | grep pi[cl]
    root   341     1  0 16:15:16 ?        0:01 /usr/lib/picl/picld
# mdb -p 341
Loading modules: [ ld.so.1 libthread.so.1 libc.so.1 libnvpair.so.1 ]
> libdevinfo_init::dis -n 20
libpicldevtree.so.1`libdevinfo_init:    save      %sp, -0x70, %sp
libpicldevtree.so.1`libdevinfo_init+4:  call      +8            <libpicldevtree.so.1`libdevinfo_init+0xc>
libpicldevtree.so.1`libdevinfo_init+8:  sethi     %hi(0x13800), %o1
libpicldevtree.so.1`libdevinfo_init+0xc:mov       %i0, %l3
libpicldevtree.so.1`libdevinfo_init+0x10:       add       %o1, 0x218, %o1
libpicldevtree.so.1`libdevinfo_init+0x14:       mov       %i1, %l2
libpicldevtree.so.1`libdevinfo_init+0x18:       add       %o1, %o7, %l1
libpicldevtree.so.1`libdevinfo_init+0x1c:       sethi     %hi(0xdc00), %o1
libpicldevtree.so.1`libdevinfo_init+0x20:       ld        [%l1 + 0x16c], %o0
libpicldevtree.so.1`libdevinfo_init+0x24:       call      +0x140c8      <PLT=libdevinfo.so.1`di_init>
libpicldevtree.so.1`libdevinfo_init+0x28:       add       %o1, 0x307, %o1
libpicldevtree.so.1`libdevinfo_init+0x2c:       orcc      %g0, %o0, %l0
I wanted to change the instruction so that 0x327 was added to %o1, not 0x307. This can easily be done to the executing process by using these mdb commands:
> libdevinfo_init+0x28/i
libpicldevtree.so.1`libdevinfo_init+0x28:       add       %o1, 0x307, %o1
> libdevinfo_init+0x28/X
libpicldevtree.so.1`libdevinfo_init+0x28:       92026307
> libdevinfo_init+0x28/W92026327
libpicldevtree.so.1`libdevinfo_init+0x28:       0x92026307      =       0x92026327
> libdevinfo_init+0x28/i
libpicldevtree.so.1`libdevinfo_init+0x28:       add       %o1, 0x327, %o1
However, picld is executed during init(1M) processing, and so the modification must be done to picld's binary file, so that the modification would be available on each of the several reboots that I was doing. (I was measuring the modification's effect on boot time, and therefore my needs were biased towards one who selects Interpretation over Compilation: e.g., I only need to run my binary several times; I was experimenting; I was not yet "married" to my modification).

As a result, I used the wonderful utility dd(1M) to modify libpicldevtree.so so that I replaced the original ADD instruction with the ADD instruction that I needed. The following script does the trick:

# cat a.sh
#!/usr/bin/ksh

in=libpicldevtree.so.1
out=${in}.new

# copy the first 4483. four-byte "blocks" from the input-file to the output-file.
#  note that the blocks are numbered from 0 to 4482.
#
dd if=$in of=$out bs=4 count=4483

# write the 32 bits of our new ADD instruction to the output-file at block 4483.:
#
echo "\0222\002\0143\047" | dd of=$out bs=4 seek=4483 count=1

# skip over the four-byte blocks (0-4483.) in both the input-file and output-file
#  and then commence copying each four-byte block from input-file to output-file until
#  end-of-file is reached.
dd if=$in of=$out bs=4 skip=4484 oseek=4484
The only remaining issue is: "where did all those magic numbers come from ?"

We begin with disassembling the libdevinfo_init function using the dis(1) command:

# dis -F libdevinfo_init libpicldevtree.so.1
                ****   DISASSEMBLER  ****

disassembly for libpicldevtree.so.1

section .text
libdevinfo_init()
        45e4:  9d e3 bf 90         save         %sp, -0x70, %sp
        45e8:  40 00 00 02         call         0x45f0
        45ec:  13 00 00 4e         sethi        %hi(0x13800), %o1
        45f0:  a6 10 00 18         mov          %i0, %l3
        45f4:  92 02 62 18         add          %o1, 0x218, %o1
        45f8:  a4 10 00 19         mov          %i1, %l2
        45fc:  a2 02 40 0f         add          %o1, %o7, %l1
        4600:  13 00 00 37         sethi        %hi(0xdc00), %o1
        4604:  d0 04 61 6c         ld           [%l1 + 0x16c], %o0
        4608:  40 00 50 32         call         0x186d0
        460c:  92 02 63 07         add          %o1, 0x307, %o1
        4610:  a0 90 00 08         orcc         %g0, %o0, %l0
We can see our ADD instruction in bold at byte offset 0x460c (which is relative to the start of the binary file). Our goal is, therefore, to copy bytes 0 through (0x460c - 1), replace the four bytes at 0x460c, then copy the remaining bytes starting at (0x460c + 4) until the end-of-file.

Even though 0x460c is a hexidecimal number in units of bytes, it will be easier to think of our problem in units of 32-bit words, as this is the size of the item we are re-writing (an instruction, which is four bytes on this SPARC binary). As a result, the following command computes how many four-byte "blocks" reside before byte offset 0x460c:

# echo "460c%4=D" | mdb
                4483
This command divides the hexidecimal number 0x460c by the number of bytes in a four-byte block and then translates the value into decimal, as that is the radix that dd uses. The "prolific" use of "4483" in the above script now becomes evident: we copy blocks 0 through 4482, re-write block 4483, then copy blocks 4484 until the end-of-file.

The only "incantation" left to resolve is this line in the script:

# write the 32 bits of our new ADD instruction to the output-file at block 4483.:
#
echo "\0222\002\0143\047" | dd of=$out bs=4 seek=4483 count=1
This line writes an octal representation of our new ADD instruction's hexidecimal representation (0x92026327) to stdin. dd uses this as the value of the input-file and then writes this four-byte block to the output-file. The seek argument tells dd to move the current file pointer (in the output-file) to block index 4483; the writing commences there.

0x92026327 was shown above to be the encoding of the ADD instruction that we want. Since echo only seems to be able to write the binary representation of a single octal digit, we have to compute the octal representation of each of the four, eight-bit bytes of 0x92026327. Using mdb, one gets:

sh> mdb
> 92=O
                0222
> 2=O
                02
> 63=O
                0143
> 27=O
                047
We can check our work by issuing this command and seeing the bolded digits, which represent the instruction that we modified:
# dd if=libpicldevtree.so.1.new bs=4 skip=4483 | od -x |more
0000000 9202 6327 a090 0008 1280 0004 0100 0000
0000020 81c7 e008 91e8 2001 4000 502e 0100 0000
0000040 e204 6024 9290 0008 1280 0004 d024 6000
0000060 81c7 e008 91e8 2001 9010 0018 9210 0019
[...]
However, the greatest satisfaction would have been if the file's checksum would have been identical, yet the number-of-blocks and number-of-bytes are identical:
# sum libpicldevtree.so.1*
10840 162 libpicldevtree.so.1.new
10808 162 libpicldevtree.so.1.orig
# ls -l libpicldevtree.so.1*
-rw-r--r--   1 root     other      82692 Jun 25 15:46 libpicldevtree.so.1.new
-rwxr-xr-x   1 root     sys        82692 Jul  4  2005 libpicldevtree.so.1.orig
In the end, though, after manually stopping picld, installing my binary, and re-starting picld, the output of prtdiag was correct, which meant that picld had successfully loaded. Also, /var/adm/messages did not show any errors).

As a result, a quick script to perform some binary re-writing greatly improved the turnaround time regarding running my experiment. Have fun (and danger!) with your own binary-rewriting. One last tip: the binary encoding for a SPARC NOP instruction is 0x100.0000. ...or, in our "pig-octal", it is:

echo "\01\00\00\00"

Calendar

Search my blog

Links

Navigation