Chris Quenelle's Weblog
Thoughts on developer tools.

Monday November 28, 2005

Two bad Solaris bugs that affect dbx users

Updated with new information about both bugs; see the notes further down.


Two bugs have popped up in Solaris 10 since it went to FCS. One causes dbx to hang, and the other causes dbx to crash. The hang bug is 6329593 (pr_wait_die() can hang while waiting for SIGKILL to be processed) and the crash bug is 6283570 (misaligned ELF64 section heads). You can look up both bugs in the bug database on the opensolaris.org web site.

The hang bug is described in more detail on the What's New page for Sun Studio. It doesn't have any workaround that I know of, other than getting the bug fixed. The bug is supposed to be fixed in Nevada build 25. I don't know what that translates to for OpenSolaris releases.

The crash bug will have a workaround in a dbx patch coming out for Sun Studio 11. Watch this space for availability, or check the SunSolve web site for Sun Studio 11 patches. I wrote a Perl script to detect whether any of your sparcv9 libraries suffer from this problem. The bad section alignment shows up in 64-bit libraries on both SPARC and x86, but it only causes dbx to crash on SPARC machines.

#!/usr/bin/perl

# This script looks for sparcv9 libraries that will make
# dbx crash.  They result from solaris bug:
#   6283570 misaligned ELF64 section heads
#
# This script only looks in /lib and /usr/lib. But libraries
# in other directories might also suffer from the same bug.

use File::Find;
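sub wanted {
  # Only consider executable regular files named like libfoo.so.1
  return unless -x && -f;
  return unless /.*\.so\.[0-9]$/;
  return unless `/bin/file $_ | grep 64-bit`;
  # e_shoff is the file offset of the section header table
  my $out = `elfdump -e $_ | grep e_shoff`;
  # Skip the file if elfdump printed no e_shoff, rather than reusing a stale $1
  return unless $out =~ m/e_shoff:\s+(0x[0-9a-f]+)\s/;
  my $shoff = $1;
  # A hex offset ending in 4 or c is 4-byte aligned but not 8-byte aligned
  if ($shoff =~ m/[4c]$/) {
     print "bad alignment in file: $File::Find::name\n";
  }
}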

print "Looking for bad ELF section header table alignment in 64-bit files\n";
find(\&wanted, ( "/lib", "/usr/lib" ));
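If you don't have elfdump handy, the same check can be sketched by reading the ELF header bytes directly. This is only a rough illustration, not what dbx or the patch does. It assumes the standard ELF64 header layout (e_shoff is the 8-byte field at byte offset 40) and it assumes that "misaligned" means "not a multiple of 8", which is a bit broader than the 4-or-c test in the script above. The names shoff_misaligned and check_file are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return 1 if the ELF64 header in $header has an e_shoff that is not a
# multiple of 8, 0 if it is aligned, and undef if $header is not ELF64.
sub shoff_misaligned {
    my ($header) = @_;
    return undef if length($header) < 48;
    my ($magic, $class, $data) = unpack('a4 C C', $header);
    return undef unless $magic eq "\x{7f}ELF";
    return undef unless $class == 2;    # EI_CLASS: 2 means ELFCLASS64
    # e_shoff is the 8-byte field at byte offset 40 of an ELF64 header.
    # Alignment only depends on the low-order 32 bits, so read just those.
    # EI_DATA: 2 means big-endian; anything else is treated as little-endian.
    my $low = $data == 2
        ? unpack('N', substr($header, 44, 4))    # big-endian (SPARC)
        : unpack('V', substr($header, 40, 4));   # little-endian (x86)
    return ($low % 8) ? 1 : 0;
}

# Read the first 64 bytes of a file and run the check on them.
# Returns undef if the file cannot be opened or is not ELF64.
sub check_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or return undef;
    binmode($fh);
    my $header = '';
    read($fh, $header, 64);
    close($fh);
    return shoff_misaligned($header);
}

# Check every file named on the command line.
for my $path (@ARGV) {
    print "bad alignment in file: $path\n" if check_file($path);
}
```

Run it with a list of library paths as arguments. It prints nothing for files that are aligned, unreadable, or not 64-bit ELF.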

Late breaking update: I've figured out how to use the mediacast server, and I built a temporary, hacked, unsupported (well, you get the idea) dbx binary that doesn't fall over dead when it sees a misaligned ELF section header. If you are running into this problem, you can download the bootleg binary and try it out. The usual caveats apply. I wouldn't recommend pasting it on top of the real dbx in a Sun Studio 11 install directory unless you have to. If you do have to, then save the original dbx binary and put it back before you apply the next Sun Studio patch. Okay, I'm done. That satisfies my "common sense" paranoia coefficient. You can find the binary here: bootleg dbx
(Don't use this link, get the patches below!)

Note:

Addendum as of Mar 14th, 2006: The patches for dbx to work around the crash bug are now available for Sun Studio 10 and Sun Studio 11, for both SPARC and x86 platforms.

Note:

Addendum about the hang bug. Dave Ford wrote up a good summary, and I'll repost the information here:

Description

There is a kernel bug for Solaris 10 that causes dbx to hang immediately after loading program information for the program the user is debugging. The bug initially was found in build 18 of S10U1, but was also released in kernel patches for SPARC and x86.

Response

After we detected this bug in build 18 of S10U1, we ensured that the bug was fixed by FCS, so that dbx would work with S10U1.

For users who applied bad patches to S10 and are experiencing this problem, use at least:

Note that 118844-27 requires several other patches:

Workaround

When dbx hangs, you can type Control-C twice, or you can run the prun command on the dbx process ID.

Note:

The hang bug shows up as a side effect of the fix for Solaris bug 6272865 (race condition ...), and the hang itself is fixed by 6329593. So if your version of Solaris has patches that "fix" 6272865 but lacks the fix for 6329593, then you need to get some more patches. On Solaris 9, the patch that fixes the first bug also includes the fix for the regression that it causes, so you shouldn't see the hang on Solaris 9. For Solaris 9, the patches in question are: 120884-02 (for x86) or 117125-03 (for SPARC).

 

Note:

The Solaris 10 patches needed to fix the crash bug (for older versions of dbx) are:


Posted by Chris Quenelle (Nov 28 2005, 04:15:55 PM PST)


Chris Quenelle is a tools developer at Sun Microsystems. He's worked on performance and debugging tools at Sun for more than 10 years. He reads comic books and science fiction, and has more TiVos than he can keep track of.
