Walter Lee

Why is there no core dump from my SSL-enabled Sun web server 6.1/7.0 after a crash?

Monday Jun 02, 2008

Sometimes we want to capture a core dump file from Sun web server 6.1 or 7.0 after a crash for analysis. However, even after applying the needed OS settings, e.g. ulimit -c unlimited, we still cannot see a core dump file after a crash.

This article assumes you have already checked the OS settings and are using SSL in the web instance, e.g.

apple:/export/home/iws6.1sp9/https-apple.asia.sun.com> ulimit -a
core file size (blocks)     unlimited
data seg size (kbytes)      unlimited
file size (blocks)          unlimited
open files                  256
pipe size (512 bytes)       10
stack size (kbytes)         8192
cpu time (seconds)          unlimited
max user processes          29995
virtual memory (kbytes)     unlimited

(The OS settings above look OK for dumping core.)

It is useful to first test a crash on a small running program, e.g.

sleep 60000 &
(the shell then shows the pid of this background job)

Then you can try to get a core dump with

kill -SEGV <pid of above sleep background job>

If the sleep job dumps core, your OS settings are OK.
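The same check can be wrapped in a small shell function. This is only a sketch: the name segv_test is hypothetical, and it assumes a POSIX-style shell that reports death by signal as 128 plus the signal number (SIGSEGV is 11, so typically 139). It only confirms that the signal killed the process; where the core file lands, and what it is called, still depends on your OS configuration.

```shell
# Sketch: crash a throwaway sleep process and print the exit status the shell saw.
# segv_test is a hypothetical helper name, not part of the web server.
segv_test() {
    # Start a harmless background job; detach its output from ours.
    sleep 600 >/dev/null 2>&1 &
    pid=$!

    # Ask the kernel to deliver SIGSEGV, just as a real crash would.
    kill -SEGV "$pid"

    # Collect the exit status of the killed job.
    wait "$pid" 2>/dev/null
    echo $?
}
```

After calling segv_test, look for a core file in the current directory (or wherever your system is configured to write cores).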

If you still cannot get a core from Sun web server 6.1/7.0 with SSL enabled, read on.

If this is an SSL-enabled web instance, the other possible reason is our security design: to protect the core image, the server by default does not dump core when SSL is enabled.

To enable core dumps in this case, you need to set SSL_DUMP=1.

For example, before I added SSL_DUMP=1 to the start script, I used plimit to check the process limits:

apple:/export/home/iws6.1sp9/https-apple.asia.sun.com> ptree 14501
14501 ./webservd-wdog -r /export/home/iws6.1sp9 -d /export/home/iws6.1sp9/https
  14502 webservd -r /export/home/iws6.1sp9 -d /export/home/iws6.1sp9/https-appl
    14503 webservd -r /export/home/iws6.1sp9 -d /export/home/iws6.1sp9/https-ap

apple:/export/home/iws6.1sp9/https-apple.asia.sun.com> plimit 14503
14503:  webservd -r /export/home/iws6.1sp9 -d /export/home/iws6.1sp9/https-app
   resource              current         maximum
  time(seconds)         unlimited       unlimited
  file(blocks)          unlimited       unlimited
  data(kbytes)          unlimited       unlimited
  stack(kbytes)         8192            unlimited
  coredump(blocks)      0               0  *** see no core allowed
  nofiles(descriptors)  1024            1024
  vmemory(kbytes)       unlimited       unlimited

(There will be no core from the process above!)
(plimit is a good tool to see which limits apply to a process.)
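If you check this often, the coredump line can be picked out of the plimit output with a short filter. This is a minimal sketch, assuming the output layout shown above; the function name core_soft_limit is hypothetical.

```shell
# Sketch: read plimit-style output on stdin and print the current (soft)
# coredump limit, i.e. the second column of the coredump(blocks) row.
# core_soft_limit is a hypothetical helper name.
core_soft_limit() {
    awk '$1 == "coredump(blocks)" { print $2 }'
}

# Intended use on Solaris (pid taken from the ptree output):
#   plimit 14503 | core_soft_limit
```

A result of 0 means the process is not allowed to write a core file at all.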

Then add "SSL_DUMP=1; export SSL_DUMP" to the start script of the web instance, e.g.

#
# Copyright (c) 2003 Sun Microsystems, Inc.  All rights reserved.
# Use is subject to license terms.
#

SSL_DUMP=1
export SSL_DUMP


# Detect the Path and OS.
default_path="/bin:/usr/bin"
if [ -z "$PATH" ]
.......
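It can be worth confirming that the edit really landed in the script the instance uses. The following is a sketch only: sets_ssl_dump is a hypothetical helper, and you pass it the path to your own instance's start script.

```shell
# Sketch: succeed only if the given script both sets SSL_DUMP=1 and exports it.
# sets_ssl_dump is a hypothetical helper name.
sets_ssl_dump() {
    script=$1
    # An assignment line such as "SSL_DUMP=1" (leading whitespace allowed) ...
    grep -q '^[[:space:]]*SSL_DUMP=1' "$script" &&
    # ... and an export, either on its own line or after the assignment.
    grep -q 'export[[:space:]][[:space:]]*SSL_DUMP' "$script"
}
```

For example, `sets_ssl_dump ./start && echo "SSL_DUMP is set"`, where ./start stands for the start script in the instance directory.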

Then restart:

apple:/export/home/iws6.1sp9/https-apple.asia.sun.com> cat logs/pid
14576
apple:/export/home/iws6.1sp9/https-apple.asia.sun.com> ptree 14576
14576 ./webservd-wdog -r /export/home/iws6.1sp9 -d /export/home/iws6.1sp9/https
  14577 webservd -r /export/home/iws6.1sp9 -d /export/home/iws6.1sp9/https-appl
    14578 webservd -r /export/home/iws6.1sp9 -d /export/home/iws6.1sp9/https-ap

To see the core dump file size the process is now allowed:

apple:/export/home/iws6.1sp9/https-apple.asia.sun.com> plimit 14578
14578:  webservd -r /export/home/iws6.1sp9 -d /export/home/iws6.1sp9/https-app
   resource              current         maximum
  time(seconds)         unlimited       unlimited
  file(blocks)          unlimited       unlimited
  data(kbytes)          unlimited       unlimited
  stack(kbytes)         8192            unlimited
  coredump(blocks)      unlimited       unlimited  *** now it is allowed to dump core after crash
  nofiles(descriptors)  1024            1024
  vmemory(kbytes)       unlimited       unlimited

Then a test:

apple:/export/home/iws6.1sp9/https-apple.asia.sun.com/config> kill -SEGV 14578
apple:/export/home/iws6.1sp9/https-apple.asia.sun.com/config> ls -lrt
total 94166

-rw-------   1 root     other    48130312 May 28 10:50 core.webservd.14578

The core file is there now.

You can also use plimit to check other process limits when troubleshooting, e.g. when nofiles(descriptors) is not enough. I think you can also set new process limits with plimit; please refer to the plimit man page.

Walter
