Basant Kukreja
Auto tuning of file descriptors in Sun Web Server
Web Server 7.0 uses an algorithm to divide file descriptors among various needs on UNIX systems. Here is the list of items that require file descriptors:
1. Web applications. Web Server leaves 80% of the file descriptors for web applications.
2. Daemon session thread connections. For each daemon session thread, Web
Server 7.0 expects an average of 4 file descriptors: one client socket
connection, the requested file, an included file and a back-end connection.
3. JDBC pools
4. Access logs for all virtual servers
5. Listeners
6. File cache
7. Keep-alive connections
8. Thread pool queue size
In the above list, (6), (7) and (8) are auto-tuned: if the user specifies them in server.xml, the server uses those values; if not, it divides the remaining (available) file descriptors among (6), (7) and (8). Available file descriptors = total descriptors - items (1) to (5).
If more than 1024 file descriptors are available, Web Server uses a 1:16:16 ratio for items (6), (7) and (8). If fewer than 1024 are available, it uses a 1:16:8 ratio. The file cache is given the least importance and keep-alive the highest. It also rounds the numbers to a power of 2.
Note that Web Server doesn't use the above algorithm on Windows. On Windows it uses 64K descriptors for keep-alive and 16K descriptors for the thread pool queue size.
Posted at 06:22PM Aug 31, 2009 by Basant Kukreja in Sun Web Server
Web Server top "wstop" video
Posted at 02:30PM Jul 09, 2009 by Basant Kukreja in Sun Web Server
Using large pages support in Sun Web Server
Solaris supports multiple page sizes, which is explained here: http://www.solarisinternals.com/wiki/index.php/Multiple_Page_Size_Support
I was running the SPECweb ecommerce PHP benchmark and noticed that 4-8% gains can be obtained on CMT systems by using 256M pages. As the process address space grows (especially for 64-bit processes), the number of TLB entries needed grows significantly. In the SPECweb ecommerce PHP runs, trapstat showed that around 7% of CPU time was spent handling TLB misses. When I used 256M pages on CMT systems, the CPU time spent on TLB misses dropped to 2%. Another interesting thing I observed was that Sun Web Server tended to slow down over time. In my setup, I had set the file cache heap size to 2.5GB, and I was wondering why the Web Server was slowing down. What was really happening is that when the Web Server starts, it initially allocates only 256 MB, so the process address space in use is small and there are fewer TLB misses. As the file cache expands, more and more TLB misses occur, so the server becomes sluggish after 20 minutes or so.
So if you are running Sun Web Server on CMT systems, you should try 256M page support.
To enable MPSS support in a 64-bit Sun Web Server, put the following in
bin/startserv:
LD_PRELOAD_64=/usr/lib/sparcv9/mpss.so.1; export LD_PRELOAD_64
MPSSHEAP=256M; export MPSSHEAP
For a 32-bit Web Server:
LD_PRELOAD=/usr/lib/mpss.so.1; export LD_PRELOAD
MPSSHEAP=256M; export MPSSHEAP
The "pagesize -a" command lists the page sizes supported on your system. For systems whose largest supported page size is 4M, the following can be used:
LD_PRELOAD_64=/usr/lib/64/mpss.so.1; export LD_PRELOAD_64
MPSSHEAP=4M; export MPSSHEAP
You can verify which page sizes are in use with pmap -sx:
# pmap -sx 1938
1938: webservd -d /opt/SUNWwbsvr/https-nsapiphp/config -r /opt/SUNWwbsvr -t
Address Kbytes RSS Anon Locked Pgsz Mode Mapped File
...
0000000100B00000 6080 6080 - - - rwx-- [ heap ]
00000001010F0000 244800 244800 244224 - 64K rwx-- [ heap ]
0000000110000000 3932160 3932160 3932160 - 256M rwx-- [ heap ]
0000000200000000 262144 262144 262144 - 256M rwx-- [ heap ]
FFFFFFFF68000000 4096 4096 4096 - 64K rwx-- [ anon ]
...
Note that some heap segments are mapped with 256M pages while others are
mapped with 64K pages.
This MPSS setting has been used in published SPECweb JSP results.
Posted at 06:13PM May 20, 2009 by Basant Kukreja in Sun Web Server
Caching static content in reverse proxy in Sun Java System Web Server 7.0
Sun Java System Web Server 6.1 has a function named "check-passthrough" which looks for the resource in the reverse proxy instance's docroot first. I was wondering how to do this in 7.0, and I didn't find any documentation for it. The advantage of serving static content from the reverse proxy's docroot is that it is very efficient: Sun Web Server has excellent caching capabilities, which back-end servers often lack. Basically, the user needs to do the following test:
* If file exists on local document root then serve the file else go to back-end instance.
Here is how we can do it in WS 7.0.
1) Define a docroot variable for each virtual server:
<virtual-server>
...
<variable>
<name>docroot</name>
<value>/mydocroot/https-rpp/docs</value>
</variable>
</virtual-server>
2) Modify obj.conf to look for local resources before forwarding the request to the back end:
<If not -f "$docroot/$path">
NameTrans fn="map" from="/" name="reverse-proxy-/" to="http:/"
</If>
In Web Server 7.0, the file cache is enabled by default, so all static content
will be cached by default. The set-file-cache-prop wadm command can be used to
tune the file cache.
Posted at 07:59PM Aug 12, 2008 by Basant Kukreja in Sun Web Server
Hacking Sun Java System Web Server pblocks using dtrace.
# dtrace -qs watchpblocks.d 11463
Req->vars : ntrans-base="/var/www" path="/var/www/" required-rights="list" content-length="1912"
Req->reqpb : clf-request="GET / HTTP/1.1" method="GET" protocol="HTTP/1.1" uri="/"
Req->headers : user-agent="curl/7.16.1 (sparc-sun-solaris2.8) libcurl/7.16.1
OpenSSL/0.9.8d zlib/1.2.3 libidn/0.5.19" host="chilidev4.red.iplanet.com"
accept="*/*" content-type="text/html" status="200 OK"
transfer-encoding="chunked" content-length="2003"
11463 is the child process id of my test Web Server.
Here is the output of wstop2.pl (similar to my previous version wstop.pl)
# perl wstop2.pl -d 5 11463
12:49:55 Requests: 3 ( 0/sec) Bytes: 5736( 1147/sec)
Requests: GETs: 3 POSTs: 0 HEADs: 0 TRACE: 0
Responses: 1xx:0 2xx:3 3xx:0 4xx:0 5xx:0
Requests Reqests/sec Bytes Sent URI
3 0 5736 /
^C
So how does it work? If you look at the dtrace script, you will find:
pid$1::flex_log:entry
{
...
}
So at the end of each request, the web server calls the flex_log function to log
the request, and at that point the above dtrace probe fires. As with most NSAPI
functions, pointers to the Request and Session structures are passed as
arguments. The dtrace script parses these structures and tries to decode the
pblocks. For this technique to work, the access log must be enabled, which it
almost always is.
The interesting aspect is that we don't need to make any configuration changes.
Caution: since dtrace doesn't allow "for" loops or if/else logic, the pblock hash decoding is a complete hack. It may not work in all scenarios. Also, on busy systems, many dtrace probes might be missed with this method, since we copy data from user land into the kernel several times.
The previous dtrace version was much more lightweight as far as dtrace work is concerned. If a future version of dtrace provides if/else and loop constructs inside dtrace scripts, the script could be improved and made more reliable.
Also, these scripts currently work only with 32-bit web servers. Here are the scripts:
watchpblocks.d
wstop2.pl
wbdtrace.pm
Posted at 02:18PM Aug 05, 2008 by Basant Kukreja in Sun Web Server
Sun Java System Web Server and dtrace.
In this blog, I will show how to write a small NSAPI plug-in, load the plug-in into the Web Server and insert a log-level hook (AddLog). This NSAPI plug-in contains a dtrace probe. When the plug-in is loaded into the Web Server, the Web Server registers the dtrace probes with the kernel. During execution of an HTTP request, the Web Server invokes AddLog (at the end of the request), which fires the dtrace probe. The dtrace probe provides information about the request and response. Using this probe, it is possible to write a "top"-like utility for the Web Server.
Here is my NSAPI plug-in's code, wbdtrace.c:
/*
 * Copyright (c) 2005, 2008 Sun Microsystems, Inc. All Rights Reserved.
 * Use is subject to license terms.
 */
#include "nsapi.h"
#include <sys/sdt.h>
int NSAPI_PUBLIC dtrace_log_request(pblock *pb, Session *sn, Request *rq)
{
    /* Fire sjsws:::log-request with the request line, request headers,
     * response headers and request variables, each flattened to a string. */
    DTRACE_PROBE4(sjsws, log__request,
        (rq && rq->reqpb) ? pblock_pblock2str((pblock*) rq->reqpb, NULL) : NULL,
        (rq && rq->headers) ? pblock_pblock2str((pblock*) rq->headers, NULL) : NULL,
        (rq && rq->srvhdrs) ? pblock_pblock2str((pblock*) rq->srvhdrs, NULL) : NULL,
        (rq && rq->vars) ? pblock_pblock2str((pblock*) rq->vars, NULL) : NULL);
    return REQ_NOACTION;
}
Here is the dtrace provider's code, sjsws.d:
/*
 * Copyright (c) 2005, 2008 Sun Microsystems, Inc. All Rights Reserved.
 * Use is subject to license terms.
 */
provider sjsws {
    probe log__request(string, string, string, string);
};
We now need to compile this plug-in and load it into the Web Server. Here is how I compiled the plug-in:
$ cc -c -KPIC -mt -DXP_UNIX -I/opt2/installed/ws70u1/include wbdtrace.c
$ dtrace -G -s sjsws.d wbdtrace.o
$ cc -KPIC -mt -G -L/opt2/installed/ws70u1/lib -lCrun -lCstd -ldl -lposix4 -lns-httpd40 -o libwbdtrace.so wbdtrace.o sjsws.o
Now I copied the plug-in to the Web Server's plug-in directory:
$ cp libwbdtrace.so /opt/ws70u1/plug-ins/
Then I loaded the plug-in into the Web Server and inserted an AddLog statement into the obj.conf file. Here are the configuration changes required for this plug-in to work.
magnus.conf:
Init fn="load-modules" shlib="libwbdtrace.so" shlib_flags="(global|now)" funcs="dtrace_log_request"
obj.conf:
<Object name="default">
...
AddLog fn="dtrace_log_request"
</Object>
After making the configuration changes, I restarted the Web Server:
$ bin/stopserv; bin/startserv
The Web Server is now started and the plug-in is loaded. This dtrace NSAPI plug-in registers the dtrace probe (sjsws provider) with the kernel. Now let us see what the Web Server dtrace probes look like using "dtrace -l":
# dtrace -l | grep sjsws
53309 sjsws6523 libwbdtrace.so dtrace_log_request log-request
53310 sjsws6524 libwbdtrace.so dtrace_log_request log-request
Note that both the primordial and the child process (process IDs 6523 and 6524, which appear in the provider names) have registered the probe with the Solaris kernel. But since only the child process serves requests, only the child process fires the dtrace probes. Now let us write a small dtrace script (viewreq.d) that just prints a message whenever our probe fires.
sjsws*::dtrace_log_request:log-request
{
@[probefunc] = count();
printf("\narg0 = %x arg0= %s \n arg1=%s \nargs2=%s arg3 = %s\n",
arg0,
copyinstr(arg0),
copyinstr(arg1),
copyinstr(arg2),
copyinstr(arg3));
}
dtrace:::END
{
trace("End of dtrace\n");
}
Now I ran this script and sent an HTTP request "GET /" from a browser.
# dtrace -s viewreq.d
dtrace: script 'viewreq.d' matched 3 probes
CPU     ID                    FUNCTION:NAME
  1  53310  dtrace_log_request:log-request
arg0 = 7553c8 arg0= clf-request="GET / HTTP/1.1" ...
^C
  0      2                             :END End of dtrace
Now, whenever a request is logged, our dtrace probe fires, and our dtrace script simply prints the various pblocks.
Based on the above dtrace NSAPI plug-in, it is possible to write monitoring code in scripting languages, e.g. Perl. Here are some of the scripts:
countreq.pl
viewbrowser.pl
viewresponsecodes.pl
wbdtrace.pm
wstop.pl
These Perl scripts wrap the dtrace script, and most of the text processing is done in Perl. The dtrace probe provides the pblock contents from which the request and response information is obtained.
Here is the output of these scripts.
View HTTP request counts by type, with timestamps
# perl countreq.pl
Timestamp  GETs   POSTs  HEADs  TRACEs
12:41:13   0      0      0      0
12:41:18   5469   0      0      0
12:41:23   10000  0      0      0
View HTTP request counts by type, per client browser
# perl viewbrowser.pl
Browser  GETs  POSTs  HEADs  TRACEs
Firefox  19    0      0      0
MSIE     0     0      0      0
Wget     0     0      0      0
Other    303   0      0      0
View HTTP requests' URIs and their response codes
# perl viewresponsecodes.pl
Response Code  Uri                          File Name
200            /index.html                  /index.html
304            /index.html                  /index.html
403            /img/gradation_header_L.gif  /img/gradation_header_L.gif
403            /img/gradation_header_R.gif  /img/gradation_header_R.gif
^C
View HTTP requests' URIs for a particular response code
# perl viewresponsecodes.pl -c 200
Response Code  Uri          File Name
200            /index.html  /index.html
^C
A top-like utility for Sun Web Server (similar to apachetop)
# perl wstop.pl -d 10
20:10:40 Requests: 4223 ( 422/sec) Bytes: 12930794(1293079/sec)
Requests: GETs: 4223 POSTs: 0 HEADs: 0 TRACE: 0
Responses: 1xx:0 2xx:4025 3xx:68 4xx:130 5xx:0
Requests Reqests/sec Bytes Sent URI
601 60 7234838 /index.html
545 54 198925 /img/gradation_header_L.gif
498 49 3988482 /img/side_bg.gif
396 39 930204 /img/sun_logo.gif
359 35 56004 /img/gradation_header-btm.gif
342 34 2394 /cgi-bin/test.pl
250 25 10750 /img/a.gif
240 24 96720 /img/gradation_header_R.gif
206 20 52324 /img/footer_R.gif
191 19 58828 /img/gradation_header.gif
130 13 37960 /notfound.html
115 11 15410 /img/content_hline.gif
113 11 29041 /img/footer_L.gif
109 10 207754 /img/sjsws_title_text.gif
68 6 0 /yahoo
60 6 11160 /img/footer.gif
^C
Here is the compiled binary for the plug-in:
Solaris sparc
Solaris x86
Posted at 01:12PM Jun 16, 2008 by Basant Kukreja in Sun Web Server
Reverse proxy and redirecting request.
Syntax:
NameTrans fn="redirect" url-prefix="http://host:port/xyz"
It works fine in normal instances, but it creates a problem with reverse proxy
functionality. The reverse proxy has a thread named route_offline_thread which
uses the "OPTIONS" HTTP method to check the health of the back-end instances.
As a result, the OPTIONS requests are also redirected and route_offline_thread
doesn't work properly. There is a very simple workaround for this situation:
just do not redirect OPTIONS requests in the RPP. Use the following instead:
<If not $internal and not method = "OPTIONS">
NameTrans fn="redirect" url-prefix="http://host:port/xyz"
</If>
Posted at 07:51PM Jun 11, 2008 by Basant Kukreja in Sun Web Server
Reverse proxy instance health check (route_offline_thread)
How the Sun Web Server reverse proxy manages the lifecycle of back-end instances.
Posted at 06:54PM Jun 11, 2008 by Basant Kukreja in Sun Web Server
Connection pooling in Sun Java System Web Server
Connection pooling is one of the cool features of Sun Web Server, and it has been there for a long time. When a request comes to the Web Server, an acceptor thread accepts the connection and puts it into the connection queue. One of the daemon session threads (also called worker threads) pulls the connection from the connection queue and starts processing the request. After the request is served, if the client has asked for the connection to be kept alive, the connection goes to the keep-alive poll manager. Keep-alive threads poll all keep-alive connections; whenever a new request arrives on one of these connections, a keep-alive thread puts the connection back into the connection queue. If there is no further request on the connection for some period (the keep-alive timeout), the keep-alive threads close the connection.

Sun Web Server performs so well under high load, e.g. in SPECweb, because of the above connection pooling architecture. In Apache Web Server, the prefork and worker MPMs are the most common. One of the problems with these MPMs is that a connection is bound to a thread, which means the number of concurrent connections is typically limited to the number of processing threads. If the load increases beyond some point, connections will start timing out very soon. The event MPM tries to address this problem.
Tunables:
Here are some of the configuration parameters that directly affect the connection
queue:
(a) thread-pool -> queue-size
(b) http-listener -> listen-queue-size
(c) keep-alive -> timeout
(d) keep-alive -> poll-interval
listen-queue-size
As I wrote before, acceptor threads accept connections and put them into the connection queue. The question is what happens to new connections when the OS is busy and has not yet scheduled the acceptor thread. The OS kernel maintains the TCP connections on behalf of the Web Server process. listen-queue-size is the number of connections the kernel will accept before the application accepts them. If more than listen-queue-size connections arrive before the Web Server calls accept, new connections will be rejected. This is not a common situation, but it can happen on very busy systems. Here is a small experiment to demonstrate it:
Step 1: I sent a STOP signal to my child Web Server process so that the acceptor
threads would not be able to accept any new connections.
Step 2: I sent 200 simultaneous requests using the Apache benchmark tool (ab):
$ ab -c 200 -n 200 http://hostname/index.html
Step 3: I ran the netstat -an command to see the connections.
Here is what a connection line looks like:
192.18.120.211.3014 129.150.16.164.48150 5312 0 49437 0 ESTABLISHED
As expected, the kernel rejects the connections that do not fit in the listen queue. Here is the ab output:
Benchmarking chilidev4.red.iplanet.com (be patient)
apr_poll: The timeout specified has expired (70007)
What happens if we disable the connection pool:
If we disable the thread pool, there is no connection queue and there are no daemon session threads; the acceptor threads process the requests themselves. Here is what a typical call stack looks like:
----------------- lwp# 12 / thread# 12 --------------------
fd5c1134 pollsys (fc8efc80, 1, fc8efc00, 0)
fd55d028 poll (fc8efc80, 1, 1388, 10624c00, 0, 0) + 7c
fe3d862c pt_poll_now (fc8efcec, 1, fc8efd1c, fc8efc80, 20, ffffffff) + 4c
fe3d9ed0 pt_Accept (247408, 1bd990, ffffffff, 15, 0, ffffffff) + cc
ff17db5c PRFileDesc*ListenSocket::accept(PRNetAddr&,const unsigned) (1bd708, 1bd990, ffffffff, 2060, 245a88, 1bd988) + c
ff16e67c int DaemonSession::GetConnection(unsigned) (2b7648, ffffffff, ff2ff800, ffffe800, ff26f000, ff2abc00) + 64
ff16edb0 void DaemonSession::run() (6c0008, 2b7648, 2b7668, 2b76f0, 2000, ffffffff) + 150
fef76d48 ThreadMain (2b7648, 2a6eb0, 0, fef89c00, ff16ec60, ff2abc64) + 1c
fe3de024 _pt_root (2a6eb0, fef76d2c, 400, fe3f6ba4, 0, fe3f891c) + d4
fd5c0368 _lwp_start (0, 0, 0, 0, 0, 0)
A few call stacks:
Let us look at the pstack output. In my test configuration, I set the min/max threads to 2 and disabled the J2EE plug-in. Then I dumped the stacks using:
$ pstack <child_webservd_pid> | c++filt > pstack.txt
Let us see what the various threads are doing:
1. Acceptor thread's call stack:
Here is the call stack of acceptor thread.
----------------- lwp# 13 / thread# 13 --------------------
fd5c1134 pollsys (fc8bfc88, 1, fc8bfc08, 0)
fd55d028 poll (fc8bfc88, 1, 1388, 10624c00, 0, 0) + 7c
fe3d862c pt_poll_now (fc8bfcf4, 1, fc8bfd24, fc8bfc88, 20, ffffffff) + 4c
fe3d9ed0 pt_Accept (245368, 20ce50, ffffffff, 15, 1, ffffffff) + cc
ff17db5c PRFileDesc*ListenSocket::accept(PRNetAddr&,const unsigned) (1bd708, 20ce50, ffffffff, 3, 2ab988, 2ab988) + c
ff1782a4 void Acceptor::run() (12e488, 245908, 20ce48, 6, 3e8, 45) + 184
fef76d48 ThreadMain (12e488, 11d828, 0, fef89c00, ff178120, ff2ac444) + 1c
fe3de024 _pt_root (11d828, fef76d2c, 400, fe3f6ba4, 0, fe3f891c) + d4
fd5c0368 _lwp_start (0, 0, 0, 0, 0, 0)
2. Idle daemon session thread's call stack:
Here is the stack trace of an idle daemon session thread:
----------------- lwp# 16 / thread# 16 --------------------
fd5c0408 lwp_park (0, 0, 0)
fd5ba49c cond_wait_queue (50a10, 2cfec8, 0, 0, 0, 0) + 28
fd5baa1c cond_wait (50a10, 2cfec8, 0, 1c, 0, fcd52d00) + 10
fd5baa58 pthread_cond_wait (50a10, 2cfec8, 1, fe3f8518, 5fc, 400) + 8
fe3d79e8 PR_WaitCondVar (50a08, ffffffff, 2a7700, 0, 2ab988, 0) + 64
ff17797c Connection*ConnectionQueue::GetReady(unsigned) (8bcc8, ffffffff, ffffffff, 8bcc8, 5fc, 2ab968) + c4
ff16e630 int DaemonSession::GetConnection(unsigned) (2b7648, ffffffff, ff2ff800, 0, ff26f000, ff2abc00) + 18
ff16edb0 void DaemonSession::run() (746008, 2b7648, 2b7668, 2b76f0, 2000, ffffffff) + 150
fef76d48 ThreadMain (2b7648, 2a7700, 0, fef89c00, ff16ec60, ff2abc64) + 1c
fe3de024 _pt_root (2a7700, fef76d2c, 400, fe3f6ba4, 1, fe3f891c) + d4
fd5c0368 _lwp_start (0, 0, 0, 0, 0, 0)
Since the connection queue is empty, this daemon session thread is waiting for a request to arrive in the connection queue.
3. Processing daemon session thread's call stack:
Here is another daemon session thread (which is processing the "/cgi-bin/test.pl" request):
fd5c1248 read (1d, 758088, 2000)
ff167588 int ChildReader::read(void*,int) (3c9c4, 758088, 2000, 0, b71b00, 1) + 1c
ff0f2e30 INTnetbuf_next (758028, 1, 2000, 2001, 60, 758028) + 2c
ff13a364 int cgi_scan_headers(Session*,Request*,void*) (11ee70, 11eee8, 758028, 27c, ff29bbd8, 0) + 84
ff13abd4 int cgi_parse_output(void*,Session*,Request*) (758028, 11ee70, 11eee8, ff2a1c50, ff29bbd8, 6e) + 1c
ff13b8cc cgi_send (2c6dc8, 11ee70, 11eee8, 3c948, 1400, ff2a25cc) + 514
ff10df68 func_exec_str (2354f8, 2c6dc8, 0, fc348, 11eee8, 11ee70) + 1c0
ff10edc0 INTfunc_exec_directive (3d948, 2c6dc8, 11ee70, 11eee8, 280a28, 1) + 2a0
ff113b60 INTservact_service (0, 11ee70, 2c6dc8, 0, 11eee8, 27b470) + 374
ff11472c INTservact_handle_processed (0, 11eee8, 11eee8, 11ee70, 0, 2d6328) + 8c
ff172064 void HttpRequest::UnacceleratedRespond() (11edc8, 3, 1, ff26f400, 0, 20) + e34
ff170a24 int HttpRequest::HandleRequest(netbuf*,unsigned) (11edc8, 754008, 11edf8, 2000, 754068, 11edd0) + 7a8
ff16f10c void DaemonSession::run() (ffffffff, 280a08, 280a28, 280ab0, 1, ffffffff) + 4ac
fef76d48 ThreadMain (280a08, 266728, 0, fef89c00, ff16ec60, ff2abc64) + 1c
fe3de024 _pt_root (266728, fef76d2c, 400, fe3f6ba4, 1, fe3f891c) + d4
fd5c0368 _lwp_start (0, 0, 0, 0, 0, 0)
Note that the above thread is waiting for data from the child CGI process (test.pl).
4. Keep-alive thread's call stack:
Let us look at the call stack of a keep-alive thread:
----------------- lwp# 9 / thread# 9 --------------------
fd5c0408 lwp_park (0, 0, 0)
fd5ba49c cond_wait_queue (2c6e98, 7092c8, 0, 0, 0, 0) + 28
fd5baa1c cond_wait (2c6e98, 7092c8, 0, 1c, 0, fcd51100) + 10
fd5baa58 pthread_cond_wait (2c6e98, 7092c8, 1, fe3f8518, 5fc, 400) + 8
fe3d79e8 PR_WaitCondVar (2c6e90, ffffffff, 13e790, 0, 0, 0) + 64
ff176024 void PollArray::GetPollArray(int*,void**) (25e4c8, fcc3fec0, fcc3fec4, 3, 0, ff2ac000) + 5c
ff176984 void KAPollThread::run() (15c5a8, 5, 4, ff2ac000, ff2ff8cc, fcc3fec4) + 6c
fef76d48 ThreadMain (15c5a8, 13e790, 0, fef89c00, ff176918, ff2ac3b8) + 1c
fe3de024 _pt_root (13e790, fef76d2c, 400, fe3f6ba4, 1, fe3f891c) + d4
fd5c0368 _lwp_start (0, 0, 0, 0, 0, 0)
Posted at 12:00PM Apr 28, 2008 by Basant Kukreja in Sun Web Server
Sun Java System Web Server 7.0 monitoring - Part 1.
Web Server 7.0 has some interesting ways to monitor the server. In previous versions of the Web Server, we typically created a URI, e.g. /.perf, to obtain perfdump output; whenever we wanted the perfdump output, we sent a /.perf request. The problem with this approach is that if the Web Server is hanging because of some faulty application, or is very busy processing requests, the perfdump request also hangs and we need to wait for the /.perf request to be processed. In complicated situations, we were sometimes never able to obtain the perfdump output. Web Server 7.0 comes to the rescue in such situations. Let us do a simple experiment to simulate a hanging Web Server. First, let us set the min/max daemon session threads to 2 so that the Web Server can only process 2 requests at a time. Here is the thread-pool element of my server.xml:
<thread-pool>
<min-threads>2</min-threads>
<max-threads>2</max-threads>
</thread-pool>
After changing server.xml, let us restart the Web Server. Now let us create a small CGI script (test.pl) that takes at least 5 minutes to execute. This will help us simulate a hanging web server.
#!/usr/bin/perl
print "Content-Type: text/html\r\n";
print "\r\n";
sleep(300);
print "Hello\r\n";
Now let us send two test.pl requests in parallel, from two browser windows or with a command line tool such as curl:
$ curl --dump-header - -o - http://localhost:8080/cgi-bin/test.pl &
$ curl --dump-header - -o - http://localhost:8080/cgi-bin/test.pl &
The above two requests will be served after 5 minutes. Now let us try to see
if we can obtain the perfdump.
$ curl --dump-header - -o - http://localhost:8080/.perf
<hung>
Ctrl-Z
[3]+ Stopped curl --dump-header - -o - http://localhost:8080/.perf
$ bg
[3]+ curl --dump-header - -o - http://localhost:8080/.perf &
Our perfdump (/.perf) request hung. Now let us start the admin server and try to get the instance's perfdump from the admin server.
$ cd admin-server
$ bin/startserv
...
$ ./bin/wadm get-perfdump --config=myhostname --node=localhost
...
Sessions:
---------------------------------------------------------------------------------------
Process Status Client Age VS Method URI Function
4952 response 192.168.1.115 110 myhostname GET /cgi-bin/test.pl send-cgi
4952 response 192.168.1.115 109 myhostname GET /cgi-bin/test.pl send-cgi
Voila! It worked. It shows that two test.pl requests are running, and that they have been running for about 110 seconds.
So how does it work? Why did the /.perf request hang, and why did the get-perfdump wadm command work? Here is the explanation.
We created 2 daemon session threads and then sent 2 test.pl requests, so all daemon session threads became busy processing those requests. When we sent the /.perf request, no daemon session thread was available, so the request went into the connection queue and waited there to be served.
This information is also available in perfdump. Here is the snippet.
ConnectionQueue:
-----------------------------------------
Current/Peak/Limit Queue Length 1/1/1226
Total Connections Queued 17
Note that there is 1 connection in the connection queue and that should be our "/.perf" request.
So it is now clear that if all daemon session threads are busy then web server won't be able to respond to monitoring requests based on uri. That was a big bottleneck in Web Server 6.1 and before. In situations when Web Server hung, we were not able to collect the perfdump output and it was not easy to find out what requests Web Server was processing at hanging stage.
Now the next question is how the get-perfdump wadm command works. The answer lies in the monitoring architecture. There is a communication channel from the admin server to the Web Server instance, and all monitoring data is passed to the admin server over that channel. Also, the Web Server worker process has a dedicated thread for serving monitoring requests. Even when all daemon session threads are busy, this thread can generate the performance data and pass it to the admin server. This is exactly what happened when we sent the get-perfdump wadm command.
There are several interesting things to note in the above exercise. Admin server was not running at the time when Web Server hung. We simply started the admin server and ran get-perfdump command. During admin server startup the communication channel between admin server and instance is set up. This communication channel setup is independent of startup order of admin server or Web Server instance. They could be started in any order. Even when Web Server was busy processing the requests, this channel was successfully initialized.
Other monitoring methods, e.g. stats-xml, wadm CLI commands and the administration GUI, all use the same channel to access monitoring data from the instance.
Posted at 11:58PM Nov 03, 2007 by Basant Kukreja in Sun Web Server
Monday Aug 31, 2009