GIRI MANDALIKA's SCRATCHPAD
Sun achieves the Magic Number 50,000 on T5440 with Oracle Business Intelligence EE 10.1.3.4
Less than two months ago, Sun Microsystems published an Oracle Business Intelligence benchmark with the best single system performance of 28,000 concurrent BI EE users at ~75% CPU utilization. Sun and Oracle Corporation announced another Oracle Business Intelligence benchmark result today with two identical T5440 servers in the Oracle BI Cluster serving 50,000 concurrent BI EE users.
An Oracle white paper with Sun's 50,000 user benchmark results can be accessed from Oracle's Business Intelligence web site.
The hardware specifications for each of the T5440s are similar to those of the single T5440 server used in the prior benchmark effort. However, this time the Presentation Catalog (also frequently referred to as the Web Catalog) was moved to a T5220 server where the NFS server was running. Besides this, the only other change from the earlier 28,000 user benchmark exercise is the addition of another T5440 to the test rig.
The following graph shows the scalability of the application from one node to four nodes to eight nodes running on T5440 servers.

Without further ado, here is the summary of the benchmark results along with their significance and some interesting facts:
-
One of the major goals of this benchmark effort is to show the horizontal and vertical scalability of the application (OBIEE) by highlighting the superior performance and the resilience of the underlying hardware (T5440) and the operating system (Solaris). Needless to say the goal has been met.
-
Another goal of this benchmark is to show a decent number of concurrent BI EE users executing transactions with good response times. Since we already showed the maximum load that can be achieved on a single BI instance (7,500 users) and on a single T5440 server running multiple BI instances (28,000 users), this time we did not attempt to find the peak number that can be achieved from the two T5440 servers in the benchmark environment. Now that there is an additional server in the test setup that is taking care of the Presentation Catalog and the database server, 2 * 28,000 = 56,000 BI EE users would have been an achievable target -- but we opted to stop at the "magic" and "respectable" number 50,000 instead.
-
The entire benchmark run lasted about 9 hours 45 minutes, of which 8 hours were ramp-up time during which the 50,000 BI virtual users logged into the application a few users at a time. The LoadRunner tool reported only 4 errors for the entire duration of the run, and there were zero errors in the 60 minute steady state period during which the statistics reported in the document were collected.
-
Two Sun SPARC Enterprise T5440 servers each with 4 x 8-Core 1.6 GHz UltraSPARC T2 Plus processors delivered the best performance of 50,000 concurrent BI EE users at around 63% CPU utilization.
-
The BI EE Cluster was deployed on two T5440 servers running Solaris 10 5/09 operating system. All the nodes in the BI Cluster were consolidated onto two T5440 servers using the free and efficient Solaris Containers virtualization technology.
-
The Presentation Catalog was hosted on a ZFS file system that was created on top of four internal Solid State Drive (SSD) disks. The Catalog was shared among all eight BI nodes in the cluster as an NFS share. One T5220 server powered by an 8-core 1.2 GHz UltraSPARC T2 processor was used to run the NFS server. Due to the minimal activity of the database, the Oracle 11g database was also hosted on the same server, which ran the Solaris 10 5/09 operating system.
-
Solid State Drive (SSD) disks with ZFS file system showed significant I/O performance improvement over traditional disks for the Presentation Catalog activity. In addition, ZFS helped get past the UFS limitation of 32767 sub-directories in a Presentation Catalog directory.
-
Caching was turned ON at the application server, which led to minimal database activity on the server. Note that the caching mechanism was turned ON in the prior benchmark exercise as well.
-
The low-end CoolThreads CMT server T5220 and the mid-range T5440 server once again proved to be ideal candidates to deploy and run multi-threaded workloads by exhibiting resilient performance when handling a large number of simultaneous requests from 50,000 BI EE virtual users. The T5220 handled a large number of concurrent asynchronous read/write requests from eight different NFS clients.
-
NFS v3 was configured at the NFS server as well as at the NFS client nodes. NFS version 4 is the default on Solaris 10, and it might have worked as expected. However, a handful of bug reports prompted us to go with the more mature and less buggy version 3.
-
3283 watts is the average power consumption when all the 50,000 concurrent BI users are in the steady state of the benchmark test. That is, in the case of similarly configured workloads, the T5440 server supports 15.2 users per watt of energy consumed and supports 5,000 users per rack unit.
-
A summary of the results with system-wide averages of CPU and memory utilization is shown below. The latest results are in the last row.
| #Vusers | Clustered | #BI Nodes | #CPU | #Core | RAM | CPU utilization | Memory used | Avg Trx Response Time | #Trx/sec |
|---|---|---|---|---|---|---|---|---|---|
| 7,500 | No | 1 | 1 | 8 | 32 GB | 72.85% | 18.11 GB | 0.22 sec | 155 |
| 28,000 | Yes | 4 | 4 | 32 | 128 GB | 75.04% | 76.16 GB | 0.25 sec | 580 |
| 50,000 | Yes | 8 | 8 | 64 | 256 GB | 63.32% | 172.21 GB | 0.28 sec | 1031 |
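The scalability claim can be checked against the three result rows with a little arithmetic. The following is a minimal sketch, not part of the benchmark kit: the function name `scaling_report` and the idea of using the single-node 7,500 user run as the baseline are assumptions made for illustration. Keep in mind that the 50,000 user run was deliberately stopped short of its peak, so its efficiency figure understates what the hardware could do.

```python
# Figures copied from the results table: (vusers, bi_nodes, cores, trx_per_sec).
RESULTS = [
    (7_500, 1, 8, 155),
    (28_000, 4, 32, 580),
    (50_000, 8, 64, 1031),
]


def scaling_report(results):
    """Derive per-node and per-core figures, using the first row as baseline."""
    base_users, base_nodes, _, _ = results[0]
    base_per_node = base_users / base_nodes
    report = []
    for users, nodes, cores, tps in results:
        report.append({
            "users": users,
            "users_per_node": users / nodes,
            "users_per_core": users / cores,
            # 1.0 would mean every node carries as many users as the baseline node.
            "efficiency": (users / nodes) / base_per_node,
            "trx_per_sec_per_1000_users": tps / (users / 1000),
        })
    return report


REPORT = scaling_report(RESULTS)
for row in REPORT:
    print(
        f"{row['users']:>6} users: "
        f"{row['users_per_node']:.0f}/node, "
        f"{row['users_per_core']:.2f}/core, "
        f"efficiency {row['efficiency']:.1%}, "
        f"{row['trx_per_sec_per_1000_users']:.2f} trx/sec per 1,000 users"
    )
```

The transaction rate per 1,000 users stays close to constant across the three runs, which suggests the per-user load was the same in each of them.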
TOPOLOGY DIAGRAM
The topology diagram in the benchmark results white paper is almost illegible. Here is the original topology diagram that was inserted into the white paper.

Quite frankly I'm not very proud of this drawing -- but that's the best that I could come up with in a short span. Rather than showing the flow of communication between each and every component in the benchmark setup, I simplified the drawing by introducing a "black box" sort of thing - "private network" - in the middle, which protected the drawing from getting messy.
CPU USAGE GRAPH
The following two-dimensional graph shows the CPU utilization patterns at all 3 nodes in the benchmark setup for the 60 minute steady state of the benchmark run. This graph was generated using the free gnuplot tool with sar data as the input.
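For readers who want to produce a similar graph, here is a minimal sketch of turning `sar -u` text output into two-column rows that gnuplot can plot. It is not the script used for the benchmark: the sample figures and the host name are made up, and the column layout (`%usr %sys %wio %idle`) is an assumption based on the usual Solaris `sar -u` report.

```python
import re

TIME_RE = re.compile(r"^\d{2}:\d{2}:\d{2}$")


def parse_sar_cpu(text):
    """Return (timestamp, busy_percent) pairs from `sar -u` style text."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: HH:MM:SS usr sys wio idle
        if len(fields) != 5 or not TIME_RE.match(fields[0]):
            continue
        try:
            _usr, _sys, _wio, idle = (float(f) for f in fields[1:])
        except ValueError:
            continue  # the column header row also starts with a timestamp
        samples.append((fields[0], 100.0 - idle))
    return samples


def to_gnuplot_rows(samples):
    """Format samples as 'time busy' lines, one per sample."""
    return "\n".join(f"{stamp} {busy:.1f}" for stamp, busy in samples)


# Made-up sample, only to exercise the parser.
SAMPLE = """\
SunOS appnode1 5.10 Generic sun4v    10/09/2009

10:00:00    %usr    %sys    %wio   %idle
10:01:00      48      14       0      38
10:02:00      50      13       0      37
10:03:00      47      15       0      38

Average       48      14       0      38
"""

SAMPLES = parse_sar_cpu(SAMPLE)
print(to_gnuplot_rows(SAMPLES))
```

The resulting rows can be saved to a file and plotted with gnuplot's time-series support, one data file per node.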

COMPETITIVE LANDSCAPE
And finally here is a quick summary of all the results that are published by different vendors so far with similar benchmark kit. Feel free to draw your own conclusions. All this is public information. Check the corresponding benchmark reports by clicking on the URLs under the "#Users" column.
| Server | Chips | Cores | Threads | GHz | Processor Type | #Users | OS |
|---|---|---|---|---|---|---|---|
| 2 x Sun SPARC Enterprise T5440 (APP) + 1 x Sun SPARC Enterprise T5220 (NFS, DB) | 8 + 1 | 64 + 8 | 512 + 64 | 1.6 / 1.2 | UltraSPARC T2 Plus / UltraSPARC T2 | 50,000 | Solaris 10 5/09 |
| 1 x Sun SPARC Enterprise T5440 | 4 | 32 | 256 | 1.6 | UltraSPARC T2 Plus | 28,000 | Solaris 10 5/09 |
| 5 x Sun Fire T2000 | 1 | 8 | 32 | 1.2 | UltraSPARC T1 | 10,000 | Solaris 10 11/06 |
| 3 x HP DL380 G4 | 2 | 4 | 4 | 2.8 | Intel Xeon | 5,800 | OEL |
| 1 x IBM x3755 | 4 | 8 | 8 | 2.8 | AMD Opteron | 4,000 | RHEL4 |
Before you go, do not forget to check the best practices for configuring / deploying Oracle Business Intelligence on top of Solaris 10 running on Sun CMT hardware.
Related Blog Posts:
T5440 Rocks [again] with Oracle Business Intelligence Enterprise Edition Workload
Posted at 04:20AM Oct 10, 2009 by Giri Mandalika in Benchmarks | Comments[0]
T5440 Rocks [again] with Oracle Business Intelligence Enterprise Edition Workload
A while ago, I blogged about how we scaled Siebel 8.0 up to 14,000 concurrent users by consolidating the entire Siebel stack on a single Sun SPARC® Enterprise T5440 server with 4 x 1.4 GHz eight-core UltraSPARC® T2 Plus Processors. OLTP workload was used in that performance benchmark effort.
We repeated a similar effort by collaborating with Oracle Corporation, but with an OLAP workload this time around. Today Sun and Oracle announced the 28,000 user Oracle Business Intelligence Enterprise Edition (OBIEE) 10.1.3.4 benchmark results on a single Sun SPARC Enterprise T5440 server with 4 x 1.6 GHz eight-core UltraSPARC T2 Plus Processors running Solaris 10 5/09 operating system. An Oracle white paper with Sun's 28,000 user benchmark results is available on Oracle's benchmark web site.
Some of the notes and key takeaways from this benchmark are as follows:
-
Key specifications for the Sun SPARC Enterprise T5440 system under test are: 4 x UltraSPARC T2 Plus processors, 32 cores, 256 compute threads and 128 GB of memory in a 4RU space.
-
The entire OBIEE solution was deployed on a single Sun SPARC Enterprise T5440 server using Oracle BI Cluster software.
-
The BI Cluster was configured with 4 x BI nodes. Each of those BI nodes was configured to run inside a Solaris Container.
-
Each Solaris Container was configured with one physical processor (that is, 8 cores or 64 virtual cpus), and 32 GB physical memory.
-
Each BI node was configured to run BI Server, Presentation Server and OC4J Web Server
-
Two of the BI nodes have the BI Cluster Controller running (primary & secondary)
-
One out of four Containers was sharing CPU and memory resources with Oracle 11g RDBMS and the host operating system that are running in the global zone
-
Caching was turned ON at the application server, which led to minimal database activity on the server.
In other words, one can use these results only to size the hardware requirements for a complete BI EE deployment excluding the database server.
All the OBIEE benchmark results published so far are with the caching turned ON. This fact was not explicitly mentioned in some of the benchmark results white papers. Check the competitive Landscape for the pointers to different benchmark results published by different vendors.
-
From our experiments with the OBIEE benchmark workload, it appears that a BI deployment with a single non-cluster BI node could reasonably scale well up to 7,500 active users on a T5440 server. To scale beyond 7,500 concurrent users, you might need another instance of BI. Of course, your mileage may vary.
-
BI EE exhibited excellent horizontal scalability when multiple BI nodes were clustered using BI Cluster software. Four BI nodes in the Cluster were able to handle 28,000 concurrent users with minimal impact on the overall average transaction response times.
It appeared as though we could simply add more BI nodes to the BI Cluster to cope with an increase in the user base. However, due to limited hardware resources, we could not try running beyond 4 nodes in the BI Cluster. As of today, the theoretical limit for the number of BI nodes in a Cluster is 16.
-
The underlying hardware must behave well in order for the application to scale and perform well -- so, credit goes to UltraSPARC T2 Plus powered Sun SPARC Enterprise T5440 server as well. In other words, it is fair to say the combination of (T5440 + OBIEE) performs and scales well on Solaris.
-
A summary of the results with system-wide averages of CPU and memory utilization is shown below.
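| #Vusers | Clustered | #BI Nodes | #CPU | #Core | RAM | CPU utilization | Memory used | Avg Trx Response Time | #Trx/sec |
|---|---|---|---|---|---|---|---|---|---|
| 7,500 | No | 1 | 1 | 8 | 32 GB | 72.85% | 18.11 GB | 0.22 sec | 155 |
| 28,000 | Yes | 4 | 4 | 32 | 128 GB | 75.04% | 76.16 GB | 0.25 sec | 580 |
-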
Internal Solid State Drive (SSD) with ZFS file system showed significant I/O performance improvement over traditional disk for the BI catalog activity. In addition, ZFS helped get past the UFS limitation of 32,767 sub-directories in a BI catalog directory.
-
The benchmark demonstrated that 64-bit BI EE platform is immune to the 4 GB virtual memory limitation of the 32-bit BI EE platform -- hence can potentially support even more users and have larger caches as long as the hardware resources are available.
Solaris runs in 64-bit mode by default on SPARC platform. Consider running 64-bit BI EE on Solaris.
-
2,107 watts is the average power consumption when all the 28,000 concurrent users are in the steady state of the benchmark test. That is, in the case of similarly configured workloads, T5440 supports 13.2 users per watt of the power consumed; and supports 7,000 users per rack unit.
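The power and density figures for the 28,000 user run follow directly from the numbers quoted in this post (2,107 watts on average, a 4RU chassis). The sketch below only redoes that arithmetic; the helper name `density` is made up for illustration. The exact ratio is about 13.29 users per watt, so the 13.2 quoted above appears to be truncated rather than rounded.

```python
def density(users, avg_watts, rack_units):
    """Return (users per watt, users per rack unit) for one configuration."""
    return users / avg_watts, users / rack_units


# 28,000 users on one 4RU T5440 drawing 2,107 W on average (figures from the post).
USERS_PER_WATT, USERS_PER_RU = density(28_000, 2_107, 4)
print(f"{USERS_PER_WATT:.2f} users per watt, {USERS_PER_RU:.0f} users per rack unit")
```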
TOPOLOGY DIAGRAM:
A picture is worth a thousand words. The following topology diagram(s) says it all about the configuration.
1. Single Node BI Non-Cluster Configuration : 7,500 Concurrent Users

Even though the Solaris Container is shown in a cloud-like graphical form, it has nothing to do with "Cloud Computing". It is just a side effect of fancy drawing.
2. Four Node BI Cluster Configuration : 28,000 Concurrent Users

Here is a quick summary of all the results that are published by different vendors. Feel free to draw your own conclusions. All this is public information. Check the corresponding benchmark reports by clicking on the URLs under the "#Users" column.
| Server | Chips | Cores | Threads | GHz | Processor Type | #Users | OS |
|---|---|---|---|---|---|---|---|
| 1 x Sun SPARC Enterprise T5440 | 4 | 32 | 256 | 1.6 | UltraSPARC T2 Plus | 28,000 | Solaris 10 5/09 |
| 5 x Sun Fire T2000 | 1 | 8 | 32 | 1.2 | UltraSPARC T1 | 10,000 | Solaris 10 11/06 |
| 3 x HP DL380 G4 | 2 | 4 | 4 | 2.8 | Intel Xeon | 5,800 | OEL |
| 1 x IBM x3755 | 4 | 8 | 8 | 2.8 | AMD Opteron | 4,000 | RHEL4 |
CAUTION
Although T5440 possesses a ton of great qualities, it might not be suitable for deploying workloads with heavy single-threaded dependencies. The T5440 is an excellent hardware platform for multi-threaded, and moderately single-threaded/multi-process workloads. When in doubt, it is a good idea to leverage Sun Microsystems' Try & Buy program to try the workloads on the T5440 server before making the final call.
Check the second part of this blog post for the best practices for configuring / deploying Oracle Business Intelligence on top of Solaris 10 running on Sun CMT hardware.
Related Blog Posts:
- Sun T5440 Oracle BI EE World Record Performance
- World Record Performance of Sun CMT Servers
- Why does 1.6 beat 4.7?
- Siebel 8.0 on Sun SPARC Enterprise T5440 - More Bang for the Buck!!
Posted at 04:35PM Aug 17, 2009 by Giri Mandalika in Benchmarks | Comments[1]
Sun Studio: Debugging Multi-Threaded Applications with dbx
(Crossposting the three and a half year old blog entry "as is" from my other blog hosted on Blogger. It needs some serious editing, but I believe the content is still relevant. Source URL: http://technopark02.blogspot.com/2005/12/sun-studio-debugging-multi-threaded.html)
Multi-threading lets different tasks run concurrently in a single process, hence multi-threaded programs run faster [or achieve better throughput] on machines with multiple processors and on CPUs with multiple cores. On an SMP (Symmetric Multi-Processing) system, where multiple processors share a single memory system, with no CMT (Chip Multi-Threading), software threads are executed on different processors; on an SMP system with CMT, the threads are executed on the cores and logical processors of CMP (Chip Multi-Processing) processors. As revolutionary chip designs evolve, many important commercial applications like Oracle, SAP, Siebel, and PeopleSoft are designed to be multi-threaded.
Debugging a multi-threaded (MT for short) application is a bit harder than debugging a single-threaded program, where only one task runs per process at any given time, because of the number of software threads running in parallel. Thread synchronization plays an important role when concurrently running threads have to share global resources. Improperly synchronized threads may starve, or lead to deadlocks and race conditions. So it is good to have an MT-aware debugger handy during the development and support phases of the software life cycle to debug threading issues.
Fortunately on Solaris, Sun Studio's debugger, dbx, supports MT applications that are designed to use Solaris threads and/or POSIX threads. With dbx, it is possible to get information like thread state, stack traces, and locks from all threads; navigate between threads; suspend/resume threads; set breakpoints in a thread; and step through a function in a designated thread. Note that the Solaris Modular Debugger (mdb) also supports MT programs, but this blog post concentrates on Studio's dbx.
Siebel processes were used to show various dbx commands in the following examples. Siebel is a multi-threaded application, written in C/C++.
Core dump analysis
The following example shows some useful commands to get the stack trace of the thread in which the process crashed. For more information about dbx commands, type help or help <command> at the dbx prompt.
% ls -lh core
-rw------- 1 giri other 273M Dec 9 16:56 core
% file core
core: ELF 32-bit MSB core file SPARC Version 1, from 'siebprocmw'
% /opt/SS11/SUNWspro/prod/bin/dbx siebprocmw core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc
Reading siebprocmw
core file header read successfully
Reading ld.so.1
Reading libsslcwsl.so
Reading libssscsci.so
Reading libssscscf.so
...
...
Reading libsbcfui.so
Reading libsbcfuiapps.so
t@1 (l@1) terminated by signal KILL (Killed)
0xfd2bc7e0: ___nanosleep+0x0008: blu _cerror ! 0xfd2206a0
Since we don't know which thread crashed the process, let's list all known threads with the threads command. threads -all lists all threads, including zombies.
(dbx) threads
> t@1 a l@1 ?() LWP suspended in ___nanosleep()
t@2 b l@2 MwTimerThread() LWP suspended in __pollsys()
t@3 b l@3 MwAsyncSignalThread() sleep on 0xfd874078 in __lwp_park()
t@4 b l@4 MwThread() LWP suspended in __pollsys()
t@5 b l@5 MwThread() LWP suspended in __pollsys()
o t@6 b l@6 MwThread() signal SIGABRT in __lwp_kill()
t@7 b l@7 MwThread() LWP suspended in __pollsys()
t@9 b l@9 MwThread() LWP suspended in ___nanosleep()
In the above list, t@1 is the current thread, which is indicated by ">", and the start function is not known (indicated with a "?()").
(dbx) thread
current thread ($thread) is t@1
(dbx) where
current thread: t@1
=>[1] ___nanosleep(0x4, 0xffbfd9a8, 0x0, 0xff000000, 0x0, 0x0), at 0xfd2bc7e0
[2] _sleep(0x64, 0x0, 0xfd2e8bc0, 0xfd0e2000, 0xfd0e2000, 0x0), at 0xfd2afaa0
[3] thr_t::do_thr_action(0xfd86ba10, 0xc, 0x1608, 0xfd86ba20, 0x1, 0x2), at 0xfd770e14
[4] thr_t::t_sleep(0xfb80f5c0, 0x0, 0xffbfdb0e, 0xffbfdb08, 0xfd8546cc, 0xffffffff), at 0xfd770c58
[5] MwWaitForMultipleObjects(0xfb80f5c0, 0x2, 0xfb80f5c8, 0x2, 0xffffffff, 0x9cd48), at 0xfd774dd4
[6] WaitForMultipleObjectsEx(0x2, 0xffbfde3c, 0x0, 0x100000, 0x0, 0x9cd48), at 0xfd77fe9c
[7] OSDNTWait::WaitForThread(0xc, 0xffffffff, 0xffbfdecc, 0xd0108, 0x1004f, 0xff8a1d64), at 0xffa7b050
[8] OSDWaitTid(0xc, 0xffffffff, 0xffbfe7c4, 0x0, 0xc, 0xc), at 0xff05f1c4
[9] scfEventFacility::scfEventFac::ShutdownCmd(0xe14450, 0x1, 0x7, 0xfe4de0f4, 0xffbfe7c8, 0xff48f8d4), at 0xff819884
[10] scfEventFacility::scfEventFac::Shutdown(0xffbfe96c, 0xff877530, 0x0, 0x5e000, 0xff874e8c, 0x5e114), at 0xff819390
[11] ScfSisDetach(0x0, 0x0, 0x0, 0xffffffff, 0xffbfe96c, 0xfc81c), at 0xff781ed4
[12] _shutdown(0x6479c, 0x0, 0x651a8, 0x651a8, 0x7, 0x0), at 0x49c7c
[13] wmain(0x12a, 0x6479c, 0x0, 0x0, 0xffbfedac, 0x6479c), at 0x4995c
[14] main(0xfd85f310, 0xc94, 0xffbfef90, 0x54, 0xfd85f310, 0xc00), at 0x4d3cc
This is not exactly what we are looking for. The above call stack shows where the current thread (t@1) is waiting. Since our interest is to find the thread that is responsible for the process crash, we need to look for an o before the thread id. t@6 is the ill-fated thread in the list of all known threads; the process was killed because of a SIGABRT raised in __lwp_kill(). Note that the OS provides the necessary abstraction for creating and destroying threads, and also has the freedom to kill malfunctioning threads when things go haywire. In this example, __lwp_kill() was called because of an event that we are going to investigate.
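When a process has hundreds of threads, scanning the listing by eye for the o marker gets tedious. The following is a minimal sketch, assuming the `threads` output has been saved as text and follows the column layout shown in this post (optional marker, thread id, a/b, LWP id, start function, state). The names `parse_threads` and `flagged_threads` are made up for illustration; they are not part of dbx.

```python
import re

# Optional markers (">" current thread, "o" / "*" event), then the fixed columns.
THREAD_LINE = re.compile(
    r"^\s*(?P<flags>[*o>]*)\s*(?P<tid>t@\d+)\s+(?P<kind>[ab])\s+"
    r"(?P<lwp>l@\d+)\s+(?P<start>\S+)\s+(?P<state>.+?)\s*$"
)


def parse_threads(listing):
    """Turn saved `threads` output into a list of dicts, one per thread."""
    threads = []
    for line in listing.splitlines():
        match = THREAD_LINE.match(line)
        if match:
            threads.append(match.groupdict())
    return threads


def flagged_threads(threads, flag="o"):
    """Return the ids of threads whose marker column contains `flag`."""
    return [t["tid"] for t in threads if flag in t["flags"]]


# A few lines copied from the listing in this post.
LISTING = """\
> t@1 a l@1 ?() LWP suspended in ___nanosleep()
t@2 b l@2 MwTimerThread() LWP suspended in __pollsys()
t@3 b l@3 MwAsyncSignalThread() sleep on 0xfd874078 in __lwp_park()
o t@6 b l@6 MwThread() signal SIGABRT in __lwp_kill()
t@9 b l@9 MwThread() LWP suspended in ___nanosleep()
"""

THREADS = parse_threads(LISTING)
print("event threads:", flagged_threads(THREADS))
print("current thread:", flagged_threads(THREADS, ">"))
```

The `state` field keeps the rest of the line, so the same parser can be used to pull out the address a thread is sleeping on.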
thread -info <tid> command provides more information like what exactly happened in application code that triggered the forcible shutdown.
(dbx) thread -info t@6
Thread t@6 (0xfcb80c00) at priority 0
state: bound to l@6
base function: 0xfd770ff4: MwThread() stack: 0xfa380000[524288]
flags: BOUND|DETACHED|SUSPENDED
masked signals: SEGV
Currently active in __lwp_kill
Observe that the kernel trapped an illegal memory access with a SEGV signal. The default behavior for a SEGV is to shut down the process with a possible core file generation (aka core dump). Let's switch to thread t@6 with the thread <tid> command, and get to the instruction which raised the segmentation fault.
(dbx) thread t@6
t@6 (l@6) stopped in __lwp_kill at 0xfd2bd5ec
0xfd2bd5ec: __lwp_kill+0x0008: bcc,a,pt %icc,__lwp_kill+0x18 ! 0xfd2bd5fc
(dbx) thread
current thread ($thread) is t@6
(dbx) where
current thread: t@6
=>[1] __lwp_kill(0x0, 0x6, 0x0, 0x6, 0xffff0000, 0x0), at 0xfd2bd5ec
[2] raise(0x6, 0x0, 0xfd2a1af4, 0x42770, 0xfd2e4278, 0x6), at 0xfd25d884
[3] abort(0xe15220, 0x1, 0x0, 0xa6544, 0xfd2e7298, 0x0), at 0xfd23de38
[4] SehScanInvokeTryList(0x44bd308, 0x108000, 0xfd8571c4, 0x0, 0x2, 0x0), at 0xfd74c9d4
[5] Signal_Handler::raise(0xc0000005, 0xfa37cde8, 0x0, 0x2, 0xfa37cc80, 0x1800), at 0xfd74d778
[6] Raise_Exception::operator()(0x67670, 0xb, 0xfa37d0a0, 0xfa37cde8, 0xfd86a07c, 0x2c), at 0xfd74d8dc
[7] __sighndlr(0xb, 0xfa37d0a0, 0xfa37cde8, 0xfd74d7c8, 0x0, 0x1), at 0xfd2bc52c
---- called from signal handler with signal 11 (SIGSEGV) ------
[8] CSSSqlObj::GetTrxDbConn(0x458a7d8, 0x0, 0x1394478, 0x64c00, 0x0, 0x4611290), at 0xf91de72c
[9] CSSSqlObj::Execute(0x4611290, 0x0, 0x0, 0x0, 0x0, 0xfe4dd294), at 0xf91c7b98
[10] CSSBusComp::SqlExecute(0x4606640, 0x0, 0x0, 0x0, 0x1, 0x4b22e84), at 0xf9a9c160
[11] CSSBCBase::SqlExecute(0x4606640, 0x0, 0xfa37d6fc, 0x0, 0x1, 0xf57be3e8), at 0xf56c2294
[12] CSSBusComp::Execute(0x0, 0x0, 0x0, 0x0, 0x4606640, 0xfa37d7cc), at 0xf9a6b118
[13] CSSMsgBoardMaintSvc::UpdTaskHistory(0x44b5ae0, 0xfa37df90, 0x0, 0x4567d14, 0xf8611198, 0x489cd94), at 0xf85f2d48
[14] CSSMsgBoardMaintSvc::HandleEventDataList(0x44b5ae0, 0x43a0018, 0xff486b38, 0x0, 0xfa37e0ac, 0xf8611198), at 0xf85f5afc
[15] CSSMsgBoardMaintSvc::ReadTaskHistory(0x44b5ae0, 0x43a0018, 0xf85f4e60, 0x44b5ae0, 0x43a0018, 0x1), at 0xf85f53c0
[16] scfEventFacility::scfEventFac::CallRegSub(0x2a59448, 0x4109bd8, 0x0, 0x0, 0x8, 0x2), at 0xff81ad20
[17] scfEventFacility::scfEventFac::HandleCurrProcEvents(0xe14450, 0x7530, 0xe14450, 0xff432ef0, 0xff874e8c, 0x1),
at 0xff81b19c
[18] scfEventFacility::scfEventFac::scfEventThreadMain(0x0, 0x0, 0x0, 0x7400, 0xfa37fc90, 0xd0001), at 0xff81a7dc
[19] OSDWslThreadStart(0x101d58, 0xff81a580, 0x101d58, 0x6, 0x0, 0x101d70), at 0xff05bec8
[20] _AfxThreadEntry(0xffbfde34, 0xe9568, 0x0, 0x1, 0x0, 0x17289c), at 0xfeb95730
[21] MwThread(0x1, 0x0, 0x1, 0x0, 0xfd86bed0, 0xe15220), at 0xfd771230
From the above stack trace it is clear that the binary doesn't contain the necessary debug information to show high level instructions; so, let's try to get the disassembly with the dis command.
(dbx) dis GetTrxDbConn / 50
More than one identifier 'GetTrxDbConn'.
Select one of the following:
0) Cancel
1) `libsscfdm.so`#__1cPCSSModelPhysDefMGetTrxDbConn6MpkH_pnJCSSDbConn__ [non -g, demangles to: CSSModelPhysDef::GetTrxDbConn(const unsigned short*)]
2) `libsscfdm.so`#__1cJCSSSqlObjMGetTrxDbConn6kM_pnJCSSDbConn__ [non -g, demangles to: CSSSqlObj::GetTrxDbConn()const]
> 2
0xf91de6c0: GetTrxDbConn : save %sp, -96, %sp
0xf91de6c4: GetTrxDbConn+0x0004: mov %i0, %i5
0xf91de6c8: GetTrxDbConn+0x0008: ld [%i0 + 388], %i0
0xf91de6cc: GetTrxDbConn+0x000c: cmp %i0, 0
0xf91de6d0: GetTrxDbConn+0x0010: be,pn %icc,GetTrxDbConn+0x60 ! 0xf91de720
0xf91de6d4: GetTrxDbConn+0x0014: sethi %hi(0x5b400), %l6
0xf91de6d8: GetTrxDbConn+0x0018: call GetTrxDbConn+0x20 ! 0xf91de6e0
0xf91de6dc: GetTrxDbConn+0x001c: mov %o7, %o7
0xf91de6e0: GetTrxDbConn+0x0020: sethi %hi(0x2d1400), %o5
0xf91de6e4: GetTrxDbConn+0x0024: xor %l6, 88, %l4
0xf91de6e8: GetTrxDbConn+0x0028: inc 420, %o5
0xf91de6ec: GetTrxDbConn+0x002c: sethi %hi(0x1000), %l5
0xf91de6f0: GetTrxDbConn+0x0030: add %o5, %o7, %l3
0xf91de6f4: GetTrxDbConn+0x0034: add %l5, 868, %l1
0xf91de6f8: GetTrxDbConn+0x0038: add %l3, %l4, %l2
0xf91de6fc: GetTrxDbConn+0x003c: ld [%l2], %l0
0xf91de700: GetTrxDbConn+0x0040: ld [%l0 + %l1], %o4
0xf91de704: GetTrxDbConn+0x0044: cmp %o4, 0
0xf91de708: GetTrxDbConn+0x0048: be,a,pn %icc,GetTrxDbConn+0x68 ! 0xf91de728
0xf91de70c: GetTrxDbConn+0x004c: ld [%i5 + 128], %i2
0xf91de710: GetTrxDbConn+0x0050: ld [%o4 + 88], %l7
0xf91de714: GetTrxDbConn+0x0054: cmp %i5, %l7
0xf91de718: GetTrxDbConn+0x0058: bne,a,pn %icc,GetTrxDbConn+0x68 ! 0xf91de728
0xf91de71c: GetTrxDbConn+0x005c: ld [%i5 + 128], %i2
0xf91de720: GetTrxDbConn+0x0060: ret
0xf91de724: GetTrxDbConn+0x0064: restore %g0, 0, %o0
0xf91de728: GetTrxDbConn+0x0068: ld [%i2 + 188], %i1
0xf91de72c: GetTrxDbConn+0x006c: ld [%i1 - 16], %i3
0xf91de730: GetTrxDbConn+0x0070: cmp %i3, 0
0xf91de734: GetTrxDbConn+0x0074: bge,pn %icc,GetTrxDbConn+0x90 ! 0xf91de750
0xf91de738: GetTrxDbConn+0x0078: add %i2, 188, %i4
0xf91de73c: GetTrxDbConn+0x007c: clr %o0
0xf91de740: GetTrxDbConn+0x0080: call RequiredConditionIsFalse [PLT] ! 0xf94b0684
0xf91de744: GetTrxDbConn+0x0084: mov 84, %o1
0xf91de748: GetTrxDbConn+0x0088: ld [%i4], %i1
0xf91de74c: GetTrxDbConn+0x008c: ld [%i5 + 388], %i0
0xf91de750: GetTrxDbConn+0x0090: call GetTrxDbConn ! 0xf90e0e00
0xf91de754: GetTrxDbConn+0x0094: restore %g0, 0, %g0
0xf91de758: GetTrxDbConn+0x0098: unimp 0x0
...
...
To see the actual C++ instruction which seg faulted, compile the binary with -g (debug) option, and reproduce the crash. If the source code is readable from the location where you run the dbx session, you will see the actual high level instructions.
Some fun with an active process
The objective of this section is to show how to use some of the dbx commands to get some useful information, from a running MT process.
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
2754 giri 399M 302M sleep 59 0 0:00:34 2.0% siebmtshmw/21
% /opt/SS11/SUNWspro/prod/bin/dbx - 2754
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.5' in your .dbxrc
Reading -
Reading ld.so.1
Reading libsslcwsl.so
Reading libssscsci.so
Reading libssscscf.so
...
...
Reading libsscasvbc.so
Reading libswcasvfr.so
Attached to process 2754 with 21 LWPs
t@1 (l@1) stopped in __pollsys at 0xfd13d1c4
0xfd13d1c4: __pollsys+0x0004: ta 8
(dbx) threads
> t@1 a l@1 ?() running in __pollsys() <- t@1 is always the default current thread under dbx
t@2 b l@2 MwTimerThread() sleep on 0xfb80f4c0 in __lwp_park()
t@3 b l@3 MwAsyncSignalThread() sleep on 0xfd774078 in __lwp_park()
t@4 b l@4 MwThread() running in __pollsys()
t@5 b l@5 MwThread() running in __pollsys()
t@6 b l@6 MwThread() sleep on 0xf9b7eb80 in __lwp_park()
t@7 b l@7 MwThread() running in __pollsys()
t@8 b l@8 MwThread() running in _so_recv()
t@9 b l@9 MwThread() sleep on 0xf927fb68 in __lwp_park()
t@10 b l@10 MwThread() sleep on 0xf877f500 in __lwp_park()
t@11 b l@11 MwThread() sleep on 0xf867fa40 in __lwp_park()
t@12 b l@12 MwThread() sleep on 0xf857fa50 in __lwp_park()
t@13 b l@13 MwThread() sleep on 0xf847fa38 in __lwp_park()
t@14 b l@14 MwThread() running in __pollsys()
t@15 b l@15 MwThread() sleep on 0xf827f490 in __lwp_park()
t@16 b l@16 MwThread() running in __pollsys()
t@17 b l@17 MwThread() sleep on 0xf807f490 in __lwp_park()
t@18 b l@18 MwThread() running in __pollsys()
t@19 b l@19 MwThread() sleep on 0xf4c7f490 in __lwp_park()
t@20 b l@20 MwThread() running in __pollsys()
t@21 b l@21 MwThread() sleep on 0xf4a7f490 in __lwp_park()
Put a breakpoint in thread 21 (t@21) for all calls to memcpy():
(dbx) stop in memcpy -thread t@21
More than one identifier 'memcpy'.
Select one of the following:
0) Cancel
1) `libc.so.1`memcpy
2) `libc_psr.so.1`memcpy
a) All
> a
dbx: warning: 'memcpy' has no debugger info -- will trigger on first instruction
dbx: warning: 'memcpy' has no debugger info -- will trigger on first instruction
Will create handlers for all 2 hits
(2) stop in _private_memcpy -thread t@21 <- implicit break point set by dbx
(3) stop in _memcpy -thread t@21 <- implicit break point
(dbx) cont
t@21 (l@21) stopped in _memcpy at 0xfe1f04c0
0xfe1f04c0: _memcpy : nop
Note that dbx is synchronous -- when any thread or lightweight process (LWP) stops, all other threads and LWPs stop as well.
(dbx) thread
current thread ($thread) is t@21
(dbx) where
current thread: t@21
=>[1] _memcpy(0x5080e14, 0xff406b38, 0x2, 0x36, 0x1, 0x6c), at 0xfe1f04c0
[2] SSstring::GetWriteBuffer(0xf4a7e6ac, 0xff406b28, 0xff874e8c, 0x32, 0x0, 0xff3b2ef0), at 0xff31ffcc
[3] sciProcState::sciBlock::FormatLatchName(0xf4a7e6ac, 0x1, 0x7, 0x853c, 0xffa30bd8, 0x8400), at 0xffa02744
[4] sciProcState::sciProcState(0x5ad31f8, 0xf9fc0000, 0xf4a7e644, 0xff406b3c, 0x0, 0x0), at 0xffa012c4
[5] sciProcState::GetSciProcState(0xf4a7e7f8, 0x26fcb8, 0x5ad31f8, 0xff88db30, 0x5f5e4, 0x61e6c90), at 0xffa014f0
[6] SciCheckShutdown(0xf4a7e8cc, 0x34151f8, 0x74, 0x26fcb8, 0x0, 0x2ef798), at 0xff9fe0e4
[7] SciGetInterrupt(0x0, 0x6a20950, 0x0, 0xf4a7e864, 0x25cd94, 0x1da84), at 0xff9fde40
[8] _smiMessageQ::ProcessMessage(0x15f85c0, 0x6a20950, 0x0, 0x0, 0x24a360, 0x32e18f0), at 0x2158e4
[9] _smiMessageQ::ProcessRequest(0x3380c48, 0x6a20950, 0x191, 0x2, 0x5ae22f0, 0x15f85c0), at 0x21461c
[10] _smiWorkQueue::ProcessWorkItem(0x15f98b8, 0x3380c48, 0x6a20950, 0x5ae2390, 0x0, 0x101f180), at 0x208d08
[11] _smiWorkQueue::WorkerTask(0x15f98b8, 0x5b7f6b8, 0x3326338, 0x1500e0, 0x0, 0x0), at 0x208764
[12] SmiThrdEntryFunc(0x32f72d8, 0x70000f, 0x700010, 0x0, 0x0, 0x0), at 0x1f7a0c
[13] OSDWslThreadStart(0x3380568, 0x1f75a0, 0x3380568, 0x15, 0x0, 0x3380760), at 0xfefdbec8
[14] _AfxThreadEntry(0xf4b7de5c, 0x3386210, 0x0, 0x1, 0x0, 0x17289c), at 0xfeb95730
[15] MwThread(0x1, 0x0, 0x1, 0x0, 0xfd76bed0, 0x33cdc40), at 0xfd671230
Let's step into memcpy() with stepi, and observe how the thread state changes.
(dbx) stepi
t@21 (l@21) stopped in _memcpy at 0xfe1f04c4
0xfe1f04c4: _memcpy+0x0004: nop
(dbx) threads
t@1 a l@1 ?() running in __pollsys()
t@2 b l@2 MwTimerThread() sleep on 0xfb80f4c0 in __lwp_park()
t@3 b l@3 MwAsyncSignalThread() sleep on 0xfd774078 in __lwp_park()
t@4 b l@4 MwThread() running in __pollsys()
t@5 b l@5 MwThread() running in __pollsys()
t@6 b l@6 MwThread() sleep on 0xf9b7eb80 in __lwp_park()
t@7 b l@7 MwThread() running in __pollsys()
t@8 b l@8 MwThread() running in _so_recv()
t@9 b l@9 MwThread() sleep on 0xf927fb68 in __lwp_park()
t@10 b l@10 MwThread() sleep on 0xf877f500 in __lwp_park()
t@11 b l@11 MwThread() sleep on 0xf867fa40 in __lwp_park()
t@12 b l@12 MwThread() sleep on 0xf857fa50 in __lwp_park()
t@13 b l@13 MwThread() sleep on 0xf847fa38 in __lwp_park()
t@14 b l@14 MwThread() running in __pollsys()
t@15 b l@15 MwThread() sleep on 0xf827f490 in __lwp_park()
t@16 b l@16 MwThread() running in __pollsys()
o t@17 b l@17 MwThread() breakpoint in _memcpy()
o t@18 b l@18 MwThread() breakpoint in _memcpy()
o t@19 b l@19 MwThread() breakpoint in _memcpy()
t@20 b l@20 MwThread() running in __pollsys()
*> t@21 b l@21 MwThread() single stepped in _memcpy()
In the above example, t@17, t@18 and t@19 are stopped at calls to memcpy(); and t@21 stepped into memcpy(). Get out of memcpy() with the step up command.
(dbx) step up
_memcpy returns 84413972
t@21 (l@21) stopped in SSstring::GetWriteBuffer at 0xff31ffd4
0xff31ffd4: GetWriteBuffer+0x0114: ld [%i1 + 4], %i2
Clear the breakpoint (in the current thread) with the clear command.
(dbx) cont
t@21 (l@21) stopped in _memcpy at 0xfe1f04c0
0xfe1f04c0: _memcpy : nop
(dbx) clear
cleared (3) stop in _memcpy -thread t@21
Locks
thread -blocks [<tid>] lists all locks held by the given thread that are blocking other threads. If tid is not specified, dbx lists the locks held by the current thread. In the following example, t@21 (the current thread) is not holding any locks.
(dbx) thread -blocks
Locks held by t@21:
thread -blockedby [<tid>] shows the synchronization object (monitor) on which the given thread is blocked. If tid is not specified, dbx shows this information for the current thread. Note that only sleeping threads can be in the blocked state; for a thread that is not sleeping, such as t@17 below, dbx reports that the thread is not asleep.
(dbx) thread -blockedby t@10
Thread t@10 is blocked by:
0xf877f500 (0xf877f500): thread condition variable
(dbx) thread -blockedby t@12
Thread t@12 is blocked by:
0xf857fa50 (0xf857fa50): thread condition variable
(dbx) thread -blockedby t@17
Thread t@17 is not asleep
The syncs command lists all synchronization objects, i.e., locks and monitors.
(dbx) syncs
All locks currently known to libthread:
0x01020320 (0x01020320): thread mutex(unlocked)
0x010203f8 (0x010203f8): thread mutex(unlocked)
0xf827f490 (0xf827f490): thread condition variable
0xf827f4a0 (0xf827f4a0): thread mutex(unlocked)
0xf877f500 (0xf877f500): thread condition variable
0xf877f510 (0xf877f510): thread mutex(unlocked)
0xf927fb68 (0xf927fb68): thread condition variable
0xf927fb78 (0xf927fb78): thread mutex(unlocked)
0xf867fa40 (0xf867fa40): thread condition variable
0xf867fa50 (0xf867fa50): thread mutex(unlocked)
0xf9b7eb80 (0xf9b7eb80): thread condition variable
0xf9b7eb90 (0xf9b7eb90): thread mutex(unlocked)
0x015c2ed8 (0x015c2ed8): thread mutex(unlocked)
0x015c2f38 (0x015c2f38): thread mutex(unlocked)
0x015c2f18 (0x015c2f18): thread mutex(unlocked)
0x015c2dd8 (0x015c2dd8): thread mutex(unlocked)
0x015c34d8 (0x015c34d8): thread mutex(unlocked)
0x03325fb8 (0x03325fb8): thread mutex(unlocked)
0x033264b8 (0x033264b8): thread mutex(unlocked)
0x033261b8 (0x033261b8): thread mutex(unlocked)
0x017a6ce8 (0x017a6ce8): thread mutex(locked)
0xfa4f4314 (0xfa4f4314): process mutex(locked)
0x0332c438 (0x0332c438): thread mutex(unlocked)
0x0332c348 (0x0332c348): thread mutex(unlocked)
0x02fcd7e8 (0x02fcd7e8): thread mutex(unlocked)
0x0028f860 (0x0028f860): thread mutex(unlocked)
__1cUCSSSISLocalTransSrvrKs_instLock_+0x8 (0xff1ee220): thread mutex(unlocked)
0x034150e8 (0x034150e8): thread mutex(unlocked)
0x034151d8 (0x034151d8): thread mutex(unlocked)
__uberdata+0x80 (0xfd168c40): thread mutex(unlocked)
0x01878b98 (0x01878b98): thread mutex(unlocked)
0x01878aa8 (0x01878aa8): thread mutex(unlocked)
0xfa4c7e9c (0xfa4c7e9c): process mutex(unlocked)
libc_malloc_lock (0xfd1676f8): thread mutex(unlocked)
0x0179cb30 (0x0179cb30): thread mutex(unlocked)
0x0179c830 (0x0179c830): thread mutex(unlocked)
0xfa5c2664 (0xfa5c2664): process mutex(unlocked)
0xfa5c2c94 (0xfa5c2c94): process mutex(unlocked)
0x0161dd90 (0x0161dd90): thread mutex(unlocked)
0x0101f6e0 (0x0101f6e0): thread mutex(unlocked)
0x0101f718 (0x0101f718): thread mutex(unlocked)
0x0101f770 (0x0101f770): thread mutex(unlocked)
0x0101f508 (0x0101f508): thread mutex(locked)
0x0101f5a8 (0x0101f5a8): thread mutex(unlocked)
0x015bfe90 (0x015bfe90): thread mutex(unlocked)
0x015bfe20 (0x015bfe20): thread mutex(unlocked)
0x015bfe58 (0x015bfe58): thread mutex(unlocked)
To get information about a synchronization object at a given address, use sync -info <address>.
(dbx) sync -info 0x0028f860
0x0028f860 (0x28f860): thread mutex(unlocked)
Lock is unowned
No threads are blocked by this lock
(dbx) sync -info 0xf877f500
0xf877f500 (0xf877f500): thread condition variable
(dbx) sync -info 0xfd1676f8
libc_malloc_lock (0xfd1676f8): thread mutex(unlocked)
Lock is unowned
No threads are blocked by this lock
Tracing
The trace command can be used to trace executed source lines, function calls, or variable changes. The following example traces thread creation and prints a message whenever a thread gets created.
(dbx) trace thr_create
(4) trace thr_create
(dbx) cont
trace: thread created t@22 on l@22
trace: thread created t@23 on l@23
Reading libsrlcver.so
Reading libsscafsbc.so
...
(dbx) threads
*> t@1 a l@1 ?() signal SIGINT in __pollsys()
t@2 b l@2 MwTimerThread() sleep on 0xfb80f4c0 in __lwp_park()
t@3 b l@3 MwAsyncSignalThread() sleep on 0xfd774078 in __lwp_park()
...
...
t@20 b l@20 MwThread() running in __pollsys()
t@21 b l@21 MwThread() sleep on 0xf4a7f490 in __lwp_park()
t@22 b l@22 MwThread() running in __pollsys() <- new thread
t@23 b l@23 MwThread() sleep on 0xea6ff490 in __lwp_park() <- new thread
In the above example, there is no information about which thread created t@22 and t@23. To get that information as well, use the when command as shown below:
(dbx) when thr_create { echo "New thread $newthread was created by thread $thread"; }
(6) when thr_create { kprint "New thread ${newthread} was created by thread ${thread}"; }
(dbx) cont
New thread t@24 was created by thread t@10
New thread t@25 was created by thread t@24
$newthread and $thread are predefined dbx variables that hold the thread ID of the newly created thread and the thread ID of the current thread, respectively.
Similarly thread exits can be traced as follows:
(dbx) trace thr_exit
(5) trace thr_exit
(dbx) cont
New thread t@26 was created by thread t@10
New thread t@27 was created by thread t@26
trace: thr_exit t@27
Suspending/Resuming threads
To suspend the execution of a thread, run thread -suspend <tid>; to resume the suspended thread, run thread -resume <tid>.
(dbx) thread -suspend t@26
Thread t@26 suspended
(dbx) thread -resume t@26
Thread t@26 unsuspended
Breakpoint with the stop command
The following example shows how to set a breakpoint that stops execution when a new thread with the ID t@34 gets created.
(dbx) stop thr_create t@34
(9) stop thr_create t@34
(dbx) cont
t@10 (l@10) stopped in tdb_event_create at 0xfd1377e8
0xfd1377e8: tdb_event_create : retl
trace: thread created t@34 on l@34
(dbx) where <- who initiated the new thread creation? entire call stack
current thread: t@10
=>[1] tdb_event_create(0x2, 0x1084, 0x3ff, 0x0, 0xfc8e1c00, 0x1000), at 0xfd1377e8
[2] _thrp_create(0x180, 0x10f8, 0xfd1377e8, 0x1e, 0xc1, 0xfde32000), at 0xfd138c04
[3] _pthread_create(0xf877f310, 0x0, 0xfd670ff4, 0xf877f318, 0x0, 0xfd168bc0), at 0xfd12d104
[4] MwCreateThread(0x0, 0xfeb95630, 0xf877f414, 0x4, 0x0, 0x9383cb0), at 0xfd671460
[5] CreateThread(0x0, 0x0, 0xfeb95630, 0xf877f414, 0x4, 0x9383cb0), at 0xfd67d124
[6] CWinThread::CreateThread(0x9383c80, 0x4, 0x0, 0x0, 0xfd164278, 0x88cabc9), at 0xfeb95f1c
[7] AfxBeginThread(0xffa7a420, 0x88cabc0, 0x0, 0x0, 0x4, 0x0), at 0xfeb958a4
[8] WslCreateThread(0xfefdbe00, 0x5c135c0, 0x0, 0x88cabc0, 0xf877f584, 0x16b8c), at 0xffa7a4cc
[9] OSDCreateThread(0x211200, 0x5b40660, 0x0, 0x0, 0x5ab1590, 0x5c135c0), at 0xfefdc16c
[10] SmiDispatchThrdMain(0x101f180, 0x5ab1588, 0x5ab1590, 0xf877fd64, 0xf877fcec, 0xff40f8d4), at 0x1f53f4
[11] OSDWslThreadStart(0x10b8ad0, 0x1f5240, 0x10b8ad0, 0xa, 0x0, 0x15d07e8), at 0xfefdbec8
[12] _AfxThreadEntry(0xffbfeaac, 0x2f4948, 0x0, 0x1, 0x0, 0x17289c), at 0xfeb95730
[13] MwThread(0x1, 0x0, 0x1, 0x0, 0xfd76bed0, 0x15cd558), at 0xfd671230
Light Weight Processes (LWPs)
Application (user) threads are not visible to the kernel. The kernel treats light weight processes (LWPs) as the only schedulable entities within a process. LWPs bridge user-level and kernel-level threads. Each process contains one or more LWPs, and each LWP is associated with a kernel thread. Prior to Solaris 9, user-level threads were multiplexed onto LWPs by default, so each LWP could run one or more user-level threads (the MxN model). From Solaris 9 onwards, there is one LWP for every user-level thread (the 1x1 model).
Use the lwps command to list all LWPs in the process.
(dbx) lwps
l@1 running in _private_mprotect()
l@2 running in __lwp_park()
l@3 running in __lwp_park()
l@4 running in __pollsys()
l@5 running in __pollsys()
l@6 running in __lwp_park()
l@7 running in __pollsys()
l@8 running in _so_recv()
l@9 running in __lwp_park()
l@10 running in __lwp_park()
l@11 running in __lwp_park()
l@12 running in __lwp_park()
l@13 running in __time()
l@14 running in __pollsys()
l@15 running in __lwp_park()
l@16 running in __pollsys()
o l@17 breakpoint in SSstring::GetWriteBuffer()
l@18 running in __lwp_unpark()
o l@19 breakpoint in SSstring::GetWriteBuffer()
l@20 running in __pollsys()
*>l@21 breakpoint in SSstring::GetWriteBuffer()
The lwp command displays the current LWP. To switch to a different LWP, use lwp <lwpid>. lwp -info [<lwpid>] shows some useful information about the given LWP, or about the current LWP if no lwpid is specified.
(dbx) lwp
current LWP ($lwp) is l@21
(dbx) lwp -info
l@21 breakpoint in SSstring::GetWriteBuffer()
masked signals are:
(dbx) lwp -info l@12
l@12 running in __lwp_park()
masked signals are:
(dbx) lwp l@18
t@18 (l@18) stopped in __pollsys at 0xfd13d1c4
0xfd13d1c4: __pollsys+0x0004: ta 8
Scalability issues
In general, MT applications that make heavy use of the standard Solaris memory allocator may exhibit poor scalability. The problem occurs when multiple threads are in malloc() or free(), waiting to obtain the allocator's single lock (libc_malloc_lock in the syncs output shown earlier).
If the application suffers from this scalability issue, the tops of the thread stacks (which can be obtained with either dbx or the pstack command) will look like the following:
lwp_park
mutex_lock_queue
slow_lock
free
or
lwp_park
mutex_lock_queue
slow_lock
malloc
One such problem was described in the Solaris forum thread "slow_lock making application hang".
MT aware memory allocators
The MT-aware memory allocator libraries shipped with Solaris, mtmalloc and umem, address this kind of scalability problem. libmtmalloc was introduced in Solaris 7, and libumem was introduced in Solaris 9 Update 3. These userland memory allocators are packaged as drop-in replacements for the standard malloc() and free() library calls, so to take advantage of them, link the MT application with either library.
The mtmalloc and umem allocators are redesigns of the standard allocator that use finer-grained locking. They typically outperform the standard allocator significantly when multiple threads make concurrent requests to the memory allocator. For a single-threaded application, however, the standard memory allocator provides better performance and a smaller memory footprint. The trade-off with mtmalloc and umem is a much bigger memory footprint, due to the way they allocate memory. For these reasons, the standard memory allocator may be preferable in cases where the advantages of mtmalloc and umem do not apply. Make sure to experiment with these allocators to see which one fits your application best.
Linking with mtmalloc or umem
At compile time, the application can be linked against the mtmalloc or umem library. Adding the -lmtmalloc or -lumem option to the link line links the application appropriately.
% cc -mt -o my_program my_program.c -lmtmalloc
or
% cc -mt -o my_program my_program.c -lumem
You can check the library dependency with ldd my_program.
If rebuilding the application to link with mtmalloc or umem is not feasible, either library can be preloaded with the LD_PRELOAD environment variable when the program is executed.
% setenv LD_PRELOAD libmtmalloc.so
% ./my_program
or
% setenv LD_PRELOAD libumem.so
% ./my_program
You can verify whether the library is preloaded, with pldd `pgrep my_program`.
Posted at 01:46AM Jun 23, 2009 by Giri Mandalika in Benchmarks | Comments[0]
PeopleSoft HRMS 8.9 Self-Service Benchmark on M3000 & T5120 Servers
Sun published the PeopleSoft HRMS 8.9 Self-Service benchmark results today. The benchmark was conducted on 3 x Sun SPARC Enterprise M3000 and 1 x Sun SPARC Enterprise T5120 servers. Click on the following link for the full report with the benchmark results.
Admittedly, this is Sun's first PeopleSoft benchmark after a hiatus of over five years. However, I am glad that we came up with a very nice, cost-effective solution in our comeback to PeopleSoft application benchmarking.
Some of the notes and highlights from this competitive benchmark are as follows.
-
The benchmark measured the average search and save transaction response times at a peak load of 4,000 concurrent users.
-
4,000 users is the limit of the benchmark kit, and all vendors using this kit are bound by it. This makes the results easy to compare, because the throughput achieved by each vendor is the same. When comparing benchmark results from workloads like these, lower average transaction response times, lower CPU and memory utilization, and less hardware in use usually indicate better performance.
-
IBM and Sun are the only vendors who have published benchmark results with the PeopleSoft HRMS 8.9 Self-Service benchmark kit.
-
Sun's benchmark results are superior to IBM's best published result, which was obtained on a combination of z990 2084-C24 and eServer pSeries p690 servers. While I leave the price comparisons to the reader [1], I'd like to show the performance numbers extracted from the benchmark reports published by Sun and IBM. All of the following data is available in the benchmark reports. Feel free to draw your own conclusions.
Average Transaction Response Times
Vendor | Single User Search (sec) | 4,000 Users Search (sec) | Single User Save (sec) | 4,000 Users Save (sec)
Sun | 0.78 | 0.77 | 0.71 | 0.74
IBM | 0.78 | 1.35 | 0.65 | 1.01
Average CPU Utilizations
Vendor | Web Server CPU% | App Server1 CPU% | App Server2 CPU% | DB Server CPU%
Sun | 23.10 | 66.92 | 67.85 | 27.45
IBM | 45.81 | 59.70 | N/A | 40.66
Average Memory Utilizations
Vendor | Web Server GB | App Server1 GB | App Server2 GB | DB Server GB
Sun | 4.15 | 3.67 | 3.72 | 5.54
IBM | 5.00 | 15.70 | N/A | 0.3 (Huh!?)
Hardware Configuration
Vendor: Sun Microsystems
Topology Diagram
Tier | Server Model | Server Count | Processor | Processor Speed | Processor Count | #Cores per Processor | Memory
Web | T5120 | 1 | UltraSPARC-T2 | 1.2 GHz | 1 | 4 | 8 GB
App | M3000 | 2 | SPARC64-VII | 2.52 GHz | 1 | 4 | 8 GB
DB | M3000 | 1 | SPARC64-VII | 2.52 GHz | 1 | 4 | 8 GB
2 x Sun Storage J4200 arrays were used to host the database. Total disk space: ~1.34 Terabytes. Consumed only 120 GB disk space -- 115 GB for data on one array; and 5 GB for redo logs on the other array.
Vendor: IBM
Tier | Server Model | Server Count | Processor | Processor Speed | Processor Count | #Cores per Processor | Memory
Web | p690 (7040-681) | 1 | POWER4 | 1.9 GHz | 4 | NA (?) | 12 GB
App | p690 (7040-681) | 1 | POWER4 | 1.9 GHz | 12 | NA (?) | 32 GB
DB | zSeries 990, model 2084-C24 | 1 | z990 Gen1 | ??? | 6 | NA (?) | 32 GB
1 x IBM TotalStorage DS8300 Enterprise Storage Server, 2107-922 was used to host the database. Total disk space: ~9 Terabytes.
-
The combination of Sun SPARC Enterprise M3000 and T5120 servers consumed 1030 watts on average in 7 RU of rack space while supporting 4,000 concurrent users. That is, for similarly configured workloads, the M3000/T5120 combination supports 3.88 users per watt of power consumed and 571 users per rack unit.
Just like our prior Siebel and Oracle E-Business Suite Payroll 11i benchmarks, Sun collaborated with Oracle Corporation in executing this benchmark. And we sincerely thank our peers at Oracle Corporation for all their help and support over the past few months in executing this benchmark.
I'm planning to post some tuning tips for running PeopleSoft optimally on Solaris 10. Stay tuned.
___________
[1] It is relatively hard to obtain IBM's server list prices. On the other hand, it is very easy to find the list prices of Sun servers at http://store.sun.com
Posted at 12:21PM Feb 17, 2009 by Giri Mandalika in Benchmarks | Comments[4]
Saturday Oct 10, 2009
