GIRI MANDALIKA's SCRATCHPAD
Ramifications of the Solaris 10 kernel patch 137111
Summary
A recent code change in Solaris 10 inadvertantly exposed an inherent bug in some of the 32-bit applications that rely on their own memory allocators. Due to this, some of the 3rd party applications which were working earlier without the KU 137111 may crash on Solaris 10/SPARC with the KU 137111 (any revision).
Symptoms & the Cause
It was identified that majority of such application failures are mainly due to the applications' custom memory allocator that incorrectly returns 4-byte aligned mutexes in place of the required 8-byte aligned mutexes. In Solaris, mutex_t and pthread_mutex_t structures have been defined to be aligned on an 8-byte boundary. Both of those structures contain the upad64_t member, which is a double even for the 32-bit applications. The natural alignment of a double is 8 bytes; and per the SPARC Compliance Definition 2.4, the structures must be aligned according to their strictest member. That is, applications which create 4-byte aligned mutexes are technically non-compliant on Solaris/SPARC (for the sake of simplicity, such code will be referred to as the non-complying code for the remainder of this blog entry).
Due to a change in the implementation of the userland mutexes introduced by CR 6296770 in KU 137111-01, objects of type mutex_t and pthread_mutex_t must start at 8-byte aligned addresses. If this requirement is not satisfied, all non-compliant applications on Solaris/SPARC may fail with the signal SEGV with a callstack similar to the following one or with similar callstacks containing the function mutex_trylock_process.
*_atomic_cas_64(0x141f2c, 0x0, 0xff000000, 0x1651, 0xff000000, 0x466d90) set_lock_byte64(0x0, 0x1651, 0xff000000, 0x0, 0xfec82a00, 0x0) fast_process_lock(0x141f24, 0x0, 0x1, 0x1, 0x0, 0xfeae5780) ...
Patches & the Next Steps
Note that only non-compliant 32-bit applications will be affected by the KU 137111. All other complying 32-bit applications continue to run as expected even with the KU 137111 - hence the customers, partners, ISVs and the other software vendors must understand the fact that it is not a Solaris issue. Customers running into this issue must work with the respective software vendors to obtain a patch/fix. We suggest the ISVs and the rest of the software vendors to pro-actively check their 32-bit native code for any discrepancies like the one mentioned in this blog entry.
In our testing of some of the enterprise applications, we have identified Oracle's Siebel CRM as one of the potential applications that is vulnerable to the KU 137111. It appears that IBM's Lotus Domino Server is also prone to a crash on Solaris 10 with the same kernet patch. Speaking of these two known cases, Oracle/Siebel and IBM/Lotus Domino customers (running Solaris) should approach Oracle and IBM Corporations respectively but not Sun Microsystems for a proper fix.
As it may take some time for the ISVs / software vendors to identify and fix the non-complying code in their applications, Sun is planning to provide an interim fix to the mutex byte alignment issue in the form of a Solaris kernel patch. As of this writing, we expect the fix to be integrated into the KU 137137-07. The fix is already available in the latest update of the Solaris, Solaris 10 10/08. Those who cannot upgrade to Solaris 10 10/08 from the prior versions of Solaris 10 must wait for the patch KU [Updated 12/07/08] 137137-07 137137-09.
One must note that the fix in Solaris is a tentative one that allows the non-complying code to run on SPARC hardware for the time being. There is no guarantee that the non-complying code continues to run 'as is' in the future with new Solaris kernel patches and/or major updates/releases of the Solaris operating system. So the best long term solution is for the software vendors to fix the non-compliant code before it is too late.
Acknowledgments
Steve S and Roger F of Sun Microsystems.Posted at 06:15PM Nov 01, 2008 by Giri Mandalika in Solaris | Comments[1]
Siebel on Sun CMT hardware : Best Practices
The following suggested best practices are applicable to all Siebel deployments on CMT hardware (Tx00, T5x20, T5x40) running Solaris 10 [Note: some of this tuning applies to Siebel running on conventional hardware running Solaris]. These recommendations are based on our observations from the 14,000 user benchmark on Sun SPARC Enterprise T5440. Your mileage may vary.
All Tiers- Ensure that the system's firmware is up-to-date.
- Check the Sun System Firmware Release Hub for the latest firmware.
- Upgrade to the latest update release of Solaris 10.
- Note to the customers running Siebel on Solaris 5/08: apply the kernel patch 137137-07 as soon as it is available on sunsolve.sun.com web site. Patch 137137-07 and later revisions, Solaris 10 10/08 will have the workaround to a critical Siebel specific bug. Oracle Corporation will eventually fix the bug in their codebase - in the meantime Solaris is covering for Siebel and all other 32-bit applications with their own memory allocators that return unaligned mutexes. Check the RFE 6729759 Need to accommodate non-8-byte-aligned mutexes and Oracle's Siebel support document 735451.1 Do NOT apply Kernel Patch 137111-04 on Solaris 10 for more details.
- Enable 256M large pages on all nodes. By default, the latest update of Solaris 10 will use a maximum of 4M pages even when 256M pages are a good fit.
- 256M pages can be enabled with the following
/etc/systemtunable.* 256M pages set max_uheap_lpsize=0x10000000
- Set the following in a shell or add the following lines to the shell's profile (bash/ksh).
ulimit -n 2048 export LD_PRELOAD_32=/usr/lib/extendedFILE.so.1:$LD_PRELOAD_32
Technically the file descriptor limit can be set to as high as 65536. However from the application's perspective, 2048 is a reasonable limit.
To improve the scalability of the multi-threaded workloads, preload MT-hot object-caching memory allocation library like libumem(3LIB), mtmalloc(3MALLOC).
- eg., To preload the
libumem library, set the LD_PRELOAD_32 environment variable in the shell (bash/ksh) as shown below.
export LD_PRELOAD_32=/usr/lib/libumem.so.1:$LD_PRELOAD_32
Web and the Application servers in the Siebel Enterprise stack are 32-bit. However Oracle 10g or 11g RDBMS on Solaris 10 SPARC is 64-bit. Hence the path to the libumem library in the PRELOAD statement differs slightly in the database-tier as shown below.
export LD_PRELOAD_64=/usr/lib/sparcv9/libumem.so.1:$LD_PRELOAD_64
Be aware that the trade-off is the increase in memory footprint -- you may notice 5 to 20% increase in the memory footprint with one of these MT-hot memory allocation libraries preloaded.
Application fared well with the following set of TCP/IP parameters on Solaris 10 5/08.
ndd -set /dev/tcp tcp_time_wait_interval 60000 ndd -set /dev/tcp tcp_conn_req_max_q 1024 ndd -set /dev/tcp tcp_conn_req_max_q0 4096 ndd -set /dev/tcp tcp_ip_abort_interval 60000 ndd -set /dev/tcp tcp_keepalive_interval 900000 ndd -set /dev/tcp tcp_rexmit_interval_initial 3000 ndd -set /dev/tcp tcp_rexmit_interval_max 10000 ndd -set /dev/tcp tcp_rexmit_interval_min 3000 ndd -set /dev/tcp tcp_smallest_anon_port 1024 ndd -set /dev/tcp tcp_slow_start_initial 2 ndd -set /dev/tcp tcp_xmit_hiwat 799744 ndd -set /dev/tcp tcp_recv_hiwat 799744 ndd -set /dev/tcp tcp_max_buf 8388608 ndd -set /dev/tcp tcp_cwnd_max 4194304 ndd -set /dev/tcp tcp_fin_wait_2_flush_interval 67500 ndd -set /dev/udp udp_xmit_hiwat 799744 ndd -set /dev/udp udp_recv_hiwat 799744 ndd -set /dev/udp udp_max_buf 8388608
Siebel Application Tier
- All T-series systems (T1000/T2000, T5120/T5220, T5120/T5240, T5440) support the 256M page size. However Siebel's
siebmtshwscript restricts the page size to 4M. Comment out the following lines in$SIEBEL_HOME/siebsrvr/bin/siebmtshw.# This will set 4M page size for Heap and 64 KB for stack MPSSHEAP=4M MPSSSTACK=64K MPSSERRFILE=/tmp/mpsserr LD_PRELOAD=/usr/lib/mpss.so.1 export MPSSHEAP MPSSSTACK MPSSERRFILE LD_PRELOAD
- Configure the Object Managers in such a way that each OM will be handling at least 200 active users. Siebel's standard recommendation of 100 or less users per Object Manager is suitable for conventional systems but not ideal for CMT systems like Tx000, T5x20, T5x40, T5440. Sun's CMT systems are ideal for running multi-threaded processes with tons of LWPs per process. Besides, there will be significant improvement in the overall memory footprint with less number of Siebel Object Managers.
- Oracle 10g R2 10.2.0.4 32-bit client is supposed to have a fix for the process crash issue - however it wasn't verified in our test environment.
Siebel Database Tier
- Eliminate double buffering by forcing the file system to use direct I/O.
Oracle database caches the data in its own cache within the shared global area (SGA) known as the database block buffer cache. Database reads and writes are cached in block buffer cache so the subsequent accesses for the same blocks do not need to re-read the data from the operating system. On the other hand, file systems on Solaris default to reading the data though the global file system cache for improved I/O operations. That is, by default each read is cached potentially twice - one copy in the operating system's file system cache, and the other copy in Oracle's block buffer cache. In addition to double caching, there is also some extra CPU overhead for the code which manages the operating system's file system cache. The solution is to eliminate double caching by forcing the file system to bypass the OS file system cache when reading and writing to the disk.
- In the 14,000 user benchmark setup, the UFS file systems (holding the data files and the redo logs) were mounted with the
forcedirectiooption.eg.,mount -o forcedirectio /dev/dsk/<partition> <mountpoint>
- In the 14,0000 user benchmark setup, there are two Sun StorateTek 2540 arrays connected to the T5440 - one array was holding the data files, where as the other was holding the Oracle redo log files.
- In the 14,0000 user benchmark setup, two online redo logs were configured each with 10 GB disk space. When all 14,000 concurrent users are on-line, there is only one log switch in a 60 minute period.
Oracle Database Initialization Parameters
DB_FILE_MULTIBLOCK_READ_COUNT to appropriate value. DB_FILE_MULTIBLOCK_READ_COUNT parameter specifies the maximum number of blocks read in one I/O operation during a sequential scan.
- In the 14,0000 user benchmark configuration,
DB_BLOCK_SIZE was set to 8 kB. During the benchmark run, the average reads are around 18.5 kB per second. Hence setting DB_FILE_MULTIBLOCK_READ_COUNT to a high value does not necessarily improve the I/O performance. A value of 8 for the database init parameter DB_FILE_MULTIBLOCK_READ_COUNT seems to perform better.CPU_COUNT to 64. Otherwise, by default Oracle RDBMS assumes 128 and 256 for the CPU_COUNT on T5240 and T5440 respectively. Oracle's optimizer might use a completely different execution plan when it notices such a large number for the CPU_COUNT; and the resulting execution plan need not necessarily be an optimal one. In the 14,000 user benchmark, setting CPU_COUNT to 64 produced optimal execution plans.
_enable_NUMA_optimization to FALSE. On these multi-socket servers, _enable_NUMA_optimization will be set to TRUE by default. During the 14,000 user benchmark run, we noticed intermittent shadow process crashes with the default behavior. We didn't realize any additional gains either with the default NUMA optimizations.
Siebel Web Tier
- Upgrade to the latest service pack of Sun Java Web Server 6.1 (32-bit).
- Run the Sun Java Web Server in multi-process mode by setting the
MaxProcsdirective inmagnus.confto a value that is greater than 1. In the multi-process mode, the web server can handle requests using multiple processes with multiple threads in each process.- When you specify a value greater than 1 for the
MaxProcs, the web server relies on the operating system to distribute connections among/between multiple web server processes. However many modern operating systems including Solaris do not distribute connections evenly, particularly when there are a small number of concurrent connections. - Tune the maximum simultaneous requests by setting the
RqThrottleparameter in themagnus.conffile to appropriate value. A value of 1024 was used in the 14,000 user benchmark.
Posted at 07:30AM Oct 13, 2008 by Giri Mandalika in Solaris | Comments[1]
Solaris/SPARC: Oracle 11gR1 client for Siebel 8.0
First things first - Oracle 11g Release 1 for Solaris/SPARC is available now; and can be downloaded from here.
In some Siebel 8.0 environments where Oracle database is being used, customers might be noticing intermittent Siebel object manager crashes under high loads when the work is actively being done by tons of LWPs with less number of object managers. Usually the call stack might look something like:
/export/home/siebel/siebsrvr/lib/libsslcosd.so:0x4ad24
/lib/libc.so.1:0xc5924
/lib/libc.so.1:0xba868
/export/home/oracle/lib32/libclntsh.so.10.1:kpuhhfre+0x8d0 [ Signal 11 (SEGV)]
/export/home/oracle/lib32/libclntsh.so.10.1:kpufhndl0+0x35ac
/export/home/siebel/siebsrvr/lib/libsscdo90.so:0x3140c
/export/home/siebel/siebsrvr/lib/libsscfdm.so:0x677944
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cQCSSLockSqlCursorGDelete6M_v_+0x264
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cJCSSDbConnOCacheSqlCursor6MpnMCSSSqlCursor__I_+0x734
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cJCSSDbConnRTryCacheSqlCursor6MpnMCSSSqlCursor_b_I_+0xcc
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cJCSSSqlObjOCacheSqlCursor6MpnMCSSSqlCursor_b_I_+0x6e4
/export/home/siebel/siebsrvr/lib/libsscfdm.so:__1cJCSSSqlObjHRelease6Mb_v_+0x4d0
/export/home/siebel/siebsrvr/lib/libsscfom.so:__1cKCSSBusComp2T5B6M_v_+0x790
/export/home/siebel/siebsrvr/lib/libsscacmbc.so:__1cJCSSBCBase2T5B6M_v_+0x640
/export/home/siebel/siebsrvr/lib/libsscasabc.so:0x19275c
/export/home/siebel/siebsrvr/lib/libsscfom.so:0x318774
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:__1cWCSSSWEFrameMgrInternalPReleaseBOScreen6M_v_+0x10c
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cWCSSSWEFrameMgrInternalJBuildView6MpkHnLBUILDSCREEN_pnSCSSSWEViewBookmark_ipnJCSSBusObj_pnPCSSDrilldownDef_p1pI_I_+0x18b0
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cWCSSSWEFrameMgrInternalJBuildView6MrpnKCSSSWEView_pkHnLBUILDSCREEN_pnSCSSSWEViewBookmark_ipnJCSSBusObj_pnPCSSDrilldownDef_p4_I_+0x50
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cPCSSSWEActionMgrUActionBuildViewAsync6MpnbACSSSWEActionBuildViewAsync_pnQCSSSWEHtmlStream_pnNWWEReqModInfo_rpnJWWECbInfo__I_+0x38c
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cPCSSSWEActionMgrODoPostedAction6MpnQCSSSWEHtmlStream_pnNWWEReqModInfo_rpnJWWECbInfo__I_+0x104
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cPCSSSWEActionMgrSCheckPostedActions6MpnQCSSSWEHtmlStream_pnKCSSSWEArgs_pnNWWEReqModInfo_rpnJWWECbInfo_ri_I_+0x378
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cWCSSSWEFrameMgrInternalSInvokeAppletMethod6MpnQCSSSWEHtmlStream_pnKCSSSWEArgs_pnNWWEReqModInfo_rpnJWWECbInfo_rnOCSSStringArray__I_+0x12f8
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cSCSSSWECmdProcessorMInvokeMethod6MpnQCSSSWEHtmlStream_pnKCSSSWEArgs_pnNWWEReqModInfo_rpnJWWECbInfo__I_+0x88c
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cSCSSSWECmdProcessorP_ProcessCommand6MpnQCSSSWEHtmlStream_pnNWWEReqModInfo_rpnJWWECbInfo__I_+0x860
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cSCSSSWECmdProcessorOProcessCommand6MpnUCSSSWEGenericRequest_pnVCSSSWEGenericResponse_rpnNWWEReqModInfo_rpnJWWECbInfo__I_+0x9bc
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:
__1cSCSSSWECmdProcessorOProcessCommand6MpnRCSSSWEHttpRequest_pnSCSSSWEHttpResponse_rpnJWWECbInfo__I_+0xd8
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:__1cSCSSServiceSWEIfaceHRequest6MpnMCSSSWEReqRec_pnRCSSSWEResponseRec__I_+0x404
/export/home/siebel/siebsrvr/lib/libsscaswbc.so:__1cSCSSServiceSWEIfaceODoInvokeMethod6MpkHrknOCCFPropertySet_r3_I_+0xa80
/export/home/siebel/siebsrvr/lib/libsscfom.so:__1cKCSSServiceMInvokeMethod6MpkHrknOCCFPropertySet_r3_I_+0x244
/export/home/siebel/siebsrvr/lib/libsstcsiom.so:__1cOCSSSIOMSessionTModInvokeSrvcMethod6MpkHp13rnISSstring__i_+0x124
/export/home/siebel/siebsrvr/lib/libsstcsiom.so:
__1cOCSSSIOMSessionMRPCMiscModel6MnMSISOMRPCCode_nMSISOMArgType_LpnSCSSSISOMRPCArgList_4ripv_i_+0x5bc
/export/home/siebel/siebsrvr/lib/libsstcsiom.so:
__1cOCSSSIOMSessionJHandleRPC6MnMSISOMRPCCode_nMSISOMArgType_LpnSCSSSISOMRPCArgList_4ripv_i_+0xb98
/export/home/siebel/siebsrvr/lib/libsssasos.so:__1cJCSSClientLHandleOMRPC6MpnMCSSClientReq__I_+0x78
/export/home/siebel/siebsrvr/lib/libsssasos.so:__1cJCSSClientNHandleRequest6MpnMCSSClientReq__I_+0x2f8
/export/home/siebel/siebsrvr/lib/libsssasos.so:0x8c2c4
/export/home/siebel/siebsrvr/lib/libsssasos.so:__1cLSOMMTServerQSessionHandleMsg6MpnJsmiSisReq__i_+0x1bc
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cNsmiMainThreadUCompSessionHandleMsg6MpnJsmiSisReq__i_+0x16c
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cM_smiMessageQdDOProcessMessage6MpnM_smiMsgQdDItem_li_i_+0x93c
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cM_smiMessageQdDOProcessRequest6Fpv1r1_i_+0x244
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cN_smiWorkQdDueuePProcessWorkItem6Mpv1r1_i_+0xd4
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cN_smiWorkQdDueueKWorkerTask6Fpv_i_+0x300
/export/home/siebel/siebsrvr/bin/siebmtshmw:__1cQSmiThrdEntryFunc6Fpv_i_+0x46c
/export/home/siebel/siebsrvr/lib/libsslcosd.so:0x5be88
/export/home/siebel/siebsrvr/mw/lib/libmfc400su.so:__1cP_AfxThreadEntry6Fpv_I_+0x100
/export/home/siebel/siebsrvr/mw/lib/libkernel32.so:__1cIMwThread6Fpv_v_+0x23c
/lib/libc.so.1:0xc57f8
Setting the Siebel environment variable SIEBEL_STDERROUT to 1 shows the following heap dump in StdErrOut directory under Siebel enterprise logs directory.
% more stderrout_7762_23511113.txt
********** Internal heap ERROR 17112 addr=35dddae8 *********
***** Dump of memory around addr 35dddae8:
35DDCAE0 00000000 00000000 [........]
35DDCAF0 00000000 00000000 00000000 00000000 [................]
Repeat 243 times
35DDDA30 00000000 00000000 00003181 00300020 [..........1..0. ]
35DDDA40 0949D95C 35D7A888 10003179 00000000 [.I.\5.....1y....]
35DDDA50 0949D95C 0949D8B8 35D7A89C C0000075 [.I.\.I..5......u]
...
...
******************************************************
HEAP DUMP heap name="Alloc environm" desc=949d8b8
extent sz=0x1024 alt=32767 het=32767 rec=0 flg=3 opc=2
parent=949d95c owner=0 nex=0 xsz=0x1038
EXTENT 0 addr=364fb324
Chunk 364fb32c sz= 4144 free " "
EXTENT 1 addr=364f8ebc
Chunk 364f8ec4 sz= 4144 free " "
EXTENT 2 addr=364f7d5c
Chunk 364f7d64 sz= 4144 free " "
EXTENT 3 addr=364f6d04
Chunk 364f6d0c sz= 4144 recreate "Alloc statemen " latch=0
ds 2c38df34 sz= 4144 ct= 1
...
...
EXTENT 406 addr=35ddda54
Chunk 35ddda5c sz= 116 free " "
Chunk 35dddad0 sz= 24 BAD MAGIC NUMBER IN NEXT CHUNK (6)
freeable assoc with mark prv=0 nxt=0
Dump of memory from 0x35DDDAD0 to 0x35DDEAE8
35DDDAD0 20000019 35DDDA5C 00000000 00000000 [ ...5..\........]
35DDDAE0 00000095 0000008B 00000006 35DDDAD0 [............5...]
35DDDAF0 00000000 00000000 00000095 35DDDB10 [............5...]
...
...
EXTENT 2080 addr=d067a6c
Chunk d067a74 sz= 2220 freeable "Alloc statemen " ds=2b0fffe4
Chunk d068320 sz= 1384 freeable assoc with mark prv=0 nxt=0
Chunk d068888 sz= 4144 freeable "Alloc statemen " ds=2b174550
Chunk d0698b8 sz= 4144 recreate "Alloc statemen " latch=0
ds 1142ea34 sz= 112220 ct= 147
223784cc sz= 4144
240ea014 sz= 884
28eac1bc sz= 900
2956df7c sz= 900
1ae38c34 sz= 612
92adaa4 sz= 884
2f6b96ac sz= 640
c797bc4 sz= 668
2965dde4 sz= 912
1cf6ad4c sz= 656
10afa5e4 sz= 656
2f6732bc sz= 700
27cb3964 sz= 716
1b91c1fc sz= 584
a7c28ac sz= 884
169ac284 sz= 900
...
...
Chunk 2ec307c8 sz= 12432 free " "
Chunk 3140a3f4 sz= 4144 free " "
Chunk 31406294 sz= 4144 free " "
Bucket 6 size=16400
Bucket 7 size=32784
Total free space = 340784
UNPINNED RECREATABLE CHUNKS (lru first):
PERMANENT CHUNKS:
Chunk 949f3c8 sz= 100 perm "perm " alo=100
Permanent space = 100
******************************************************
Hla: 255
ORA-21500: internal error code, arguments: [17112], [0x35DDDAE8], [], [], [], [], [], []
Errors in file :
ORA-21500: internal error code, arguments: [17112], [0x35DDDAE8], [], [], [], [], [], []
----- Call Stack Trace -----
NOTE:
function being called is offset bytes from
the _PROCEDURE_LINKAGE_TABLE_.
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
F2365738 CALL
D79741AC ? D79735F8 ?
F2ECD7A0 ?
F286DDB8 PTR_CALL 00000000 949A018 ? 14688 ? B680B0 ?
F2365794 ? F2ECD7A0 ? 14400 ?
F286E18C CALL
1004 ? 1000 ? 1000 ?
F286DFF8 CALL
D79743E0 ? 949E594 ?
...
...
__1cN_smiWorkQdDueu CALL __1cN_smiWorkQdDueu 1C8F608 ? 18F55F38 ?
30A5A008 ? 1A98E208 ?
FDBDF178 ? FDBE0424 ?
__1cQSmiThrdEntryFu PTR_CALL 00000000 1C8F608 ? FDBE0424 ?
1AB6EDE0 ? FDBDF178 ? 0 ?
1500E0 ?
__1cROSDWslThreadSt PTR_CALL 00000000 1ABE8250 ? 600140 ? 600141 ?
105F76E8 ? 0 ? 1AC74864 ?
__1cP_AfxThreadEntr PTR_CALL 00000000 0 ? FF30A420 ? 203560 ?
1AC05AF0 ? E2 ? 1AC05A30 ?
__1cIMwThread6Fpv_v PTR_CALL 00000000 D7A7DF6C ? 17F831E0 ? 0 ? 1 ?
0 ? 17289C ?
_lwp_start()+0 ? 00000000 1 ? 0 ? 1 ? 0 ? FCE6C710 ?
1AC72680 ?
----- Argument/Register Address Dump -----
Argument/Register addr=d7974250. Dump of memory from 0xD7974210 to 0xD7974350
D7974200 0949A018 00014688 00B680B0 F2365794
D7974220 F2ECD7A0 00014400 D7974260 F286DDB8 800053FC 80000000 00000002 D79743E0
D7974240 4556454E 545F3231 35303000 32000000 F23654D4 F2365604 00000000 0949A018
D7974260 00000000 0949A114 0000000A F2ECD7A0 FC873504 00001004 F2365794 F2F0E8D4
D7974280 0949A018 00000000 F2F0E8D4 00001004 00001000 00001000 D79742D0 F286E18C
D79742A0 F6CB7400 FC872CC0 00000004 00CDE34C 4556454E 545F3437 31313200 00000000
D79742C0 00000000 00000001 D79743E0 00000000 F2F0E8D4 00001004 00001000 00007530
D79742E0 00007400 00000001 FC8731D4 00003920 0949A018 00000000 000042D8 00000001
D7974300 D79743E0 0949E594 D7974330 F286DFF8 D7974338 F244D5A8 00000000 01000000
D7974320 364FBFFF 01E2C000 00001000 00000000 00001000 00000000 F2ECD7A0 F23654D4
D7974340 000000FF 00000001 F2F0E8D4 F25F9824
Argument/Register addr=d797345c. Dump of memory from 0xD797341C to 0xD797355C
D7973400 F2365738
...
...
Argument/Register addr=1eb21388. Dump of memory from 0x1EB21348 to 0x1EB21488
1EB21340 00000000 00000000 FE6CE088 FE6CE0A4 0000000A 00000010
1EB21360 00000000 00000000 00000000 00000010 00000000 00000000 00000000 FEB99308
1EB21380 00000000 00000000 00000000 1C699634 FFFFFFFF 00000000 FFFFFFFF 00000001
1EB213A0 00000000 00000000 00000000 00000000 00000081 00000000 F16F1038 67676767
1EB213C0 00000000 FEB99308 00000000 00000000 FEB99308 FEB46EF4 00000000 00000000
1EB213E0 00000000 00000000 FEB99308 00000000 00000000 00000001 0257E3B4 03418414
1EB21400 00000000 00000000 FEB99308 FEB99308 00000000 00000000 00000000 00000000
1EB21420 00000000 00000000 00000000 00000000 1EB213B0 00000000 00000041 00000000
1EB21440 0031002D 00410036 00510038 004C0000 00000000 00000000 00000000 00000000
1EB21460 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1EB21480 00000109 00000000
----- End of Call Stack Trace -----
Although I'm not sure what exactly is the underlying issue for the core dump, my suspicion is that there is some memory corruption in Oracle client's code; and the Siebel Object Manager crash is due to the Oracle bug 5682121 - Multithreaded OCI clients do not mutex properly for LOB operations. The fix to this particular bug would be available in Oracle 10.2.0.4 release; and is already available as part of Oracle 11.1.0.6.0. In case if you notice the symptoms of failure as described in this blog post, upgrade the Oracle client in the application-tier to Oracle 11gR1 and see if it brings stability to the Siebel environment.
Original blog post is at:
http://technopark02.blogspot.com/2007/11/solarissparc-oracle-11gr1-client-for.html
Posted at 09:44PM Nov 27, 2007 by Giri Mandalika in Solaris | Comments[0]
Oracle 10gR2/Solaris x64: Must Have Patches for E-Business Suite 11.5.10
If you have an Oracle E-Business Suite 11.5.10 database running on Oracle 10gR2 (10.2.0.2) / Solaris x86-64 platform, make sure you have the following two Oracle patches to avoid concurrency issues and intermittent Oracle shadow process crashes.
Oracle patches
1) 4770693 BUG: Intel Solaris: Unnecessary latch sleeps by default
2) 5666714 BUG: ORA-7445 ON DELETE
Symptoms
1) If the top 5 database timed events look something similar to the following in AWR, it is very likely that the database is running into the bug 4770693 Intel Solaris: Unnecessary latch sleeps by default.
Top 5 Timed Events
| Event | Waits | Time(s) | Avg Wait(ms) | % Total Call Time | Wait Class |
|---|---|---|---|---|---|
| latch: cache buffers chains | 94,301 | 169,403 | 1,796 | 86.1 | Concurrency |
| CPU time | -- | 5,478 | -- | 2.8 | -- |
| wait list latch free | 247,466 | 4,756 | 19 | 2.4 | Other |
| buffer busy waits | 14,928 | 1,382 | 93 | .7 | Concurrency |
| db file sequential read | 98,750 | 552 | 6 | .3 | User I/O |
Apply Oracle server patch 4770693 to get rid of the concurrency issue(s). Note that the fix will be part of the release 10.2.0.3.
2) If the application becomes unstable and if you notice core dumps in cdump directory, have a look at the corresponding stack traces generated in udump directory. If the call stack looks similar to the following stack, apply Oracle server patch 5666714 to overcome this problem.
Alert log will have the following errors:
Errors in file /opt/oracle/admin/VIS/udump/vis_ora_1040.trc:
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [2168] [] [] []
Fri Jun 15 01:30:38 2007
Errors in file /opt/oracle/admin/VIS/udump/vis_ora_1040.trc:
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [9] [] [] []
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [2168] [] [] []
Fri Jun 15 01:30:38 2007
Errors in file /opt/oracle/admin/VIS/udump/vis_ora_1040.trc:
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [9] [] [] []
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [9] [] [] []
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [2168] [] [] []
% more vis_ora_1040.trc
...
...
*** 2007-06-15 01:30:38.403
*** SERVICE NAME:(VIS) 2007-06-15 01:30:38.402
*** SESSION ID:(1111.4) 2007-06-15 01:30:38.402
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x878
*** 2007-06-15 01:30:38.403
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [SIGSEGV] [Address not mapped to object] [2168] [] [] []
Current SQL statement for this session:
INSERT /*+ IDX(0) */ INTO "INV"."MLOG$_MTL_SUPPLY" (dmltype$$,old_new$$,snaptime$$,change_vector$$,m_row$$)
VALUES (:d,:o,to_date('4000-01-01:00:00:00','YYYY-MM-DD:HH24:MI:SS'),:c,:m)
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+23 ? 0000000000000001 00177A9EC 000000000 0061D0A60
000000000
ksedmp()+636 ? 0000000000000001 001779481 000000000 00000000B
000000000
ssexhd()+729 ? 0000000000000001 000E753CE 000000000 0061D0B90
000000000
_sigsetjmp()+25 ? 0000000000000001 0FDDCB7E6 0FFFFFD7F 0061D0B50
000000000
call_user_handler() ? 0000000000000001 0FDDC0BA2 0FFFFFD7F 0061D0EF0
+589 000000000
sigacthandler()+163 ? 0000000000000001 0FDDC0D88 0FFFFFD7F 000000002
000000000
kglsim_pin_simhp()+ ? 0000000000000001 0FFFFFFFF 0FFFFFFFF 00000000B
173 000000000
kxsGetRuntimeLock() ? 0000000000000001 001EBF830 000000000 005E5D868
+683 000000000
kksfbc()+7361 ? 0000000000000001 001FB60A6 000000000 005E5D868
000000000
opiexe()+1691 ? 0000000000000001 0029045D0 000000000 0FFDF9250
0FFFFFD7F
opiall0()+1316 ? 0000000000000001 0028E9FB9 000000000 000000001
000000000
opikpr()+536 ? 0000000000000001 00290B2DD 000000000 0000000B7
000000000
opiodr()+1087 ? 0000000000000001 000E7BE1C 000000000 000000001
000000000
rpidrus()+217 ? 0000000000000001 000E8058E 000000000 0FFDFA6B8
0FFFFFD7F
skgmstack()+163 ? 0000000000000001 003F611D0 000000000 005E5D868
000000000
rpidru()+129 ? 0000000000000001 000E808A6 000000000 005E6FAD0
000000000
rpiswu2()+431 ? 0000000000000001 000E7FD8C 000000000 0FFDFB278
0FFFFFD7F
kprball()+1189 ? 0000000000000001 000E86E6A 000000000 0FFDFB278
0FFFFFD7F
kntxslt()+3150 ? 0000000000000001 0030601F3 000000000 005F7C538
000000000
kntxit()+998 ? 0000000000000001 003058EBB 000000000 005F7C538
000000000
0000000001E4866E ? 0000000000000001 001E4864B 000000000 000000000
000000000
delrow()+9170 ? 0000000000000001 0032020B7 000000000 000000002
000000000
qerdlFetch()+640 ? 0000000000000001 0033545F5 000000000 0EF38B020
000000003
delexe()+909 ? 0000000000000001 0032034EA 000000000 005E6FC50
000000000
opiexe()+9267 ? 0000000000000001 002906368 000000000 000000001
000000000
opiodr()+1087 ? 0000000000000001 000E7BE1C 000000000 0FFDFCD10
0FFFFFD7F
ttcpip()+1168 ? 0000000000000001 003D031AD 000000000 0FFDFEDF4
0FFFFFD7F
opitsk()+1212 ? 0000000000000001 000E77C41 000000000 000E7BA00
000000000
opiino()+931 ? 0000000000000001 000E7B0D8 000000000 005E5B8F0
000000000
opiodr()+1087 ? 0000000000000001 000E7BE1C 000000000 000000000
000000000
opidrv()+748 ? 0000000000000001 000E76A11 000000000 0FFDFF6D8
0FFFFFD7F
sou2o()+86 ? 0000000000000001 000E73E6B 000000000 000000000
000000000
opimai_real()+127 ? 0000000000000001 000E3A7C4 000000000 000000000
000000000
main()+95 ? 0000000000000001 000E3A694 000000000 000000000
000000000
0000000000E3A4D7 ? 0000000000000001 000E3A4DC 000000000 000000000
000000000
--------------------- Binary Stack Dump ---------------------
========== FRAME [1] (ksedst()+23 -> 0000000000000001) ==========
Dump of memory from 0x00000000061D0910 to 0x00000000061D0920
0061D0910 061D0920 00000000 0177A9EC 00000000 [ .........w.....]
========== FRAME [2] (ksedmp()+636 -> 0000000000000001) ==========
Dump of memory from 0x00000000061D0920 to 0x00000000061D0A60
0061D0920 061D0A60 00000000 01779481 00000000 [`.........w.....]
0061D0930 0000000B 00000000 061D0EF0 00000000 [................]
0061D0940 05E5B96C 00000000 05E5C930 00000000 [l.......0.......]
0061D0950 05E5C930 00000000 FE0D2000 FFFFFD7F [0........ ......]
0061D0960 061D0A40 00000000 00000000 00000000 [@...............]
0061D0970 00000000 00000000 00000000 00000000 [................]
...
...
After installing the Oracle database patches, check the installed patches by running opatch lsinventory on your database server.
% opatch lsinventory
Invoking OPatch 10.2.0.1.0
Oracle interim Patch Installer version 10.2.0.1.0
Copyright (c) 2005, Oracle Corporation. All rights reserved..
Oracle Home : /oracle/product/10.1.0
Central Inventory : /export/home/oracle/oraInventory
from : /oracle/product/10.1.0/oraInst.loc
OPatch version : 10.2.0.1.0
OUI version : 10.2.0.1.0
OUI location : /oracle/product/10.1.0/oui
Log file location : /oracle/product/10.1.0/cfgtoollogs/opatch/opatch-2007_Aug_10_21-56-03-PDT_Fri.log
Lsinventory Output file location :
/oracle/product/10.1.0/cfgtoollogs/opatch/lsinv/lsinventory-2007_Aug_10_21-56-03-PDT_Fri.txt
--------------------------------------------------------------------------------
Installed Top-level Products (3):
Oracle Database 10g 10.2.0.1.0
Oracle Database 10g Products 10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 1 10.2.0.2.0
There are 3 products installed in this Oracle Home.
Interim patches (2) :
Patch 4770693 : applied on Thu Aug 02 15:27:23 PDT 2007
Created on 12 Jul 2006, 11:52:39 hrs US/Pacific
Bugs fixed:
4770693
Patch 5666714 : applied on Fri Jul 20 10:21:33 PDT 2007
Created on 29 Nov 2006, 04:52:58 hrs US/Pacific
Bugs fixed:
5666714
--------------------------------------------------------------------------------
OPatch succeeded.
Posted at 10:18PM Aug 10, 2007 by Giri Mandalika in Solaris | Comments[1]
Patches to get extended FILE solution on Solaris 10
The issue is discussed in the blog post Solaris: Workaround to stdio's 255 open file descriptors limitation.
If the system is running any existing major customer releases of Solaris 10 i.e., Solaris 10 3/05 through Solaris 10 11/06, extended FILE facility can be installed on these systems by applying the kernel patch, 125100-04 or later & libc patch, 120473-05 or later.
Systems running Solaris Express (SX) or any OpenSolaris distribution after build 39 does not need the patches mentioned above to get the extended FILE solution for stdio's 256 open files limitation.
Do not forget to read the man pages of extendedFILE(5), enable_extended_FILE_stdio(3C), fopen(3C), fdopen(3C) and popen(3C) to enable the extended FILE solution by pre-loading /usr/lib/extendedFILE.so.1 (run-time solution) or by using the enhanced stdio's interfaces, fopen(), fdopen(), popen() or with the new interface enable_extended_FILE_stdio().
--
Acknowledgments:
Peter Shoults, Sun Microsystems
Posted at 09:48PM May 13, 2007 by Giri Mandalika in Solaris | Comments[4]
Saturday Nov 01, 2008
