Tuesday May 10, 2005
Oracle 10g on Solaris 10
Hidden parameters to optimize Oracle 10g on Solaris 10
Hidden parameters to optimize Oracle 10g on Solaris 10
You may have recently installed Oracle 10g on Solaris 10 and wander
into the wonderful world of Oracle hidden parameters. Every time Oracle
is producing a new vintage of the unbreakable database we get a bulkload
of new mysterious parameters. For the Oracle DBA eye, some of them have
a very explicit name (_lgwr_async_io). Some of them have names directly
extracted from a martian dictionary (see _kghdsidx_count).
Now, of course, your noble intent is to do tuning, not debugging. What about
if you obtain a very sexy
"ORA-03113: end-of-file on communication channel"
on your first 1000 users attempt ?
Well, looking into the Oracle Net Dispatcher log, you will see an helpful :
"NS Primary Error: TNS-12535: TNS:operation timed out
NS Secondary Error: TNS-12606: TNS: Application timeout occurred"
And you call Oracle and they will tell us : This is a bug, Sir. Please go
in sqlnet.ora and do not specify the SQLNET.INBOUND_CONNECT_TIMEOUT parameter.
One problem fixed....the only fixed by a documented feature that you should
not use....great start.
Starting the workload again and now you observe some FULL TABLE SCAN. Oops...
I know how to fix this one and here are the "create index" statements.
Unfortunately, the unbreakable database send you a very rude
ORA-00600 [kcbgtcr_5], or ORA-00600 [kcbgcur_3] error message.
Good thing this young lady from oracle had the coolest voice in the world
so it not a problem to call again. And a certain John answers the phone...
Excuse me, may I speak with Virgina ?...ok, I'll wait.
Yes, this is a bug again (3392439) and to fix it , just type :
"ALTER SYSTEM FLUSH BUFFER_CACHE" . Interesting... or you can put this in
your pfile "_db_cache_pre_warm=false" . Oracle is easy.
(By the way, some more 600 errors can occur on Oracle 10g for Solaris x86
and the previous parameters do not fix them. You will need the very
entertaining "_enable_NUMA_optimization = FALSE" to keep going...)
Here we are... my 1000 users are running.
Looking at Statspack and system statistics, I notice a lot of pressure on
the shared pool and latch contention.
First, I made sure I was using ISM with " _use_ism_for_pga = true" Yep...
Then, I discovered that we can now segment the shared pool into multiple separate
zones, each protected by bound latches. How to do this ?
Just say " _kghdsidx_count = 4" and you will get four of those. The maximum
is apparently seven. No idea why....And I can not find this martian dictionary.
And running again.... but oracle is still singing the latch contention hymn.
Could I have a high level of contention on certain blocks ?
To find the culprit, I queried V$LATCH_CHILDREN for the address and joined it
to V$BH to identify the blocks protected by this latch (doing so will show all
blocks that are affected by the warm block).
Two way to fix this :
- If this is on an index (use DBA_EXTENTS to find out this common case) ,
use a reverse-key index.
- If not, set _db_block_hash_buckets to the prime number just larger than twice
the number of buffers.
Do not forget you must have one LRU latch minimum for each database writer.
You can increase them with a very elegant "_db_block_lru_latches= xx"
Just tell me why this is undocumented as it appears absolute best practice ?
And here I am, running again. Now that I fixed the latch issue, the contention
has moved to the log writer. No surprise.
A new feature of Oracle 10g is log parallelism that you can obtain with :
_lgwr_async_io=false
_log_parallelism_dynamic=true
and the tuning of _log_parallelism_max
Looking further into this, it does not provide full parallelism.
And because this is not a 24x7 production system, looks like you can also
do a really,really exciting :
_log_private_parallelism=true
(Common sense could have been _log_parallelism_private=true but this
Oracle engineers like poetry too...)
Oracle did not crash (unbreakable,right ) and I am running as fast as ever.
I realized later that I really did not need to update v$pga_advice all the time
(_smm_advice_enabled=false) or enable auto tuning of undo_retention
(_undo_autotune=false) as I really need this CPU cycles for my transactions
and not for the Oracle kernel.
Finally, here I am using the 21st century software jewel, DTrace
And realize that I am not using malloc() anymore but mmap(). Great !
But can I tune the mmap byte preallocation....oh,yes. Here is our final
undocumented pearl : _realfree_heap_pagesize_hint . Only 28 letters, what
do you think ?
Unbreakable, yes ! Simple, not yet ....
May 10 2005, 09:07:44 AM PDT Permalink
I recently discovered that another cause of the mysterious "ORA-03113: end-of-file on communication channel" message is Solaris 10 patch 119564-01. It affects Oracle 10.1.0.2 and 10.1.0.4 on Solaris 10 (at least with my test setup running in a zone). The Oracle trace output does not immediately provide any help (see below), but selectively backing out patches one at a time identified 119564-01.
Someone should probably file a bug on this with Sun...
So if your trace output looks like this (note the prune_cpus()), you're probably affected:
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
18196
kstwlb: SGA is no longer mapped
Exception signal: 10 (SIGBUS), code: 1 (Invalid address alignment), addr: 0x600000007, PC: [
0xffffffff7b900d3c, prune_cpus()+164]
*** 2005-05-27 09:31:57.866
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [prune_cpus()+164] [SIGBUS] [Invalid address ali
gnment] [0x600000007] [] []
Current SQL information unavailable - no SGA.
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedmp()+1008 CALL ksedst() 1052D83F0 ? 104BFE418 ?
104BFE428 ? 000000000 ?
1052D8CE8 ? 000000008 ?
ssexhd()+992 CALL ksedmp() 000000002 ? 000105000 ?
105128000 ? 000105128 ?
000105000 ? 000000001 ?
__sighndlr()+12 PTR_CALL 0000000000000000 00000000A ? 1052DFEF0 ?
000105000 ? 000105000 ?
000000010 ? 10512C000 ?
call_user_handler() CALL __sighndlr() 00000000A ? 1052DFEF0 ?
+992 1052DFC10 ? 1002CDB40 ?
000000000 ? 000000009 ?
Posted by Matt on May 27, 2005 at 09:42 AM PDT #
Posted by Jim Watson on May 02, 2006 at 12:40 PM PDT #
Posted by Konstantinos Hairopoulos on October 22, 2006 at 09:06 AM PDT #
Nope.
Posted by 141.146.28.114 on March 17, 2008 at 08:14 AM PDT #
pls could any one tell me how to instal oracle 10g on solaris 10
Posted by wairhe Emefe on May 28, 2009 at 07:45 PM PDT #