Saturday Nov 11, 2006
I had the pleasure of hearing an old Sequent friend Kevin Closson speak about NAS architecture at a recent OSWOUG meeting. It was an interesting and energetic discussion on the direction of NAS in commodity servers. If you are interested at all in the direction of storage technology for databases, you should check out Kevin's blog and paper on this technology.
Tuesday Oct 31, 2006
In light of the recent announcements by Oracle on the creation of "Larry Linux", I thought it would be prudent to comment about where Solaris fits into this picture. Oracle was hoping to fill a perceived hole in the support of Linux for the Enterprise. Oracle believes they can do this better than Red Hat. While won't comment on whether or not Oracle can support Linux better than Red Hat, I am looking forward to seeing the "tars" from Larry Linux
Anyway, this recent announcement led me to realize that Solaris "now more than ever" is the best choice to run your enterprise. If you want open, Solaris is open. If you want iron-clad Unfakeable, Unshakeable, Unbreakable, ... well Solaris has been running enterprises since before Linux was a twinkle in Torvalds eye. Finally, if it comes down to price - Oracle will not cut you any breaks on Database support... You may have to mortgage your house just to get a years support for Oracle. But if you want to save some pocket change on OS support, Solaris beats Red Hat and Larry Linux in that category as well.
Red Hat: $999
Larry Linux: $399
Solaris: $120
Seems like a pretty easy choice to me...
Monday Oct 30, 2006
You would think that the "DUAL" table, a simple stub table, would not be a performance topic - but I have seen this for years on high-end benchmarks. People develop applications or tests for applications which tend to over-use the DUAL table. Most commonly, this comes in the form of "select abc.nextseq from DUAL" and "select sysdate from DUAL". This is typically, not a problem for small severs with a low level of concurrency, but it can be bottle-neck on high-end severs with lots of processors.
The problem with DUAL (in Oracle 9i and below) is that this "fake table" hashes to a "real" cache line
If over-used it can cause a "cache buffers chains" latch contention like crazy. The most dangerous over-use situations are systemic ones. I can get around these issues in most benchmark environments, but cringe when I see the embedded use DUAL.
In BEA websphere, there is a parameter called "TestConnectionsOnReservere". This parameter sends a SQL statement to the database before *EVERY* user statement.... talk about overhead! This not only adds SQL*Net round trips increasing network use, but most commonly uses the "SQL SELECT 1 from DUAL" as the test statement
What is worse, the overhead just continues to increase as the load is increased. Ken Gottry discusses the performance impact in an article he wrote. This study used a 2-way server to show the performance impact. It is much worse on a high-end server.
What can you do?
Avoid setting the TestConnectionsOnReserve within BEA. The performance cost in terms of potential latch contention and network over-head is too high. If you must use this paramenter, use the "X$DUAL" table instead. Oracle 10g, uses this by default and while it avoids the latching issues, the networking component this parameter is still present.
Wednesday Aug 16, 2006
As part of the Second Edition of the famous
Solaris Internals and the new Solaris Performance and Tools book a performance tuning
Wiki has been created. This site is meant to be a living document where best practices, tuning information, and tips are collected.
I have began contributing Oracle performance information to the Solaris applications specific tuning Wiki. I hope you enjoy this repository of information regarding performance on Sun systems.
Friday Aug 04, 2006
"Why does Oracle call times() so often? Is something broken?
When using truss or dtrace to profile Oracle shadow processes, one often sees a lot of calls to "times". Sysadmins often approach me with this query.
root@catscratchb> truss -cp 7700
^C
syscall seconds calls errors
read .002 120
write .008 210
times .053 10810
semctl .000 17
semop .000 8
semtimedop .000 9
mmap .003 68
munmap .003 5
yield .002 231
pread .150 2002
kaio .003 68
kaio .001 68
-------- ------ ----
sys totals: .230 13616 0
usr time: 1.127
elapsed: 22.810
At first glance it would seem alarming to have so many times() calls, but how much does this really effect performance? This question can best be answered by looking at the overall "elapsed" and "cpu" time. Below is output from the "procsystime" tool included in the
Dtrace toolkit.
root@catscratchb> ./procsystime -Teco -p 7700
Hit Ctrl-C to stop sampling...
^C
Elapsed Times for PID 7700,
SYSCALL TIME (ns)
mmap 17615703
write 21187750
munmap 21671772
times 90733199 <<== Only 0.28% of elapsed time
semsys 188622081
read 226475874
yield 522057977
pread 31204749076
TOTAL: 32293113432
CPU Times for PID 7700,
SYSCALL TIME (ns)
semsys 1346101
yield 3283406
read 7511421
mmap 16701455
write 19616610
munmap 21576890
times 33477300 <<== 10.6% of CPU time for the times syscall
pread 211710238
TOTAL: 315223421
Syscall Counts for PID 7700,
SYSCALL COUNT
munmap 17
semsys 84
read 349
mmap 350
yield 381
write 540
pread 3921
times 24985 <<== 81.6% of syscalls.
TOTAL: 30627
According to the profile above, the times() syscall accounts for only 0.28% of the overall response time. It does use 10.6% of sys CPU. The usr/sys CPU percentages are "83/17" for this application. So, using the 17% for system CPU we can calculate the overall amount of CPU for the times() syscall: 100*(.17*.106)= 1.8%.
Oracle uses the times() syscall to keep track of timed performance statistics. Timed statistics can be enabled/disabled by setting the init.ora parameter "TIMED_STATISTICS=TRUE". In fact, it is an *old* benchmark trick to disable TIMED_STATISTICS after all tuning has been done. This is usually good for another 2% in overall throughput. In a production environment, it is NOT advisable to ever disable TIMED_STATISTICS. These statistics are extremely important to monitor and maintain application performance. I would argue that disabling timed statistics would actually hurt performance in the long run.
Tuesday Jul 11, 2006
There are multiple ways to gather trace data. You can instrument the application, pick an oracle sid from sysdba, turn on tracing for all users (ouch), or use a login trigger to narrow down to a specific user. Each of these methods have merit, but recently I desired to gather traces at various user levels.
The problem with most packaged applications, is that they all use the *same* userid. For this Oracle 10G environment, I used this fact to filter only connections of the type that I wanted to sample. I wanted to gather 10046 event trace data when the number of connections was 10, 20, or 30. To achieve this, I used a logon trigger and sampled the number of sessions from v$session to come up with the connection count. I have found this little trick to be very useful in automating collection without modifying the application. I hope this can be useful to you as well.
create or replace trigger trace_my_user
after logon on database
DECLARE
mycnt int;
BEGIN
SELECT count(*)
INTO mycnt
FROM v$session
WHERE username='GLENNF';
if (user='GLENNF') and ((mycnt=10) or (mycnt=20) or (mycnt=30)) then
dbms_monitor.session_trace_enable(null,null,true,true);
end if;
end;
/
Tuesday Mar 14, 2006
If you have been using Method-R for response time based profiling, then you will appreciate this note. The "Cache Buffers Chains" latch pops up from time-to-time when trying to scale applications on high-end systems. It is usually a sign of an application issue so locating the source of contention is critical. There are several notes in metalink (42152.1 and 163424.1) which describe how to find hot blocks, but nothing that uses the the Oracle "10046 event" trace files.
Below is output from a trace file (10gR1) which shows the application waiting on a CBC latch:
WAIT #3: nam='latch: cache buffers chains' ela= 18996 p1=15245584968 p2=116 p3=1
The ADDR in "p1=" is a decimal value which can be converted into hex and used to query the x$bh, v$latch_children, and sys.dba_extents tables to find the objects that are contending CBCs.
I created a script "
CBC_p1_to_obj.sql" which hides the nasty sql and takes the ADDR as input. Below is an example from a recent experiment:
SQL> @CBC_p1_to_obj
Function created.
Enter value for cbc_addr_p1: 15245584968
old 12: x.hladdr = to_hex('&&CBC_ADDR_P1') and
new 12: x.hladdr = to_hex('15245584968') and
SEGMENT_NAME EXTENT# BLOCK# TCH CHILD# SLEEPS
----------------------------------- ---------- ---------- ---------- ---------- ----------
SYSMAN.MGMT_METRICS 6 8 2 944 0
SYSMAN.MGMT_METRICS_RAW_PK 32 113 2 944 0
SYSMAN.MGMT_METRICS_RAW_PK 33 90 2 944 0
SYSMAN.MGMT_METRICS_1HOUR_PK 18 21 2 944 0
SYSMAN.MGMT_METRICS_1HOUR_PK 17 44 2 944 0
SYS.I_DEPENDENCY1 18 14 1 944 0
SYS.WRI$_ADV_TASKS_IDX_01 0 1 1 944 0
XDB.SYS_LOB0000043477C00008$$ 0 3 1 944 0
OLAPSYS.MRAC_OLAP2_AW_PHYS_OBJ_T 0 3 1 944 0
OLAPSYS.MRAC_OLAP2_AW_PHYS_OBJ_T 0 3 1 944 0
XDB.SYS_LOB0000043477C00008$$ 0 3 1 944 0
SEGMENT_NAME EXTENT# BLOCK# TCH CHILD# SLEEPS
----------------------------------- ---------- ---------- ---------- ---------- ----------
SYS.C_OBJ# 16 118 0 944 0
DG.T1PK 19 115 0 944 0
13 rows selected.
I hope you will find this script useful. Let me know if you experience any issues.
Thursday Feb 16, 2006
There seems to be some confusion about how an Oracle
instance uses multi-core processors. From a database point of view, it
can use all CPU resource that is offered by Solaris. To find out what
CPU resources are available, use the "prtdiag" and "psrinfo" commands.
The example below shows
a single board USIV domain on a SF25K. There are a total of 4 * USIV Processors running 8 cores @1.2GHz each. The prtdiag output shows the 4 processors with two CPU ID's each.
The psrinfo command simply shows all 8 cores.
catscratchb:root> /usr/sbin/prtdiag
System Configuration: Sun Microsystems sun4u Sun Fire 15000
System clock frequency: 150 MHz
Memory size: 32768 Megabytes
========================= CPUs =========================
CPU Run E$ CPU CPU
Slot ID ID MHz MB Impl. Mask
-------- ------- ---- ---- ------- ----
/SB01/P0 32, 36 1200 16.0 US-IV 2.2
/SB01/P1 33, 37 1200 16.0 US-IV 2.2
/SB01/P2 34, 38 1200 16.0 US-IV 2.2
/SB01/P3 35, 39 1200 16.0 US-IV 2.2
catscratchb:root> /usr/sbin/psrinfo
32 on-line since 02/07/2006 18:00:23
33 on-line since 02/07/2006 18:00:25
34 on-line since 02/07/2006 18:00:25
35 on-line since 02/07/2006 18:00:25
36 on-line since 02/07/2006 18:00:25
37 on-line since 02/07/2006 18:00:25
38 on-line since 02/07/2006 18:00:25
39 on-line since 02/07/2006 18:00:25
Oracle does size internal latching structures based on the number of CPUs available. This sizing does NOT disallow Oracle shadow process from using CPU resource, it simply makes the system better suited to scale. To find out how many CPUs Oracle thinks are available, run the
following SQL command as sysdba. We have found that Oracle sizes its data structures based on the number or CPU IDs that are reported by psrinfo.
SQL> connect / as sysdba
SQL> select ksppinm name, ksppstvl value, ksppdesc description
from x$ksppi x, x$ksppcv y
where (x.indx = y.indx)
and ksppinm like '%cpu_count%'
order by name;
NAME VALUE DESCRIPTION
--------------- ------ ----------------------------------
cpu_count 8 number of CPUs for this instance
Friday Dec 09, 2005
I am working on shaping the next generation of the Transaction Process Council OLTP benchmark "TPC-E". To make this benchmark more useful and avoid the problems of TPC-C, I would like to have input from people who actually run *real* database systems. I am looking for characteristics such as Locical IO per transaction to help better determine if we are going the right direction with the TPC-E workload.
So, if you have statspack data that you would not mind sharing, please send it my way along with a short description of the machine and CPU resource consumed.
Thanks,
Glenn.Fawcett@Sun.com.
Monday Sep 19, 2005
On several occasions, I have ran into situations where Sun customers are using tnsping as an indicator of network performance. While tnsping does show if a connection to a database exists, the response time is not a true indicator of connect time.
"tnsping" works in the idle loop of the Oracle listener and will not respond until all connections queued by the listener have completed. On a busy system, new connections will be serviced before tnsping responses. This is especially apparent when the "queuesize" parameter has been increased in the listener.ora file. Generally, it is good to service real clients before a ping request, however if you are using this to determine response time, it is not valid. A real performance issue still may exist, but you won't know until digging further.
To get true connect times, I wrote a simple script with a "timex" in front of "select * from dual;" through a listener. "myping.sh" measures the true connect time as well as getting the session count and timestamp. You have to modify the connect strings to connect to your database.
I hope this is helpful,
Glenn
oracle@sumocat:~% myping.sh
Usage: ./myping.sh intvl count
oracle@sumocat:~% myping.sh 20 10
Date_and_Timestamp SessionCNT Connect_Time(sec)
09/19/05 11:08:05 20 0.260000
09/19/05 11:08:25 19 0.280000
09/19/05 11:08:46 19 0.270000
09/19/05 11:09:06 20 0.260000
09/19/05 11:09:27 20 0.270000
Monday Aug 29, 2005
Recently, I have done a few presentations for local customers on Oracle/Sun Performance. I used a presentation originally developed for an internal performance round-table. This original presentation was created by Doug Miller, Vincent Carbone, and I. I changed it around a little for customers. I hope you find this useful.
Presentation: "Oracle/Sun Performance Factors"
Tuesday Jun 21, 2005
Ever since I started reading the "Optimizing Oracle Performance" book by Cary Millsap, I have been salivating over getting a copy of the Hotsos profiler - but alas it is too expensive. Recently I tried to find a copy of their free version "Sparky" but it had been pulled due to support issues.
Finally, I stumbled upon orasrp. This analyzer is written in python and can be used standalone to produce html, or you can browse a directory of trace files via your web browser.
Monday Aug 30, 2004
In the past you could only sample Oracle Wait Events and often missed interesting information. In 10g Oracle introduced a new view, V$ACTIVE_SESSION_HISTORY, which keeps a historical view of wait events. I ran across a great article in the Database Journal which describes how to use this new feature... Pretty cool!
Monday Aug 30, 2004
The RAW vs COOKED debate that has been going on for some time. You will find little argument which is more efficient at caching DB buffers. If and Oracle process can get a block by doing a block get, this uses much less code than issuing an IO and getting it from the UFS cache. If this is the case, why have I ran into numerous DBAs that have tried Raw once but saw worse or no performance improvement from RAW?
Further investigation usually reveals that the FS cache was providing caching benefit, that was not able to be realized within Oracle. Without modifying the SGA size and possibly storage parameters on active tables/indexes, it is hard to realize any benefit with Raw.
A good place to start is by looking at your largest objects. If your statistics are current, you can use the following SQL. OEM also does a good job of showing table sizes.
SQL> select table_name, CACHE, num_rows, BLOCKS/128 MB
from all_tables
order by MB desc
/
TABLE_NAME CACHE NUM_ROWS MB
------------------------------ ----- ---------- ----------
BIGACTIVETABLE N 20000000 13258.2891
LOOKUP Y 50000 150.5703
...
Also, You need to know the index sizes so..
SQL> select index_name,
num_rows,
leaf_blocks/128 MB ,
BUFFER_POOL
from all_indexes
where owner = 'STSC'
order by MB desc
/
INDEX_NAME NUM_ROWS MB BUFFER_
------------------------------ ---------- ---------- -------
BIGACTIVETABLE_PK 20000000 9245.6875 DEFAULT
....
Since we now know what the largest tables and indexes are, we should expect to see them in the buffer cache. This can be done by querying the v$bh table.
SQL> SELECT object_name, count(*)/128 MB
FROM v$bh, all_objects
WHERE object_id=objd
GROUP BY object_name
ORDER BY MB desc
/
OBJECT_NAME MB
------------------------------ ----------
LOOKUP 120.09375
BIGACTIVETABLE 100.98438
BIGACTIVETABLE_PK 75.23438
....
Notice that "BIGACTIVETABLE" and it's index are not being cached very well. This is due to the fact that the storage parameter on the table and index are set to "NO CACHE". This causes Oracle to not actively try to cache these blocks. By alter these storage parameters
"alter table BIGACTIVETABLE cache;"
and increasing the SGA, these tables and indexes can use more Oracle buffer cache. Note, that you can also separate objects into the DEFAULT, KEEP, and RECYCLE buffer pools. Statspack reports calculate hit ratios for the DEFAULT, KEEP, are RECYCLE buffer pools separately. I tend to put all objects I am actively trying to KEEP into the KEEP pool so I can figure my caching efficiency via statspack.
I hope this helps inspire you to give this a try. It is not black magic, give it a try.
Take Care,
Glenn
Tuesday Jun 08, 2004
This is useful for monitoring index builds and large DSS queries. The last thing you want to happen after a query has been running for a few hours is to run out of temporary space. Statspack will give you essentially the same information, but I like to have a simple little query for interactive monitoring.
SQL> select 100*(u.tot/d.tot) "pct_temp_used" FROM
(select sum(u.blocks) tot from v$tempseg_usage u) u,
(select sum(d.blocks) tot from dba_temp_files d) d
/
pct_temp_used
-------------
7.375
Hi Glenn,
Thank you for this article on Oracle 1...