Yufei Zhu's Weblog

Tuesday Aug 22, 2006

CoolStack

1.html We just announced coolstack, a collection of most commonly used opensource software, compiled with sun studio 11, optimized for Solaris/Ultrasparc Platform.

It includes an AMP package (Apache, PHP and MySQL) and individual MySQL, Squid, PHP, Perl and Tomcat package.
It would make life easier to install SAMP on Solaris. It also includes README in each package explaining what flag and option have been used to compile each package in case you need to recompile.

Try it out and let us know how you feel about it.

Tuesday Aug 08, 2006

MySQL InnoDB performance tuning on T2000

innodb.html

MySQL is one of the most popular opensource database.  It supports several pluggable storage engines, MyISAM manages non-transactional tables and it is good performer especially on MySQL 5.1.  The InnoDB and BDB storage engines provide transaction-safe tables. They were acquired by Oracle not so long ago.  The MEMORY storage engine which formerly known as heap storage engine provides in-memory tables.

The InnoDB supports both file system and raw disks on Solaris. It does row level locking.
 
MySQL InnoDB allocates a few types of memory buffers out of the heap.  The main buffer that is sized by  innodb_buffer_pool_size is used to cache data and indexes of the tables. Each buffer block is 16k bytes.  There is a dedicated thread that does pread to bring in the data to the buffer from the disk. There are also other types of memory buffers like sort buffer, log buffer, query_cache buffer.  This is how to monitor MySQL statistics to get idea of whether need to increase those buffer size.


mysql> show innodb status 

mysqladmin extended-status

It is important to understand how MySQL executes select queries by explain. That can help optimize the select performance.

MySQL is single process multi-threaded. For each new user connection, there is 1 threaded created from MySQL. We
can control the number of concurrent threads that can run simultaneous by limiting  innodb_thread_concurrency. Normally on T2000 we set it equal to the number of cpus.
 
There is no dedicated log write threads.  There is one io thread that wakes up frequently to do group writes but in our evaluation I rarely see it fires.  Each user thread could issue pwrite when the transaction is commited. And it is serialized! The default innodb_flush_log_at_trx_commit is 1, meaning that after each transaction is committed, there is a pwrite first, then followed by fsflush.  If set innodb_flush_log_at_trx_commit to 0, the performance would improve quite dramatically, pwrite and fsflush is only done every one second instead of after each transaction commit. But the risk is that customer might lose one second of data in case of power loss. innodb_flush_log_at_trx_commit  = 2 would issue pwrite after each commit but fsflush every 1 second.  If log disk response time is not fast enough, it could become a huge performance bottleneck.  It is very important to size enough IOPs for log disks.  We have seen performance being doubled or tripled right away by using a few more disks for the log files.  For performance purpose,  if you use filesystems to store datafiles and log files, you should use forcedirectio.

This is an example of how to use Solaris Volume Manager(svm) to create a soft partition and then set up the logfile links to this location

metadb -d -f c1t0d0s7 c1t1d0s7
metadb -a -f c1t0d0s7 c1t1d0s7

Then create a concatenation

metainit d10 1 4 c1t0d0s0 c1t1d0s0 c1t2d0s0 c1t3d0s0

You can stop here and do newfs on /dev/md/rdsk/d10 or continue to create a soft partition(for example 10G) on d10

metainit d101 -p d10 10G  

After that

newfs /dev/md/rdsk/d101
mount -o forcedirectio /dev/md/dsk/d101  /logs


There is innodb_flush_method.  The default is performing very well on Solaris which uses fsync() to flush both the data
and log files.
 

When MySQL starts, there are 10 threads created. Thread 1 handles network connections and create new threads for new user connections. Then there are 4 io threads. There is 1 log write thread doing group commits once a while. There is one insert thread, InnoDB stores data physically by its primary key order and insertion would not cause random reads on the disk, but for the non unique secondary index insert, it could cause a lot of random reads. So to avoid that, InnoDB checks whether the secondary index page is in the buffer pool. If it is, InnoDB does the insertion directly to  the index page. If the index page is not found in the buffer pool, InnoDB inserts the record to a special insert buffer structure. The insert buffer is kept so small that it fits entirely in the buffer pool, and insertions can be done very fast. Periodically, the insert buffer is merged into the secondary index trees in the database. Often it is possible to merge several insertions to the same page of the index tree, saving disk I/O operations. There is another thread that I am not sure its purpose.  Thread 6 is handling rollback. Thread 7 and 8 are monitoring thread. Thread 9 is the master thread. Thread 10 is the signal handler thread

Here are a few steps of how to tune MySQL InnoDB performance on T2000 or other Solaris platforms
  • Choose at least MySQL version 5.0.22.  In my experience MySQL 4.1  has more user locks inside hence it doesn't scale well.   MySQL 5.1 beta  is even better than 5.0.

  • If your database size is more than 4GB, using MySQL 64bit is better because you can use more than 4GB memory for innoDB database buffers.

  • MySQL 5.0.22 64bit from MySQL website is a pretty good performer.  But in case if you want to compile it on your own on Solaris, You can refer a good  whitepaper from Jenny Chen from Sun Microsystem.  Compile time can be reduced to a few minutes using parallel build "dmake" in sun studio 11 

  • You can use libumem or libmtmalloc to get scalable memory allocation performance on multi-threaded MySQL.  The way to do this is before mysql is started,  setenv LD_PRELOAD_64 /usr/lib/sparcv9/libumem.so.  This example is for 64bit MySQL
  • Always keep an eye on MySQL internal statistics. Normally you can look at reads versus hits and tell if you need to increase certain buffer size
    • mysql> show innodb status;
    • mysql> show full processlist;
    • mysqladmin extended-status
  • Making sure there is no io bottleneck. Collect  "iostat -xtcnz 5" on the system and look at that what is the disk response time (column asvc_t in miliseconds) and what is the average outstanding ios(column actv). Making sure that you have enough IOPs for the logfile disks and datafile disks.
  • On solaris systems, you can use "prstat -Lmc" to monitor  the active processes on the systems.
  • MySQL startup options are very important. Following is an example of /etc/my.cnf I use to evaluate MySQL sysbench OLTP performance on T2000
#
#MySQL configure
#
# The MySQL server
[mysqld]
port=3306
socket=/tmp/mysql.sock
user=root
datadir=/data
basedir=/usr/local/mysql
max_connections=1600
max_connect_errors=10
table_cache=2048
max_allowed_packet=1048576
binlog_cache_size=1048576
key_buffer_size=16777216
max_heap_table_size=67108864
sort_buffer_size=65536
join_buffer_size=1048576
query_cache_size=26214400
thread_cache=16
thread_concurrency=32
thread_stack=64K
ft_min_word_len=4
default_table_type=MYISAM
transaction_isolation=REPEATABLE-READ
tmp_table_size=64M
skip-locking
skip-concurrent-insert
server-id=1
#######below is for innodb############
innodb_status_file=0
innodb_data_file_path=ibdata1:100M:autoextend
innodb_buffer_pool_size=1800M
innodb_additional_mem_pool_size=20M
innodb_log_file_size=200M
innodb_log_files_in_group=2
innodb_log_buffer_size=80M
innodb_flush_log_at_trx_commit=1
innodb_lock_wait_timeout=300
innodb_max_dirty_pages_pct=90
innodb_thread_concurrency=32


Monday Aug 07, 2006

MySQL MyISAM performance tuning on T2000

myisam.html

The default storage engine of MySQL is MyISAM.  Unlike InnoDB,  it stores each MyISAM table in three files, the schema file(.frm), the data file(.MYD) and the index file(.MYI). It only supports filesystem. It does table level locking. 

Compared to InnoDB, MyISAM doesn't have its own data buffer cache, it only has index buffer cache, the key buffer (variable key_buffer_size).  We need to use file system buffer cache for data cache for MyISAM tables.

 
Here are a few steps of how to tune MySQL MyISAM performance on T2000 or other Solaris platforms

  • There is a very good performance feature in MySQL 5.1 beta  that allows MyISAM using mmap memory instead of malloced buffers. If you experienced very high mutex contention in earlier MySQL releases,  you can get a huge performance improvments.  How do you know if you have this problem?

    • If you are using MySQL 5.0 or earlier version, during the test you can collect "lockstat sleep 1" and if you see ufs_lockfs_begin and ufs_lockfs_end in the callers column. You are likely to get the performance boost by going to the latest 5.1. You need to have  "myisam_use_mmap=1" in /etc/my.cnf to be able to use this feature. This is an example of lockstat output that has the performance issue.
Adaptive mutex spin: 140294 events in 1.145 seconds (122505 events/sec)

Count indv cuml rcnt spin Lock Caller
-------------------------------------------------------------------------------
31341 22% 22% 0.00 9 0x600052c4d10 ufs_lockfs_end+0x70
30952 22% 44% 0.00 10 0x600052c4d10 ufs_lockfs_begin+0xe4

  • You can use libumem or libmtmalloc to get scalable memory allocation performance on multi-threaded MySQL.  The way to do this is before mysql is started,  setenv LD_PRELOAD_64 /usr/lib/sparcv9/libumem.so.  This example is for 64bit MySQL. For 32bit MySQL you can do setenv LD_PRELOAD /usr/lib/libumem.so
  • Since we have to use filesystem buffer cache to cache data for MyISAM tables, it is important to tune segmap_percent in /etc/system. The default segmap_percent is 12% on solaris, that means you can only get to use 12% of the system memory for filesystem buffer cache.  It depends on your database size, but setting it too high could lead low memory on the system causing excessive paging.  In our case, we set it to 80%. In /etc/system, set segmap_percent=80. You need to reboot the system to make it effective.
  • key cache is important feature for MyISAM to cache index blocks that allows multiple threads to access key buffer simultaneously. You can check the performance of key cache by doing "show status" and look at what is the ratio of key_reads and key_read_requests. It should be less than 1%.  Otherwise, you might need to increase key_buffer_size. The maximum for key_buffer_size is 4G however you can create multiple key cache (The size limit of 4GB applies to each cache individually, not as a group.) That will also help on the situation where access to one key cache structure does not block access to the other key cache
  • Making sure there is no io bottleneck. Collect  "iostat -xtcnz 5" on the system and look at that what is the disk response time (column asvc_t in milliseconds) and what is the average outstanding ios(column actv).
  • On solaris systems, you can use "prstat -Lmc" to monitor  the active processes on the systems.

Calendar

Feeds

Search

Links

Navigation

Referrers