Tuesday Apr 21, 2009
Tuesday Apr 21, 2009
Today Sun announces MySQL 5.4. This is a great day for customers as
they can use systems with many cores much more efficiently. Its a great day for
the MySQL community and the MySQL performance team because we made it happen.
MySQL 5.4 includes a lot of community contributed fixes as well as many
fixes from our team. Mikael
and Allan are blogging about all the cool
new features and the great scalability of MySQL 5.4. I thought I will take
this opportunity to blog about some of the things we tried, and rejected.
Sometimes there are a lot of things to be learnt from things that
do not work
Early on during our performance investigation, we were trying to see if we can reduce some of the contention in Innodb locks. If you are not familiar with Innodb locks, I suggest you read Tim Cook's excellent presentation to MySQL University on this very topic.
In a nutshell, Innodb has 4 kinds of lock modes (shared, exclusive, intention shared, and intention exclusive). Since POSIX synchronization methods support maximum of 2 modes (reader or writer), Innodb implements its own set of locking primitives using a condition variable and a regular mutex. Innodb also implements its own spin followed by a block. When it gets a mutex in an not contended case, it is very fast. However, when spinning for a lock fails, it gets interesting.
A failed spin for a lock means it has to block for the lock to be available. Currently Innodb uses the wait array interface to keep track of who is waiting on what mutex. Unfortunately there is a global mutex protecting the wait array. This global mutex (sync_primary_wait_array) has shown to be hot in some high thread count experiments.
The following callstack illustrates this perfectly.
nsec ---- Time Distribution --- count Stack
16384 | | 8 mysqld`os_mutex_enter+0x4
32768 | | 11 mysqld`sync_array_free_cell+0x28
65536 | | 7 mysqld`mutex_spin_wait+0x194
131072 | | 9 mysqld`trx_undo_assign_undo+0x30
262144 |@ | 34 mysqld`trx_undo_report_row_operation+0x168
524288 |@@@@ | 122 mysqld`btr_cur_update_in_place+0x160
1048576 |@@@@@ | 157 mysqld`btr_cur_optimistic_update+0xbc
2097152 |@@@@@@ | 186 mysqld`row_upd_clust_rec+0x50
4194304 |@@ | 86 mysqld`row_upd_clust_step+0x5f0
As you can see, mutex_spin fails, it has to wait, it looks for a free cell in the sync_array to block. Before it can find a free cell, it has to lock the sync_array and search for a free cell. It blocks (using OS primitives, not Innodb locking) on the sync_primary_wait_array.
I implemented a quick prototype to measure the improvement in performance before asking the Innodb folks to take a closer look. My performance tests showed very minimal improvement in performance.
An important thing to note is that this contention happened when it failed to get a mutex. A thread fails to get a mutex if another thread has already acquired the mutex. So it does not really matter how long (within reasonable limits), the thread that failed to acquire the mutex took to sleep on the lock.
Richard Smith had a great idea of totally bypassing the sync_array interface. He prototyped it and got some good gains, but I will let him talk about it.
This just illustrates how hard performance work can be at times. Sometimes the
number of ideas rejected is more than what got accepted