Monday Apr 21, 2008
Monday Apr 21, 2008
On April 15-16, we demoed a few DTrace probes for MySQL 5.0 integrated with Chime visualization Tool at MySQL users conference 2008. Here is an snapshot of the DTrace probes in chime showing the query execution time/count in Chime tool:


The DTrace probes inserted into MySQL 5.0 in the demo are:
provider mysql {
probe data__receive__start(int);
probe data__receive__finish(int);
probe query__plan__start(char *);
probe query__plan__finish(char *);
probe query__execute__start(void *, char *, char *, const char *, char *);
probe query__execute__finish(void *, char *, char *, const char *, char *,int);
probe query__cache__hit(void *, char *, char *, const char *, char *);
probe query__cache__miss(void *, char *, char *, const char *, char *);
probe myisam__wrlck__start();
probe myisam__wrlck__finish();
probe innodb__index__next__start(char *);
probe innodb__index__next__finish(char *);
probe innodb__index__next__same__start(char *);
probe innodb__index__next__same__finish(char *);
probe innodb__index__prev__start(char *);
probe innodb__index__prev__finish(char *);
probe innodb__index__first__start(char *);
probe innodb__index__first__finish(char *);
probe innodb__index__last__start(char *);
probe innodb__index__last__finish(char *);
probe innodb__rnd__init__start(char *);
probe innodb__rnd__init__finish(char *);
probe innodb__rnd__end__start(char *);
probe innodb__rnd__end__finish(char *);
probe innodb__rnd__next__start(char *);
probe innodb__rnd__next__finish(char *);
probe innodb__rnd__pos__start(char *);
probe innodb__rnd__pos__finish(char *);
probe flush__log__start();
probe flush__log__finish();
probe innodb__wait__buff__start();
probe innodb__wait__buff__finish();
};
Many people seeing the demo showed interest to try with the chime tool, per their requests, I'm putting the DTrace patch and chime tool for MySQL DTrace with the instructions on how to use the tool:
1. Apply the DTrace patch for MySQL 5.0.41
#gpatch -p2 -i mysql-5.0.41-dtrace.patch
2. Compile the MySQL src with the DTrace patch integrated:
#
./configure --enable-dtrace --prefix=/usr/local/mysql --with-innodb ->32 bit
# CFLAGS="-m64", CXXFLAGS="-m64" ./configure --enable-dtrace
DTRACEFLAGS='-64'
--prefix=/usr/local/mysql --with-innodb ->64 bit
In MySQL 6.0, there are a few DTrace probes inserted into the source code, you can try with the probes by compiling with "--enable-dtrace" configure option.
3. Install the chime tool for MySQL DTrace: chime.demo.tar by uncompress the chime.demo.tar from /
# tar xvf chime.demo.tar
4. Start Chime Tool as root:
#/opt/OSOL0chime/bin/chime
Instead of using Chime to show the probes in the user interface, we can also use the test scripts: dtrace_test.tar to run the DTrace probes. For example, by running the following script, we can get the time spending by each SQL statements waiting for reading page synchronous from disk, so that we can tuning the innodb_buffer_pool_size or SQL accordingly.
#cat innodb_buffer_wait.d
#!/usr/sbin/dtrace -qs
mysql*:::query-execute-start
{
this->query = copyinstr(arg4);
}
mysql*:::innodb-wait-buff-start
{
self->init = vtimestamp;
}
mysql*:::innodb-wait-buff-finish
/self->init/
{
@innittime[this->query] = sum(vtimestamp - self->init);
self->init = 0;
}
profile:::tick-5s
{
printf("------------------------------------------\n");
printf("Date: %Y\n",timestamp);
printf("Time wait for innodb buffer pool available");
printa(@inittime):
printf("------------------------------------------\n");
}
# ./innodb_buffer_wait.d
----------------------------------------------------------------------------------------
Date: 2008 April 17 13:34:43
Time wait for innodb buffer pool available
SELECT c from sbtest where id between ? and ? order by c 1976800
SELECT SUM(K) from sbtest where id between ? and ? 6184600
...
-----------------------------------------------------------------------------------------
Tuesday Oct 09, 2007
MySQL is one of the world's most popular open source databases, and it is widely used and becoming the database-of-choice for many leading Web 2.0 sites. Like most database servers, the most common bottleneck in the enterprise environment encountered by MySQL is disk I/O. To maximize the performance of MySQL for disk I/O bound workloads on the Solaris operating system on CMT servers (e.g. the Sun Fire T2000 Server), configuration and tuning of MySQL server variables is critical and can make a big difference to performance, as does the optimization of the Solaris filesystem for MySQL, and the configuration of storage arrays.
Configuration Issues
MySQL server's performance can be optimized using various configuration settings. The first step is to read the configuration and system variables by running the command:
mysql> show variables;
Once you have done this, you can take appropriate action to configure/tune the variables for better performance. This action can be one of the following:
1. Change a value in the my.cnf configuration file, or mysql start up options
2. Configure the optimum number of user connections
3. Optimize Solaris file system performance
4. Setup and configure storage disk array
5. Make database schema changes, such as changing design of one or more tables, or adding or modifying indexes
6. Optimize the queries used by the application
This document will only concentrate on changing MySQL configuration settings and file system tuning. Storage configuration and changes at the database design and application level are not covered.
MySQL Server Variables
MySQL has many variables that can be adjusted to change MySQL behavior or for performance purpose. For I/O bound workloads, the most important parameters are memory related variables. MySQL includes several storage engines, including MyISAM, InnoDB, HEAP, and Berkeley DB (BDB), some variables apply to one of the storage engines only, some variables are used in the SQL layer applying to all the storage engines. While using the ACID transaction supported Innodb storage engine, first, we don’t need to configure the following memory-related variables, saving precious memory in the disk I/O bound workload:
· bulk_insert_buffer_size
· key_buffer_size
· key_cache_age_threshold, key_cache_block_size, key_cache_division_limit
· read_buffer_size, read_rnd_buffer_size
There are several memory-related variables that apply to all storage engines
· join_buffer_size – A buffer used for full join. When there are large joins without indexes, increase this buffer size to improve the efficiency.
· sort_buffer_size – A buffer used for the sort result set allocated by each thread. This can speed up ORDER BY and GROUP BY queries.
· query_cache_size – Set this variable to a nonzero value to enable query caching
· query_cache_limit – The maximum size of the cached result set, the larger result set won’t be cached
· query_cache_min_res_unit –query cache allocate memory blocks with the minimum size set by this variable. When the application has a lot of queries with small results,the default block size (4KB) may lead to memory fragmentation. So with small resultsets, decreasing it to 2048 or 1024 bytes might improve performance; with large query resultsets, increasing it to 8192, 16384 or more may improve performance query_cache_type: 0=OFF, 1=ON
The MySQL query cache stores the identical SELECT queries issued by clients to the database server. This makes it possible to locate and re-issue the same queries without repetitive hard parsing activities. MySQL also stores the query's result set in the query cache, which can significantly reduce the overhead of creating complex result sets for queries from the disk or memory caches, reducing both physical and logical I/O. This can speed up applications where repetitive queries of products are being issued. If you see a high value for qcache_hits compared to your total queries at runtime or a low value for qcache_free_memory seen from the mysql>show status; you probably need to increase the value of the query_cache_size parameter accordingly. Otherwise, you would decrease the value of the query_cache_size parameter to save memory resources for the other MySQL cache buffers. If qcache_hit is 0 in the runtime, you would completely turn off the query cache by setting query_cache_type as 0, together with setting query_cache_size as 0, since there is some overhead caused by having the query cache enabled besides wasting the memory resource. If the application uses many simple SELECT queries without them being repeated, having the query cache enabled may actually impede performance by 5-10%. However, for applicationswith many repeated SELECT queries with large resultsets, the performance increase set by the query cache can be 200% or more.
· tmp_table_size – set the maximum memory to allocate to a temporary table automatically created during query execution before MySQL converts it into an on-disk MyISAM table. When you see a lot of queries with the state value shown as “copying to tmp table on disk” when running the mysql>SHOW PROCESSLIST(or mysqladmin –i10 processlist extended-status) command, this means that the temporary resultset was larger than the value set by tmp_table_size, so that the MySQL thread copies the temporary table from RAM to disk. , In such cases, increasing the value can speed up execution of large queries; otherwise, decrease the value to save memory for the MySQL I/O bound workload.
· table_cache – Size this cache to keep most tables open since opening tables can be expensive. The optimum value for table_cache is directly related to the number of tables that need to be opened simultaneously in order to perform multiple-table joins. The table_cache value should be no less than the number of concurrent connections times the largest number tables involved in any one join. 1024 is a good value for applications with a couple of hundred tables (each connection has its own entry). You should check the Open_tables status variable to see if it is large compared to table_cache
MySQL Innodb Only Memory-related variables
· innodb_buffer_pool_size – Set the amount of memory allocated to both Innodb data and index buffer cache. If the server requests data available in the cache, the data can be processed right away. Otherwise, the operating system will request that the data be loaded from the disk into the buffer. It is important to set this value as high as possible to use the more efficient innodb data and index buffer cache instead of operating system buffer. For the sysbench I/O bound workload on a T2000 server with 8G RAM, increasing innodb_buffer_pool_size from 4G to 5G can improve performance by around 11%.
· innodb_additional_mem_pool_size - Sets the amount of memory allocated to the buffer storing the InnoDB internal data dictionary and other internal data structures. This parameter does not affect performance much, so set it to 20M (For applications with more tables, more memory needs to be allocated here) for the sysbench OLTP I/O bound test case.
· innodb_log_buffer_size - Set the amount of memory allocated to the buffer storing InnoDB write-ahead log entries. For large transactions, the log can be loaded into the log buffer instead of writing log to the log files on disk untill the log buffer is flushed on each transaction commit. If you see large log I/Os in the show innodb status output at runtime, you probably need to set a larger value for the innodb_log_buffer_size parameter to save disk I/O. For workloads which don’t have long transactions like sysbench, it is not necessary to waste memory resources by setting a higher value for the log buffer; it is fine to set it to 8Mbytes.
Other MySQL Innodb variables impacting I/O Performance
· innodb_flush_log_at_trx_commit - InnoDB flushes the transaction log to disk approximately once per second in the background. As a default, innodb_flush_log_at_trx_commit is set to 1, meaning the log is flushed to the disk at a transaction commit, and modifications made by the transaction won’t be lost during a MySQL, OS, or HW crash. For workloads running with many small transactions, you can reduce disk I/O to the logs to improve performance by setting the innodb_flush_log_at_trx_commit parameter to 0, meaning no log flushing on each transaction commit. However, the transaction might be lost if MySQL crashes. In the sysbench OLTP I/O bound workload test on a T2000 server, setting innodb_flush_log_at_trx_commit =0 in the read-only test can improve performance by 4%. You can set this value to 2 to flush the log to the OS cache to save disk I/O on each transaction commit.
· innodb_log_file_size – Set the size of each log file in a log group. InnoDB writes to the log files in a circular fashion, so the bigger innodb_log_file_size, the less checkpoint flush activity, reducing disk I/O, but increasing recovery time. In the show innodb status output, if there are large page writes in the BUFFER POOL AND MEMORY part, you will need to increase this parameter.
Configure an Optimum Number of User Threads
MySQL is a single-process, multithreaded application. There is one master thread with highest priority to control the server. For every client request, it creates a dedicated user thread running at normal priority in the thread pools to process the user request and send back the result to each client once the result is ready. And there is one single user thread that waits for input from the console, and a group of utility threads running at lower priority to handle some background tasks. Currently, MySQL cannot scale well with the number of concurrent user connections. On a T2000 server, in the OLTP I/O bound read-write sysbench test, MySQL can scale from 2 up to 64 concurrent user threads to reach the peak performance point. After that, increasing the number of user connections will increase the user level lock contention observed from prstat –mL output(LCK) to reduce MySQL performance. For applications where the number of user connections is tunable, you need to test to get the optimum number of user connections for peak performance. For applications where the number of user connections is not tunable, the innodb_thread_concurrency parameter can be configured to set the number of threads working inside the InnoDB engine. You need to increase this value when you see many queries in the queue in show innodb status. Setting this value at 0 will disable it. On the T2000 server, we set it to be around 2*(Num of disks) in the sysbench OLTP I/O bound workload test. Testing and tuning the optimal value for the innodb_thread_concurrency parameter according to the kind of workload, and behavior of your system at runtime, can affect performance significantly.
Optimize File system Performance on T2000
File system performance have a big impact on system performance -- particularly when running an I/O bound workload with a database size much bigger than system memory. How to configure the file system for better performance depends on the workload access pattern: random or sequential. For a sequential workload, we can increase the file system cluster size (the maxcontig parameter) to allow read ahead or writing back more data from/to the disk to reduce the total number of I/O operations. For random workloads, we can reduce the file system cluster size to match the innodb I/O size. On the Solaris platform, maxcontig is set as 128 by default, which will trigger read-ahead for the whole file system cluster length (128*8 Kbytes on Solaris Sparc, 128*4Kbytes on Solaris x86) or the maximum size of physical I/O set in maxphys system variable. In the sysbench OLTP I/O bound test (a random workload,) for example, it can saturate a disk and significantly degrade performance because of it. One way to solve this problem is to reduce the value of the maxcontig parameter. In the sysbench OLTP I/O bound tests on the T2000, changing the maxcontig value to be 2 by using the tunefs –a 2 /dev/dsk/c4t1d0s6 command on the file system improved performance by 10%-13%. The shortcoming of this solution is that it will impact the performance of other sequential workloads on your system. The other way to improve performance is to disable file system caching with the UFS mount option: forcedirectio (mount –o remount,forcedirectio /data). Since innodb has its own buffer cache for the data and index(set by innodb_buffer_pool_size) which is more efficient than the operating system cache, we can use filesystem directio to save the double buffering and automatically disable read-ahead to benefit the random workload. On a T2000 installed with Solaris 10 update 1 to update 3, it is important to add set auto_lpg_maxszc=1 to the /etc/system suggested by Aleksandr Guzovskiy to reduce max pagesize for heap/stack/mmap to be 64k(default is up to be 256M) while using filesystem directio, otherwise, you may encounter a big performance drop with directio while multiple concurrent user threads are accessing the MySQL database. On the T2000 in the sysbench OLTP I/O bound tests, using directio improved performance by 14%-17%. The shortcoming of using filesystem directio is that it will significantly impact the performance of other applications on your system which don’t have internal caching (like the MySQL MyISAM engine which does not have its own data buffer cache) but instead depend on the filesytem caching to implement buffering.
Expected MySQL Performance On Niagara 2 UltraSPARC T2 Server
For disk I/O workload, MySQL can benefit the new features on Niagara 2 servers compared to Niagara 1 servers:
Larger L2 Cache: The on-chip 4MB L2 cache on Niagara 2 server can cache frequently accessed memory to get better MySQL performance.
Larger memory: Up to 512GB memory of fully buffered DIMMs in the integrated memory controller with an aggregated memory bandwidth of 64 gbps on the Niagara 2 server can buffer larger databases into the memory and reduce the time the CPU spends waiting for data to arrive. By adjusting how much memory MySQL innodb uses, we can expect to get significant performance improvements.
X8 PCI Express: PCI-E directly on-chip can reduce latency to speed up MySQL disk I/O performance as expected.
More paper on UltraSPARC T2 Server Technology, performance, etc.
Allan Pack's Weblog: CMT Comes Of Age
Example MySQL options
Here is the example of /etc/my.cnf on T2000(32x1200MHz, 8GB RAM, Solaris 10 11/06) in sysbench OLTP I/O bound test(100M-row):
[mysqld]
port = 3306
socket = /tmp/mysql.sock
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
log-error = /data/error.txt
user=root
skip-locking
max_connections = 3000
table_cache = 1024
max_allowed_packet = 1M
sort_buffer_size = 64K
thread_cache = 8
thread_concurrency = 32
query_cache_size = 0M
query_cache_type = 0
default-storage-engine = innodb
transaction_isolation = REPEATABLE-READ
tmp_table_size = 1M
innodb_data_file_path = ibdata1:100M:autoextend
innodb_buffer_pool_size = 5500M
innodb_additional_mem_pool_size = 20M
innodb_log_file_size =1900M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit =1
innodb_lock_wait_timeout = 300
innodb_max_dirty_pages_pct = 90
innodb_thread_concurrency =32
Example /etc/system on T2000
set pcie:pcie_aer_ce_mask=0x1
set ip:dohwcksum=0
set autoup=900
set tune_t_fsflushr=1
set auto_lpg_maxszc=1
Monday Aug 20, 2007
MySQL data service running on a clustered-server model provide orderly startup, shutdown, fault monitoring, and high availability(failover) mechanism compared to the single server model. It can be free donwloaded, and Sun also released Open HA cluster, derived from the Sun Cluster 3.2 agents including HA MySQL data service. The open HA cluster also provide build tools necessary to develop new features, build and use the code. In addition, Sun cluster 3.2 Data service for MySQL has the advantages including:
It is easy to install and configure with straightforward GUI and command line interfaces.
Expanded support for Solaris Containers(Solaris zones)
Expanded support for SMF
ZFS is fully supported as a local highly available fileystem
The following is the OS and HW setup for two-node cluster:
Solaris 10 11/06 installed on the two nodes
Each node has two network interfaces to be used as point-to-point private interconnects, and one network interface connect to the public network interface.
Two storage device(SE6120) connected to the two nodes

Figure
1:MySQL Replication Clustered Model
In the above MySQL clustered-server configuration, logical host name is set as the failover IP address within the same subnet. When the MySQL resource online, the failover IP is plumbed on the node where the MySQL resource is running. If a failover happen, the IP address moves along with the MySQL resource to the failover node. In this example, the configuration is:
|
|
Name |
Interface |
IP address |
|---|---|---|---|
|
Logical Hostname |
10.6.241.210 |
|
|
|
Node1 |
Sunfire-x64-241-02 |
bge0 |
10.6.241.208 |
|
Node2 |
Sunfire-x64-241-03 |
bge0 |
10.6.241.209 |
For the private network interfaces as cluster interconnection, Sun cluster installation will configure the network assigning the private network addresses. Note: Do NOT configure the private network interfaces before installing Sun cluster, otherwise, you will fail to install Sun cluster and get the error message as:
Adapter “ce0” is already in use as a public network adapter
After the HW and network is setup as the above, the general task flow of MySQL data service installation and configuration is:

Figure 2:MySQL Data Service Installation & Configuration Flow chart
Step 1: Plan Installation
Sun cluster requires to set aside a special file system named as: /globaldevices on one of the local disks for use in managing global devices on the two nodes.This file system is later mounted as a cluster file system as:
(optional) Setup cluster environment
PATH=/usr/bin:/usr/cluster/bin:/usr/local/mysql/bin:/usr/sbin:/usr/ccs/bin:/usr/cluster/man:$PATH
On both nodes, update the /etc/inet/ipnodes file with all public hostnames for the cluster
Sunfire-x64-241-02# vi /etc/inet/ipnodes 127.0.0.1 localhost 10.6.241.208 Sunfire-x64-241-02 loghost 10.6.241.209 Sunfire-x64-241-03 Sunfire-x64-241-03# vi /etc/inet/ipnodes 127.0.0.1 localhost 10.6.241.209 Sunfire-x64-241-03 loghost 10.6.241.208 Sunfire-x64-241-02
On both nodes, add the following entry in the /etc/system file
set ce:ce_taskq_disable=1 exclude:lofs
The first entry supports ce adapters for the private interconnect, and the second entry disable the loopback file system(LOFS)
5. local-mac-address? variable must set as true for Ethenet adapters. On Solaris x86, use the command: #eeprom local-mac-address?=true; On Solaris SPARC, change the local-mac-address? Variable to be ture from OBP OK> prompt
Step 2: Sun Cluster and MySQL data service Installation
Solaris cluster(sun cluster, Sun cluster Geographic Edition and Sun cluster agents) 3.2 is downloadable at:
http://www.sun.com/download/products.xml?id=4581ab9e
On both nodes, run the installer command to bring up the installer GUI . The installation of Sun cluster and Sun cluster for HA MySQL is straightforward to follow the instructions on the screen. Choose Sun cluster Core 3.2 core software, Sun cluster HA for MySQL to install, and choose Configure later before the installation.
Step 3: Configure Sun Cluster Software
Perform this procedure from one node of the cluster to configure Sun Cluster software on all nodes of the cluster:
1. # /usr/cluster/bin/scinstall
2. From the main menu, pick the “Option 1” to “Create a new cluster or add a cluster node”
3. From the new cluster and Cluster Node menu, pick the “Option 1” to “Create a new cluster”
4. From the Typical or Custom Mode, pick the “Option 1” for typical mode
5. From the Cluster Nodes menu, type the node name: “Sunfire-x64-241-02”, and “Sunfire-x64-241-03”
6. Provide the first and the second private adapter name: ce0, bge1
7. Type “no” for “ disable automatic quorum device selection”
8. Type “yes” for “ create the new cluster”
9. Type “no” for “ Interrupt cluster creation for sccheck errors”
At this point, the scinstall utility configures all cluster nodes and reboot the cluster. The cluster is established when all nodes have successfully booted into the cluster. Sun Cluster installation output is logged in /var/cluster/logs/install/scinstall.log.N file.
10. Verify cluster setup:
Sunfire-x64-241-03# clquorum list
Sunfire-x64-241-03
Sunfire-x64-241-02
Sunfire-x64-241-03# clnode status
Cluster Nodes ===
--- Node Status ---
Node Name Status
--------- ------
Sunfire-x64-241-03 Online
Sunfire-x64-241-02 OnlineWhen the scinstall utility finishes, The cluster is now ready to configure the components you will use to support highly available MySQL, including device groups, and file systems.
Step 4: Configure Volume Manager and File System
In a shared disk set configuration in this example, two hosts are physically connected to the same set of disks. When one node fails, another node has exclusive access to the disks. Each node can control a shared disk set, but only one host can control it at a time.
From one node: Sunfire-x64-241-02, create one disk set for MySQL data service, so that Sunfire-x64-241-02 is make as the primary node:
Sunfire-x64-241-02# metaset -s mysqlset -a -h Sunfire-x64-241-02 Sunfire-x64-241-03
Verify that the configuration of the disk sets is correct and visible to both nodes.
Sunfire-x64-241-02# metaset Set name = mysqlset, Set number = 1 Host Owner Sunfire-x64-241-02 Yes
Sunfire-x64-241-03
From the primary node: Sunfire-x64-241-02, list the DID mappings:
Sunfire-x64-241-02# cldevice show |grep Device
=== DID Device Instances ===
... DID Device Name: /dev/did/rdsk/d5 Full Device Path: Sunfire-x64-241-02:/dev/rdsk/c3t60003BACCC90200046264D58000A22E3d0 Full Device Path: Sunfire-x64-241-03:/dev/rdsk/c5t60003BACCC90200046264D58000A22E3d0
...
Add /dev/did/rdsk/d5 to the MySQL disk setup
Sunfire-x64-241-02# metaset -s mysqlset -a /dev/did/rdsk/d5
Verify that the configuration of the disk set is correct
Sunfire-x64-241-02# metaset -s mysqlset
On both nodes, create the /etc/lvm/md.tab file with the following entries
mysqlset1/d0 -m mysqlset1/d10 mysqlset1/d10 1 1 /dev/did/rdsk/d5s0 mysqlset1/d1 -p mysqlset1/d0 50G
mysqlset1/d2 -p mysqlset1/d0 50Gmysqlset1/d3 -p mysqlset1/d0 50G
7. From Sunfire-x64-241-02, take ownership for the mysql disk set and activate the volume
Sunfire-x64-241-02# cldevicegroup switch -n Sunfire-x64-241-02 mysqlset Sunfire-x64-241-02# metainit -s mysqlset -a
8. Verify the status of the volume for the disk setup
Sunfire-x64-241-02# metastat ... Status: Okay ...
9. Create the cluster file system for use Sun Cluster HA for MySQL. From Sunfire-x64-241-02,create te file systems: Sunfire-x64-241-02# newfs /dev/md/mysqlset1/rdsk/d1 Sunfire-x64-241-02# newfs /dev/md/mysqlset1/rdsk/d2 Sunfire-x64-241-02# newfs /dev/md/mysqlset1/rdsk/d3
10. On both node, create the mount-point directory for the file systems
# mkdir -p /global/mysql - mysql master and slave servers
# mkdir -p /global/mysql-data1 – data directory for mysql master server
# mkdir -p /global/mysql-data2 – data directory for mysql slave server11. On both node, add entries to the /etc/vfstab file for the above mount points
12. On both nodes, mount the file systems, and verify that the file systems are mounted
# mount /global/mysql on /dev/md/mysqlset/dsk/d1 read/write/setuid/devices/intr/largefiles/logging/noquota/global/xattr/nodfratime/onerror=panic/dev=1544001 on Tue Aug 14 17:12:45 2007
/global/mysql-data1 on /dev/md/mysqlset/dsk/d2 read/write/setuid/devices/intr/forcedirectio/largefiles/logging/noquota/global/xattr/nodfratime/onerror=panic/dev=1544002 on Tue Aug 14 17:12:50 2007
/global/mysql-data2 on /dev/md/mysqlset/dsk/d3 read/write/setuid/devices/intr/forcedirectio/largefiles/logging/noquota/global/xattr/nodfratime/onerror=panic/dev=1544003 on Tue Aug 14 17:12:56 2007
13. Verify the Sun cluster installation & configuration before registering and configuring Sun cluster HA for MySQL
Sunfire-x64-241-02# cluster check
Step 5: Configure Sun Cluster Resource for MySQL
Register SUNW.gds, SUNW.HAStoragePlus resource type
Sunfire-x64-241-02# scrgadm -a -t SUNW.gds
Sunfire-x64-241-02# scrgadm -a -t SUNW.HAStoragePlus
Create MySQL resource group named MySQL-failover-resource-group
Sunfire-x64-241-02# scrgadm -a -g MySQL-failover-resource-group
Create the HAStoragePlus resource named MySQL-has-resource in the MySQL-failover-resource-group resource group for MySQl disk storage
Sunfire-x64-241-02# scrgadm -a -j MySQL-has-resource -g MySQL-failover-resource-group -t SUNW.HAStoragePlus -x FilesystemMountPoints=/global/mysql-data1,/global/mysql-data2
Create a logical hostname resource named MySQL-lh-resource
Sunfire-x64-241-02# scrgadm -a -L -j MySQL-lh-resource -g MySQL-failover-resource-group -l 10.6.241.210
To verify the logical hostname resource is online, you can run “ifconfig -a” to see if the virtual IP address is configured on the network interface.
Enable the failover resource group including the MySQL disk storage and logical hostname resources.
Sunfire-x64-241-02# scswitch -Z -g MySQL-failover-resource-group
Step 6: Install and Configure MySQL
On both nodes, install MySQL under /usr/local/mysql, which is symbol link to /global/mysql
Sunfire-x64-241-02# mysql_install_db –datadir=/global/mysql-data1
Sunfire-x64-241-02# chown -R root .
Sunfire-x64-241-02# chown -R mysql /global/mysql-data1
Sunfire-x64-241-02# chgrp -R mysql .
Copy the sample “my.cnf_sample_master” and “my.cnf_sample_slave” under “/opt/SUNWscmys/etc” to the MySQl data directory(mysql-data1, mysql-data2) of the MySQL master and slave machine.
Modify the sample my.cnf file to point to the right directories for the data and log files. “bind-address” must be set with the logical hostname as “10.6.241.210” in this example. Please note: it need to set bind-address = ip number, because of the unfixed MySQL bug on Solaris amd64 OS.
Step 7: Modify MySQL Configuration Files
Go to the directory /opt/SUNWscmys, add cluster's information in the mysql_config file.
Sunfire-x64-241-02# vi mysql_config ... MYSQL_USER=root MYSQL_PASSWD=password MYSQL_HOST=10.6.241.210 ->Logical hostname(IP) FMUSER=fmuser FMPASS=fmuser MYSQL_SOCK=/tmp/10.6.241.212.sock MYSQL_NIC_HOSTNAME="Sunfire-x64-241-02 Sunfire-x64-241-03" ->Physical hostname
Add cluster's information in the ha_mysql_config file
Sunfire-x64-241-02# vi ha_mysql_config ... RS=MySQL-failover-resource-group RG=MySQL-failover-resource-group PORT=3306 LH=10.6.241.210 HAS_RS=mysql-has-resource
... BASEDIR=/usr/local/mysql DATADIR=/global/mysql-data1 MYSQLUSER=mysql MYSQLHOST=Sunfire-x64-241-02 FMUSER=fmuser FMPASS=fmuser LOGDIR=/global/mysql-data1/logs CHECK=YES
Step 8: Enable Sun cluster HA For MySQL
Start MySQL server with “–skip-grant-table” option
Sunfire-x64-241-02# mysqld_safe –defaults-file=/global/mysql-data1/my.cnf –datadir=/global/mysql-data1 –skip-grant-table –user=mysql &
Change the password for root as “password”
Sunfire-x64-241-02# mysql -S /tmp/10.6.241.210.sock -u root
mysql>UPDATE mysql.user set Password=PASSWORD('password')
-> where User='root';
mysql>FLUSH PRIVILEGES;
Shutdown and restart the MySQL servers without the “–skip-grant-table” option on the node where the resource group is online(check it with the “scstat -g” command)
Prepare the Sun cluster specific test database.
Sunfire-x64-241-02# cd /opt/SUNWscmys
Sunfire-x64-241-02# ./mysql_register -f /opt/SUNWscmys/util/mysql_config
sourcing /opt/SUNWscmys/util/mysql_config and create a working copy under /opt/SUNWscmys/util/mysql_config.work
MySQL version 5 detected on 5.10/SC3.2 Add faulmonitor user (fmuser) with password (fmuser) with Process-,Select-, Reload- and Shutdown-privileges to user table for mysql database for host Sunfire-x64-241-02 Add SUPER privilege for fmuser@Sunfire-x64-241-02 Add faulmonitor user (fmuser) with password (fmuser) with Process-,Select-, Reload- and Shutdown-privileges to user table for mysql database for host Sunfire-x64-241-03 Add SUPER privilege for fmuser@Sunfire-x64-241-03 Create test-database sc3_test_database Grant all privileges to sc3_test_database for faultmonitor-user fmuser for host Sunfire-x64-241-02 Grant all privileges to sc3_test_database for faultmonitor-user fmuser for host Sunfire-x64-241-03 Flush all privileges Mysql configuration for HA is done
Shutdown the mysql server
Sunfire-x64-241-02# mysqladmin -S /tmp/10.6.241.210.sock shutdown -p
Register resource
Sunfire-x64-241-02# cd /opt/SUNWscmys
Sunfire-x64-241-02# ./ha_mysql_register -f /opt/SUNWscmys/util/ha_mysql_config
Enable each MySQL resource
scswitch -e -j MySQL-has-resource
scswitch -e -j MySQL-lh-resource
Step 9: Verify Sun cluster HA for MySQL configuration
Once all the MySQL resources are created and configured, and online(check with the “scstat -g” command), you should go ahead to see if the MySQL database can successfully fail over to each node configured in the resource group(MySQL-failover-resource-group). This can be verified by running with “scswitch” to switch MySQL resource group to another node to fail the resouce group to:
#scswitch -z -g MySQL-failover-resource-group -h Sunfire-x64-241-03
If you can successfully migrate the database to each node in the cluster, you now have highly available MySQL database.
At this point, by following the above nine steps, I have completed the basics of deploying highly available MySQL database. To deploy the Sun Cluster MySQL data service to achieve the maximum availability into the production environment, I would highly recommend you read through the Sun cluster documents and MySQL data service guides and verify everything in a test environment in advance.
Friday May 25, 2007
Inserting user-defined DTrace probes into MySQL source
code is very useful to help user identify the performance problems in the
application level and the database server, In addition, the cost of the USDT
probe is basically neglectable. Each probes inserted
into the src can be enabled by adding the code like:
If
(PROVIDER_PROBE_ENABLED()
{
PROVIDER_PROBE(arg0,…);
}
The steps
to add DTrace probes into MySQL
is very straightforward.
Step 1:
Figure out what probes are needed to insert into the source code
This is the difficult part
that requires you understand the MySQL implementation
details. Generally, it is good to insert probes to clarify the DB response time
distribution including processing query, waiting on locks and latches, doing
disk I/O, receiving/sending back data. You can
certainly define more probes deep into each of the MySQL
engines (such as: define probes to measure the cost of innodb
sync spin wait)
Step 2: Define
Provider and probes
Create a mysqlprovider.d
file as:
provider mysql {
probe query__execute__start(int);
probe query__execute__finish(int);
…
};
It is required to define the
probes with easy to understand name. The two underscore(__)
is translated to hyphen(-) in the D script file, so the above two probes are
called query-execute-start and query-execute-finish
Step 3: Define
header file for probes
Create mysqlprovider.h
file as:
#ifndef _MYSQLPROVIDER_H
#define _MYSQLPROVIDER_H
#ifdef ENABLE_DTRACE
#define MYSQL_QUERY_EXECUTE_START(arg0)
\
__dtrace_mysql__query_execute__start(arg0)
#define MYSQL_QUERY_EXECUTE_START_ENABLED()
\
__dtraceenabled_mysql__query_execute__start()
#define MYSQL_QUERY_EXECUTE_FINISH(arg0)
\
__dtrace_mysql__query_execute__finish(arg0)
#define MYSQL_QUERY_EXECUTE_FINISH_ENABLED()
\
__dtraceenabled_mysql__query_execute__finish()
extern void __ dtrace_mysql__query_execute__start(int)
extern int __ dtraceenabled_mysql__query_execute__start(void)
extern void __ dtrace_mysql__query_execute__finish(int)
extern int __ dtraceenabled_mysql__query_execute__finish(void)
#else
/*
*Unless DTrace is explicitly
enabled with –enable-dtrace, the MYSQL macros will
expand to no-ops.
*/
#define MYSQL_QUERY_EXECUTE_START(arg0)
\
__dtrace_mysql__query_execute__start(arg0)
#define MYSQL_QUERY_EXECUTE_START_ENABLED()
\
__dtraceenabled_mysql__query_execute__start()
#define MYSQL_QUERY_EXECUTE_FINISH(arg0)
\
__dtrace_mysql__query_execute__finish(arg0)
#define MYSQL_QUERY_EXECUTE_FINISH_ENABLED()
#endif
#endif /*
_MYSQLPROVIDER_H */
Step 4: Insert
the probes into source code
You need to include the
header file created for DTrace probes before
inserting the probe macro. And in order to monitor the server behavior as
expected, it requires the knowledge of the MySQL
source code to add the probe macro into the right place.
#include <mysqlprovider.h>
mysql_parse {
…
bool
mysql_execute_command(THD *thd)
{
MYSQL_QUERY_EXECUTE_START(thd->thread_id);
…
case SQLCOM_EXECUTE:
{
mysql_sql_stmt_execute(thd);
MYSQL_QUERY_EXECUTE_FINISH(thd->thread_id);
Break;
}
….
}
Step 5:
Build MySQL with DTrace
You will need to specify the
“—enable-dtrace” as the configure option to make the DTrace probes available in MySQL
on Solaris 10 and above. On the other operating system without the DTrace facility, the DTrace
probes are disabled as default.
In the Makefile,
you can compile the 64-bit MySQL with DTrace probes as bellow:
mysqlproviders.o: mysqlproviders.d $(mysqld_OBJECTS)
dtrace -G -64 -s mysqlproviders.d $(mysqld_OBJECTS)
Now, at this point, you have
completed inserting the DTrace probes into MySQL, and the probes are ready to use. For example, to use
the query-execute-start and query-execute-stop probes, you can write a simple D
script(query-execute.d) to
measure the time spending on the query execution for each session.
#!/usr/sbin/dtrace –qs
mysql*:::query-execute-start
{
self->init = timestamp;
}
mysql*:::query-execute-finish
/self->init/
{
@inittime[args[0]]
= sum(timestamp – self->init);
self->init = 0;