- Executive Summary
- Document Scope
- Benefits of Solaris Cluster
- Setup environment
- Installation of Solaris Cluster and HA MySQL Service s/w
- Configuring a failover file system using a ZFS pool
- Installation of MySQL Server
- Upgrade Testing : MySQL 5.0 to 5.1
- Internal test suite run
- S/W Fault Test Run
- S/W Fault Regression Test Run
- Uninstalling Solaris Cluster
- Software links
- References
Executive Summary
This document illustrates the deployment of MySQL on Solaris Cluster (SC). It also covers regression and failover testing of HA MySQL, and describes the tests performed. Solaris 10 fully supports MySQL and the HA cluster application agent (data service) for MySQL.
A cluster provides a single view of services for applications such as databases, web services, and file services. Services can scale to additional processors with the addition of nodes. A data service is an application designed to run under the control of a cluster.
The MySQL Open Source database is multi-threaded and consists of an SQL server, client programs and libraries, administrative tools, and APIs. Sun acquired MySQL AB in January 2008.
The objective is to facilitate adoption of the Sun stack by the Open Source community, and to reinforce Sun's commitment to open source products, including Open Source databases.
Document Scope
The scope of the deployment environment comprises MySQL Community Server on a Solaris 10 global zone. While the platform of choice was SPARC, the results are identical if deployed on an x86-64 platform.
The scope of testing includes upgrade testing, regression testing, and failover testing. Testing with MySQL Cluster and performance benchmarking are outside the scope.
Benefits of Solaris Cluster
Reduced system downtime, higher availability of systems and applications, and increased application throughput via scalable services are some of the benefits realized when using Solaris Cluster.
The Solaris Cluster framework includes Sun Cluster, Sun Cluster Geographic Edition, developer tools and support for commercial and open-source applications through agents. It provides high availability and disaster recovery for local and geographic clusters. This leads to an increased choice in storage replication and networking configurations.
A data service consists of an application, cluster configuration files, and management methods that control starting, stopping, and monitoring of the application. The public network interface is a logical hostname or a shared address.
Solaris Cluster provides High Availability (HA) for enterprise applications. HA components can survive a single s/w or h/w failure. The I/O fencing feature blocks a faulty node from accessing shared storage, and so ensures the integrity of data. For example, in a 2-node cluster that has a disabled interconnect, each node may assume that it is part of a cluster, and may try to form one. This potential split-brain scenario is avoided by I/O fencing.
A logical interface is presented to the applications. There is no single point of failure in the NICs. Striped traffic over the interconnects with transparent failover results in better network utilization.
Solaris Cluster has a proven monitoring and failover mechanism. A wide choice of low-latency, high-bandwidth interconnects (for example, InfiniBand and Dolphin) is available for deployment, and these yield higher throughput.
Device path names are uniform across nodes for the shared LUNs, thereby resulting in easier manageability. Node times can be synchronized during configuration.
A comprehensive list of HA applications (data services) is available, and all configurations are certified and tested. Tight integration with the Solaris kernel results in better fault management and a better heartbeat mechanism.
Setup Environment
A traditional cluster includes two or more nodes cabled together via private interconnects, and connected simultaneously to shared storage devices (multihost devices). Network adapters provide client access to the cluster. Different architectures and topologies can be used for deployment. The core cluster and data service software, and disk management software are used collectively to monitor, access and administer various resources.
The setup comprises two SPARC V1280 systems connected by two private interconnects. The MySQL Community Server version used is 5.0.45.
- SPARC platform :
- Nodes : v1280-137-03 and v1280-137-04
- CPU : 12 UltraSPARC-III+ x 1200 MHz
- Memory : 49,152 MB and 98,304 MB
- Storage : SE 6120 (14 x 73 GB )
- Operating System : Solaris 10 SPARC, Update 4
After a default OS installation, it is recommended to install the patches for the OS version being used, from http://sunsolve.sun.com .
Ensure that the SC3.x version being used has support for Solaris 10, Update x.
Installation of Solaris Cluster and HA MySQL Service s/w
This section describes the installation and configuration of the Sun Cluster and the HA MySQL service s/w components.
On the first node, perform these steps as root user :
Remove the previous product registry, if any :
# /var/sadm/prod/SUNWentsys5/uninstall
Download and uncompress the SC s/w .zip file : Navigate to sun.com, click on 'downloads' , and follow the links; (suncluster-3_2-ga-solaris-sparc.zip)
Set the PATH variable to include the directory of the Sun Cluster binaries :
# PATH=$PATH:/usr/cluster/bin:/usr/ccs/bin:/usr/sfw/bin; export PATH
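If the PATH line above is placed in a login profile, it gets re-run and PATH grows with duplicates. A minimal sketch of an idempotent variant follows; add_to_path is a hypothetical helper name (not part of Sun Cluster), and a POSIX-style shell such as ksh or bash is assumed :

```shell
# Append a directory to PATH only if it is not already present.
# add_to_path is a hypothetical helper, not a Sun Cluster command.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already in PATH: nothing to do
        *) PATH="$PATH:$1" ;;         # otherwise append it
    esac
}

# The three directories are the ones used in this document.
for dir in /usr/cluster/bin /usr/ccs/bin /usr/sfw/bin; do
    add_to_path "$dir"
done
export PATH
```

Running the loop a second time leaves PATH unchanged.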
Navigate to the sc/Solaris_sparc sub-directory, and type :
# ./installer
1. Choose 'No' for :
Install the full set of Sun Java(TM) Enterprise System Products and Services?
2. Choose options 4,6 to select these products, and then 'A' on each to install all its components :
[X] 4. Sun Cluster 3.2
[X] 6. Sun Cluster Agents 3.2
3. Choose 1 for :
Upgrade the shared components that were installed in the previous step ?
4. Choose 2 :
Configure Later - Manually configure following installation
5. # scinstall
6. Choose 1 :
Create a new cluster or add a cluster node
7. Choose 2 :
Create just the first node of a new cluster on this machine
8. Choose the 'Typical' mode of operation
9. Choose mysql-ca as the name of the cluster
10. Choose 'Yes' for :
Do you want to run sccheck (yes/no) ?
11. Enter the 2 node names when prompted, and press Control-D to complete the configuration :
v1280-137-03
v1280-137-04
^D
12. Configure at least two cluster transport adapters (eg. ce0 and ge0 here).
Choose 'Yes' to confirm each as being dedicated.
13. Choose 'No' for :
Do you want to disable automatic quorum device selection (yes/no) [no]?
14. Choose 'yes' for :
Do you want scinstall to reboot for you (yes/no) [yes]?
15. Choose 'Enter' to reconfirm the options to scinstall
Monitor via an nts console (Esc. sequence : shift + tilde followed by '.')
On the 2nd node, perform all the above steps (1-15 as applicable), as done for the 1st node, except for these differences :
16. Choose 1 :
Create a new cluster or add a cluster node
17. Choose :
Add this machine as a node in an existing cluster
18. Choose 'Typical' mode of operation, and supply the cluster name chosen earlier
19. Choose yes for :
Do you want to use autodiscovery (yes/no) [yes]?
20. Choose remaining prompts similar to those for the first node.
Once the nodes have booted into the cluster, the configuration of the Sun Cluster and HA MySQL s/w is complete.
Configuring a failover file system using a ZFS pool
This section describes the creation and configuration of a failover file system using a ZFS pool.
The database will reside on the failover file system mounted at /global/mysql, whereas the MySQL binaries will be installed on local file systems. A failover file system performs better than a cluster file system, because the nodes do not all have to commit. However, as only one node sees the file system at a time, there is a slight increase in failover time.
On the first node, perform these steps as root user :
1. Choose a suitable shared disk among those in your setup. The disk names would have two entries, and can be listed by typing :
# scdidadm -L
2. Create the pool with the zpool command (the pool name mysql3 and the shared disk name shown are for illustration) :
# zpool create mysql3 c1t20030003BA13E6A1d0
3. Create the file system, set its mount point, and configure a resource group :
# cd /
# zfs create mysql3/sqlvol
# zfs set mountpoint=/global/mysql mysql3/sqlvol
# clrg create mysql-rg // create a failover resource group
# clrt register SUNW.HAStoragePlus
# scrgadm -at SUNW.HAStoragePlus -x Zpools=mysql3 -g mysql-rg -j sql-stor // create the HAStoragePlus resource named 'sql-stor' in the mysql-rg resource group for the MySQL disk storage.
# clrg online mysql-rg
// bring the failover resource group online; the MySQL Disk storage resource previously added gets enabled. Subsequently, add and enable other resources to it, such as the logical host resource, gds data service resource, and the MySQL resource.
# clrg status mysql-rg // check the status of the resource group
Installation of MySQL Server
This section describes the installation and configuration of the MySQL Server.
On the first node, perform these steps as the root user :
If a MySQL environment already exists, do a cleanup, and remove the MySQL packages; otherwise proceed to Step 1 below :
# clrg offline mysql-rg
# clrg delete -F mysql-rg
# rm -rf /global/mysql/*
# pkgrm mysql
1. Obtain a separate, dedicated hostname and IP address for use as the MySQL resource group failover address (in this example, v1280-logical and 10.x.x.x).
2. On each node, download the packaged release binary file (pkg. format) from the MySQL site dev.mysql.com , and uncompress it into the target directory.
# pkgadd -d mysql-5.0.45-solaris10-sparc-64bit.pkg
# cd /usr/local
# ln -s /opt/mysql/mysql mysql // soft link to the mysql binaries directory
3. From the node on which /global/mysql is mounted, bind the failover IP to the resource group :
# scrgadm -aLl v1280-logical -g mysql-rg // create a resource for the logical hostname in mysql-rg
# clrs enable v1280-logical // enable the logical host resource
# chown -R mysql:mysql /global/mysql
Modify the default mysql configuration file (/global/mysql/my.cnf) :
# cp /opt/SUNWscmys/etc/my.cnf_sample_master /global/mysql/my.cnf
bind-address=10.x.x.x
socket=/tmp/v1280-logical.sock
log=/global/mysql/logs/log1
log-bin=/global/mysql/logs/bin-log
innodb_data_home_dir=/global/mysql/innodb
# /opt/mysql/mysql/scripts/mysql_install_db --datadir=/global/mysql // install the mysql grant tables
# chown -R mysql:mysql /global/mysql
# cd /global/mysql
# mkdir logs innodb
Copy and edit the default file mysql_config, in order to create a fault monitor user and a test database for the MySQL instance. The fault monitor attempts to restart the server in case of a shutdown :
# cp /opt/SUNWscmys/util/mysql_config /global/mysql
MYSQL_BASE=/opt/mysql/mysql
MYSQL_USER=root
MYSQL_PASSWD=admin123
FMUSER=fmuser
FMPASS=fmuser
MYSQL_SOCK=/tmp/v1280-logical.sock
MYSQL_NIC_HOSTNAME="v1280-137-03 v1280-137-04"
# chown -R mysql:mysql /global/mysql
Test MySQL server startup and shutdown :
# /opt/mysql/mysql/bin/mysqld --defaults-file=/global/mysql/my.cnf \
--basedir=/opt/mysql/mysql --datadir=/global/mysql \
--user=root --pid-file=/global/mysql/mysql.pid &
# chown -R mysql:mysql /global/mysql
# /opt/mysql/mysql/bin/mysqladmin shutdown -S /tmp/v1280-logical.sock
Restart the MySQL server, set the administrator password, and grant the administrator privileges so that it can reach the MySQL instance locally and through the logical hostname :
# /opt/mysql/mysql/bin/mysqld --defaults-file=/global/mysql/my.cnf \
--basedir=/opt/mysql/mysql/ --datadir=/global/mysql \
--user=root --pid-file=/global/mysql/mysql.pid &
# /opt/mysql/mysql/bin/mysqladmin -S /tmp/v1280-logical.sock password 'admin123' // set the password of the administrator (root) user
# /opt/mysql/mysql/bin/mysql -S /tmp/v1280-logical.sock -uroot -padmin123
> use mysql
> grant all on *.* to 'root'@'v1280-137-03' identified by 'admin123';
> grant all on *.* to 'root'@'v1280-137-04' identified by 'admin123';
> grant all on *.* to 'root'@'v1280-logical' identified by 'admin123';
> UPDATE user SET Grant_priv='Y' WHERE User='root' AND Host='v1280-logical';
> UPDATE user SET Grant_priv='Y' WHERE User='root' AND Host='v1280-137-03';
> UPDATE user SET Grant_priv='Y' WHERE User='root' AND Host='v1280-137-04';
> exit
Configure MySQL for HA, and register MySQL as a failover data service. Modify the default configuration file ha_mysql_config to include the cluster information :
# /opt/SUNWscmys/util/mysql_register -f /global/mysql/mysql_config
# cp /opt/SUNWscmys/util/ha_mysql_config /global/mysql
RS=mysql
RG=mysql-rg
PORT=3306
LH=v1280-logical
HAS_RS=sql-stor
mysql specifications :
BASEDIR=/opt/mysql/mysql
DATADIR=/global/mysql
MYSQLUSER=mysql
MYSQLHOST=v1280-logical
FMUSER=fmuser
FMPASS=fmuser
LOGDIR=/global/mysql/logs
CHECK=yes
Register the SUNW.gds data service before registering and enabling the MySQL resource :
# chown -R mysql:mysql /global/mysql
# clrt register SUNW.gds
# /opt/SUNWscmys/util/ha_mysql_register -f /global/mysql/ha_mysql_config
# clrs enable mysql & // enable the HA MySQL resource specified by RS in the configuration file
Manually test the failover :
# clrg switch -n 2 mysql-rg
# clrg status mysql-rg
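After a switchover, clrg status shows the state of the resource group but not whether the database answers. A minimal sketch of a client-side check follows, assuming a POSIX-style shell; wait_for_mysql and the MYSQLADMIN variable are hypothetical names, while the mysqladmin path and the socket are the ones used in this document. It relies on 'mysqladmin ping', which exits with status 0 whenever the server is running (even if the login itself is refused), so no password is passed :

```shell
# Poll the MySQL server through its socket until it answers, or give up.
# wait_for_mysql and MYSQLADMIN are hypothetical names for this sketch.
MYSQLADMIN=${MYSQLADMIN:-/opt/mysql/mysql/bin/mysqladmin}

wait_for_mysql() {
    sock=$1
    tries=${2:-30}                    # number of attempts, one per second
    i=0
    while [ "$i" -lt "$tries" ]; do
        # 'mysqladmin ping' exits 0 when the server is running
        if "$MYSQLADMIN" -S "$sock" ping >/dev/null 2>&1; then
            echo "mysqld answered after $i retries"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "mysqld did not answer after $tries attempts" >&2
    return 1
}
```

On the node that now hosts the resource group, it could be called as : wait_for_mysql /tmp/v1280-logical.sock 60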
Upgrade Testing : MySQL 5.0 to 5.1
Simulate a production environment scenario, wherein database upgrades are the norm. Upgrade the MySQL server from version 5.0.45 to 5.1.22 with minimal or no disruption to the underlying stack.
As root user, set PATH and disable the mysql resource :
# PATH=/ws/onnv-tools/SUNWspro/SS11/bin:/usr/ccs/bin:/usr/local/mysql/bin:/usr/local/mysql/libexec:/usr/sbin:/usr/bin:/usr/cluster/bin:/opt/mysql/mysql/bin
# export PATH
# clrs disable mysql
On both the nodes, replace the binaries under /opt/mysql/mysql dir. from 5.0.45-64 to 5.1.22-64 :
# rm -rf /opt/mysql/mysql/*
# cd /opt/mysql/mysql
# gunzip -c mysql-5.1.22-rc-solaris10-sparc-64bit.tar.gz|tar xvf -
On the first node, comment out the 'innodb_arch_dir' entry in /global/mysql/my.cnf (and any other innodb setting that is not valid for 5.1).
On the first node, restart the 'mysqld' server unmonitored :
# ./mysqld --defaults-file=/global/mysql/my.cnf --basedir=/opt/mysql/mysql --datadir=/global/mysql --user=root --pid-file=/global/mysql/mysql.pid &
# chown -R mysql:mysql /global/mysql/innodb /global/mysql/logs /global/mysql/mysql
Run the mysql_upgrade utility to upgrade and repair the grant tables in the target database :
# ./mysql_upgrade -S /tmp/v1280-logical.sock -uroot -padmin123
# ./mysqlcheck -S /tmp/v1280-logical.sock -uroot -padmin123 --all-databases // run optionally
Refresh the grant privileges for the administrator :
# /opt/mysql/mysql/bin/mysql -S /tmp/v1280-logical.sock -uroot -padmin123
> use mysql
> grant all on *.* to 'root'@'v1280-137-03' identified by 'admin123';
> grant all on *.* to 'root'@'v1280-137-04' identified by 'admin123';
> grant all on *.* to 'root'@'v1280-logical' identified by 'admin123';
> UPDATE user SET Grant_priv='Y' WHERE User='root' AND Host='v1280-logical';
> UPDATE user SET Grant_priv='Y' WHERE User='root' AND Host='v1280-137-03';
> UPDATE user SET Grant_priv='Y' WHERE User='root' AND Host='v1280-137-04';
Shut down the server, and enable the mysql resource and fault monitor :
# /opt/mysql/mysql/bin/mysqladmin shutdown -S /tmp/v1280-logical.sock \
--user=root --password=admin123
# clrs enable mysql
Internal test suite run
The MySQL benchmark suite (currently single-threaded) is part of a server installation. It can be used to determine which database operations in an implementation perform well or poorly.
The following run illustrates the setup and completion of the test suite. The actual performance numbers are not the focus, this being a default implementation.
Run the internal test suite on a MySQL 5.0.45 32-bit implementation :
On both nodes, download and install the MySQL DBD driver and the Perl modules needed to access the database servers :
As root user, download and uncompress the latest DBI/DBD files from :
http://www.cpan.org/modules/by-category/07_Database_Interfaces/DBD/
(Eg. Currently DBI-1.602.tar.gz and DBD-mysql-4.006.tar.gz).
Set a soft link to /usr/bin/perl, and add a compiler path to PATH, for example :
# cd /usr/local/bin
# ln -s /usr/bin/perl perl
# PATH=/ws/onnv-tools/SUNWspro/SS11/bin:/usr/ccs/bin:/usr/local/mysql/bin:/usr/local/mysql/libexec:/usr/sbin:/usr/bin:/usr/cluster/bin
# export PATH
Install the DBI module :
# cd DBI-1.602
# perl Makefile.PL
# make
cc -c -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -xarch=v8
-D_TS_ERRNO -xO3 -xspace -xildoff -DVERSION=\"1.602\"
-DXS_VERSION=\"1.602\" -KPIC
"-I/usr/perl5/5.8.4/lib/sun4-solaris-64int/CORE"
-DDBI_NO_THREADS DBI.c
# make test
...
All tests successful, 34 tests and 379 subtests skipped.
Files=126, Tests=5617, 159 wallclock secs (140.38 cusr + 15.71 csys = 156.09 CPU)
test.pl done
# make install
Install the MySQL DBD driver :
# cd DBD-mysql-4.006
# perl Makefile.PL
# make
..
cflags (mysql_config) = -I/usr/local/mysql/include -mt
-D_FORTEC_ -xarch=v8..
..
cc -c -I/usr/perl5/site_perl/5.8.4/sun4-solaris-64int/auto/DBI
-I/usr/local/mysql/include -mt -D_FORTEC_ -xarch=v8
-DDBD_MYSQL_INSERT_ID_IS_GOOD -g -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64 -xarch=v8 -D_TS_ERRNO -xO3 -xspace -xildoff
-DVERSION=\"4.006\" -DXS_VERSION=\"4.006\" -KPIC
"-I/usr/perl5/5.8.4/lib/sun4-solaris-64int/CORE" dbdimp.c
# make test
# make install
Run the test suite :
# cd /opt/mysql/mysql/sql-bench
# ./run-all-tests --socket /tmp/v1280-logical.sock --user=root \
--password=admin123
Benchmark DBD suite: 2.15
...
alter-table: Total time: 28 wallclock secs ( 0.05 usr 0.07 sys + 0.00 cusr 0.00 csys = 0.12 CPU)
ATIS: Total time: 40 wallclock secs ( 5.17 usr 0.83 sys + 0.00 cusr 0.00 csys = 6.00 CPU)
big-tables: Total time: 30 wallclock secs ( 5.56 usr 0.43 sys + 0.00 cusr 0.00 csys = 5.99 CPU)
connect: Total time: 244 wallclock secs (55.75 usr 45.62 sys + 0.00 cusr 0.00 csys = 101.37 CPU)
create: Total time: 208 wallclock secs ( 4.86 usr 3.79 sys + 0.00 cusr 0.00 csys = 8.65 CPU)
insert: Total time: 2118 wallclock secs (434.92 usr 132.81 sys + 0.00 cusr 0.00 csys = 567.73 CPU)
select: Total time: 934 wallclock secs (40.27 usr 8.64 sys + 0.00 cusr 0.00 csys = 48.91 CPU)
transactions: Test skipped because the database doesn't support transactions
wisconsin: Total time: 19 wallclock secs ( 2.28 usr 1.68 sys + 0.00 cusr 0.00 csys = 3.96 CPU)
...
All 9 tests executed successfully
Totals per operation:
Operation                  seconds     usr     sys     cpu    tests
alter_table_add              12.00    0.01    0.01    0.02      100
alter_table_drop             11.00    0.01    0.00    0.01       91
connect                      20.00    9.00    4.02   13.02    10000
connect+select_1_row         23.00    9.42    4.61   14.03    10000
...
update_with_key_prefix       43.00    7.12    5.40   12.52   100000
wisc_benchmark                3.00    1.31    0.04    1.35      114
TOTALS                     3633.00  543.28  193.53  736.81  3425950
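To compare runs (for example, before and after the upgrade), the per-test wallclock times can be pulled out of a saved log. The following is a minimal sketch, assuming the output of run-all-tests was saved to a file and that a POSIX awk is in use (nawk or /usr/xpg4/bin/awk on Solaris 10); wallclock_per_test is a hypothetical helper name :

```shell
# Print "<test> <wallclock seconds>" for every "Total time" line read
# from standard input. wallclock_per_test is a hypothetical helper.
wallclock_per_test() {
    awk '/Total time:/ {
        name = $1
        sub(/:$/, "", name)           # strip the trailing colon of the test name
        for (i = 2; i <= NF; i++)
            if ($i == "wallclock") {  # the number just before "wallclock"
                print name, $(i - 1)
                break
            }
    }'
}
```

For example, wallclock_per_test < bench.log (bench.log being an assumed file name) prints one line per test, such as "insert 2118". Skipped tests, which have no "Total time" line, are left out.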
S/W Fault Test Run
A. Perform a manual failover of an executing MySQL client transaction:
On the primary node (resources and db instance up), execute multiple transactions that insert records in a table. Check that transactions are either committed or rolled back, and that the data is consistent. Measure the switchover (controlled failover) time:
Node 1 - 1st terminal window :
Create a script (eg. insert-data) with a few transactions :
BEGIN;
INSERT INTO t(f) VALUES (1);
INSERT INTO t(f) VALUES (1);
...
COMMIT;
BEGIN;
INSERT INTO t(f) VALUES (2);
INSERT INTO t(f) VALUES (2);
...
COMMIT;
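Typing a long script by hand is tedious, so it can be generated. The following is a minimal sketch, assuming a POSIX-style shell; gen_insert_data and its two arguments are hypothetical names. Every row of transaction number N carries the value N, as in the script above :

```shell
# Emit a series of small transactions for table t(f) on standard output.
# gen_insert_data and its arguments are hypothetical names for this sketch.
gen_insert_data() {
    ntrans=$1                         # number of transactions
    nrows=$2                          # INSERT statements per transaction
    t=1
    while [ "$t" -le "$ntrans" ]; do
        echo "BEGIN;"
        r=1
        while [ "$r" -le "$nrows" ]; do
            # every row of transaction t carries the value t
            echo "INSERT INTO t(f) VALUES ($t);"
            r=$((r + 1))
        done
        echo "COMMIT;"
        t=$((t + 1))
    done
}
```

For example, gen_insert_data 1000 50 > insert-data writes 1000 transactions of 50 rows each. This layout also makes the consistency check easy: after the failover, SELECT f, COUNT(*) FROM t GROUP BY f; should report exactly 50 rows for every value of f that is present, since a transaction is either fully committed or fully rolled back.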
Create an InnoDB table and run the script :
# /opt/mysql/mysql/bin/mysql -S /tmp/v1280-logical.sock -uroot -padmin123
mysql> CREATE TABLE t (f INT) ENGINE=InnoDB;
mysql> source insert-data
Node 1 - 2nd window :
Let the script run for a while. Then, perform a manual failover onto Node 2 :
# clrg switch -n 2 mysql-rg
Node 2 - Window :
Verify that the resource group fails over, and that the mysqld process and /global/mysql are available :
# ps -ef|grep mysql
root 8084 987 0 02:54:56 ? 0:00 /bin/sh -c /opt/SUNWscgds/bin/gds_probe -R mysql -T SUNW.gds:6 -G mysql-rg
root 8085 8084 0 02:54:56 ? 0:00 /opt/SUNWscgds/bin/gds_probe -R mysql -T SUNW.gds:6 -G mysql-rg
mysql 8045 1 0 02:54:45 ? 0:01 ./bin/mysqld --defaults-file=/global/mysql/my.cnf --basedir=/opt/mysql/mysql --
Node 1 - 3rd window :
The script aborts soon after 'clrg switch' begins executing. Those transactions that completed before the abort are committed to the database successfully. The in-flight transaction rolls back completely, and pending transactions are not executed. All the threads are stopped, tables and logs are flushed, and a clean shutdown is performed.
Browse /var/adm/messages on both nodes :
Node 1 :
Mar 14 00:32:46 v1280-137-03 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <gds_svc_stop> completed successfully
Node 2 :
Mar 14 00:32:52 v1280-137-04 Cluster.RGM.rgmd: [ID 224900 daemon.notice] launching method <gds_svc_start> for resource <mysql>, resource group <mysql-rg>, node <v1280-137-04>, timeout <300> seconds
Failover time is the interval between the moment that gds_svc_stop completes on Node 1 and the moment that gds_svc_start launches on Node 2 : 00:32:52 - 00:32:46 = 6 sec.
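The subtraction of two syslog timestamps can be scripted. The following is a minimal sketch, assuming a POSIX-style shell and awk; to_secs and elapsed_secs are hypothetical helper names, and the two events are assumed to be less than 24 hours apart :

```shell
# Convert an HH:MM:SS timestamp to seconds since midnight.
to_secs() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the number of seconds from timestamp $1 to timestamp $2.
# to_secs and elapsed_secs are hypothetical helpers for this sketch.
elapsed_secs() {
    start=$(to_secs "$1")
    end=$(to_secs "$2")
    diff=$((end - start))
    # assumption: a negative result means the interval crossed midnight
    if [ "$diff" -lt 0 ]; then
        diff=$((diff + 86400))
    fi
    echo "$diff"
}

# gds_svc_stop completed on Node 1 -> gds_svc_start launched on Node 2
elapsed_secs 00:32:46 00:32:52    # prints 6
```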
B. Panic or reboot a node, and measure the failover time:
Panic :
On the administrative console of the primary node (resources and db instance up), type :
# uadmin 5 0 // simulate a panic
> ok boot
Browse /var/adm/messages on the secondary node :
Node 2 :
Apr 15 01:17:53 v1280-137-04 cl_runtime: [ID 446068 kern.notice] NOTICE: CMM: Node v1280-137-03 (nodeid = 1) is down.
Node 2 :
Apr 15 01:19:00 v1280-137-04 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <gds_svc_start> completed successfully for resource <mysql>, resource group <mysql-rg>, node <v1280-137-04>, time used: 18% of timeout <300 seconds>
Failover time is the interval between the moment the fault is injected on Node 1 (MySQL DB instance no longer available) and the moment gds_svc_start completes on Node 2 (MySQL DB instance available again) : 01:19:00 - 01:17:53 = 67 sec. Here, the nearest equivalent of the fault injection time is the message on Node 2 reporting that Node 1 is down.
Reboot :
On the primary node (resources and db instance up), type :
# reboot
Browse /var/adm/messages on both nodes :
Node 1 :
Apr 14 23:13:19 v1280-137-03 Cluster.PNM: [ID 226280 daemon.notice] PNM daemon exiting.
Node 2 :
Apr 14 23:14:15 v1280-137-04 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <gds_svc_start> completed successfully for resource <mysql>, resource group <mysql-rg>, node <v1280-137-04>, time used: 6% of timeout <300 seconds>
Failover time is the interval between the moment the fault is injected on Node 1 (MySQL DB instance no longer available) and the moment gds_svc_start completes on Node 2 (MySQL DB instance available again) : 23:14:15 - 23:13:19 = 56 sec. Here, the nearest equivalent of the fault injection time is the message on Node 1 reporting that the pnmd daemon is exiting.
C. Kill the database server process, and measure the restart time :
On the primary node (with all resources and db up), type :
# kill -9 <pid of mysqld>
Browse /var/adm/messages on Node 2 :
Node 2 :
Apr 14 03:36:30 v1280-137-04 Cluster.RGM.rgmd: [ID 784560 daemon.notice] resource mysql status on node v1280-137-04 change to R_FM_FAULTED
Node 2 :
Apr 14 03:37:20 v1280-137-04 Cluster.RGM.rgmd: [ID 515159 daemon.notice] method <gds_svc_start> completed successfully for resource <mysql>, resource group <mysql-rg>, node <v1280-137-04>, time used: 15% of timeout <300 seconds>
Restart time is the interval between the moment the fault is injected on Node 2 (MySQL DB instance no longer available) and the moment gds_svc_start completes on the same node (MySQL DB instance available again) : 03:37:20 - 03:36:30 = 50 sec. Here, the nearest equivalent of the fault injection time is the message on Node 2 reporting the resource status as faulted.
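Rather than browsing /var/adm/messages by eye, the timestamps of the two events can be extracted with a small filter. The following is a minimal sketch, assuming a POSIX-style shell and a POSIX awk (nawk or /usr/xpg4/bin/awk on Solaris 10); last_event_time is a hypothetical helper name. It prints the time field of the most recent line that contains a given fixed string :

```shell
# Print the HH:MM:SS field of the last line in file $1 that contains
# the fixed string $2 (syslog lines start with "Mon DD HH:MM:SS host").
# last_event_time is a hypothetical helper for this sketch.
last_event_time() {
    awk -v pat="$2" '
        index($0, pat) { t = $3 }     # remember the time of each match
        END { if (t != "") print t }  # report the most recent one
    ' "$1"
}
```

For the kill test above, last_event_time /var/adm/messages R_FM_FAULTED and last_event_time /var/adm/messages "method <gds_svc_start> completed successfully" would give the two ends of the interval, provided no later, unrelated event matches the same strings.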
S/W Fault Regression Test Run
An internal regression test suite (developed by Sun's cluster team) was used to execute cluster regression tests. The suite comprises a set of s/w fault tests, and is automated. The kit s/w is installed onto a client machine. The client is configured to access the cluster, and invokes the tests.
In this example, the client m/c is v490-240-01 :
On the cluster machines and on the client, add the 'dats' user, and install the test suite package (SUNWdats).
As root user, add in /etc/passwd :
dats:x:55556:10:DATS Test User:/opt/SUNWdats:/bin/ksh
Add to /etc/group (if not already present) :
staff::10:
# pwconv
# su - dats
> cd /net/dv2.sfbay/vol/qevol/scqe/biweekly/24/lab/sparc/SUNWdats
> pkgadd -d . SUNWdats
On the client, add to /.rhosts :
+ dats
v1280-137-03 +
v1280-137-04 +
Set PATH=/usr/sbin:/usr/bin:/opt/SUNWdats/tset_dataservice/bin
On the cluster nodes, correspondingly set /.rhosts :
+ dats
v1280-137-03 +
v1280-137-04 +
v490-240-01 +
On the client, supply input to generate the data services configuration file :
> cd /opt/SUNWdats/tset_dataservice/bin
> ./get_dsinfo
Enter the name of the output file : mysql-fault
Enter the name of one of the cluster nodes : v1280-137-03
Obtaining cluster configuration information. Please wait...
Do you want to run tests using New Command Set ? [y/n] : y
Select the Data Service Type
1) Failover dataservice with one resource group
2) Scalable dataservice with one Shared Address Resource group
and one Scalable Resource Group
3) Pre-created Resource Group Configuration
4) Other
Enter your selection : 3
Obtaining the registered Resource Group Names on the cluster. Please wait...
These are the registered Resource Groups on the cluster : mysql-rg
Which Resource Groups are needed for this dataservice ?
Enter the Resource Group Names.
End the list with a blank line.
Resource Group Name ? mysql-rg
Resource Group Name ?
You entered the following Resource Group Names :
mysql-rg
Is this correct ? [y/n] : y
Obtaining the Resource Groups/Resources Properties. Please wait...
Processing Resource Group mysql-rg (failover)
Do you have a client program for Resource Group mysql-rg ? [y/n] : n
Resource sql-stor of SUNW.HAStoragePlus:4
Resource sql-stor of SUNW.HAStoragePlus:4 done
Resource v1280-logical of SUNW.LogicalHostname:2
Resource v1280-logical of SUNW.LogicalHostname:2 done
Resource mysql of SUNW.gds:6
Enter the application daemon processes.
End the list with a blank line.
Daemon Process Name ? mysqld
Daemon Process Name ?
You entered the following values
mysqld
Is this correct ? [y/n] : y
Enter the fault monitor daemon processes.
End the list with a blank line.
Fault Monitor Daemon Process Name ? gds_probe
Fault Monitor Daemon Process Name ?
You entered the following values
gds_probe
Is this correct ? [y/n] : y
Resource mysql of SUNW.gds:6 done
Processing of Resource Group mysql-rg Done
Run the test suite :
> ./run_dats -f mysql-fault
The output results are logged into this directory, and indicate the status of each test along with the sub-commands executed :
/opt/SUNWdats/dataservice_results/mysql-ca/results/log.xxxxx
The test suite comprises the following :
Non-Reboot tests :
1: Registration of Resource types for the data service
2: Creation of resource group(s) and resources
3: Bringing the resource group(s) online
# clrg online mysql-rg
4: Disabling Application resources
# clrg online -emM mysql-rg
# clrs disable mysql
5: Enabling Application resources
# clrs enable sql-stor
6: Taking the resource group(s) that contain the application resources offline
# clrg offline mysql-rg
7: Disabling the fault monitor for the application resources
# clrs unmonitor mysql
8: Enabling the fault monitor for the application resources
# clrs monitor mysql
9: Switchover of resource groups containing the application resources
10: Kill the application daemon process repeatedly to exceed the Retry_count within the Retry_interval. This should result in the restarting of the data service on the same node until Retry_count is reached, and failover of the data service after the subsequent kill attempt
# scrgadm -c -j mysql -y Retry_interval=1450
# scrgadm -c -g mysql-rg -y Pingpong_interval=360
# kill -9 <pid of mysqld> // repeat 2-3 times after mysqld restarts; failover should occur eventually
11: If the fault monitor daemon processes associated with a resource are killed, they should automatically be restarted
# kill -9 <pid of gds_probe>
12: Killing pnmd process should not affect the data service
13: Unmanage the resource group and manage it again
# clrs disable mysql
# clrs disable sql-stor
# clrs disable v1280-logical
# clrg offline mysql-rg
# clrg unmanage mysql-rg
Group: mysql-rg v1280-137-03 Unmanaged No
Group: mysql-rg v1280-137-04 Unmanaged No
# clrg manage mysql-rg
Group: mysql-rg v1280-137-03 Offline No
Group: mysql-rg v1280-137-04 Offline No
# clrs enable v1280-logical
# clrs enable sql-stor
# clrs enable mysql
14: Removing application resource
15: Removing the resource group
16: Removing the resource type
17: Check for the presence of the client program information
18: If a daemon process associated with a resource is killed, it should automatically be restarted
19: Rebooting the Primary Zone should not affect the availability of the dataservice. The resource groups/resources should failover to the next available potential Primary node/zone
Reboot tests :
1: Rebooting the primary node should not affect the availability of the data service
2: Killing rgmd on primary node should not affect the availability of the data service
3: Failback property of the resource group should work as expected
4: Check for the presence of the client program information
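Non-reboot test 10 above (repeated kills of mysqld) can also be driven by hand with a small loop. The following is a minimal sketch, assuming a POSIX-style shell and the pgrep utility; kill_mysqld_repeatedly and its arguments are hypothetical names, and the pause must be long enough for the fault monitor to restart mysqld :

```shell
# Kill mysqld a given number of times, pausing after each kill so that
# the fault monitor can restart it. Returns 0 if all kills were done.
# kill_mysqld_repeatedly is a hypothetical helper for this sketch.
kill_mysqld_repeatedly() {
    kills=$1                          # how many times to kill mysqld
    pause=$2                          # seconds to wait between polls
    done_kills=0
    polls=0
    # stop polling after kills * 10 attempts, e.g. once the data
    # service has failed over and mysqld no longer runs on this node
    while [ "$done_kills" -lt "$kills" ] && [ "$polls" -lt $((kills * 10)) ]; do
        polls=$((polls + 1))
        pid=$(pgrep -x mysqld | head -1)
        if [ -n "$pid" ]; then
            kill -9 "$pid"
            done_kills=$((done_kills + 1))
            echo "kill $done_kills of $kills: pid $pid"
        fi
        sleep "$pause"
    done
    [ "$done_kills" -eq "$kills" ]
}
```

For example, kill_mysqld_repeatedly 3 60 on the primary node; clrg status mysql-rg then shows whether the resource group has moved to the other node.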
Uninstalling Sun Cluster
Check the Sun Cluster documentation for uninstalling Sun Cluster. The following is a brief procedure :
On Node 2, execute :
# scswitch -S -h v1280-137-04
# shutdown -g0 -y -i0
On Node 1, execute :
# scconf -c -q node=v1280-137-04,maintstate
# scstat -q
Node votes: v1280-137-04 0 0 Offline
# scconf -r -h node=v1280-137-04
If these messages appear :
scconf: Failed to remove node (v1280-137-04) - node is still cabled or otherwise in use.
scconf: Node "v1280-137-04" is still cabled.
scconf: Node "v1280-137-04" is still in use by quorum device "d2".
Then, execute these :
# clnode clear -F v1280-137-04
# scconf -r -h node=v1280-137-04
# scstat -n
Cluster node: v1280-137-03 Online
# reboot -- -x
# scinstall -r
On Node 2 :
> ok boot -x
Software links
- Solaris Cluster : http://www.sun.com/cluster
- Open Solaris : http://www.opensolaris.org
- MySQL Database : http://www.sun.com/mysql
References
- Solaris Cluster Data Service for MySQL Guide for Solaris OS : http://docs.sun.com/app/docs/doc/819-1088
Very good, congratulations; we are going to implement something like this.
Posted by Juan Garcia on May 11, 2008 at 08:57 PM IST #