- Introduction
- Activity
- Setup and build environment
- MySQL Configuration options
- Studio Compiler flags
- Studio 11 32-bit (1-16 threads) - sysbench read-only OLTP test
- Recommended compiler options
- Studio 12 64-bit (1-8 threads) - sysbench read-only OLTP test
- Software documentation links
- References
- Acknowledgements
Introduction
Solaris 10, Sun's flagship OS, is multi-platform and scalable, and yields massive performance advantages for databases, Web, and Java technology-based services. Its advanced features include security (Process Rights Management), system observability (DTrace), system resource utilization (containers and virtualization), an optimized network stack, data management, system availability (Predictive Self Healing), interoperability tools, and support and services (software subscriptions, hardware support, and technical help).
The Sun Studio compiler collection delivers high-performance optimizing C, C++, and Fortran compilers for the Solaris OS on SPARC, and for both Solaris and Linux on x86/x64 platforms, including the latest multi-core systems.
Sun Fire™ SPARC servers pack up to four UltraSPARC IV chip multithreading (CMT) processors, delivering up to eight concurrent threads, with 32 GB of memory. Coupled with Solaris 10, these servers are capable of delivering very high levels of throughput for demanding departmental and enterprise applications.
MySQL, the most popular open source database, was developed, distributed, and supported by the commercial company MySQL AB, which is now part of Sun following its acquisition. MySQL is multi-threaded and consists of an SQL server, client programs and libraries, administrative tools, and APIs. Java client programs that use JDBC connections can access a MySQL server through the MySQL Connector/J interface.
Activity
The objective was to recommend a set of high-performance Studio compiler flags for the 32-bit MySQL build integrated into project webstack. Webstack addresses the OpenSolaris community's needs for web-tier technologies. It is a bundle of open source software delivered in Solaris and supported by Sun, and contains software that Sun considers critical to its business.
The MySQL source code was compiled on a Sun Fire™ SPARC system with several sets of Sun Studio compiler flags. Each resulting binary was then run against the sysbench workload to measure its throughput.
The recommended flags were integrated into webstack with appropriate MySQL configuration options.
Setup and build environment
The operating system is Solaris 10 Update 4 (s10x_u4wos_11). The C and C++ compilers are part of the Sun Studio compiler collection. The MySQL Community Server version is 5.0.4x, and the sysbench kit version is v0.3.3.
The MySQL Server and the sysbench kit are installed on a Sun Fire™ SPARC server.
- Database node:
  - CPU: 4-core UltraSPARC IV at 1350 MHz
  - Memory: 32,768 MB
  - Operating system: Solaris 10 Update 4
MySQL Configuration options
| Option | Possible reason for inclusion |
|---|---|
| --prefix | Specifies the installation directory |
| --xxdir | Sets the directory for one class of installed files, where xx names the class (for example --bindir, --libdir) |
| --with-server-suffix | Adds a suffix to the mysqld version string |
| --enable-thread-safe-client | Builds the thread-safe client library libmysqlclient_r, in which mysql_real_connect() is thread-safe |
| --with-mysqld-libs | Extra libraries to link into mysqld |
| --with-named-curses=-lcurses | Uses the specified curses library instead of the one configure finds automatically |
| --with-client-ldflags=-static | Extra linker flags for client programs; -static links them statically |
| --with-mysqld-ldflags=-static | Extra linker flags for mysqld; -static links it statically |
| --with-pic | Tries to use only PIC (position-independent code) objects and to avoid non-PIC objects |
| --with-big-tables | Supports tables with more than 4G (2^32) rows, even on 32-bit platforms |
| --with-yassl | Enables SSL connections using the bundled yaSSL library |
| --with-readline | Uses the bundled copy of readline instead of the system readline |
| --with-xx-storage-engine | Enables the xx storage engine |
| --with-innodb | Includes the InnoDB table handler |
| --with-extra-charsets=complex | Compiles into the server the additional character sets that cannot be loaded dynamically |
| --enable-local-infile | Permits LOAD DATA LOCAL INFILE with files on the client-side file system. With LOCAL, the client reads the file and sends it over the MySQL connection, so no other access to the server host is needed |
| --with-ndb-cluster | Enables support for the NDB Cluster storage engine on applicable platforms |
| --with-zlib-dir=bundled | Uses the zlib bundled with the MySQL sources, so client programs link without depending on a system libz (-lz) |
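The 4G-row limit that --with-big-tables lifts is simply the range of a 32-bit unsigned row counter. The C sketch below assumes, for illustration only, that a build without the option counts rows in a 32-bit unsigned integer and a build with it uses a 64-bit one; the type and function names are hypothetical and are not MySQL's own.

```c
#include <stdint.h>

/* Hypothetical row-counter types (illustration only, not MySQL's):
   assume 32 bits without --with-big-tables, 64 bits with it. */
typedef uint32_t row_count32_t;
typedef uint64_t row_count64_t;

/* Largest row count a 32-bit counter can hold: 2^32 - 1. */
row_count32_t max_rows_32(void)
{
    return UINT32_MAX;
}

/* Largest row count a 64-bit counter can hold: 2^64 - 1. */
row_count64_t max_rows_64(void)
{
    return UINT64_MAX;
}

/* Adds n rows to a 32-bit counter. Unsigned arithmetic wraps modulo 2^32,
   so counting past 4G rows silently restarts from zero. */
row_count32_t add_rows_32(row_count32_t count, row_count32_t n)
{
    return (row_count32_t)(count + n);
}
```

Adding one row to max_rows_32() wraps the counter back to 0, which is why a wider counter is needed once a table can exceed 4G rows.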
Studio Compiler flags
| Compiler options | Possible reason for inclusion |
|---|---|
| -m64 or -m32 | Selects the 64-bit (LP64) or 32-bit (ILP32) memory model for the compiled binary object |
| -mt | Macro option that expands to -D_REENTRANT -lthread |
| -fsimple=1 | Allows only conservative floating-point simplifications: the optimizer may not ignore roundoff or exceptions, and cannot replace a floating-point computation with one that gives different results when rounding modes are held constant at run time. Include this explicitly in the C++ flags |
| -fns=no | Selects standard (IEEE 754) floating-point mode, so subnormal results are not flushed to zero. This is the default; it is given explicitly to undo the nonstandard mode (-fns) implied by -fast |
| -xbuiltin=%all | Improves the optimization of code that calls standard library functions |
| -xO3 | Selects optimization level 3, a high level of optimization |
| -xstrconst | Places string literals into the read-only data section of the text segment |
| -xlibmil | Selects the appropriate assembly language inline templates for the floating-point option and platform |
| -xlibmopt | Enables the compiler to use a library of optimized math routines |
| -xtarget=generic | Specifies the target system for instruction set and optimization; it sets -xarch, -xchip and -xcache. generic selects settings that perform well on most systems, while native tunes for the machine doing the compile |
| -xrestrict | Treats pointer-valued function parameters as restricted pointers, telling the compiler that the pointer arguments of a function do not alias each other (-xrestrict=%all applies this to every function in the file) |
| -xprefetch=auto | Enables automatic generation of prefetch instructions |
| -xprefetch_level=3 | Controls how aggressively prefetch instructions are inserted automatically when -xprefetch=auto is in effect |
| -xunroll=2 | Suggests that the optimizer unroll loops n times (here, twice): several iterations are combined into one, which can increase register usage and code size |
| -xalias_level | Tells the compiler how pointers are used, enabling type-based alias analysis and optimizations (for example =std for C, =simple for C++) |
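Two of the flags above change what the optimizer may assume or generate for loops over pointer arguments. The C sketch below is an illustration, not Studio output: the first function spells out with the C99 restrict qualifier what -xrestrict=%all lets the compiler assume about every pointer parameter, and the second writes out by hand roughly the shape that unrolling by 2 (-xunroll=2) produces. The function names are hypothetical, and the compiler's actual transformations may differ.

```c
#include <stddef.h>

/* What -xrestrict=%all lets the compiler assume, written explicitly with
   C99 'restrict': dst and src never overlap, so loads from src can be
   reordered around stores to dst. */
void scale_restrict(double *restrict dst, const double *restrict src,
                    size_t n, double k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = k * src[i];
}

/* Hand-written unroll by 2, roughly what -xunroll=2 suggests:
   two iterations per trip, plus one cleanup iteration when n is odd.
   Produces the same results as scale_restrict(). */
void scale_unrolled2(double *restrict dst, const double *restrict src,
                     size_t n, double k)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        dst[i]     = k * src[i];
        dst[i + 1] = k * src[i + 1];
    }
    if (i < n)              /* leftover element when n is odd */
        dst[i] = k * src[i];
}
```

Both functions give identical results for the same input; the point is only the shape of code the two flags allow the compiler to produce. Writing i + 1 < n rather than i < n - 1 keeps the loop correct when n is 0.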
Studio 11 32-bit (1-16 threads) - sysbench read-only OLTP test
Throughput is in transactions per second (tps) for 1, 2, 4, 8, and 16 threads. Where a cell holds two figures, the second is the throughput with the `__SUNPRO_C` source change described under the recommendations.
| Compiler options | 1 | 2 | 4 | 8 | 16 |
|---|---|---|---|---|---|
| 1. Release binary (Studio 10): -xO3 -Xa -xstrconst -mt -D_FORTEC_ -xarch=v8 -xc99=none [for C++, use -noex and remove -Xa and -xstrconst] | 181.35 | 335.26 | 594.94 | 914.80 | 942.06 |
| 2. Studio 11 baseline: -xlibmil -xO3 -DHAVE_RWLOCK_T -mt -fsimple=1 -fns=no | 183.28 / 184.22 | 337.34 / 338.78 | 598.87 / 597.71 | 833.04 / 804.49 | 929.94 / 871.35 |
| 3. -xbuiltin=%all | 186.10 / 185.59 | 345.30 / 342.60 | 604.56 / 603.12 | 812.25 / 921.53 | 930.23 / 942.36 |
| 4. -xbuiltin=%all -xunroll=2 | 188.53 / 189.39 | 347.28 / 349.89 | 613.24 / 613.07 | 927.57 / 846.86 | 941.56 / 888.54 |
| 5. -xbuiltin=%all -xprefetch=auto -xprefetch_level=3 | 184.16 / 186.40 | 343.20 / 344.76 | 602.33 / 604.19 | 839.48 / 813.32 | 943.60 / 948.48 |
| 6. -xbuiltin=%all -xalias_level=std [=simple for C++] | 186.81 / 187.77 | 347.56 / 346.61 | 610.44 / 611.10 | 817.28 / 923.46 | 946.79 / 922.31 |
| 7. -xbuiltin=%all -xtarget=native | 190.70 / 190.84 | 354.24 / 353.79 | 619.54 / 620.02 | 828.70 / 849.72 | 948.16 / 898.97 |
| 8. -xbuiltin=%all -xunroll=2 -xprefetch=auto -xprefetch_level=3 | 188.21 / 188.44 | 348.44 / 348.51 | 614.24 / 613.37 | 840.94 / 926.02 | 948.76 / 942.59 |
| 9. -xbuiltin=%all -xunroll=2 -xprefetch=auto -xprefetch_level=3 -xalias_level | 187.86 / 189.38 | 348.22 / 347.57 | 618.12 / 618.38 | 850.12 / 851.36 | 941.20 / 909.14 |
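The gains quoted for these runs are plain percentage changes relative to the Studio 11 baseline (row 2). The small C helper below makes that arithmetic explicit for row 7 (-xtarget=native), using the first figure in each cell, that is, without the source change. The function and array names are illustrative.

```c
#include <stddef.h>

/* Percentage throughput gain of 'tps' over 'baseline_tps'. */
double pct_gain(double baseline_tps, double tps)
{
    return (tps - baseline_tps) / baseline_tps * 100.0;
}

/* First figure from rows 2 and 7 of the Studio 11 table,
   for 1, 2, 4, 8 and 16 threads. */
static const double baseline_row2[5] = {183.28, 337.34, 598.87, 833.04, 929.94};
static const double xtarget_row7[5]  = {190.70, 354.24, 619.54, 828.70, 948.16};

/* Fills gains[i] with the gain of row 7 over row 2 at each thread count. */
void xtarget_gains(double gains[5])
{
    for (size_t i = 0; i < 5; i++)
        gains[i] = pct_gain(baseline_row2[i], xtarget_row7[i]);
}
```

This gives about 4.05% at 1 thread and 3.45% at 4 threads, while at 8 threads row 7 is about 0.5% below the baseline, so the size of the gain depends on the thread count.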
Recommended compiler options for integration with webstack
A.) The recommended Studio 11 compiler flags for webstack on SPARC are -xbuiltin=%all together with -xtarget=native and -xunroll=2. With -xtarget=native, a throughput increase of 3.4%-4% over the baseline was observed; with -xunroll=2, the increase was 2.3%-3.6%.
B.) The `__SUNPRO_C` source change yielded a throughput increase in two-thirds of the cases, compared with the same flags without the change. The change applies to both SPARC and x64 platforms. The original MySQL sources enable explicit inlining of small support functions only for gcc and Visual C++. This inlining helps Sun Studio as well, and it can be enabled by changing line 61 of the header file `$MYSQL_HOME/innobase/include/univ.i` to:
`#if !defined(__GNUC__) && !defined(__WIN__) && !defined(__SUNPRO_C)`
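The modified line only adds Sun Studio's predefined `__SUNPRO_C` macro to the list of compilers for which inlining stays enabled. The C sketch below shows the effect in isolation. It is a simplified stand-in, not the real header: the macro names UNIV_MUST_NOT_INLINE and UNIV_INLINE are assumed to mirror those in univ.i, C99 `static inline` is used in place of the header's own inline keyword, and ut_min_sketch() and inlining_enabled() are hypothetical.

```c
/* Simplified stand-in for the inlining gate in innobase/include/univ.i.
   __GNUC__ and __SUNPRO_C are predefined by gcc and Sun Studio cc;
   __WIN__ comes from MySQL's own Windows build configuration. */
#if !defined(__GNUC__) && !defined(__WIN__) && !defined(__SUNPRO_C)
#define UNIV_MUST_NOT_INLINE        /* other compilers: no inlining */
#endif

#ifdef UNIV_MUST_NOT_INLINE
#define UNIV_INLINE                 /* ordinary external functions */
#else
#define UNIV_INLINE static inline   /* small helpers can be inlined */
#endif

/* Hypothetical small support function of the kind the headers define. */
UNIV_INLINE unsigned long
ut_min_sketch(unsigned long a, unsigned long b)
{
    return a < b ? a : b;
}

/* Returns 1 when this translation unit was compiled with inlining enabled. */
int inlining_enabled(void)
{
#ifdef UNIV_MUST_NOT_INLINE
    return 0;
#else
    return 1;
#endif
}
```

Without `__SUNPRO_C` in the condition, Sun Studio cc would take the first branch and compile every such helper as an out-of-line call; with it, Sun Studio gets the same inlined helpers as gcc.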
Studio 12 64-bit (1-8 threads) - sysbench read-only OLTP test
Throughput is in transactions per second (tps) for 1, 2, 4, and 8 threads.
| Compiler options | 1 | 2 | 4 | 8 |
|---|---|---|---|---|
| 1. Release binary (64-bit): -m64 -O2 -mtune=k8 [LDFLAGS=-static-libgcc] | 182.93 | 335.54 | 592.00 | 902.69 |
| 2. Studio 12 (64-bit): -Xa -fast -m64 -xarch=sparc -xstrconst -mt [for C++, append -noex -fsimple=1 -fns=no and remove -Xa] | 197.81 | 346.33 | 586.13 | 685.93 |
| 3. Feedback optimization (FBO) added: as in 2, with -xprofile=use:dir | 223.09 | 385.44 | 635.79 | 719.65 |
| 4. FBO + loop unrolling: as in 3, with -xunroll=2 | 227.25 | 388.14 | 631.89 | 724.14 |
| 5. FBO + prefetching: as in 3, with -xprefetch=auto -xprefetch_level=3 | 228.64 | 395.34 | 651.52 | 723.02 |
| 6. FBO + restricted pointer parameters: as in 3, with -xrestrict=%all | 228.91 | 393.56 | 635.01 | 736.71 |
The Studio 12 compiler flags that performed best were -xrestrict=%all and -xprefetch=auto -xprefetch_level=3, each used with FBO. At 1 to 4 threads, these combinations gave a throughput increase of roughly 8% to 16% over the Studio 12 64-bit baseline without FBO (row 2); at 8 threads the increase was about 5% to 7%.
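The gain range quoted for the best Studio 12 combinations can be checked against rows 2, 5 and 6 of the table. The C sketch below (function and array names are illustrative) computes the smallest and largest percentage gain of the two FBO rows over row 2, optionally restricted to the first few thread counts.

```c
#include <stddef.h>

/* Percentage gain of 'tps' over 'base'. */
static double pct_gain(double base, double tps)
{
    return (tps - base) / base * 100.0;
}

/* Rows 2, 5 and 6 of the Studio 12 table, for 1, 2, 4 and 8 threads. */
static const double row2_base[4]     = {197.81, 346.33, 586.13, 685.93};
static const double row5_prefetch[4] = {228.64, 395.34, 651.52, 723.02};
static const double row6_restrict[4] = {228.91, 393.56, 635.01, 736.71};

/* Smallest and largest gain of rows 5 and 6 over row 2, using only the
   first 'ncols' thread counts (ncols = 3 covers 1, 2 and 4 threads). */
void fbo_gain_range(size_t ncols, double *min_out, double *max_out)
{
    const double *rows[2] = {row5_prefetch, row6_restrict};
    double lo = 1e9, hi = -1e9;

    for (size_t r = 0; r < 2; r++) {
        for (size_t c = 0; c < ncols; c++) {
            double g = pct_gain(row2_base[c], rows[r][c]);
            if (g < lo) lo = g;
            if (g > hi) hi = g;
        }
    }
    *min_out = lo;
    *max_out = hi;
}
```

Restricted to 1, 2 and 4 threads (ncols = 3), this gives about 8.3% to 15.7%; including 8 threads (ncols = 4) lowers the minimum to about 5.4%, from the prefetching row.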
Software documentation links
- Solaris 10 OS
- Sun Studio 12 Compiler Collection
- Sun Fire™ Servers
- MySQL Database
- sysbench site