Tuesday July 05, 2005
Is RAC better on a few large computer nodes, or on many small nodes?
This question came up in the office yesterday. We hear a lot from Oracle about using Real Application Clusters with Linux and commodity systems and the like, but which is really the best solution? And why?
Obviously everyone has opinions about Oracle RAC, but the question was what published material exists that could be shown to a customer. For me, this material falls into two camps: the technical and the practical.
Technically, RAC cannot make anything go any faster. A single transaction on a RAC database will take longer to complete than on a non-RAC database. There are two reasons for this: the extra work a RAC database has to do to check ownership of the blocks containing data before accessing them, and any messages and data blocks that need to be transferred over the cluster interconnect between the nodes as part of that transaction. This is all extra to what would occur on a non-RAC database, and so must make each transaction take longer to complete.
Although transferring a block between two nodes is faster than a disk access (say 1 millisecond compared to 10 milliseconds), it is still far slower than a memory access within a node (1 microsecond or less). It is not the absolute values that are important here, as these can vary a lot, but the relative scale of each. So, although a cache fusion block transfer between nodes is 10 times quicker than reading that block from disk, it is still about 1,000 times slower than accessing that block from memory in the computer system. See Oracle RAC's Secret in Dave Brillhart's Blog for a more detailed description of this issue.
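To make the relative scale concrete, here is a minimal sketch using the round numbers quoted above. The constant names are my own, and the values are order-of-magnitude illustrations rather than measurements from any particular system.

```python
# Illustrative latencies from the text, in microseconds.
# These are round, order-of-magnitude figures, not measurements.
MEMORY_ACCESS_US = 1              # block already in local memory on the node
INTERCONNECT_TRANSFER_US = 1_000  # cache fusion block transfer, about 1 ms
DISK_READ_US = 10_000             # physical disk read, about 10 ms


def times_slower(slow_us, fast_us):
    """Return how many times slower one access path is than another."""
    return slow_us / fast_us


disk_vs_interconnect = times_slower(DISK_READ_US, INTERCONNECT_TRANSFER_US)
interconnect_vs_memory = times_slower(INTERCONNECT_TRANSFER_US, MEMORY_ACCESS_US)

print(f"Disk read vs interconnect transfer: {disk_vs_interconnect:.0f}x slower")
print(f"Interconnect transfer vs memory:    {interconnect_vs_memory:.0f}x slower")
```

The point is the gap between the two ratios: avoiding a disk read saves a factor of 10, but missing local memory costs a factor of about 1,000.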
What RAC does offer is both higher levels of availability and out of the box scalability, if you need them.
Using RAC on two or more nodes significantly reduces the failover time after a single system failure. As well as reducing the time to fail over any service between nodes to potentially under a minute, it can reduce the percentage of users affected by a single node failure. If you load balance user connections across all the nodes in the RAC cluster, then only those on the crashed node are affected. Other users' connections remain, and are uninterrupted.
Clustering systems together using RAC also allows you to support a greater workload than a single system could handle. However, scalability is not what I call linear. By linear I mean that twice the resource results in twice the throughput. With RAC the best published results give about 1.8 times the throughput for 2 times the resources. This 1.8 is compounded each time you double the resource, so that 4 nodes would give you about 3.24 times the throughput of 1 node (1.8 * 1.8). If you went to 8 nodes, you would expect 5.8 times the throughput of 1 node. This is why most published implementations of RAC use no more than 4 nodes.
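The compounding described above can be sketched in a few lines. This assumes, as the paragraph does, that the 1.8 factor applies uniformly at every doubling of nodes; the function name and the per-node efficiency column are my own additions for illustration.

```python
import math


def relative_throughput(nodes, factor_per_doubling=1.8):
    """Throughput relative to a single node.

    Assumes the same scaling factor applies each time the node count
    doubles, which is a simplification of real cluster behaviour.
    """
    doublings = math.log2(nodes)
    return factor_per_doubling ** doublings


for nodes in (1, 2, 4, 8):
    throughput = relative_throughput(nodes)
    efficiency = throughput / nodes  # share of ideal linear scaling achieved
    print(f"{nodes} nodes: {throughput:.2f}x throughput, "
          f"{efficiency:.0%} of linear")
```

Under this assumption, 4 nodes give 3.24 times and 8 nodes about 5.8 times the throughput of one node, so each extra doubling buys less per node than the one before.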
The only scenario in which RAC can offer better than a 1.8 scalability factor is when the workload is highly partitioned between the nodes in the cluster, with no data sharing between the users on different nodes. In essence you end up with a number of independent nodes, with discrete user populations, who never share data and so never need to get a data block from another node via the cluster interconnect. But the users are all hosted on a shared, clustered database, giving very fast failover times in the event of any node crashing for any reason. This can give good scalability along with high levels of availability.
Which brings us to the practical side of using Oracle RAC. What is involved in actually deploying RAC and maintaining it on a day to day basis? For me, the sweet spot is between 2 and 4 nodes in the cluster. Any more than this and the management and maintenance tasks take up too much time and resources.
Any maintenance operation for a node has to be repeated across all the nodes, and the more nodes there are, the longer such operations will take. Most upgrades, whether of the operating system or the Oracle database, will involve downtime of the service on the node being upgraded. While other users can continue on the other nodes of the cluster, these upgrades have to be done one after the other, one node at a time. So more nodes just make things harder and slower. And if you wanted to minimise disruption to users by only doing 1 node each night, it would take you over a week to upgrade an 8 node system, and half a month for 16 nodes.
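The rolling upgrade arithmetic is simple, but a small sketch shows how the elapsed time grows with cluster size. The function and the `nodes_per_night` parameter are hypothetical names of mine; the text only considers one node per night.

```python
import math


def nights_to_upgrade(nodes, nodes_per_night=1):
    """Nights needed for a rolling upgrade done a few nodes at a time."""
    return math.ceil(nodes / nodes_per_night)


for nodes in (2, 4, 8, 16):
    print(f"{nodes} nodes: {nights_to_upgrade(nodes)} nights at 1 node per night")
```

Upgrading more than one node per night shortens the window, but at the cost of taking more capacity out of service at once.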
More nodes also lead to a greater rate of node failure. If 1 node has a failure rate of x failures per year, then a cluster of 8 nodes will see 8x failures per year, i.e. 8 times as often. So, for example, if we expected an average of 1 failure per year for a single node, then an 8 node cluster would have a node failing every 1.5 months or 6 weeks (presuming a 4 week month) on average, and a 16 node cluster would have a node failure every 3 weeks on average.
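As a rough sketch of that calculation, assuming node failures are independent and occur at a constant average rate (my simplification, not something measured): the default of 48 weeks per year matches the 4 week month used above, and the function name is my own.

```python
def weeks_between_node_failures(nodes, failures_per_node_per_year=1.0,
                                weeks_per_year=48):
    """Average weeks between node failures somewhere in the cluster.

    Assumes independent nodes with a constant failure rate. The default
    of 48 weeks per year follows the text's 4 week month; pass 52 for a
    calendar year.
    """
    cluster_failures_per_year = nodes * failures_per_node_per_year
    return weeks_per_year / cluster_failures_per_year


for nodes in (1, 2, 4, 8, 16):
    weeks = weeks_between_node_failures(nodes)
    print(f"{nodes} nodes: a node failure about every {weeks:g} weeks")
```

Using 52 weeks per year instead moves the 8 node figure from 6 to 6.5 weeks, which does not change the conclusion.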
This is why somewhere between 2 and 4 nodes offers all the benefits of high availability and load balancing, while minimising the impact of maintenance and management tasks, and the frequency of node failures.
How are customers doing this in the real world? In fact, how are Oracle deploying RAC themselves? Oracle have consolidated over 70 separate database instances into one single global instance using Oracle RAC. And they decided that the best architecture to deploy this on to achieve their targets of performance and availability was a 4 node cluster. And the platform for this? A cluster of Sun Fire 12Ks, each with 36 CPUs.
In fact, since then these systems have been upgraded, and the CPU count increased. Larry Ellison seems very happy with these big Sun servers. (If that link is broken, here is another link to a cached copy of that article on Google.)
So, from both a technical and a practical point of view, between 2 and 4 nodes seems to be the best architecture on which to deploy business critical databases using Oracle Real Application Clusters.
( Jul 05 2005, 12:48:21 PM BST )