Thursday Oct 22, 2009
A white paper titled "Addressing Virtualization and High-Availability Needs with Sun Solaris Cluster", written by IDC analyst Jean Bozman, is now available!
Topics covered in the paper include:
- Businesses' High Availability needs in the Virtualized IT Environment and how Solaris Cluster addresses these requirements
- Why Virtualization Software and High Availability Software are being used to protect applications
- Worldwide Availability and Clustering Software Revenue, 2008-2013
- Integration of High Availability and Virtualization Use Cases
- Solaris Cluster Customer Snapshots
We look forward to your comments on the product.
-Meenakshi
Director, Solaris Cluster
Thursday Jun 25, 2009
At first sight, a single-node cluster may seem to be a pointless thing. After all, what sort of high availability can you get from one node? 
That might be a valid point if HA were the only consideration, but there are quite a few other ways in which single-node clusters can be useful. Two of the most useful are disaster recovery and the development of cluster software.
SCGE (Solaris Cluster Geographic Edition) allows two clusters, separated by enough distance that a disaster at one site will not affect the other, to be managed together. Several data replication products (AVS, SRDF, Oracle DataGuard, etc.) can be managed within this two-cluster partnership to ensure that the DR site has up-to-date information, ready to take over service.
Obviously this configuration requires two clusters, but if we assume that the DR site will be needed only in the (hopefully rare) event of a disaster, and perhaps occasionally during maintenance of the primary, there is no need for it to be an exact copy of the primary site. In fact, it can be a single node. All that is required is that the node runs Solaris Cluster software; in other words, it can be a single-node cluster.
Carrying this idea further, it is also fully supported to have single-node clusters at both primary and secondary sites.
As I mentioned at the start, this won't give you very much in the way of High Availability in the event of a local server failure at the primary site, but you may not need that. Strange though it might seem at first glance, HA isn't a prerequisite for DR; whether you need both depends entirely on your business continuity needs (and that's a subject for a future blog entry).
No special tricks or configurations are required for this; it just works “out of the box”. With two single-node clusters and AVS (SNDR) replication between them, you have a fully supported Disaster Recovery configuration, implemented with no special additional hardware. Larger sites with external storage arrays and replication also work just as well with a single-node cluster as with a multi-node configuration.
Another place where a single-node cluster can be really useful is when developing cluster-based software, especially cluster agents. With the support for Solaris Containers (aka zones) that was added in Solaris Cluster 3.2, this has become even easier.
Fully testing a cluster agent requires that you simulate failures, such as system crashes or disconnections, and ensure that the agent reacts correctly. This is also true when testing that a given application operates correctly in a cluster environment. It's not something that you'd normally want to do on your desktop. However, providing an extra pair of systems in a cluster as lab test equipment for each developer is costly, and takes up valuable lab space and energy.
The solution? A single-node cluster, with some zones configured. With Solaris Cluster 3.2 you can specify zones (in the format nodename:zonename) in the nodelist of an application resource group (see the clrg(1CL) manpage). The cluster software, running in the global zone, manages those applications just as if they were on separate physical nodes. You can request that the resource groups be switched between zones, or even crash or halt zones to test that automatic recovery is performed correctly. All without leaving your desk or rebooting your development system.
I hope that has given you a brief idea of what can be done today with a single node. What might the future hold? Well, people will jump on me if I promise anything, but I really like some of the ideas that the Open HA Cluster guys have been demonstrating. Take a look at Thorsten's whitepaper if you want to try clustering VirtualBox systems - on your laptop!
As always, join us at http://www.opensolaris.org/os/community/ha-clusters/ to discuss this or any other cluster topics.
Steve McKinty
SCGE Architect
Friday Apr 24, 2009
Solaris Cluster is now supported with the latest in Intel technology: the Sun Fire x4170 (1U) and x4270/x4275 (2U) leading-edge x64 servers.
Included in the latest platform portfolio is the Sun Blade x6270, an Intel Xeon 5500 (Nehalem) Constellation blade. The new servers support a slew of features, such as double the compute threads (16 Hyper-Threads) and more memory than before (144GB).
With the addition of PCIe Gen 2, I/O bandwidth doubles and larger storage capacities can be achieved. All of this comes in a package that supports SAS, SATA, and flash-based Solid State Disks (SSDs).
Combine this with Solaris and Solaris Cluster, and you've got a scalable architecture for multi-threaded applications with mission-critical capability.
Solaris Cluster configurations on x4170, x4270, and x4275, along with the x6270, are now supported with the following Solaris Cluster releases and configurations:
* Solaris: Solaris 10
* Solaris Cluster: Solaris Cluster 3.2
* 8-Node support:
  * N*N
  * 8-Node N+1
  * 8-Node Cluster Pairs
  * 8-Node Pair + N
  * 8-Node Campus Cluster
Feel free to contact me or your sales representative for more detailed information!
Roger Autrand
Sr. Manager - Solaris Cluster
Friday Mar 06, 2009
In addition to my day job as an engineer on the Sun Cluster team, I spent most of my nights and weekends last year writing a tutorial and reference book on OpenSolaris. OpenSolaris Bible, as it's titled, was released by Wiley last month and is available from amazon.com and all other major booksellers. With almost 1000 pages to fill, my co-authors, Dave and Jerry, and I were able to be fairly comprehensive, covering topics from the bash shell to the xVM Hypervisor and almost everything in between. You can examine the table of contents and index on the book website.
Of particular interest to readers of this blog will be Chapter 16, “Clustering OpenSolaris for High Availability.” (After working on Sun Cluster for more than 8 years, I couldn't write a book like this without a chapter on HA clusters!) Coming at the end of Part IV, “OpenSolaris Reliability, Availability, and Serviceability”, this chapter is a 70-page tutorial on using Sun Cluster / Open HA Cluster. After the requisite introduction to HA clustering, the chapter jumps in with instructions for configuring a cluster. Next, it goes through two detailed examples. The first shows how to make Apache highly available in failover mode using a ZFS failover file system. The second demonstrates how to configure Apache in scalable mode using the global file system. Following the two examples, the chapter covers the details of resources, resource types, and resource groups, shows how to use zones as logical nodes, and goes into more detail on network load balancing. After a section on writing your own agents using the SMF Proxy or the GDS, the chapter concludes with an introduction to Geographic Edition.
This chapter should be useful both as a tutorial for novices and as a reference for more advanced users. I enjoyed writing it (and even learned a thing or two in the process), and hope you find it helpful. Please don't hesitate to give me your feedback!
Nicholas Solter
Monday Mar 02, 2009
We've had many questions, comments and complaints about IP address "problems" when using highly available services in a Sun Cluster environment. We found that most, if not all, of these were related to configurations where firewalls sit between the service running on the cluster and the clients connecting to it.
So, what is the problem? Firewall administrators often assume that a packet sent from a client to the logical IP address of an HA service will generate a response packet with exactly that logical IP address as its source address. So they configure a firewall rule on that basis and then wonder why it does not work: IP packets are coming back from the HA service that do not match the rule.
Then they start researching the network configuration on the cluster node that hosts the HA service and find that the logical IP address used by that service is set to a state called "DEPRECATED". They conclude that this is the root cause of their problem, which (we think) is not the case.
Address selection can become very complicated in complex network setups, so the following applies to the typical, simple network setup found at most installations.
Let's look at the address selection for an outgoing packet a bit more closely. First we must distinguish between TCP (RFC 793) and UDP (RFC 768). TCP is a connection-oriented protocol, i.e. a connection is established between a client and a service. On that connection, source and destination addresses are always used consistently; in a Sun Cluster environment the source address of a packet sent by the service to a client will usually be the logical IP address of that HA service, but only if the client used the logical service address to send its request to the service.
So this will not cause any problems with firewalls, because you know exactly which IP addresses will be used as source addresses for outgoing IP packets.
Let's look into UDP now. UDP is a connectionless protocol, i.e., there is no established connection between a client and a server (service). A UDP-based service can choose the source address for its outgoing packets by binding itself to a fixed address, but most services don't do this. Instead, they accept incoming packets on all configured network addresses. For readers who are familiar with network programming, the typical code segment contains the following lines:
struct sockaddr_in address;
...
address.sin_addr.s_addr = htonl(INADDR_ANY); /* accept packets on every configured address */
...
if (bind (..., (struct sockaddr *) &address, ...) == 0)
With this typical piece of code, the UDP service listens on all configured IP addresses, and the source address of outbound packets is set by the IP layer. The selection algorithm is complex and cannot be influenced by the application. Details can be found in Infodoc 204569 (accessible on SunSolve to SPECTRUM contract holders only), but we think they are not that relevant here, except for this quote:
"IP addresses associated with interfaces marked as DEPRECATED will not normally be used as source addresses by IP unless deprecated interfaces are all that is available, in which case they will be used."
DEPRECATED flag
So, now DEPRECATED comes into play. A DEPRECATED address will - normally - not be used as a source address!
First, why does Sun Cluster put HA IP addresses, i.e. logical or floating addresses, into the DEPRECATED state? Because they are floating addresses: there is no guarantee that they will stay on one node. In failure situations an HA IP address floats to another node together with its service. It also moves when the administrator decides to migrate a service, and when the service is stopped, the logical IP address may disappear from the node.
Let's look at services where IP communication is initiated from a cluster node. For example, a cluster node might temporarily mount an external NFS share. Whether this is UDP- or TCP-based NFS does not matter in this case: the cluster node is the client, so the IP layer picks the source address for a TCP connection just as it does for UDP packets!
The IP layer would choose a source address, and it could be the logical IP address of an HA service that happens to run on the same system, if that address were not DEPRECATED. Now imagine that the NFS mount succeeds, uses the logical IP address, and NFS transfers work fine.
Next, the HA service that owns the HA IP address is switched to another node in the cluster, and its IP address switches with it. What happens to the NFS traffic between this node and the external NFS server? It fails. Packets coming from the NFS server now reach a different node, namely the one the HA service switched to, taking its IP address with it. (And the NFS client on the original cluster node would fail as well.)
So, that is the reason for setting the DEPRECATED flag on HA IP addresses; remember the quote above: "...marked as DEPRECATED will not normally be used...".
Not setting the DEPRECATED flag would make it more likely that the IP layer uses the logical address as a source address, but there is no guarantee, so in the end this would not help. The DEPRECATED flag, on the other hand, helps to prevent major problems on cluster nodes.
Back to the original question: how can I make my firewall rules work? There are 4 possibilities - in prioritized order, best practice first:
DEPRECATED;
Sun Cluster sets the DEPRECATED flag on HA service IP addresses by design, and that is a good thing: it prevents strange problems with IP-based clients on cluster nodes. Not setting it would not solve the reported problems.