Friday Jul 18, 2008

I've written recently about the virtues of Solaris Containers, and from my perspective Containers are often the best choice when the goals are to increase server utilization and reduce system footprint. The fact remains, however, that Containers represent only one of many consolidation technology options available to systems administrators. In this article, I'll identify four basic technology categories and touch on some considerations when selecting the best technology for your specific use case.

Virtualization Taxonomy

When it comes to consolidating multiple users and workloads onto a single system there is a wide range of options available depending on the server hardware architecture and host operating system selected. That said, most of the available options fall into two general categories or four specific sub-categories. In this next section I'll briefly discuss the architecture and characteristics of:
  • Multiple Operating System approaches: Physical Domains and Bare Metal Hypervisors
  • Single Operating System approaches: OS Virtualization and Resource Management

Physical Domains

Physical Domains are the most mature, highest-performing and most highly available of the consolidation options. They are also the most expensive and least flexible option and are usually only available on mid-range and large-scale enterprise class servers. Server architectures which support this capability allow a server to be physically and electrically partitioned into some small number of individual servers, with each partition having its own dedicated CPU, Memory and IO buses. Examples include Sun's M-Series Servers. Granularity is usually limited to a single CPU/Memory board and usually to a physical IO assembly. As a result, and given the memory density and CPU performance of modern systems, each domain or partition is typically very powerful and will contain at least 2-4 CPU sockets, 16GB of RAM and multiple PCI buses. Each partition then runs its own completely independent copy of an operating system which has full and direct control over its associated hardware resources, resulting in maximum performance. Availability is optimized because most individual hardware events will be isolated to one partition and so won't impact other physical partitions or their workloads.

Bare Metal Hypervisors

Bare Metal hypervisors, also known as type 1 hypervisors, are a newer technology which allows the creation of Logical (vs. Physical) Domains. Examples of type 1 hypervisors are VMware's ESX Server, Sun's upcoming xVM Server and LDom technology, and Microsoft's Hyper-V. Hypervisors consist of firmware or software (sometimes small, sometimes not) that abstracts the underlying physical server hardware from guest operating systems. As with Physical Domains, each domain runs a fully independent copy of the OS, so each OS instance can be at a different kernel patch level. This virtualization approach provides more flexibility than Physical Domains and can be implemented at lower price-points, although there may be additional license and support costs for the hypervisor. Unlike Physical Domains, where there is a tight coupling of CPU, Memory and IO to a given partition, hypervisor-enabled logical domains can bring together arbitrary combinations of CPU cores/threads, physical memory, and disk and network IO. The result is that even a relatively modest 1-4 socket commodity server can host dozens of individual domains of all shapes and sizes, each with its own complete and independent operating system.

In exchange for this increased flexibility there is a trade-off in overall performance and efficiency when compared with physical domains. The degree of impact varies depending on the guest OS, hypervisor implementation and especially on the IO demands of the individual workloads. Service availability can also be impacted, depending on the resiliency of the application architecture. Lower cost systems often lack high levels of hardware redundancy and specific hardware failure events or bugs in the hypervisor code may impact multiple logical domains, their OS instances and hosted workloads.

OS Virtualization

The two previous technologies use different approaches to enable a single server to host multiple independent operating systems. Another approach available in some Operating Systems is OS Virtualization. With OS Virtualization, users and applications are still left with the impression that they have their own individual server and OS, yet there is actually only one copy of the kernel running. Solaris Containers fall into this category. As discussed in previous posts, some advantages of Solaris Containers are: they are lightweight (no overhead); require no additional license fees; don't negatively impact performance; run on both SPARC and x86 hardware; and don't require hardware resources to be carved up and mapped to specific Containers in advance. That said, Solaris Containers do support physical mapping and several types of policy-based Resource Management to increase overall control and predictability. Some drawbacks of Containers are: all Containers must be halted when the kernel (global zone) is patched; some applications which require direct hardware control aren't supported; and the downtime associated with upgrading a system (with Containers) to the latest Solaris Update can be prohibitive unless using an alternate boot environment with Live Upgrade.

Since OS Virtualization provides a level of abstraction above the OS layer this approach can be combined with either of the two previous multi-OS technologies without incurring additional overhead. For example, this hybrid approach can be useful for consolidating multiple related workloads onto powerful Physical Domains to provide additional granularity and flexibility.

Resource Management

Resource Management is a fourth consolidation technology that receives less attention but is still important to consider. Like OS Virtualization, this technology works within a given OS instance and is used to provide lightweight policy control over individual user processes. In most cases where a single OS instance is sufficient, OS Virtualization will be the best choice. However, when there are no security benefits from isolating workloads or when IP addresses are scarce, Resource Management technologies like the Project mechanism and Fair Share Scheduler (FSS) can ensure that competing users and processes stay within resource limits and don't unfairly or inappropriately consume shared resources.

A good example of where Resource Management was the appropriate technology is a recent customer scenario where hundreds of individual engineers shared a low-end utility server running Solaris. The server didn't feature physical domains and using a hypervisor approach would have required considerable overhead and administrative setup. It also didn't make sense to provision a separate Container for each user and allocate a unique IP address, file system, name space, etc. when all that was needed was to make sure no one user could monopolize the entire server unless he or she was the only one using it. Using the Solaris Project mechanism, each user was assigned an equal share value so that, upon login, their child processes could utilize no more than a percentage of the machine equal to 1 / (the total number of users on the system) when the system was loaded.
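The share arithmetic in that scenario can be sketched in a few lines of Python. This is only an illustration of the proportional-share rule, not of how the scheduler is implemented: the function name fss_entitlements and the project names are made up for the example, and it assumes that only projects with runnable work count toward the total, which is why a lone user can still have the whole machine.

```python
from fractions import Fraction


def fss_entitlements(shares, active):
    """Return each active project's CPU entitlement as a fraction of the machine.

    shares: mapping of project name -> share value (hypothetical names).
    active: the projects currently competing for CPU.
    Assumption: idle projects do not count toward the total.
    """
    competing = {name: shares[name] for name in active}
    total = sum(competing.values())
    if total == 0:
        return {}
    return {name: Fraction(value, total) for name, value in competing.items()}


# Hypothetical projects, one per engineer, each with an equal share value of 1.
shares = {"user.alice": 1, "user.bob": 1, "user.carol": 1, "user.dave": 1}

# Everyone is busy: each engineer is entitled to 1/4 of the machine.
all_busy = fss_entitlements(shares, active=shares.keys())

# Only one engineer is busy: that engineer may use the whole machine.
one_busy = fss_entitlements(shares, active=["user.alice"])
```

With equal shares, the entitlement under load works out to 1 / (number of competing users), matching the behavior described above. Giving one project a larger share value would skew the split in its favor without changing the rule.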

Summary

So, which technology approach is best for a given use case? The answer is "it depends". It's important to carefully consider your workload's technical and operational requirements as well as your organization's technical depth and operational maturity. Prior to wide-scale deployment of these technologies, make sure your support organizations understand which group is responsible for managing this new technology layer and that they have the necessary training and hands-on expertise. It's also important that the organization has already made progress in establishing, and reducing the number of, standard operational environments prior to large-scale deployment, as virtualization can easily cause an explosion in the number of managed OS instances and create considerable scale problems for support.

Some application or service requirements to consider when selecting the right technology and implementation are: availability, security, performance, amount of IO, Service Level Objectives (SLOs), and maintenance windows. Likewise, individual technologies should be evaluated and compared based upon: performance, Total Cost of Ownership (TCO), maturity, complexity, availability of management tools, adherence to open standards, and barriers to exit.
