Thursday Mar 29, 2007
Did you know that Solaris Containers has the largest HCL of any server virtualization solution?
Here are three examples:
Is that metric relevant? Many factors should affect your virtualization choice. One of them is hardware choice: "does my choice of server virtualization technology limit my choice of hardware platform?"
The data points above show sufficient choice in commodity hardware for most people, but Containers maximizes your choice, and only Containers is supported on multiple hardware architectures.
Thursday Mar 22, 2007
Two previous blogs described my quest to create and boot 500 zones on one system as efficiently as possible, given my hardware constraints. But my original goal was testing the sanity of the limit of 8,191 zones per Solaris instance. Is the limit too low, or absurdly high? Running 500 zones on a sufficiently large system seemed reasonable if the application load was sufficiently small per zone. How about 1,000 zones?
Modifying my scripts to create the 501st through 1,000th zones was simple enough. Creating the additional 500 zones went very smoothly. Booting 1,000 zones seemed too easy... until somewhere in the 600s. Further zones didn't boot, or booted into administrative mode.
Several possible obstacles occurred to me, but a quick check of Richard and Jim's new Solaris Internals edition helped me find the maximum number of processes currently allowed on the system. The value was a bit over 16,000. And those 600+ zones were using them all up. A short entry in the global zone's /etc/system file increased the maximum number of processes to 25,000:
set max_nprocs=25000
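A rough sanity check on that number can be sketched in a few lines of Python. This is a back-of-the-envelope estimate, not a measurement: the 16,000 figure is rounded down from "a bit over 16,000", and 650 is an assumed midpoint for "somewhere in the 600's". The function names are made up for illustration.

```python
import math

# Figures taken (loosely) from the text above.
OBSERVED_MAX_NPROCS = 16_000  # "a bit over 16,000"; the exact value is not given
ZONES_AT_FAILURE = 650        # ASSUMPTION: midpoint of "somewhere in the 600's"
NEW_MAX_NPROCS = 25_000       # the value set in /etc/system


def procs_per_zone(max_procs: int, zones: int) -> float:
    """Average process slots consumed per zone when the table filled up.

    This lumps the global zone's own processes in with the zones,
    so it slightly overstates the per-zone cost.
    """
    return max_procs / zones


def procs_needed(zones: int, per_zone: float) -> int:
    """Process slots needed for `zones` zones at `per_zone` slots each."""
    return math.ceil(zones * per_zone)


per_zone = procs_per_zone(OBSERVED_MAX_NPROCS, ZONES_AT_FAILURE)
needed = procs_needed(1000, per_zone)

print(f"about {per_zone:.1f} processes per zone")
print(f"about {needed} process slots for 1,000 zones")
print("fits under new limit:", needed <= NEW_MAX_NPROCS)
```

Under those assumptions, 1,000 idle zones would need a little under 25,000 process slots, which is why that value looked like enough headroom.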
Unfettered by a limit on the number of concurrent processes, I re-booted all the zones. More than 900 booted, but the same behavior returned: many zones did not boot properly. The running zones were not using all 25,000 PID slots. To re-diagnose the problem I first verified that I could create 25,000 processes with a "limited fork bomb." I was temporarily stumped until I recalled a conversation I had with some students in my LISA'06 class "Managing Resources with Solaris 10 Containers." One of them had experienced a problem on a very large Sun computer that was running hundreds of applications, though they weren't using Containers.
They found that they were being limited by the amount of software thread (LWP) stack space in the kernel. LWP stack pages are one of the portions of kernel memory that are pageable. Space for pageable kernel memory is allocated when the system boots and cannot be re-sized while the kernel is running.
The default size depends on the hardware architecture. For 64-bit x86 systems the default is 2GB. The kernel tunable which controls this is segkpsize, which represents the number of kernel memory pages that are pageable. When these pages are all in use, new LWPs (threads) cannot be created.
With over 900 zones running, prstat(1M) showed over 77,000 LWPs in use. To test my guess that segkpsize was limiting my ability to boot 1,000 zones, I added the following line to /etc/system and re-booted:
set segkpsize=1048576
This doubles the amount of pageable kernel memory to 4GB on AMD64 systems. With that, booting my 1,000 zones was boring, as it should be.
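Because segkpsize is expressed in pages rather than bytes, the arithmetic is easy to get wrong. Here is a minimal Python sketch of the conversion. It assumes a 4KB base page size on AMD64, which is what the 1,048,576-pages-equals-4GB figure implies. The helper names are invented for illustration, and the per-LWP figure is only an average budget, not the actual kernel stack size.

```python
PAGE_SIZE = 4096  # bytes; ASSUMPTION: 4KB base pages on AMD64
GIB = 2**30


def segkp_bytes(pages: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes of pageable kernel memory for a given segkpsize value."""
    return pages * page_size


def pages_for_gib(gib: int, page_size: int = PAGE_SIZE) -> int:
    """segkpsize value (in pages) that yields `gib` gigabytes."""
    return gib * GIB // page_size


default_pages = pages_for_gib(2)  # the 2GB default on 64-bit x86
new_pages = 1048576               # the value set in /etc/system

print("default segkpsize:", default_pages, "pages")
print("new size:", segkp_bytes(new_pages) // GIB, "GB")

# Average share of the default 2GB available to each of the 77,000 LWPs
# seen in prstat. A rough budget only; real usage per LWP may differ.
lwps = 77_000
kb_per_lwp = segkp_bytes(default_pages) / lwps / 1024
print(f"about {kb_per_lwp:.1f} KB of segkp per LWP at the default size")
```

The same helper can be used to pick a value for a different target size, for example pages_for_gib(3) for 3GB.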
Final statistics for 1,000 running zones included:
Conclusions:
Thursday Mar 15, 2007
As I said last time, zone-clone/ZFS-clone is time- and space-efficient. And that entry looked briefly at cloning zones. Now let's look at the integration of zone-clones and ZFS-clones.
Instead of copying every file from the original zone to the new zone, a clone of a zone that 'lives' in a ZFS file system is actually a clone of a snapshot of the original zone's file system. As you might imagine, this is fast and small. When you use zone-clone to install a zone, most of the work is merely copying zone-specific files around. Because all of the files start out identical from one zone to the next, and because each zone is a snapshot of an existing zone, there is very little disk activity, and very little additional disk space is used.
But how fast is the process of cloning, and how small is the new zone?
I asked myself those questions, and then used a Sun Fire X4600 with eight AMD Opteron 854s and 64GB of RAM to answer them. Unfortunately the system only has its internal disk drives. The disk drive was the bottleneck most of the time. I created a zpool from one disk slice on that drive, which is neither robust nor efficient. But it worked.
Creating the first zone took 150 seconds, including creating the ZFS file system for the zone, and used 131MB in the zpool. Note that this is much smaller than the disk space used by other virtualization solutions. Creating the next nine zones took less than 50 seconds, and used less than 20MB, total, in the zpool.
The length of time to create additional zones gradually increased. Creation of the 200th through 500th zones averaged 8.2 seconds each. Also, the disk space used gradually increased per zone. After booting each zone several times, they each used 6MB-7MB of disk space. The disk space used per zone increased as each zone made its own changes to configuration files. But the final rate of creation was 489 zones per hour.
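To put the disk numbers in perspective, here is a small Python sketch that compares the cloned zones with a hypothetical baseline in which every zone is a full copy of the first one. The full-copy baseline is an assumption for comparison only; I did not measure it. The sketch uses 7MB per clone, the upper end of the 6MB-7MB range.

```python
FIRST_ZONE_MB = 131  # disk used by the first zone, from the text
PER_CLONE_MB = 7     # upper end of the 6MB-7MB measured per cloned zone
ZONES = 500


def cloned_total_mb(zones: int, first_mb: int, per_clone_mb: int) -> int:
    """One full zone plus (zones - 1) ZFS clones."""
    return first_mb + (zones - 1) * per_clone_mb


def full_copy_total_mb(zones: int, first_mb: int) -> int:
    """HYPOTHETICAL baseline: every zone is a full copy of the first."""
    return zones * first_mb


cloned = cloned_total_mb(ZONES, FIRST_ZONE_MB, PER_CLONE_MB)
copied = full_copy_total_mb(ZONES, FIRST_ZONE_MB)

print(f"cloned zones: about {cloned} MB")
print(f"full copies:  about {copied} MB")
print(f"ratio: roughly {copied / cloned:.0f}x")
```

Even at the pessimistic end of the range, the clones come to a few gigabytes in total, where full copies would need tens of gigabytes.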
But will they run? And are they as efficient at memory usage as they are at disk usage?
I booted them from a script, sequentially. This took roughly 10 minutes. Using the "memstat" tool of mdb, I found that each zone uses 36MB of RAM. This allowed all 500 zones to run very comfortably in the 64GB on this system. This small amount was due to the model used by sparse-root zones: a program that is running in multiple zones shares the program's text pages.
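The memory arithmetic behind "very comfortably" can be sketched in Python as well. This is a simple estimate that ignores memory used by the kernel and the global zone, and it treats 64GB as 64 x 1024 MB. The function names are illustrative only.

```python
PER_ZONE_MB = 36       # RAM per idle zone, as reported by memstat
SYSTEM_RAM_GB = 64
MB_PER_GB = 1024       # ASSUMPTION: binary gigabytes


def zones_ram_mb(zones: int, per_zone_mb: int = PER_ZONE_MB) -> int:
    """Total RAM consumed by `zones` idle zones."""
    return zones * per_zone_mb


def ram_fraction(zones: int, system_gb: int = SYSTEM_RAM_GB) -> float:
    """Fraction of system RAM taken by the zones.

    Ignores the kernel and the global zone.
    """
    return zones_ram_mb(zones) / (system_gb * MB_PER_GB)


used = zones_ram_mb(500)
print(f"500 zones: about {used} MB, {ram_fraction(500):.0%} of RAM")
print(f"1,000 zones: about {zones_ram_mb(1000)} MB, {ram_fraction(1000):.0%} of RAM")
```

By this estimate RAM alone would not stop a second batch of 500 idle zones from fitting on the same system.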
The scalability of performance was also excellent. A quick check of CPU usage showed that all 500 zones used less than 2% of the eight CPUs in the system. Of course, there weren't any applications running in the zones, but just try to run 500 guest operating systems in your favorite hypervisor-based virtualization product...
But why stop there? 500 zones not enough for you? Nah, me neither. How about 1,000 zones? That sounds like a good reason for a "Part 3."
New features added recently to Solaris zones improve on their excellent efficiency:
So, maybe computers hate me for pushing them out of their comfort zone. Or maybe it's something else.