Wednesday Sep 05, 2007
This update to Solaris 10 has many new features. Of those, many enhance Solaris Containers either directly or indirectly. This update brings the most important changes to Containers since they were introduced in March of 2005. A brief introduction to them seems appropriate, but first a review of the previous update.
Solaris 10 11/06 added four features to Containers. One of them is called "configurable privileges" and allows the platform administrator to tailor the abilities of a Container to the needs of its application. I blogged about configurable privileges before, so I won't say any more here.
At least as important as that feature was the new ability to move (also called 'migrate') a Container from one Solaris 10 computer to another. This uses the 'detach' and 'attach' sub-commands to zoneadm(1M).
Other minor new features included:
Earlier releases of Solaris 10 included the Resource Capping Daemon, rcapd. This tool enabled you to place a 'soft cap' on the amount of RAM (physical memory) that an application, user or group of users could use. When rcapd detected usage above the cap, it paged out physical memory pages owned by that entity until its memory usage fell below the cap.
Although it was possible to apply this tool to a zone, it was cumbersome and required cooperation from the administrator of the Container. In other words, the root user of a capped Container could change the cap. This made it inappropriate for potentially hostile environments, including service providers.
Solaris 10 8/07 enables the platform administrator to set a physical memory cap on a Container using an enhanced version of rcapd. Cooperation of the Container's administrator is not necessary - only the platform administrator can enable or disable this service or modify the caps. Further, usage has been greatly simplified to the following syntax:
    global# zonecfg -z myzone
    zonecfg:myzone> add capped-memory
    zonecfg:myzone:capped-memory> set physical=500m
    zonecfg:myzone:capped-memory> end
    zonecfg:myzone> exit

The next time the Container boots, this cap (500MB of RAM) will be applied to it. The cap can also be modified while the Container is running, with:

    global# rcapadm -z myzone -m 600m

Because this cap does not reserve RAM, you can over-subscribe RAM usage. The only drawback is the possibility of paging.
For more details, see the online documentation.
Virtual memory (i.e. swap space) can also be capped. This is a 'hard cap.' In a Container which has a swap cap, an attempt by a process to allocate more VM than is allowed will fail. (If you are familiar with C programming: malloc() will return NULL and set errno to ENOMEM.)
The syntax is very similar to the physical memory cap:
    global# zonecfg -z myzone
    zonecfg:myzone> add capped-memory
    zonecfg:myzone:capped-memory> set swap=1g
    zonecfg:myzone:capped-memory> end
    zonecfg:myzone> exit

This limit can also be changed for a running Container:

    global# prctl -n zone.max-swap -v 2g -t privileged -r -e deny -i zone myzone

Just as with the physical memory cap, if you want to change the setting both for a running Container and for the next time it boots, you must use zonecfg for the persistent setting and prctl (for the swap cap) or rcapadm (for the physical memory cap) for the running Container.
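Because a persistent change and a live change need separate commands, it can help to generate them together. The following is a minimal dry-run sketch that only prints the commands, reusing the syntax shown in this post. The function name cap_commands is hypothetical, and the "select capped-memory" form assumes the zone already has a capped-memory resource (use "add capped-memory" otherwise).

```shell
#!/bin/sh
# Dry-run helper: prints the commands needed to change a zone's memory caps
# both persistently (zonecfg) and on the running zone (rcapadm, prctl).
# cap_commands is a hypothetical name; nothing is executed against the system.
cap_commands() {
    zone=$1
    physical=$2
    swap=$3
    # Persistent setting, applied at the next boot. Assumes a capped-memory
    # resource already exists; otherwise replace "select" with "add".
    echo "zonecfg -z $zone \"select capped-memory; set physical=$physical; set swap=$swap; end\""
    # Live changes for the running zone.
    echo "rcapadm -z $zone -m $physical"
    echo "prctl -n zone.max-swap -v $swap -t privileged -r -e deny -i zone $zone"
}

cap_commands myzone 600m 2g
```

Review the printed commands before running them as root in the global zone.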
The third new memory cap is locked memory. This is the amount of physical memory that a Container can lock down, i.e. prevent from being paged out. By default a Container now has the proc_lock_memory privilege, so it is wise to set this cap for all Containers.
Here is an example:
    global# zonecfg -z myzone
    zonecfg:myzone> add capped-memory
    zonecfg:myzone:capped-memory> set locked=100m
    zonecfg:myzone:capped-memory> end
    zonecfg:myzone> exit
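The physical, swap and locked caps are all properties of the same capped-memory resource, so they can be set in a single session rather than three. As a sketch, the script below writes a zonecfg command file containing all three; the file path, the function name write_caps_file and the values are illustrative assumptions, and the script does not call zonecfg itself.

```shell
#!/bin/sh
# Writes a zonecfg command file that sets all three memory caps at once.
# write_caps_file, the file name and the values are illustrative only.
write_caps_file() {
    cat > "$1" <<EOF
add capped-memory
set physical=$2
set swap=$3
set locked=$4
end
EOF
}

write_caps_file /tmp/myzone-caps.cfg 500m 1g 100m
cat /tmp/myzone-caps.cfg
# To apply it on a Solaris 10 8/07 system (not run here):
#   zonecfg -z myzone -f /tmp/myzone-caps.cfg
```

If the zone already has a capped-memory resource, the first line of the file would need to be "select capped-memory" instead.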
Many existing resource management features have a new, simplified user interface. For example, the "dedicated-cpu" resource reuses the existing Dynamic Resource Pools feature. Instead of needing many commands to configure a pool, configuration can be as simple as:

    global# zonecfg -z myzone
    zonecfg:myzone> add dedicated-cpu
    zonecfg:myzone:dedicated-cpu> set ncpus=1-3
    zonecfg:myzone:dedicated-cpu> end
    zonecfg:myzone> exit

After using that command, when that Container boots, Solaris:
Also, four existing project resource controls were applied to Containers:
    global# zonecfg -z myzone
    zonecfg:myzone> set max-shm-memory=100m
    zonecfg:myzone> set max-shm-ids=100
    zonecfg:myzone> set max-msg-ids=100
    zonecfg:myzone> set max-sem-ids=100
    zonecfg:myzone> exit

Fair Share Scheduler
A commonly used method to prevent "CPU hogs" from impacting other workloads is to assign a number of CPU shares to each workload, or to each zone. The relative number of shares assigned per zone guarantees a relative minimum amount of CPU power. This is less wasteful than dedicating a CPU to a Container that will not completely utilize the dedicated CPU(s).
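To make the "relative minimum" concrete: when every zone is busy, each zone's guaranteed minimum is its shares divided by the total shares of all competing zones. The sketch below works that out for three hypothetical zones (the names and share counts are made up for illustration); the function name fss_minimums is also an assumption.

```shell
#!/bin/sh
# Hypothetical zones and share counts; not taken from any real system.
# Each argument is name:shares.
# Prints each zone's guaranteed minimum CPU percentage when all zones are busy.
fss_minimums() {
    total=0
    for entry in "$@"; do
        total=$(( $total + ${entry#*:} ))
    done
    for entry in "$@"; do
        name=${entry%%:*}
        shares=${entry#*:}
        # Integer arithmetic: the result is rounded down to a whole percent.
        echo "$name $(( 100 * $shares / $total ))%"
    done
}

fss_minimums web:100 db:200 batch:100
```

With these numbers the zones get minimums of 25%, 50% and 25%. A zone that is idle does not consume its share, so the busy zones can use more than their minimum.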
Several steps were needed to configure this in the past. Solaris 10 8/07 simplifies this greatly: now just two steps are needed. First, the system must use FSS as the default scheduler. This command makes FSS the default scheduler the next time the system boots:

    global# dispadmin -d FSS

Second, the Container must be assigned some shares:

    global# zonecfg -z myzone
    zonecfg:myzone> set cpu-shares=100
    zonecfg:myzone> exit

Shared Memory Accounting
One feature simplification is not a reduced number of commands, but reduced complexity in resource monitoring. Prior to Solaris 10 8/07, the accounting of shared memory pages had an unfortunate subtlety. If two processes in a Container shared some memory, per-Container summaries counted the shared memory usage once for every process that was sharing the memory. It would appear that a Container was using more memory than it really was.
This was changed in 8/07. Now, in the per-Container usage section of prstat and similar tools, shared memory pages are only counted once per Container.
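A small worked example shows the size of the difference. The numbers below are purely illustrative: two processes in one Container, each with 50 MB of private memory, both attached to the same 200 MB shared segment.

```shell
#!/bin/sh
# Illustrative numbers only: two processes in one Container, each with
# 50 MB of private memory, both attached to the same 200 MB shared segment.
procs=2
private_mb=50
shared_mb=200

# Before 8/07: the shared segment was counted once per sharing process.
old_total=$(( procs * (private_mb + shared_mb) ))
# From 8/07: the shared segment is counted once per Container.
new_total=$(( procs * private_mb + shared_mb ))

echo "old per-Container summary: ${old_total} MB"
echo "new per-Container summary: ${new_total} MB"
```

The old method reports 500 MB for a Container that really uses 300 MB, and the overstatement grows with every additional process that attaches to the segment.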
    global# zonecfg -z global
    zonecfg:global> set cpu-shares=100
    zonecfg:global> set scheduling-class=FSS
    zonecfg:global> exit

Use those features with caution. For example, assigning a physical memory cap of 100MB to the global zone will surely cause problems...
| Argument or Option | Meaning |
|---|---|
| -s | Boot to the single-user milestone |
| -m <milestone> | Boot to the specified milestone |
| -i </path/to/init> | Boot the specified program as 'init'. This is only useful with branded zones. |
Allowed syntaxes include:
    global# zoneadm -z myzone boot -- -s
    global# zoneadm -z yourzone reboot -- -i /sbin/myinit
    ozone# reboot -- -m verbose

In addition, these boot arguments can be stored with zonecfg, for later boots.

    global# zonecfg -z myzone
    zonecfg:myzone> set bootargs="-m verbose"
    zonecfg:myzone> exit
Also, the privilege proc_priocntl can be added to a Container to enable the root user of that Container to change the scheduling class of its processes.
This also allows a Container to control its own network configuration, including routing, IP Filter, the ability to be a DHCP client, and others. The syntax is simple:
    global# zonecfg -z myzone
    zonecfg:myzone> set ip-type=exclusive
    zonecfg:myzone> add net
    zonecfg:myzone:net> set physical=bge1
    zonecfg:myzone:net> end
    zonecfg:myzone> exit
The latter ability requires more explanation. An existing challenge in the maintenance of zones is patching - each zone must be patched when a patch is applied. If the patch must be applied while the system is down, the downtime can be significant.
Fortunately, Live Upgrade can create an Alternate Boot Environment (ABE) and the ABE can be patched while the Original Boot Environment (OBE) is still running its Containers and their applications. After the patches have been applied, the system can be re-booted into the ABE. Downtime is limited to the time it takes to re-boot the system.
An additional benefit can be seen if there is a problem with the patch and that particular application environment. Instead of backing out the patch, the system can be re-booted into the OBE while the problem is investigated.
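The patch cycle described above might look roughly like the following. This is a dry-run sketch that only prints the Live Upgrade commands in order; the boot environment name, the disk slice, the patch directory and the patch ID are all hypothetical, the function name lu_patch_plan is an assumption, and the exact lucreate options depend on your disk layout.

```shell
#!/bin/sh
# Dry-run sketch of a Live Upgrade patch cycle. It only prints the commands.
# The BE name, device, patch directory and patch ID are all hypothetical.
lu_patch_plan() {
    abe=$1
    device=$2
    patchdir=$3
    patch=$4
    # Create the Alternate Boot Environment on a spare UFS slice.
    echo "lucreate -n $abe -m /:$device:ufs"
    # Apply the patch to the ABE while the OBE keeps running.
    echo "luupgrade -t -n $abe -s $patchdir $patch"
    # Mark the ABE as the environment to boot next.
    echo "luactivate $abe"
    # Reboot; this is where the downtime begins.
    echo "init 6"
}

lu_patch_plan patched_be /dev/dsk/c0t1d0s0 /var/tmp/patches 123456-01
```

Falling back to the OBE would follow the same pattern: activate the original boot environment and reboot.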
Solaris 10 8/07 contains a new framework called Branded Zones. This framework enables the creation and installation of Containers that are not the default 'native' type of Containers, but have been tailored to run 'non-native' applications.
This was only a brief introduction to these many new and improved features. Details are available in the usual places, including http://docs.sun.com, http://sun.com/bigadmin, and http://www.sun.com/software/solaris/utilization.jsp.
One question to Live Upgrade. Is Live Upgrade supported, if non-global zones are on ZFS e.g. /export/zones ?
Posted by Thorleif Wiik on September 11, 2007 at 04:42 PM EDT #
In order for Live Upgrade to upgrade non-global zones on ZFS file systems, it must know how to use ZFS. It doesn't yet, but that is in development.
Posted by 192.18.101.5 on September 11, 2007 at 05:04 PM EDT #