Containers in SX build 56
The many Resource Management (RM) features in Solaris have been developed and have evolved over several years and several releases. We have resource controls, resource pools, resource capping and the Fair Share Scheduler (FSS). We have rctls, projects, tasks, cpu-shares, processor sets and rcapd(1M). Each of these features has its own commands and syntax for configuration. In some cases, particularly with resource pools, the syntax is quite complex and long sequences of commands are needed to configure a pool. When you first look at RM, it is not immediately clear when to use one feature rather than another, or whether some combination of these features is needed to achieve your RM objectives.
In Solaris 10 we introduced Zones, a lightweight system virtualization capability. Marketing coined the term 'containers' to refer to the combination of Zones and RM within Solaris. However, the integration between the two was fairly weak. Within Zones we had the 'rctl' configuration option, which you could use to set a couple of zone-specific resource controls, and the 'pool' property, which could be used to bind the zone to an existing resource pool, but that was it. Just setting the 'zone.cpu-shares' rctl wouldn't actually give you the right cpu shares unless you also configured the system to use FSS. But that was a separate step and easily overlooked. Without the correct configuration of these disparate components, even a simple test, such as a fork bomb within a zone, could disrupt the entire system.
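To make the FSS pitfall concrete, here is a minimal Python sketch of the kind of consistency check an administrator had to perform by hand before this work. The `ZoneConfig` class and `rm_warnings` function are hypothetical, and representing the system's default scheduling class as a plain string is a simplifying assumption (on a real system FSS is made the default class with `dispadmin -d FSS`). It flags only the one case described above: a `zone.cpu-shares` value that will not be enforced because FSS is not in use.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ZoneConfig:
    """Hypothetical, simplified view of a zone's RM-related settings."""
    name: str
    rctls: Dict[str, int] = field(default_factory=dict)


def rm_warnings(zone: ZoneConfig, default_sched_class: str) -> List[str]:
    """Return warnings for zone settings that will silently not take effect.

    Only the pitfall from the text is checked: zone.cpu-shares is enforced
    by the Fair Share Scheduler (FSS), so setting it is not enough when
    some other class (e.g. TS, timesharing) is the default.
    """
    warnings = []
    shares = zone.rctls.get("zone.cpu-shares")
    if shares is not None and default_sched_class != "FSS":
        warnings.append(
            f"{zone.name}: zone.cpu-shares={shares} will not be enforced while "
            f"the default scheduling class is {default_sched_class}; "
            "configure FSS separately"
        )
    return warnings


if __name__ == "__main__":
    web = ZoneConfig("web", {"zone.cpu-shares": 20})
    for warning in rm_warnings(web, "TS"):
        print(warning)
```

The new 'dedicated-cpu' and 'capped-memory' options are aimed at removing exactly this kind of manual cross-checking.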
As users started experimenting with Zones, we found that many of them were not leveraging the RM capabilities provided by the system. We would get dinged in evaluations because Zones, without a correct RM configuration, didn't provide all of the containment users needed. We always expected Zones and RM to be used together, but due to the complexity of the RM features and the loose integration between the two, we saw that few Zones users actually had a proper RM configuration. In addition, our RM for memory control was limited to running rcapd within a zone to cap the RSS of projects. This wasn't really adequate.
About 9 months ago the Zones engineering team started a project to improve this situation. We didn't want to just paper over the complexity with things like a GUI or wizards, so it took quite a bit of design work before we felt we had hit upon some key abstractions that could truly simplify the interaction between the two components. Eventually we settled on the idea of organizing the RM features into 'dedicated' and 'capped' configurations for the zone. We enhanced resource pools with the idea of a 'temporary pool', which can be dynamically instantiated when a zone boots. We enhanced rcapd(1M) so that physical memory capping can be done from the global zone. Steve Lawrence did a lot of work to improve resident set size (RSS) accounting, as well as adding new rctls for maximum swap and locked memory. These new features significantly improve memory RM for Zones. We then enhanced the Zones infrastructure to automatically set up the various RM features configured for the zone. Although the project made many smaller improvements, the key ideas are the two new configuration options in zonecfg(1M): when configuring a zone you can now configure 'dedicated-cpu' and 'capped-memory'. Going forward, as additional RM features are added, we anticipate this idea will evolve gracefully to add 'dedicated-memory' and 'capped-cpu' configuration. We also think this concept can easily be extended to support RM for other key parts of the system, such as the network or storage subsystems.
Here is our simple diagram of how we eventually unified the RM view within Zones.
            | dedicated                 | capped
    --------+---------------------------+------------------------------
    cpu     | temporary processor set   | cpu-cap rctl*
    --------+---------------------------+------------------------------
    memory  | temporary memory set*     | rcapd, swap and locked rctls

* Memory sets and cpu caps are under development but are not yet part of Solaris.
With these enhancements, it is now almost trivial to configure RM for a zone. For example, to configure a resource pool with a processor set of between one and four CPUs, all you need to do in zonecfg is:
    zonecfg:my-zone> add dedicated-cpu
    zonecfg:my-zone:dedicated-cpu> set ncpus=1-4
    zonecfg:my-zone:dedicated-cpu> set importance=10
    zonecfg:my-zone:dedicated-cpu> end

To configure memory caps, you would do:
    zonecfg:my-zone> add capped-memory
    zonecfg:my-zone:capped-memory> set physical=50m
    zonecfg:my-zone:capped-memory> set swap=128m
    zonecfg:my-zone:capped-memory> set locked=10m
    zonecfg:my-zone:capped-memory> end

All of the complexity of configuring the associated RM capabilities is then handled behind the scenes when the zone boots. Likewise, when you migrate a zone to a new host, these RM settings migrate too.
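The work done behind the scenes at boot can be illustrated with a small Python sketch that expands the two zonecfg resources from the examples above into the underlying RM mechanisms listed in the diagram (a temporary pool for dedicated CPUs; rcapd plus swap and locked-memory rctls for capped memory). Everything here is hypothetical: the `DedicatedCpu`, `CappedMemory` and `boot_actions` names are illustrative, the action strings only describe what the zones framework sets up rather than reproduce its output, and treating the size suffixes as binary multiples (1m = 1024 * 1024 bytes) is an assumption. The rctl names `zone.max-swap` and `zone.max-locked-memory` correspond to the new swap and locked-memory controls mentioned earlier.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: size suffixes are binary multiples (1k = 1024 bytes).
_SCALE = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text: str) -> int:
    """Convert a size such as '50m' or '128M' to bytes."""
    match = re.fullmatch(r"(\d+)([kmgt]?)", text.strip().lower())
    if match is None:
        raise ValueError(f"bad size: {text!r}")
    return int(match.group(1)) * _SCALE[match.group(2)]


def parse_ncpus(text: str) -> Tuple[int, int]:
    """Convert an ncpus value of 'N' or 'N-M' into a (min, max) range."""
    low, sep, high = text.partition("-")
    low_n = int(low)
    high_n = int(high) if sep else low_n
    if low_n < 1 or high_n < low_n:
        raise ValueError(f"bad ncpus range: {text!r}")
    return low_n, high_n


@dataclass
class DedicatedCpu:  # hypothetical mirror of the 'dedicated-cpu' resource
    ncpus: str
    importance: Optional[int] = None


@dataclass
class CappedMemory:  # hypothetical mirror of the 'capped-memory' resource
    physical: Optional[str] = None
    swap: Optional[str] = None
    locked: Optional[str] = None


def boot_actions(zone: str, cpu: Optional[DedicatedCpu],
                 mem: Optional[CappedMemory]) -> List[str]:
    """Describe the RM setup implied by a zone's configuration at boot."""
    actions = []
    if cpu is not None:
        low, high = parse_ncpus(cpu.ncpus)
        actions.append(f"create temporary pool for {zone} "
                       f"with a processor set of {low}-{high} CPUs")
        if cpu.importance is not None:
            actions.append(f"set pool importance to {cpu.importance}")
    if mem is not None:
        if mem.physical:  # enforced by rcapd running in the global zone
            actions.append(f"rcapd: cap {zone} RSS at {parse_size(mem.physical)} bytes")
        if mem.swap:
            actions.append(f"rctl zone.max-swap = {parse_size(mem.swap)}")
        if mem.locked:
            actions.append(f"rctl zone.max-locked-memory = {parse_size(mem.locked)}")
    return actions


if __name__ == "__main__":
    for action in boot_actions("my-zone", DedicatedCpu("1-4", 10),
                               CappedMemory("50m", "128m", "10m")):
        print(action)
    # create temporary pool for my-zone with a processor set of 1-4 CPUs
    # set pool importance to 10
    # rcapd: cap my-zone RSS at 52428800 bytes
    # rctl zone.max-swap = 134217728
    # rctl zone.max-locked-memory = 10485760
```

The point of the sketch is the shape of the mapping: the administrator states intent once, per resource, and the framework derives every lower-level setting from it.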
Over the course of the project we discussed these ideas within the OpenSolaris Zones community, where we received a lot of good input that we used in the final design and implementation. The full details of the project are available here and here.
This work is available in Solaris Express build 56, which was just posted. Hopefully folks using Zones will get a chance to try out the new features and let us know what they think. The entire core engineering team actively participates in the zones discuss list, and we're happy to answer any questions or just hear your thoughts.
Posted by savagex on February 01, 2007 at 01:18 PM PST