
Thursday February 01, 2007
Well, the Good Wine's A-Flowin'
Jerry
has
posted
an excellent summary of the work he and Steve Lawrence architected as part of Project Duckhorn. The project encompassed a number of new features which are detailed
here
(see the quoted proposal in
Jeff Victor's
message),
here
and
here.
As Jerry discusses, zones and OpenSolaris resource management could previously be combined, but integrating the two technologies required quite a bit of knowledge. The team was able to define some sensible yet powerful
abstractions
that really bring to fruition the notion of a
container.
Technorati Tag:
Containers
Technorati Tag:
OpenSolaris
Technorati Tag:
Solaris
Technorati Tag:
Virtualization
Technorati Tag:
Zones
( Feb 01 2007, 03:45:24 PM PST )

Tuesday January 09, 2007
Privilege (Set Me Free)
One of the perhaps lesser-known features of
Solaris Containers
or
Zones
is that applications running inside these virtualized environments
execute with fewer privileges than applications executing outside the
container. This is enforced through the
Solaris Privileges
framework which was also introduced in the
Solaris 10
release.
When comparing virtualization solutions, typically OS level
virtualization mechanisms like Zones or
FreeBSD Jails
are thought to provide less security than mechanisms where a machine
architecture is virtualized, such as with the family of products from
VMware
or with
paravirtualization
mechanisms such as
Xen,
in which the guest OS is ported to the virtualized machine
architecture. One reason for that is there is usually weaker
separation between virtualized OS environments since at several levels
in the kernel there is some sharing of data structures and code paths.
However in some cases, OS level virtualization provides an advantage
for certain aspects of security. For example, with Solaris Containers
the privilege mechanism in the kernel enforces limitations on the types
of operations an application can perform. Consider the case of the
ability to create or "plumb" a software networking interface using
ifconfig(1M)
or set an IP address on that interface. In some situations, one wants
to allow such operations inside a virtualized environment because a
particular application requires the ability to change an existing IP
address or to toggle an interface up or down. The ramification of
this, however, is that a malicious or naive user inside the environment
might change their IP address to something not expected with the
results ranging from disruption in the network topology to the
potential of
spoofing
another machine on the network. In addition, most applications do not
actually require the ability to change their environment's IP address
or create new network interfaces or even know the name of the interface
in their environment. Rather, they typically want one or more IPv4 or
IPv6 addresses which they can
bind(3SOCKET)
to.
In the case of Solaris Containers, the
privilege
to
set the IP address of an interface
is not given to any applications running inside a container and there
is no way for an application to escalate or grow the set of privileges
from those they started out with. The end result, in this example, is
that the root password or super-user privileges can be given to a user
inside a container but they will be unable to manipulate or affect the
topology of the network or impersonate another machine and potentially
gain access to its network traffic.1
Until recently, the set of privileges a container's applications were
limited to was fixed. However starting with both
Solaris Express 5/06
and
Solaris 10 11/06,
the global zone administrator can change this set of privileges. What
this means from a practical point of view is that containers can become
more capable by adding some of the privileges that are not usually
present. An example here might be the ability to run
DTrace
from within the container2.
Dan
provided an excellent writeup on the details for doing so
here.
As another example, by adding some additional privileges to the
container's default privilege set, a Network Time Protocol (NTP) server
can be deployed in the container which is preferable from a security
point of view, especially for a server that might be facing a hostile
Internet. In order to configure the container appropriately, the list
of privileges that it requires needs to be known. Solaris 10 currently
ships with the
3-5.93e
version of
xntpd(1M),
which is the daemon that implements the NTP server capability. This
particular daemon can actually take advantage of three
privileges that are not normally present within a container. The
first, perhaps obviously, is the privilege to change the system clock -
sys_time.
With the addition of this privilege, xntpd will be able to successfully
set the system clock when it needs to.
However it also turns out that the daemon tries to both lock down its
memory pages and also run in the real-time scheduling class. It does
this so that the daemon can maintain accurate time particularly in the
face of other system activity. These two operations are also covered
by unique privileges -
proc_lock_memory
and
proc_priocntl.
Tying these privileges3
together, we can take an existing container and configure it to be an
NTP server. In this example, Sun's internal network routes IP
multicast and so I will leverage that to connect to the network's NTP
servers listening on the standard multicast address of 224.0.1.1 for
NTP.
For example, consider this update to the configuration of the zone
myzone:
global# zonecfg -z myzone
zonecfg:myzone> set limitpriv=default,proc_lock_memory,proc_priocntl,sys_time
zonecfg:myzone> commit
zonecfg:myzone> exit
global# zoneadm -z myzone boot
Then from within the newly booted container, I will set up the
configuration of the server itself and start the service:
myzone# cp -p /etc/inet/ntp.client /etc/inet/ntp.conf
myzone# svcadm enable network/ntp
The property that was set in the container's configuration,
limitpriv, consists of a list of privileges similar to the form
expected by
user_attr(4)
and
priv_str_to_set(3C).
In this particular example, the container's privilege set is limited to
the standard default set of privileges plus the three additional
privileges required by the NTP server.
It is worthwhile to note that privileges can also be taken away
by preceding them with an exclamation mark (!) or a minus sign (-).
This can allow a container to be booted in which applications have even
fewer privileges than usual. For example, to take away the ability to
generate ICMP datagrams from the zone named twilight, the global
zone administrator would configure the container as follows:
global# zonecfg -z twilight set limitpriv=default,!net_icmpaccess
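To make the limitpriv syntax concrete, here is a minimal sketch in portable C of how such a list splits into privileges to add and privileges to take away. It is not the parser that zonecfg or priv_str_to_set(3C) uses; the limitpriv_spec type and the limitpriv_parse function are names made up for illustration, and no name is checked against the real privilege list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PRIVS 16
#define MAX_NAME  32

/* Result of splitting a limitpriv-style string (hypothetical type). */
struct limitpriv_spec {
    bool   has_default;                  /* the "default" keyword was seen */
    size_t nadded, nremoved;
    char   added[MAX_PRIVS][MAX_NAME];
    char   removed[MAX_PRIVS][MAX_NAME];
};

/*
 * Split a comma-separated list such as "default,sys_time,!net_icmpaccess".
 * A leading '!' or '-' marks a privilege to take away.
 * Returns 0 on success, -1 on an empty token, an oversized name,
 * or too many entries. Names are NOT validated.
 */
int
limitpriv_parse(const char *str, struct limitpriv_spec *out)
{
    assert(str != NULL && out != NULL);
    memset(out, 0, sizeof (*out));

    const char *p = str;
    for (;;) {
        size_t len = strcspn(p, ",");    /* length of the current token */
        bool remove = (len > 0 && (p[0] == '!' || p[0] == '-'));
        const char *name = remove ? p + 1 : p;
        size_t nlen = remove ? len - 1 : len;

        if (nlen == 0 || nlen >= MAX_NAME)
            return (-1);

        if (!remove && nlen == 7 && strncmp(name, "default", 7) == 0) {
            out->has_default = true;
        } else if (remove) {
            if (out->nremoved == MAX_PRIVS)
                return (-1);
            /* arrays were zeroed above, so the copy stays terminated */
            memcpy(out->removed[out->nremoved++], name, nlen);
        } else {
            if (out->nadded == MAX_PRIVS)
                return (-1);
            memcpy(out->added[out->nadded++], name, nlen);
        }

        if (p[len] == '\0')
            break;
        p += len + 1;                    /* skip past the comma */
    }
    return (0);
}
```

Fed the string from the twilight example, the sketch reports the default keyword and a single removed privilege, net_icmpaccess; fed the NTP example, it reports the default keyword and three added privileges.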
There are a few restrictions on what privileges can be added to a
container as well as some concerning which ones can be removed. For
more details, please see the
original proposal
and the ensuing discussion on the
zones-discuss mailing list.
This proposal and many others concerning containers and other parts of
OpenSolaris have benefited greatly from the participation of the
OpenSolaris Zones Community.
Information about each of these proposals can be found
here.
1
The actual privilege check in the kernel for this particular case
occurs
here.
2
The ability to use DTrace inside a non-global zone is at the present
time restricted to Solaris Express as some additional changes to DTrace
were required. However, these changes should be appearing in an
upcoming Solaris 10 release.
3
Starting with Solaris Express 11/06, the privilege to lock memory has
actually been added to the container's default set. This is because
additional resource controls
have been added that can limit the amount of memory applications within
a container can lock so it is no longer necessary to make this
privilege an optional one.
( Jan 09 2007, 05:08:42 PM PST )

Wednesday June 14, 2006
In My Reflecting Pool
A year ago today, the realization of something that many of us at Sun
had pushed and wished for finally came true - the open sourcing of the
Solaris
source code and the creation of the
OpenSolaris Project.
On that date, I
wrote
about one aspect of OpenSolaris that I had been working on for a number
of years, but what really was even more exciting than the technology
being released was the possibilities for the future.
Reflecting after a year, I see a tremendous amount of progress
including accomplishments in areas, such as the
selection of a source code management (SCM) system,
which I dared not hope to be complete after one year's time. Many of
the changes that have taken place the past year represent fundamental
changes in the way Sun does Solaris development and though the
OpenSolaris community has a long way to grow, everyone should feel good
about how much has already taken place. And the fact that there are
already four distributions based on OpenSolaris including
Schillix,
BeleniX,
NexentaOS,
marTux
as well as Sun's own
Solaris Express
is a reason to celebrate.
One of the areas of OpenSolaris that I was fortunate to have worked on
the past year was with a team working on a proposal on what the
OpenSolaris development process should look like. The team was led by
Teresa
and I was asked by
Andy
if I wanted to contribute to this effort. The team consisted of a
number of people both within Sun and outside including
John Beck,
Rich Teer,
Al Hopper,
Stephen Hahn,
Ed Hunter, Joe Kowalski,
Keith Wesolowski,
Casper Dik,
and
Bill Sommerfeld.
Although the
development process draft
that we eventually published does look in many ways like the
Software Development Framework
used within Sun for its product development, the process by which the
proposal itself was developed was entirely organic. The team initially
discussed what the scope of the proposal should be and examined the
high-level requirements of an operating system such as OpenSolaris.
These design principles and other
fundamentals
were something that was always kept at the forefront when we then
examined other open-source projects including
Apache,
FreeBSD,
Linux,
NetBSD
and
OpenOffice.
After reviewing other open-source projects and their development
processes, we brainstormed over the steps necessary to take an idea
from conception to realization, again taking into account the guiding
requirements discussed earlier. One very important notion that weighed
heavily on our discussions was that of "shrink to fit", where steps in
the process can be reduced or even eliminated when it makes sense. The
result is a fairly streamlined process that is meant to handle both the
introduction of large pieces of framework into OpenSolaris as well as
the simple bug fix.
The draft was
released
last November and we received many insightful comments from the
community. I would definitely encourage others who have not read the
draft to do so and provide comments to the above thread or on the
OpenSolaris
cab-discuss
forum.
As exciting as the first year of OpenSolaris has been, it seems obvious
that the coming year is going to be even more so. And as impressive as it
is having a hundred community integrations into OpenSolaris this first
year, I suspect that we will be seeing a far higher number in the
coming year along with the introduction of some large scale projects
where the community will be playing a larger part in the design,
implementation and integration phases.
( Jun 14 2006, 09:57:51 AM PDT )

Monday September 12, 2005
Three Conductors and Twenty-five Sacks of Mail
I would like to thank everyone that came out to the
Silicon Valley OpenSolaris Users Group (SVOSUG)
meeting on August 31st and for bearing with me as we struggled with
getting the laptop and projector to play nice with one another. The
slides
for the presentation are now available.
The questions and feedback on Zones were excellent and it was great to
see the level of interest in OpenSolaris. Special thanks to
Dan
and Allan of the Zones project team for taking notes and lending
support and of course, to the other
Alan
for all his work in organizing the meetings.
One frequently asked question which came up at the meeting is how can a
zone support a writable directory under /usr, such as
/usr/local, when the former is usually mounted read-only from
the global zone.
The easiest way to support such a directory is to add a
lofs(7FS) file system for the zone using
zonecfg(1M). One simply needs to specify a directory in the
global zone to serve as backing store for the zone's
/usr/local directory and then edit the zone's configuration as
follows:
global# mkdir -p /usr/local
global# mkdir -p /path/to/some/storage/local/twilight
global# zonecfg -z twilight
zonecfg:twilight> add fs
zonecfg:twilight:fs> set dir=/usr/local
zonecfg:twilight:fs> set special=/path/to/some/storage/local/twilight
zonecfg:twilight:fs> set type=lofs
zonecfg:twilight:fs> end
zonecfg:twilight> commit
zonecfg:twilight> exit
global#
The next time the zone boots, it will have its own writable
/usr/local directory.
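When the same kind of file system resource has to be added to several zones, the subcommands can be generated rather than typed. The following is only a sketch: zonecfg_fs_stanza is a hypothetical helper, it does no quoting of values that contain spaces, and it simply formats the same subcommands shown in the session above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the zonecfg subcommands that add one file system resource
 * (hypothetical helper, not part of any Solaris library).
 * Returns the number of characters written, or -1 if buf is too small.
 */
int
zonecfg_fs_stanza(char *buf, size_t buflen, const char *dir,
    const char *special, const char *type)
{
    assert(buf != NULL && dir != NULL && special != NULL && type != NULL);

    int n = snprintf(buf, buflen,
        "add fs\n"
        "set dir=%s\n"
        "set special=%s\n"
        "set type=%s\n"
        "end\n"
        "commit\n",
        dir, special, type);

    /* snprintf reports the length it wanted; treat truncation as failure */
    return ((n < 0 || (size_t)n >= buflen) ? -1 : n);
}
```

The resulting text could be saved to a file and passed to zonecfg with its -f option instead of being entered interactively.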
Speaking of frequently asked questions,
Jeff
has compiled a
Zones and Containers FAQ
which provides a list of the common questions that have been asked
since Zones were introduced along with their answers. The FAQ along
with a great deal of other information can also be found on the
redesigned
OpenSolaris Zones Community
page, which was recently given a much-needed makeover by Dan.
( Sep 12 2005, 08:16:34 PM PDT )

Monday August 29, 2005
Lastly Through a Hogshead of Real Fire!
Tomorrow evening, August 30th at 7:30, at the
Silicon Valley OpenSolaris Users Group (SVOSUG)
meeting, I will have the pleasure of talking about Zones, the
virtualization software available in OpenSolaris. This month's meeting
will be held upstairs from the auditorium at Sun's Santa Clara campus -
directions are
available.
In addition to the talk (for which I will post slides soon), there will
be a panel discussion on anything related to OpenSolaris and
hopefully there will be a status update of where things stand, which
build has been released and other related news.
Thanks again to
Alan
for organizing these meetings and to the community for attending and
bringing their questions, concerns and enthusiasm. We look forward
to seeing you at the user group meeting.
This is a stereo recording.
A splendid time is guaranteed for all.
( Aug 29 2005, 07:55:49 PM PDT )

Tuesday June 14, 2005
These Boots are Made for Walkin'
One of the most gratifying and exciting aspects of the
OpenSolaris
project is a return (for me, at least) to working on operating system
design and research with the larger, open community. In another era
while I was an undergraduate at
Berkeley,
I was fortunate enough to see the 2.x and 4.x BSD development effort up
close and to see the larger community formed between the University and
external organizations that had
UNIX
source licenses. It was not an open source community, of course, but
it was a community nonetheless, and one that shared fixes, ideas and
other software projects built on top of the operating system. Our
hopes for OpenSolaris are that in addition to releasing operating
system source code that can be used for many different purposes, Sun
and the community will innovate together while maintaining the core
values that
Solaris
provides today.
One of the many pieces of OpenSolaris which is of personal interest is
the
Zones
virtualization technology introduced in Solaris 10. Zones provide a
lightweight but very efficient and flexible way of consolidating and
securing potentially complex workloads. There is a wealth of technical
information about Zones in Solaris available at the
OpenSolaris Zones Community
and the
BigAdmin System Administration Portal.
One of the things about Zones that people notice right away is how
quickly they boot. Of course, booting a zone does not cause a system
to run its power-on self-test (POST) or require the same amount of
initialization
that takes place when the hardware itself is booting. However, I
thought it might be useful to do a tour of the dance that takes place
when a non-global zone is booted. I call it a dance since there
is a certain amount of interplay between the primary players -
zoneadm,
zoneadmd
and the kernel itself - that warrants an explanation.
Although the virtualization that Zones provides is spread throughout
the source code, the primary implementation in the kernel can be found
in
zone.c.
As with many OpenSolaris frameworks, there is a big block
comment at the start of the file which is very useful for
understanding the lay of the land with respect to the code. Besides
describing the data structures and locking strategy used for Zones,
there is a description of the states a zone can be in from the kernel's
perspective and at what points a zone may transition from one state to
another. For brevity, only the states covered during a zone boot are
listed here
...
*
* Zone States:
*
* The states in which a zone may be in and the transitions are as
* follows:
*
* ZONE_IS_UNINITIALIZED: primordial state for a zone. The partially
* initialized zone is added to the list of active zones on the system but
* isn't accessible.
*
* ZONE_IS_READY: zsched (the kernel dummy process for a zone) is
* ready. The zone is made visible after the ZSD constructor callbacks are
* executed. A zone remains in this state until it transitions into
* the ZONE_IS_BOOTING state as a result of a call to zone_boot().
*
* ZONE_IS_BOOTING: in this shortlived-state, zsched attempts to start
* init. Should that fail, the zone proceeds to the ZONE_IS_SHUTTING_DOWN
* state.
*
* ZONE_IS_RUNNING: The zone is open for business: zsched has
* successfully started init. A zone remains in this state until
* zone_shutdown() is called.
...
It is important to note here that there are a number of zone states not
represented here - those are for zones which do not (yet) have a kernel
context. An example of such a state is for a zone that is in the
process of being installed. These states are defined in
libzonecfg.h.
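The states quoted above can be read as a small state machine. Here is a minimal sketch of just those transitions; the zs_ names are my own, not the kernel's, the real zone.c has more states and transitions than are shown in the excerpt, and I am assuming that zone_shutdown() moves a running zone to the shutting-down state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified subset of the kernel zone states (names are hypothetical). */
typedef enum {
    ZS_UNINITIALIZED,
    ZS_READY,
    ZS_BOOTING,
    ZS_RUNNING,
    ZS_SHUTTING_DOWN
} zs_t;

/* Is a single step between two states allowed by the quoted comment? */
bool
zs_transition_ok(zs_t from, zs_t to)
{
    switch (from) {
    case ZS_UNINITIALIZED:
        return (to == ZS_READY);
    case ZS_READY:
        return (to == ZS_BOOTING);          /* via zone_boot() */
    case ZS_BOOTING:
        /* init started, or starting init failed */
        return (to == ZS_RUNNING || to == ZS_SHUTTING_DOWN);
    case ZS_RUNNING:
        return (to == ZS_SHUTTING_DOWN);    /* assumed: zone_shutdown() */
    default:
        return (false);
    }
}

/* Check a whole sequence of states, one step at a time. */
bool
zs_path_ok(const zs_t *path, size_t n)
{
    assert(path != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (!zs_transition_ok(path[i - 1], path[i]))
            return (false);
    }
    return (true);
}
```

A successful boot walks uninitialized, ready, booting, running in that order; a path that jumps from ready straight to running is rejected.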
One of the players in the zone boot dance is the zoneadmd
process which runs in the global zone and performs a number of critical
tasks. Although much of the virtualization for a zone is implemented
in the kernel, zoneadmd manages a great deal of a zone's
infrastructure as outlined in
zoneadmd.c
/*
* zoneadmd manages zones; one zoneadmd process is launched for each
* non-global zone on the system. This daemon juggles four jobs:
*
* - Implement setup and teardown of the zone "virtual platform": mount and
* unmount filesystems; create and destroy network interfaces; communicate
* with devfsadmd to lay out devices for the zone; instantiate the zone
* console device; configure process runtime attributes such as resource
* controls, pool bindings, fine-grained privileges.
*
* - Launch the zone's init(1M) process.
*
* - Implement a door server; clients (like zoneadm) connect to the door
* server and request zone state changes. The kernel is also a client of
* this door server. A request to halt or reboot the zone which originates
* *inside* the zone results in a door upcall from the kernel into zoneadmd.
*
* One minor problem is that messages emitted by zoneadmd need to be passed
* back to the zoneadm process making the request. These messages need to
* be rendered in the client's locale; so, this is passed in as part of the
* request. The exception is the kernel upcall to zoneadmd, in which case
* messages are syslog'd.
*
* To make all of this work, the Makefile adds -a to xgettext to extract *all*
* strings, and an exclusion file (zoneadmd.xcl) is used to exclude those
* strings which do not need to be translated.
*
* - Act as a console server for zlogin -C processes; see comments in zcons.c
* for more information about the zone console architecture.
*
* DESIGN NOTES
*
* Restart:
* A chief design constraint of zoneadmd is that it should be restartable in
* the case that the administrator kills it off, or it suffers a fatal error,
* without the running zone being impacted; this is akin to being able to
* reboot the service processor of a server without affecting the OS instance.
*/
When a user wishes to boot a zone, zoneadm will attempt to
contact zoneadmd
via a
door
that is used by all three components for a number of things including
coordinating zone state changes. If for some reason zoneadmd is not
running, an attempt will be made to
start it.
Once that has completed, zoneadm tells zoneadmd to
boot the zone
by supplying the appropriate
zone_cmd_arg_t
request via a door call. It is worth noting that the same door is used
by zoneadmd to return messages back to the user executing zoneadm and
also as a way for zoneadm to indicate to zoneadmd the
locale
of the user executing the boot command so that messages are localized
appropriately.
Looking at the
door server
that zoneadmd implements, there is some straightforward sanity checking
that takes place on the argument passed via the door call as well as
the use of some of the technology that came in with the introduction of
discrete privileges in Solaris 10.
if (door_ucred(&uc) != 0) {
zerror(&logsys, B_TRUE, "door_ucred");
goto out;
}
eset = ucred_getprivset(uc, PRIV_EFFECTIVE);
if (ucred_getzoneid(uc) != GLOBAL_ZONEID ||
(eset != NULL ? !priv_ismember(eset, PRIV_SYS_CONFIG) :
ucred_geteuid(uc) != 0)) {
zerror(&logsys, B_FALSE, "insufficient privileges");
goto out;
}
kernelcall = ucred_getpid(uc) == 0;
/*
* This is safe because we only use a zlog_t throughout the
* duration of a door call; i.e., by the time the pointer
* might become invalid, the door call would be over.
*/
zlog.locale = kernelcall ? DEFAULT_LOCALE : zargp->locale;
Using
door_ucred,
the user credential can be checked to determine whether the request
originated in the global zone,1 whether the user making the request had
sufficient privilege to do so2 and whether the request was a result of
an upcall from the kernel. That last piece of information is used,
among other things, to determine whether or not messages should be
localized by
localize_msg.
It is within the door server implemented by zoneadmd that transitions
from one state to another take place. There are two states from which
a zone boot is permissible, installed and ready. From
the installed state,
zone_ready
is used to create and bring up the zone's
virtual platform
that consists of the zone's kernel context (created using
zone_create)
as well as the zone's specific file systems (including the root file
system) and logical networking interfaces. If a zone is supposed to be
bound to a non-default
resource pool,
then that also takes place as part of this state transition.
When a zone's kernel context is created using zone_create, a
zone_t
structure is allocated and initialized. At this time, the status
of the zone is set to
ZONE_IS_UNINITIALIZED.
Some of the initialization that takes place is in order to set up the
security boundary which isolates processes running inside a zone. For
example, the
vnode_t
of the zone's
root file system,
the zone's
kernel credentials
and the
privilege sets
of the zone's future processes are all initialized here.
Before returning to zoneadmd,
zone_create adds the primordial zone to a doubly-linked list
and two hash tables,
3
one hashed by
zone name
and the other by
zone ID.
These data structures are protected by the
zonehash_lock
mutex which is then dropped after the zone has been added. Finally a
new kernel process is then created,
zsched,
which is where kernel threads for this zone are parented. After
calling
newproc
to create this kernel process, zone_create will wait using
zone_status_wait
until the zsched kernel process has completed initializing the
zone and has set its status to
ZONE_IS_READY.
Since initialization of the new process's user structure has not yet been
completed, the first thing the new zsched process does is
finish that initialization along with reparenting itself to PID 1 (the
global zone's
init
process). And since the future processes to be run within the new zone
may be subject to resource controls, that initialization takes place
here in the context of zsched.
After grabbing the
zone_status_lock
mutex in order to set the status to ZONE_IS_READY,
zsched will then suspend itself, waiting for the zone's status
to be changed to
ZONE_IS_BOOTING.
Once the zone is in the ready state, zone_create
returns control back to zoneadmd and the door server continues
the boot process by calling
zone_bootup.
This initializes the zone's console device, mounts some of the standard
OpenSolaris file systems like /proc and /etc/mnttab
and then uses the
zone_boot
system call to attempt to boot the zone.
As the comment that introduces zone_boot points out, most of
the heavy lifting has already been done either by zoneadmd or
by the work the kernel has done through zone_create. At this
point, zone_boot saves the requested
boot arguments
after grabbing the zonehash_lock mutex and then further grabs
the zone_status_lock mutex in order to set the zone status to
ZONE_IS_BOOTING. After dropping both locks, it
is zone_boot that suspends itself, waiting for the zone status
to be set to
ZONE_IS_RUNNING.
Since the zone's status has now been set to ZONE_IS_BOOTING,
zsched now continues where it left off after it had suspended
itself with its call to
zone_status_wait_cpr.
After checking that the current zone status is indeed ZONE_IS_BOOTING,
a new kernel process is created in order to run init in the
zone. This process calls
zone_icode
which is analogous to the traditional
icode
function that is used to start init in the global zone and in
traditional UNIX environments. After doing some zone-specific
initialization, each of the icode functions ends up calling
exec_init
to actually exec the init process after copying out the path
to the executable, /sbin/init, and the boot arguments. If the
exec is successful, zone_icode will set the zone's status to
ZONE_IS_RUNNING and in the process, zone_boot will
pick up where it had been suspended. At this point, the value of
zone_boot_err
indicates whether the zone boot was successful or not and is used to
set the global errno value for zoneadmd.
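The back-and-forth between zone_create, zsched and zone_boot is essentially a handshake on a status field guarded by a lock. The following toy model uses POSIX threads in place of kernel primitives to show the shape of it. Everything here is a sketch under my own names (toy_zone, status_set, status_wait and so on); in the real kernel it is zone_icode, running in the new init process, that marks the zone as running, whereas the toy folds that step into the zsched thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef enum { ST_UNINITIALIZED, ST_READY, ST_BOOTING, ST_RUNNING } st_t;

struct toy_zone {
    pthread_mutex_t lock;    /* stands in for zone_status_lock */
    pthread_cond_t  cv;
    st_t            status;
};

/* Advance the status and wake every waiter. */
static void
status_set(struct toy_zone *z, st_t s)
{
    pthread_mutex_lock(&z->lock);
    assert(s > z->status);   /* in this toy, status only moves forward */
    z->status = s;
    pthread_cond_broadcast(&z->cv);
    pthread_mutex_unlock(&z->lock);
}

/* Sleep until the status has reached at least s. */
static void
status_wait(struct toy_zone *z, st_t s)
{
    pthread_mutex_lock(&z->lock);
    while (z->status < s)
        pthread_cond_wait(&z->cv, &z->lock);
    pthread_mutex_unlock(&z->lock);
}

/* Plays the part of zsched. */
static void *
toy_zsched(void *arg)
{
    struct toy_zone *z = arg;

    status_set(z, ST_READY);      /* zone is initialized */
    status_wait(z, ST_BOOTING);   /* suspend until a boot is requested */
    /* the real zsched would now create the process that execs init */
    status_set(z, ST_RUNNING);
    return (NULL);
}

/* Plays zone_create followed by zone_boot; returns the final status. */
st_t
toy_boot(void)
{
    struct toy_zone z;
    pthread_t t;

    pthread_mutex_init(&z.lock, NULL);
    pthread_cond_init(&z.cv, NULL);
    z.status = ST_UNINITIALIZED;

    if (pthread_create(&t, NULL, toy_zsched, &z) == 0) {
        status_wait(&z, ST_READY);    /* zone_create returns here */
        status_set(&z, ST_BOOTING);   /* zone_boot requests the boot */
        status_wait(&z, ST_RUNNING);  /* and waits for the outcome */
        pthread_join(t, NULL);
    }

    st_t final = z.status;
    pthread_cond_destroy(&z.cv);
    pthread_mutex_destroy(&z.lock);
    return (final);
}
```

Each side only ever sleeps on a status that the other side is about to set, which is why the order of the waits matters as much as the locks themselves.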
There are two additional things to note with the zone's transition to
the running state. First of all,
audit_put_record
is called to generate an event for the Solaris auditing system so that
it's known which user executed which command to boot a
zone. In addition, there is an internal zoneadmd event
generated to indicate on the zone's console device that the zone is
booting. This internal stream of
events
is sent by the door server to the zone console subsystem for all state
transitions, so that the console user can see which state the zone is
transitioning to.
1
This is a bit of defensive programming since unless the global zone
administrator were to make the door in question available through the
non-global zone's own file system, there would be no way for a
privileged user in a non-global zone to actually access the door used by
zoneadmd.
2
zoneadm itself checks that the user attempting to boot a zone
has the necessary privilege but it's possible some other privileged
process in the global zone might have access to the door but
lack the necessary
PRIV_SYS_CONFIG
privilege.
3
The
doubly-linked list implementation
was integrated by
Dave
while
Dan
was responsible for the
hash table implementation.
Both of these are worth examining in the OpenSolaris source base.
( Jun 14 2005, 02:26:03 PM PDT )

Thursday August 26, 2004
The Pump Don't Work 'Cause the Vandals Took the Handles
Whether you are trying to figure out why the pump "don't
work" or you are trying to protect the pump from the iVandals out
in the real world, Solaris 10 can help you deal with these and many
other situations.
DTrace
is known as the technology which provides concise answers to
arbitrary questions. It has been used within Sun and by our
customers to improve the performance of the operating system and
applications alike and to help find the root cause of bugs which
previously were difficult, if not nearly impossible, to find using
traditional debugging techniques. It allows such analysis, safely, on
production systems without requiring recompilation of the operating
system or the application and without having to recreate the production
environment where a problem has been observed.
Zones
can help isolate application environments from one another such that
even if one becomes a privileged user in one of the application
environments, the damage one can cause on purpose or inadvertently is
isolated to that one zone or container. The degree of isolation is
such that each zone can be rebooted independently without affecting any
other zones on the system or the machine as a whole (and the zones boot
very quickly - for example, on a
Sun Fire V60x
a zone can boot in as little as eight seconds, from a halted state to
login prompt.)
Finally, the
Predictive Self-Healing
technology can help customers maximize the availability of their
computing resources, and to handle faults that may occur whether in
software or in hardware. In the past, typically problems resulted in a
number of messages appearing in the system log which left both
customers and often service personnel scratching their heads, trying to
make sense of these symptoms. Predictive Self-Healing instead observes
generated error events or telemetry and once sufficient
telemetry has been obtained, diagnosis engines can generate
a single fault event to agents which can respond to the
diagnosed fault.
Not too long ago, a number of engineers who designed these new
frameworks participated in three
Sun Expert Exchanges
where over a live chat system we were able to answer technical
questions about these features and get valuable feedback from
customers. The transcript of the DTrace exchange in which
Adam
and
Bryan
and others participated can be found
here.
About a month later,
Andy
and
Dan
and I participated in an exchange on Zones which not only was a great
deal of fun but provided us with a lot of interesting input and we hope
was helpful to both current zones users and interested parties alike.
The transcript for that exchange is available
here.
And about a week ago, some of the architects of the Predictive
Self-Healing functionality participated in their own exchange and its
transcript is available
here.
Transcripts from other Expert Exchanges are available as well
here
under Archives and registration is open for a number of other
planned sessions including one on ZFS (The Zettabyte Filesystem) and
the many fundamental security enhancements that have been made to
Solaris 10.
( Aug 26 2004, 10:26:17 PM PDT )

Friday August 06, 2004
What's New Pussycat?
Support for
Zones
was initially released in the Software Express for Solaris 2/04
release. Since then, we have been working on adding a number of
enhancements as well as fixing a number of bugs that have been
reported. Of course, the definitive source for what has changed in
each of the Solaris releases can be found in that release's
Solaris What's New
document. However, I thought it might be useful to summarize the Zones
enhancements that have been released since February and what is coming
in the upcoming Software Express for Solaris release.
In the Software Express for Solaris 7/04 release, support was added for
Zones to act as
NFSv4
clients. In that particular build, the default NFS version
was still three (3), but this can be changed by editing the file
/etc/default/nfs, uncommenting the NFS_CLIENT_VERSMAX parameter and
setting it to 4. For more information,
please see the
nfs(4)
manual page. Note that in the upcoming Software Express for Solaris
release, the default NFS version will be four (4) although the system
will negotiate a lower version as necessary.
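The edit described above can also be scripted. What follows is only a minimal Python sketch: the function name set_nfs_client_versmax and the sample file content are made up for illustration, and the code works on a string rather than on the real /etc/default/nfs, so it is best tried on a copy of the file first.

```python
import re


def set_nfs_client_versmax(text, version=4):
    """Return text with NFS_CLIENT_VERSMAX uncommented and set to version."""
    # Match the parameter whether or not it is commented out with '#'.
    pattern = re.compile(r"^[ \t]*#?[ \t]*NFS_CLIENT_VERSMAX[ \t]*=.*$",
                         re.MULTILINE)
    replacement = f"NFS_CLIENT_VERSMAX={version}"
    new_text, count = pattern.subn(replacement, text)
    if count == 0:
        # The parameter is not present at all, so append it.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += replacement + "\n"
    return new_text


if __name__ == "__main__":
    # Hypothetical excerpt; the real file contains many more settings.
    sample = "# illustrative excerpt only\n#NFS_CLIENT_VERSMAX=3\n"
    print(set_nfs_client_versmax(sample), end="")
```

Running the sketch on the sample prints the excerpt with the parameter line uncommented and set to 4. Writing the result back to the file is left out on purpose.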
On a similar note, the statistics reported by
nfsstat(1M)
have been virtualized on a per-zone basis.
With the power of the
Tecla command-line editing library,
the
zonecfg(1M)
command now supports command line editing, command history and tab
completion within interactive mode. This new functionality can make it
far easier to enter or edit a zone's configuration. In addition, each
user can customize their own particular set of key bindings through the
file .teclarc in their home directory.
One of the other new features in this release is the ability to specify
a richer set of file systems through
zonecfg(1M).
Previously, the administrator could specify a restricted set of file
systems such as
lofs(7FS)
or
tmpfs(7FS).
This restriction is largely lifted in this release, allowing
the administrator to specify file systems like
ufs(7FS).
For example, consider this update to the configuration of the zone
myzone:
global# zonecfg -z myzone
zonecfg:myzone> add fs
zonecfg:myzone:fs> set dir=/source
zonecfg:myzone:fs> set special=/dev/md/dsk/d4
zonecfg:myzone:fs> set raw=/dev/md/rdsk/d4
zonecfg:myzone:fs> set type=ufs
zonecfg:myzone:fs> end
zonecfg:myzone> commit
zonecfg:myzone> exit
global#
What we have added to the configuration is a UFS file system that will
automatically be mounted as /source when the zone is booted.
The partition used is a
Solaris Volume Manager
metadevice that was created and initialized from within the global
zone.
Finally, the Software Express for Solaris 7/04 release includes two
enhancements to the
ps(1)
command to add zone information to any current command output and to
filter information based on one or more zones. The new -Z
option adds a ZONE column to any report generated by
ps(1)
while the -z zidlist option prints only those processes belonging
to the zones specified in the comma-separated zidlist (zones can
be listed either by name or by their ID number).
In the upcoming Software Express for Solaris release, there are three
additional enhancements being introduced for Zones. As a teaser, I
will briefly describe them now and cover them later in more depth when
the release is made available.
When
resource pools
have been enabled, the first enhancement more accurately reports the
processor resources available to a zone and their statistics as
reported by commands such as
iostat(1M),
mpstat(1M),
vmstat(1M),
psrinfo(1M)
and
sar(1).
In a similar manner, library routines such as
getloadavg(3C)
and
sysconf(3C)
(the latter when invoked for _SC_NPROCESSORS_CONF or
_SC_NPROCESSORS_ONLN) only return information for the
processors in the set a particular zone is bound to.
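A quick way to see what these routines report is to call them from inside a zone and compare the result with the global zone. The sketch below uses Python's os.sysconf() and os.getloadavg(), which wrap the sysconf(3C) and getloadavg(3C) routines named above. The helper names are made up for illustration, and the zone-specific values will only show up on a release that includes this enhancement, with resource pools enabled.

```python
import os


def processor_counts():
    """Return (configured, online) processor counts as reported by sysconf."""
    configured = os.sysconf("SC_NPROCESSORS_CONF")
    online = os.sysconf("SC_NPROCESSORS_ONLN")
    return configured, online


def load_averages():
    """Return the 1, 5 and 15 minute load averages, or None if unavailable."""
    try:
        return os.getloadavg()
    except OSError:
        # getloadavg can fail if the system cannot supply the values.
        return None


if __name__ == "__main__":
    configured, online = processor_counts()
    print(f"processors configured: {configured}, online: {online}")
    print(f"load averages: {load_averages()}")
```

Run in a zone bound to a small processor set, the counts should reflect that set rather than every processor in the machine.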
In addition, a new
resource control
has been introduced, zone.max-lwps, which allows a global zone
administrator to limit the number of lightweight processes or LWPs that
can be created inside a zone. From within the zone itself, another new
resource control, project.max-lwps, can be used to further
divide the total number of LWPs amongst the
projects
defined in the zone.
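As a rough illustration of how a global zone administrator might set the zone-wide limit, the Python sketch below builds the subcommands for a zonecfg(1M) command file. It assumes the rctl resource syntax that zonecfg(1M) documents. The function name, the zone name myzone and the limit of 1000 are examples only.

```python
def max_lwps_commands(limit):
    """Build zonecfg subcommands that cap a zone's LWP count at limit."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    lines = [
        "add rctl",
        "set name=zone.max-lwps",
        # A privileged value can only be changed from the global zone.
        f"add value (priv=privileged,limit={limit},action=deny)",
        "end",
        "commit",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 1000 is an arbitrary example limit, not a recommendation.
    print(max_lwps_commands(1000), end="")
```

The output can be saved to a file and applied from the global zone with zonecfg -z myzone -f followed by the file name, or the same subcommands can be typed at the interactive zonecfg prompt.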
Finally,
Solaris Auditing
can now be configured for zones in a number of different ways. The
global zone administrator can specify whether the system should be
audited as a whole or whether each zone can be audited separately.
In the latter case, each zone has its own audit configuration and that
zone's administrator can configure and process their audit trails
independently from the other zones on the system.
We welcome hearing about your experiences or problems with using Zones
and about any feature enhancements you would like to see.
( Aug 06 2004, 08:51:15 AM PDT )

Tuesday August 03, 2004
Begin the Beguine
To be perfectly honest, it is with some trepidation that I have started
this blog. While I marvel at this new opportunity to be a part of
these conversations with the various Sun communities, I worry that it
will be difficult time-wise to stay involved. Perhaps I also share
some of the feelings expressed by
Casper
when he says that it
feels like talking into thin air.
In any case, this is a wonderful opportunity for all of us to hear from
you about what concerns you have and how we can make Solaris
that
nexus of innovation
as well as a way of solving your computing needs.
Briefly, I have been working at Sun on Solaris for a little over eight
years now, initially in the networking group responsible for the TCP/IP
stack and related technologies and for the past couple of years, in the
area of server virtualization and resource management working inside a
group with
Andy
and
Dan
and a number of others in introducing Zones into Solaris.
Zones provide a new operating system abstraction for partitioning
systems so that multiple applications can be run in isolation from one
another, while perhaps being administered by different privileged
users. The design borrows in a number of ways from the concepts introduced by
FreeBSD Jails
but extends them to be fully integrated with the features that Solaris
offers. In my opinion, the most exciting point of integration is with
the
Solaris Resource Management
framework which was introduced in Solaris 8 and greatly enhanced in
Solaris 9 and 10 because combined they provide the necessary
characteristics to fully isolate applications and workloads from one
another. Together, these two technologies form the basis of N1 Grid
Containers, which are supported on all systems that Solaris supports,
from single-processor x86 laptops, to large, multi-processor servers.
For a brief but semi-technical overview of Solaris Zones, a good
starting point is the
paper
we presented at a work-in-progress session at the recent
USENIX VM '04
conference. For more information in general about Zones, the
Zones BigAdmin site
provides a great deal of information including a pointer to the latest
documentation on these technologies and an active discussion forum.
And to repeat a popular refrain, we encourage you to not just read
about Zones and the other new functionality in Solaris 10, but to
experience it yourself by downloading it from the
Software Express for Solaris site.
( Aug 03 2004, 11:06:06 PM PDT )