Friday May 02, 2008
This is particularly interesting for Sun's CMT systems - those systems based on the UltraSPARC-T1, -T2, and -T2+ (aka Niagara, Niagara 2, Niagara 2+). Those systems are well known for their high performance-per-watt characteristics, an important consideration as data centers exhaust their power capacity and the price of fossil fuels rises.
Solaris 8 (and 9) Containers can also take advantage of the impressive scalability of the Sun SPARC Enterprise M-series systems - from 4 to 64 dual-core SPARC CPUs. Because of the ability to mix Solaris 8 Containers and Solaris 9 Containers, alongside Solaris 10 Containers, you can move dozens of older SPARC systems into just a few new SPARC systems.
You can find product details, a videotaped demonstration, and free download at http://www.sun.com/software/solaris/containers/index.jsp.
Wednesday Apr 09, 2008
Since 2005, Solaris 10 has offered the Solaris Containers feature set, creating isolated virtual Solaris environments for Solaris 10 applications. Although almost all Solaris 8 applications run unmodified in Solaris 10 Containers, sometimes it would be better to just move an entire Solaris 8 system - all of its directories and files, configuration information, etc. - into a Solaris 10 Container. This has become very easy - just three commands.
Sun offers a Solaris Binary Compatibility Guarantee which demonstrates the significant effort that Sun invests in maintaining compatibility from one Solaris version to the next. Because of that effort, almost all applications written for Solaris 8 run unmodified on Solaris 10, either in a Solaris 10 Container or in the Solaris 10 global zone.
However, there are still some data centers with many Solaris 8 systems. In some situations it is not practical to re-test all of those applications on Solaris 10. It would be much easier to just move the entire contents of the Solaris 8 file systems into a Solaris Container and consolidate many Solaris 8 systems into a much smaller number of Solaris 10 systems.
For those types of situations, and some others, Sun now offers Solaris 8 Containers. These use the "Branded Zones" framework available in OpenSolaris and first released in Solaris 10 in August 2007. A Solaris 8 Container provides an isolated environment in which Solaris 8 binaries - applications and libraries - can run without modification. To a user logged in to the Container, or to an application running in the Container, there is very little evidence that this is not a Solaris 8 system.
The Solaris 8 Container technology rests on a very thin layer of software which performs system call translations - from Solaris 8 system calls to Solaris 10 system calls. This is not binary emulation, and the number of system calls with any difference is small, so the performance penalty is extremely small - typically less than 3%.
Not only is this technology efficient, it's very easy to use. There are five steps, but two of them can be combined into one:
Almost any Solaris 8 revision or patch level will work, but Sun strongly recommends applying the most recent patches to that system. The Solaris 10 system must be running Solaris 10 8/07, and requires the following minimum patch levels:
s10-system# pkgadd -d . SUNWs8brandr SUNWs8brandu SUNWs8p2v
Now we can patch the Solaris 10 system, using the patches listed above.
After patches have been applied, it's time to archive the Solaris 8 system. In order to remove the "archive transfer" step I'll turn the Solaris 10 system into an NFS server and mount it on the Solaris 8 system. The archive can be created by the Solaris 8 system, but stored on the Solaris 10 system. There are several tools which can be used to create the archive: Solaris flash archive tools, cpio, pax, etc. In this example I used flarcreate, which first became available on Solaris 8 2/04.
s10-system# share /export/home/s8-archives
s8-system# mount s10-system:/export/home/s8-archives /mnt
s8-system# flarcreate -S -n atl-sewr-s8 /mnt/atl-sewr-s8.flar
Creation of the archive takes longer than any other step - 15 minutes to an hour, or even more, depending on the size of the Solaris 8 file systems.
With the archive in place, we can configure and install the Solaris 8 Container. In this demonstration the Container was "sys-unconfig'd" by using the -u option. The opposite of that is -p, which preserves the system configuration information of the Solaris 8 system.
s10-system# zonecfg -z test8
zonecfg:test8> create -t SUNWsolaris8
zonecfg:test8> set zonepath=/zones/roots/test8
zonecfg:test8> add net
zonecfg:test8:net> set address=129.152.2.81
zonecfg:test8:net> set physical=vnet0
zonecfg:test8:net> end
zonecfg:test8> exit
s10-system# zoneadm -z test8 install -u -a /export/home/s8-archives/atl-sewr-s8.flar
Log File: /var/tmp/test8.install.995.log
Source: /export/home/s8-archives/atl-sewr-s8.flar
Installing: This may take several minutes...
Postprocessing: This may take several minutes...
Result: Installation completed successfully.
Log File: /zones/roots/test8/root/var/log/test8.install.995.log
This step should take 5-10 minutes. After the Container has been
installed, it can be booted.
s10-system# zoneadm -z test8 boot
s10-system# zlogin -C test8
At this point I was connected to the Container's console. It asked the usual system configuration questions, and then rebooted:
[NOTICE: Zone rebooting]
SunOS Release 5.8 Version Generic_Virtual 64-bit
Copyright 1983-2000 Sun Microsystems, Inc. All rights reserved
Hostname: test8
The system is coming up. Please wait.
starting rpc services: rpcbind done.
syslog service starting.
Print services started.
Apr 1 18:07:23 test8 sendmail[3344]: My unqualified host name (test8) unknown; sleeping for retry
The system is ready.
test8 console login: root
Password:
Apr 1 18:08:04 test8 login: ROOT LOGIN /dev/console
Last login: Tue Apr 1 10:47:56 from vpn-129-150-80-
Sun Microsystems Inc. SunOS 5.8 Generic Patch February 2004
# bash
bash-2.03# psrinfo
0 on-line since 04/01/2008 03:56:38
1 on-line since 04/01/2008 03:56:38
2 on-line since 04/01/2008 03:56:38
3 on-line since 04/01/2008 03:56:38
bash-2.03# ifconfig -a
lo0:1: flags=1000849 mtu 8232 index 1
inet 127.0.0.1 netmask ff000000
vnet0:1: flags=1000843 mtu 1500 index 2
inet 129.152.2.81 netmask ffffff00 broadcast 129.152.2.255
At this point the Solaris 8 Container exists. It's accessible on the local network, existing applications can be run in it, or new software can be added to it, or existing software can be patched.
To extend the example, here is the output from the commands I used to limit this Solaris 8 Container to only use a subset of the 32 virtual CPUs on that Sun Fire T2000 system.
s10-system# zonecfg -z test8
zonecfg:test8> add dedicated-cpu
zonecfg:test8:dedicated-cpu> set ncpus=2
zonecfg:test8:dedicated-cpu> end
zonecfg:test8> exit
bash-3.00# zoneadm -z test8 reboot
bash-3.00# zlogin -C test8
Console:
[NOTICE: Zone rebooting]
SunOS Release 5.8 Version Generic_Virtual 64-bit
Copyright 1983-2000 Sun Microsystems, Inc. All rights reserved
Hostname: test8
The system is coming up. Please wait.
starting rpc services: rpcbind done.
syslog service starting.
Print services started.
Apr 1 18:14:53 test8 sendmail[3733]: My unqualified host name (test8) unknown; sleeping for retry
The system is ready.
test8 console login: root
Password:
Apr 1 18:15:24 test8 login: ROOT LOGIN /dev/console
Last login: Tue Apr 1 18:08:04 on console
Sun Microsystems Inc. SunOS 5.8 Generic Patch February 2004
# psrinfo
0 on-line since 04/01/2008 03:56:38
1 on-line since 04/01/2008 03:56:38
Finally, to learn more about Solaris 8 Containers:
For those who were counting, the "three commands" were, at a minimum, flarcreate, zonecfg and zoneadm.
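If you would like to review the whole sequence before touching either system, here is a minimal dry-run sketch in shell. It only prints the three commands and runs nothing. The helper name s8_container_plan and its argument order are my own invention for illustration, the zonecfg line is reduced to the brand template and zonepath (network settings are left out), and it assumes the archive directory is NFS-mounted on /mnt of the Solaris 8 system as in the example above.

```shell
#!/bin/sh
# Dry-run sketch: prints the three commands from this post, runs nothing.
# s8_container_plan and its argument order are hypothetical.
s8_container_plan() {
    zone=$1         # new Container name, e.g. test8
    archive=$2      # archive name without the .flar suffix
    archive_dir=$3  # shared archive directory on the Solaris 10 system
    zonepath=$4     # zonepath for the new Container

    # 1. On the Solaris 8 system, with the archive directory mounted on /mnt.
    echo "flarcreate -S -n ${archive} /mnt/${archive}.flar"
    # 2. On the Solaris 10 system: minimal configuration (no network settings).
    echo "zonecfg -z ${zone} \"create -t SUNWsolaris8; set zonepath=${zonepath}\""
    # 3. On the Solaris 10 system: install from the archive, sys-unconfig'd (-u).
    echo "zoneadm -z ${zone} install -u -a ${archive_dir}/${archive}.flar"
}

s8_container_plan test8 atl-sewr-s8 /export/home/s8-archives /zones/roots/test8
```

Printing the plan first makes it easy to check the archive path, which has to be correct on both systems, before the long-running flarcreate step starts.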
Tuesday Apr 08, 2008
Solaris Containers have a 'zonepath' ('home') which can be a directory on the root file system or on a non-root file system. Until Solaris 10 8/07 was released, a local file system was required for this directory. Containers that are on non-root file systems have used UFS, ZFS, or VxFS. All of those are local file systems - putting Containers on NAS has not been possible. With Solaris 10 8/07, that has changed: a Container can now be placed on remote storage via iSCSI.
Each Container has its own root directory. Although viewed as the root directory from within that Container, that directory is also a non-root directory in the global zone. For example, a Container's root directory might be called /zones/roots/myzone/root in the global zone.
The configuration of a Container includes something called its "zonepath." This is the directory which contains a Container's root directory (e.g. /zones/roots/myzone/root) and other directories used by Solaris. Therefore, the zonepath of myzone in the example above would be /zones/roots/myzone.
The global zone administrator can choose any directory to be a Container's zonepath. That directory could just be a directory on the root partition of Solaris, though in that case some mechanism should be used to prevent that Container from filling up the root partition. Another alternative is to use a separate partition for that Container, or one shared among multiple Containers. In the latter case, a quota should be used for each Container.
Local file systems have been used for zonepaths. However, many people have strongly expressed a desire for the ability to put Containers on remote storage. One significant advantage to placing Containers on NAS is the simplification of Container migration - moving a Container from one system to another. When using a local file system, the contents of the Container must be transmitted from the original host to the new host. For small, sparse zones this can take as little as a few seconds. For large, whole-root zones, this can take several minutes - a whole-root zone is an entire copy of Solaris, taking up as much as 3-5 GB. If remote storage can be used to store a zone, the zone's downtime can be as little as a second or two, during which time a file system is unmounted on one system and mounted on another.
Here are some significant advantages to iSCSI over SANs:
Unfortunately, a Container cannot 'live' on an NFS server, and it's not clear if or when that limitation will be removed.
iSCSI is simply "SCSI communication over IP." In this case, SCSI commands and responses are sent between two iSCSI-capable devices, which can be general-purpose computers (Solaris, Windows, Linux, etc.) or specific-purpose storage devices (e.g. Sun StorageTek 5210 NAS, EMC Celerra NS40, etc.). There are two endpoints to iSCSI communications: the initiator (client) and the target (server). A target publicizes its existence. An initiator binds to a target.
The industry's design for iSCSI includes a large number of features, including security. Solaris implements many of those features. Details can be found:
In Solaris, the command iscsiadm(1M) configures an initiator, and the command iscsitadm(1M) configures a target.
The target system is an LDom on a T2000, and looks like this:
System Configuration: Sun Microsystems sun4v
Memory size: 1024 Megabytes
SUNW,Sun-Fire-T200
SunOS ldg1 5.10 Generic_127111-07 sun4v sparc SUNW,Sun-Fire-T200
Solaris 10 8/07 s10s_u4wos_12b SPARC
The initiator system is another LDom on the same T2000 - although there is no requirement that LDoms are used, or that they be on the same computer if they are used.
System Configuration: Sun Microsystems sun4v
Memory size: 896 Megabytes
SUNW,Sun-Fire-T200
SunOS ldg4 5.11 snv_83 sun4v sparc SUNW,Sun-Fire-T200
Solaris Nevada snv_83a SPARC
The first configuration step is the creation of the storage underlying the iSCSI target. Although UFS could be used, let's improve the robustness of the Container's contents and put the target's storage under control of ZFS. I don't have extra disk devices to give to ZFS, so I'll make some and use them for a zpool - in real life you would use disk devices here:
Target# mkfile 150m /export/home/disk0
Target# mkfile 150m /export/home/disk1
Target# zpool create myscsi mirror /export/home/disk0 /export/home/disk1
Target# zpool status
pool: myscsi
state: ONLINE
scrub: none requested
config:
NAME                 STATE   READ WRITE CKSUM
myscsi               ONLINE  0    0     0
/export/home/disk0   ONLINE  0    0     0
/export/home/disk1   ONLINE  0    0     0
Now I can create a zvol - an emulation of a disk device:
Target# zfs list
NAME           USED   AVAIL  REFER  MOUNTPOINT
myscsi         86K    258M   24.5K  /myscsi
Target# zfs create -V 200m myscsi/jvol0
Target# zfs list
NAME           USED   AVAIL  REFER  MOUNTPOINT
myscsi         200M   57.9M  24.5K  /myscsi
myscsi/jvol0   22.5K  258M   22.5K  -
Creating an iSCSI target device from a zvol is easy:
Target# iscsitadm list target
Target# zfs set shareiscsi=on myscsi/jvol0
Target# iscsitadm list target
Target: myscsi/jvol0
iSCSI Name: iqn.1986-03.com.sun:02:c8a82272-b354-c913-80f9-db9cb378a6f6
Connections: 0
Target# iscsitadm list target -v
Target: myscsi/jvol0
iSCSI Name: iqn.1986-03.com.sun:02:c8a82272-b354-c913-80f9-db9cb378a6f6
Alias: myscsi/jvol0
Connections: 0
ACL list:
TPGT list:
LUN information:
LUN: 0
GUID: 0x0
VID: SUN
PID: SOLARIS
Type: disk
Size: 200M
Backing store: /dev/zvol/rdsk/myscsi/jvol0
Status: online
Configuring the iSCSI initiator takes a little more work. There are three methods to find targets. I will use a simple one. After telling Solaris to use that method, it only needs to know what the IP address of the target is.
Note that the example below uses "iscsiadm list ..." several times, without any output. The purpose is to show the difference in output before and after the command(s) between them.
First let's look at the disks available before configuring iSCSI on the initiator:
Initiator# ls /dev/dsk
c0d0s0 c0d0s2 c0d0s4 c0d0s6 c0d1s0 c0d1s2 c0d1s4 c0d1s6
c0d0s1 c0d0s3 c0d0s5 c0d0s7 c0d1s1 c0d1s3 c0d1s5 c0d1s7
We can view the currently enabled discovery methods, and enable the one we want to use:
Initiator# iscsiadm list discovery
Discovery:
Static: disabled
Send Targets: disabled
iSNS: disabled
Initiator# iscsiadm list target
Initiator# iscsiadm modify discovery --sendtargets enable
Initiator# iscsiadm list discovery
Discovery:
Static: disabled
Send Targets: enabled
iSNS: disabled
At this point we just need to tell Solaris which IP address we want to use as a target. It takes care of all the details, finding all disk targets on the target system. In this case, there is only one disk target.
Initiator# iscsiadm list target
Initiator# iscsiadm add discovery-address 129.152.2.90
Initiator# iscsiadm list target
Target: iqn.1986-03.com.sun:02:c8a82272-b354-c913-80f9-db9cb378a6f6
Alias: myscsi/jvol0
TPGT: 1
ISID: 4000002a0000
Connections: 1
Initiator# iscsiadm list target -v
Target: iqn.1986-03.com.sun:02:c8a82272-b354-c913-80f9-db9cb378a6f6
Alias: myscsi/jvol0
TPGT: 1
ISID: 4000002a0000
Connections: 1
CID: 0
IP address (Local): 129.152.2.75:40253
IP address (Peer): 129.152.2.90:3260
Discovery Method: SendTargets
Login Parameters (Negotiated):
Data Sequence In Order: yes
Data PDU In Order: yes
Default Time To Retain: 20
Default Time To Wait: 2
Error Recovery Level: 0
First Burst Length: 65536
Immediate Data: yes
Initial Ready To Transfer (R2T): yes
Max Burst Length: 262144
Max Outstanding R2T: 1
Max Receive Data Segment Length: 8192
Max Connections: 1
Header Digest: NONE
Data Digest: NONE
The initiator automatically finds the iSCSI remote storage, but
we need to turn this into a disk device. (Newer builds seem to not
need this step, but it won't hurt. Looking in /devices/iscsi will
help determine whether it's needed.)
Initiator# devfsadm -i iscsi
Initiator# ls /dev/dsk
c0d0s0 c0d0s3 c0d0s6 c0d1s1 c0d1s4 c0d1s7 c1t7d0s2 c1t7d0s5
c0d0s1 c0d0s4 c0d0s7 c0d1s2 c0d1s5 c1t7d0s0 c1t7d0s3 c1t7d0s6
c0d0s2 c0d0s5 c0d1s0 c0d1s3 c0d1s6 c1t7d0s1 c1t7d0s4 c1t7d0s7
Initiator# ls -l /dev/dsk/c1t7d0s0
lrwxrwxrwx 1 root root 100 Mar 28 00:40 /dev/dsk/c1t7d0s0 -> ../../devices/iscsi/disk@0000iqn.1986-03.com.sun%3A02%3Ac8a82272-b354-c913-80f9-db9cb378a6f60001,0:a
Now that the local device entry exists, we can do something useful with it. Installing a new file system requires the use of format(1M) to partition the "disk" but it is assumed that the reader knows how to do that. However, here is the first part of the format dialogue, to show that format lists the new disk device with its unique identifier - the same identifier listed in /devices/iscsi.
Initiator# format
Searching for disks...done
c1t7d0: configured with capacity of 199.98MB
AVAILABLE DISK SELECTIONS:
0. c0d0
/virtual-devices@100/channel-devices@200/disk@0
1. c0d1
/virtual-devices@100/channel-devices@200/disk@1
2. c1t7d0
/iscsi/disk@0000iqn.1986-03.com.sun%3A02%3Ac8a82272-b354-c913-80f9-db9cb378a6f60001,0
Specify disk (enter its number): 2
selecting c1t7d0
[disk formatted]
Disk not labeled. Label it now? no
Let's jump to the end of the partitioning steps, after assigning all of
the available disk space to partition 0:
partition> print
Current partition table (unnamed):
Total disk cylinders available: 16382 + 2 (reserved cylinders)
Part  Tag         Flag  Cylinders  Size      Blocks
0     root        wm    0 - 16381  199.98MB  (16382/0/0) 409550
1     unassigned  wu    0          0         (0/0/0) 0
2     backup      wu    0 - 16381  199.98MB  (16382/0/0) 409550
3     unassigned  wm    0          0         (0/0/0) 0
4     unassigned  wm    0          0         (0/0/0) 0
5     unassigned  wm    0          0         (0/0/0) 0
6     unassigned  wm    0          0         (0/0/0) 0
7     unassigned  wm    0          0         (0/0/0) 0
partition> label
Ready to label disk, continue? y
The new raw disk needs a file system.
Initiator# newfs /dev/rdsk/c1t7d0s0
newfs: construct a new file system /dev/rdsk/c1t7d0s0: (y/n)? y
/dev/rdsk/c1t7d0s0: 409550 sectors in 16382 cylinders of 5 tracks, 5 sectors
200.0MB in 1024 cyl groups (16 c/g, 0.20MB/g, 128 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 448, 864, 1280, 1696, 2112, 2528, 2944, 3232, 3648,
Initializing cylinder groups:
....................
super-block backups for last 10 cylinder groups at:
405728, 406144, 406432, 406848, 407264, 407680, 408096, 408512, 408928, 409344
Back on the target:
Target# zfs list
NAME           USED   AVAIL  REFER  MOUNTPOINT
myscsi         200M   57.9M  24.5K  /myscsi
myscsi/jvol0   32.7M  225M   32.7M  -
Finally, the initiator has a new file system, on which we can install a zone.
Initiator# mkdir /zones/newroots
Initiator# mount /dev/dsk/c1t7d0s0 /zones/newroots
Initiator# zonecfg -z iscuzone
iscuzone: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:iscuzone> create
zonecfg:iscuzone> set zonepath=/zones/newroots/iscuzone
zonecfg:iscuzone> add inherit-pkg-dir
zonecfg:iscuzone:inherit-pkg-dir> set dir=/opt
zonecfg:iscuzone:inherit-pkg-dir> end
zonecfg:iscuzone> exit
Initiator# zoneadm -z iscuzone install
Preparing to install zone.
Creating list of files to copy from the global zone.
Copying <2762> files to the zone.
Initializing zone product registry.
Determining zone package initialization order.
Preparing to initialize <1162> packages on the zone.
...
Initialized <1162> packages on zone.
Zone is initialized.
Installation of these packages generated warnings:
The file contains a log of the zone installation.
There it is: a Container on an iSCSI target on a ZFS zvol.
You can use Solaris Live Upgrade to patch or upgrade a system with Containers. If the Containers are on a traditional file system which uses UFS (e.g. /, /export/home) LU will automatically do the right thing. Further, if you create a UFS file system on an iSCSI target and install one or more Containers on it, the ABE will also need file space for its copy of those Containers. To mimic the layout of the original BE you could use another UFS file system on another iSCSI target. The lucreate command would look something like this:
# lucreate -m /:/dev/dsk/c0t0d0s0:ufs -m /zones:/dev/dsk/c1t7d0s0:ufs -n newBE
Friday Mar 21, 2008
Here's another example of Containers that can manage their own affairs.
Sometimes you want to closely manage the devices that a Solaris Container uses. This is easy to do from the global zone: by default a Container does not have direct access to devices. It does have indirect access to some devices, e.g. via a file system that is available to the Container.
By default, zones use NICs that they share with the global zone, and perhaps with other zones. In the past these were just called "zones." Starting with Solaris 10 8/07, these are now referred to as "shared-IP zones." The global zone administrator manages all networking aspects of shared-IP zones.
Sometimes it would be easier to give direct control of a Container's devices to its owner. An excellent example of this is the option of allowing a Container to manage its own network interfaces. This enables it to configure IP Multipathing for itself, as well as IP Filter and other network features. Using IPMP increases the availability of the Container by creating redundant network paths to the Container. When configured correctly, this can prevent the failure of a network switch, network cable or NIC from blocking network access to the Container.
As described at docs.sun.com, to use IP Multipathing you must choose two network devices of the same type, e.g. two ethernet NICs. Those NICs are placed into an IPMP group through the use of the command ifconfig(1M). Usually this is done by placing the appropriate ifconfig parameters into files named /etc/hostname.<NIC-instance>, e.g. /etc/hostname.bge0.
An IPMP group is associated with an IP address. Packets leaving any NIC in the group have a source address of the IPMP group. Packets with a destination address of the IPMP group can enter through either NIC, depending on the state of the NICs in the group.
Delegating network configuration to a Container requires use of the new IP Instances feature. It's easy to create a zone that uses this feature, making this an "exclusive-IP zone." One new line in zonecfg(1M) will do it:
zonecfg:twilight> set ip-type=exclusive
Of course, you'll need at least two network devices in the IPMP group. Using IP Instances will dedicate these two NICs to this Container exclusively. Also, the Container will need direct access to the two network devices. Configuring all of that looks like this:
global# zonecfg -z twilight
zonecfg:twilight> create
zonecfg:twilight> set zonepath=/zones/roots/twilight
zonecfg:twilight> set ip-type=exclusive
zonecfg:twilight> add net
zonecfg:twilight:net> set physical=bge1
zonecfg:twilight:net> end
zonecfg:twilight> add net
zonecfg:twilight:net> set physical=bge2
zonecfg:twilight:net> end
zonecfg:twilight> add device
zonecfg:twilight:device> set match=/dev/net/bge1
zonecfg:twilight:device> end
zonecfg:twilight> add device
zonecfg:twilight:device> set match=/dev/net/bge2
zonecfg:twilight:device> end
zonecfg:twilight> exit
As usual, the Container must be installed and booted with zoneadm(1M):
global# zoneadm -z twilight install
global# zoneadm -z twilight boot
Now you can log in to the Container's console and answer the usual configuration questions:
global# zlogin -C twilight
<answer questions>
<the zone automatically reboots>
After the Container reboots, you can configure IPMP. There are two methods. One uses link-based failure detection and one uses probe-based failure detection.
Link-based detection requires the use of a NIC which supports this feature. Some NICs that support this are hme, eri, ce, ge, bge, qfe and vnet (part of Sun's Logical Domains). They are able to detect failure of the link immediately and report that failure to Solaris. Solaris can then take appropriate steps to ensure that network traffic continues to flow on the remaining NIC(s).
Other NICs do not support this link-based failure detection, and must use probe-based detection. This method uses ICMP packets ("pings") from the NICs in the IPMP group to detect failure of a NIC. This requires one IP address per NIC, in addition to the IP address of the group.
Regardless of the method used, configuration can be accomplished manually or via files /etc/hostname.<NIC-instance>. First I'll describe the manual method.
# ifconfig bge1 plumb
# ifconfig bge1 twilight group ipmp0 up
# ifconfig bge2 plumb
# ifconfig bge2 group ipmp0 up
Note that those commands only achieve the desired network configuration until the next time that Solaris boots. To configure Solaris to do the same thing when it next boots, you must put the same configuration information into configuration files. Inserting those parameters into configuration files is also easy:
/etc/hostname.bge1:
twilight group ipmp0 up
/etc/hostname.bge2:
group ipmp0 up
Those two files will be used to configure networking the next time that Solaris boots. Of course, an IP address entry for twilight is required in /etc/inet/hosts.
If you have entered the ifconfig commands directly, you are finished. You can test your IPMP group with the if_mpadm command, which can be run in the global zone, to test an IPMP group in the global zone, or can be run in an exclusive-IP zone, to test one of its groups:
# ifconfig -a
...
bge1: flags=201000843 mtu 1500 index 4
inet 129.152.2.72 netmask ffff0000 broadcast 129.152.255.255
groupname ipmp0
ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
inet 0.0.0.0 netmask ff000000
groupname ipmp0
ether 0:14:4f:fb:ca:b
...
# if_mpadm -d bge1
# ifconfig -a
...
bge1: flags=289000842 mtu 0 index 4
inet 0.0.0.0 netmask 0
groupname ipmp0
ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
inet 0.0.0.0 netmask ff000000
groupname ipmp0
ether 0:14:4f:fb:ca:b
bge2:1: flags=201000843 mtu 1500 index 5
inet 129.152.2.72 netmask ffff0000 broadcast 129.152.255.255
...
# if_mpadm -r bge1
# ifconfig -a
...
bge1: flags=201000843 mtu 1500 index 4
inet 129.152.2.72 netmask ffff0000 broadcast 129.152.255.255
groupname ipmp0
ether 0:14:4f:f8:9:1d
bge2: flags=201000843 mtu 1500 index 5
inet 0.0.0.0 netmask ff000000
groupname ipmp0
ether 0:14:4f:fb:ca:b
...
If you are using link-based detection, that's all there is to it!
As mentioned above, using probe-based detection requires more IP addresses:
/etc/hostname.bge1:
twilight netmask + broadcast + group ipmp0 up addif twilight-test-bge1 \
deprecated -failover netmask + broadcast + up
/etc/hostname.bge2:
twilight-test-bge2 deprecated -failover netmask + broadcast + group ipmp0 up
Three entries for hostname and IP address pairs will, of course, be needed in /etc/inet/hosts.
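If you set up probe-based IPMP on more than one Container, typing those long lines by hand gets error-prone. Here is a small sketch in shell that prints the two file bodies for any host name, group and pair of NICs, following the pattern shown above. The helper name ipmp_probe_files is hypothetical, it assumes the same "<host>-test-<nic>" naming for the test addresses that I used here, and it writes only to standard output so nothing under /etc is touched.

```shell
#!/bin/sh
# Sketch: print probe-based IPMP settings for two NICs in one group.
# ipmp_probe_files is a hypothetical helper; it writes to stdout only.
ipmp_probe_files() {
    host=$1   # data address host name, e.g. twilight
    group=$2  # IPMP group name, e.g. ipmp0
    nic1=$3   # NIC carrying the data address plus a test address
    nic2=$4   # NIC carrying only a test address

    echo "/etc/hostname.${nic1}:"
    echo "${host} netmask + broadcast + group ${group} up addif ${host}-test-${nic1} deprecated -failover netmask + broadcast + up"
    echo "/etc/hostname.${nic2}:"
    echo "${host}-test-${nic2} deprecated -failover netmask + broadcast + group ${group} up"
    # Every name used above needs an address in /etc/inet/hosts.
    echo "names needed in /etc/inet/hosts: ${host} ${host}-test-${nic1} ${host}-test-${nic2}"
}

ipmp_probe_files twilight ipmp0 bge1 bge2
```

The last line of output is a reminder of the three host names that must resolve before the files will work.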
All that's left is a reboot of the Container. If a reboot is not practical at this time, you can accomplish the same effect by using ifconfig(1M) commands:
twilight# ifconfig bge1 plumb
twilight# ifconfig bge1 twilight netmask + broadcast + group ipmp0 up addif \
twilight-test-bge1 deprecated -failover netmask + broadcast + up
twilight# ifconfig bge2 plumb
twilight# ifconfig bge2 twilight-test-bge2 deprecated -failover netmask + \
broadcast + group ipmp0 up
Whether link-based failure detection or probe-based failure detection is used, we have a Container with these network properties:
Tuesday Feb 05, 2008
Tuesday Oct 16, 2007
It's time for a "shameless plug"...
If you would like to develop deeper Solaris skills, LISA'07 offers some excellent opportunities. LISA is a conference organized by Usenix, and is intended for Large Installation System Administrators. This year, LISA will be held in Dallas, Texas, November 11-16. It includes vendor exhibits, training sessions and invited talks. This year the keynote address will be delivered by John Strassner, Motorola Fellow and Vice President, and is entitled "Autonomic Administration: HAL 9000 Meets Gene Roddenberry."
Many tutorials will be available, including four full-day sessions focusing on Solaris:
Early-bird registration ends this Friday, October 19 and saves $Hundreds compared to the Procrastinator's Rate.
Wednesday Sep 26, 2007
Monday Sep 24, 2007
I blogged about my
Vulcanite earlier this year. This rocket is 53" tall
(4.5 ft, 135 cm) and weighs 32 oz (2 pounds, about 1 kg) before adding a motor.
I painted it orange and black to make it more visible against blue sky or
light clouds.
My goals for this rocket include:
The results were gratifying.
(When I take pictures of a launch, I press the shutter as soon as I see any vertical movement, which resulted in a well-composed picture. At least it did this time...)
According to the on-board altimeter I added, it flew to 1,584 feet (480 m). More importantly, it flew almost perfectly straight up, and the 24-inch parachute returned it safely to Earth not far away from the launch rail. However, it seems that the delay I chose - the time before the parachute is ejected - was not long enough. With the correct delay, the rocket would have flown higher.
Beaming with success, I decided that the next launch would begin to test the limits of this rocket. I chose an I218R - an 8-inch (20 cm) motor with almost twice the total impulse of the previous motor. (Think of total impulse as the motor's thrust multiplied by how long it burns.) Even though I knew it would fly much higher, the wind was very light that day, so I didn't expect to walk far to recover it.
With this motor, the Vulcanite flew to 4,469 feet (1.36 km)! Also impressive was its maximum speed: over 500 MPH (800 km/h). You can see that in the picture to the right: I have an itchy shutter finger, but the rocket launched so fast I missed it entirely! Unfortunately, although the nose cone ejected properly, the parachute never came out. The two ends of the rocket, connected by an elastic cord, fell over 4,000 feet to the ground. Fortunately, the launch area was an empty corn field with large clods of dirt which had been softened by rain the day before. The only damage was a partial crack in one plywood fin. A little sanding, some new epoxy, and it should fly again. To one mile?
Tuesday Sep 11, 2007
Wednesday Sep 05, 2007
This update to Solaris 10 has many new features. Of those, many enhance Solaris Containers either directly or indirectly. This update brings the most important changes to Containers since they were introduced in March of 2005. A brief introduction to them seems appropriate, but first a review of the previous update.
Solaris 10 11/06 added four features to Containers. One of them is called "configurable privileges" and allows the platform administrator to tailor the abilities of a Container to the needs of its application. I blogged about configurable privileges before, so I won't say any more here.
At least as important as that feature was the new ability to move (also called 'migrate') a Container from one Solaris 10 computer to another. This uses the 'detach' and 'attach' sub-commands to zoneadm(1M).
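As a rough illustration of how the two sub-commands fit together, here is a dry-run sketch in shell that prints a typical migration sequence without running it. The helper name zone_migration_plan and the host labels are hypothetical. The sketch assumes the Container is halted before it is detached, that its zonepath is then copied to (or remounted at) the same path on the new host, and that the configuration is recreated there with "zonecfg create -a" before attaching.

```shell
#!/bin/sh
# Dry-run sketch of a Container migration using detach and attach.
# zone_migration_plan is a hypothetical helper; it only prints commands.
zone_migration_plan() {
    zone=$1      # Container to move, e.g. myzone
    zonepath=$2  # its zonepath, assumed identical on both hosts

    echo "old-host# zoneadm -z ${zone} halt"
    echo "old-host# zoneadm -z ${zone} detach"
    echo "# copy or remount ${zonepath} on the new host"
    echo "new-host# zonecfg -z ${zone} create -a ${zonepath}"
    echo "new-host# zoneadm -z ${zone} attach"
}

zone_migration_plan myzone /zones/roots/myzone
```

The only step that depends on where the Container lives is the middle one: on local storage it is a copy, while on shared storage it is just an unmount and a mount.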
Other, minor new features included:
Earlier releases of Solaris 10 included the Resource Capping Daemon. This tool enabled you to place a 'soft cap' on the amount of RAM (physical memory) that an application, user or group of users could use. Excess usage would be detected by rcapd. When it did, physical memory pages owned by that entity would be paged out until the memory usage decreased below the cap.
Although it was possible to apply this tool to a zone, it was cumbersome and required cooperation from the administrator of the Container. In other words, the root user of a capped Container could change the cap. This made it inappropriate for potentially hostile environments, including service providers.
Solaris 10 8/07 enables the platform administrator to set a physical memory cap on a Container using an enhanced version of rcapd. Cooperation of the Container's administrator is not necessary - only the platform administrator can enable or disable this service or modify the caps. Further, usage has been greatly simplified to the following syntax:
global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set physical=500m
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit
The next time the Container boots, this cap (500MB of RAM) will be applied to it. The cap can also be modified while the Container is running, with:
global# rcapadm -z myzone -m 600m
Because this cap does not reserve RAM, you can over-subscribe RAM usage. The only drawback is the possibility of paging.
For more details, see the online documentation.
Virtual memory (i.e. swap space) can also be capped. This is a 'hard cap.' In a Container which has a swap cap, an attempt by a process to allocate more VM than is allowed will fail. (If you are familiar with system calls: malloc() will fail with ENOMEM.)
The syntax is very similar to the physical memory cap:
global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set swap=1g
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit
This limit can also be changed for a running Container:
global# prctl -n zone.max-swap -v 2g -t privileged -r -e deny -i zone myzone
Just as with the physical memory cap, if you want to change the setting for a running Container and for the next time it boots, you must use zonecfg and prctl or rcapadm.
The third new memory cap is locked memory. This is the amount of physical memory that a Container can lock down, i.e. prevent from being paged out. By default a Container now has the proc_lock_memory privilege, so it is wise to set this cap for all Containers.
Here is an example:
global# zonecfg -z myzone
zonecfg:myzone> add capped-memory
zonecfg:myzone:capped-memory> set locked=100m
zonecfg:myzone:capped-memory> end
zonecfg:myzone> exit
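The three memory caps can live in a single capped-memory resource. As a rough sketch, the helper below builds the zonecfg subcommands for whichever caps you pass in. The function name is hypothetical, and it only produces text in the syntax shown above; it does not run zonecfg or validate the values.

```python
def capped_memory_commands(physical=None, swap=None, locked=None):
    """Build the zonecfg subcommands for one capped-memory resource.

    Only the caps that are given are emitted, in a fixed order.
    """
    settings = [("physical", physical), ("swap", swap), ("locked", locked)]
    chosen = [(name, value) for name, value in settings if value is not None]
    if not chosen:
        raise ValueError("at least one cap must be given")

    lines = ["add capped-memory"]
    lines += [f"set {name}={value}" for name, value in chosen]
    lines.append("end")
    return lines


# The values used in the three examples above, combined into one resource.
commands = capped_memory_commands(physical="500m", swap="1g", locked="100m")
```

Printing the list one item per line gives text you could type at the zonecfg prompt. Generating it this way is mainly useful when many Containers need similar caps.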
Many existing resource management features have a new, simplified user interface. For example, "dedicated-cpus" re-use the existing Dynamic Resource Pools features. But instead of needing many commands to configure them, configuration can be as simple as:
global# zonecfg -z myzone
zonecfg:myzone> add dedicated-cpu
zonecfg:myzone:dedicated-cpu> set ncpus=1-3
zonecfg:myzone:dedicated-cpu> end
zonecfg:myzone> exit
After using that command, when that Container boots, Solaris:
Also, four existing project resource controls were applied to Containers:
global# zonecfg -z myzone
zonecfg:myzone> set max-shm-memory=100m
zonecfg:myzone> set max-shm-ids=100
zonecfg:myzone> set max-msg-ids=100
zonecfg:myzone> set max-sem-ids=100
zonecfg:myzone> exit
Fair Share Scheduler
A commonly used method to prevent "CPU hogs" from impacting other workloads is to assign a number of CPU shares to each workload, or to each zone. The relative number of shares assigned per zone guarantees a relative minimum amount of CPU power. This is less wasteful than dedicating a CPU to a Container that will not completely utilize the dedicated CPU(s).
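The arithmetic behind shares is simple: when every zone is busy, each zone's guaranteed minimum is its share count divided by the total. The sketch below works that out for three hypothetical zones. The zone names and share counts are made up, and the model ignores the fact that an idle zone's unused CPU time is available to the others.

```python
def minimum_cpu_fractions(shares):
    """Map each zone to its guaranteed minimum fraction of CPU power.

    Assumes every zone listed is competing for CPU at the same time.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("total shares must be positive")
    return {zone: count / total for zone, count in shares.items()}


# Hypothetical share assignments for three zones.
fractions = minimum_cpu_fractions({"web": 100, "db": 50, "batch": 50})
```

With these numbers "web" is guaranteed at least half of the CPU power and the other two zones a quarter each. Only the ratios matter: 2, 1 and 1 shares would give the same result.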
Several steps were needed to configure this in the past. Solaris 10 8/07 simplifies this greatly: now just two steps are needed. The system must use FSS as the default scheduler. This command tells the system to use FSS as the default scheduler the next time it boots.
global# dispadmin -d FSS
Also, the Container must be assigned some shares:
global# zonecfg -z myzone
zonecfg:myzone> set cpu-shares=100
zonecfg:myzone> exit
Shared Memory Accounting
One feature simplification is not a reduced number of commands, but reduced complexity in resource monitoring. Prior to Solaris 10 8/07, the accounting of shared memory pages had an unfortunate subtlety. If two processes in a Container shared some memory, per-Container summaries counted the shared memory usage once for every process that was sharing the memory. It would appear that a Container was using more memory than it really was.
This was changed in 8/07. Now, in the per-Container usage section of prstat and similar tools, shared memory pages are only counted once per Container.
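The difference between the two accounting methods can be shown with a toy model. In this sketch each process is just a set of page labels, which is an assumption made for illustration and says nothing about how prstat is actually implemented.

```python
def per_process_total(processes):
    """Older-style summary: add up every process's pages, shared or not."""
    return sum(len(pages) for pages in processes.values())


def per_container_total(processes):
    """Newer-style summary: count each distinct page once per Container."""
    distinct = set()
    for pages in processes.values():
        distinct |= set(pages)
    return len(distinct)


# Two hypothetical processes in one Container sharing two pages.
processes = {
    "proc_a": {"p1", "p2", "shm1", "shm2"},
    "proc_b": {"p3", "shm1", "shm2"},
}
old_total = per_process_total(processes)
new_total = per_container_total(processes)
```

Here the older method reports seven pages and the newer method reports five, because the two shared pages are no longer counted twice.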
global# zonecfg -z global
zonecfg:global> set cpu-shares=100
zonecfg:global> set scheduling-class=FSS
zonecfg:global> exit
Use those features with caution. For example, assigning a physical memory cap of 100MB to the global zone will surely cause problems...
| Argument or Option | Meaning |
|---|---|
| -s | Boot to the single-user milestone |
| -m <milestone> | Boot to the specified milestone |
| -i </path/to/init> | Boot the specified program as 'init'. This is only useful with branded zones. |
Allowed syntaxes include:
global# zoneadm -z myzone boot -- -s
global# zoneadm -z yourzone reboot -- -i /sbin/myinit
ozone# reboot -- -m verbose
In addition, these boot arguments can be stored with zonecfg, for later boots.
global# zonecfg -z myzone
zonecfg:myzone> set bootargs="-m verbose"
zonecfg:myzone> exit
Also, the privilege proc_priocntl can be added to a Container to enable the root user of that Container to change the scheduling class of its processes.
This also allows a Container to control its own network configuration, including routing, IP Filter, the ability to be a DHCP client, and others. The syntax is simple:
global# zonecfg -z myzone
zonecfg:myzone> set ip-type=exclusive
zonecfg:myzone> add net
zonecfg:myzone:net> set physical=bge1
zonecfg:myzone:net> end
zonecfg:myzone> exit
The latter ability requires more explanation. An existing challenge in the maintenance of zones is patching - each zone must be patched when a patch is applied. If the patch must be applied while the system is down, the downtime can be significant.
Fortunately, Live Upgrade can create an Alternate Boot Environment (ABE) and the ABE can be patched while the Original Boot Environment (OBE) is still running its Containers and their applications. After the patches have been applied, the system can be re-booted into the ABE. Downtime is limited to the time it takes to re-boot the system.
An additional benefit can be seen if there is a problem with the patch and that particular application environment. Instead of backing out the patch, the system can be re-booted into the OBE while the problem is investigated.
Solaris 10 8/07 contains a new framework called Branded Zones. This framework enables the creation and installation of Containers that are not the default 'native' type of Containers, but have been tailored to run 'non-native' applications.
This was only a brief introduction to these many new and improved features. Details are available in the usual places, including http://docs.sun.com, http://sun.com/bigadmin, and http://www.sun.com/software/solaris/utilization.jsp.
Tuesday Jul 24, 2007
Solaris Containers for Linux Applications is the first implementation in the BrandZ framework, and runs Red Hat and CentOS applications. In addition, non-Sun members of the OpenSolaris community have begun contributing new code to BrandZ. Albert Lee has demonstrated the ability to create a Debian zone.
Also, Wei Shen is leading an effort to enable 64-bit apps to run in Linux-branded Containers.
Thursday Jul 12, 2007
Solaris Containers (aka Zones) is a virtualization tool that has other powerful, but less well known uses. These rely on a unique combination of features:
By default, Solaris Containers are more secure than general-purpose operating systems in many ways. For example, even the root user of a Container with a default configuration cannot modify the Container's operating system programs. That limitation prevents trojan horse attacks which replace those programs. Also, a process running in a Container cannot directly modify any kernel data, nor can it modify kernel modules like device drivers. Glenn Brunette created an excellent slide deck that describes the multiple layers of security in Solaris 10, of which Containers can be one layer.
Even considering that level of security, the ability to selectively remove Solaris privileges can be used to further tighten a zone's security boundary. In addition, the ability to disable network services prevents almost all network-based attacks. This is very difficult to accomplish in most operating systems without making the system unusable or unmanageable.
The combination of those abilities and the resource controls that are part of Containers' functionality enables you to configure an application environment that can do little more than fulfill the role you choose for it.
This blog entry describes a method that can be used to slightly expand a Container's abilities, and then tighten the security boundary snugly around the Container's intended application.
Imagine that you want to run an application on a Solaris system, but the workload(s) running on this system should not be directly attached to the Internet. Further, imagine that the application needs an accurate sense of time. Today this can be done by properly configuring a firewall to allow the use of an NTP client. But now there's another way... (If this concept sounds familiar, it is because this idea has been mentioned before here and here.)
To achieve the same goal without a firewall, you could use two Solaris "virtual environments" (zones): one that has "normal" behavior, for the application, and one that has the ability to change the system's clock, but has been made extremely secure by meeting the following requirements:
Any zone can be configured to have access to one or more network ports (NICs). Further, OpenSolaris build 57 and newer builds, and the next update to Solaris 10, enable a zone to have exclusive access to a NIC, further isolating network activity of different zones. This feature is called IP Instances and will be mentioned again a bit later. A zone has its own SMF (Service Management Facility). Most of the services managed by SMF can be disabled if you are limiting the abilities of a zone. The zone that will manage the time clock can be configured so that it does not respond to any network connection requests by disabling all non-essential services. Also, Solaris Configurable Privileges enables us to remove unnecessary privileges from the zone, and add the one non-default privilege it needs: sys_time. That privilege is needed in order to use the stime(2) system call.
Here is the configuration for the zone when I initially created it:
zonecfg -z timelord
zonecfg:timelord> create
zonecfg:timelord> set zonepath=/zones/roots/timelord
zonecfg:timelord> exit
After the zone has been booted and halted once, disabling services in Solaris is easy - the svcadm(1M) command does that. Through experimentation I found that this script disabled all of the network services - and some non-network services, too - but left enough services running that the zone would boot and NTP client software would run. Note that this is less important starting with Solaris 10 11/06: new installations of Solaris 10 will offer the choice to install "Secure By Default" with almost all network services turned off.
To use that script, I booted the zone and logged into it from the global zone - something you can do with zlogin(1) even if the zone does not have access to a NIC. Then I copied the script from the global zone into the non-global zone. A secure method to do this is: as the root user of the global zone, create a directory in <zonepath>/root/tmp, change its permissions to prevent access by any user other than root, and then copy the script into that directory. All of that allowed the script to be run by the root user of the non-global zone. Those steps can be accomplished with these commands:
global# mkdir /zones/roots/timelord/root/tmp/ntpscript
global# chmod 700 /zones/roots/timelord/root/tmp/ntpscript
global# cp ntp-disable-services /zones/roots/timelord/root/tmp/ntpscript
global# zlogin timelord
timelord# chmod 700 /tmp/ntpscript/ntp-disable-services
timelord# /tmp/ntpscript/ntp-disable-services
Now we have a zone that only starts the services needed to boot the zone and run NTP. Incidentally, many other commands will still work, but they don't need any additional privileges.
The next step is to gather the minimum list of Solaris privileges needed by the reduced set of services. Fortunately, a tool has been developed that helps you determine the minimum necessary set of privileges: privdebug.
Here is a sample use of privdebug, which was started just before booting the zone, and stopped after the zone finished booting:
global# ./privdebug.pl -z timelord
STAT PRIV
USED sys_mount
USED sys_mount
USED sys_mount
USED sys_mount
USED sys_mount
USED proc_exec
USED proc_fork
USED proc_exec
USED proc_exec
USED proc_fork
USED contract_event
USED contract_event
<many lines deleted>
^C
global#
Running that output through sort(1) and uniq(1) summarizes the list of privileges needed to boot the zone and our minimal Solaris services. Limiting a zone to a small set of privileges requires using the zonecfg command:
global# zonecfg -z timelord
zonecfg:timelord> set limitpriv=file_chown,file_dac_read,file_dac_write,file_owner,proc_exec,proc_fork,proc_info,proc_session,proc_setid,proc_taskid,sys_admin,sys_mount,sys_resource
zonecfg:timelord> exit
At this point the zone is configured without unnecessary privileges and without network services. Next we must discover the privileges needed to run our application. Our first attempt to run the application may succeed. If that happens, there is no need to change the list of privileges that the zone has. If the attempt fails, we can determine the missing privilege(s) with privdebug.
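The summarizing step that sort(1) and uniq(1) perform on the privdebug output can also be sketched in a few lines of Python. This assumes the two-column "STAT PRIV" layout shown in the sample output; the function name and the sample text are illustrative only.

```python
def summarize_privileges(output):
    """Return the sorted, de-duplicated privileges marked USED in the output."""
    used = set()
    for line in output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a two-column USED record.
        if len(fields) == 2 and fields[0] == "USED":
            used.add(fields[1])
    return sorted(used)


# A shortened sample in the same layout as the privdebug output.
sample = """STAT PRIV
USED sys_mount
USED sys_mount
USED proc_exec
USED proc_fork
USED proc_exec
USED contract_event
"""
summary = summarize_privileges(sample)
```

Joining the resulting list with commas gives a starting point for the value of limitpriv, which you would still review by hand before applying it.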
For this example I will use ntpdate(1M) to synchronize the system's time clock with time servers on the Internet. In order for ntpdate to run, it needs network access, which must be enabled with zonecfg. When adding a network port, I increased zone isolation with a new feature in OpenSolaris called IP Instances. Use of this feature is not required, but it does improve network isolation and network configuration flexibility. You can choose to ignore this feature if you are using a version of Solaris 10 which does not offer it, or if you do not want to dedicate a NIC to this purpose.
To use IP Instances, I added the following parameters via zonecfg:
global# zonecfg -z timelord
zonecfg:timelord> set ip-type=exclusive
zonecfg:timelord> add net
zonecfg:timelord:net> set physical=bge1
zonecfg:timelord:net> end
zonecfg:timelord> exit
global#
Setting ip-type=exclusive quietly adds the net_rawaccess privilege and the new sys_ip_config privilege to the zone's limit set. This happens whenever the zone boots. These privileges are required in exclusive-IP zones.
We can assign a static address to the zone with the usual methods of configuring IP addresses on Solaris systems. For example, you could boot the zone, log in to it, and enter the following command:
timelord# echo "192.168.1.11/24" > /etc/hostname.bge1
However, because the root user of the global zone can access any of the zone's files, you can do the same thing without booting the zone by using this command instead:
global# echo "192.168.1.11/24" > /zones/roots/timelord/root/etc/hostname.bge1
With network access in place, we can discover the list of privileges necessary to run the NTP client. First boot the zone:
global# zoneadm -z timelord boot
After the zone boots, in one window run the privdebug script, and then in another window run the NTP client in the NTP zone:
In the first window:
global# ./privdebug.pl -z timelord
STAT PRIV
USED proc_fork
USED proc_exec
USED proc_fork
USED proc_exec
NEED sys_time
^C
global#
In the second window:
global# zlogin timelord
timelord# ntpdate -u <list of NTP server IP addresses>
16 May 13:12:27 ntpdate[24560]: Can't adjust the time of day: Not owner
timelord#
That output shows us that the privilege 'sys_time' is the only additional one needed to enable the zone to set the system time clock using ntpdate(1M).
Again we use zonecfg to modify the zone's privileges:
global# zonecfg -z timelord
zonecfg:timelord> set limitpriv=file_chown,file_dac_read,file_dac_write,file_owner,proc_exec,proc_fork,proc_info,proc_session,proc_setid,proc_taskid,sys_admin,sys_mount,sys_resource,sys_time
zonecfg:timelord> exit
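Typing a long privilege list by hand invites typos, so it can help to build the new limitpriv value from the old list plus whatever privdebug reported as missing. The sketch below assumes the same two-column output layout as before, with missing privileges marked NEED. Both function names are hypothetical.

```python
def needed_privileges(output):
    """Return privileges marked NEED, in first-seen order, without duplicates."""
    needed = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "NEED" and fields[1] not in needed:
            needed.append(fields[1])
    return needed


def limitpriv_command(current, output):
    """Build a 'set limitpriv=' line from the current list plus NEED entries."""
    privileges = list(current)
    for priv in needed_privileges(output):
        if priv not in privileges:
            privileges.append(priv)
    return "set limitpriv=" + ",".join(privileges)


# A shortened privilege list and output, for illustration only.
current = ["proc_exec", "proc_fork", "sys_mount"]
output = "STAT PRIV\nUSED proc_fork\nUSED proc_exec\nNEED sys_time\n"
command = limitpriv_command(current, output)
```

The resulting line is what you would enter at the zonecfg prompt. A NEED entry only shows that a privilege was requested, so each one deserves a moment's thought before it is granted.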
While isolating the zone, why not also limit the amount of resources that it can consume? If the zone is operating normally the use of resource management features is unnecessary, but they are easy to configure and their use in this situation could be valuable. These limits could reduce or eliminate the effects of a hypothetical bug in ntpdate which might cause a memory leak or other unnecessary use of resources.
Capping the amount of resources which can be consumed by the zone is also another layer of security in this environment. Resource constraints can reduce or eliminate risks associated with a denial of service attack. Note that the use of these features is not necessary. Their use is shown for completeness, to demonstrate what is possible.
A few quick tests with rcapstat(1) showed that the zone needed less than 50MB of memory to do its job. A cap on locked memory further minimized the zone's abilities without causing a problem for NTP. As with IP Instances, these features are available in OpenSolaris and will be in the next update to Solaris 10.
global# zonecfg -z timelord
zonecfg:timelord> add capped-memory
zonecfg:timelord:capped-memory> set physical=50m
zonecfg:timelord:capped-memory> set swap=50m
zonecfg:timelord:capped-memory> set locked=20m
zonecfg:timelord:capped-memory> end
zonecfg:timelord> set scheduling-class=FSS
zonecfg:timelord> set cpu-shares=1
zonecfg:timelord> set max-lwps=200
zonecfg:timelord> exit
global#
Assigning one share to the zone prevents the zone from using too much CPU power and impacting other workloads. It also guarantees that other workloads will not prevent this zone from getting access to the CPU. Capping the number of threads (lwps) limits the ability to use up a fixed resource: process table slots. That limit is probably not necessary given the strict memory caps, but it can't hurt.
Now that we have 'shrink-wrapped' the security boundary even more tightly than the default, we're ready to use this zone.
global# zoneadm -z timelord boot
global# zlogin timelord
timelord# ntpdate
The output of ntpdate shows that it was able to contact an NTP server and adjust this system's time clock by almost 0.4 seconds.
16 May 14:40:35 ntpdate[25070]: adjust time server
offset -0.394755 sec
Experience with Solaris privileges can allow you to further tighten the security boundary. For example, if you want to prevent the zone from changing its own host name, you could remove the sys_admin privilege from the zone's limit set. Doing so, and then rebooting the zone, would allow you to demonstrate this:
timelord# hostname drwho
hostname: error in setting name: Not owner
timelord#
What privilege is needed to use the hostname(1M) command?
timelord# ppriv -e -D hostname drwho
hostname[4231]: missing privilege "sys_admin" (euid = 0, syscall = 139) needed at systeminfo+0x139
hostname: error in setting name: Not owner
Before disabling services, I ran "netstat -a" on another zone which had just been created. It showed a list of 13 ports to which services were listening, including ssh and sunrpc services. After hardening the zone 'timelord' by disabling unneeded services, "netstat -a" doesn't show any open ports.
In order to further evaluate the security of the configuration described above, Nessus was used to evaluate the possible attack vectors. It did not find any security weaknesses.
What else can be secured using this method? Typical Unix services like sendmail and applications like databases are ideal candidates. What application do you want to secure?
Thanks to Glenn Brunette for assistance with security techniques and to Bob Bownes for providing a test platform and assistance with Nessus.
Tuesday Jul 10, 2007
Thursday May 24, 2007
This week we have a guest blogger. Christine works with me in the Solaris Adoption Practice. She doesn't have a blog but suddenly finds that this week she has something to say. Here is her two-part discussion about starting your Solaris 10 Migration. She can be reached at christine DOT tran AT sun.com.
Part 1: "How much does it cost to upgrade to Solaris 10?"
We in the Solaris Adoption group get questions like this once in a while, from customers and from account managers. It has grown more frequent in recent weeks. The answer is: "It depends!" This answer exasperates some people and drives others up the wall. They try asking it another way: "How long does it take?" "Give me a ball-park level-of-effort estimate." "What's the cost of NOT upgrading to Solaris 10?"
The answer is: It depends.
I can understand the impetus behind the question. Before buying a car, a phone plan, a Twinkie, I want to know how much it costs. I'm trading my moneys for goods or service, and I want to know how much I'm in for. For crying out loud if they can estimate how much it costs to raise a child ($165,630 for 17 years, before inflation), you can tell me how much it costs to upgrade to Solaris 10! What is this "it depends" monkey business?
Introducing change into any system will cost you. It doesn't matter if you're patching, upgrading a firmware version, Apache, or the entire OS. At the very least it'll cost you time: even if you don't incur downtime, you will have expended time to do the task. Let's say you'll expend five minutes of time to introduce a patch to your system. "There's your cost: five minutes. Will you install the patch?" "Sure. Why not?" How about one hour? Will you install the patch if it takes you one hour? How about eight hours? "We-l-l-l ... that's a wee bit long for a patch, isn't it? What's the patch for?"
And THERE is the first question you should ask, "What is it for?" If the patch fixes a rarely used OpenGL library, you may patch if it takes five minutes. You may not patch if it takes one hour, unless you had other patches to apply. You certainly are not going to spend eight hours. On the other hand if the patch fixes a security vulnerability with open exploits, you definitely are going to patch even if it takes you eight hours.
Before collaring your sales rep and asking him how much it will cost to upgrade to Solaris 10, ask "What is it for?" The cost is related to the benefits derived; a price tag by itself is meaningless.
Should you upgrade to Solaris 10? From 2.6, Solaris 8, Solaris 9, migrate from some other OS? Perhaps not. Perhaps your system is running at the utilization rate you want, at the stable patch level you want, you're not going to be adding new hardware, you don't expect to grow your users, your disks, your bandwidth, you don't want any new feature. In short, you are perfectly happy with your server and there's no improvement you can think to make. In that case, no, you probably shouldn't spend the effort to run Solaris 10.
Of course, this scenario only exists in a perfect world. Solaris 10 is so full of Wheaties goodness that it would be near impossible to find a case where it's not an improvement on a previous version of Solaris. However, upgrading to Solaris 10 only to be running the latest, the hippest, the coolest, will give you no quantifiable benefit except that you're now hip and cool.
Begin with a problem in your datacenter that you want to solve. Make it a business problem, and not a technical problem. Not "poor network performance", but "buyers take too long to go from checkout to payment." Not "consolidation" but "maintenance contracts and license on 25 web servers eating up IT budget."
These problems will resolve into technical problems: want a faster network stack, want consolidation but guaranteed computing resources per server, want faster and better development tools, want to re-use old JBODs. The technical problems will be solved by Solaris 10.
If you answer yes to all of the above, and have other requirements beyond that, prioritize and pick the top three. You're going to spend time, money, and manpower. Keeping the migration scope small makes the project manageable, and the relationship between cost and benefit clear.
Part II: "How do I upgrade to Solaris 10?"
You've identified the top three business problems that Solaris 10 will solve. As an example, let's make them
Let's say Solaris 10's improved TCP/IP stack will solve problem number 1, buying a big honking new server will solve problem number 2, and deploying zones to mimic your acquisition environment on the big honking new server will solve problem number 3.
Your pilot migration plan will focus on the "before" and "after" of these three things. You want to know if upgrading to Solaris 10 will make your transaction faster, if new hardware will give you room to grow, and if you can host your acquisition without adding footprint in your datacenter. Granted, these are strawman problems simplified for the brevity of this post, but these are not very far removed from the real problems in real datacenters or mom-n-pop shops.
Planning
1. Scope your migration. Let's say you're just migrating 10 older platforms to a Sun Fire V890. You want to see faster paying transactions, enough memory and CPU power to triple the workload, and several zones to mimic the environment of your recent acquisition. Decide on the target: for example, you'll want to end up with Solaris 10 11/06, most current patch cluster, application version x.y.
2. Take an inventory. What applications will be loaded on the V890, what firmware, what driver? Are there supporting applications that will have to be carried over? Don't forget any network or SAN support, like trunking or PowerPath.
3. Check your inventory. Is your COTS application supported by the vendor on Solaris 10? Sun can help you check this. If your application is developed in-house, can it pass Appcert, the tool to flag binary incompatibility between Solaris versions? If you have application versions not supported on Solaris 10, can you upgrade the application?
4. Understand your operating environment. Do you know how your applications are backed up? Where are traps and alerts being sent? What to do for Disaster Recovery? What firewall policy or user policy is in effect? What are the intersections between applications which will be moved?
5. Have test methods and criteria. What test will verify that your application is functional? Are there performance benchmarks? Have you taken a performance baseline measurement? A baseline measurement indicates how you are performing now. After migration, take another measurement and compare it with your baseline.
6. Some /etc/system parameters have been obsoleted in Solaris 10, replaced by Resource Management parameters. Check your /etc/system to see if you need to convert these parameters.
7. Solaris 10 comes with SMF (Service Management Facility). Are there scripts in /etc/rc that you want to convert to XML manifests?
8. There are special considerations to using zones. Have a plan for creating, patching, and managing zones.
9. Do you have a Change Management framework? If yes, fit your migration activities into this framework. File your migration plan with the Change Management Board.
Action
10. Upgrade a test environment first.
11. Depending on your standard installation practice, write up an install plan for upgrading to Solaris 10. It can be as simple as taking a backup and inserting a Solaris 10 CD, or as complex as un-encapsulating rootdg, removing disks from Veritas Volume Manager and using LiveUpgrade. The install plan should have a section on how to fall-back if the installation does not go well.
12. Have a good backup. Test your good backup.
13. Schedule a maintenance window and follow your installation plan. Install any additional drivers, firmware, support packages, mandatory patches. Install your zones. Install your applications.
14. Perform functional tests (do the OS and application work?), performance tests (does the application run as well as or better than before?) and integration tests (does the application interact and perform as before when interfacing with other applications?). You have documented these test methods and results in Step 5.
15. Can you perform routine administration tasks as before? Check cron jobs, log management, backups, system management, patching, name service, directory service, remote management, and anything else you do in your daily care-and-feeding routine. You have documented these tasks in Step 4.
16. Repeat the previous two steps for your zones and applications running in a zone.
It's important to define an endpoint for your pilot where you stop work and examine if your migration has fulfilled your objectives. In other words, now your application is twice as fast, your new server can support three times the workload, and it contains zones which enclose a separate working environment for your recently-acquired business unit.
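Checking an objective such as "twice as fast" comes down to comparing the baseline measurement with the one taken after migration. Here is a minimal sketch of that comparison. The function names and the timings are made up, and a real evaluation would use several measurements rather than one.

```python
def speedup(baseline_seconds, after_seconds):
    """Return how many times faster the post-migration measurement is."""
    if baseline_seconds <= 0 or after_seconds <= 0:
        raise ValueError("measurements must be positive")
    return baseline_seconds / after_seconds


def meets_objective(baseline_seconds, after_seconds, target_speedup):
    """True if the measured speedup reaches the target set during planning."""
    return speedup(baseline_seconds, after_seconds) >= target_speedup


# Hypothetical transaction times: 4.0s before migration, 2.0s after.
pilot_ok = meets_objective(4.0, 2.0, target_speedup=2.0)
```

Writing the target down as a number before the pilot starts makes the endpoint unambiguous: the pilot either reached it or it did not.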
A successful small pilot can be broadened to cover a production environment. Document your pilot project, and repeat it for your production environment. From here, repeat the steps, with modifications from your experience, to upgrade another part of your datacenter. Or you can write another project plan for an OS refresh to gradually transform the rest of your datacenter. But you're done with the hardest part: the beginning.
Thursday May 10, 2007
No, not that DOS.
I'm referring to Denial-of-Service.
A team at Clarkson University including a professor and several students recently performed some interesting experiments. They wanted to determine how server virtualization solutions handled a guest VM which performed a denial-of-service attack on the whole system. This knowledge could be useful when virtualizing guests that you don't trust. It gives you a chance to put away the good silver.
They tested VMware Workstation, Xen, OpenVZ, and Solaris Containers. (It's a shame that they didn't test VMware ESX. VMware Workstation and ESX are very different technologies. Therefore, it is not safe to assume that the paper's conclusions regarding VMware Workstation apply to ESX.) After reading the paper, my conclusion for Solaris Containers is "they have non-default resource management controls to contain DoS attacks, and it's important to enable those controls."
Fortunately, with the next update to Solaris 10 (due this summer) those controls are much easier to use. For example, the configuration parameters used in the paper, and shown below, limit a Container's use of physical memory, virtual memory, and amount of physical memory which can be locked so that it doesn't get paged out:
add capped-memory
set physical=128M
set swap=512M
set locked=64M
end
Further, the following parameters limit the number of execution threads that the Container can use, turn on the fair-share scheduler and assign a quantity of shares for this Container:
set max-lwps=175
set scheduling-class=FSS
set cpu-shares=10
All of those parameters are set using the zonecfg(1M) command. One benefit of the centralization of these control parameters is that they move with a Container when it is moved to another system.
I partly disagree with the authors' statement that these controls are complex to configure. The syntax is simple - and a significant improvement over previous versions - and an experienced Unix admin can determine appropriate values for them without too much effort. Also, a GUI is available for those who don't like commands: the Solaris Container Manager. On the other hand, managing these controls does require Solaris administration experience, and there are no default values. It is important to use these features in order to protect well-behaved workloads from misbehaving workloads.
It also is a shame that the hardware used for the tests was a desktop computer with limited physical resources. For example it had only one processor. Because multi-core processors are becoming the norm, it would be valuable to perform the same tests on a multi-core system. The virtualization software would be stressed in ways which were not demonstrated. I suspect that Containers would handle that situation very well, for two reasons:
Also, the test system did not have multiple NICs. The version of Solaris that was used includes a new feature called IP Instances. This feature allows a Container to be given exclusive access to a particular NIC. No process outside that Container can access that NIC. Of course, multiple NICs are required to use that feature...
The paper Quantifying the Performance Isolation Properties of Virtualization Systems will be delivered at the ACM's Workshop on Experimental Computer Science.