Tuesday May 10, 2005
The Xen Summit
A few weeks ago I attended the Xen summit in Cambridge, UK. Xen is an open-source hypervisor project being driven by Ian Pratt from the University's Computer Laboratory. Xen is attracting a lot of interest as the de facto open source hypervisor for commodity hardware.
Xen is designed to be a thin layer of software which allows multiple kernels to run on a single machine. Among the many cool things this new layer of virtualization allows is OS checkpoint and resume, which the Xen team have used to good effect in their workload migration experiments. (I was at a Linuxworld BOF last March where the sound of jaws dropping as Ian presented their results was quite evident!) Take a look at the papers on their website - it's pretty cool stuff.
Xen is based on paravirtualization; that is, you have to make some changes to the low-level kernel to allow the OS to run. That approach was chosen both because it was easier to do on existing x86 hardware and, more importantly, because it performs better than other approaches.
Anyhow, we've been looking at Xen for about a year now, and recently a few of us have been working on a prototype port of Solaris to Xen on the x86 architecture. We're planning to make it work on x64 machines where we can exploit the new hardware virtualization technology as it becomes available. We're also planning to make the Solaris side of things into an OpenSolaris community project, particularly since Xen is itself an open source project. Although we're still working on the mechanics of all that, I'd like to hear from people who want to participate.
Update
Some of the comments below imply that you might think I'm only interested in help from kernel developers. I'm also interested to hear from people who are already using Xen, and are prepared to experiment with, and give us feedback on, early alpha-class builds of Solaris on Xen.
Another Update
The OpenSolaris on Xen community is now up: see the OpenSolaris web site to participate.
Technorati Tag: OpenSolaris
Technorati Tag: Solaris
Posted at
02:59PM May 10, 2005
by tpm in Solaris |
Solaris 10 on x64 Processors: Part 1 - Prework
SSE Support
Back in 2002, after we had resurrected Solaris on x86, we realized that we needed to get back to basics with a number of core kernel subsystems, because while we'd been slowly disinvesting in Solaris on x86, the x86 hardware world had been scampering off doing some really interesting things, e.g. adding the Streaming SIMD Extensions (SSE) to the instruction set and introducing fast system call mechanisms. We also knew that these were basic assumptions of the 64-bit architecture that AMD was working on, so we started work on the basic kernel support which allows the xmm and mxcsr registers to be part of the state of every Solaris lwp. At the same time, Matt Simmons helped out with the disassembler and debugger support for the new instructions. This didn't take too long, and the work was integrated into Solaris 10 in October 2003, and into Solaris 9 Update 6. One of the immediate benefits was Java floating point performance, since Java used SSE instructions on capable platforms, and on the right hardware, Solaris was now one of those platforms!
Fast System Calls
In earlier Solaris releases, system calls were implemented using the lcall instruction; we'd been faithfully doing this for years, without really noticing that the performance of call gates was falling further and further behind. For Solaris 10, we decided to make fast system calls work using the sysenter instruction that was first introduced on Pentium II processors. Because of some awkward limitations around the register usage of the sysexit instruction (in particular, dealing with system calls that return two values), plus our desire to run on older machines, we keep the older lcall handler around too.
First, I should explain something about the way Solaris system call handlers work in general. As you can imagine, in a highly observable system like Solaris, we can, in some circumstances, end up doing a lot of work in the system call enter and system call exit path. But most of the time we don't actually need to do all the checks, so more than 10 years ago, one of my former colleagues restructured SPARC system calls to do a single test for all possible pre-work on the per-lwp thread variable called t_presys, and a single test for all possible post-system call handler work on another thread variable called t_postsys. The system call handler is then constructed assuming that the t_presys and t_postsys cases are rare. If either t_presys or t_postsys is set, e.g. by the current system call, a previous system call, or via /proc, we handle the relevant rare case in C code, which allows us to code the fast path in a small amount of assembler. To summarize:
entry_point:
        if (curthread->t_presys)
                presys();
        real_handler();
        if (curthread->t_postsys)
                postsys();
        return-from-trap sequence
Obviously the Solaris x86 architecture mirrored this to some extent, but the presys() and postsys() functions had been partially rendered from C into assembler which was, as usual, difficult to understand, port and maintain, and wasn't even particularly fast. So the initial exercise was to turn the slow paths back into C code, and macro-ize the assembler code involved in performing the pre and post checks so that different syscall handlers could easily share code. Then I coded up a sysenter style handler, and we were pretty impressed with the results on our system call microbenchmarks.
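For readers who prefer something they can compile, here is a minimal user-space sketch of the same shape in C. It is not the kernel code: thread_t, handler_t, PRE_CHECK, POST_CHECK, syscall_entry, add_one and the slow_calls counter are all invented for illustration, and only the t_presys and t_postsys names come from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-lwp thread structure; only the two
 * flags named in the text are modelled. */
typedef struct thread {
    int t_presys;    /* nonzero: some pre-syscall work is pending */
    int t_postsys;   /* nonzero: some post-syscall work is pending */
    int slow_calls;  /* illustration only: counts slow-path visits */
} thread_t;

typedef long (*handler_t)(long arg);

/* Slow paths: rare, so they can be ordinary C. Here they only count. */
void presys(thread_t *t)  { t->slow_calls++; }
void postsys(thread_t *t) { t->slow_calls++; }

/* The two checks as macros, so several entry points could share them. */
#define PRE_CHECK(t)  do { if ((t)->t_presys)  presys(t);  } while (0)
#define POST_CHECK(t) do { if ((t)->t_postsys) postsys(t); } while (0)

/* One entry point: a single test before and after the real handler. */
long syscall_entry(thread_t *t, handler_t real_handler, long arg)
{
    long rv;

    assert(t != NULL && real_handler != NULL);
    PRE_CHECK(t);
    rv = real_handler(arg);
    POST_CHECK(t);
    return rv;
}

/* Toy handler used to exercise the entry point. */
long add_one(long x) { return x + 1; }
```

With both flags clear, syscall_entry(&t, add_one, 41) goes straight to the handler and never touches the slow paths; setting either flag diverts through the corresponding C function. A second entry point (say, one per trap mechanism) would reuse the same two macros.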
Hardware Capability Architecture
All this kernel work was fun, but we didn't have a clear idea of how we were going to let libc use fast instructions on machines capable of handling them, and fall back to lcall on machines that couldn't. We also noted that when AMD processors are running in long mode, sysenter is not supported but syscall (similar but different) is.
Earlier attempts to introduce support for this facility had considered using either the libc_psr mechanism that we introduced in Solaris 2.5 for dealing with the fast bcopy instructions available on UltraSPARC platforms, or using the isalist mechanism. The former scheme assumes that the instruction set extensions are specific to a platform, while the latter implicitly assumes that there is a small set of instruction set extensions that are additive and act to improve performance. However, we realized that in the x86 world we weren't dealing with platform extensions so much as processor extensions, and that processor vendors were adding instruction set extensions orthogonally, so we'd be better off describing each instruction set extension by close analogy to the way the vendors were describing them in the cpuid instruction, i.e. via a bit value in a feature word. See getisax(3C) for programmatic access to the kernel's view; <sys/auxv_386.h> contains the list of capabilities we expose.
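To make the "bit value in a feature word" idea concrete, here is a small sketch in C. The CAP_* names and values, trap_kind_t, pick_trap and trap_name are all made up for this example; the real capability names live in the system header, and on Solaris the word itself would come from getisax(3C) rather than being passed in by hand. The order of preference between the two fast instructions is also just an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical capability bits, one per instruction set extension.
 * The names and values are invented for this sketch. */
#define CAP_SSE      (1u << 0)
#define CAP_SYSENTER (1u << 1)
#define CAP_SYSCALL  (1u << 2)

typedef enum { TRAP_LCALL, TRAP_SYSENTER, TRAP_SYSCALL } trap_kind_t;

/* Pick a system call mechanism from the feature word. Each bit is tested
 * independently, because the extensions are orthogonal; with no fast
 * instruction available we fall back to lcall. */
trap_kind_t pick_trap(uint32_t caps)
{
    if (caps & CAP_SYSENTER)
        return TRAP_SYSENTER;
    if (caps & CAP_SYSCALL)
        return TRAP_SYSCALL;
    return TRAP_LCALL;
}

/* Printable name for a mechanism, handy when logging the choice. */
const char *trap_name(trap_kind_t k)
{
    static const char *const names[] = { "lcall", "sysenter", "syscall" };

    assert((unsigned)k <= (unsigned)TRAP_SYSCALL);
    return names[k];
}
```

Because every extension is its own bit, a processor with SSE but no fast system call instruction is described just as easily as one with the opposite combination, which is exactly what the platform-based and additive schemes struggled to express.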
What we ended up with is (currently) three copies of the libc binary compiled in different ways; the basic version in /lib/libc.so.1 is able to run on the oldest hardware we support, while the newer versions in /usr/lib/libc correspond to more modern hardware running on a 32-bit or 64-bit kernel. Thanks to Rod Evans, the libraries are marked with the capabilities they require, and the system figures out which is the best library to use on the running system at boot time. Last November, Darren Moffat wrote something up about how the system configures which libc it uses; there's no point in repeating that here.
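The selection idea can be sketched in a few lines of C. This is only an illustration of matching required capabilities against what the hardware offers, not a description of how the system actually makes the choice: libvariant_t, pick_variant, count_bits and the HW_* bits are hypothetical, and "prefer the usable variant that requires the most capabilities" is an assumption I've made to have a concrete rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits; names and values invented for the sketch. */
#define HW_SSE      (1u << 0)
#define HW_SYSENTER (1u << 1)
#define HW_SYSCALL  (1u << 2)

typedef struct {
    const char *name;      /* illustrative label, not a real path */
    uint32_t    required;  /* capability bits this build depends on */
} libvariant_t;

/* Number of set bits in a capability word. */
int count_bits(uint32_t w)
{
    int n = 0;

    for (; w != 0; w >>= 1)
        n += (int)(w & 1u);
    return n;
}

/* Return the variant whose requirements are all met by hw and which
 * requires the most capabilities, or NULL if none is usable. */
const libvariant_t *pick_variant(const libvariant_t *v, size_t n, uint32_t hw)
{
    const libvariant_t *best = NULL;
    int best_bits = -1;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((v[i].required & ~hw) != 0)
            continue;  /* needs something this machine lacks */
        int bits = count_bits(v[i].required);
        if (bits > best_bits) {
            best = &v[i];
            best_bits = bits;
        }
    }
    return best;
}
```

A variant with no required bits plays the role of the basic library: it matches every machine, so there is always something to fall back on when none of the more demanding builds fits.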
The Tool Chain
The other key piece we needed for the amd64 kernel was to make the Solaris kernel compile and run with the GNU C compiler and assembler, so I started work on that too. Note that this wasn't because we didn't want to use the Sun compiler; it's just that it's easier to bring up an OS using a compiler that works, instead of debugging both the kernel and the compiler simultaneously. More critically, there wasn't a Sun compiler that would build 64-bit objects at the time. GNU C is great for finding bugs, and really complemented the capabilities of the Sun compiler and lint tools. I got the 32-bit kernel working fairly easily, and we were able to start the 64-bit project using this compiler once we'd hacked up an initial configuration.
In the meantime, while we were completing and integrating some of these prerequisites into Solaris 10, we were assembling the main amd64 project team; work really started in earnest in January of 2004. Next time I'll describe some of our early problems with the porting work.
Technorati Tag: Solaris
Posted at
02:30PM May 10, 2005
by tpm in Solaris |
Hello, World
Hello. First some preliminaries. Who am I? Well, I'm a Distinguished Engineer in the Operating Platforms Group at Sun; I'm also the Chief Technical Officer for that group. The first part of that means that I get to work on Solaris, the second part of that sentence means that it's a continuous struggle to do any engineering work in the face of too many meetings :) Among the other CTOs I've met both within Sun and outside of Sun, it's a relief to see that there's a wide interpretation of the role. Some CTOs are only concerned with abstract futures, some are solely buried in day-to-day issues. I try to be somewhere in the middle, and thus please nobody all of the time :(
I'm an engineer by calling; like my colleagues, I'm in this to build software artifacts that make other people's lives better. I've worked on Solaris for many years, on numerous subsystems and problems, from architectural direction to fixing bugs. We're very excited about Solaris 10 and how much it seems to be helping customers with their problems, and changing the way people think about Solaris, Sun, and Operating Systems technology in general. I think we're living in a time of transitions, and operating systems are relevant again, as the boundaries between software system components are shifting, hardware devices become ever more capable and complex, while new business models emerge.
During Solaris 10, my principal code contributions were around modernizing the Solaris kernel port to the x86 architectures, and on bringing 64-bit Solaris up on x64 platforms - our slightly boringly named "amd64" project. Some months ago a member of that team blogged about some of the work he'd been doing on that project, and hoped that someone would spend a bit more time talking about the other work that we did during the amd64 port. That seemed like a good topic for a blog entry, so that's what I thought I'd write about first.
Posted at
02:20PM May 10, 2005
by tpm in General |