Useful stuff for your blog-reading pleasure.
All | General

20090615 Monday June 15, 2009

OpenSolaris meets Mac OS X in Munich

Last Wednesday, Wolfgang and I had the honor to present at "Mac Treff München", Munich's local Mac User Group. There are quite a few touching points between OpenSolaris and Mac OS X, such as ZFS, DTrace and VirtualBox, we thought it would be a good idea to contact them out of our Munich OpenSolaris User Group and talk a little bit about OpenSolaris.

Breaking the Ice

We were a little bit nervous about what would happen. Do Mac people care about the innards of a different, seemingls non-GUIsh OS? Are they just fanboys or are they open to other people's technologies? Will talking about redundancy, BFU, probes and virtualization bore them to death?

Fortunately, the 30-40 people that attended the event proved to be a very nice, open and tolerant group. They let us talk about OpenSolaris in General including some of the nitty-grittyness of the development process, before we started talking about the features that are more interesting to Mac users. We then talked about ZFS, DTrace and VirtualBox:

ZFS for Mac OS X (or not (yet)?)

Explaining the principles behind ZFS to people who are only used to draging'n'dropping icons, shooting photos or video and using computers to get work done, without having to care about what happens inside, is not easy. We concentrated on getting the basics of the tree structure, copy-on-write, check-summing and using redundancy to self-heal while using real world examples and metaphors to illustrate the principles. Here's the deal: If you have lots of important data (photos, recording, videos, anyone?) and care about it (content creators...), then you need to be concerned about data availability and integrity. ZFS solves that, it's that simple. A little animation in the slides were quite helpful in explaining that, too :).

The bad news is that ZFS seems to have vanished from all of Apple's communication about the upcoming Mac OS X Snow Leopard release. That's really bad, because many developers and end-users were looking forward to take advantage of it.

The good news is that there are still ways to take advantage of ZFS as a Mac User: Run an OpenSolaris file server for archiving your data or using it as a TimeMachine store, or even run a small OpenSolaris ZFS Server inside your Mac through VirtualBox.

DTrace: A Mac Developer/Admin's Heaven, Albeit in Jails

Next, we dove a little bit into DTrace and how it makes the OS really transparent for admins, developers and users. In addition to the dtrace(1) command, Apple created a nice GUI called "Instruments" as part of their XCode development environment that leverages the DTrace infrastructure to collect useful data about your application in realtime.

Alas, as with ZFS, there's another downer, and this time it's more subtle: While you can enjoy the power of DTrace in Mac OS X now, it's still kinda crippled, as Adam Leventhal pointed out: Processes can escape the eyes of DTrace at will, which counters the absolute observability idea of DTrace quite massively. Yes, there are valid reasons for both sides of the debate, but IMHO, legal things should be enforced using legal means, and software should be treated as software, meaning it is not a reliable way of enforcing any license contracts - with or without powerful tools such as DTrace.

OpenSolaris for all: VirtualBox

Finally, a free present to the Mac OS X community: VirtualBox. I still get emails asking me to spend 80+ dollars on some virtualization software for my Mac. There are at least two choices in that price range: VMware Workstation and Parallels. Well, the good news is that you can save your 80 bucks and use VirtualBox instead.

This may not be new to you, since as a reader of my blog you've likely heard of VirtualBox before, but it's always amazing for me to see how slowly these things spread. So, after reading this article, do your Mac friends a favour and tell them they can save precious money buy just downloading VirtualBox instead of spending money on other virtualization solutions for the Mac. It's really that simple.

Indeed, this was the part where the attendees took most of their notes, and asked a lot of questions about (ZFS being a close first in terms of discussion/questions).

Conclusion

After our presentations, a lot of users came up and asked questions about how to install OpenSolaris on their hardware and on VirtualBox. Some even asked where to buy professional services for installing them an OpenSolaris ZFS fileserver in their company. The capabilities of ZFS clearly struck some chords inside the Mac OS X community, which is no wonder: If you have lots of Audio/Video/Photo data and care about quality and availability, then there's no way around FS.

I used this event as an excuse to try out keynote, which worked quite well for me, especially because it helped me create some easy to understand animations about the mechanics of ZFS. I also liked the automatic guides a lot which help you position elements on your slides very easily and seem to guess very well what your layout intentions were. I'd love the OpenOffice folks to check out Keynote's guides and see if they can come up with something similar. So, here's a Keynote version of my "OpenSolaris for Mac Users" slides as well as a PDF version (both in German) for you to check out and re-use if you like.

Update: Wolfgang's introductory slides are now available for download as well and Klaus, the organizer of the event, posted a review in the Mac Treff München Blog with some pictures, too.

"OpenSolaris meets Mac OS X in Munich" has been brought to you by Constantin's Blooog.
This entry was created on 2009-06-15 02:32:51.0 PST and is associated with the following tags:

You're welcome to use this Permalink , add a comment below or send your feedback to constantin at sun dot com.
Comments [7]


20090302 Monday March 02, 2009

The Inner Life of ZFS: Cool ZFS On-Disk Block Structure Movies

Pascal Gienger of Konstanz University published a nifty DTrace script that captures ZFS' on-disk block activity and published it on his Southbrain blog.

The cool thing: He animated the data. That's right. Using a Perl script, he draws greener or redder dots depending on whether a particular range of blocks on disk sees more reads or writes. By aggregating data over many hours while doing interesting tasks such as backup, he created a series of very cool animations.

In his first post, he shows us the inner life of a Postfix mail queue as an animated GIF:

ZFS on-disk block animation

Then, he compared the write patterns of UFS vs. ZFS using a MySQL workload to produce a cool MPEG-4 movie.

In his latest ZFS animation work, he shows us 18 hours of a mirrored file server including some backup, night rest and user action (Download MPEG-4 Movie here).

Congratulations, Pascal, this is way cool stuff. You really should upload these to YouTube so people can embed them in their blogs :).

Update: Meanwhile, pascal told me that he uploaded his videos on YouTube already. He has a full playlist full of them. Enjoy!

"The Inner Life of ZFS: Cool ZFS On-Disk Block Structure Movies" has been brought to you by Constantin's Blooog.
This entry was created on 2009-03-02 03:08:10.0 PST and is associated with the following tags:

You're welcome to use this Permalink , add a comment below or send your feedback to constantin at sun dot com.


20080703 Thursday July 03, 2008

Welcome to the year 2038!

An illustration of the year 2038 problem. Image taken from Wikipedia, see license there.A lot of interesting things are going to happen in the next 30 years. One of them will be a big push to fix the so called "Year 2038 Problem" on Solaris and other Unix and C-based OSes (assuming there'll be any left), which will be similar to the "Year 2000 Problem".

The Year 2038 Problem

To understand the Year 2038 Problem, check out the definition of time_t in sys/time.h:

typedef long time_t; /* time of day in seconds */

To represent a date/time combination, most Unix OSes store the number of seconds since January 1st, 1970, 00:00:00 (UTC) in such a time_t variable. On 32-Bit systems, "long" is a signed integer between -2147483648 and 2147483647 (see types.h). This covers the range between December 13th, 1901, 20:45:52 (UTC) and January 19th, 2038, 03:14:07, which the fathers of C and Unix thought to be sufficient back then in the seventies.

On 64-Bit systems, time_t can be much bigger (or smaller), covering a range of several hundred thousands of years, but if you're 32-Bit in 2038 you'll be in trouble: A second after January 19th, 2038, 03:14:07 you'll travel back in time and immediately find yourself in the middle of December 13th, 1901, 20:45:52 with a major headache called "overflow".

More details about this problem can be found on its Wikipedia page.

2038 could be today...

Well, you might say, I'll most probably be retired in 2038 anyway and of course, there won't be any 32-Bit systems that far in the future, so who cares?

A customer of mine cared. They run a very big file server infrastructure, based on Solaris, ZFS and a number of Sun Fire X4500 machines. A big infrastructure like this also has a large number of clients in many variations. And some of their clients have a huge problem with time: They create files with a date after 2040.

Now, the NFS standard will happily accept dates outside the 32-Bit time_t range and so will ZFS. But any program compiled in 32-Bit mode (and there are many) will run into an overflow error as soon as it wants to handle such a file. Incidentally, most of the Solaris file utilities (you know, rm, cp, find, etc.) are still shipped in 32-Bit, so having files 30+ years in the future is a big problem if you can't administer them.

The 64-Bit solution

One simple solution is to recompile your favourite file utilities, say, from GNU coreutils in 64-Bit mode, then put them into your path and hello future! You can do this by saying something like:

CC=/opt/SUNWspro/bin/cc CFLAGS=-m64 ./configure --prefix=/opt/local; make

(Use /opt/SunStudioExpress if you're using Sun Studio Express).

Now, while trying to reproduce the problem and sending some of my own files into the future, I found out thanks to Chris and his short "what happes if I try" DTrace script, that OpenSolaris already has a way to deal with these problems: ufs and ZFS just won't accept any dates outside the 32-Bit range any more (check out lines 2416-2428 in zfs_vnops.c). Tmpfs will, so at least I could test there on my OpenSolaris 2008.05 laptop.

That's one way to deal with it, but shutting the doors doesn't help our poor disoriented client of the future. And it's also only available in OpenSolaris, not Solaris 10 (yet).

The DTrace solution

So, I followed Ulrich's helpful suggestions and Chris' example and started to hack together a DTrace script of my own that would print out who is trying to assign a date outside of 32-Bit-time_t to what file, and another one that would fix those dates so files can still be accepted and dealt with the way sysadmins expect.

The first script is called "showbigtimes" and it does just that:

constant@foeni:~/file/projects/futurefile$ pfexec ./showbigtimes_v1.1.d 
dtrace: script './showbigtimes_v1.1.d' matched 7 probes
CPU     ID                    FUNCTION:NAME
  0  18406                  futimesat:entry 
UID: 101, PID: 2826, program: touch, file: blah
  atime: 2071 Jun 23 12:00:00, mtime: 2071 Jun 23 12:00:00

^C

constant@foeni:~/file/projects/futurefile$ /usr/bin/amd64/ls -al /tmp/blah
-rw-r--r--   1 constant staff          0 Jun 23  2071 /tmp/blah
constant@foeni:~/file/projects/futurefile$

Of course, I ran "/opt/local/bin/touch -t 207106231200 /tmp/blah" in another terminal to trigger the probe in the script above (that was a 64-Bit touch compiled from GNU coreutils).

A couple of non-obvious hoops needed to be dealt with:

  • To make the script thread-safe, all variables need to be prepended with self->.
  • There are two system calls that can change a file's date: utimes(2) and futimesat(2) (let me know if you know of more). The former is very handy, because we can steal the filename from it's second argument, but the latter also allows to just give it a file descriptor. If we want to see the file names for futimesat(2) calls, then we may need to figure them out from the file descriptor. I decided to create my own fd-to-filename table by tapping into open(2) and close(2) calls because chasing down the thread data structures or calling pfiles from within a D script would have been more tedious/ugly.
  • Depending on whether we are fired from utimes(2) or futimesat(2), the arguments we're interested in change place, i.e. the filename will come from arg0 in the case of utimes(2) or arg1 if futimesat(2) was used. To always get the right argument, we can use something like probefunc=="utimes"?argo:arg1.
  • We can't directly access arguments to system calls and manipulate them, we have to use copyin().

I hope the comments inside the script are helpful. Be sure to check out the DTrace Documentation, which was very useful to me.

The second script is called correctbigtimes.d and it not only alerts us of files being sent into the future, it automatically corrects the dates to the current date/time in order to prevent any time-travel outside the bounds of 32-Bit time_t at all:

constant@foeni:~/file/projects/futurefile$ pfexec ./correctbigtimes_v1.1.d 
dtrace: script './correctbigtimes_v1.1.d' matched 2 probes
dtrace: allowing destructive actions
CPU     ID                    FUNCTION:NAME
  0  18406                  futimesat:entry 
UID: 101, PID: 2844, program: touch, fd: 0, file: 
  atime: 2071 Jun 23 12:00:00, mtime: 2071 Jun 23 12:00:00
Corrected atime and mtime to: 2008 Jul  3 16:23:25

^C

constant@foeni:~/file/projects/futurefile$ ls -al /tmp/blah
-rw-r--r-- 1 constant staff 0 2008-07-03 16:23 /tmp/blah
constant@foeni:~/file/projects/futurefile$

As you can see, we enabled DTrace's destructive mode (of course only for constructive purposes) which allows us to change the time values on the fly and ensure a stable time continuum.

This time, I left out the code that created the file descriptor-to-filename table, because this script may potentially be running for a long time and I didn't want to consume preciuous memory for just a convenience feature (Otherwise we'd kept an extra table of all open files for all running threads in the syste,!). If we get a filename string, we print it, otherwise a file descriptor needs to suffice, we can always look it up through pfiles(1).

The actual time modification takes place inside our local variables, which then get copied back into the system call through copyout().

I hope you liked this little excursion into the year 2038, which can happen sooner than we think for some. To me, this was a great opportunity to dig a little deeper into DTrace, a powerful tool that shows us what's going on while enabling us to fix stuff on the fly.

Update: Ulrich had some suggestions and found a bug, so I updated both scripts to version 1.2:

  • It's better to use two individual probes for utimes(2) and futimesat(2) than placing if-then structures based on probefunc. Improves readability, less error-prone, more efficient.
  • The predicates are now much simpler due to using struct timeval arrays there already.
  • Introduced constants for LONG_MIN and LONG_MAX to improve readability of the predicates.
  • The filename table doesn't account for one process opening a file in one thread, then pass the fd to another thread. Therefore, it's better to have a 2-dimensional array with fd and pid as index that is local.
  • The +8 in the predicate to fetch was incorrect, +16 or sizeof(struct timeval) would have been correct. That's fixed by using the original structures right from the beginning at predicate time.

The new versions are already linked from above or available here: showbigtimes_v1.2.d, correctbigtimes_v1.2.d.

"Welcome to the year 2038!" has been brought to you by Constantin's Blooog.
This entry was created on 2008-07-03 08:18:35.0 PST and is associated with the following tags:

You're welcome to use this Permalink , add a comment below or send your feedback to constantin at sun dot com.




Archives
Subscribe to This Blog!
Most Popular Entries
Watch videos of Constantin
About this site
Links
Get in Touch!
This is Sun employee Constantin Gonzalez' personal blog.
All opinions expressed herein are solely of the author and do not necessarily reflect those of his employer.
If you want to contact the author, please send email to constantin (dot) gonzalez (at) sun (dot) com.
Thank you for reading this blog!