Sunday Oct 18, 2009

There's a nice review of VirtualBox at Gizmodo, titled "Virtualize Any OS for Free." Check it out.

Monday Aug 17, 2009

Slides from last week's meeting of the Atlanta OpenSolaris User Group (ATLOSUG) are posted now on the group website - http://opensolaris.org/os/project/atl-osug

We had a good group of about 16 people in attendance and a great discussion around how and why to use COMSTAR. 

The next meeting will be held on Sept. 8.  The topic will be how COMSTAR and other OpenSolaris technologies fit together in the Sun Unified Storage family of products.  Hope to see you there!

Monday Jun 29, 2009

Pro OpenSolaris - Harry Foxwell and Christine Tran

Several (too many) weeks ago, I said that I was going to read and review Harry & Christine's new book, Pro OpenSolaris. Finally, I am getting around to doing this.

Overall, I was pleased with Pro OpenSolaris. It does a good job at what it tries to do. The key is to recognize when it is the right text and when another might be. Right in the Introduction, the authors are clear that this is an orientation tour. They say "We assume that you are a professional system administrator ... and that your learning style needs only an orientation and an indication of what should be learned first in order to take advantage of OpenSolaris." That's a good summary of the main direction of the book. And at this, it does a very nice job!

This means that Pro OpenSolaris is not an exhaustive reference manual on all of the features and nuances of OpenSolaris. Instead, it's a broad overview of what OpenSolaris is, how it got to be what it is, what its key features and differentiators are, and why I might choose to use OpenSolaris instead of some other system. That's important to realize from the outset. If you are looking for the thousand-page reference guide, this is not the one. If you have heard about OpenSolaris and want to explore a bit more deeply, to decide whether OpenSolaris might help your business or be a tool you can use, this is a great place to start.

Pro OpenSolaris spends a good bit of time on the preliminaries. There is an extensive section on the philosophical differences between the approaches and requirements of different open source licenses and styles of licenses. Pro OpenSolaris explains clearly why OpenSolaris uses the CDDL license as opposed to other licenses and how this fits in with the overall goal of the OpenSolaris project.

Pro OpenSolaris helps you get started, with a lengthy discussion of how to go about installing OpenSolaris either on bare metal or in a virtual machine.

Compare this to the OpenSolaris Bible (Solter, Jelinek, & Miner), which really does aspire to be the thousand-page reference guide.  In the OpenSolaris Bible, licensing and installation are given only a short discussion, since they are not central to the book's focus.  Instead, the reader is directed to other places for that discussion.

But that's why it's important to have both books.  Pro OpenSolaris gives the tour of the important parts of the OpenSolaris operating system, how and why I might use them, and why they are important, but it does not go deeply into the details.  That's probably wise for an operating system that is still growing and changing substantially with each new release.

One thing that particularly interested me in Pro OpenSolaris is its large section on the OpenSolaris Webstack, which includes IPS-packaged versions of the commonly used pieces of an AMP stack (notably Apache, MySQL, PHP, lighttpd, nginx, Ruby, Rails, etc.), all compiled and optimized for OpenSolaris and including key add-ons such as DTrace providers where applicable. Pro OpenSolaris also has a nice, long chapter on NetBeans and its role as part of an overall OpenSolaris development environment.

What's my take overall? Pro OpenSolaris is a quick read that will give you a good understanding of what OpenSolaris is and why you would want to use it, what its key features are and why they are important, and how you can use them to your best advantage. There are lots of examples and technical details, so you can see that what Harry & Christine talk about is for real. I would recommend this as part of your library. But I would also recommend the OpenSolaris Bible. The two complement each other nicely to complete the picture.

Saturday Jun 13, 2009

Had a great Atlanta OpenSolaris User Group meeting this month.  We did an installfest, an update from CommunityOne, and a recap of what's new in OpenSolaris 2009.06.  About twenty folks showed up and about half loaded their laptops with the new build while we were there.

We got some great feedback for upcoming topics and are pushing forward with that.  We also decided to move back to monthly meetings starting in August.  Our next meeting is August 11 when we will talk about COMSTAR.  We are also considering a change in venue back to the Sun office in Alpharetta.  Matrix Resources has been very gracious in allowing us to use their facility, but I always feel bad that they have to have someone stick around until late at night to babysit us.

We're going to try an experiment to see if we can't get the word out a little better about our merry band via social networks.  We've started by creating a Meetup group at http://meetup.com/atlosug.  Hopefully this might generate more traffic to our meetings and help us find folks in the area.

Tuesday Jun 09, 2009

The keepers of the OpenSolaris Community took advantage of having a number of the User Group leaders at the CommunityOne conference this last week to set aside a day for a User Group Leaders' Bootcamp.

What a great opportunity to get together in the same room with folks working to create and sustain OpenSolaris user groups around the world! We had folks from every continent - from Atlanta and Argentina, from Dallas and Serbia, from China and London, and on and on. Something like twenty-five to thirty of the OpenSolaris User Groups were represented.

The whole day was a great experience. It was great to see that as different as each group was, there were a lot of common themes for both successes and for challenges. And a lot of great ideas were shared as to how to boost participation, to improve meetings, and to improve the success of the groups overall. It will be exciting to hear a report back next year on how these ideas have played out.

Be sure to check out Jim Grisanzio's photos to see some of these characters and what all went on at CommunityOne and in the OSUG Bootcamp.

Jeff Jackson, Sr. VP for Solaris Engineering, started the day off with a greeting and charge to get the most out of this opportunity to meet with each other and with the OpenSolaris and Solaris headquarters teams.

Since the thing that brought this group together was a common focus on OpenSolaris User Groups and not the fact that we knew each other, we began the day with a team-building exercise, courtesy of The Go Game. This is a cross between a scavenger hunt and an improvisational acting class. Teams criss-crossed downtown San Francisco trying to find and photograph places hinted at by clues on web pages. At some venues, the teams had to act out and film various tasks. For example, on the Yerba Buena lawn, the team had to engage in an impromptu Tai Chi exercise in order to find their long-lost phys ed teacher, Ms. Karpanski, who then led the team in creating a new exercise video. Once we all returned, everyone voted on the submissions and a winning team was chosen. Supposedly, we can see all these photos and videos, but I haven't yet found out how. Perhaps that's for the best!

In order for us to get to know each other's groups, each User Group prepared a poster describing the group, where we were located, what we do, what sort of members make up the group, and what makes us special. Many of these posters were really well done! We had a bit of a scavenger hunt for answers to questions found by careful reading of all of the posters. It was really cool to see what sorts of projects some of the groups had undertaken and how they were working with various university or other organizations.

But the main part of the day was spent in a big brainstorming session. We all identified our successes, our failures, our challenges, and ideas for the future. We put all of these on several hundred post-it notes and placed them on large posters. We grouped them by topic and then went through all of these. Even though this only had an hour on the agenda, it ended up taking the bulk of the day. Since this was the most important thing for us, we decided to rearrange the day to accommodate it.

From these sticky-notes, we found out that some of our groups were mostly focused on administrators but others had a large developer population. We all have some sort of issues around meeting locations - whether it's a matter of access in the evening, finding a convenient location, or providing network access and power. For most groups, having some sort of refreshments was important, though some groups felt like good refreshments attracted too many folks who just show up for the food.

There were a lot of good ideas: using a registration site to get access to the facility and to order food; setting up and using Facebook, LinkedIn, and Twitter; using IRC; working with the Sun Campus Ambassadors; and using Meetup to find new members. Many folks found it useful to record the presentations given at their meetings and make the videos available. Some groups (in Japan, for example) have special sub-groups for beginners. Other groups are doing large-scale development projects, such as the Belenix project in Bangalore.

For me and the Atlanta OpenSolaris User Group, I have a lot of new ideas that I want to put out to our membership and our leaders - move back to monthly meetings, use a registration site, set up a presence on various social networks.

Many people said that folks come to the user groups in order to network and expand their circle of business acquaintances. In light of the current economic situation, with so many smart people out of work, I am thinking of promoting our group with some of the job networking groups around Atlanta. For example, my church, Roswell United Methodist Church, has one of the largest job networking groups in the Atlanta area. Every two weeks, nearly 500 people meet to network and help each other in their job search. Perhaps the many IT folks in this group might find this a way to get current and stay current in a whole new area.

At any rate, I am inspired to get things cranking at ATLOSUG!

After spending the afternoon working through our hundreds of sticky notes, the OpenSolaris Governing Board had a bit of a roundtable with us to talk about what they do and how we can work better together. It was really helpful for me to hear from them and to get to put faces to some of the names for the folks I did not already know.

We finished out the evening with a great dinner at the Crab House at Pier 39. From what I have seen, many of the photos from dinner and the meeting are already on Facebook, Flickr, and likely blogs.sun.com. Jim Grisanzio, OpenSolaris Chief Photographer, was out in force with his camera!

Thanks so much to Teresa Giacomini, Lynn Rohrer, Dierdre Straughan, Jim Grisanzio, Tina Hartshorn, Wendy Ames, Kris Hake and everyone else who had a hand in organizing this event. Thanks to Jeff Jackson, Bill Franklin, Chris Armes, Dan Roberts and all the other HQ folks who took the time to come and listen and interact with the leaders of these groups. I know that I got a lot out of the meeting and am more eager than ever to promote and push forward with our user group.

Last week, I had the opportunity to attend CommunityOne West in San Francisco, along with a number of the other leaders of OpenSolaris User Groups. (I head up the Atlanta OpenSolaris User Group.) What a great meeting! Three days of OpenSolaris.

First off, I am sure that Teresa and the OpenSolaris team selected the Hotel Mosser because they knew it was a Solaris-focused venue. As Dave Barry would say, I am not making this up! Even the toilet paper was Solaris-based. Bob Netherton and I were speculating that perhaps this was an example of Solaris Roll-Based Dump Management, new in OpenSolaris 2009.06.

CommunityOne Day One

Day One was a full day of OpenSolaris and related talks. The OpenSolaris teams maintained tracks around deploying OpenSolaris 2009.06 in the datacenter and around developing applications on OpenSolaris 2009.06. For the most part, I stuck with the operations-focused sessions, though I did step out into a few others. Some of the highlights included:

  • Peter Dennis and Brian Leonard's fun survey of what's new and exciting in OpenSolaris 2009.06. ATLOSUG folks should look for a reprise of this at our meeting on Tuesday.
  • Jerry Jelinek's discussion of the various virtualization techniques built into and onto OpenSolaris. This is a sort of talk that I give a lot. It was really helpful to hear how the folks in engineering approach this topic.
  • Scott Tracy & Dan Maslowski's COMSTAR discussion and demo. COMSTAR has been significantly expanded in recent builds, with more coolness still to come. I had not paid a lot of attention to this lately and this was a really helpful talk, especially since Teresa Giacomini had asked me to present this demo for the user group leaders on Wednesday. In any case, I have reproduced the iSCSI demo that Scott did using just VirtualBox, rather than requiring a server. Of course, the VB version is not something I would run my main storage server on. But it certainly is a great tool to understand the technology. I hope to have Ryan Matteson (Ryan, you volunteered!) give a talk at the ATLOSUG sometime soon.
  • I branched out of main OpenSolaris path to see a few other things on Day One, as well. Ken Pepple, Scott Mattoon, and John Stanford gave a good talk on Practical Cloud Patterns. They talked about some of the typical ways that people do provisioning, application deployment, and monitoring within the cloud.
  • Karsten Wade, "Community Gardener" at Red Hat, gave a talk called Participate or Die. This was about the importance of participating in the Open Source projects that are important to your business. He talked about the difference between participating (perhaps using open source code) and influencing (helping to guide the project). Because communities pay more attention to those who actively participate, active members enhance their status and become influencers of a project's direction. And it is important that this happen: in successful projects, the roadmap is driven by the participants rather than handed down from on high with the hope that people will line up behind it. Really, I think his key message was that it is important not to stand by passively when you care about or depend upon something, leaving its future in the hands of others.
  • Kevin Nilson and Michael Van Riper gave a great talk about building and maintaining a successful user group. This was built on their experiences with the Silicon Valley Java User Group and with the Google Technology User Group. They took a great approach by collecting videos from the leaders, hosts, and participants in these and other groups around the country. It was really helpful to hear people's perspectives on why they attend a group, why companies host group meetings, and why and how people continue to lead user groups. While a lot of what they had to say, and the successes that they have had, are a product of being in a very "target-rich environment" in Silicon Valley, it was interesting to see that some things are universal: a good location makes a lot of difference; having food matters. I got a lot of ideas from this and from the OpenSolaris User Group Bootcamp that I hope to get going in ATLOSUG.
  • The OpenSolaris 2009.06 Launch Party finished out the evening, with dodgeball and the Extra Action Marching Band. I thought these folks were the hit of the evening. You get the best of marching bands, big drums, loud brass, but add to that folks flailing around and throwing themselves at the dodgeball court nets. Much more exciting than your regular marching band, even some of the cool ones around Atlanta in the Battle of the Bands!

CommunityOne Day Two

Day Two was filled with OpenSolaris Deep Dives. These were very helpful, not just in content, but in helping me to hone my own OpenSolaris presentations. For this day, I stuck close to the Deploying OpenSolaris track, having learned in graduate school that I am not a developer. This track included:

  • Chris Armes kicked off the day with a talk on deploying OpenSolaris in your Data Centre (as he spells it).
  • Becoming a ZFS Ninja, presented by Ben Rockwood. Ben is an early adopter and a production user of ZFS. This was a two-hour, fairly in-depth talk about ZFS and its capabilities.
  • Nick Solter, co-author of the OpenSolaris Bible, talked about OpenHA Cluster, newly released and available for OpenSolaris. With OpenHA, enterprise-level availability is not just available, but also supported. He talked about how the cluster works and about extensions to the OpenHA cluster beyond the capabilities of Solaris Cluster, based on OpenSolaris technologies. Some of these include the use of Crossbow VNICs for private interconnects. I am still thinking about the availability implications of this and am not sure it's an answer for all configurations. But it's cool that it's there!
  • Jerry Jelinek rounded out the day talking about Resource Management with Containers, a topic near and dear to my heart and one I end up presenting a lot.

We finished out Day Two with a reunion dinner of some of the old team at Buca di Beppo. Around the table, we had Vasu Karunanithi, Dawit Bereket, Matt Ingenthron, Scott Dickson (me), Bob Netherton, Isaac Rosenfeld, and Kimberly Chang. It was great to get at least part of the old gang together and catch up.

Day Three was the OpenSolaris User Group Leaders Bootcamp. But that's for another post....

Monday May 25, 2009

Sun's Executive Briefing Center is on the road this week. We are visiting with customers in Cleveland, Columbus, and Detroit. Looks like a busy schedule, and I am looking forward to the trip. I was asked to fill in as the Solaris Virtualization speaker for this trip.

We fly to Cleveland and fly home from Detroit.  Kate has arranged a bus to get us from Cleveland to Columbus to Detroit.  My wife calls it Geeks on a Bus and thought it sounded too scary to contemplate!

We'll be talking about Sun's Vision, Systems, Software, OpenStorage, Solaris, Virtualization of Systems, Desktop Virtualization, and Services to support all of these.  Hope to see many of you there.

Between Redmeat and Cyanide & Happiness, one might think my sense of humor a little warped.  Check out today's Cyanide & Happiness, though.  I love comics and some of the online ones are my favorites.

Saturday May 09, 2009

Last week, I blogged about a Jumpstart Survey.  I've gotten good comments and some responses to the survey.  It's been a week, but I want to collect some more responses before posting an analysis.  Take a look at my previous blog and fill out the survey or comment on the blog.  I will summarize and report in another week or so.

I'm doing briefings on DTrace and Solaris Performance Tools this week in Atlanta, Ft. Lauderdale, and Tampa. Click the links below to register if this is of interest and you can attend. Each is a 2 1/2 to 3 hour briefing that stays pretty technical, with lots of examples.

From the flyer:

Join us for our next Solaris 10 Technology Brief featuring DTrace.  DTrace, Solaris 10's powerful new framework for system observability, helps system administrators, capacity planners, and application developers improve performance and problem resolution. 

ATLANTA, GA - May 12, 2009
LOCATION: Classroom Resource Group
TIME: 8:30 AM Registration, 9:00 am - 12:00 pm Session
DIRECTIONS: http://www.crgatlanta.com/directions.asp
REGISTER: http://www.suneventreg.com/cgi-bin/pup_registration.pl?EventID=2705

HOLLYWOOD, FL - May 13, 2009
LOCATION: Seminole Hardrock Hotel
TIME: 8:30 AM Registration, 9:00 am - 12:00 pm Session
DIRECTIONS: http://www.seminolehardrockhollywood.com/getting_here/directions.php
REGISTER: http://www.suneventreg.com/cgi-bin/pup_registration.pl?EventID=2706

TAMPA, FL - May 14, 2009
LOCATION:  University of South Florida
TIME: 8:30 AM Registration, 9:00 am - 12:00 pm Session
DIRECTIONS: http://www.msc.usf.edu/directions.htm  
REGISTER:  https://www.suneventreg.com//cgi-bin/register.pl?EventID=2707

What You'll Learn
You can't improve what you can't see and DTrace provides safe, production-quality, top to bottom observability - from the PHP application scripts down to the device drivers - without modifying applications or the system.  This seminar will introduce DTrace and the DTrace Toolkit as key parts of an overall Solaris performance and observability toolkit. 

AGENDA:
8:30 AM To 9:00 AM      Check In, Continental Breakfast
9:00 AM To 9:10 AM      Welcome
9:10 AM To 10:15 AM     DTrace
10:15 AM To 10:30 AM    BREAK
10:30 AM To 11:30 AM    DTrace Continued
11:30 AM To 12:00 PM    Wrap Up, Q&A, Evaluations

We look forward to seeing you at one of these upcoming Solaris 10 DTrace sessions!


Wednesday Apr 29, 2009

Jumpstart is the technology within Solaris that allows a system to be installed remotely across a network. This feature has been in the OS for a long, long time, dating back to the start of Solaris 2.0, I believe. With Jumpstart, the system to be installed (the Jumpstart client) contacts a Jumpstart server, which installs it across the network. That is a huge simplification, since there are many nuances to setting all of this up. Your best bet is to check the Solaris 10 Installation Guide: Network Based Installations and the Solaris 10 Installation Guide: Custom Jumpstart and Advanced Installations.

Jumpstart makes use of rules to decide how to install a particular system, based on its architecture, network connectivity, hostname, disk and memory capacity, or any of a number of other parameters. The rules select a profile that determines what will be installed on that system and where it will come from. Scripts can be inserted before and after the installation for further customization. To help manage the profiles and post-installation customization, Mike Ramchand has produced a fabulous tool, the Jumpstart Enterprise Toolkit (JET).
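To make the rules-and-profiles idea a little more concrete, here's a minimal sketch of a rules file. Each rule is one or more keyword/value tests joined with &&, followed by a begin script, a profile, and a finish script. A - means "none," and an = in the profile field means the begin script will generate a derived profile. The hostnames, network, and script and profile names below are hypothetical.

```
# keyword value [&& keyword value ...]   begin          profile        finish
hostname  web01                           -              web_prof       web_finish
karch sun4v && memsize 4096-65536         -              t2_prof        common_finish
network   192.168.10.0                    derive_begin   =              common_finish
any       -                               -              default_prof   -
```

Jumpstart uses the first rule that matches, so the catch-all any rule goes last. After editing, run the check script from the Jumpstart directory; it validates the rules file and writes the rules.ok file that the clients actually read.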

My Questions for You

As a longtime Solaris admin, I have been a fan of Jumpstart for years and years. As an SE visiting many cool companies, I have seen people do really interesting things with Jumpstart. I want to capture how people use Jumpstart in the real world, not just in the world of those who create the product. I know that people come up with new and unique ways of using the tools that we create, ways we would never imagine.

For example, I once installed 600 systems with SunOS 4.1.4 in less than a week using Jumpstart - remember that Jumpstart never supported SunOS 4.1.4.

But, I am not just looking for the weird stories. I want to know what Jumpstart features you use. I'll follow this up with extra, detailed questions around Jumpstart Flash, WAN Boot, DHCP vs. RARP. But I want to start with just some basics about Jumpstart.

Lacking a polling mechanism here at blogs.sun.com, you can just enter your responses as a comment. Or you can answer these questions at SurveyMonkey here. Or drop me a note at scott.dickson at sun.com.

  1. How do you install Solaris systems in your environment?
    1. I use Jumpstart
    2. I use DVD or CD media
    3. I do something else - please tell me about it
  2. Do you have a system for automating your Jumpstart configurations?
    1. Yes, we have written our own
    2. Yes, we use JET
    3. Yes, we use xVM OpCenter
    4. No, we do interactive installations via Jumpstart. We just use Jumpstart to get the bits to the client.
  3. What system architectures do you support with Jumpstart?
    1. SPARC
    2. x86
  4. Do you use a sysidcfg file to answer the system identification questions - hostname, network, IP address, naming service, etc?
    1. No, I answer these interactively
    2. Yes, I hand-craft a sysidcfg file
    3. Yes, but it is created via the Jumpstart automation tools
  5. Do you use WANboot? I'll follow up with more questions on this at a later time.
    1. What's WANboot?
    2. I have heard of it, but have never used it
    3. We rely on WANboot
  6. Do you use Jumpstart Flash? More questions on this later, too
    1. Never heard of it
    2. We sometimes use Flash
    3. We live and breathe Flash
  7. What sort of rules do you include in your rules file?
    1. We do interactive installations and don't use a rules file
    2. We use the rules files generated by our automation tools, like JET
    3. We have a common rules file for all Jumpstarts based on hostname
    4. We use not only hostnames but also other parameters to determine which rule to use for installation
  8. Do you use begin scripts?
    1. No
    2. We use them to create derived profiles for installation
    3. We use them some other way
  9. Do you use finish scripts?
    1. No
    2. We use the finish scripts created by our automation
    3. We use finish scripts to do some minor cleanup
    4. We do extensive post-installation customization via finish scripts. If so, please tell me about it.
  10. Do you customize the list of packages to be installed via Jumpstart?
    1. No
    2. Somewhat
    3. Not only do we customize the list of packages, but we create custom packages for our installation
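For the derived-profile answer in question 8, here's a sketch of what such a begin script might look like, assuming a rule whose profile field is =. Jumpstart tells the begin script where to write the profile through SI_PROFILE, and SI_HOSTNAME carries the client's hostname. The function name and the hostname-based policy are hypothetical, just to show the mechanism.

```shell
#!/bin/sh
# Sketch of a begin script that builds a derived profile (hypothetical policy).
# When the matching rule's profile field is "=", Jumpstart sets SI_PROFILE to
# the path where the begin script must write the profile.

write_derived_profile() {
    profile_path=$1
    host=$2
    # Pick a software group by hostname prefix (illustrative only).
    case "$host" in
        db*) cluster=SUNWCall ;;   # Entire Solaris Software Group
        *)   cluster=SUNWCreq ;;   # Core System Support Software Group
    esac
    cat > "$profile_path" <<EOF
install_type    initial_install
system_type     standalone
partitioning    default
cluster         $cluster
EOF
}

# Only act when run by Jumpstart, which provides SI_PROFILE and SI_HOSTNAME.
if [ -n "${SI_PROFILE:-}" ]; then
    write_derived_profile "$SI_PROFILE" "${SI_HOSTNAME:-}"
fi
```

Keeping the logic in a function makes it easy to try the script on any machine by pointing it at a scratch file before putting it in the Jumpstart directory.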

Monday Apr 27, 2009

Just got my copy of Pro OpenSolaris by Harry Foxwell and Christine Tran in the mail today!  Can't wait to get a good look and post a review.  I wonder if I can get the authors to inscribe it to me!  

Also got a copy of OpenSolaris Bible by Nick Solter, Jerry Jelinek, and Dave Miner. Looking forward to cracking into it as well.

Will post reviews shortly.

Friday Apr 03, 2009

After a really long and difficult week, we've lost a good friend in our house today.  Our Ernie passed away at 16.  Ten days ago, everything was good.  But, when he went in to get his teeth cleaned, they found a cancerous tumor in his lower jaw.  Kathleen and I are losing a friend, a member of our family.  He's like our baby.

We will so much miss him.  I know he didn't want to go now, either.

Ernie

Tuesday Dec 23, 2008

A Different Approach

A week or so ago, I wrote about a way to get around the current limitation of mixing flash and ZFS root in Solaris 10 10/08. Well, here's a much better approach.

I was visiting with a customer last week, and they were very excited to move forward quickly with ZFS boot in their Solaris 10 environment, even to the point of using this as a reason to encourage people to upgrade. However, when they realized that it was impossible to use Flash with Jumpstart and ZFS boot, they were disappointed. Their entire deployment infrastructure is built around using not just Flash, but Secure WANboot. This means that they have no alternative to Flash; the images deployed via Secure WANboot are always flash archives. So, what to do?

It occurred to me that in general, the upgrade procedure from a pre-10/08 update of Solaris 10 to Solaris 10 10/08 with a ZFS root disk is a two-step process. First, you have to upgrade to Solaris 10 10/08 on UFS and then use lucreate to copy that environment to a new ZFS ABE. Why not use this approach in Jumpstart?

Turns out that it works quite nicely. This is a framework for how to do that. You likely will want to expand on it, since one thing this does not do is give you any indication of progress once it starts the conversion. Here's the general approach:

  • Create your flash archive for Solaris 10 10/08 as you usually would. Make sure you include all the appropriate LiveUpgrade patches in the flash archive.
  • Use Jumpstart to deploy this flash archive to one disk in the target system.
  • Use a finish script to add a conversion program to run when the system reboots for the first time. It is necessary to make this script run once the system has rebooted so that the LU commands run within the context of the fully built new system.

Details of this approach

Our goal is to have the flash archive installed as it always has been, but running from a ZFS root pool, preferably a mirrored one. The conversion script needs two phases to get there: the first phase creates the ZFS boot environment, and the second phase mirrors the root pool. In the following example, our flash archive is called s10u6s.flar. We will install the initial flash archive onto the disk c0t1d0 and build our root pool on c0t0d0.

Here is the Jumpstart profile used in this example:


install_type    flash_install
archive_location nfs nfsserver:/export/solaris/Solaris10/flash/s10u6s.flar
partitioning    explicit
filesys         c0t1d0s1        1024    swap
filesys         c0t1d0s0        free    /

We specify a simple finish script for this system to copy our conversion script into place:

cp ${SI_CONFIG_DIR}/S99xlu-phase1 /a/etc/rc2.d/S99xlu-phase1

You see what we have done: We put a new script into place to run at the end of rc2 during the first boot. We name the script so that it is the last thing to run. The x in the name makes sure that this will run after other S99 scripts that might be in place. As it turns out, the luactivate that we will do puts its own S99 script in place, and we want to come after that. Naming ours S99x makes it happen later in the boot sequence.
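If the ordering trick is not obvious, here's a small sketch that shows it. rc2 walks the S scripts in /etc/rc2.d in sorted order, so after the shared S99 prefix the comparison comes down to the next character, and x sorts after most of the names you are likely to find there. The other script names are hypothetical stand-ins, and LC_ALL=C sort only approximates the shell's glob ordering.

```shell
#!/bin/sh
# Sketch: list rc2.d start scripts in roughly the order rc2 would run them.
# The S* glob expands in sorted order; LC_ALL=C sort approximates that
# byte-wise ordering here.
rc2_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# S99dtlogin and S99lu-activate are hypothetical stand-ins for other S99 scripts.
rc2_order S99dtlogin S99xlu-phase1 S99lu-activate
```

This prints S99dtlogin, S99lu-activate, and S99xlu-phase1, in that order. A script whose name continued with y or z after S99 would still come later, so it's worth a quick look at /etc/rc2.d on your own images.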

So, what does this magic conversion script do? Let me outline it for you:

  • Create a new ZFS pool that will become our root pool
  • Create a new boot environment in that pool using lucreate
  • Activate the new boot environment
  • Add the script to be run during the second phase of the conversion
  • Clean up a bit and reboot

That's Phase 1. Phase 2 has its own script, which runs at the same point in the next boot (the end of rc2) and finishes mirroring the root pool. If you are satisfied with a non-mirrored pool, you can stop here and leave Phase 2 out. Or you might prefer to make this step a manual process once the system is built. But here's what happens in Phase 2:

  • Delete the old boot environment
  • Add a boot block to the disk we just freed. This example is SPARC, so use installboot. For x86, you would do something similar with installgrub.
  • Attach the disk we freed from the old boot environment as a mirror of the device used to build the new root zpool.
  • Clean up and reboot.

I have been thinking it might be worthwhile to add a third phase that starts a zpool scrub after the final reboot, to confirm that the newly attached drive is fully resilvered. ZFS starts resilvering a newly attached mirror on its own, so this is optional.
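Here's one way such a third phase might look, as a sketch. It assumes a hypothetical script name, S99xlu-phase3, written by a hypothetical function that Phase 2 would need to call before its init 6. The only real work is zpool scrub rpool, after which the script removes itself, like the other phases. RC2DIR is an added assumption so the function can be tried outside /etc/rc2.d.

```shell
#!/bin/sh
# Hypothetical Phase 3 writer: drops a one-shot rc2 script that scrubs the
# root pool on the next boot and then deletes itself.
# RC2DIR defaults to /etc/rc2.d; override it to experiment safely.
RC2DIR=${RC2DIR:-/etc/rc2.d}

write_phase3() {
    # Quoted 'EOF' keeps the body literal; nothing is expanded while writing.
    cat > "$RC2DIR/S99xlu-phase3" <<'EOF'
zpool scrub rpool
rm /etc/rc2.d/S99xlu-phase3
EOF
}
```

Since the scrub runs in the background once started, the script can remove itself right away; zpool status rpool shows progress afterward.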

The reason we add bootability explicitly to this drive is that, currently, when a mirror is attached to a root zpool, a boot block is not automatically installed. If the master drive were to fail and you were left with only the mirror, the system would be unbootable. With a boot block on both drives, you can boot from either one.
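The example here is SPARC, so Phase 2 uses installboot; on x86 the equivalent step uses installgrub. Here's a small sketch that picks the boot block command for each platform and prints it rather than running it. The function name is hypothetical; the stage1/stage2 paths are the standard GRUB locations on Solaris 10 x86.

```shell
#!/bin/sh
# Sketch: print the command that puts a ZFS boot block on a root-pool slice.
# SPARC uses installboot with the ZFS bootblk, as in the Phase 2 script;
# x86 uses installgrub with the GRUB stage1/stage2 files.
# Commands are printed, not run, so the sketch is safe to try anywhere.
bootblock_cmd() {
    platform=$1   # normally "$(uname -p)": sparc or i386 on Solaris
    slice=$2      # e.g. c0t1d0s0, the disk freed from the old BE
    case "$platform" in
        sparc)
            echo "installboot -F zfs /usr/platform/$(uname -i)/lib/fs/zfs/bootblk /dev/rdsk/$slice" ;;
        i386)
            echo "installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/$slice" ;;
        *)
            echo "unsupported platform: $platform" >&2
            return 1 ;;
    esac
}
```

On the target system, bootblock_cmd "$(uname -p)" c0t1d0s0 shows the command to run for that platform.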

So, here's my simple little script that got installed as /etc/rc2.d/S99xlu-phase1. Just to make the code a little easier for me to follow, I first create the script for phase 2, then do the work of phase 1.


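# Phase 1: runs once, at the end of rc2, on the first boot after the flash install.
# Write the Phase 2 script first so that lucreate copies it into the new ZFS BE.
# (EOF is unquoted, so `uname -i` is expanded now, on this same machine.)
cat > /etc/rc2.d/S99xlu-phase2 << EOF
ludelete -n s10u6-ufs
installboot -F zfs /usr/platform/`uname -i`/lib/fs/zfs/bootblk /dev/rdsk/c0t1d0s0
zpool attach -f rpool c0t0d0s0 c0t1d0s0
rm /etc/rc2.d/S99xlu-phase2
init 6
EOF
# Remove this script before lucreate; otherwise it is copied into the new BE
# and would run again there. The shell keeps reading the already-open file.
rm /etc/rc2.d/S99xlu-phase1
# Use swap as the dump device.
dumpadm -d swap
# Build the root pool on the free disk and copy the UFS BE into it.
zpool create -f rpool c0t0d0s0
lucreate -c s10u6-ufs -n s10u6 -p rpool
# Make the ZFS BE the one that boots next, then reboot into it.
luactivate -n s10u6
init 6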

I think that this is a much better approach than the one I offered before, using ZFS send. This approach uses standard tools to create the new environment, and it allows you to continue to use Flash as a way to deploy archives. The requirement is that you must have two drives on the target system. I don't think that's going to be a hardship, since most folks will use two drives anyway. You will have to keep them as separate drives rather than using hardware mirroring. The underlying assumption is that you previously used SVM or VxVM to mirror those drives.

So, what do you think? Better? Is this helpful? Hopefully, this is a little Christmas present for someone! Merry Christmas and Happy New Year!

Friday Dec 05, 2008

Ancient History

Gather round, kiddies, and let Grandpa tell you a tale of how we used to clone systems before we had Jumpstart and Flash, when we had to carry water in leaky buckets 3 miles through snow up to our knees, uphill both ways.

Long ago, a customer of mine needed to deploy 600(!) SPARCstation 5 desktops all running SunOS 4.1.4. Even then, this was an old operating system, since Solaris 2.6 had recently been released. But it was what their application required. And we only had a few days to build and deploy these systems.

Remember that Jumpstart did not exist for SunOS 4.1.4, and Flash did not exist for Solaris 2.6. So our approach was to build a system, a golden image, the way we wanted it deployed, and then use ufsdump to save the contents of its filesystems. Then we used Jumpstart from a Solaris 2.6 server to boot each of these workstations. Instead of a full Jumpstart profile, we used only a finish script that partitioned the disks and restored the ufsdump images. So Jumpstart just gave us a clean way to boot these systems and apply the scripts we wanted to them.

Solaris 10 10/08, ZFS, Jumpstart and Flash

Now, we have a bit of a similar situation. Solaris 10 10/08 introduces ZFS boot to Solaris, something that many of my customers have been anxiously awaiting for some time. A system can be deployed using Jumpstart and the ZFS boot environment created as a part of the Jumpstart process.

But. There's always a but, isn't there?

But, at present, Flash archives are not supported (and in fact do not work) as a way to install into a ZFS boot environment, either via Jumpstart or via Live Upgrade. Turns out, they use the same mechanism under the covers for this. This is CR 6690473.

So, how can I continue to use Jumpstart to deploy systems, and continue to use something akin to Flash archives to speed and simplify the process?

Turns out the lessons we learned years ago can be used, more or less. Combine the idea of the ufsdump with some of the ideas that Bob Netherton recently blogged about (Solaris and OpenSolaris coexistence in the same root zpool), and you can get to a workaround that might be useful enough to get you through until Flash really is supported with ZFS root.

Build a "Golden Image" System

The first step, as with Flash, is to construct a system that you want to replicate. The caveat here is that you use ZFS for the root of this system. For this example, I have left /var as part of the root filesystem rather than a separate dataset, though this process could certainly be tweaked to accommodate a separate /var.

Once the system to be cloned has been built, you save an image of the system. Rather than using flarcreate, you will create a ZFS send stream and capture this in a file. Then move that file to the jumpstart server, just as you would with a flash archive.

In this example, the ZFS bootfs has the default name - rpool/ROOT/s10s_u6wos_07.


golden# zfs snapshot rpool/ROOT/s10s_u6wos_07@flar
golden# zfs send -v rpool/ROOT/s10s_u6wos_07@flar > s10s_u6wos_07_flar.zfs
golden# scp s10s_u6wos_07_flar.zfs js-server:/flashdirectory
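The three commands above hard-code the boot environment name. Here is a sketch that looks the name up instead, assuming the root pool is rpool and that `zpool get bootfs` prints its usual NAME PROPERTY VALUE SOURCE table. The function names are hypothetical, and `-v` is left off `zfs send` since it only adds progress output.

```shell
#!/bin/sh
# Sketch: capture the golden image without hard-coding the BE name.
# Assumes the root pool is rpool; js-server:/flashdirectory is the
# example target from the text.

# Print the VALUE column of `zpool get bootfs <pool>` output
# (header line: NAME PROPERTY VALUE SOURCE).
bootfs_from_output() {
    awk 'NR == 2 { print $3 }'
}

# rpool/ROOT/s10s_u6wos_07 -> s10s_u6wos_07_flar.zfs
stream_name() {
    echo "${1##*/}_flar.zfs"
}

capture_golden() {
    pool=${1:-rpool}
    target=${2:-js-server:/flashdirectory}
    bootfs=$(zpool get bootfs "$pool" | bootfs_from_output)
    # zpool shows "-" when no bootfs property is set
    if [ -z "$bootfs" ] || [ "$bootfs" = "-" ]; then
        echo "capture_golden: no bootfs set on $pool" >&2
        return 1
    fi
    stream=$(stream_name "$bootfs")
    zfs snapshot "$bootfs@flar" &&
        zfs send "$bootfs@flar" > "$stream" &&
        scp "$stream" "$target"
}
```

On the golden system, `capture_golden rpool js-server:/flashdirectory` would then do the same work as the three commands above.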

How do I get this on my new server?

Now, we have to figure out how to have this ZFS send stream restored on the new clone systems. We would like to take advantage of the fact that Jumpstart will create the root pool for us, along with the dump and swap volumes, and will set up all of the needed bits for booting from ZFS. So, let's install the minimum set of Solaris packages just to get these side effects.

Then, we will use Jumpstart finish scripts to create a fresh ZFS dataset and restore our saved image into it. Since this new dataset will contain the old identity of the original system, we have to reset our system identity. But once we do that, we are good to go.

So, set up the cloned system as you would for a hands-free jumpstart. Be sure to specify the sysid_config and install_config bits in the /etc/bootparams. The manual Solaris 10 10/08 Installation Guide: Custom JumpStart and Advanced Installations covers how to do this. We add to the rules file a finish script (I called mine loadzfs in this case) that will do the heavy lifting. Once Jumpstart installs Solaris according to the profile provided, it then runs the finish script to finish up the installation.
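For reference, here is a sketch of the two small configuration files this setup depends on: a rules entry that hands every client the profile plus the loadzfs finish script, and the sysidcfg that loadzfs copies from ${SI_CONFIG_DIR}. Only the loadzfs name comes from the text; the profile name zfsprofile and every sysidcfg value are placeholders to replace with your own, and the root_password line in particular needs a real encrypted hash.

```shell
#!/bin/sh
# Sketch: write a minimal rules file and sysidcfg into a JumpStart
# configuration directory (the one SI_CONFIG_DIR points at).
# zfsprofile and all sysidcfg values are hypothetical placeholders.
write_js_config() {
    dir=$1
    # rules fields: keyword value begin-script profile finish-script
    cat > "$dir/rules" << 'EOF'
any - - zfsprofile loadzfs
EOF
    # The loadzfs finish script copies this file into the new root.
    cat > "$dir/sysidcfg" << 'EOF'
system_locale=C
timezone=US/Eastern
terminal=vt100
timeserver=localhost
name_service=NONE
security_policy=NONE
nfs4_domain=dynamic
network_interface=primary {dhcp protocol_ipv6=no}
root_password=REPLACE_WITH_ENCRYPTED_HASH
EOF
}
```

After writing the files, run the check script from the jumpstart_sample directory on the install media in that directory to validate the rules and generate rules.ok.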

Here is the Jumpstart profile I used. This is a basic profile that installs the base, required Solaris packages into a ZFS pool mirrored across two drives.


install_type    initial_install
cluster         SUNWCreq
system_type     standalone
pool            rpool auto auto auto mirror c0t0d0s0 c0t1d0s0
bootenv         installbe bename s10u6_req

The finish script is a little more interesting since it has to create the new ZFS dataset, set the right properties, fill it up, reset the identity, etc. Below is the finish script that I used.


#!/bin/sh -x

# TBOOTFS is a temporary dataset used to receive the stream
TBOOTFS=rpool/ROOT/s10u6_rcv

# NBOOTFS is the final name for the new ZFS dataset
NBOOTFS=rpool/ROOT/s10u6f

MNT=/tmp/mntz
FLAR=s10s_u6wos_07_flar.zfs
NFS=serverIP:/export/solaris/Solaris10/flash

# Mount directory where archive (send stream) exists
mkdir ${MNT}
mount -o ro -F nfs ${NFS} ${MNT}

# Create a file system to receive the ZFS send stream and
# receive it.  This creates a new ZFS snapshot (@flar) that
# must be cloned and promoted into a new filesystem
zfs create ${TBOOTFS}
zfs set canmount=noauto ${TBOOTFS}
zfs set compression=on ${TBOOTFS}
zfs receive -vF ${TBOOTFS} < ${MNT}/${FLAR}

# Create a writeable filesystem from the received snapshot
zfs clone ${TBOOTFS}@flar ${NBOOTFS}

# Make the new filesystem the top of the stack so it is not dependent
# on other filesystems or snapshots
zfs promote ${NBOOTFS}

# Don't automatically mount this new dataset, but allow it to be mounted
# so we can finalize our changes.
zfs set canmount=noauto ${NBOOTFS}
zfs set mountpoint=${MNT} ${NBOOTFS}

# Mount newly created replica filesystem and set up for
# sysidtool.  Remove old identity and provide new identity
umount ${MNT}
zfs mount ${NBOOTFS}

# This section essentially forces sysidtool to reset system identity at
# the next boot.
touch /a/${MNT}/reconfigure
touch /a/${MNT}/etc/.UNCONFIGURED
rm /a/${MNT}/etc/nodename
rm /a/${MNT}/etc/.sysIDtool.state
cp ${SI_CONFIG_DIR}/sysidcfg /a/${MNT}/etc/sysidcfg

# Now that we have finished tweaking things, unmount the new filesystem
# and make it ready to become the new root.
zfs umount ${NBOOTFS}
zfs set mountpoint=/ ${NBOOTFS}
zpool set bootfs=${NBOOTFS} rpool

# Get rid of the leftovers
zfs destroy ${TBOOTFS}
zfs destroy ${NBOOTFS}@flar

When we jumpstart the system, Solaris is installed, but it really isn't used. Then, we load from the send stream a whole new OS dataset, make it bootable, set our identity in it, and use it. When the system is booted, Jumpstart still takes care of updating the boot archives in the new bootfs.
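The identity-reset part of the finish script is the piece most worth trying out before trusting it in a real Jumpstart. Here is a sketch that pulls those same steps into a function so they can be exercised against a scratch copy of a root tree; the name reset_identity is hypothetical, and in loadzfs the arguments would be /a${MNT} and ${SI_CONFIG_DIR}/sysidcfg.

```shell
#!/bin/sh
# Sketch: the identity-reset steps from the loadzfs finish script,
# wrapped in a function. reset_identity is a hypothetical name.
reset_identity() {
    root=$1        # root of the new boot environment
    sysidcfg=$2    # sysidcfg to install into it
    [ -d "$root/etc" ] || { echo "reset_identity: no etc/ under $root" >&2; return 1; }
    # Request a reconfiguration boot so device links are rebuilt.
    touch "$root/reconfigure"
    # Mark the system unconfigured so sysidtool runs at next boot.
    touch "$root/etc/.UNCONFIGURED"
    # Drop the golden image's identity; -f tolerates missing files.
    rm -f "$root/etc/nodename" "$root/etc/.sysIDtool.state"
    # Supply the answers sysidtool needs for a hands-off boot.
    cp "$sysidcfg" "$root/etc/sysidcfg"
}
```

Using rm -f rather than plain rm keeps the finish script quiet if the golden image happens not to have one of those files.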

On the whole, this is a lot more work than Flash, and it is really not as flexible or as complete. But hopefully, until Flash is supported with a ZFS root and Jumpstart, this will at least give you an idea of how you can replicate systems and do installations that do not have to revert to package-based installation.

Many people use Flash as a form of disaster recovery. I think that this same approach might be used there as well. It's still not as clean or complete as Flash, but it might work in a pinch.

So, what do you think? I would love to hear comments on this as a stop-gap approach.

Thursday Sep 25, 2008

I just need to take a minute to brag on my wife, Kathleen.  She has taken over as the local coordinator for our food pantry for America's Second Harvest, now called Feeding America.  She coordinates the couple of dozen volunteers who glean extra food from the local restaurants and groceries and bring it all back to our food pantry, North Fulton Community Charities. It's amazing how much these places would just discard as leftovers at the end of the day or as they restock the shelves with newer product.

Since she took this on, she has done some really cool stuff.  She has started to recruit volunteers from among the people who receive food from the pantry and want to give back to the community in gratitude.  She has gotten the pantry to start collecting and distributing pet food for the families who need groceries as well, so that they can continue to look after their pets.  Now, she has started working with some local folks who make decorative cut fruit arrangements to provide fresh fruit to the pantry.  That's something that really makes a difference to the people who are receiving the food subsidies and groceries from the pantry.

I have to say that I am right proud of her for all of this.  And I would encourage folks to get involved with their local charities.  Go to the Feeding America page to find out what opportunities there are in your area.  It really can make a difference to so many people.

Monday Jul 28, 2008

Today is my first day back at Sun.

I am excited to be back at Sun.  We have a new group of folks focused on Solaris and OpenSolaris.  Now, we need to get our heads together and put together a bit of a business plan for the team.

I am sure that the next few months will be hugely busy and exciting!

Thanks to Hal and everyone who made this possible.


Friday Jul 11, 2008

I've not been terribly faithful about blogging here.  Once in a while, but this is worth saying.

I am part of the great exodus going on this week from Sun.  I was notified yesterday that my position has been eliminated. 

This has been a great 13 1/2 year ride.  Sun has had great peaks and great valleys in that time.  But through it all, it has been a top-notch place to be.

Of the things I have done at Sun, I am proudest of being associated with two groups: Dawit Bereket's Solaris team for the last three years and the OS Ambassador program for the last 13 years.  These are both groups of the top flight of Solaris folks in the field, and folks who all wear the SUNW (oops, JAVA) hat, rather than the hat of any parochial group or division.

So, for now, I'm signing off.  But hope to be back soon.



Friday Dec 21, 2007

Being a long time Sun and Solaris guy, it's not often that I step up to say "Wow, Microsoft did something good."  But this time I want to.

Recently, a good friend's son returned home from a tour of duty with the Air Force in Iraq.  As the plane unloaded in Baltimore, there was a representative from Microsoft handing each of the servicemen and women a fully tricked out Zune, accessories, speakers - the whole nine yards. 

There were no cameras, no press releases, no publicity.  Just a nice gesture for these men and women who had been away from home and family doing something that, even though they trained and prepared for it, they would just as soon not have to do.

Thanks for this nice gesture, Microsoft.

Wednesday Oct 10, 2007

So, CEC is actually almost over. It's been a whirlwind of sessions, meet-ups with folks, filling my head with new stuff. And, of course in the midst of all of the CEC excitement, there's still the need to keep up with what customers back home need.

So, what was exciting from days one and two? Lots!


  • Jon Haslam and Simon Ritter gave a great talk and demo about using DTrace along with Java. I am absolutely not a developer; never even written "Hello World" in Java. But, this really helped me understand how DTrace and Java are two great tastes that go great together. And with the newer JVMs, it really is a case of "Hey, you got your DTrace in my Java!", "No, you got your Java in my DTrace!" This all comes at a great time -- I have to do a presentation on Wednesday in Florida on exactly this topic.
  • Matt Ingenthron and Shanti gave a great talk about the various working parts and commonly used components and tools in a modern web infrastructure. Really helped me figure out how the pieces fit together.
  • Tim Cook had a great talk comparing the various file system offerings from Sun and others for OLTP workloads on large systems. He gave us some handy, simple, best practices for each and worked to bust some commonly held myths and misconceptions.
  • Tim Bray shared his perspective on what really is important about a Web 2.0 world, and about how the things in that world can really matter to an enterprise. He talked about the fact that, in the end, time to market and manageability are the overwhelming priorities for enterprises in selecting tools and techniques for application development and deployment. I am really inspired to go out and finally learn more about Ruby and Rails as a result.
Of course, there were more. These are just some of the highlights that come to mind quickly. As always, CEC was a great trip and well worth the effort (but I still dislike Las Vegas - a lot). And like Juan Antonio Samaranch at the Olympics, this CEC is about to be declared over and the best yet, and we will agree to meet again next year. I, for one, am looking forward to it. Time to start working on a topic for my presentation!

Monday Oct 08, 2007

Last week, I spent the week in our OS Ambassador meeting in Menlo Park. More on that later. This week is CEC in Las Vegas. So, I ended up spending the weekend in Las Vegas.

This Sunday was World Communion Sunday, where Christians all over the world all celebrate Holy Communion on the same day. For me, this is always a powerful statement of the universality of the church. Being on the Strip in Las Vegas, getting to church is a struggle. But, Web 2.0 to the rescue. Google Maps found for me the University United Methodist Church, across the street from UNLV.

Google Maps told me it was 2.7 miles from the hotel to the church, so off I went. Even with the long walk there (and back!), I was really glad that I went. Lovely, small church, but very nice people, and a service that left me thinking hard all the way back to the hotel.

The text was Luke 17:5-10. The first part of this passage is a familiar one, but the second part is a hard, hard saying. But more than that, I pondered all the way home the word "rehearse" in the liturgy. I think there's a lot there to think about still.

So, it's the first day of CEC, Sun's Customer Engineering Conference. This year, there are about 4000 of us hanging out at Paris & Bally's hotels in Las Vegas. Systems Engineers, folks from Sun's various practices, Service & Support engineers, architects, folks from headquarters engineering are all here. But, we also have a huge number of our partners - resellers, OEMs, developers, etc.

Last night was our Networking Reception. Great to see folks again that I had not seen in a while and to meet lots of new faces.

Today, we start with opening sessions from Hal Stern, Dan Berg, Jim Baty, and a host of others. Then, we get into, for me, the guts of CEC - the breakout sessions. There are over 240 sessions, selected from a pool of over 700 submissions. I'm talking (Tuesday, 6PM, Versailles ballroom 3 & 4) on Dynamic Resource Pools in Solaris 10. I'll post my slides after the talk. If you are at the conference, come on over. I understand my talk will also be available in Second Life. I'm still trying to figure out how all of that works, though.

Here are some of my initial + and - observations from CEC so far:

  • Plus - Paris is great. Very lovely hotel. The look really captures all that you might remember and love from Paris.
  • Plus - I scored a deluxe room - corner room, view of the Bellagio fountains, windows on two walls.
  • Plus - Check-in logistics. Got through even the really long materials line in less than 10 minutes.
  • Plus - Networking Reception - Food was good and plentiful. Double plus for the desserts. Great to see folks. Last year, I missed the reception since I got in late.
  • Plus / Minus - In-room network. Fastest hotel network I have had in as long as I can recall. But it costs $13/day.
  • Minus - Room for meals was really, really, really crowded for breakfast. I can only imagine as folks try to rush through for lunch. And no sodas. Last year, folks finally got it that geeks often take their caffeine in a carbonated form.
  • Minus - Having the agenda only on-line via schedule builder has made it sort of inconvenient to select sessions, alter your plans, and pick new things on the fly. Same as last year. Sometimes paper really is useful.
  • Minus - Smoke - Las Vegas is smoky. Seems that they are managing it better now than in years past, but in these days of smoke-free public spaces, it's really noticeable.
  • Big Minus - For me, Las Vegas is absolutely not my top choice for a venue. For me, this is a very uncomfortable place. Maybe I'm just a stick in the mud or a prude or old in my thinking, but this town is just about too many things that really make me uncomfortable.

All in all, though, I am excited about a great conference and expect to be really tired when I get home!

Jason Calacanis has posted his "official" definition of Web 3.0. He says "Web 3.0 is defined as the creation of high-quality content and services produced by gifted individuals using Web 2.0 technology as an enabling platform."

The same day I saw this, I also saw, on Keith Bostic's fabulous /dev/null mailing list, a link to Cracked.com's The 8 Most Needlessly Detailed Wikipedia Entries. Even though all of these folks are clearly authorities in their field, are we really getting the "wisdom of the crowd"? Geek and Poke gets it pretty right.

Sun's Customer Engineering Conference is going on this week in Las Vegas. As a result, we've had to cancel our October meeting of ATLOSUG - We're all in Las Vegas.

Sorry for the inconvenience. We will pick up with our meetings in November. Ryan Matteson, from Ning, will be our speaker. Should be a really good meeting. Details on the topic to follow.

Tuesday Jun 26, 2007

In between Solaris workshops, I got to take a week off and go canoeing with my dad.  We had planned to go to the Okefenokee Swamp, but the fires in south Georgia and northern Florida pretty much made that impossible.  So, we just bummed around instead, going over to Coldwater Creek in northwest Florida, and then over to Wakulla River, south of Tallahassee.

I have to say that the Wakulla River, with its headwaters in Wakulla Springs State Park, is way cool!  This river is fed by a spring that pumps out 250 million gallons of water per day.  Crystal clear.  At the spring, you can see the bottom at 125 feet!  There are mastodon bones on the bottom from when either the cave that supplies the spring was dry, or when the furry brute fell in.

There is a fabulous site that talks about the spring, its geology, the land around it, etc. here.

At the state park, there is a lodge, built in the 1920s, formerly frequented by Johnny Weissmuller of Tarzan fame.  In fact, several of the original Tarzan films were shot here, as was Creature from the Black Lagoon.  The lodge looks like a great place to stay - very Art Deco and ornate and old.

But we were there to canoe. 

This river has its fair share of wildlife.  There are turtles, wading birds, ospreys and other birds of prey, leaping mullet, and even manatees.  And there are alligators.  Lots of them.  The weird thing is that there's swimming right next to the prime alligator areas.  They seem to hang out in the marshy edges right around the spring itself.  Maybe they are waiting for an unsuspecting teenager to wander too close.


And we found our share of alligators, small and large, as we paddled the river.  Dad was in the front taking pictures, and my job was to paddle and put him where he could get good pictures.  So, we got really close to this one.  It was about 8 or 9 feet long, and we got as close as maybe 8 feet to it.  I would have gone closer, but there was a log I couldn't get the boat over.  I figured that I was okay.  It's like the old story of the guy running from the bear.  I didn't have to be so far from the alligator that it couldn't get me, just farther away than Dad!  I'm working hard to get back to Wakulla River, this time so I can be on the river before light and after dark to really see what goes on there.  If you're looking for a great place to escape from most everything on the Florida Gulf Coast, Wakulla River and Wakulla Springs State Park should be on your list.

I feel like that Johnny Cash song (which I think maybe Jimmy Rodgers did first - can't recall for sure).  Seems like for the last several months, I've been on the road doing Solaris bootcamps, best practices workshops, and all sorts of other things Solaris.  I've seen a lot of interesting places and met lots of interesting folks.  Just the last few weeks, I've been to:
  • Bismarck, ND, Sioux Falls, SD, Fargo, ND for University Solaris Bootcamps.  Got to see lots of that area driving from one to another across the secondary highways.  Thanks to Greg Stromme from Applied Engineering, Sun's reseller partner in that geography, for driving me and showing me places I'd never been before.  We saw the homeplaces of Lawrence Welk and Laura Ingalls Wilder, plus lots of wide-open territory.
  • Conway, Arkansas for a Solaris resource management workshop.  Got to see a cousin in Russellville this trip.
  • Austin, Texas for a Solaris virtualization workshop.
  • Baton Rouge, LA for a University Solaris bootcamp.  Got to see a cousin here, too.
  • Huntsville, AL for various Solaris briefings.
And that's just the last six weeks!  I'm kind of thankful for the end of the quarter and the year coming up.  I have no tickets booked until the end of July right now!



Friday Mar 23, 2007

Spring has arrived!  My first iris are blooming right on schedule, actually a couple of days early.  The White Flags of Spring, as my grandmother called them, bloomed on the first day of spring, Tuesday of this week.  These are small white iris, only about 14" high.  They always bloom before the first day of April, and this year was no exception.

I am looking forward to a pretty good crop of iris this year, I think.  I just cleaned out the winter cruft.  It looks like this is the year to dig up several of the beds, split them, give them away, and replant.  I think I will get a couple of yards of new good dirt to work in with them, too.  It looks like everything is just sand anymore in the beds.


I hope that the purple and bronze iris that my grandmother hybridized come back.  I didn't see any last year, so I am afraid I have lost those.  But I still have so many of hers that every time I go out I remember being at my grandmother's house in the springtime, having Easter egg hunts among the iris, and the sweet smell of the flowers everywhere. 

For some reason, I have the Indigo Girls song Southland in the Springtime running through my head about now.  Just call me a sentimental old softie.....


Wednesday Jan 24, 2007

I am amazed and awed by all of the folks on BSC who are able to contribute great content *and* get their jobs done!  I find that even when I want to share something, there just don't seem to be enough hours in the day to get the job done, talk to & support the customers, and then put something together that makes enough sense to share.

How do you guys do it?  Or do you never sleep?

Friday Dec 01, 2006

Continuing with some of the ideas around zvols, I wondered about UFS on a zvol.  On the surface, this appears to be sort of redundant and not really very sensible.  But thinking about it, there are some real advantages.

  • I can take advantage of the data integrity and self-healing features of ZFS since this is below the filesystem layer.
  • I can easily create new volumes for filesystems and grow existing ones
  • I can make snapshots of the volume, sharing the ZFS snapshot flexibility with UFS - very cool
  • In the future, I should be able to do things like have an encrypted UFS (sort-of) and secure deletion

Creating UFS filesystems on zvols

Creating a UFS filesystem on a zvol is pretty trivial.  In this example, we'll create a mirrored pool and then build a UFS filesystem in a zvol.

bash-3.00# zpool create p mirror c2t10d0 c2t11d0 mirror c2t12d0 c2t13d0
bash-3.00# zfs create -V 2g p/v1
bash-3.00# zfs list
NAME     USED  AVAIL  REFER  MOUNTPOINT
p       4.00G  29.0G  24.5K  /p
p/v1    22.5K  31.0G  22.5K  -
bash-3.00# newfs /dev/zvol/rdsk/p/v1
newfs: construct a new file system /dev/zvol/rdsk/p/v1: (y/n)? y
Warning: 2082 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/p/v1:    4194270 sectors in 683 cylinders of 48 tracks, 128 sectors
        2048.0MB in 43 cyl groups (16 c/g, 48.00MB/g, 11648 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 98464, 196896, 295328, 393760, 492192, 590624, 689056, 787488, 885920,
3248288, 3346720, 3445152, 3543584, 3642016, 3740448, 3838880, 3937312,
4035744, 4134176
bash-3.00# mkdir /fs1
bash-3.00# mount /dev/zvol/dsk/p/v1 /fs1
bash-3.00# df -h /fs1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     1.9G   2.0M   1.9G     1%    /fs1

Nothing much to it. 

Growing UFS filesystems on zvols

But what if I run out of space?  Well, just as you can add disks to a volume and grow the size of the volume, you can grow the size of a zvol.  Since the UFS filesystem is a data structure inside the zvol container, you have to grow it as well.  Were I using just ZFS, the size of the file system would grow and shrink dynamically with the size of the data in it.  But a UFS has a fixed size, so it has to be expanded manually to accommodate the enlarged volume.  This seems to have quit working somewhere between b45 and b53, so I just filed a bug on it.

bash-3.00# uname -a
SunOS atl-sewr-158-154 5.11 snv_45 sun4u sparc SUNW,Sun-Fire-480R
bash-3.00# zfs create -V 1g bsd/v1
bash-3.00# newfs /dev/zvol/rdsk/bsd/v1
...
bash-3.00# zfs set volsize=2g bsd/v1
bash-3.00# growfs /dev/zvol/rdsk/bsd/v1
Warning: 2048 sector(s) in last cylinder unallocated
/dev/zvol/rdsk/bsd/v1:  4194304 sectors in 683 cylinders of 48 tracks, 128 sectors
        2048.0MB in 49 cyl groups (14 c/g, 42.00MB/g, 20160 i/g)
super-block backups (for fsck -F ufs -o b=#) at:
32, 86176, 172320, 258464, 344608, 430752, 516896, 603040, 689184, 775328,
3359648, 3445792, 3531936, 3618080, 3704224, 3790368, 3876512, 3962656,
4048800, 4134944
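The two-step pattern above (grow the zvol, then grow the UFS inside it) can be wrapped in a small helper. This is a sketch, not a tested tool: grow_ufs_zvol and the DRYRUN switch are hypothetical, growfs needs -M to expand a file system that is currently mounted, and given the bug noted above it is worth trying on your build before relying on it.

```shell
#!/bin/sh
# Sketch: grow a UFS file system that lives on a zvol in two steps,
# first the zvol (the container), then the UFS inside it.
# Set DRYRUN=1 to print the commands instead of running them.
run() {
    if [ -n "$DRYRUN" ]; then echo "$*"; else "$@"; fi
}

grow_ufs_zvol() {
    vol=$1      # e.g. bsd/v1
    size=$2     # new volsize, e.g. 2g
    mnt=$3      # optional mount point if the UFS is mounted
    run zfs set volsize="$size" "$vol" || return 1
    if [ -n "$mnt" ]; then
        # growfs needs -M to expand a mounted file system
        run growfs -M "$mnt" "/dev/zvol/rdsk/$vol"
    else
        run growfs "/dev/zvol/rdsk/$vol"
    fi
}
```

For the example above, `DRYRUN=1 grow_ufs_zvol bsd/v1 2g` prints the same two commands that were run by hand.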

What about compression? 

Along the same lines as growing the file system, I suppose you could turn compression on for the zvol.  But since the UFS is of fixed size, that won't especially help you fit more data in the file system.  You can't put more into the filesystem than the filesystem thinks it can hold, even if it isn't using that much space on disk.  Here's a little demonstration of that.

First, we will loop through, creating 200MB files in a 1GB file system with no compression.  We will use blocks of zeros, since these will compress quite a bit the second time round. 

bash-3.00# zfs create -V 1g p/v1
bash-3.00# zfs get used,volsize,compressratio p/v1
NAME  PROPERTY       VALUE    SOURCE
p/v1  used           22.5K    -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
bash-3.00# newfs /dev/zvol/rdsk/p/v1
...
bash-3.00# mount /dev/zvol/dsk/p/v1 /fs1
bash-3.00#
bash-3.00# for f in f1 f2 f3 f4 f5 f6 f7 ; do
> dd if=/dev/zero bs=1024k count=200 of=/fs1/$f
> df -h /fs1
> zfs get used,volsize,compressratio p/v1
> done

200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     962M   201M   703M    23%    /fs1
NAME  PROPERTY       VALUE    SOURCE
p/v1  used           62.5M    -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     962M   401M   503M    45%    /fs1
NAME  PROPERTY       VALUE    SOURCE
p/v1  used           149M     -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     962M   601M   303M    67%    /fs1
NAME  PROPERTY       VALUE    SOURCE
p/v1  used           377M     -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     962M   801M   103M    89%    /fs1
NAME  PROPERTY       VALUE    SOURCE
p/v1  used           497M     -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
dd: unexpected short write, wrote 507904 bytes, expected 1048576
161+0 records in
161+0 records out
Dec  1 14:53:04 atl-sewr-158-122 ufs: NOTICE: alloc: /fs1: file system full

bash-3.00# zfs get used,volsize,compressratio p/v1
NAME  PROPERTY       VALUE    SOURCE
p/v1  used           1.00G    -
p/v1  volsize        1G       -
p/v1  compressratio  1.00x    -
bash-3.00#

So, you see that it fails as it writes the 5th 200MB chunk, which is what you would expect.  Now, let's do the same thing with compression turned on for the volume.

bash-3.00# zfs create -V 1g p/v2
bash-3.00# zfs set compression=on p/v2
bash-3.00# newfs /dev/zvol/rdsk/p/v2
...
bash-3.00#
bash-3.00# mount /dev/zvol/dsk/p/v2 /fs2
bash-3.00# for f in f1 f2 f3 f4 f5 f6 f7 ; do
> dd if=/dev/zero bs=1024k count=200 of=/fs2/$f
> df -h /fs2
> zfs get used,volsize,compressratio p/v2
> done
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   201M   703M    23%    /fs2
NAME  PROPERTY       VALUE    SOURCE
p/v2  used           8.58M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.65x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   401M   503M    45%    /fs2
NAME  PROPERTY       VALUE    SOURCE
p/v2  used           8.58M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.65x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   601M   303M    67%    /fs2
NAME  PROPERTY       VALUE    SOURCE
p/v2  used           8.83M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.50x    -
200+0 records in
200+0 records out
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   801M   103M    89%    /fs2
NAME  PROPERTY       VALUE    SOURCE
p/v2  used           8.83M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.50x    -
dd: unexpected short write, wrote 507904 bytes, expected 1048576
161+0 records in
161+0 records out
Dec  1 15:16:42 atl-sewr-158-122 ufs: NOTICE: alloc: /fs2: file system full

bash-3.00# zfs get used,volsize,compressratio p/v2
NAME  PROPERTY       VALUE    SOURCE
p/v2  used           9.54M    -
p/v2  volsize        1G       -
p/v2  compressratio  7.07x    -
bash-3.00# df -h /fs2
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v2     962M   962M     0K   100%    /fs2
bash-3.00#

This time, even though the volume was using very little space, the file system was full.  So compression in this case is not especially valuable from a space management standpoint.  Depending on the contents of the filesystem, though, compression may still help performance by reducing the number of I/Os needed.
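Because the two views disagree so sharply with compression on (the UFS at 100% while the zvol uses under 10MB), any monitoring should watch the UFS side. Here is a sketch that reports both numbers, assuming df prints each file system on a single line (long device names can wrap) and that `zfs get -H -o value` is available on your build; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: report UFS fullness next to the zvol's on-disk usage.

# Print the capacity column (without the %) from df output on stdin.
ufs_capacity_pct() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

report_ufs_zvol() {
    mnt=$1           # e.g. /fs2
    vol=$2           # e.g. p/v2
    limit=${3:-90}   # warn at this UFS capacity percentage
    pct=$(df -k "$mnt" | ufs_capacity_pct)
    used=$(zfs get -H -o value used "$vol")
    echo "$mnt: UFS ${pct}% full; zvol $vol uses $used on disk"
    # Non-zero status when the UFS itself is nearly full, whatever
    # the zvol's compressed usage looks like.
    [ "$pct" -lt "$limit" ]
}
```

Run against the second test above, `report_ufs_zvol /fs2 p/v2` would return a failing status even though the zvol itself is nearly empty.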

The Cool Stuff - Snapshots and Clones with UFS on Zvols

One of the things that is not available in UFS is the ability to create multiple snapshots quickly and easily. The fssnap(1M) command allows me to create a single, read-only snapshot of a UFS file system. It also requires a separate backing-store location to preserve data that is changed or deleted in the master image during the lifetime of the snapshot.
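For contrast, here is roughly what the classic UFS route looks like. This is a sketch, not a tested script: the wrapper name ufs_fssnap and the DRY_RUN switch are hypothetical, and /var/tmp is just an example backing-store directory. The point is that fssnap has to be told where the backing store goes.

```shell
#!/bin/bash
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# ufs_fssnap: hypothetical wrapper around fssnap(1M) for UFS.
# $1 = mounted UFS file system, $2 = directory for the backing store.
ufs_fssnap() {
    local mountpoint=$1 backing_store=$2
    if [ -z "$mountpoint" ] || [ -z "$backing_store" ]; then
        echo "usage: ufs_fssnap <mountpoint> <backing-store-dir>" >&2
        return 1
    fi
    # fssnap needs an explicit location for the backing store.
    run fssnap -F ufs -o backing-store="$backing_store" "$mountpoint"
}

# Usage:
#   DRY_RUN=1 ufs_fssnap /fs1 /var/tmp
```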

ZFS offers the ability to create many snapshots of a ZFS filesystem quickly and easily.  This ability extends to zvols, as it turns out.

For this example, we will create a volume, fill it up with some data and then play around with taking some snapshots of it.  We will just tar over the Java JDK so there are some files in the file system. 

bash-3.00# zfs create -V 2g p/v1
bash-3.00# newfs /dev/zvol/rdsk/p/v1
...
bash-3.00# mount /dev/zvol/dsk/p/v1 /fs1
bash-3.00# tar cf -  ./jdk/ | (cd /fs1 ; tar xf - )
bash-3.00# df -h /fs1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     1.9G   431M   1.5G    23%    /fs1
bash-3.00# zfs list
NAME     USED  AVAIL  REFER  MOUNTPOINT
p       4.00G  29.0G  24.5K  /p
p/swap  22.5K  31.0G  22.5K  -
p/v1     531M  30.5G   531M  -
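Here are the setup steps above collected into one place, as a minimal sketch. The helper name ufs_on_zvol and the DRY_RUN switch are hypothetical; set DRY_RUN=1 to print the commands instead of running them. Note the two device paths: newfs wants the raw device under /dev/zvol/rdsk, while mount takes the block device under /dev/zvol/dsk.

```shell
#!/bin/bash
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# ufs_on_zvol: hypothetical helper. $1 = volume (e.g. p/v1),
# $2 = size (e.g. 2g), $3 = mount point (e.g. /fs1).
ufs_on_zvol() {
    local vol=$1 size=$2 mnt=$3
    run zfs create -V "$size" "$vol" || return
    # newfs takes the raw (rdsk) device and asks for confirmation.
    run newfs "/dev/zvol/rdsk/$vol" || return
    run mkdir -p "$mnt" || return
    # mount takes the block (dsk) device.
    run mount "/dev/zvol/dsk/$vol" "$mnt"
}
```

Called as ufs_on_zvol p/v1 2g /fs1, it issues the same commands used in this session (newfs will still prompt).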

Now, we will create a snapshot of the volume, just like for any other ZFS file system.  As it turns out, this creates new device nodes in /dev/zvol for the block and character devices.  We can mount them as UFS file systems same as always.

bash-3.00# zfs snapshot p/v1@s1  # Make the snapshot
bash-3.00# zfs list # See that it's really there
NAME      USED  AVAIL  REFER  MOUNTPOINT
p        4.00G  29.0G  24.5K  /p
p/swap   22.5K  31.0G  22.5K  -
p/v1      531M  30.5G   531M  -
p/v1@s1      0      -   531M  -
bash-3.00# mkdir /fs1-s1
bash-3.00# mount  /dev/zvol/dsk/p/v1@s1 /fs1-s1 # Mount it
mount: /dev/zvol/dsk/p/v1@s1 write-protected # Snapshots are read-only, so this fails
bash-3.00# mount -o ro  /dev/zvol/dsk/p/v1@s1 /fs1-s1 # Mount again read-only
bash-3.00# df -h /fs1-s1 /fs1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1@s1
                       1.9G   431M   1.5G    23%    /fs1-s1
/dev/zvol/dsk/p/v1     1.9G   431M   1.5G    23%    /fs1
bash-3.00#

At this point /fs1-s1 is a read-only snapshot of /fs1.  If I delete files, create files, or change files in /fs1, that change will not be reflected in /fs1-s1.

bash-3.00# ls /fs1/jdk
instances    jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00# rm -rf /fs1/jdk/instances
bash-3.00# df -h /fs1 /fs1-s1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     1.9G    61M   1.8G     4%    /fs1
/dev/zvol/dsk/p/v1@s1
                       1.9G   431M   1.5G    23%    /fs1-s1
bash-3.00#
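The snapshot-and-mount sequence can be wrapped the same way. Again, this is a hypothetical helper (mount_zvol_snapshot, with the same DRY_RUN switch) that assumes the UFS-on-zvol layout above. The only real twist is the -o ro, since the snapshot device is write-protected.

```shell
#!/bin/bash
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# mount_zvol_snapshot: hypothetical helper. $1 = volume (e.g. p/v1),
# $2 = snapshot name (e.g. s1), $3 = mount point (e.g. /fs1-s1).
mount_zvol_snapshot() {
    local vol=$1 snap=$2 mnt=$3
    run zfs snapshot "$vol@$snap" || return
    run mkdir -p "$mnt" || return
    # Snapshot devices are write-protected, so mount read-only.
    run mount -o ro "/dev/zvol/dsk/$vol@$snap" "$mnt"
}
```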

You can also create multiple snapshots. And as with any other ZFS file system, you can roll back to a snapshot and make it the master again. You have to unmount the file system first, since the rollback happens at the volume level, and changing the volume underneath a mounted UFS file system would leave UFS confused about the state of things. But ZFS catches this, too.

bash-3.00# ls /fs1/jdk/
jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00# rm /fs1/jdk/jdk1.6.0
bash-3.00# ls /fs1/jdk/
jdk1.5.0_08  latest       packages
bash-3.00# zfs list
NAME      USED  AVAIL  REFER  MOUNTPOINT
p        4.00G  29.0G  24.5K  /p
p/swap   22.5K  31.0G  22.5K  -
p/v1      535M  30.5G   531M  -
p/v1@s1  4.33M      -   531M  -
bash-3.00# zfs rollback p/v1@s2 # /fs1 is still mounted.
cannot remove device links for 'p/v1': dataset is busy
bash-3.00# umount /fs1
bash-3.00# zfs rollback p/v1@s2
bash-3.00# mount /dev/zvol/dsk/p/v1 /fs1
bash-3.00# ls /fs1/jdk
jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00#
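Here is a sketch of the rollback sequence with the unmount built in, so ZFS doesn't refuse with "dataset is busy". The helper name rollback_ufs_zvol and DRY_RUN are hypothetical. One caveat: zfs rollback goes to the most recent snapshot, and rolling back past newer snapshots requires -r, which destroys them.

```shell
#!/bin/bash
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rollback_ufs_zvol: hypothetical helper. $1 = volume (e.g. p/v1),
# $2 = snapshot name (e.g. s2), $3 = UFS mount point (e.g. /fs1).
rollback_ufs_zvol() {
    local vol=$1 snap=$2 mnt=$3
    # UFS must let go of the volume before ZFS will roll it back.
    run umount "$mnt" || return
    run zfs rollback "$vol@$snap" || return
    run mount "/dev/zvol/dsk/$vol" "$mnt"
}
```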

I can create additional read-write instances of a volume by cloning the snapshot.  The clone and the master file system will share the same objects on-disk for data that remains unchanged, while new on-disk objects will be created for any files that are changed either in the master or in the clone.

bash-3.00# ls /fs1/jdk
jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00# zfs snapshot p/v1@s1
bash-3.00# zfs clone p/v1@s1 p/c1
bash-3.00# zfs list
NAME      USED  AVAIL  REFER  MOUNTPOINT
p        4.00G  29.0G  24.5K  /p
p/c1         0  29.0G   531M  -
p/swap   22.5K  31.0G  22.5K  -
p/v1      531M  30.5G   531M  -
p/v1@s1      0      -   531M  -
bash-3.00# mkdir /c1
bash-3.00# mount /dev/zvol/dsk/p/c1 /c1
bash-3.00# ls /c1/jdk
jdk1.5.0_08  jdk1.6.0     latest       packages
bash-3.00# df -h /fs1 /c1
Filesystem             size   used  avail capacity  Mounted on
/dev/zvol/dsk/p/v1     1.9G    61M   1.8G     4%    /fs1
/dev/zvol/dsk/p/c1     1.9G    61M   1.8G     4%    /c1
bash-3.00#
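The same idea works for clones, as a hypothetical helper (clone_ufs_zvol, DRY_RUN as before). Unlike the snapshot, the clone is writable, so it mounts without -o ro. As in the session above, the snapshot is taken while /fs1 is mounted, so the clone holds whatever UFS had on disk at that moment.

```shell
#!/bin/bash
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# clone_ufs_zvol: hypothetical helper. $1 = volume (e.g. p/v1),
# $2 = snapshot name (e.g. s1), $3 = clone (e.g. p/c1), $4 = mount point.
clone_ufs_zvol() {
    local vol=$1 snap=$2 clone=$3 mnt=$4
    run zfs snapshot "$vol@$snap" || return
    run zfs clone "$vol@$snap" "$clone" || return
    run mkdir -p "$mnt" || return
    # The clone is read-write, so a normal mount works.
    run mount "/dev/zvol/dsk/$clone" "$mnt"
}
```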

I am pretty sure that this isn't exactly what the ZFS guys had in mind when they set out to build all of this, but it is pretty cool. Now I can create UFS snapshots without having to specify a backing store. I can create clones, promote a clone to be the master, and do the other things that I can do in ZFS. I still have to manage the mounts myself, but I'm better off than before.
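Promotion, for completeness, as a minimal sketch. promote_clone is a hypothetical name and DRY_RUN works as before. After zfs promote, the snapshot the clone came from should move under the clone, so the old master's origin property should then point at it. The final zfs get is there to check that rather than take it on faith.

```shell
#!/bin/bash
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# promote_clone: hypothetical helper. $1 = clone (e.g. p/c1),
# $2 = its current parent volume (e.g. p/v1).
promote_clone() {
    local clone=$1 old_master=$2
    run zfs promote "$clone" || return
    # Expect the clone to show no origin, and the old master to show
    # the moved snapshot (e.g. p/c1@s1) as its origin.
    run zfs get origin "$clone" "$old_master"
}
```

The device paths do not change, so existing UFS mounts of /dev/zvol/dsk/p/c1 and /dev/zvol/dsk/p/v1 should be unaffected by the promotion itself.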

I have not tried any sort of performance testing on these. Dominic Kay has just written a nice blog entry about using filebench to compare ZFS and VxFS. Maybe I can use some of that work to see how things go with UFS on top of ZFS.

As always, comments, etc. are welcome!

I mentioned recently that I just spent a week in a ZFS internals TOI. I came away with a few ideas to play with, which I will share here. Hopefully folks will have suggestions on how to improve, test, or validate some of these things.

ZVOLs as Swap

The first thing that I thought about was using a ZFS volume (zvol) as a swap device. Of course, this is right there in the zfs(1M) man page as an example, but it still deserves a mention here. There has been some discussion of this on the zfs-discuss list at opensolaris.org (I just retyped that dot four times thinking it was a comma; turns out there was crud on my laptop screen). The dump device cannot be on a zvol (at least if you want to catch a crash dump), but this still gives a lot of flexibility. With root on ZFS (coming before too long), ZFS swap makes a lot of sense and is the natural choice. We were talking in class about how it might be nice to have a way to turn off ZFS caching for the swap volume to improve performance, but that remains to be seen.

At any rate, setting up mirrored swap with ZFS is way simple! Much simpler even than with SVM, which in turn is simpler than VxVM. Here's all it takes:


bash-3.00# zpool create -f p mirror c2t10d0 c2t11d0
bash-3.00# zfs create -V 2g p/swap
bash-3.00# swap -a /dev/zvol/dsk/p/swap

Pretty darn simple, if you ask me. You can make it permanent by changing the lines for swap in your /etc/vfstab (below).  Notice that you use the path to the zvol in the /dev tree rather than the ZFS dataset name.


bash-3.00# cat /etc/vfstab
#device               device   mount  FS    fsck  mount    mount
#to mount             to fsck  point  type  pass  at boot  options
#
#/dev/dsk/c1t0d0s1    -        -      swap  -     no       -
/dev/zvol/dsk/p/swap  -        -      swap  -     no       -
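If you script this rather than editing by hand, it's worth not adding the line twice. Here is a minimal sketch, assuming a vfstab-style file with whitespace-separated fields. The helper name add_zvol_swap_entry is hypothetical, and it's safer to try it on a copy of /etc/vfstab first.

```shell
#!/bin/bash
# add_zvol_swap_entry: hypothetical helper. Appends a swap line for a zvol
# to a vfstab-format file unless a swap entry for that device is present.
# $1 = vfstab file (try a copy first), $2 = zvol dataset, e.g. p/swap.
add_zvol_swap_entry() {
    local vfstab=$1 vol=$2
    # vfstab wants the /dev path of the zvol, not the dataset name.
    local dev="/dev/zvol/dsk/$vol"
    local device fsckdev mountpt fstype rest
    # Fields are whitespace-separated; the 4th is the FS type.
    # Commented-out lines start with '#', so they never match $dev.
    while read -r device fsckdev mountpt fstype rest; do
        if [ "$device" = "$dev" ] && [ "$fstype" = "swap" ]; then
            return 0    # already configured, nothing to do
        fi
    done < "$vfstab"
    # device, fsck device, mount point, FS type, fsck pass, at boot, options
    printf '%s\t-\t-\tswap\t-\tno\t-\n' "$dev" >> "$vfstab"
}

# Usage:
#   cp /etc/vfstab /tmp/vfstab.test
#   add_zvol_swap_entry /tmp/vfstab.test p/swap
```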

I would like to do some performance testing to see what kind of performance you can get with swap on a zvol. I am curious about how this will affect kernel memory usage. I am curious about the effect of things like compression on the swap volume, although, thinking about that one, it doesn't make a lot of sense. I am also curious about the ability to dynamically change the size of the swap space. At first glance, changing the size of the volume does not automatically change the amount of available swap space. That makes sense for expanding swap space. But if you reduce the size of the volume and the kernel doesn't notice, that sounds like it could be a problem. Maybe I should file a bug.
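On the resizing question, the workaround I would expect (untested, so treat it as a sketch) is to take the device out of swap, resize the volume, add it back, and confirm with swap -l. The helper name resize_zvol_swap and DRY_RUN are hypothetical. swap -d has to be able to move anything paged out to that device somewhere else, so this should only be tried with enough free memory or other swap available.

```shell
#!/bin/bash
# run: print the command when DRY_RUN is set, otherwise execute it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# resize_zvol_swap: hypothetical helper. $1 = swap zvol (e.g. p/swap),
# $2 = new volsize (e.g. 4g).
resize_zvol_swap() {
    local vol=$1 newsize=$2
    local dev="/dev/zvol/dsk/$vol"
    # Remove the device from swap first, so the volume is not
    # resized underneath the kernel.
    run swap -d "$dev" || return
    run zfs set volsize="$newsize" "$vol" || return
    # Re-adding the device lets the kernel see the new size.
    run swap -a "$dev" || return
    # List swap devices to confirm.
    run swap -l
}
```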

Suggestions for things to try and ways to measure overhead and performance for this are welcomed.