Tuesday Nov 03, 2009

When asked about Sun Microsystems, one word always springs to the top of my mind: innovation.

There is such fantastic DNA in this company, always looking to push boundaries and make things better. OK, we often don't get the message across well, but the effort and dedication shown by employees always make me proud.

To emphasise this point again, there is great news from Jeff Bonwick earlier this week: "ZFS now has built-in deduplication".

Deduplication is a process to remove duplicate copies of data, whether it's files, blocks or bytes.

It's probably easier to explain with an example: suppose you have a database of company addresses. The location 'London' will appear for quite a few customers, so instead of storing this entry 100 times, there is one entry and 99 references to it. This saves space, and also lookup time, as the referenced entry is likely to be in cache already.
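To put the London example into something you can run, here's a small shell sketch of my own. It is not how ZFS does it (ZFS deduplicates at the block level using checksums, whereas this just compares whole lines); the hypothetical `dedup_stats` function simply counts how many entries a list holds and how many distinct values would actually need storing if every duplicate became a reference to one copy.

```sh
#!/bin/sh
# Hypothetical helper: reads one entry per line on stdin and reports the total
# number of entries and how many distinct values would need to be stored if
# each duplicate were replaced by a reference to a single copy.
dedup_stats() {
    awk '
        { total++; if (!($0 in seen)) { seen[$0] = 1; unique++ } }
        END { printf "%d entries, %d unique\n", total, unique }
    '
}

# 100 customers in London plus one in Paris.
{
    i=0
    while [ "$i" -lt 100 ]; do echo London; i=$((i + 1)); done
    echo Paris
} | dedup_stats
# prints: 101 entries, 2 unique
```

The gap between the two numbers is the saving: 101 entries collapse to 2 stored values.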

How easy is it to set up?

Assuming you have a storage pool named 'tank' and you want to use dedup, just type this:

zfs set dedup=on tank

There is more to it, so read Jeff's blog for the whole story.
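A slightly fuller sketch of the same step also confirms the property and shows the pool-wide dedup ratio. It assumes a build that already includes dedup, a pool named `tank` as above, and root privileges; `enable_dedup` and the `DRY_RUN` switch are my own hypothetical additions so the commands can be previewed on a machine without ZFS. Bear in mind that dedup only applies to data written after it is switched on.

```sh
#!/bin/sh
# Hypothetical wrapper: set DRY_RUN=1 to print each command instead of running it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Enable dedup on a pool's top-level dataset ($1 is the pool name, e.g. tank).
enable_dedup() {
    run zfs set dedup=on "$1"       # new writes to the pool's datasets are deduplicated
    run zfs get dedup "$1"          # confirm the property took effect
    run zpool get dedupratio "$1"   # pool-wide ratio; grows only as duplicate blocks are written
}
```

Run `enable_dedup tank` for real, or `DRY_RUN=1 enable_dedup tank` to just print the three commands.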

I'd guess this will appear shortly in the OpenSolaris /dev builds, which feed into the next OpenSolaris release (2010.02), and in Solaris 10 Update 9. Once it's released, I'll try to run some tests to see what savings I get.

This should also feed into the FreeBSD project. It's such a shame Apple has dropped its ZFS project for OS X.

Wednesday Oct 28, 2009

After the announcements at Oracle OpenWorld and the new TPC benchmark, a lot of focus has been on Sun and the innovation DNA that drives the company. The announcements focus on flash and its increasing use in computing, notably in the F20 card and the F5100 Flash Array.

So what is the secret sauce in these? Both are essentially built to cache data and are made up of single-level cell (SLC) NAND flash: 96GB (4 x 24GB modules) in the F20 card and a staggering 1.92TB (80 modules) in the F5100 Flash Array.

The F5100 Flash Array has 64 SAS lanes (16 x 4-wide ports), 4 domains and SAS zoning. It can perform 1.6M read IOPS and 1.2M write IOPS, with a bandwidth of 12.8GB/sec.

This read IOPS figure is equivalent to 3,000 hard drives in 14 rack cabinets, yet the F5100 uses 1/100th of the space and power of such a collection of hard drives.

This is an amazing database accelerator for Oracle and MySQL. The unit can be zoned into 16 partitions, one for each of up to 16 hosts. The device can form part of a Sun ZFS hybrid storage pool, embracing solid state and hard disk drives.

Further notes: sequential read 9.7GB/sec; read/write latency (1M transfers) 0.41ms/0.28ms; average power 300 watts (idle 213W, 100% load 386W). More spec info here.

So if you need to speed up databases, storage grids, HPC or financial modelling, look at what flash SSDs can offer.

Download the Sun Flash Analyzer, install it on your server and see where SSDs can help accelerate system performance today.

It won't be long before all computers come with flash as standard as either a separate or hybrid disk to speed up response times . . . OpenSolaris can already do this today with ZFS Storage Pools.

Friday Oct 09, 2009

Hot on the heels of recent announcements comes the latest update to Solaris 10, Update 8 (also known as 10/09).

Here are some key new features:

  • Patching enhancements: Turbo Patching and Parallel Patching for Containers
  • New ZFS features: Quotas, Flash Archives and Cache devices
  • Support for disks over 1TB (limited to systems running a 64-bit kernel)
  • Software updates: PostgreSQL 8.3.7, NTP 4.2.5, Samba 3.0.35
  • Numerous other system performance, driver and device enhancements.

Further information:

Documents

Download Solaris 10, U8

What's now EOF (Software no longer supported) 

Gentlemen (and women) start your downloads ;-) 

There have been a few announcements recently (with more to come), and here's one that could be a real game changer and enabler for future tech advances:

Hybrid Storage Pools (HSP) are a new innovation designed to provide superior storage through the integration of flash with disk and DRAM. Sun and Intel have teamed up to combine their technologies of ZFS and high performance, flash-based solid state drives (SSDs) to offer enterprises cutting-edge HSP innovation that can reduce the risk, cost, complexity, and deployment time of multitiered storage environments.

Sun's ZFS

Sun's ZFS file system transparently manages data placement, holding copies of frequently used data in fast SSDs while less-frequently used data is stored in slower, less expensive mechanical disks. The application data set can be completely isolated from slower mechanical disk drives, unlocking new levels of performance and higher ROI. This ‘Hybrid Storage Pool’ approach provides the benefits of high performance SSDs while still saving money with low cost high capacity disk drives.

Solaris ZFS can easily be combined with Intel's SSDs by simply adding Intel Enterprise SSDs to the server’s disk bays. ZFS is designed to dynamically recognize and add new drives, so SSDs can be added to a pool as cache devices (for reads) or log devices (for synchronous writes) without unmounting a file system that is in use. Once this is done, ZFS automatically uses the SSDs as high-speed devices that improve read and write throughput for frequently accessed data, and safely caches data that will ultimately be written out to mechanical disk drives.
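As a rough sketch of what adding SSDs to an existing pool looks like from the command line: the pool name `tank` and the device names `c1t2d0` and `c1t3d0` are hypothetical placeholders, and `add_ssds` with its `DRY_RUN` switch is my own wrapper, not part of ZFS. One SSD goes in as a cache device (L2ARC, for reads) and one as a separate intent log (for synchronous writes).

```sh
#!/bin/sh
# Hypothetical wrapper: set DRY_RUN=1 to print each command instead of running it.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: add_ssds POOL CACHE_DEVICE LOG_DEVICE (names are placeholders; use your own).
add_ssds() {
    pool=$1 cache_dev=$2 log_dev=$3
    run zpool add "$pool" cache "$cache_dev"   # L2ARC: read cache for frequently used data
    run zpool add "$pool" log "$log_dev"       # separate intent log: speeds up synchronous writes
    run zpool status "$pool"                   # both devices should now be listed in the pool
}
```

For example, `DRY_RUN=1 add_ssds tank c1t2d0 c1t3d0` prints the commands without touching the pool. A cache device can be removed again with `zpool remove`, but on older pool versions a log device cannot, so check `zpool upgrade -v` before committing one (a mirrored log, `log mirror dev1 dev2`, is the safer choice).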

Intel's SSDs

Intel's SSDs provide a 100x I/O performance improvement over mechanical disk drives, with twice the reliability:

  • One Intel Extreme SATA SSD (X25-E) can provide the same IOPS as up to 50 high-RPM hard disk drives (HDDs) -- handling the same server workload in less space, with no cooling requirements and lower power consumption.
  • Intel High-Performance SATA SSDs deliver higher IOPS and throughput performance than other SSDs while drastically outperforming traditional hard disk drives. Intel SATA SSDs feature the latest-generation native SATA interface with an advanced architecture employing 10 parallel NAND Flash channels equipped with the latest generation (50nm) of NAND Flash memory. With powerful Native Command Queuing to enable up to 32 concurrent operations, Intel SATA SSDs deliver the performance needed for multicore, multi-socket servers while minimizing acquisition and operating costs.
  • Intel High-Performance SATA SSDs feature sophisticated “wear leveling” algorithms that maximize SSD lifespan, evening out write activity to avoid flash memory hot spot failures. These Intel drives also feature low write amplification and a unique wear-leveling design for higher reliability, meaning Intel drives not only perform better, they last longer. The result is a tangible reduction in your TCO and dramatic improvements to system performance.

Benefits of HSP

Architectures based on HSP can consume 1/5 the power and 1/3 the cost of standard monolithic storage pools while providing maximum performance.

For example, if an application environment with a 350GB working set needs 30,000 IOPS to meet service level agreements, 100 15K RPM HDDs would be needed. If each drive is 300GB, draws 17.5 watts and costs $750, this traditional environment provides the IOPS needed, has 30TB of capacity, costs $75,000 to buy and consumes 1.75 kWh of electricity every hour.

Using a Hybrid Storage Pool, six 64GB SSDs (at $1,000 each) provide the 30,000 IOPS required and hold the 350GB working set. Lower-cost, high-capacity drives can store the rest of the data: 30 1TB 7200 RPM drives at $689 each ($20,670), each drawing 13 watts, provide cost-effective HDD storage. The savings are dramatic:

  • Purchase cost is $26,670, a 64-percent savings
  • Electricity consumed is 0.392 kWh every hour, a 77.6-percent savings
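The arithmetic behind these savings is easy to check. This awk sketch simply reuses the drive counts, prices and wattages quoted above (the 0.392 kW total for the hybrid pool is taken as quoted; the 30 HDDs alone account for 390W of it); the function name `hsp_example` is hypothetical.

```sh
#!/bin/sh
# Recomputes the Hybrid Storage Pool example using only figures quoted in the text.
hsp_example() {
    awk 'BEGIN {
        # Traditional pool: 100 x 300GB 15K RPM drives, $750 and 17.5W each
        trad_cost  = 100 * 750
        trad_watts = 100 * 17.5
        trad_tb    = 100 * 300 / 1000

        # Hybrid pool: 6 x 64GB SSDs at $1,000, plus 30 x 1TB 7200 RPM drives at $689
        hsp_cost  = 6 * 1000 + 30 * 689
        hsp_watts = 392   # total as quoted; the 30 HDDs at 13W account for 390W of it

        printf "traditional: $%d, %.2f kW, %d TB\n", trad_cost, trad_watts / 1000, trad_tb
        printf "hybrid:      $%d, %.3f kW\n", hsp_cost, hsp_watts / 1000
        printf "savings:     %.1f%% cost, %.1f%% power\n",
            100 * (1 - hsp_cost / trad_cost), 100 * (1 - hsp_watts / trad_watts)
    }'
}

hsp_example
# traditional: $75000, 1.75 kW, 30 TB
# hybrid:      $26670, 0.392 kW
# savings:     64.4% cost, 77.6% power
```

Swapping in your own drive prices and wattages is a quick way to see whether the same trade-off holds for your workload.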

Link to docs:

Solaris ZFS Enables Hybrid Storage Pools - Shatters Economic and Performance Barriers

UPDATE: Brendan from the Fishworks team has posted some speed and performance notes here

Tuesday Nov 18, 2008

Now that I'm successfully running a ZFS root, I don't need my old ufs root any more, so it should be a simple matter of removing the old ufs boot environment and increasing the size of the new ZFS root pool.

Right? Well, no, actually: there seems to be a bug or three.

# lustatus
Boot Environment           Is       Active Active    Can    Copy     
Name                       Complete Now    On Reboot Delete Status   
-------------------------- -------- ------ --------- ------ ----------
snv_98                     yes      no     no        yes    -        
snv_102                    yes      yes    yes       no     -        
# ludelete snv_98
System has findroot enabled GRUB
Checking if last BE on any disk...
BE <snv_98> is not the last BE on any disk.
Updating GRUB menu default setting
Changing GRUB menu default setting to <3>
ERROR: Failed to copy file </boot/grub/menu.lst> to top level dataset for BE <snv_98>
ERROR: Unable to delete GRUB menu entry for deleted boot environment <snv_98>.
Unable to delete boot environment.

This is CR6718038/CR6715220/CR6743529. A quick workaround would be to edit /usr/lib/lu/lulib and replace the following in line 2937:
lulib_copy_to_top_dataset "$BE_NAME" "$ldme_menu" "/${BOOT_MENU}"
with
lulib_copy_to_top_dataset `/usr/sbin/lucurr` "$ldme_menu" "/${BOOT_MENU}"
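If you'd rather not edit lulib by hand, the same change can be made with sed while keeping a backup. This is a sketch, not an official fix: `patch_lulib` is a hypothetical helper, and it matches on the text of the line rather than on line 2937, since the line number may differ between builds. Run it as root before rerunning ludelete, and put `lulib.orig` back once a fixed Live Upgrade is installed.

```sh
#!/bin/sh
# Hypothetical helper: apply the lulib workaround with sed, keeping a backup.
# Usage: patch_lulib [path]   (defaults to /usr/lib/lu/lulib)
patch_lulib() {
    lulib=${1:-/usr/lib/lu/lulib}
    cp -p "$lulib" "$lulib.orig" || return 1
    # Copy the GRUB menu to the current BE's top-level dataset (as reported by
    # lucurr) instead of the top-level dataset of the BE being deleted.
    sed 's|lulib_copy_to_top_dataset "\$BE_NAME" "\$ldme_menu"|lulib_copy_to_top_dataset `/usr/sbin/lucurr` "$ldme_menu"|' \
        "$lulib.orig" > "$lulib" || return 1
    # Show the patched line(s); no output and a non-zero status mean nothing matched.
    grep -n 'lulib_copy_to_top_dataset `/usr/sbin/lucurr`' "$lulib"
}
```

Writing through a redirect into the existing file keeps lulib's original ownership and permissions, while `cp -p` preserves them on the backup.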

Then rerun ludelete:

# ludelete snv_98
System has findroot enabled GRUB
Checking if last BE on any disk...
BE <snv_98> is not the last BE on any disk.
Updating GRUB menu default setting
Changing GRUB menu default setting to <3>
Saving existing file </boot/grub/menu.lst> in top level dataset for BE <snv_102> as <mount-point>//boot/grub/menu.lst.prev.
File </boot/grub/menu.lst> propagation successful
File </etc/lu/GRUB_backup_menu> propagation successful
Successfully deleted entry from GRUB menu
Determining the devices to be marked free.
Updating boot environment configuration database.
Updating boot environment description database on all BEs.
Updating all boot environment configuration databases.
Boot environment <snv_98> deleted.
#

Then I needed to remove the old ufs boot and swap slices and extend slice 0 over them. Here are the old and new layouts:

partition> print
Current partition table (original):
Total disk cylinders available: 12047 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       3 -  1962       15.01GB    (1960/0/0)   31487400
  1       swap    wu    1963 -  2224        2.01GB    (262/0/0)     4209030
  2     backup    wm       0 - 12046       92.28GB    (12047/0/0) 193535055
  3 unassigned    wm    2225 -  4182       15.00GB    (1958/0/0)   31455270
  4 unassigned    wu       0                0         (0/0/0)             0
  5 unassigned    wu       0                0         (0/0/0)             0
  6 unassigned    wm    4183 -  6793       20.00GB    (2611/0/0)   41945715
  7       home    wm    6794 - 12046       40.24GB    (5253/0/0)   84389445
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 alternates    wu       1 -     2       15.69MB    (2/0/0)         32130

partition>

partition> print
Current partition table (original):
Total disk cylinders available: 12047 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       3 -  4182       32.02GB    (4180/0/0)   67151700
  1 unassigned    wu       0                0         (0/0/0)             0
  2     backup    wm       0 - 12046       92.28GB    (12047/0/0) 193535055
  3 unassigned    wu       0                0         (0/0/0)             0
  4 unassigned    wu       0                0         (0/0/0)             0
  5 unassigned    wu       0                0         (0/0/0)             0
  6 unassigned    wm    4183 -  6793       20.00GB    (2611/0/0)   41945715
  7       home    wm    6794 - 12046       40.24GB    (5253/0/0)   84389445
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 alternates    wu       1 -     2       15.69MB    (2/0/0)         32130

partition>
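A quick sanity check on the two tables: each cylinder on this disk is 16,065 blocks (slice 8 is exactly one cylinder), so a slice's size follows from its cylinder count. The new slice 0 (cylinders 3-4182) is exactly the old slices 0, 1 and 3 put together: 1960 + 262 + 1958 = 4180 cylinders. This awk sketch assumes 512-byte blocks and that format's "GB" means 2^30 bytes, which matches the figures shown; `slice_size` is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: convert a cylinder count to blocks and GB, assuming
# 16065 blocks per cylinder (as in the tables above) and 512-byte blocks,
# with GB meaning 2^30 bytes.
slice_size() {
    awk -v cyl="$1" 'BEGIN {
        blocks = cyl * 16065
        printf "%d cylinders = %d blocks = %.2fGB\n", cyl, blocks, blocks * 512 / (1024 ^ 3)
    }'
}

slice_size 1960   # old slice 0: 1960 cylinders = 31487400 blocks = 15.01GB
slice_size 4180   # new slice 0: 4180 cylinders = 67151700 blocks = 32.02GB
```

Both results agree with the Size and Blocks columns that format printed, which is a useful check before committing a new label.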

The size of my pools before:

# zpool list
NAME       SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rootpool    15G  8.54G  6.46G    56%  ONLINE  -
tank        40G  5.65G  34.3G    14%  ONLINE  -
tank2     19.9G   652K  19.9G     0%  ONLINE  -
#

Then I rebooted. Oops!!!

It just drops to a grub> prompt, because my old ufs slice held all the boot information and I had just deleted it, so GRUB can't find anything . . . but this is simple to fix, as long as you have a recent DVD image handy (recent enough to recognise and mount the ZFS pool).

Insert and boot from the DVD image, and select single-user mode.

Mount the rootpool as r/w on /a (it should prompt automatically for this).

At the command prompt, type:

installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c2t1d0s0

Then reboot. I love it when a plan comes together! Pool sizes after the reboot:

# zpool list
NAME       SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
rootpool    32G  8.54G  23.5G    26%  ONLINE  -
tank        40G  5.54G  34.5G    13%  ONLINE  -
tank2     19.9G   722K  19.9G     0%  ONLINE  -
#

Monday Nov 17, 2008

An appropriate title for my first blog post, as this is the 'new' me going forward, leaving the 'old' unblogger behind. I'm also starting an MBA course, so I'll be blowing the cobwebs off my study brain and changing my life for the next year or so (longer, if I continue after that).

It's also an appropriate title as I recently upgraded my home computer to run a ZFS root:

  • I previously had two ufs slices and used Live Upgrade to move between them when updating between builds of Solaris Express Community Edition (also known as Nevada).
  • I had been meaning to do it for some time, as it's much quicker when copying (ZFS clones the boot environment, so think seconds rather than 50 minutes to copy 7GB) and more robust, with all that ZFS goodness.

I simply created a new ZFS pool on the unused ufs Live Upgrade slice. There was no need to reformat, although I did need the -f flag:

# zpool create rootpool c2d0s0               
invalid vdev specification
use '-f' to override the following errors:
/dev/dsk/c2d0s0 contains a ufs filesystem.

# zpool create -f rootpool c2d0s0

Then create the boot environment:

# lucreate -c snv_98 -n snv_102 -p rootpool
Checking GRUB menu...
System has findroot enabled GRUB
Analyzing system configuration.
Comparing source boot environment  file systems with the file 
system(s) you specified for the new boot environment. Determining which 
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
The device  is not a root device for any boot environment; cannot get BE ID.
Creating configuration for boot environment .
Source boot environment is .
Creating boot environment .
Creating file systems on boot environment .
Creating  file system for  in zone  on .
Populating file systems on boot environment .
Checking selection integrity.
Integrity check OK.
Populating contents of mount point .
Copying.
Creating shared file system mount points.
Creating compare databases for boot environment .
Creating compare database for file system .
Updating compare databases on boot environment .
Making boot environment  bootable.
Updating bootenv.rc on ABE .
File  propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE  in GRUB menu
Population of boot environment  successful.
Creation of boot environment  successful.

Mount the ISO image, then run the upgrade and wait:
# lofiadm -a /export/home/ic140957/sol-nv-b102-x86-dvd.iso
/dev/lofi/1
# mount -r -F hsfs /dev/lofi/1 /mnt
# luupgrade -u -n snv_102 -s /mnt
System has findroot enabled GRUB
No entry for BE  in GRUB menu
Uncompressing miniroot
Copying failsafe kernel from media.
52155 blocks
miniroot filesystem is 
Mounting miniroot at 
Validating the contents of the media .
The media is a standard Solaris media.
The media contains an operating system upgrade image.
The media contains  version <11>.
Constructing upgrade profile to use.
Locating the operating system upgrade program.
Checking for existence of previously scheduled Live Upgrade requests.
Creating upgrade profile for BE .
Checking for GRUB menu on ABE .
Saving GRUB menu on ABE .
Checking for x86 boot partition on ABE.
Determining packages to install or upgrade for BE .
Performing the operating system upgrade of the BE .
CAUTION: Interrupting this process may leave the boot environment unstable 
or unbootable.
Upgrading Solaris: 100% completed
Installation of the packages from this media is complete.
Restoring GRUB menu on ABE .
Adding operating system patches to the BE .
The operating system patch installation is complete.
ABE boot partition backing deleted.
PBE GRUB has no capability information.
PBE GRUB has no versioning information.
ABE GRUB is newer than PBE GRUB. Updating GRUB.
GRUB update was successful.
Configuring failsafe for system.
Failsafe configuration is complete.
INFORMATION: The file  on boot 
environment  contains a log of the upgrade operation.
INFORMATION: The file  on boot 
environment  contains a log of cleanup operations required.
INFORMATION: Review the files listed above. Remember that all of the files 
are located on boot environment . Before you activate boot 
environment , determine if any additional system maintenance is 
required or if additional media of the software distribution must be 
installed.
The Solaris upgrade of the boot environment  is complete.
Installing failsafe
Failsafe install is complete.
# umount /mnt
# lofiadm -d /dev/lofi/1
# luactivate snv_102
# init 6        # reboot

The new boot environment is now updated and can be booted. Hurrah!

This blog copyright 2009 by Thin Slice