Wednesday Nov 25, 2009

Firefox 3.6 is just around the corner, due to be delivered later this year.

In testing I found that the old Java plugin (libjavaplugin_oji.so) on OpenSolaris was no longer recognised, and hence Java applets didn't work :-(

So what's the deal?

Since Java 6 update 10 there has been a new implementation of the Java plugin, in which applets run in separate Java Virtual Machine instances launched by the plug-in's code. With the old plugin they were executed in a JVM instance embedded in the web browser's process.

So what do OpenSolaris/Solaris users need to do?

Install Java 6 update 10 or later; update 17 is currently available.

Remove the old Java plugin from your firefox/plugins directory:

rm /export/home/tadpole/firefox/plugins/libjavaplugin_oji.so

Add a symbolic link to the new plugin:

ln -s /usr/java/jre/lib/i386/libnpjp2.so  /export/home/tadpole/firefox/plugins

You should also check the system plugin directory: /usr/lib/firefox/plugins/
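If you have several profiles or machines to fix, the two steps above can be scripted. This is only a sketch: the function name switch_java_plugin is my own invention, and it assumes the plugin file names quoted above and that you pass in your own plugin and JRE directories.

```python
from pathlib import Path

OLD_PLUGIN = "libjavaplugin_oji.so"  # old OJI plugin named in the post
NEW_PLUGIN = "libnpjp2.so"           # new plugin named in the post


def switch_java_plugin(plugins_dir, jre_lib_dir):
    """Remove the old plugin (if present) and symlink the new one into plugins_dir.

    Hypothetical helper for illustration; it mirrors the rm and ln -s steps above.
    """
    plugins = Path(plugins_dir)
    target = Path(jre_lib_dir) / NEW_PLUGIN
    if not target.exists():
        raise FileNotFoundError(f"{target} not found; is Java 6 update 10 or later installed?")

    old = plugins / OLD_PLUGIN
    if old.exists() or old.is_symlink():
        old.unlink()

    link = plugins / NEW_PLUGIN
    if link.exists() or link.is_symlink():
        link.unlink()  # replace a stale link rather than failing
    link.symlink_to(target)
    return link


# Example call with the paths used in this post (adjust to your own system):
# switch_java_plugin("/export/home/tadpole/firefox/plugins", "/usr/java/jre/lib/i386")
```

Run it once per plugin directory, including the system one if you have write access to it.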

More info can be found on the java.com pages here and here.

Tuesday Nov 03, 2009

When asked about Sun Microsystems, one word will always spring to the top of my mind: innovation

There is such a fantastic DNA in this company that looks to push boundaries and make things better - ok, we often do not get the message across well, but the effort and dedication shown by employees always make me proud.

To emphasise this point again, there is great news from Jeff Bonwick earlier this week: "ZFS now has built-in deduplication"

Deduplication is a process to remove duplicate copies of data, whether it's files, blocks or bytes.

It's probably easier to explain with an example: suppose you have a database of company addresses. The location 'London' will exist for quite a few customers, so instead of storing this entry 100 times, there is one entry and the other 99 are references to it. This saves space, and lookup time too, as the referenced entry is likely to be in cache already.
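To make the 'London' example concrete, here is a toy sketch in Python. It is not how ZFS implements dedup internally; it only illustrates the idea of keeping one copy per distinct block and pointing everything else at it. The function name dedup_store is made up, and SHA-256 is simply my choice of checksum for the illustration.

```python
import hashlib


def dedup_store(blocks):
    """Store each distinct block once; return (store, refs).

    store maps a checksum to the single stored copy,
    refs holds one checksum per logical block, in the original order.
    """
    store = {}
    refs = []
    for block in blocks:
        key = hashlib.sha256(block).hexdigest()
        store.setdefault(key, block)  # only the first copy is kept
        refs.append(key)
    return store, refs


# 100 customer records that all carry the same city.
cities = [b"London"] * 100
store, refs = dedup_store(cities)
print(len(refs), "references,", len(store), "stored copy")
```

All 100 records are still addressable through refs, but only one copy of the data is held.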

How easy is it to set up?

Assuming you have a storage pool named 'tank' and you want to use dedup, just type this:

zfs set dedup=on tank

There is more to it, so read Jeff's blog for the whole story.

I'm guessing this should appear shortly in the OpenSolaris /dev builds, which will feed into the next OpenSolaris release (2010.02) and into Solaris 10 Update 9. Once it's released, I'll try to run some tests to see the savings I get.

This should also feed into the FreeBSD project. Such a shame OS X has dumped its ZFS project.

Wednesday Oct 28, 2009

After the announcements from Oracle OpenWorld and the new TPC benchmark, a lot of focus has been on Sun and the innovation DNA that drives the company. The announcements focus on flash and its increasing use in computing, in the shape of the F20 card and the F5100 flash array.

So what is the secret sauce in these? They essentially cache data and are made up of single-level cell NAND flash: 96GB (4 x 24GB modules) in the F20 card and a staggering 1.92TB (80 modules) in the F5100 flash array.

The F5100 Flash Array has 64 SAS lanes (16 x 4-wide ports), 4 domains and SAS zoning. It can perform 1.6M read IOPS and 1.2M write IOPS, with a bandwidth of 12.8GB/sec.

This read IOPS figure is equivalent to 3,000 hard drives in 14 rack cabinets. The F5100 uses 1/100th of the space and power of such a collection of hard drives.
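As a quick sanity check on that comparison, the quoted figures can be divided out. This is just arithmetic on the numbers above; the per-drive result is what the comparison implies, not a measured value.

```python
READ_IOPS = 1_600_000      # F5100 read IOPS quoted above
EQUIVALENT_DRIVES = 3_000  # hard drives quoted as equivalent
CABINETS = 14              # rack cabinets quoted for those drives

iops_per_drive = READ_IOPS / EQUIVALENT_DRIVES
drives_per_cabinet = EQUIVALENT_DRIVES / CABINETS
print(f"{iops_per_drive:.0f} read IOPS per drive, {drives_per_cabinet:.0f} drives per cabinet")
```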

This is an amazing database accelerator for Oracle and MySQL. The unit can be zoned into 16 partitions, one for each of up to 16 hosts. The device can form part of a Sun ZFS hybrid storage pool, embracing solid state and hard disk drives.

Further Notes: Sequential Read = 9.7GB/sec; Read/Write Latency (1M transfers) = 0.41ms/0.28ms; Average Power 300 watts (Idle = 213W ; 100% = 386W).  More spec info here.

So if you need to speed up your databases, storage grids, HPC or financial modelling, look at what flash SSDs can offer.

Download the Sun Flash Analyzer, install it on your server and see where SSDs can help accelerate system performance today.

It won't be long before all computers come with flash as standard as either a separate or hybrid disk to speed up response times . . . OpenSolaris can already do this today with ZFS Storage Pools.

Information management is described as ‘the conscious process by which information is gathered and used to assist decision making at all levels of the organisation’.

It sounds easy, but how do you get to this Utopian place? Starting at the basic level, every organisation has data: individual building blocks of information which convey little meaning (e.g. strings of textual or numeric characters). Most organisations have too much data floating around these days; the important thing is to create information from it.

Information is data that has been organised in some useful way so that meaning can be extracted from it. Once it has meaning and context it's much easier to start the steps to providing "business intelligence".

The key here is to have standards for data definitions - until recently called Master Data Management (MDM) and now Data Relationship Management (DRM). What is the definition of MDM/DRM? Master data is the reference data that we see in General Ledger hierarchies and reporting structures.

At a simple level, having MDM/DRM in place means having a single version of hierarchies and definitions, so that all applications and users can reference the same definitions, allowing consistent reporting and analysis. Otherwise, with separate applications running inconsistent hierarchies, users are not comparing like with like.
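A tiny sketch of what "a single version of hierarchies" buys you: two reports that roll up through the same master definition stay comparable. The account names, values and the rollup helper are all invented for illustration; this is not how the Hyperion tools store hierarchies.

```python
# One master hierarchy: each account maps to its parent; top-level nodes map to None.
# Account names are hypothetical.
MASTER_HIERARCHY = {
    "Total Revenue": None,
    "Product Revenue": "Total Revenue",
    "Service Revenue": "Total Revenue",
    "Hardware": "Product Revenue",
    "Software": "Product Revenue",
    "Support": "Service Revenue",
}


def rollup(leaf_values, hierarchy=MASTER_HIERARCHY):
    """Aggregate leaf values up through every ancestor in the hierarchy."""
    totals = dict.fromkeys(hierarchy, 0)
    for account, value in leaf_values.items():
        node = account
        while node is not None:
            totals[node] += value
            node = hierarchy[node]
    return totals


# Two "applications" reporting from the same definitions.
ledger = rollup({"Hardware": 120, "Software": 80, "Support": 50})
forecast = rollup({"Hardware": 130, "Software": 90, "Support": 55})
print(ledger["Total Revenue"], forecast["Total Revenue"])
```

If each application kept its own copy of the hierarchy, moving "Support" under a different parent in only one of them would silently make the two totals incomparable.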

Challenges:

1. This thing is too big! Yes, companies can have multiple systems, multiple hierarchies and multiple definitions, which at first glance can seem too big to manage, let alone trying to get the systems to speak to each other. You need a grand vision that includes all systems and understands their dependencies; however, it's best to limit the scope of the initial deployment.

2. Getting buy in from the business/IT: You need to identify the stakeholders and help them understand the benefits, such as reduced downtime and fewer errors/fixes. It is very difficult to centralise this process as most users don't want to lose control of their own MDM. One way to influence this is to focus on the end state, and on the reduced maintenance and quicker updates that this model can provide.

3. Creating Data Governance Policies and Processes: Once you have buy in, the next tricky part is creating a process that aligns with everyone's calendars and timetables. It's important to note that you don't need to build everything to be "Enterprise Ready" immediately, as this can create extra layers and distrust of the system. Make sure it's fit for the immediate purpose and has room to scale.

The key to progressing at speed is to have single data definitions and a master repository of hierarchies. Not only does this allow production systems to be updated but it also improves the development systems - allowing alternative hierarchies to be tested and easily rolled back.

Oracle has updated and enhanced their Hyperion MDM tool, which is now called Data Relationship Management.

We have long talked about MDM and its benefits, although getting all users (Finance, IT and the business) to agree to relinquish control has been harder.

We have at this stage limited our MDM activities to Management Reporting systems, which has been easier to control through the use of standard hierarchies, naming conventions and data definitions. Currently on Hyperion version 9.3.1, we have utilised skeleton cubes to manage the hierarchies to simplify the maintenance and create a single repository for management reporting changes.

Looking ahead, version 11.1 can manage approvals and changes in hierarchies much better, through workflow in the DRM tool, as well as integration and synchronisation with multiple systems and data warehouses. The other important aspect is that this would also simplify development and testing of alternative hierarchies, which could be easily tested and rolled out to other systems.

Anything that can simplify back-end processes and add business value is always welcome, I'm looking forward to the upgrade ;-)

Friday Oct 09, 2009

Hot on the heels of recent announcements comes the latest update to Solaris 10: Update 8, also known as 10/09.

Here are some key new features:

  • Patching enhancements: Turbo Patching and Parallel Patching for Containers
  • New ZFS features: Quotas, Flash Archives and Cache devices
  • Support for disks over 1TB - this is limited to systems running 64 bit kernel
  • Software Updates: PostgreSQL 8.3.7, NTP 4.2.5, Samba 3.0.35
  • Numerous other system performance, driver and device enhancements.

Further information:

Documents

Download Solaris 10, U8

What's now EOF (Software no longer supported) 

Gentlemen (and women) start your downloads ;-) 

There have been a few announcements recently (and more to come) and here's one that can really be a game changer and enabler for future tech advances:

Hybrid Storage Pools (HSP) are a new innovation designed to provide superior storage through the integration of flash with disk and DRAM. Sun and Intel have teamed up to combine their technologies of ZFS and high performance, flash-based solid state drives (SSDs) to offer enterprises cutting-edge HSP innovation that can reduce the risk, cost, complexity, and deployment time of multitiered storage environments.

Sun's ZFS

Sun's ZFS file system transparently manages data placement, holding copies of frequently used data in fast SSDs while less-frequently used data is stored in slower, less expensive mechanical disks. The application data set can be completely isolated from slower mechanical disk drives, unlocking new levels of performance and higher ROI. This ‘Hybrid Storage Pool’ approach provides the benefits of high performance SSDs while still saving money with low cost high capacity disk drives.

Solaris ZFS can easily be combined with Intel's SSDs by simply adding Intel Enterprise SSDs into the server’s disk bays. ZFS is designed to dynamically recognize and add new drives, so SSDs can be configured as a cache disk without dismounting a file system that is in use. Once this is done, ZFS automatically optimizes the file system to use the SSDs as high-speed disks that improve read and write throughput for frequently accessed data, and safely cache data that will ultimately be written out to mechanical disk drives.

Intel's SSDs

Intel's SSDs provide 100x I/O performance improvement over mechanical disk drives with twice the reliability:

  • One Intel Extreme SATA SSD (X25-E) can provide the same IOPS as up to 50 high-RPM hard disk drives (HDDs) -- handling the same server workload in less space, with no cooling requirements and lower power consumption.
  • Intel High-Performance SATA SSDs deliver higher IOPS and throughput performance than other SSDs while drastically outperforming traditional hard disk drives. Intel SATA SSDs feature the latest-generation native SATA interface with an advanced architecture employing 10 parallel NAND Flash channels equipped with the latest generation (50nm) of NAND Flash memory. With powerful Native Command Queuing to enable up to 32 concurrent operations, Intel SATA SSDs deliver the performance needed for multicore, multi-socket servers while minimizing acquisition and operating costs.
  • Intel High-Performance SATA SSDs feature sophisticated “wear leveling” algorithms that maximize SSD lifespan, evening out write activity to avoid flash memory hot spot failures. These Intel drives also feature low write amplification and a unique wear-leveling design for higher reliability, meaning Intel drives not only perform better, they last longer. The result translates to a tangible reduction in your TCO and dramatic improvements to system performance.

Benefits of HSP

Architectures based on HSP can consume 1/5 the power and 1/3 the cost of standard monolithic storage pools while providing maximum performance.

For example, if an application environment with a 350 GB working set needs 30,000 IOPS to meet service level agreements, 100 15K RPM HDDs would be needed. If the drives are 300GB, consume 17.5 watts, and cost $750 each, this traditional environment provides the IOPS needed, has 30TB capacity, costs $75,000 to buy, and consumes 1.75 kWh of electricity.

Using a Hybrid Storage Pool, six 64 GB SSDs (at $1,000 each) provide the 30,000 IOPS required, and hold the 350GB working set. Lower cost, high-capacity drives can be used to store the rest of the data; 30 1TB 7200 RPM drives, at $689 each ($20,670) and consuming 13 watts, provide cost-effective HDD storage. The savings are dramatic:

  • Purchase cost is $26,670, a 64-percent savings
  • Electricity consumed is 0.392 kWh, a 77-percent savings
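The numbers in this example can be re-derived from the per-drive figures. A rough check follows, with two caveats: the source does not itemise the SSDs' power draw, so only the hard-drive share of the hybrid pool is computed, and the 1.75 and 0.392 figures read most naturally as kilowatts of draw (that is, kWh per hour of running), which is my interpretation rather than something the source states.

```python
# Traditional pool: figures quoted in the example above.
hdd_count, hdd_cost, hdd_watts = 100, 750, 17.5
trad_cost = hdd_count * hdd_cost    # dollars
trad_watts = hdd_count * hdd_watts  # watts

# Hybrid pool: six SSDs plus thirty 1TB 7200 RPM drives.
ssd_count, ssd_cost = 6, 1_000
sata_count, sata_cost, sata_watts = 30, 689, 13
hybrid_cost = ssd_count * ssd_cost + sata_count * sata_cost
hybrid_hdd_watts = sata_count * sata_watts  # SSD draw is not itemised in the source
hybrid_watts_quoted = 392                   # the quoted 0.392 figure, taken as kW

cost_saving = 1 - hybrid_cost / trad_cost
power_saving = 1 - hybrid_watts_quoted / trad_watts
print(trad_cost, hybrid_cost, f"{cost_saving:.1%}")
print(trad_watts, hybrid_hdd_watts, f"{power_saving:.1%}")
```

The hard drives alone account for 390 of the quoted 392 watts, which leaves only a couple of watts for the six SSDs.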

Link to docs:

Solaris ZFS Enables Hybrid Storage Pools - Shatters Economic and Performance Barriers

UPDATE: Brendan from the Fishworks team has posted some speed and performance notes here

Wednesday Oct 07, 2009

Sun last week announced the release of the latest version of the Sun Java Communications Suite (what a mouthful); it's now version 7!

So what are the key products and features?

  • Calendar Server 7, with CalDAV support, enabling interoperability with Mac iCal/iPhone and Mozilla Thunderbird.
  • Sun Convergence 1 U3, which provides an AJAX rich client web experience for all the components.
  • Indexing and Search Service 1, which provides real time indexing and search of messaging and attachments.
  • Instant Messaging 8, supporting standards compliant IM for fixed and mobile users.
  • Messaging Server 7 U3, the latest highly scalable, secure and high performing messaging platform.

 Interested?  

  1. Do you have over 1,000 users of communications/collaboration software?
  2. Is your Communications/Collaboration solution critical to the success of your business?
  3. What is the total cost of ownership of your current communications/collaboration implementation? Or, how much are you spending per month to keep this solution up and running?
  4. Are you locked into a single vendor's proprietary communications solution or do you have choice through open standards?
  5. Are you worried about your implementation's susceptibility to viruses, worms, and spam?

Want to learn more and see some demos?

Here is the main Sun Java Communications Suite page http://www.sun.com/comms

So who are these clever individuals who have been able to make some sense, give order and introduce simplicity into my work?

Edward R Tufte: http://www.edwardtufte.com Since the early 1980s he has provided lots of thoughts on how we view and perceive data. One concept (among the many in his several books) I find very useful is the data-ink ratio. As its name suggests, it was born back when screens and digital information were not as ubiquitous as they are now, but the same concept applies.

When quantitative data is displayed in printed form, some of the ink that appears on the page presents data and some presents visual context that is not data (also called non-data).  The greater share of the ink should be on the data elements, not on the non-data elements (borders, shading, images, graphics).
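Put as a formula, the data-ink ratio is the data ink divided by the total ink used. A small sketch, where the "ink" amounts are made-up pixel counts for one chart before and after a tidy-up, and data_ink_ratio is just my name for the calculation:

```python
def data_ink_ratio(data_ink, non_data_ink):
    """Share of a chart's ink that presents data (0.0 to 1.0)."""
    total = data_ink + non_data_ink
    if total <= 0:
        raise ValueError("a chart needs some ink")
    return data_ink / total


# Hypothetical pixel counts for the same bar chart.
before = data_ink_ratio(data_ink=4_000, non_data_ink=6_000)  # heavy borders and shading
after = data_ink_ratio(data_ink=4_000, non_data_ink=1_000)   # borders and shading stripped back
print(f"{before:.0%} -> {after:.0%}")
```

The data has not changed between the two versions; only the non-data ink has been reduced.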

Colin Ware wrote a groundbreaking book: "Information Visualization: Perception for Design". Some key quotes from this book say more than I ever could:

“If we can understand how perception works, our knowledge can be translated into rules for displaying information. Following perception-based rules, we can present our data in such a way that the important and informative patterns stand out. If we disobey the rules, our data will be incomprehensible or misleading.”

“We can easily see patterns presented in certain ways, but if they are presented in other ways they become invisible.”

Colin organised the preattentive attributes of visual perception into four categories:

 - Colour

 - Form

 - Spatial Position

 - Motion

These categories are tied in with the Gestalt principles in setting ground rules for design.

Stephen Few: http://www.perceptualedge.com/ Has combined the previous theories with design, dashboard and communication ideas to give some key rules for report and dashboard design.

Stephen has written several books on designing tables, graphs and dashboards to progress the understanding of effective visual communication for data and quantitative analysis. He and his team regularly blog, review and debunk some examples. Always a great read: http://www.perceptualedge.com/blog/

These are three very intelligent masters in the art of data visualisation.

The 3 key rules I now have are:

  1. Simplify: reduce the data presented
  2. Simplify: concentrate on the important information
  3. Simplify: remove unnecessary non-data images, colours and graphics

Please read and review the best practices provided by others and then implement in your work.  Together we can improve getting the message across and understood clearly.

Monday Sep 14, 2009

Hot on the heels of the previous WSJ ads is this teaser for launch 15th Sept @1PM PST:

What is it? This is the blurb from the teaser: "the world’s first OLTP database machine with Sun FlashFire technology"

It's great to see some collaboration and new technology ;-)

You can sign up for the webcast here.

Thursday Sep 10, 2009

Great news, as seen in the Wall Street Journal:

Oracle promises to invest in SPARC and Solaris technology. Thanks Larry ;-)

Footnote: I should add that it's not just thanks to Larry, but Charles, Safra and the other 85,000 employees too.

Tuesday Sep 08, 2009

As a definite power user of StarOffice, I find there are times when the tool doesn't really help, or actually makes it difficult to do your job.

Thankfully, with an open extension framework, others can submit ideas and create improvements for all.

A great example of this is DataPilot Tools for OpenOffice.org Calc.

Using DataPilot (similar to pivot tables in Excel) in StarOffice can be a bit tiresome as it doesn't tell you the range that feeds the DataPilot, so if you suddenly have more data, or want to make sure all the data is included, you can't.

That's where this handy extension comes in. It gets added to the DataPilot toolbar menu and:

  • tells you what the source data range is
  • lets you update the source data range
  • provides an option for refreshing all the DataPilots.

Great news, thanks Peter!

Thursday Sep 03, 2009

The last key concept is not new; it is almost 100 years old. In 1912 the Gestalt School of Psychology began research into how we perceive patterns, forms and organisation in what we see. This research culminated in the Gestalt principles of perception, which explain the visual characteristics that cause us to group objects together.

A lot has been written about these, but briefly the characteristics are:
Proximity, Similarity, Enclosure, Connection, Continuity and Closure. 

These all relate to how close, alike, together and aligned objects are, which causes our brains to interpret them as groups and helps us quickly determine aspects of them (whether they are bold or green, for example).

The way objects are grouped, aligned or differentiated determines whether we perceive them as groups. This suggests 2 ideas for report designers:
 - we should group like items together, focus on the value add we are presenting by organising and minimising the data shown.
 - we should separate distinct items, by arranging the information in a way that makes sense, making sure that the important data stands out.

This way we can enable users to quickly interpret the data we are presenting.

But how do we do this?? Thankfully we don't have to work this out for ourselves, there are some very clever and intelligent people who have researched these topics and provided some answers and guidelines.

In the next post, I'll review some people who have been able to help me in my work and help others get the message across simply and clearly. 

Sunday Aug 23, 2009

Business Intelligence tools provide lots of ways to summarise and aggregate data; however, this alone does not mean that the audience will interpret, understand and gain value from it. Only through careful design and planning can report developers and system designers structure the output in a meaningful, value-adding way.

To be able to sell others on design ideas and concepts, I needed to understand more about what I saw, how I interpreted data and what I saw as errors. My research showed me that these were not new concepts; they had previously been applied to paper-based items rather than the dynamic or interactive screens we mostly work with today.

There were 3 concepts which stood out, the first of which is human memory limits:

Memory Limits:

There are 3 types of memory: iconic, short-term and long-term. The important ones for BI developers are iconic and short-term.

Iconic memory is very much like a computer memory buffer, where items are held before they are processed - what goes on here is pre-conscious. If we group items (by size, shape, colour etc.) it can help users process the information in iconic memory; this is called pre-attentive processing.

Short-term memory is the key limitation in human cognition: studies suggest that we can only hold 3-9 chunks of visual information at one time in short-term memory. We can help users' perception here by grouping items and using charts intelligently.

The key for designers is to reduce the short term memory load by using familiar items, objects, actions and directions.

Data Encoding

The key here is how we can visually encode data for faster perception. It's better for pre-attentive processing to occur rather than attentive processing, which is sequential and therefore takes longer. The best example is trying to count how many 4s are in the following string:

172634950980273849

It's difficult and slow, as there's nothing to distinguish the 4s from the other numbers. In the example below we make them stand out:

17263[4]9509802738[4]9

Much easier!  That's all for this episode, catch the next instalment soon.
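For anyone who wants to try the same trick on their own data, here is a small sketch that counts a digit and marks every occurrence so it stands out even in plain text. The helper name mark_digit and the square-bracket markers are my own choices; on screen you would more likely use colour or bold.

```python
def mark_digit(digits, target, left="[", right="]"):
    """Wrap every occurrence of target so it stands out in plain text."""
    return "".join(f"{left}{ch}{right}" if ch == target else ch for ch in digits)


sample = "172634950980273849"
print(sample.count("4"))        # how many 4s there are
print(mark_digit(sample, "4"))  # the same string with the 4s marked
```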

Thursday Jul 30, 2009

For those not in the know, Sun Ray is a thin client technology provided by Sun; the clients have no local disk storage and are totally stateless.

These very smart, very low-power machines (4 watts!), available standalone or integrated into a monitor, allow you to keep a session (Solaris, Windows, Linux) running on the server and access it from any machine into which you insert your Java Card.

This allows you to do some work, pull out your card, walk over to a colleague's desk or a meeting room, insert your card and pull up the same session.

Now that the intro is done . . . Microsoft have embraced this technology at their Enterprise Engineering Center (EEC). More info here.

Now the other piece of cool news:  Sun Ray Soft Client is now available as part of the Sun Ray Software 5 Early Access program:

The Sun Ray Soft Client is a software application that easily installs on common client operating systems and provides the ability to connect to a Sun Ray server and initiate a Sun Ray desktop session from a Windows laptop or desktop computer. The Sun Ray Soft Client also provides the flexibility to 'hotdesk' to and from your Sun Ray thin client and any supported Sun Ray Soft Client enabled PC. Currently available for Windows only.

Wednesday Jul 22, 2009

Firstly thanks to all who helped with this event, from the OUG folks, through to the presenters and various committee members.  It was a great event, with lots of good content and interesting discussions.

From a personal perspective, thanks to all who attended my slot and gave me such good feedback. I'll be adding some blog posts shortly covering "Best Practices for Data Visualisation" for those who missed it.

With the next event appearing shortly on the horizon, please let me know if there's anything Essbase or Hyperion reporting related you'd like to see covered in October. No schedule yet, but check out the OUG Hyperion site for details shortly.

This blog copyright 2009 by Thin Slice