Georg Edelmann's Weblog Georg's Weblog

Friday Apr 04, 2008

Today was the last day of Open Repository 2008. I chose to attend the Fedora User Group session today, which centred around users describing real projects they undertake with Fedora. Right up my alley.

Out of the three presentations, I want to highlight the last one. Marcos Santoz, a research assistant from the University of Koblenz-Landau, Germany, talked about project TAS3. TAS3 stands for "Trusted Architecture for Securely Shared Services". What an ominous title !!! The team used Fedora to build a system with the following goals :

  • manage lifelong generated personal information for individuals in the employability and e-Health sector.
  • support "sticky policies" and "break the glass" (I'll explain this later with an example)
  • support lots of different data standards (see their website for details).

Pretty abstract, right ? It was for me, until Marcos talked us through two use cases. Here's number one. A doctor treats a severely wounded patient who has just arrived in the emergency theatre. The doctor needs the patient's medical record. She accesses the patient record repository, and learns that she does not have the access privileges to the patient's medical history ("sticky policies"). She then tries to access the record a second time, which triggers an audit trail review, which, if successful, clears our doctor for access ("break the glass").
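
To make "break the glass" a bit more concrete, here is a minimal Python sketch of how I picture the flow. It is not TAS3 code: the `RecordStore` class, the `review` callback standing in for the audit trail review, and all identifiers are my own assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RecordStore:
    """Toy record store; every name here is hypothetical, not part of TAS3."""
    records: dict                             # record id -> content
    allowed: dict                             # record id -> users the policy permits
    review: Callable[[str, str, str], bool]   # stand-in for the audit trail review
    audit_log: list = field(default_factory=list)

    def read(self, user, record_id, break_glass=False, reason=""):
        if user in self.allowed.get(record_id, set()):
            self.audit_log.append((user, record_id, "granted"))
            return self.records[record_id]
        if break_glass and self.review(user, record_id, reason):
            # Emergency override: access is granted, but logged for follow-up.
            self.audit_log.append((user, record_id, "break-glass: " + reason))
            return self.records[record_id]
        self.audit_log.append((user, record_id, "denied"))
        raise PermissionError(f"{user} may not read {record_id}")


store = RecordStore(
    records={"patient-1": "allergic to penicillin"},
    allowed={"patient-1": {"family-doctor"}},
    # Assumption: the review passes whenever a reason is stated.
    review=lambda user, record_id, reason: bool(reason),
)
```

A plain `store.read("er-doctor", "patient-1")` raises `PermissionError`, while the same call with `break_glass=True` and a reason returns the record and leaves a trace in `audit_log`. The point of the pattern is that the override is never silent.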

The second use case was less of a life-and-death scenario. A graduate applies for a position at a company. Our graduate's learning records are maintained via IMS-LIP, yet our hiring manager's system works only with HR-XML. Our graduate wants to grant the hiring manager access to his learning records. TAS3 will take care of the format and protocol mediation, thus allowing the hiring manager to look at our graduate's data without leaving his environment.

That was it. My OR2008 is over. What a blast. I learned just enough to appreciate what I don't know. What I do know is that whatever tough problems are being solved here, Honeycomb and Sun's technology line-up make a perfect platform for open repositories.

It's good-bye Southampton from me. See you at OR2009 in Atlanta, Georgia.

Today is "User Group" day at Open Repository 2008. This is where the three main open-source communities, Eprints, DSpace and Fedora Commons, talk about their initiatives.

I started the day listening to the presentation from Michelle Klimpton, executive director of the DSpace foundation. She walked us through the foundation's plans for promoting DSpace and raising awareness of it. They have done or will do webinars, maintain discussion lists and the DSpace website, attend and present at community gatherings, organise and facilitate training events, create marketing materials, etc. Looking forward, they want to create a Global Outreach Committee, a forum that takes the foundation's material and localizes it for each region. Michelle was looking for volunteers to support this effort. Contact her if you are interested. She also wants to get more involved in the coordination of user group meetings. What was my takeaway from her talk ? She has a "we'll do whatever it takes to facilitate an active community" approach. It might sound obvious, but an active community does not happen by accident. It's people like Michelle who work feverishly in the background to grease the wheels. Thanks, Michelle and team.

Next up was the chief developer of DSpace, Scott, talking about the new features in DSpace 1.5.

There were a couple of significant improvements for the DSpace code base. The main ones are :

  • Manakin: a new tool to create web graphical user interfaces for DSpace.
  • Maven : the Apache build manager for Java projects, Maven, is now being used to build DSpace.
  • Improved workflow : the way you get content into the repository was made more flexible, and can therefore be adjusted more easily to the business processes of DSpace users.
  • Improved browsing : better use of searches and indexes.

There were also some major "under the hood improvements" :

  • SWORD : integration of the ingestion protocol SWORD.
  • Lightweight Network Interface : protocol for managing content in DSpace. It's kind of what SWORD does.
  • Event system : notifies listeners by events whenever an object changes.
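
For readers who have not met an event system before, the idea can be sketched in a few lines of Python. This is a generic listener pattern, not DSpace's actual API; `EventBus`, the event names and the object identifier are made up for the example.

```python
from collections import defaultdict


class EventBus:
    """Generic publish/subscribe helper (hypothetical, not DSpace code)."""

    def __init__(self):
        self._listeners = defaultdict(list)  # event type -> list of callbacks

    def subscribe(self, event_type, listener):
        self._listeners[event_type].append(listener)

    def publish(self, event_type, object_id):
        # Call every listener registered for this kind of change.
        for listener in self._listeners[event_type]:
            listener(event_type, object_id)


bus = EventBus()
seen = []
bus.subscribe("modify", lambda kind, oid: seen.append((kind, oid)))
bus.publish("modify", "item-42")  # the listener records the change
bus.publish("delete", "item-42")  # nobody listens for deletes, so nothing happens
```

The attraction of this design is that the code changing an object does not need to know who cares about the change, for example a search indexer or a notification service.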

In Scott's Q&A session, he got quite a deluge of questions. Here's a small sample :

  • Q: How did the server performance change under DSpace 1.5 ?
  • A: We've seen better performance in areas of ingestion. Batch ingestion tests moved ingestion times from half a day to one hour.
  • Q: What hardware do I need to run DSpace ?
  • A: There's guidance in the manual. You do need more than half a GB of memory.
  • Q: What did you measure in terms of performance ? What's the metric ?
  • A: Scott measures user experience with JMeter, and batch ingestion, as mentioned before.

I swapped sessions, and jumped over to the Eprints user group. The talk was planned to be around "Research Assessment Experience", but when it was about to start, a group decision was made to extend the previous session and talk about new features in Eprints 3.1. Chris Gutteridge, the fast-speaking Eprints chief developer, answered questions about features. I noticed quite a shift towards using Eprints plugins to add new features. Chris' main reason for that was that the Eprints system administrators at Southampton University have to jump through hoops to revise Eprints core services. Adding bug fixes and plugins is much less complex. It was interesting to observe how Chris balances his responsibilities as an employee of Southampton University and his leading role in the Eprints community.

The next Eprints session was on the experiences of Eprints users going through a Research Assessment Exercise. From the RAE website, you will learn that the RAE is :

The Research Assessment Exercise is conducted jointly by the Higher Education Funding Council for England (HEFCE), the Scottish Funding Council (SFC), the Higher Education Funding Council for Wales (HEFCW) and the Department for Employment and Learning, Northern Ireland (DEL). The primary purpose of the RAE 2008 is to produce quality profiles for each submission of research activity made by institutions.

In a nutshell, this is when the universities are audited by a governing body in the UK. A positive audit result has a positive impact on grants and funding, hence the heightened level of focus in this area. Money talks. As this was not an Eprints technology session, I was about to walk out, when the speaker was announced as being from the Open University, an institution I regard highly. So, I stayed and listened. After all, as a supplier, knowing the "scary monsters" of your customers is key.

Here are a couple of challenges I picked up :

  • Finding hard evidence for the audit. Example : an art installation that is based on rolling barrels. This was a transient display, and there was hardly any lasting evidence after the installation was dismantled. RAE did not have a category for that.
  • Librarians were not skilled in using Eprints, and needed to be trained. After being trained, the good people left. Their market value significantly increased. One of the "culprits" was in the audience and was pointed to by the presenter. Everybody laughed.
  • Out-of-the-box metadata for Eprints were insufficient for RAE. Needed customisation e.g. added a field to store the physical location of an arts item. This comment came from a Kingston University employee.
  • Researchers needed to be organised to use the repository in line with what the RAE wanted.

I spent the afternoon listening to a series of Fedora presentations. First up, my esteemed Sun colleague, Eric Reid. Eric's job at Sun is working with open-source communities, making sure their technologies integrate and work well on Sun. His presentation was entitled "Fedora and Honeycomb : A New Buzz in Creating and Managing Large Scale Digital Archives". Eric spoke about the systems (or better, platform) aspects of Fedora Commons. He explained to the audience what a Honeycomb system is, what it can do, and how it works. If you want to learn all that yourself, start here. The key points he wanted to make around Honeycomb were its object data store paradigm and its RAIN (Redundant Array of Inexpensive Nodes) architecture. He got quite a reaction from the crowd when he mentioned that the calculated "mean time between data loss" is 2 million years. This is music to the ears of people who are in charge of long-term preservation.

Eric made a great point about the integration. The LLStorePlugin for Fedora is available today, and glues the storage back-end interface with the Honeycomb API, thus making Honeycomb a first class storage citizen for Fedora. Note: this interface is for content only. Fedora's metadata store is not yet kept on Honeycomb. We're working on that.

Eric got a bunch of questions after his talk. Here are some of them:

  • Q: What happens when it breaks ? Disks do not live for 2 million years ?
  • A: Honeycomb takes care of this via self-healing, and flagging the faults to the administrator for attention.
  • Q: How do you backup ?
  • A: I prefer the notion of multiple Fedora data stores for availability, but there is also an NDMP backup interface.
  • Q: How many objects can you store in a 64TB Honeycomb ?
  • A: There are no practical limits to the number of objects that can be stored.
  • Q: 64TB per rack is too much. Do you have anything smaller ?
  • A: Yes, a half-cell Honeycomb at 16TB is available today. We have even smaller solutions on the roadmap. We can talk about these under NDA.

By the way, Santosh, a repository research engineer from Microsoft, sat next to me. His presentation feedback was : "This is good stuff. Let me see how we can integrate Microsoft SQL server with Honeycomb". We agreed to exchange Email on this topic. I pointed him to the Honeycomb API documentation and the Honeycomb emulator for further studies. Cool.

Next up, Andreas Aschenbrenner, State and University Library Goettingen, talked about "Using Fedora to manage complex objects". He made some very interesting points on how institutions want to share repository infrastructure. I think he referred to what we would call "cloud storage capacity" or "Storage as a Service" à la Amazon S3, or on-demand storage for repositories. I liked his approach here. The image that came to mind was that of a network-based grid repository service that a department can buy storage capacity from, and grow as its data needs expand.

Ben O'Steen, Oxford University, talked about the Fedora-based architecture. For me, this was by far the most exciting presentation of the whole week. Here's why. In my mind, he solved all the architectural problems everybody had spoken about for the last three days. Here are the highlights :

  • Scalability : objects can be placed in any object store on the network, and located via their object metadata. This means that scaling over multiple Fedora instances is a no-brainer. Need more storage capacity ? Just add another Fedora instance with another Honeycomb. Need more server resources ? Just add another virtual Fedora server in a VMware/Solaris/xVM container.
  • Open remote access : They chose UUIDs to identify objects, which are exposed to the world via these unique identifiers. This includes the RDF relationships of the objects. Ben showed a demo where he created a blog entry about a paper in the Oxford repository. As the blog entry was posted, the repository picked up the fact that an object was being linked to, and added this fact to the metadata store. Cool.
  • Extend functionality : They use the JMS interface, which interfaces via iCAL for any scheduling needs, e.g. scheduling virus scans or logging user logins.
  • Ingest : ingesting content via a staging archive, which can be used to moderate data, e.g. cleaning up duplicates, and then move the "clean" data to the real archive.
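
The staging idea in the last bullet is easy to picture in code. Here is a rough Python sketch under my own assumptions: duplicates are detected by comparing SHA-256 digests of the raw bytes, and each accepted object gets a random UUID as its identifier. I don't know how the Oxford team actually detects duplicates, and `promote` is a name I invented.

```python
import hashlib
import uuid


def promote(staging, archive):
    """Move items from a staging list into the archive dict, skipping duplicates.

    `staging` is a list of byte strings, `archive` maps UUID strings to bytes.
    Both structures are illustrative stand-ins for real repositories.
    """
    seen = {hashlib.sha256(data).hexdigest() for data in archive.values()}
    for data in staging:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # byte-identical content is already in the archive
        seen.add(digest)
        archive[str(uuid.uuid4())] = data
    return archive


archive = promote([b"thesis.pdf bytes", b"scan.tiff bytes", b"thesis.pdf bytes"], {})
```

After this call the archive holds two objects, not three, because the repeated item is dropped before it ever reaches the "real" archive.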

He also explained the object relationship model using the example of how a book is stored. A page of a book (think a TIFF file of the scanned page) links to a chapter, which then links to the book. All via RDF, and queryable in their XML form. These relationships can be defined on the fly.
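
Here is a tiny sketch of that relationship model in plain Python, with triples as tuples. The identifiers and the `isPartOf` predicate are assumptions for illustration, and a real system would use an RDF store and a query language rather than a list.

```python
# Each triple is (subject, predicate, object).
triples = [
    ("page:17", "isPartOf", "chapter:2"),
    ("page:18", "isPartOf", "chapter:2"),
    ("chapter:2", "isPartOf", "book:1"),
]


def containers(item, triples):
    """Follow isPartOf links upward; return the containers of `item`, nearest first."""
    found = []
    current = item
    while True:
        parents = [o for s, p, o in triples if s == current and p == "isPartOf"]
        if not parents or parents[0] in found:
            return found  # top reached, or a cycle detected
        current = parents[0]
        found.append(current)


def parts(container, triples):
    """Return everything that links directly to `container`."""
    return [s for s, p, o in triples if p == "isPartOf" and o == container]
```

Calling `containers("page:17", triples)` walks from the page to its chapter and then to the book, and `parts("chapter:2", triples)` lists the pages of the chapter. Defining a relationship "on the fly" is just appending another triple.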

That was it for me for the day. The evening was a formal awards dinner. After the meal, the winners of the Open Repository Challenge were announced. Guess who won the $5000 prize ? Ben O'Steen and team. In the last two days, Ben and team whipped up the code to exchange the content of an Eprints repository with a Fedora Commons repository. That's what I call interoperability. Well done, Ben and team.

Day 2 started with a series of extremely interesting presentations on national and international perspectives of open repositories, followed by six talks describing a wide selection of scientific repositories. The afternoon was occupied by talks about models, architectures and frameworks, and a section on usage. I'll try to pick a representative sample of the day here.

Simon Coles from the University of Southampton gave a fun-packed talk about his experiences with repositories and blogs in laboratories. His R4L project aims to address the gap between actual laboratory experiments and the publication of papers. I got quite a bit of contextual understanding of academic life from his talk. Here's one example. Simon said : "40 years ago a PhD student would determine 3 crystal structures during the course of their study, this can now be done in one day." Now, that's what we call data explosion !

Christian Gumpenberger, Novartis, gave the audience a deep understanding of the trials and tribulations of introducing an Eprints-based pharmaceutical repository corporate-wide at Novartis. His talk stood out for me, as it was one of the very few sessions in which a commercial entity took it upon itself to organize its knowledge in a repository. Project OAK (Open Access to Knowledge) was a master class on how to navigate the corporate world when it comes to implementing a central knowledge data bank. Challenges were many, most of which were in keeping the project going after a successful start. On the technology side, Eprints was, according to Christian, the right choice for Novartis. A good thing to say when you present at the "Home of Eprints".

I then jumped onto the "Models, Architecture and Framework" track for two sessions. One of them was a presentation by Herbert van de Sompel, Los Alamos National Laboratory (LANL), on the aDORe Federation Architecture. This was a brilliant talk. Herbert explained how LANL designed and implemented an architecture to federate repositories for scale. Scale at LANL means 100 million objects in 9200 repositories. Massive scale, I'd say. Tons of ideas popped into my mind here. I could see how one could build hardware platform building blocks that would support the idea of scaling repositories by federation in a completely transparent manner.

My last session of the day was entitled "MESUR: Implications of usage-based evaluations of scholarly status for open repositories" by Johan Bollan, Los Alamos National Laboratory. Just reading the show brochure, this looked like a less interesting topic. Statistics and numbers, right ? Not so. Johan is a skilled presenter, and his fast-paced talk was a blast. The project mined a wide choice of journals and, via their citations, created a graphical model of how the publications (and therefore the sciences) interconnect. Very interesting. For me, their work was one of the best visualisations of huge datasets I have ever seen. Check out the project's website.

Before I forget, I also attended the Microsoft session from Lee Dirks, Santosh Balasubamanian and Savas Parastatidis. We had met the guys in the hotel earlier the same day, and got talking. The folks are working on a research project around using Microsoft technologies for repositories. Built on top of Microsoft SQL Server, the demos that Santosh and Savas showed were impressive and centered around the ease of development for repository software. From what I have seen during the last couple of days, this is probably the most complete development environment, even at this early stage of the project. It does require the developer to stay within the well-padded Microsoft environment, and as the question and answer session illustrated, cross-platform (read non-Windows) deployment does present a challenge. What did surprise me was the presenters' sincere commitment to being open. Have the winds shifted to a more open-source attitude at Microsoft ? I wondered.

This was a long day. Off to the pub for some well-deserved pints of London Pride.

Wednesday Apr 02, 2008

This week, I have the pleasure to attend Open Repository 2008 in Southampton, UK. Labelled the "International Conference of Open Repositories", it brings together researchers, scientists, repository managers, software architects and IT providers from all over the world. From our part of the industry, I've met folks from Sun, Microsoft and HP. This is a very mixed crowd, and I'll talk about this aspect later for a bit.

Over the next few blog entries, I would like to share some of my impressions from the show.

Tuesday morning's keynote was given by Peter Murray-Rust from Cambridge University, UK. My biggest takeaway was his point that the PDF data format is bad, even evil, when it comes to mining the document content with machines. He promoted the idea of using Microsoft Word document formats instead. I wonder if his intent was to support the adoption of any industry-standard format, e.g. ODF... or maybe just anything better than PDF.

The following series of sessions was called "Web2.0", which surprised me as a topic for this kind of show to such an extent that I attended all the presentations. The first one was on the issue of inter-repository authentication via OpenID. Interesting talk. The project is called Connotea. I took an action to find out how this all plays with Sun's Identity Management solutions.

The next talk was about project SNEEP "Social Networking Extensions for Eprints" from Richard Davis of the University of London. The notion here was to use Web2.0 practices like blogs, comments, bookmarks and tagging for content in repositories. Why ? Mainly, to enrich the material by encouraging the participation of the users. Interesting thought, but I wonder if the community is ready to make a leap forward here.

A note about the domain of institutional repositories. These are the people who are in charge of ingesting, maintaining, managing, and preserving a sometimes phenomenal amount of data. One customer I spoke to talked about 80 million objects (think images, theses, research papers, experimental data, etc), stored within 9200 distinct repositories. When asked about the storage capacity he required, he mentioned "multiple petabytes", growing at steep rates. Wow.

The first day of the show ended with a preamble to the poster sessions called "Minute Madness". Every poster session presenter got 60 seconds to introduce the topic of his/her poster. There was a clock counting down behind the presenter, which turned red when going negative. Fun to watch the presenters trying to cram as much into the 60 seconds as possible.

The subsequent poster session was an informal gathering of the attendees over a beer. I spoke to the folks who developed Manakin, a tool to develop user interfaces with DSpace. Pretty powerful stuff. I was impressed by the flexibility Manakin provides to the user when customizing their web interface to a DSpace repository. There's a demo here.

Sun had a poster stand that demonstrated Honeycomb's fit as a data repository platform. We got tons of interest. Gail Truman, our Honeycomb product manager, was visibly exhausted. Well done, Gail.

Day one ended with a quiet dinner at our hotel.

Sunday Mar 16, 2008

You have probably heard the term "Liquid Gold" in lots of different contexts, e.g. oil, the melted precious metal, water in the desert, etc. Here in New England, when winter's breath is shallow, and spring has not yet woken up, liquid gold has a distinct meaning. It's the magic that happens when you pierce a maple tree to draw its juice, and boil it until the sugar content reaches 66-67 percent to create maple syrup.

This weekend, Kimberley and I had a great time creating our very own liquid gold. We tapped 7 trees with 12 taps, and collected about 35 gallons of maple tree sap. Boiling the sap in a home-made evaporator kept us occupied over both days. I'm just about to finish the process on our kitchen stove. I expect 4-5 pints of the finished product today.

Here comes the science bit. Sap turns into maple syrup when its boiling point reaches 7.1 degrees Fahrenheit above the boiling point of water, which depends on your elevation and local barometric pressure. Today, at our home, this is 216 degrees Fahrenheit.
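
The arithmetic is simple enough to put in a few lines of Python. The 7.1 degree offset and the 216 degree figure come from the paragraph above. Reading 216 as the syrup target rather than the boiling point of water is my interpretation, and the function names are made up.

```python
SYRUP_OFFSET_F = 7.1  # degrees Fahrenheit above the boiling point of water


def syrup_finish_temp_f(water_boiling_point_f):
    """Temperature at which the boiling sap counts as syrup."""
    return water_boiling_point_f + SYRUP_OFFSET_F


def implied_water_boiling_point_f(finish_temp_f):
    """Work backwards from a syrup target to the local boiling point of water."""
    return finish_temp_f - SYRUP_OFFSET_F


sea_level_target = syrup_finish_temp_f(212.0)          # water boils at 212 F at sea level
local_water_bp = implied_water_boiling_point_f(216.0)  # what a 216 F target implies
```

In practice you would measure the boiling point of plain water on the day and add the offset, since both elevation and the weather move it around.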

The not-so-scientific approach is to stick a cold spoon into the boiling sap and watch it drip from the spoon. If it is "sheeting", your syrup is ready. Of course, licking the spoon is part of the process, which makes this my preferred testing method.

For centuries, alchemists tried to turn all kinds of materials into gold without success, when all the time the real liquid gold was just a boil away.

Friday Mar 14, 2008

Like every year, our local town calls upon its townspeople to step up and vote on a set of articles that were published beforehand in the town report. Articles are very concrete, well-documented proposals, mainly around raising funds for causes like a new police cruiser, a replacement for the bridge that was washed away in the floods two years ago, or the creation of a reserve fund for the costs that the town will incur in the 2010 revaluation exercise. You get the idea. In most cases, it means your personal property taxes will go up if you vote "Yes".

Our town meeting was this week, and for the first time, Kimberley and I attended. It was an astounding experience from a couple of angles.

Let's talk about the format of the meeting first. Imagine a decent-sized town hall, with about 150 attendees, eager to get going. You need some form of process to make sure you get a decision on the 20 or so articles in a timely and, most importantly, predictable manner. I was thinking about how this could be done before the meeting, but never envisioned the well-oiled machine I would encounter. The system that was used is called "Robert's Rules of Order". Suffice it to say that Robert's Rules provide a superb framework for the meeting. Have a look at the website, it's impressive. I bought the book already.

Secondly, there was the high degree of direct power we asserted over the future of our town. Money does talk, and we had our voice. Next time I see our local police officer drive past in his new cruiser, I'll know I paid for a piece of the $33,000 machine. Good stuff, and so much more direct than the income tax we pay to the IRS.

Lastly, I was especially interested in the articles that were voted down. Like this one. The planning board asked for an additional $10,000 to pay for a traffic study at one of the local intersections. One could sense that the crowd had no desire to spend any more money on this issue. I actually think people did not see that there was a problem to start with, despite the passionate pleas of the representative from the planning board.

Oh, there was a bit of a bonus as well. The state capital newspaper covered the event, and Kimberley and I ended up on the front page of the Concord Monitor the day after the meeting. Check out the picture below, we're the couple on the balcony just left of the centre.

Thursday Feb 28, 2008

I just finished listening to the excellent and highly entertaining web event labeled "Sun's Open Archive Solution" brought to you by Sun's Executive Vice President John Fowler.

One notion that really got me thinking was that of customers requesting to keep data forever. Forever is a long time. And if, like me, you have a friend with a PhD in Geology, a really, really long time. In our industry, innovation brings rapid change, and more often than not this means that there's absolutely no guarantee that the most critical data you have today can be processed by the technology we have in 10 years, let alone forever. As an example, I have here in my hands a floppy disk written by Wordstar running on CP/M some 25 years ago. If I were to need this data, I'd have a hard time getting at it, and that's only 25 years ago.... hardly "forever".

Therein lies the challenge if you are in the data archiving business, as Sun is with the Sun StorageTek 5800 (aka Honeycomb). How do you implement "forever" ?

If you believe this BBC article, a 5,500-year-old piece of pottery carries the oldest writing, which today's scientists believe indicated the content of the jar. Think about it. Some archaeologist in the year 7,500 A.D. unearths my floppy disks. What would he believe the content to be ? To give our archaeologist a fighting chance, we need to give him the ability to retrieve the data, to understand the data format, and/or a way to interpret the data. If you as a technology provider keep all this intelligence to yourself, you surely have no right to claim "forever" as your goal. In that light, open-sourcing Honeycomb makes perfect sense.

Good luck, my future archaeologist.

Friday Aug 10, 2007

From time to time, I do get a bit homesick for England. I've got a lot of great memories from "Dear Old Blighty". In another such moment, I came across one of those numerous "English for Americans" websites, which made me laugh. I'm still regularly caught using English phrases with my American friends. Here's one for you folks to figure out. What did this gentleman email to his friend ?

"Hi, the wife and I went out for some posh nosh last night, followed by a knees up. We were a bit late, so we gave it welly. At the roundabout, just before the restaurant, we met our mate Gary and had a bit of a chin wag. He said he flogged his juggernaut to his mate for six thousand quid. That throws a spanner in the works for us. We wanted to use it to move our trolleys. What a load of cobblers.

The grub at the Indian was the bee's knees. The geezers had some original stout. Had 6 pints of it and got sloshed. Woke up knackered at Her Majesty's pleasure. The misses had to give me a lift. You can imagine how big a wobbly she threw. We had an almighty row.
Anyway, skived from work today. Gave Gary another bell. He told me I can have his other lorry for a couple of days. Hunky-dorey, we don't need to shell out the dosh for a hire car after all.
Come round later for a pint at the pub. Let's suss out when we can shift that gear.
Cheers, mate."

Comment if you think you know what's going on here.

Wednesday Jul 25, 2007

When I recently had to put together a skill building plan for storage technologies, I came across this excellent article about the basics of SAN. This is for you if you want to get your bearings on the key concepts of SAN and Fibre Channel. Worth a read.

Kudos goes to Tim Thomas, whose blog is worth reading as well.

Tuesday Jul 17, 2007

Day 3. With the finishing line in sight, the day started with a StorageTek partner panel, followed by a customer panel and a presentation by Dan Berg, VP and CTO in EMEA.

I passed on the first slot of the breakout sessions today, to spend some time with my Sun storage peers. We had a great and sometimes heated discussion around the topic of ISVs in the context of Sun's storage business. We also compared notes on our lifestyles, especially interesting since we had participation from the UK, Germany and the US. Good fun, we laughed a lot. Thanks, Tim and Stefan.

My next presentation was on the capabilities of Sun's high-end disk array offering, the Sun StorageTek ST9900 family. This is a product Sun brings to market in partnership with Hitachi. The latest incarnation of this product is the Sun StorageTek ST9990V. The “V” represents the virtualization capabilities of this new model. In layman's terms, this allows you to divvy up a single ST9990V to make it look like several distinct, independent disk arrays. One use case for this was a bank that wanted to separate mission-critical data from the rest of the business.

But it was two numbers that caught my eye. The ST9990V is capable of 3.5M IOPS, and with 300 GB drives can reach a maximum capacity of 332 TB. Wow, that's a lot of storage.

The last session of the day was presented by Chris Wood, CTO of Sun's storage practice. Chris managed to set me straight on the terms data availability and data protection.

Chris defined “Data Availability” as : “data is accessible to the application whenever needed and at the required performance”. That obviously means that just because data is archived on tape does not mean it's available. It might rest safe and secure in a vault, miles away from any tape drive.

On data protection, he warned the audience about jumping to the answer too quickly. He urged us not to rush into the product pitch, and instead to ask our customers what they want their data to be protected from. Is it data loss, corruption, hackers, operator errors, bit rot ? Depending on the needs, your answer might be very different. I remember Chris using an example where a customer used tape backup to prevent data loss, yet a recent data corruption meant that the customer was backing up bad data again and again.

On the topic of data loss, Chris had a great sound bite. I've heard it before, but I still like it. It goes as follows. There are only two types of disks : those that have failed, and those that are about to.

Chris had some excellent customer examples. In one of them, he showed how a customer used SAMFS to consolidate their backup strategy onto two SL8500s, and reclaim 21% of floor space in their data centre. Cool stuff.

That was the last session. Storage Academy 2007 is over. I had a great time, and can't wait for next year's event.

Day 2 of the Storage Academy was all about learning in breakout sessions. There were eight sessions happening in parallel. Choosing a session turned out to be a difficult task. There was so much great material to be explored that I had a hard time picking the one that I thought was right for me. Then, I recalled my guiding principle : “pick the topics that you know least about”.

With that in mind, I first picked Christian Bandulet's presentation labeled “Understanding Object Storage”. What a great choice that was. I only had a rough idea what Object Storage was through my exposure to Honeycomb (aka Sun StorageTek 5800), and the freely available Honeycomb emulator, but I definitely lacked context here. Christian corrected this in a presentation that managed to mix theory and implementation of CAS (content aware storage) well.

My big takeaway from this session : the world's data consists mainly of unstructured, fixed content (email, for example), which would be better stored in objects that manage the data and its metadata and can be stored and retrieved autonomously in a flat hierarchy. By doing so, we can build storage that can behave intelligently. For example, instead of asking a disk array to retrieve a block, we can now query storage for all the X-ray images of patients with early signs of breast cancer who have lived in New Hampshire since 2005 and were incorrectly diagnosed. And by the way, we want these images to be converted into JPG format on the fly.
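
To show the difference between asking for a block and asking for objects by what they are, here is a minimal Python sketch of a flat object store with metadata queries. It is a toy of my own and has nothing to do with the Honeycomb API; `ObjectStore`, its methods and the metadata keys are all assumptions.

```python
import uuid


class ObjectStore:
    """Toy flat object store: every object is data plus metadata, found by id or query."""

    def __init__(self):
        self._objects = {}  # flat namespace: object id -> (data, metadata)

    def put(self, data, **metadata):
        object_id = str(uuid.uuid4())
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id):
        return self._objects[object_id][0]

    def query(self, **criteria):
        # Return the ids of all objects whose metadata matches every criterion.
        return [
            object_id
            for object_id, (_, meta) in self._objects.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        ]


xrays = ObjectStore()
first = xrays.put(b"...image bytes...", kind="xray", state="NH", misdiagnosed=True)
second = xrays.put(b"...image bytes...", kind="xray", state="VT", misdiagnosed=False)
matches = xrays.query(kind="xray", state="NH", misdiagnosed=True)
```

Here `matches` contains only the id of the first image. There are no directories and no block addresses anywhere, which is the whole point: the caller describes what it wants and the store works out where it lives.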

Next up was my dear friend Tim Thomas' presentation aptly named “Storage ISV Solutions for Sun's Breakthrough Storage Products”. I leave my report blank here in the hope Tim will blog about his presentation himself. He said he would (or was it that we said he should :-). Anyway, watch Tim's blog here.

In the spirit of learning about technology I know little about, I attended a session on performance tuning of Sun's storage library, the SL8500. Jacques Villain and Steve Johnson talked us through the recent performance improvements of the SL8500 in comparison with the 9310 (aka Powderhorn). Jacques also explained how the SL8500 can be partitioned. I guess virtualization has made its entrance into the tape library world as well. I have to say, high-end enterprise tape library technology is a world of its own for me. Beware here: never make the mistake of assuming that tape libraries sit at the opposite end of the spectrum from bleeding edge. Tape is still the most cost-effective long-term storage medium. But cost is just one dimension. More and more compliance laws are being released that force enterprises to keep more and more data for longer and longer periods of time. And when it comes to capacity (the SL8500 can hold up to 2,048 tape drives) and reliability (2,000,000 mean exchanges/swaps between failures per HandBot), the SL8500 is a formidable force. The sheer engineering competence needed to design a device like the SL8500 is mind-boggling.

I ended the day listening to Peter Brouwer's presentation on Sun's Common Array Manager (CAM). What I took away from this presentation is that CAM is the single place to manage all Sun storage disk devices easily and efficiently. I also learned that CAM has a “profile” feature which allows the storage to be pre-configured for a specific workload. Imagine you have an OLTP type of workload with an Oracle database. Just select “Oracle_OLTP”, and CAM will configure RAID level 1, a segment size of 512KB, and enable read ahead. Don't like what CAM chose? Just clone the profile, change it and store the profile under your own name. Of course, now you can apply your profile to all other arrays. Easy.
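For the programmers among you, the clone-and-change idea can be sketched in a few lines of Python. This is only my mental model of the feature, not how CAM implements it: the `StorageProfile` class, its field names and the helper functions are all hypothetical. Only the three settings of “Oracle_OLTP” come from the presentation.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable


@dataclass(frozen=True)
class StorageProfile:
    """Hypothetical model of a storage profile; field names are assumptions."""
    name: str
    raid_level: int
    segment_size_kb: int
    read_ahead: bool


# The settings mentioned in the talk: RAID 1, 512KB segments, read ahead enabled.
ORACLE_OLTP = StorageProfile("Oracle_OLTP", raid_level=1, segment_size_kb=512, read_ahead=True)


def clone_profile(base: StorageProfile, new_name: str, **changes: object) -> StorageProfile:
    """Copy a profile under a new name, overriding only the given settings."""
    return replace(base, name=new_name, **changes)


def apply_profile(profile: StorageProfile, arrays: Iterable[str]) -> Dict[str, StorageProfile]:
    """Assign the same profile to every array in the list (array names are made up)."""
    return {array: profile for array in arrays}
```

Calling `clone_profile(ORACLE_OLTP, "My_OLTP", segment_size_kb=128)` gives a new profile that differs from the original only in name and segment size, and the original stays untouched.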

The evening event was a dinner for all attendees of the Storage Academy. I sat with a group of Sun reseller partners and Sun employees from Denmark. These guys know how to have fun. That's all I can say :-)

To bed at 1am. I am very tired. Can't wait for day 3.


Thursday Jul 12, 2007

I have the great pleasure of attending this week's Storage Academy in Frankfurt, Germany. I thought I would use this medium to process some of what I learned and the impressions I gathered today, and I hope this is of interest to you.

Firstly, a couple of words about the location and event organization. The Storage Academy EMEA takes place at the Sheraton Hotel next to Frankfurt's main airport. From my hotel room, I can overlook the main terminal and the runways of the airport. This is great, I love it. The hotel is a perfect location for a conference of this size (approximately 600 Sun employees and partners are present). For attendees arriving by air, it's a short walk from the terminal. No need for a rental car. I like that. The hotel staff is clearly experienced in catering for an event of this size. Top marks here for everything related to event infrastructure.

Today was packed mainly with a series of general sessions from Sun's top storage brass and a list of Sun's key storage partners. This was a lot of stuff to digest. Some of it was new, some not, but all of it very interesting. Here are some highlights.

We heard again about the importance of Sun Storage to the overall business success of Sun as a systems company. Sun is organized around the 4S (four Esses): System, Storage, Software, Service. All of them contribute to Sun's key mission of solving customer problems through innovation, and plenty of innovation can be found in the storage world. Just look at FISHworks, Honeycomb, Thumper and ZFS.

Within the storage "S", the huge product portfolio can be categorized as tape, disk and breakthrough. I'll blog some other time about the three categories.

A big thing in storage land is thin provisioning. Hu Yoshida, CTO of Hitachi Data Systems, gave us an excellent insight into how the new ST9990V can use thin provisioning to increase storage utilization. Great talk, Hu, thanks for that.
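If thin provisioning is new to you, the core trick is that a volume advertises more capacity than it physically owns, and real blocks are only taken from a shared pool when data is first written. Here is a toy Python sketch of that idea. It is my own simplification, not the ST9990V implementation, and the `ThinPool` and `ThinVolume` classes are hypothetical.

```python
class ThinPool:
    """Hypothetical shared pool of physical blocks."""

    def __init__(self, physical_blocks: int) -> None:
        self.physical_blocks = physical_blocks
        self.data = {}  # physical block number -> bytes

    @property
    def allocated(self) -> int:
        """Number of physical blocks handed out so far."""
        return len(self.data)

    def allocate(self) -> int:
        """Hand out the next free physical block, or fail if the pool is full."""
        if self.allocated >= self.physical_blocks:
            raise RuntimeError("pool exhausted: add physical capacity")
        block = self.allocated
        self.data[block] = b""
        return block


class ThinVolume:
    """Hypothetical volume that promises virtual_blocks but allocates on first write."""

    def __init__(self, pool: ThinPool, virtual_blocks: int) -> None:
        self.pool = pool
        self.virtual_blocks = virtual_blocks
        self._map = {}  # virtual block number -> physical block number

    def write(self, vblock: int, data: bytes) -> None:
        if not 0 <= vblock < self.virtual_blocks:
            raise IndexError("block outside the advertised volume size")
        if vblock not in self._map:
            # First write to this block: only now is physical space consumed.
            self._map[vblock] = self.pool.allocate()
        self.pool.data[self._map[vblock]] = data

    def read(self, vblock: int) -> bytes:
        # Blocks that were never written consume no space and read back empty.
        if vblock not in self._map:
            return b""
        return self.pool.data[self._map[vblock]]
```

A volume of 100 virtual blocks on a pool of 4 physical blocks works fine as long as no more than 4 distinct blocks are ever written, which is where the gain in utilization comes from.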

In the breakout session, I chose a presentation about a topic I knew absolutely nothing about: Sun VTL. Oversimplified, VTL is a server/storage/OS/application combo that pretends to be a tape drive. It takes backup data and stores it on disk, with the option to move it to tape later. Great solution if you are getting close to exhausting your backup window.

I also managed to spend a bit of time at the partner pavilion. I got a demo of G10's IP video surveillance solution, based on Sun Fire x4500. They use the Thumper for video stream processing and for recording the streams into video files. Tons of data, tons. What a great use of the x4500. What struck me as a great feature was the capability to do video recognition on programmable patterns. As an example, I was shown a video of an attacker raising his hand to punch an innocent victim. The motion of the aggressor making a fist and moving it towards the face of the victim was detected by the application and marked with a red rectangle on the screen. If you are a police officer sitting in front of a wall of surveillance screens, this could easily alert you in real time to a wrongdoing. Endless other possibilities for how this can be used came to mind without having to think too hard (a hallmark of any great innovation).

That's it for now. I'm getting tired, and I want to be fully awake for another learning day at Storage Academy tomorrow. Would not want to miss a beat here.

Thursday Mar 22, 2007

I admit, I am a fan of UNIX tools.   But what does that mean?   Here's my interpretation:

**   A tool does one thing only, and does it properly.

**   If you need to do more complex things, chain a set of tools together.   After all, that's what pipes were invented for.

**   Command line tools are best   (granted, OpenOffice is cool).

**   Tools come as source tar balls with simple Makefiles.   If you can't find an existing tool, take one that is as close as possible to your needs and change it to fit.

**   Compilers are your friends.

**   Graphical desktops are ok, because they allow you to open a bunch of terminals to start more tools at the same time.

**   Documentation comes in the form of a "-h" option of the tool.   If that's not enough, "Use the source, Luke".

You might think that in today's times of modern GUIs and their slew of colorful applications, this is an old-fashioned, ineffective view of the world.   Not so, I would like to propose.

Let me give an example.   Recently, I needed a tool to figure out the network bandwidth between two machines with different CPU architectures: SPARC and x86.   Luckily both systems run Solaris (actually, with over 4 million Solaris downloads this probably has little to do with luck alone).   One was a Sun Blade 150 machine, the other one an old Dell PC.

Where would you start?   Oh well, I guess you start where everybody starts with anything nowadays:   Google.   Try it for yourself...   google for "UNIX tool to measure network bandwidth" and three clicks later you find yourself at http://dast.nlanr.net/Projects/Iperf/ reading about Iperf.

It took me 5 minutes to determine that this is the tool that fits my needs.   I downloaded the sources, unpacked them on both machines, changed "Makefiles.rules" to use the free Studio 11 C/C++ compilers, ran "iperf -s" on one box and "iperf -c

92.6 Mbit/s.

TTQtoA (Total Time from Question to Answer):  20 min
Software Costs (incl. all tools):             $0
Job satisfaction:                             High

Iperf, one example of the right tool for the job.

If you can beat this time, please let me know.   I'd love to learn how.
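For the curious, the kind of measurement such a tool makes is simple at heart: push a known number of bytes through a TCP connection, time it, and convert the result to Mbit/s. Below is a rough Python sketch of that idea over the loopback interface. It is not how Iperf is implemented, and the function names, the default transfer size and the chunk size are all my own assumptions.

```python
import socket
import threading
import time


def mbit_per_s(num_bytes: int, seconds: float) -> float:
    """Convert a byte count and an elapsed time into megabits (10^6 bits) per second."""
    return num_bytes * 8 / seconds / 1_000_000


def measure_loopback(total_bytes: int = 8 * 1024 * 1024, chunk: int = 64 * 1024) -> float:
    """Send at least total_bytes over a local TCP connection and return Mbit/s."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = [0]  # filled in by the receiving thread

    def sink() -> None:
        # Receiving side: accept one connection and count bytes until it closes.
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk)
                if not data:
                    break
                received[0] += len(data)

    receiver = threading.Thread(target=sink)
    receiver.start()

    payload = b"\0" * chunk
    start = time.perf_counter()
    with socket.create_connection(("127.0.0.1", port)) as client:
        sent = 0
        while sent < total_bytes:
            client.sendall(payload)
            sent += chunk
    receiver.join()  # wait until the receiver has drained everything
    elapsed = time.perf_counter() - start
    server.close()
    return mbit_per_s(received[0], elapsed)
```

Calling `measure_loopback()` only tells you how fast one machine can talk to itself, so the number says nothing about a real network link. For that, the two halves would have to run on different boxes, which is exactly what the server and client modes of a tool like Iperf are for.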

Tuesday Aug 29, 2006

Dentists are not system admins

Thursday May 25, 2006

Type "define:office" into the search engine at Google and you get:

place of business where professional or clerical duties are performed;

If that is true, my office today is the San Jose International Airport in San Jose, California. How come?

Over the last two days, I attended a planning meeting for the next financial year. This is about working on the goals for my team. Today was the day I was supposed to return home. The plan was to fly from San Jose to Chicago O'Hare and connect to a short flight to Manchester, New Hampshire, for a late arrival - a trip I've done several times before. When I arrived at the airport this morning, I was told that my flight to Chicago was delayed by 4 hours, meaning that I will miss my connecting flight home. I will have to stay in Chicago for the night to catch an afternoon flight to Manchester tomorrow. Equipped with my laptop, a T-Mobile wireless connection, my Sun email on IMAP over SSL, and my cell phone, I've now been working for 3 hours from a waiting bench in the departure lounge of the airport. This is a weird experience. I _am_ productive, there are few interruptions, food and drink are within easy reach, and there is just nothing I need for my work that I do not have.

I guess we need to redefine the word "office" to

any time, any place where professionals are connected to perform business duties

Welcome to the participation age.