The Sun BabelFish Blog

Don't panic !

Sunday Dec 18, 2005

Limitations of iPhoto: how RDF can help

My mother has pushed iPhoto to its limits and beyond. She has taken so many pictures over the last couple of years that it brought iPhoto to a crawl, filled up her disk, and nearly made her lose all her photos as she struggled to move some of them onto an external hard disk.

The problem with iPhoto (and for that matter iTunes) is that it does not make it easy to move the storage of content from the internal hard drive of a laptop to an external hard drive, whilst maintaining the content browseable in a transparent manner. For end users it should not be important where the content lives. The application should be able to signal if the information is off-line by greying out the representation of the content (be it a photo or an iTunes song title), and be able to notify the user which storage medium needs to be placed online to access the data. Moving data from one storage medium to another should be as easy as just moving all the files directly using Explorer or the command line mv command, or clicking on the files in the media browser (think of iTunes or iPhoto as just specialised media browsers) and asking them to be archived somewhere. The media browser should also immediately recognise duplicate content, and if possible use the content that is easiest or cheapest to access.

In order for this to work at both the file system and the browser levels one probably does need to keep track of file metadata. Most file systems now have the ability to do this [1]. Each file needs some unique tag, like the atom id [2]. Moving a media file from one location to another would then be expressible as relations on the id. The media browser would be able to identify content by its id, keep track of where content has moved to, keep track of duplicates, and maintain other metadata about the file in its index even if it no longer has access to the file itself. iTunes and iPhoto would then just become specialised media browsers for information located anywhere: on the local hard drive, on an external hard drive (such as a small iPod or a larger 500GB external backup drive), or on the web at a WebDAV location (such as an iDisk), or even a simple http url writeable via the Atom protocol, ftp or any other well known method.
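To make the idea of an id-based index a little more concrete, here is a minimal sketch in Python. It is only an illustration of the bookkeeping described above, not how iPhoto or iTunes work. The class names (MediaEntry, MediaIndex), the urn:uuid style id and the file paths are all invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MediaEntry:
    """Everything the browser knows about one piece of content."""
    media_id: str                                  # unique tag (hypothetical format)
    metadata: dict = field(default_factory=dict)   # kept even when no copy is reachable
    locations: set = field(default_factory=set)    # every URL where a copy lives


class MediaIndex:
    """Hypothetical index keyed by content id rather than by file path."""

    def __init__(self):
        self.entries = {}

    def add_copy(self, media_id, url, **metadata):
        """Record that a copy of the content lives at url."""
        entry = self.entries.setdefault(media_id, MediaEntry(media_id))
        entry.locations.add(url)
        entry.metadata.update(metadata)
        return entry

    def move(self, media_id, old_url, new_url):
        """A move only changes a location; the id and metadata stay put."""
        entry = self.entries[media_id]
        entry.locations.discard(old_url)
        entry.locations.add(new_url)

    def duplicates(self):
        """Ids that have more than one known copy, with their locations."""
        return {mid: sorted(e.locations)
                for mid, e in self.entries.items() if len(e.locations) > 1}

    def is_online(self, media_id, mounted_prefixes):
        """True if at least one copy sits under a currently mounted prefix."""
        prefixes = tuple(mounted_prefixes)
        return any(url.startswith(prefixes)
                   for url in self.entries[media_id].locations)
```

A browser built on something like this could grey out an item whenever is_online returns False, and still show its title from the metadata it kept in the index.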

So let us look more carefully at file metadata. When metadata is attached to a file it is attached as a property value pair. So as explained in the Ars Technica article [3], you can list the properties on a file using the separately downloadable xattr utility:

% xattr --list file.txt
file
        color   blue
         name   John

If you think about it this is just the same as saying

<./file.txt> color "blue".

I.e. the property value pairs are properties of the file.txt file, referred to via a relative url. Once we make the 'implicit' subject of the property value pair explicit we have a triple. The nice thing about the explicit subject format is that it is then possible to keep metadata about data that is not stored locally. So it becomes possible to store metadata for files anywhere on the web. For example on one's ipod [4]:

<file:/myipod.pod/Songs/SadAndLonesome.mp3>   artist  "John Lee Hooker".
<file:/myipod.pod/Songs/SadAndLonesome.mp3>   album "Sad and Lonesome".
<file:/myipod.pod/Songs/SadAndLonesome.mp3>   genre "blues".

or in short N3 notation

<file://myipod.pod/Songs/SadAndLonesome.mp3> artist "John Lee Hooker";
                                             album "Sad and Lonesome";
                                             genre "blues".
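The step from property value pairs to triples, and from triples to the short notation, is mechanical. The following is a rough Python sketch of it. The function names pairs_to_triples and to_short_n3 are made up for illustration, property names are still plain words rather than URLs, and quotes inside values are not escaped, so the output mirrors the notation used in this post rather than strictly valid N3.

```python
def pairs_to_triples(subject, pairs):
    """Make the implicit subject of each property value pair explicit."""
    return [(subject, prop, value) for prop, value in pairs]


def to_short_n3(triples):
    """Group triples by subject and join their properties with ';'."""
    by_subject = {}
    for subject, prop, value in triples:
        by_subject.setdefault(subject, []).append((prop, value))

    blocks = []
    for subject, props in by_subject.items():
        head = f"<{subject}> "
        indent = " " * len(head)          # align continuation lines under the first property
        lines = [f'{prop} "{value}"' for prop, value in props]
        blocks.append(head + (";\n" + indent).join(lines) + ".")
    return "\n".join(blocks)


# The pairs listed for file.txt earlier, now with an explicit subject.
triples = pairs_to_triples("./file.txt", [("color", "blue"), ("name", "John")])
print(to_short_n3(triples))
```

Because the subject is now part of every triple, the same two functions work unchanged for a subject such as the ipod url above.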

All we now need is to make the properties be URLs (or URIs) and we have RDF [5]. The advantage of this is huge:

  • URLs are universally unique: no name clashes
  • by clicking them one can get their meaning (eg the foaf:knows relation)
  • they can be created in a distributed fashion (no central naming authority bottleneck other than that already provided by DNS)
  • RDF is an open standard given to us by the most successful standard organisation around: the world wide web consortium
  • It automatically comes with a query language
  • It is very well defined (full mathematical support, see the RDF Semantics and OWL Semantics)
  • Above all else it is easy to understand (you don't need to understand the maths above to get it, all you need is what I explained in the paragraphs above)
  • Very easy to use: command line wise you would just need to allow the user to define a number of name space abbreviations in order to make it easier to read the output. Something like:
    export RDF_NAMESPACE=foaf[http://xmlns.com/foaf/0.1/]:doap[http://usefulinc.com/ns/doap#]
  • It is all designed with inferencing built in. No need to retag a whole bunch of properties with a new name. You can just add the triple p1 owl:sameAs p2 in your database. But best of all: inferencing is not required! So you can start immediately, in the knowledge that you are working in a framework that has a lot of room to grow.
  • efficient: you don't need to store URLs for every property of course. The file system would just store some unique number that points to an index into a url table. As long as the user only sees the urls and it is consistent, there is no problem.
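Two of the points above, the name space abbreviations and the url table, are easy to sketch in Python. This is only an illustration under my own assumptions: RDF_NAMESPACE is the hypothetical environment variable suggested in the list, and parse_namespaces, abbreviate, expand and UrlTable are names invented here, not part of any existing tool.

```python
import os
import re


def parse_namespaces(spec):
    """Parse 'prefix[url]:prefix[url]' into a dict. The brackets do the work,
    since ':' also appears inside the urls themselves."""
    return dict(re.findall(r"(\w+)\[([^\]]+)\]", spec))


def abbreviate(url, namespaces):
    """Shorten a property url to prefix:local using the longest matching name space."""
    best = None
    for prefix, ns in namespaces.items():
        if url.startswith(ns) and (best is None or len(ns) > len(best[1])):
            best = (prefix, ns)
    if best is None:
        return f"<{url}>"                 # no abbreviation known: show the full url
    return f"{best[0]}:{url[len(best[1]):]}"


def expand(qname, namespaces):
    """Turn prefix:local back into the full url."""
    prefix, _, local = qname.partition(":")
    return namespaces[prefix] + local


class UrlTable:
    """Store each property url once and hand out small numbers for it."""

    def __init__(self):
        self._ids = {}
        self._urls = []

    def intern(self, url):
        if url not in self._ids:
            self._ids[url] = len(self._urls)
            self._urls.append(url)
        return self._ids[url]

    def lookup(self, number):
        return self._urls[number]


# Read the (hypothetical) variable if it is set, otherwise start with no prefixes.
namespaces = parse_namespaces(os.environ.get("RDF_NAMESPACE", ""))
```

The file system would only ever store the numbers handed out by intern, while the user sees either the full url or its abbreviated form.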

So there is clearly a continuity between file metadata and RDF that it would be crazy not to make use of. For one it could help make browsers such as iPhoto or iTunes a lot more useable by my mother, and do so in an open way that works seamlessly with the whole architecture of the web. And I am only scratching the surface of what is possible here.

So perhaps Hal Stern is correct. Web 2.0 is the read write web + metadata.

Notes:

  1. Some pointers for metadata on various file systems:
    • Apple has reintroduced metadata support as explained in the excellent Ars Technica review of OSX 10.4
    • Microsoft is going to be releasing a metadata enabled file system in the next version of Windows, WinFS
    • Solaris has xattrs metadata support (I learnt recently): try the runat(1) man page. (Thanks to Tim Foster for the reference.) And so will the next version of NFS.
    • There is also an interesting article on metadata and ReiserFS
  2. see section 4.2.6 of RFC 4287.
  3. page 7.
  4. I am not sure what the correct url for a device such as an ipod would be.
  5. Not surprisingly RDF stands for Resource Description Framework, and a file is the most basic form of resource in computing. Perhaps the only difference between RDF and property value pairs is that property value pairs usually allow only one property of a type per subject, whereas RDF allows for multiple properties of a type per subject. But this should be something one can work one's way around.

Comments:

Try Picasa.

Posted by richard kenyon on December 18, 2005 at 06:05 PM CET #

Yep, xattrs for Solaris work fine - have a look at the runat(1) man page...

Posted by Tim Foster on December 18, 2005 at 11:22 PM CET #

Curious about how many photos you are talking about as well as the system specs. I have about 20,000 pics on my apple, and iphoto doesn't even blink at it. All pics are at least 5 megapixels as well (most are 6-7).

Posted by 68.239.173.246 on December 19, 2005 at 02:31 AM CET #

My mother has just gone to Salzburg, Austria for Xmas. So I don't have the exact details handy right away. I have not upgraded her to OSX Tiger yet, as I have been travelling quite a lot lately.

But if her drive is like my hard drive and we average 1.5MB per picture, then 1000 pictures would be 1.5GB and 10 thousand pictures would be 15GB. Now if you add to that at least 40GB of music in iTunes digitised from their whole life of CD collections, then we are at 55GB of disk used up on the laptop, which is not enough for her to make movies of my nephew Joshua and Louis, which she spends a lot of time on, and which takes up a lot of space on the disk.

She does not really listen to all her music, so it would be nice if she could just store the music she likes less on an external hard drive. But there is no easy way to explain how to do this to her. The same is true of her photos. If she could just place all the non starred photos she has on a bulk drive, she could save a lot of space. Again, there is no easy way to explain how to do this. I can copy directories over and make symbolic links to it, but this is not going to be very stable, is not well integrated into the whole user experience and is certainly not something she is going to be able to follow.

Some PC dude took over as her system admin when I was travelling and decided the best way to do things was to browse the files via the file system using explorer. Well that is clearly one way to do things, and one that may seem immediately obvious to PC people who clearly don't have that much experience in ease of use. But I think Apple can do better, and so can the Unix crowd as a whole: by using metadata in a more general way, creating an index of this metadata in the way Apple has done with Spotlight (though I would have a SPARQL interface on the data of course), one could have something very general, that escapes from the limits of the hard drive and encompasses the whole internet as one medium. So I think this is worth exploring whatever commercial solution I find for her provisionally.

I'll post some more details about the number of her photos when I know it. And I'll also see how using Tiger helps.

Posted by Henry Story on December 19, 2005 at 03:51 AM CET #
