The Sun BabelFish Blog
Don't panic !
limitation of iPhoto: how RDF can help
My mother has pushed iPhoto to its limits and beyond. She has taken so many pictures over the last couple of years that it brought iPhoto to a crawl, filled up her disk, and nearly made her lose all her photos as she struggled to move some of them onto an external hard disk.
The problem with iPhoto (and for that matter iTunes) is that it does not
make it easy to move the storage of content from the internal hard drive
of a laptop to an external hard drive, whilst keeping the content
browsable in a transparent manner. For end users it should not be
important where the content lives. The application should be able to
signal if the information is off-line by greying out the representation
of the content (be it a photo or an iTunes song title), and be able to
notify the user which storage medium needs to be placed online to access
the data. Moving data from one storage medium to another should be as
easy as just moving all the files directly using Explorer or the command
line mv command, or clicking on the files in the media
browser (think of iTunes or iPhoto as just specialised media browsers)
and asking them to be archived somewhere. The media browser should also
immediately recognise duplicate content, and if possible use the content
that is easiest or cheapest to access.
In order for this to work at both the file system and the browser levels one probably does need to keep track of file metadata. Most file systems now have the ability to do this [1]. Each file needs some unique tag, like the Atom id [2]. Moving a media file from one location to another would then be expressible as relations on the id. The media browser would be able to identify content by its id, keep track of where content has moved to, keep track of duplicates, and maintain other metadata about the file in its index even if it no longer has access to the file itself. iTunes and iPhoto would then just become specialised media browsers for information located anywhere: on the local hard drive, on an external hard drive (such as a small iPod or a larger 500GB external backup drive), or on the web at either a WebDAV location (such as an iDisk), or even a simple HTTP URL writeable via the Atom protocol, FTP or any other well known method.
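To make the idea concrete, here is a minimal sketch in Python of an index keyed by content id. It is only an illustration: the class and method names (MediaIndex, add_copy, move, duplicates, offline) and the example ids and URLs are all invented here, and a real media browser would of course persist the index and obtain the ids from the file metadata.

```python
from dataclasses import dataclass, field


@dataclass
class MediaEntry:
    """Index record for one piece of content, keyed by its unique id."""
    media_id: str
    metadata: dict = field(default_factory=dict)
    locations: set = field(default_factory=set)  # URLs of the known copies


class MediaIndex:
    """Hypothetical index: ids map to metadata and to every known location."""

    def __init__(self):
        self.entries = {}

    def add_copy(self, media_id, url, **metadata):
        """Record that a copy of media_id lives at url."""
        entry = self.entries.setdefault(media_id, MediaEntry(media_id))
        entry.locations.add(url)
        entry.metadata.update(metadata)
        return entry

    def move(self, media_id, old_url, new_url):
        """A move only changes the locations related to the id."""
        entry = self.entries[media_id]
        entry.locations.discard(old_url)
        entry.locations.add(new_url)

    def duplicates(self):
        """Ids that are stored in more than one place."""
        return {i for i, e in self.entries.items() if len(e.locations) > 1}

    def offline(self, is_reachable):
        """Ids with no reachable copy; a browser could grey these out."""
        return {i for i, e in self.entries.items()
                if not any(is_reachable(u) for u in e.locations)}


# Example with made-up ids and URLs.
index = MediaIndex()
index.add_copy("urn:example:photo-1", "file:///laptop/Pictures/joshua.jpg",
               title="Joshua")
index.move("urn:example:photo-1",
           "file:///laptop/Pictures/joshua.jpg",
           "file:///backup/Pictures/joshua.jpg")
```

The metadata stays in the index after the move, so the browser can still show the title even when the backup drive is unplugged.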
So let us look more carefully at file metadata. When metadata is attached to a file it is attached as a property value pair. So as explained in the Ars Technica article [3], you can list the properties on a file using the separately downloadable xattr utility:
% xattr --list file.txt
file
color blue
name John
If you think about it this is just the same as saying
<./file.txt> color "blue".
I.e. the property value pairs are properties of the file.txt file, referred to via a relative URL. Once we make the 'implicit' subject of the property value pair explicit we have a triple. The nice thing about the explicit subject format is that it is then possible to keep metadata about data that is not stored locally. So it becomes possible to store metadata for files anywhere on the web. For example on one's iPod [4]:
<file:/myipod.pod/Songs/SadAndLonesome.mp3> artist "John Lee Hooker".
<file:/myipod.pod/Songs/SadAndLonesome.mp3> album "Sad and Lonesome".
<file:/myipod.pod/Songs/SadAndLonesome.mp3> genre "blues".
or in short N3 notation
<file://myipod.pod/Songs/SadAndLonesome.mp3> artist "John Lee Hooker";
    album "Sad and Lonesome";
    genre "blues".
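The step from property value pairs to triples is small enough to sketch in a few lines of Python. The function names to_triples and to_n3 are made up for this illustration, the output uses the same bare property names as above (so it is not yet real N3), and no escaping of quotes in values is attempted.

```python
def to_triples(subject, pairs):
    """Make the implicit subject of each property value pair explicit."""
    return [(subject, prop, value) for prop, value in pairs.items()]


def to_n3(triples):
    """Write triples in the short notation, grouping by subject with ';'."""
    by_subject = {}
    for subject, prop, value in triples:
        by_subject.setdefault(subject, []).append((prop, value))
    blocks = []
    for subject, prop_values in by_subject.items():
        body = ";\n    ".join(f'{p} "{v}"' for p, v in prop_values)
        blocks.append(f"<{subject}> {body}.")
    return "\n".join(blocks)


# The pairs that xattr listed for file.txt, now with an explicit subject.
triples = to_triples("./file.txt", {"color": "blue", "name": "John"})
print(to_n3(triples))
```

Because the subject is just a string holding a URL, the same two functions work unchanged for a file on an iPod or anywhere on the web.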
All we now need is to make the properties be URLs (or URIs) and we have RDF [5]. The advantage of this is huge:
- URLs are universally unique: no name clashes
- by clicking them one can get their meaning (eg the foaf:knows relation)
- they can be created in a distributed fashion (no central naming authority bottleneck other than that already provided by DNS)
- RDF is an open standard given to us by the most successful standard organisation around: the world wide web consortium
- It automatically comes with a query language
- It is very well defined (full mathematical support, see the RDF Semantics and OWL Semantics)
- Above all else it is easy to understand (you don't need to understand the maths above to get it, all you need is what I explained in the paragraphs above)
- Very easy to use: command line wise you would just need to allow the user to define a number of namespace abbreviations in order to make the output easier to read. Something like:
export RDF_NAMESPACE=foaf[http://xmlns.com/foaf/0.1/]:doap[http://usefulinc.com/ns/doap#]
- It is all designed with inferencing built in. No need to retag a whole bunch of properties with a new name. You can just add the triple p1 owl:sameAs p2 in your database. But best of all: inferencing is not required! So you can start immediately, in the knowledge that you are working in a framework that has a lot of room to grow.
- efficient: you don't need to store URLs for every property of course. The file system would just store some unique number that points to an index into a URL table. As long as the user only sees the URLs and it is consistent, there is no problem.
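Two of the points above, namespace abbreviations and the URL table, can be sketched in Python as follows. This is only an illustration under assumptions: the RDF_NAMESPACE format is the "something like" suggestion from the list and not an existing convention, and parse_namespaces, abbreviate and UrlTable are names invented here.

```python
import re


def parse_namespaces(spec):
    """Parse 'prefix[url]:prefix[url]' into a {prefix: url} dict."""
    return dict(re.findall(r"(\w+)\[([^\]]*)\]", spec))


def abbreviate(url, namespaces):
    """Shorten a property URL to prefix:name for display when a prefix matches."""
    for prefix, base in namespaces.items():
        if url.startswith(base):
            return f"{prefix}:{url[len(base):]}"
    return f"<{url}>"


class UrlTable:
    """Intern property URLs so that storage only needs a small integer."""

    def __init__(self):
        self._numbers = {}  # url -> number
        self._urls = []     # number -> url

    def intern(self, url):
        """Return the number for url, assigning a new one on first sight."""
        if url not in self._numbers:
            self._numbers[url] = len(self._urls)
            self._urls.append(url)
        return self._numbers[url]

    def lookup(self, number):
        """Return the URL that a stored number stands for."""
        return self._urls[number]


namespaces = parse_namespaces(
    "foaf[http://xmlns.com/foaf/0.1/]:doap[http://usefulinc.com/ns/doap#]")
table = UrlTable()
stored = table.intern("http://xmlns.com/foaf/0.1/knows")
print(abbreviate(table.lookup(stored), namespaces))
```

The file system would only ever store the small number, while the user sees either the full URL or its abbreviated form.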
So there is clearly a continuity between file metadata and RDF that it would be crazy not to make use of. For one it could help make browsers such as iPhoto or iTunes a lot more usable by my mother, and do so in an open way that works seamlessly with the whole architecture of the web. And I am only scratching the surface of what is possible here.
So perhaps Hal Stern is correct. Web 2.0 is the read write web + metadata.
Notes:
[1] Some pointers for metadata on various file systems:
  - Apple has reintroduced metadata support, as explained in the excellent Ars Technica review of OSX 10.4
  - Microsoft is going to be releasing a metadata enabled file system, WinFS, in the next version of Windows
  - Solaris has xattr metadata support (I learnt recently): try the runat(1) man page (thanks to Tim Foster for the reference). And so will the next version of NFS.
  - There is also an interesting article on metadata and ReiserFS
[2] See section 4.2.6 of RFC 4287.
[3] Page 7.
[4] I am not sure what the correct URL for a device such as an iPod would be.
[5] Not surprisingly RDF stands for Resource Description Framework, and a file is the most basic form of resource in computing. Perhaps the only difference between RDF and property value pairs is that property value pairs usually allow only one property of a type per subject, whereas RDF allows for multiple properties of a type per subject. But this should be something one can work one's way around.
Posted at 02:02PM Dec 18, 2005 [permalink/trackback] by Henry Story in SemWeb | Comments[4]

Posted by richard kenyon on December 18, 2005 at 06:05 PM CET #
Posted by Tim Foster on December 18, 2005 at 11:22 PM CET #
Posted by 68.239.173.246 on December 19, 2005 at 02:31 AM CET #
But if her drive is like my hard drive and we average 1.5MB per picture, then 1000 pictures would be 1.5GB and 10 thousand pictures would be 15GB. Now if you add to that at least 40GB of music in iTunes, digitised from a whole life's CD collection, then we are at 55GB of disk used up on the laptop, which does not leave enough room for her to make movies of my nephews Joshua and Louis, which she spends a lot of time on, and which take up a lot of space on the disk.
She does not really listen to all her music, so it would be nice if she could just store the music she likes less on an external hard drive. But there is no easy way to explain to her how to do this. The same is true of her photos. If she could just place all her non starred photos on a bulk drive, she could save a lot of space. Again, there is no easy way to explain how to do this. I can copy directories over and make symbolic links to them, but this is not going to be very stable, is not well integrated into the whole user experience and is certainly not something she is going to be able to follow.
Some PC dude took over as her system admin when I was travelling and decided the best way to do things was to browse the files via the file system using Explorer. Well that is clearly one way to do things, and one that may seem immediately obvious to PC people who clearly don't have that much experience in ease of use. But I think Apple can do better, and so can the Unix crowd as a whole: by using metadata in a more general way, and creating an index of this metadata in the way Apple has done with Spotlight (though I would have a SPARQL interface on the data of course), one could have something very general, that escapes from the limits of the hard drive and encompasses the whole internet as one medium. So I think this is worth exploring, whatever commercial solution I find for her provisionally.
I'll post some more details about the number of her photos when I know it. And I'll also see how using Tiger helps.
Posted by Henry Story on December 19, 2005 at 03:51 AM CET #