Tuesday Feb 12, 2008

Last week I went over to Bentley College to talk about recommender systems for a class on user-centered design. Not because I know anything about user-centered design - quite the opposite. I'm hoping that some of the students in the class will apply their user-centered design skills to help us improve the user experience of our recommender. It was fun talking to the class, Managing a User-Centered Design Team. The students asked great questions and seemed ready to dive in.

I've promised to leave some pointers to background reading on usability and recommender systems, so I'll be posting links in this blog post as I come across them.

The first article is Interaction Design for Recommender Systems by Kirsten Swearingen and Rashmi Sinha. This article presents a study of user interaction with 11 online recommender systems. Of particular interest are the observations about the role of transparency in recommendations. (Note also that author Rashmi Sinha has written some excellent articles about tagging as well: A Cognitive Analysis of Tagging, and A Social Analysis of Tagging.)

Update: Andre Vellino suggests Nava Tintarev's "A Survey of Explanations in Recommender Systems"

Thursday Feb 07, 2008

There's an ever-growing number of videos on the web. YouTube seems to have an endless supply of everything from professional documentaries to phonecam captures of the latest campus tasing. Despite this video glut, finding interesting videos is not so easy. My current approach is to go to YouTube or Digg and just watch the most popular videos of the day. It's not too satisfying. A 30-second video of a car crash has a much better chance of making these charts than a thoughtful, well-crafted film. Clearly tools for helping us explore this long tail of video will be increasingly important.

ffwd is a new service (still in private beta) that is hoping to help you discover web video. I've taken it for a quick spin. They have a pretty nifty enrollment process, where you click on shows that you like so they can get an idea of your taste. Once you've selected your shows they assign you a video personality based upon your selections (I'm a 'comedy writer'; apparently I like comedies). Once you are enrolled you can start to discover videos.

The video discovery is labeled as 'alpha' and they aren't kidding. The recommendations don't seem to be related to what I like at all. They offered reality shows (The Apprentice, Extreme Makeover Home Edition), weight loss success stories from San Antonio, and a 'sexy webcam girl' - nothing that I was interested in. The videos are coming from video sharing sites like iFilm and YouTube. Since they are in alpha, we can forgive them their recommendations, especially since they do seem to be looking for the right type of people to build their recommender. They have a job posting for an AI expert with skills like:

  • computational linguistics
  • collaborative filtering
  • behavioral profiling
  • relationship mapping
  • semantic clustering
  • symbolic systems
  • Bayesian statistics
  • feedback analysis
  • personalized search

That's a pretty good sign. I think ffwd is targeting a real void - video discovery on the web - and it will be interesting to see if they can do better than what we see now on Digg or YouTube. Oh yeah, the ffwd blog is pretty interesting too.

Monday Feb 04, 2008

In a new article in the Guardian, Jemima Kiss suggests that "If web 2.0 could be summarized as interaction, web 3.0 must be about recommendation and personalization". Jemima hits the nail on the head when she says that privacy, credibility and trust are more important than ever for the next generation of tools we'll use to find content on the web. She also wonders what the implications are for the music critic. Will the machines replace the music review? Personally, I don't think that will happen, but the dynamics of power certainly are changing. The listening habits of millions of people can be a strong indication of what is 'good music' - perhaps more so than the opinion of the few critics in the ivory towers.

Thursday Jan 31, 2008

I went to Amazon during lunch to buy a book recommended by Tim Spaulding over at the LibraryThing Thingology blog:

 

While I was there I noticed Amazon recommending this book:

 

which I couldn't resist. When I added it to my cart, I was also offered this recommendation:

 

which again, I couldn't resist. Amazon then suggested this book:

I really like some of the things that Elias has done with Processing - so I added that one to my cart too.

At that point, I quickly hit the 'Check Out' button before I went broke.  Amazon's recommender turned my $27 order into a $121 order.  Multiply that by a million and it is easy to see why companies like Amazon and Netflix are investing so much in recommender systems. 

Sunday Jan 20, 2008

tagurself is a widget that displays your interests as a tag cloud.  You give tagurself a URL to an APML, RSS, blog or web page and it will give you a web 2.0 tag cloud.  It works especially well with APML - for example here's the tagurself cloud based upon my last.fm music interests (using the tastebroker to generate my listening APML).

 

Wednesday Dec 19, 2007

If you go to one of the many social music sites out there and get 'similar artists' recommendations for Jimi Hendrix, you are likely to get a list such as the one you get from Last.fm:

 

There's no arguing that this is a good list - but it is also a rather diverse list.  Eric Clapton's blues guitar is quite different from the psychedelic acid rock of the Doors. I'd really like to know a bit more about the recommendations - in particular I'd like to know why a particular artist was recommended. This can help me gain trust in the recommender as well as help direct me to artists based on criteria that  are most relevant to me.  Unfortunately, most recommenders are based strictly on consumption habits, so the only recommendation explanation available is the Amazonian "People who listened to Jimi Hendrix also listened to Eric Clapton and the Doors"  - which is not too helpful for me.

We want to make recommendations transparent - so that you can ask 'why did you recommend this' and get a useful answer beyond the typical 'people who bought X also bought Y'. In order to generate transparent recommendations you need to have some understanding of the content - there has to be some way of knowing that Hendrix tracks typically contain distorted guitar, for example. Companies like Pandora, with their Music Genome Project, have spent years analyzing hundreds of thousands of tracks, assigning 400+ attributes to each track. This lets Pandora give those great explanations that make them so popular with their users:

Pandora can generate these excellent explanations because they've taken the time (and spent lots of money) to listen to and extensively label all of their music. Most companies won't have the time, money or patience to do this - and even Pandora, which is committed to this approach, can't keep up with the volume of new music that is generated every year. Luckily, there are other sources of content description that we can use to generate recommendations.

One such source is social tags. Social tags have been all the rage in the web 2.0 world. Sites like del.icio.us, Flickr and Last.fm demonstrate how such tags can be extremely useful for searching and organizing content, especially non-text content. Social tags can also be used to give us good transparent recommendations.

Here are the most frequent social tags applied to Jimi Hendrix at Last.fm:

 

Now tags like 'rock' and 'blues' occur rather frequently for many artists, so they are less descriptive than tags like 'guitar god' which occur infrequently across artists. We can take this into account and generate a list of the most distinctive tags for Jimi Hendrix. These are tags that occur rarely across the entire set of artists, but occur frequently for Hendrix. (This is the classic TF-IDF term weighting technique.) Here are the distinctive tags for Hendrix:

The less descriptive tags such as 'rock' have fallen off the list while extremely targeted tags like 'jimi hendrix' and 'acid rock' have risen to the top. This list is a much better description of Hendrix than the 'Frequent Tags' list and can serve as the basis for our transparent recommendations.
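To make the weighting concrete, here is a minimal sketch of TF-IDF applied to artist tags. The function name `distinctive_tags`, the data layout, and all of the counts are assumptions made up for illustration; they are not real Last.fm data, and this is not necessarily how our system computes the weights.

```python
import math
from collections import Counter


def distinctive_tags(artist_tags, artist, top_n=5):
    """Rank one artist's tags by TF-IDF.

    artist_tags maps artist name -> Counter of tag -> times applied.
    """
    n_artists = len(artist_tags)

    # Document frequency: how many artists carry each tag at all.
    df = Counter()
    for tags in artist_tags.values():
        df.update(tags.keys())

    counts = artist_tags[artist]
    total = sum(counts.values())
    scores = {}
    for tag, count in counts.items():
        tf = count / total                    # frequent for this artist
        idf = math.log(n_artists / df[tag])   # rare across all artists
        scores[tag] = tf * idf
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Invented tag counts, for illustration only.
artist_tags = {
    "Jimi Hendrix": Counter({"rock": 100, "blues": 60, "guitar god": 40, "acid rock": 30}),
    "Eric Clapton": Counter({"rock": 90, "blues": 80, "guitar god": 20}),
    "The Doors": Counter({"rock": 95, "psychedelic": 70, "acid rock": 10}),
    "Gary Moore": Counter({"rock": 50, "blues": 70}),
}

print(distinctive_tags(artist_tags, "Jimi Hendrix", top_n=3))
```

In this toy data every artist is tagged 'rock', so its IDF is zero and it drops to the bottom even though it is Hendrix's most frequent tag.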

With these distinctive tags, we can generate recommendations based upon the cosine distance of these distinctive tags to the distinctive tag sets of other artists.  This type of recommendation, based upon the similarity of distinctive tags, gives us surprisingly good results.  My colleague, and resident neologist, Steve, coined the term 'tagomendations' for recommendations based on the social tags.  Here are the tagomendations for Jimi Hendrix (ordered by artist popularity):

Interestingly (and surprisingly) our tagomendations compared favorably to recommendations generated with the more traditional collaborative filtering techniques when we evaluated them in a survey. And of course, with these tagomendations based upon distinctive tags we can now explain why a recommendation was made, based upon how the distinctive tags for two artists overlap.
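Here is a rough sketch of both halves of that idea: scoring two artists by the cosine of their distinctive-tag vectors, and explaining the match by listing the tags they share. It computes cosine similarity, which ranks artists the same way as cosine distance (distance is one minus similarity). The helper names and the tag weights are hypothetical, invented for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse tag-weight vectors (dicts)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def explain(seed, candidate, top_n=3):
    """Shared tags, strongest joint weight first: the 'why' behind a match."""
    shared = {t: seed[t] * candidate[t] for t in seed if t in candidate}
    return sorted(shared, key=shared.get, reverse=True)[:top_n]


# Hypothetical distinctive-tag weights, invented for illustration.
hendrix = {"guitar god": 0.9, "acid rock": 0.7, "blues rock": 0.5, "60s": 0.4}
clapton = {"guitar god": 0.8, "blues rock": 0.9, "british": 0.3}
doors = {"acid rock": 0.8, "60s": 0.7, "psychedelic": 0.9}

for name, tags in [("Eric Clapton", clapton), ("The Doors", doors)]:
    print(name, round(cosine_similarity(hendrix, tags), 3), explain(hendrix, tags))
```

The shared-tag list is what would drive an overlap tag cloud: the same seed artist yields a different explanation for each recommendation.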

For instance, for the Clapton recommendation, we can look at how the distinctive tags for Clapton and Hendrix overlap:


Clearly if you like Hendrix because of his blues guitar playing, you might want to give Clapton a listen. Compare this to the overlapping tags of The Doors and Hendrix:

This is quite a different vibe - with the focus on psychedelia of the 60s.

These tag clouds showing how the distinctive tags for recommended artists overlap with the seed artist give me the opportunity to explore the recommendations based on my taste. If I like Hendrix because I am a fan of face-melting guitar, I will quickly find artists like Joe Satriani, Gary Moore, or Steve Vai - but if the reason I like Hendrix is the 60s psychedelic vibe, I may find Jefferson Airplane or Steppenwolf more to my taste.

That's it in a nutshell - how we are using social tags to generate transparent, explainable recommendations. And by the way, to do the heavy lifting for our tagomendations we are using the text search engine called Minion, developed by the Advanced Search Technology group here at Sun Labs. Minion is a high quality, highly configurable search engine that is perfect for doing these types of experiments. Look for Minion; it's coming to an open source repository near you very soon.

Likewise, we hope to release our web-based music explorer that can generate transparent tagomendations soon.  Here's a little bit of what it looks like:

 


Monday Dec 03, 2007

Last week, Google quietly added recommendations to Google Reader. Now if you go to your Google Reader home page, you'll find a 'Top Recommendations' section:

Clicking on the 'view all' link brings you to a full set of recommendations:

You can preview a feed to decide whether or not the recommendation is a good one before subscribing. I like how they show how many subscribers each recommendation has - but that is just about the only feedback one gets about the recommendation. What I would really like is a Pandora-style "Why are we recommending this?" tab that could tell me why something is being recommended. Text like "We are recommending this blog because you like to read posts about music recommendation on the web, and this blog focuses on iLike, Last.fm, Pandora and other music recommender sites".

Google Reader doesn't just take my blog reading behavior into account when creating recommendations. It also uses my Google search history - so presumably if I've been searching and clicking on pages about a particular topic - say 'support vector machines' - Google could recommend a blog on machine learning.

The recommendations seemed good for the most part ... of course it is hard to evaluate the recommendations - there were no real clunkers, but no new favorites either.  Frankly, right now I am not looking to add more feeds to my collection of over 200.  What I could really use is something that goes through the feeds that I am already subscribed to and gives me the cream of the crop - and leaves the redundant or irrelevant behind.   

Friday Nov 30, 2007

Last week, I created a web service that will generate an APML profile based upon your last.fm listening habits (see the post here).  This week, I've added support for del.icio.us.  With this web service, you can get an APML profile based upon your tagging behavior.  For instance,  I can retrieve my Del.icio.us APML profile with:

 http://research.sun.com:8080/AttentionProfile/apml/web/plamere

This returns an APML file that looks like this:

 

<APML xmlns="http://www.apml.org/apml-0.6" version="0.6" >
<Head>
<Title>Taste for del.icio.us user plamere</Title>
<Generator>Created by TasteBroker.org </Generator>
<DateCreated>2007-11-30T05:30:34</DateCreated>
</Head>
<Body defaultprofile="web">
<Profile name="web">
<ImplicitData>
<Concepts>
<Concept key="android" value="0.16666667" from="tastebroker.org" updated="2007-11-30T05:30:34" />
<Concept key="attention" value="0.16666667" from="tastebroker.org" updated="2007-11-30T05:30:34" />
<Concept key="audio" value="0.16666667" from="tastebroker.org" updated="2007-11-30T05:30:34" />
<Concept key="bbc" value="0.16666667" from="tastebroker.org" updated="2007-11-30T05:30:34" />
<Concept key="blog" value="0.16666667" from="tastebroker.org" updated="2007-11-30T05:30:34" />
<!-- many lines omitted -->
</Concepts>
</ImplicitData>
</Profile>
</Body>
</APML>

Unfortunately, Del.icio.us doesn't provide any web services to get at this data directly, so I had to scrape the HTML to extract the concepts and counts. This makes the service quite fragile. The next time Del.icio.us changes its page layout, this may break. Hopefully, we can convince sites like Del.icio.us to start making their data available directly as APML so people like me don't have to wrestle with regular expressions on a Thursday evening. There's a bit more info about these web services on the tastebroker page.
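For anyone who wants to consume a profile like the one above, here is a small sketch that reads the concepts back out of an APML document using Python's standard library. The function name `read_concepts` is my own invention, and the sample string is just a trimmed copy of the output shown earlier; nothing here calls the web service.

```python
import xml.etree.ElementTree as ET


def read_concepts(apml_text):
    """Return (key, value) pairs from every Concept element, heaviest first."""
    root = ET.fromstring(apml_text)
    concepts = []
    for elem in root.iter():
        # Drop any '{namespace}' prefix so namespaced and plain files both work.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "Concept":
            concepts.append((elem.get("key"), float(elem.get("value"))))
    return sorted(concepts, key=lambda kv: kv[1], reverse=True)


# Trimmed copy of the sample profile, kept inline so the sketch is self-contained.
sample = """<APML xmlns="http://www.apml.org/apml-0.6" version="0.6">
<Body defaultprofile="web">
<Profile name="web">
<ImplicitData>
<Concepts>
<Concept key="android" value="0.16666667" from="tastebroker.org" updated="2007-11-30T05:30:34" />
<Concept key="attention" value="0.16666667" from="tastebroker.org" updated="2007-11-30T05:30:34" />
</Concepts>
</ImplicitData>
</Profile>
</Body>
</APML>"""

for key, value in read_concepts(sample):
    print(key, value)
```

Matching on the local tag name rather than a fixed namespace is a deliberate shortcut here, since not every generated profile declares the APML namespace.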
 

 

Wednesday Nov 21, 2007


Lately there's been quite a bit of attention paid to making sure that the data that describes the things that we like, our attention data, is portable. With portable attention data, we could go to any music store and be directed to the music that we are most likely to want to listen to. We won't have to spend any time rating tracks or artists; we'll just show the music store our taste data. Of course, this taste data needs to be in some standard format so that everyone can understand it. One effort at standardizing our taste data is APML. APML is an XML-based language that allows users to share their own personal taste data in much the same way that OPML allows the exchange of reading lists between blog readers. APML is new and not finished yet, but even in its infant state, it is garnering lots of support.

I am particularly interested in how APML could be used to represent an individual's music taste. One possibility is to have the APML file list the artists that a person likes (or vehemently dislikes). Another approach is to have the preferences be more abstract - to list weighted affinities toward music genres or styles. The latter approach seemed much more interesting to me - it offers a bit of privacy (instead of seeing Paris Hilton in my APML file, you would just see Female Pop Singer).

As an experiment, I've created a little APML generator web service for last.fm users.  If you give the web service your last.fm user name, the service goes to last.fm, retrieves data about your listening habits and generates an APML representation of your taste.  For example, to retrieve the APML for my listening tastes visit the following URL:

http://research.sun.com:8080/AttentionProfile/apml/music/lamere


This yields:

<APML version="0.6">
<Head>
<Title>music taste for lamere</Title>
<Generator>Created by TasteBroker.org in 2777 ms </Generator>
<DateCreated>2007-11-21T16:15:24</DateCreated>
</Head>
<Body defaultprofile="music">
  <Profile name="music">
  <ImplicitData>
   <Concepts>
       <Concept key="rock" value="1.0" from="tastebroker.org" updated="2007-11-21T16:15:24"/>
       <Concept key="alternative" value="0.74616855" from="tastebroker.org" updated="2007-11-21T16:15:24"/>
       <Concept key="indie" value="0.63257456" from="tastebroker.org" updated="2007-11-21T16:15:24"/>
       <Concept key="alternative rock" value="0.38583755" from="tastebroker.org" updated="2007-11-21T16:15:24"/>
   <!-- many lines omitted -->
  </Concepts>
  </ImplicitData>
</Profile>
</Body>

</APML>

To generate the implicit concepts, I gather the top 50 artists for the user, and for each of these artists I gather the top 50 tags that have been applied to that artist. I adjust the weight of the tags based on the user's affinity for the associated artist. I then take the top scores to generate the APML. Of course, all this chattering with last.fm can make the web service quite slow. I do try to cache as much data as I can to speed things up, but if you have eclectic tastes, it can take up to a minute or so to generate your APML file.
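The aggregation step can be sketched roughly as follows. This is only an illustration of the idea, not the service's actual code: the function name `taste_profile`, the use of play counts as the affinity measure, the scaling so that the strongest concept comes out as 1.0, and all of the numbers are assumptions. Fetching data from last.fm is left out entirely.

```python
from collections import defaultdict


def taste_profile(artist_affinity, artist_tags, top_n=4):
    """Combine per-artist tag weights into one normalized tag profile.

    artist_affinity: artist -> how much the user listens (e.g. play count).
    artist_tags: artist -> {tag: weight} for that artist's top tags.
    """
    scores = defaultdict(float)
    for artist, affinity in artist_affinity.items():
        for tag, weight in artist_tags.get(artist, {}).items():
            # Weight each tag by the user's affinity for the artist.
            scores[tag] += affinity * weight
    if not scores:
        return []
    # Assumption: scale so the strongest concept gets a value of 1.0.
    top = max(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(tag, score / top) for tag, score in ranked[:top_n]]


# Invented listening data and tag weights, for illustration only.
plays = {"Radiohead": 200, "Wilco": 100}
tags = {
    "Radiohead": {"rock": 1.0, "alternative": 0.9, "indie": 0.4},
    "Wilco": {"rock": 1.0, "indie": 0.8, "alt-country": 0.7},
}

for tag, value in taste_profile(plays, tags):
    print(tag, round(value, 3))
```

Each resulting pair maps naturally onto the key and value attributes of a Concept element.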

The resulting APML seems to be a good representation of my taste.  I'm interested in hearing from others that might be last.fm users whether or not the generated APML file is a good map of their taste.  Feel free to try out the web service.  The URL is:

http://research.sun.com:8080/AttentionProfile/apml/music/YOUR_LAST_FM_USER_NAME

The next step is to see how well we can generate recommendations based upon these APML files.

Thursday Nov 15, 2007

APML, the attention profiling mark-up language, is a standard for representing and exchanging attention data - data about what I like and what I don't like. Anyone who's involved in recommendation should be paying attention to APML. A good place to start is this article: Attention Profiling: APML Beginner's Guide

Saturday Oct 27, 2007

I'm starting to work on a new project (still related to music discovery and recommendation). Starting a brand new project is really fun. There are not too many times in a decade that I get to start with a completely blank slate. It's a great time for me to refactor my development process, learn about new tools and learn about new ways to do things. Still, sometimes all the newness can be a tad overwhelming ... here are all the new things just in the last couple of weeks:

  • New source code control (mercurial)
  • New version of netbeans
  • New version of GWT
  • New team mates
  • New coding standards (what are those '/**' and '*/' for?)
  • New project hosting platform
  • New organizational chart

And of course there are all sorts of new project-centric design challenges as well. It is all new, and it is a lot of fun.

 

Saturday Oct 20, 2007


Justin's Poster
Originally uploaded by PaulLamere.

120 or so folks who are keen on figuring out how to make recommendations have descended onto the campus of the University of Minnesota for the two-day RecSys'07 conference. It is a good mix of attendees: 47 participants have come from outside of the U.S., and 50 of the participants are from industry. There are 16 long papers (out of 35 submitted) and 14 short papers (out of 23 submitted).

The keynote, given by Krishna Bharat, was an excellent presentation on Google News, placing it into the wider context of news and journalism. Greg Linden has an excellent description of the talk.

 The first paper session on Privacy and Trust had me thinking quite a bit more about how people can get good recommendations without revealing too much about themselves. 

 The afternoon panel session included members of industry who opined about what issues were important to the commercial world.  Themes emerged about search vs. discovery vs. recommendation (just some semantic problems), APIs, portability of attention data, cross content recommendation, and the difficulty of evaluation.

I particularly enjoyed the poster session - I like being able to talk to folks about their research, and a poster session is the best way to do it.

Day 2 of the conference is about to begin - and I am the session chair for the first session, so I now need to read through a few papers so I can avoid a repeat of my worst conference moment.

Thursday Oct 18, 2007

In a few hours I hop on a plane to head to Minneapolis to attend ACM Recommenders 2007. I'm really looking forward to this two-day event, for the talks, for meeting up with old friends, and to finally meet f2f many folks that I've communicated with via email and Facebook over the last year.

Friday Oct 12, 2007

Inspired by me*dia*or (an aggregation of music technology blogs), I've created The Taste Blog.  The Taste Blog is an aggregation of my favorite technical blogs focusing on recommender systems. 

If you have a favorite tech blog about recommender systems, feel free to suggest it.

Thursday Aug 16, 2007

An oldie but goodie from The Onion:

Area resident Pamela Meyers was delighted to receive yet another thoughtful CD recommendation from Amazon.com Friday, confirming that the online retail giant has a more thorough, individualized, and nuanced understanding of Meyers' taste than the man who occasionally claims to love her, husband Dean Meyers. "I don't know how Amazon picked up on my growing interest in world music so quickly, but I absolutely love this traditional Celtic CD," Meyers said. "I like it so much more than that Keith Urban thing Dean got me. I'm really not sure what made him think I like country music."

This blog copyright 2009 by plamere