Thursday May 08, 2008

Notes from the Tech Talk at the SanFran Music Tech Summit. This panel was a discussion about the technology behind some of the most popular music sites. The moderator is Colin Brumelle.

photo.jpg

photo-1.jpg

Panelists:

  • Colin Brumelle - Moderator
  • Tom Conrad - CTO Pandora
  • Marc Urbaitel - CTO In-Ticketing
  • Shaun Haber - Warner Bros. Records - Director of operations, using Drupal to build an artist platform.
  • Jeremy Riney - Project Playlist, CTO, Founder
  • Jack Moffitt - Xiph, Chesspark - IM, gaming

Why did you choose a particular type of technology:

Marc - uses PHP - quicker turnaround time, lets them be much more nimble. Open source is good.

Shaun - open source CMS - chose Drupal: a big reason is active developer community.

Jeremy - also uses Drupal - Playlist is the largest Drupal user with 25 million active users.

Tom - they were on Java, Oracle and Jetty servlets for legacy reasons. Oracle was a disaster, so they ported it all to Postgres and re-implemented the Oracle procedures in Java. Some core routines are in C. Huge memcached - 200 servers, 2000 interactions per second. 64-bit Linux, Intel CPUs; the shiny frontend is Flash. They didn't have anyone who knew Flash, so they used OpenLaszlo to build the application in JavaScript with its framework and compile it down to Flash. Tom says Laszlo is a great piece of software.

Jack - Perl, then Python with the Webware framework, MySQL, Postgres, now Django (a Rails-like Python framework). They run everything on Amazon EC2 and S3. Wrote lots of JavaScript - they use script.aculo.us, Prototype and others.

Colin: Is EC2 the future?

Jack - Went through CO-LO hell. Was hard to provision new hardware. On Amazon, they can type one command and get 10 more machines. Jack is very happy with EC2, S3. Jeremy was concerned with complexity but Jack says it was not too hard.

Tom: If they were starting today, they would be considering cloud computing like EC2. The hardest part to scale horizontally is the database. Without the cloud, the risk becomes predicting the future: how do you provision just the right number of servers? It becomes a guessing game.

Marc uses cloud computing to do scaling testing (buying lots of tickets at once).

Tom - the cloud is also useful for data recovery - use the cloud to serve as the failover. Pandora decided to do their own CDN. They save a lot of money by doing it themselves.

Tom says don't buy Foundry load balancers.

Question from Derek of CD Baby - He tells the story of how he rewrote CD Baby from PHP to Ruby on Rails. After 2 years of frustration, he threw it all away. It had nothing to do with Rails - but keeping the two systems (PHP and Rails) alive was hard. Derek also lauds EC2. Tom does say that you are still paying a margin to Amazon for this, so it could cost you more than doing it yourself.

Tom talks about "test driven development". They can rip their system apart and put it back together and be confident that it will work because of their tests.


Digital Thought Leaders Panel
Originally uploaded by PaulLamere.
The digital thought leaders panel is moderated by Brian Zisk, with Tim Westergren of Pandora, Aza Raskin of Songza, Michael Petricone of the Consumer Electronics Association and Ty Roberts of Gracenote/Sony.

The first topic is the well-worn one of how we navigate the intellectual property minefield of music. How can companies make money while still compensating artists? I wish the panel would focus a bit more on technology and less on rights and IP. There's a separate legal track. As K7lim says "this is a kindergarten discussion of IP policy."

Tim Westergren says better, simpler design is necessary to engage with listeners, especially new listeners. "Simplicity brings people in" says Aza.

Brian asks: "What is the Future?" - Ty Roberts talks about the music product. He's interested in 'music packaging' - augmenting the simple MP3 with all of the ancillary metadata (album art, reviews and bios). Aza suggests 'continuity of experience'. Eliminate the Facebook/iTunes silos - get rid of having to worry about where your music is coming from. Michael says it is 'Simplicity'.

Tim points out that radio has always been popular. He says that people don't want to spend a lot of time administering their listening experience. He suggests that music will be everywhere, supported by advertising. Once this is in place, there will be lots of ways that people can use and interact with the music.

Aza points to the Kindle as a good example of where things should go with music. "Feels like free" is key - whether it is ad supported or some other model.

Discussion about the "metadata problem".

Tim offers advice to artists - add a new member of the band - a non-musician - to be the marketing person to get the band exposure on the 'nets.


Brian opens the summit
Originally uploaded by PaulLamere.
Brian Zisk has just launched the SanFran Music Tech Summit. There are hundreds of music tech folks gathered in Japantown. It looks to be a fun event.

Monday May 05, 2008


CommunityOne
Originally uploaded by PaulLamere.
JavaOne week has started. We are waiting for the CommunityOne keynote to start. Lots of fun.

Saturday May 03, 2008

Olinda is a prototype digital radio that has your social network built in, showing you the stations your friends are listening to. It’s customisable with modular hardware, and aims to provoke discussion on the future and design of radios for the home.

Wednesday Apr 30, 2008

Here's an unusual recommendation from MyStrands, but in a good way, not a bad way. While I was listening to some Aphex Twin I checked the MyStrands application to see what they recommended. MyStrands evidently didn't have enough data to give a good recommendation, and so they told me that - they didn't blindly give me a bad recommendation, they recommended something popular, but they also told me that they were punting on a specific recommendation.
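Here is a minimal sketch of that "admit when you're punting" behavior. This is not how MyStrands does it; the `Recommendation` class, the `recommend` function, the `min_support` threshold and the sample data are all hypothetical, made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    item: str
    is_fallback: bool  # True when we punted to a merely popular item


def recommend(seed, similar_items, popular_items, min_support=5):
    """Return a well-supported similar item, or a flagged popular fallback.

    similar_items maps a seed to a list of (item, co_listen_count) pairs.
    min_support is an assumed threshold for "enough data".
    """
    candidates = similar_items.get(seed, [])
    strong = [(item, n) for item, n in candidates if n >= min_support]
    if strong:
        best_item = max(strong, key=lambda pair: pair[1])[0]
        return Recommendation(best_item, is_fallback=False)
    # Not enough evidence: recommend something popular, but say so.
    return Recommendation(popular_items[0], is_fallback=True)


similar = {
    "Sparse Artist": [("Rarely Played", 2)],
    "Busy Artist": [("Neighbor One", 40), ("Neighbor Two", 25)],
}
popular = ["Chart Topper"]

print(recommend("Sparse Artist", similar, popular))
print(recommend("Busy Artist", similar, popular))
```

The point of the `is_fallback` flag is that the user interface can tell the listener the truth instead of dressing up a popularity pick as a personalized one.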

strands.png

One Llama is a music recommender that uses "acoustic analysis, cultural analysis and collaborative filtering tools for music navigation, discovery and search." On their website they say: "One Llama uses a combination of Collaborative Filtering and Audio Similarity modeling to generate recommendations. Our model harvests cultural references and social networking data about each track, and listens to the audio using an advanced 'virtual ear.' The result is a stronger combined logic for all our recommendations. The One Llama method has the advantage of being able to give intelligent recommendations for new audio tracks immediately while becoming increasingly smarter as additional information is collected about the tracks from playlists, downloads, user feedback, etc."

So with all that advanced mojo, one would expect some pretty good recommendations. Here's a recommendation based on the seed song 'Hey Jude' (I chose the Elvis version because they didn't seem to have the Beatles version in their catalog).

onellama-heyjude.png

There's no doubt that these songs are "like 'Hey Jude'", but somehow the recommendation lacks the subtlety and novelty of a real recommendation. Clearly the songs are not acoustically similar (Arthur Fiedler vs. Tiny Tim?), and I can't imagine any set of users that would be listening to this set of songs, so this is not being driven by a collaborative filtering algorithm. It seems that, at least for this recommendation, the primary driving force is metadata similarity. It is almost as if they just grabbed the Musicbrainz track data, tossed it all into a text similarity engine and turned the crank to get these similarities.
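To make the "text similarity engine" idea concrete, here is a rough sketch of title-only similarity using Jaccard overlap of words. I have no idea what One Llama actually runs; the function names and the tiny catalog of titles are made up for illustration.

```python
import re


def tokens(text):
    """Lowercase a title and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the word sets of two titles."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def most_similar_titles(seed, catalog, top_n=3):
    """Rank catalog titles by word overlap with the seed title."""
    scored = [(jaccard(seed, title), title) for title in catalog if title != seed]
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


# Hypothetical catalog of track titles.
catalog = ["Hey Jude (Instrumental)", "Hey Joe", "Blue Moon"]
for score, title in most_similar_titles("Hey Jude", catalog):
    print(f"{score:.2f}  {title}")
```

A recommender built this way will happily pair any two recordings that share a title, no matter how different they sound or who listens to them, which is the behavior in the screenshot.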

Zac points out another case where One Llama seems to be relying mostly on metadata. Here's a playlist that One Llama generates for songs similar to "Let Go" by Frou Frou. The set seems mostly reasonable from an acoustic point of view - the playlist could have been constructed by an expert - and in fact it was. The songs (with one exception) can all be found on the Garden State soundtrack.

shins-one-llama.png

This is probably what one could expect from a collaborative filtering system. Lots of music listeners have bought the soundtrack. Any good CF algorithm will notice this and tie the items together. However, I don't think that is what is going on here. Looking at the One Llama playlist, there is one song that is not on the Garden State album. One Llama has added The Postal Service's 'Such Great Heights' to the playlist, while the Garden State soundtrack has the cover of 'Such Great Heights' by Iron & Wine. Although one is a cover of the other, they sound very different; one is electronic-noise-pop, while the other is strictly acoustic. I suspect that, as with the Hey Jude example, One Llama is relying mostly on metadata similarity to determine similarity.

Here's the track list for the Garden State soundtrack:

amazon-garden-state.png

Using metadata to generate track similarity is not inherently bad. It makes sense to use what works best. A young recommender company like One Llama doesn't have the deep user data necessary to generate good CF recommendations. Creating recommendations based on automatic acoustic analysis is really hard; acoustic-based recommendations are frequently prone to making mistakes that no human would make. I suspect that One Llama has adjusted the dials on their recommender to give more weight to the metadata until they get more user data and their automated analysis is up to par.
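The "dials" I have in mind could be as simple as a weighted average of per-signal similarity scores. This is only a sketch of the general idea, not One Llama's method; the signal names, weights and scores below are invented.

```python
def hybrid_score(scores, weights):
    """Weighted average of per-signal similarity scores in [0, 1].

    Only signals present in `scores` contribute, so a track with no
    collaborative filtering data can still be scored.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weights[name] * scores[name] for name in scores)
    return weighted_sum / total_weight


# Hypothetical dial settings for a young service: lean on metadata.
weights = {"metadata": 0.6, "collaborative": 0.1, "acoustic": 0.3}

# Hypothetical similarity scores for one candidate track.
scores = {"metadata": 0.9, "collaborative": 0.0, "acoustic": 0.2}

print(round(hybrid_score(scores, weights), 2))
```

As user data accumulates, the same function supports turning the `collaborative` weight up and the `metadata` weight down without touching anything else.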

Tuesday Apr 29, 2008

stalin similar artists.png

Last.fm has a similar artist feature. When you are looking at the page for an artist they will show you artists that are similar based upon the wisdom of the crowds. Last.fm can tell you for instance, that people who listen to Emerson, Lake and Palmer also listen to Yes.

If you go to the Hillary Rodham Clinton page at Last.fm and take a look at her 'similar artists' you'll find a motley crew that includes Joseph Stalin and Adolf Hitler. Perhaps not the types of world leaders that Hillary would want to be associated with.

It is even worse if you go to Joseph Stalin's page, where you'll find similar artists such as Michael Savage, Ann Coulter and Rush Limbaugh. Now this was clearly engineered as a prank. Someone (or a group of someones) must have created a playlist with Hillary, Adolf, Coulter, Limbaugh and Stalin and just played them over and over again, feeding their play data into the Last.fm Audioscrobbler until Last.fm noticed the correlation and declared that they were similar artists. This is one of the first instances I've seen where a music recommender has been noticeably manipulated to produce a dishonest recommendation. It certainly demonstrates how these types of systems can be vulnerable to attack.
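Here's a toy illustration of why a co-listening recommender is open to this kind of attack. It is a sketch under my own assumptions (similarity is a raw count of listeners who played both artists), not Last.fm's actual algorithm, and the artist names and listener counts are made up.

```python
from collections import Counter
from itertools import combinations


def co_listen_counts(listener_histories):
    """Count, for each pair of artists, how many listeners played both."""
    pairs = Counter()
    for artists in listener_histories:
        for a, b in combinations(sorted(set(artists)), 2):
            pairs[(a, b)] += 1
    return pairs


def similar_artists(artist, pairs, top_n=3):
    """Rank other artists by how many listeners they share with `artist`."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == artist:
            scores[b] += n
        elif b == artist:
            scores[a] += n
    return [name for name, _ in scores.most_common(top_n)]


# Honest listening data: a popular pairing and a rarely played artist.
organic = [["Emerson, Lake and Palmer", "Yes"]] * 50 + [["Rarely Played"]] * 3

# A handful of prank accounts that play the two targets together.
prank = [["Rarely Played", "Prank Target"]] * 5

print(similar_artists("Rarely Played", co_listen_counts(organic)))
print(similar_artists("Rarely Played", co_listen_counts(organic + prank)))
```

Because the rarely played artist has so little honest data, five coordinated accounts are enough to define its neighbors, while the popular pairing is untouched. That is why low-traffic pages are the easy targets.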

Luckily, there are some smart people working to protect us from this hacking. Bamshad Mobasher has some good papers on the topic that are worth reading.

As more people seek out long tail content, recommenders will become increasingly important, which means that the folks who are spamming and splogging and seo-ing, will be trying to hack our recommenders to get their remedies for hair loss treatment at the top of the list. (Thanks Elias)

Monday Apr 28, 2008

I was looking at the Keith Fullerton Whitman Google Music page when I noticed that there was a Google Sponsored Link on the side for Pandora. KFW is not exactly a mainstream artist, so it seemed odd that Pandora would be purchasing Sponsored Links for his page. Well, I clicked on the link and much to my surprise I was brought to a Pandora Radio station for Goldenboy. Now, let's be clear, there's absolutely no similarity between Goldenboy and KFW. Looking at the Google page, I can't figure out the reason behind the Pandora ad placement. Something went awry somewhere - or perhaps there's some connection between KFW and Goldenboy that I don't know about. Perhaps Pandora or the Echo Nest can answer the question.

keithfullertonwhitman.png

goldenboy.png

I hope people don't think I am picking on Amazon. Amazon clearly has one of the best recommenders in the world. The last time I went to Amazon, I intended to buy one book and ended up with five, all because of their recommender. 99% of the recommendations from the Amazon recommender are spot on - but there's a small number of recommendations that are surprising, funny or just plain crazy. Now this doesn't always mean that these are bad recommendations. For example, here's one that was sent to me by Anita Lillie. She says:

I noticed you are posting freakomendations on your blog, and it reminded me of how I was looking for flowers for a mother-in-law-type person for Christmas last year, and I got a recommendation for the video game Halo. I went back today to try to find the same recommendation, but I couldn't find it. Instead, I see a "recommendation" ("other customers who bought... also bought...") for the movie "Hot Fuzz" fairly frequently within the "flowering indoor plants" product category. Anyway, it was particularly funny with the Halo, and I'm guessing it's all those 20-something guys who go online to order something for their moms.

Picture 5.png

As Anita suggests, the demographic of Amazon flower purchases probably skews to 20-something guys getting something for their moms, so throwing in Halo or Hot Fuzz may not be a bad way for Amazon to make an extra sale or two.

Zac sends along this freakomendation: I don't know if you're a Kinky Friedman fan, but his books are detective stories -- kind of a foul-mouthed cross between Phillip Marlowe, Hunter Thompson and Groucho Marx. He happens to have a cat.

Cats.gif

Steve points to a freakomendation thread on John Scalzi's blog: "Today Amazon suggested The Last Colony to me for purchase. Yeah, you know, I’ve read that. But it’s nice to know Amazon’s algorithm thinks I might like my own stuff." One interesting comment: Amazon’s algorithm also has an annoying (well, it was funny the first time, since it happened on April 1st. But then it kept on happening, and I realized they were serious) habit of treating writer’s names, without bothering to check if it’s the same writer or not. I bought a few of Sharon Lee’s and Steve Miller’s Liaden Universe books through Amazon. So they started to give me recommendation for other books by Steve Miller. Which would have been fine, except this new Steve Miller is a completely different Steve Miller and Amazon apparently thinks I would really like illustration advice books.

This is what happened with yesterday's Steve Martin freakomendation, where Amazon recommended a book by the wrong Steve Martin. LibraryThing, another book recommender, at least understands that there are two Steve Martins who write books, but they still can't tell them apart. At the LibraryThing author page for Steve Martin there is this notice: Steve Martin is actually two authors, Steve Martin the comedian and author of Cruel Shoes, Kindly Lent By Their Owner: The Private Collection of Steve Martin, Shopgirl, Pure Drivel, WASP, The Pleasure of My Company, and Born Standing Up; and Steve Martin the author of Britain and the Slave Trade. In the future LibraryThing will be able to split authors with identical names. At present, it cannot.

This makes me appreciate MusicBrainz so much more. MusicBrainz knows all about the various ambiguous artist names and can tell them apart. I guess there's no such thing as BookBrainz yet.

Sunday Apr 27, 2008

Here are a few more freakomendations from Netflix.

No Country for Old Ogres

From "The Killers" to "Looney Tunes"

A Death Wish for Moses

Here's a recommendation from Netflix. Because you enjoyed Trekkies, Netflix suggests that you'll enjoy Frontier House.

Are these really Better Together?

Some people are bird watchers, some collect cars. I like to collect unusual recommendations. I'm calling these 'Freakomendations'. There's almost always a story behind the recommendation - but sometimes it is hard to track them down.

In this recommendation, Amazon suggests that since Aaron Hurley has purchased books by Steve Martin, he may be interested in the (no doubt) hilarious Public Services Inspection in the UK: Research Highlights in Social Work.

2CA2C0F9-6405-498C-8274-0E83BCC7C68F.jpg

It looks like Amazon decided that the Steve Martin who edited this book was that same wild and crazy guy.

625C8001-9F72-459F-B430-CFC546F19F73.jpg

Via aaronhurley.org

Friday Apr 25, 2008

A classic problem in traditional collaborative filtering recommendation is the 'cold start' problem. It is hard to generate recommendations for new items because there isn't enough taste data about the new items to make reliable correlations with other items. That's where content analysis comes in. The cold start problem can be alleviated by basing recommendations on similarity of content as well as the wisdom of the crowds. New items can be analyzed and enrolled into a recommender, making these items available and recommendable.
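As a rough sketch of how content analysis sidesteps the cold start, a brand new item with no listening data can still be matched to its nearest neighbors by comparing feature vectors. The two-dimensional features, the track names and the choice of cosine similarity here are all assumptions for illustration; a real system would use many more features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_by_content(new_features, catalog_features, top_n=2):
    """Rank catalog items by content similarity to a brand new item."""
    ranked = sorted(
        catalog_features.items(),
        key=lambda kv: cosine(new_features, kv[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_n]]


# Hypothetical content features extracted from the items themselves.
catalog = {
    "track_a": [1.0, 0.1],
    "track_b": [0.0, 1.0],
    "track_c": [0.5, 0.5],
}

# A new item with no taste data at all, only analyzed content.
new_track = [1.0, 0.0]
print(nearest_by_content(new_track, catalog))
```

Nothing here depends on anyone having played the new item, so it is recommendable the moment it has been analyzed.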

However, using content-based techniques doesn't guarantee the elimination of the cold-start problem. Pandora, everyone's favorite Internet Radio, uses content-analysis to drive their customized radio. However, since Pandora performs all of their analysis by hand, there may be some lag before your favorite artist makes it into the Pandora catalog.

There's another content-based recommender - BookLamp.org. The BookLamp FAQ describes BookLamp as a

"book recommendation system that uses the full text of a book to match it to other books based on scene-by-scene measurements of elements such as pacing, density, action, dialog, description, perspective, and genre, among others. In other words, BookLamp.org is a Pandora.com for books, based on an author's writing style. If you match against multiple books, the self-learning system adjusts your formulas to make the match specific to your tastes. As the system moves out of beta, it will also incorporate human feedback into the recommendation systems, blending the strengths of social networks with the strengths of computer analysis. Ultimately, we want users to be able to create and share their own formulas, creating a community of book lovers that have tools to discover and share books in a way never before possible. Because the system matches books through objective data from the text itself instead of relying solely on social networks to generate recommendations, the recommendations are impervious to outside influences such as advertising or author marketing. It also allows you to match to a far greater detail than alternative systems. With BookLamp, you can request a book similar to Stephen King's The Stand, but half the length, first person, literary mainstream fiction, with slightly more dialog, less description, and a rising action level across the first 10 scenes. If that's what you're looking for".

It is a neat idea, and sounds very similar to the types of things we are doing with Search Inside the Music and Project Aura. Using content analysis gives you better ways to help people discover new items. However, BookLamp has its own cold start problem. Again, from the BookLamp FAQ:

Does BookLamp Work? Can I use it right now to find a book to read?

The simple answer to this question is that while BookLamp works, it doesn't have enough books in the database to work well. While the technology behind the system is capable of finding you books to read right now, BookLamp will remain a technology demonstration until we have a large enough database of books to give the system enough data to make realistic recommendations. Without more books, not only will most users have a hard time finding a book to match against, but the system will have a limited number of books that are capable of being matches. In other words, if we don't have a book in the database that matches, we won't be able to recommend a book for you. Additionally, with so few books in the database, we're not able to match against all the metrics that we would like. In order to be the most effective, BookLamp needs to match against 7 to 8 metrics; with less than 300 books in the database, we're having to make recommendations after matching against only 3 or 4 metrics. To get any matches at all, we've had to turn down the sensitivity of the measures (see the next question) a bit already. We estimate that it will take a database of at least 10,000 books to make BookLamp a usable system. The more, the better.

So BookLamp has a bit of a problem: with only 300 books in its database, it is not going to be the best book recommender. And unlike music, it is not so easy to enroll a new book - scanners and page turners are involved. So BookLamp is trying to figure out its next step. If I were them, I'd build a recommender for Project Gutenberg, with its more than 25,000 titles. Of course there are no NY Times best sellers in the bunch, but it would be a great way to fine-tune the content analysis while providing a service to a worthy project.

This blog copyright 2008 by plamere