Information, Transmission, Modulation, and Blog
Who?
Richard Friedman is a senior staff information engineer who documents the Sun Studio compilers and contributes to the Sun Studio portal at developers.sun.com.
rchrd wrote his first computer program in FORTRANSIT on the IBM 650 in 1962.
He also is a photographer and has a life and a radio program.
Email to rchrd at sun.com
Where Else?

»All I Know::
Information, Transmission, Modulation, and Noise

»MUSIC FROM OTHER MINDS on KALW-FM

»All I've Seen :: photo blog

Elsewhere?
»Sun Studio Developer's Portal
»Solaris Developer Blog
Thursday December 06, 2007

• Data Mining, Music Streaming, and Pandora

 

I had the opportunity last week to visit the offices of Pandora (pandora.com), the music streaming service.

I'd heard that Pandora recently started serving classical music alongside its pop and jazz genres, and that other genres (like ethnic music) are coming.

If you don't know about Pandora, it's worth a visit. The technology is remarkable. They use a data mining strategy to find selections in their vast library that match criteria you define. And it's adaptive: it learns as you say yea or nay to the selections it decides to stream.

At their office in downtown Oakland I saw a small mob of 20-something worker bees in front of display screens, wearing headphones, listening to music, and rating each track they heard against a matrix of what seemed like hundreds of possible attributes.

They're called "musicologists", and each is supposedly an expert in their genre. Their rankings go into a database that is used to find connections between pieces of music (they call them "songs"). So, when you log into Pandora and create a "station" by entering the name of an artist or song, it will find a sequence of tracks that all seem to have some sort of relationship with each other, as defined by a deep dive into the database of attributes.

How they do this (the actual algorithm) is their special secret. But all those hundreds of data points per track form a data space that the algorithm travels through, picking the tracks it thinks you want to hear.

At any point you can say yea or nay to a track, or you can add songs that sound more to your liking, and the algorithm readjusts its trajectory, allowing you to tune the criteria over time.
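Since the real algorithm is secret, the following is only a toy sketch of the general idea described above: tracks as points in an attribute space, a station as a target point, and yea/nay feedback nudging that target. The track scores, the three attribute columns, the distance measure, and the `rate` parameter are all assumptions made up for illustration; none of it comes from Pandora.

```python
import math

# Hypothetical attribute scores (0.0 to 1.0) for a handful of tracks.
# A real catalog would have hundreds of attributes per track.
LIBRARY = {
    "Coltrane - Giant Steps": [0.9, 0.8, 0.1],
    "Coltrane - Naima": [0.3, 0.7, 0.1],
    "Davis - So What": [0.5, 0.6, 0.1],
    "Rossini - William Tell Overture": [0.8, 0.1, 0.9],
}

def distance(a, b):
    """Euclidean distance between two attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def next_track(target, played):
    """Pick the unplayed track whose attributes lie closest to the target."""
    candidates = [t for t in LIBRARY if t not in played]
    return min(candidates, key=lambda t: distance(LIBRARY[t], target))

def adjust(target, track, liked, rate=0.25):
    """Move the target toward a liked track, or away from a disliked one."""
    sign = 1.0 if liked else -1.0
    return [x + sign * rate * (y - x) for x, y in zip(target, LIBRARY[track])]

# Seed a "station" with one track, then let feedback steer it.
seed = "Coltrane - Giant Steps"
target = list(LIBRARY[seed])
played = {seed}
choice = next_track(target, played)
target = adjust(target, choice, liked=True)
```

With these made-up numbers the first pick after the seed is "Davis - So What", and a "yea" pulls the station's target a quarter of the way toward it.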

They have many hundreds of thousands of CDs stored on their server farm in San Jose (all running Linux). They download the AIFF track data and stream it to you at 128 kbps, which gives reasonably high quality if you have a fast connection. But the website does make some serious CPU demands on your system. I've got a spare 1.5 GHz laptop running over my home wifi network, connected to my home sound system via a USB-to-audio converter, and it all works extremely well. But my older 500 MHz iMac couldn't handle it.
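For a rough sense of what 128 kbps means in practice, here is a quick back-of-the-envelope calculation. It assumes 1 kbit = 1000 bits and compares against standard CD audio (44.1 kHz, 16-bit, stereo); the function name is just for illustration.

```python
STREAM_BPS = 128_000          # 128 kbps, taking 1 kbit = 1000 bits
CD_BPS = 44_100 * 16 * 2      # uncompressed CD audio: 44.1 kHz, 16-bit, stereo

def megabytes_per_hour(bits_per_second):
    """Data transferred in one hour, in megabytes (10**6 bytes)."""
    return bits_per_second / 8 * 3600 / 1_000_000

print(megabytes_per_hour(STREAM_BPS))   # 57.6
print(round(CD_BPS / STREAM_BPS, 1))    # 11.0
```

So an hour of listening moves a bit under 60 MB, at roughly one eleventh of the raw CD data rate.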

Pandora works really well with pop and jazz, but is frustrating and sometimes even infuriating with classical music. But, it's not their fault. Pandora follows every letter of the digital copyright law. They'd be out of business if they didn't. And one of the tenets of the law requires that no two tracks on the same CD be streamed in the same hour.

This is fine for pop/jazz. But it's mostly a disaster for classical music. It means, for instance, that it would take at least 33 hours to eventually hear all of the Beethoven Diabelli Variations (33 tracks). And you'd be hearing a lot in between each variation. The law doesn't differentiate between music where a track is its own self-contained piece, and music where a track may be just one part of a symphony or opera, etc.
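The arithmetic behind that estimate can be sketched in a few lines. This models the rule only as I've described it here (one track per CD per hour), not the actual legal text, and the function names and the `gap_min` parameter are invented for the example.

```python
def can_stream(cd_id, now_min, last_played, gap_min=60):
    """True if no track from this CD has been streamed in the last gap_min minutes."""
    last = last_played.get(cd_id)
    return last is None or now_min - last >= gap_min

def start_times(track_count, gap_min=60):
    """Earliest start time (in minutes) of each track when all come from one CD."""
    return [i * gap_min for i in range(track_count)]

times = start_times(33)
print(times[-1] / 60)   # 32.0: the last track starts 32 hours after the first
```

The last of 33 tracks can't start until 32 hours after the first, so the whole set runs into its 33rd hour, with other music filling every gap.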

So here comes the 3rd movement of Beethoven's 5th Symphony slamming into a Rossini overture just as you were expecting the dramatic starting chords of the 4th movement! It's enough to send you screaming from the room!

But, hey, blame Congress, not Pandora.

Luckily, a lot of 20th century classical music is in the one-movement format. And sometimes the juxtapositions, seemingly random as they are, can be pleasantly surprising.

Their classical library is still growing. But what they already have is probably larger than anything you or I will ever own. And they're adding a lot of contemporary music as well. I created a "station" starting with Edgard Varese and was amazed at some of the connections... Penderecki, Xenakis, Ligeti, Lutoslawski ... good choices in this case. But some of my other attempts went way off beam from the start. Starting a station with Olivier Messiaen led to Debussy (ok, sorta), but then it went towards the late romantics. This is probably due to someone's lack of understanding of Messiaen in the greater stream of things when assigning ratings. (But, I'm biased and so much of this is subjective.)

Still, what they come up with is quite fascinating. But I do have some issues:

The website is very resource-intensive, and they need to optimize the application. Better yet, they could provide a stand-alone application I can download that works outside the browser. Live365 does this. Since Pandora makes their money through advertising and a limited subscription service, they could, like Live365.com, offer the player with an advertising-free subscription. If it were priced reasonably, I'd subscribe just as I do to Live365.com.

They really have to do something to mitigate the 1 track per hour problem for classical music. Perhaps note in the database that a sequence of tracks on a CD comprises a single "work", and that the beginning and ending of these tracks should not be played without a fade and a short break. That would help. Some programmers on Live365 combine tracks into a single item in their playlists to get around this. It may be breaking the law, but I'd like to see the court case made that running all four movements of a Mozart symphony together was hurting the profits of Deutsche Grammophon!

I've been listening to my John Coltrane station all day and it's been great. Not everything is Coltrane, and the stream took some wide diversions, but it's doing a great job. I don't recognize everything that's chosen and have to run back to the computer to see what it is. But that's nothing new. Unlike Live365, I don't have to look for a station playing Coltrane ... I can create one.

But Pandora's database could probably use some extra sub- and sub-sub-genres to help it figure out why John Cage and Morton Feldman are different than Anton Webern and Karlheinz Stockhausen. And why, when I start an Anton Bruckner station, do I get mostly choral music? He wrote little choral music. I bet someone marked Bruckner as a composer of church music. Mistake. Maybe it would be nice to allow users to see all the rankings assigned to a piece of music and suggest a counter-evaluation!

Pandora's worth a visit, even if you're just curious. You can view my "stations" on my profile. It doesn't replace a good library, and so far it's not the place to hear a complete piece of music in one sitting. But it could be a good source of discovery. That's what radio used to be: a way of discovering music you didn't know about. That's how we learn.

There may be a solution to the multiple-tracks-per-work issue that I just thought of: First, give the user the option to select "Play complete works of music if possible" on a station. (Because not everybody would want to hear the entire Mahler 8th, for example.) Then, if that option is selected and there are multiple recordings of a piece of music, such as the Beethoven 5th, play each movement from a different recording. (That's why the "if possible".) Sounds reasonable. I mentioned this idea to the folks at Pandora and they said that it is one of many suggestions they are currently considering. So, stay tuned.
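Here is a minimal sketch of how that suggestion might work, assuming each CD in the list contains the whole work and that the work fits within the hour. The function name and the CD labels are hypothetical, and this is just my idea written out, not anything Pandora has built.

```python
def complete_work_playlist(recordings, movement_count):
    """Assign each movement to a different CD, or return None if that isn't possible.

    recordings: list of CD ids, each assumed to contain the complete work.
    Returns a list of (cd_id, movement_number) pairs in playing order.
    """
    if len(recordings) < movement_count:
        return None  # too few recordings: a CD would have to repeat within the hour
    return [(recordings[i], i + 1) for i in range(movement_count)]

# Four hypothetical recordings of a four-movement symphony.
playlist = complete_work_playlist(["cd_a", "cd_b", "cd_c", "cd_d"], 4)
# With only three recordings the "if possible" clause kicks in.
fallback = complete_work_playlist(["cd_a", "cd_b", "cd_c"], 4)
```

When `fallback` comes back as `None`, the station would simply carry on as it does today.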


( Dec 06 2007, 01:02:03 PM PST ) [Misc.]
