Thursday December 06, 2007
• Data Mining, Music Streaming, and Pandora 
I had the opportunity last week to visit the offices of Pandora (pandora.com), the music streaming service.
I'd
heard that Pandora started serving classical music recently, along with
its pop and jazz genres, and other genres (like ethnic music) are
coming.
If you don't know about Pandora, it's worth a visit. The
technology is remarkable. They use a data mining strategy to find
selections in their vast library that match criteria
that you define. And it's adaptive: it learns as you say yea or nay
to the selections it decides to stream.
At their office in
downtown Oakland I saw a small mob of 20-something worker bees in front
of display screens, wearing headphones, listening to music and
rating each track they hear on a matrix of what seemed
like hundreds of possible attributes.
They're called
"musicologists", and each is supposedly an expert in their genre. Their
ratings go into a database that is used to find connections between
pieces of music (they call them "songs"). So, when you log into Pandora
and create a "station" by entering the name of an artist or song, it
will find a sequence of tracks that all seem to have some sort of
relationship with each other, as defined by a deep dive into the
database of attributes.
How
they do this (the actual algorithm) is their special secret. But all
those hundreds of data points per track form a data space that the
algorithm travels through, picking the tracks it thinks you want to hear.
At
any point you can say yea or nay to a track, or you can add songs that
sound more to your liking, and the algorithm readjusts its trajectory,
allowing you to tune the criteria over time.
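Pandora's real algorithm is not public, so the following is only a rough Python sketch of the general idea: each track is a point in attribute space, a station keeps a profile, the next track is the unplayed one closest to that profile, and a yea or nay nudges the profile. The track names, the three attribute scores, the use of cosine similarity, and the `rate` step size are all assumptions made for illustration, not anything Pandora has confirmed.

```python
import math

# Hypothetical attribute scores per track; names and numbers are invented.
TRACKS = {
    "track_a": [5.0, 1.0, 0.0],
    "track_b": [4.0, 2.0, 0.0],
    "track_c": [0.0, 1.0, 5.0],
}


def cosine(u, v):
    """Cosine similarity between two attribute vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def next_track(profile, played):
    """Return the unplayed track whose attributes are most similar to the profile."""
    candidates = [name for name in TRACKS if name not in played]
    return max(candidates, key=lambda name: cosine(profile, TRACKS[name]))


def adjust(profile, track, liked, rate=0.5):
    """Move the profile toward a liked track, or away from a disliked one."""
    sign = 1.0 if liked else -1.0
    return [p + sign * rate * t for p, t in zip(profile, TRACKS[track])]


# Seed a station with track_a, then ask for the closest other track.
profile = list(TRACKS["track_a"])
print(next_track(profile, played={"track_a"}))  # track_b

# A "yea" on track_c pulls the profile toward track_c's attributes.
profile = adjust(profile, "track_c", liked=True)
print(profile)  # [5.0, 1.5, 2.5]
```

With hundreds of attributes per track instead of three, the same shape of computation would apply, though a real system would need indexing to search a large library quickly.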
They have many
hundreds of thousands of CDs stored on their server farm in San Jose
(all running Linux). They download the AIFF track data and stream it
to you at 128 kbps, which gives reasonably high quality if you have a
fast connection. But the website does make some serious CPU demands on
your system. I've got a spare 1.5 GHz laptop running over my home wifi
network connected to my home sound system via a USB-to-audio converter
and it all works extremely well. But my older 500 MHz iMac couldn't
handle it.
Pandora works really well with pop and jazz, but is
frustrating and sometimes even infuriating with classical music. But,
it's not their fault. Pandora follows every letter of the digital
copyright law. They'd be out of business if they didn't. And one of the
tenets of the law requires that no two tracks on the same CD be
streamed in the same hour.
This is fine for pop/jazz, but it's mostly a
disaster for classical music. It means, for instance, that it
would take at least 33 hours to hear all of the Beethoven
Diabelli Variations (33 tracks). And you'd be hearing a lot of other music in between
each variation. The law doesn't differentiate between music where a
track is its own self-contained piece and music where a track may
be just one part of a symphony, opera, etc.
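The arithmetic can be sketched in a few lines of Python. This models only the rule as described in this post (same-CD tracks at least an hour apart); the function name and the `gap_hours` parameter are made up for the example, and the actual statutory wording may differ.

```python
def earliest_start_hours(track_count, gap_hours=1.0):
    """Earliest start time, in hours, of each track from one CD
    when consecutive same-CD tracks must be at least gap_hours apart."""
    return [i * gap_hours for i in range(track_count)]


starts = earliest_start_hours(33)  # one entry per Diabelli variation
print(len(starts))   # 33
print(starts[-1])    # 32.0
```

So the final variation could not begin until 32 hours after the first, and with the playing time of the tracks themselves the whole set lands at about 33 hours.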
So here comes
the 3rd movement of Beethoven's 5th Symphony slamming into a Rossini
overture just as you were expecting the dramatic starting chords of the
4th movement! It's enough to send you screaming from the room!
But, hey, blame Congress, not Pandora.
Luckily,
a lot of 20th century classical music is in the one-movement format.
And sometimes the juxtapositions can be as pleasantly surprising as
they are seemingly random.
Their classical library is still growing. But
what they already have is probably larger than anything you or I will
ever own. And they're adding a lot of contemporary music as well. I
created a "station" starting with Edgard Varèse and was amazed at some
of the connections... Penderecki, Xenakis, Ligeti, Lutoslawski ... good
choices in this case. But some of my other attempts went way off beam
from the start. Starting a station with Olivier Messiaen led to Debussy
(ok, sorta), but then it went towards the late romantics. This is
probably due to someone's lack of understanding of Messiaen in the
greater stream of things when assigning ratings. (But, I'm biased and
so much of this is subjective.)
Still, it's quite fascinating what they come up with. But I do have some issues:
The
website is very resource-intensive, and they need to optimize the
application. Better yet, provide a stand-alone application I can
download that works outside the browser. Live365 does this. Since
Pandora makes its money through advertising and a limited subscription
service, they could, like Live365.com, offer the player with an
advertising-free subscription. If it were priced reasonably, I'd
subscribe just as I do to Live365.com.
They really have to do
something to mitigate the one-track-per-hour problem for classical music.
Perhaps note in the database that a sequence of tracks on a CD comprises
a single "work" and that the beginnings and endings of these tracks should not
be played without a fade and a short break. That would help. Some
programmers on Live365 combine tracks into a single item in their
playlists to get around this. It may be breaking the law, but I'd like
to see someone argue in court that running all four movements of a
Mozart symphony together was hurting the profits of Deutsche
Grammophon!
I've been listening to my John Coltrane station all
day and it's been great. Not everything is Coltrane, and the stream
took some wide diversions, but it's doing a great job. I don't
recognize everything that's chosen and have to run back to the computer
to see what it is. But that's nothing new. Unlike Live365, I don't have
to look for a station playing Coltrane ... I can create one.
But
Pandora's database could probably use some extra sub-genres and sub-sub-genres
to help it figure out why John Cage and Morton Feldman are different
from Anton Webern and Karlheinz Stockhausen. And why, when I start an
Anton Bruckner station, do I get mostly choral music? He wrote little
choral music. I bet someone marked Bruckner as a composer of church
music. Mistake. Maybe it would be nice to allow users to see all the
ratings assigned to a piece of music and suggest a counter-evaluation!
Pandora's worth a visit, even if you're just curious. You can view my "stations" on my profile.
It doesn't replace a good library, and so far it's not the place to
hear a complete piece of music in one sitting. But it could be a good
source of discovery. That's what radio used to be: a way of
discovering music you didn't know about. That's how we learn.
There
may be a solution to the multiple-tracks-per-work issue that I just
thought of: First, give the user the option to select "Play complete
works of music if possible" on a station. (Not everybody would
want to hear the entire Mahler 8th, for example.) Then, if that option is selected and
there are multiple recordings of a piece of music, such as the
Beethoven 5th, follow each movement with the next movement taken from a different
recording. (That's why the "if possible".) Sounds reasonable. I
mentioned this idea to the folks at Pandora and they said that it is
one of many suggestions they are currently considering. So, stay tuned.
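The "complete works if possible" idea can be sketched roughly in Python as follows. The function and the recording labels are hypothetical, and the sketch assumes every recording in the list contains all the movements of the work.

```python
def assemble_work(movement_count, recordings):
    """Pick a different recording for each movement of one work.

    Returns a list of (movement_number, recording) pairs, or None when
    there are fewer recordings than movements (the "if possible" case).
    """
    if len(recordings) < movement_count:
        return None
    return [(m + 1, recordings[m]) for m in range(movement_count)]


# Hypothetical labels standing in for five recordings of a four-movement symphony.
available = ["recording_A", "recording_B", "recording_C", "recording_D", "recording_E"]
print(assemble_work(4, available))
# [(1, 'recording_A'), (2, 'recording_B'), (3, 'recording_C'), (4, 'recording_D')]

# Only two recordings on hand: the complete work cannot be assembled this way.
print(assemble_work(4, available[:2]))  # None
```

Because no CD contributes more than one track, the sequence stays within the one-track-per-CD-per-hour rule as described in this post, at the cost of switching performers between movements.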
( Dec 06 2007, 01:02:03 PM PST )