Running Mahout on Elastic MapReduce
The Apache Mahout project was started to implement the algorithms described in the paper on top of the Hadoop MapReduce framework (the original paper describes running them on multicore processors). The project has also brought in the Taste collaborative filtering framework, which is interesting to us as recommendation folks. As it turns out, they had just released Mahout 0.1 around the time we were going to read the paper.
Coincidentally, Amazon had just announced its Elastic MapReduce (EMR) service, which lets you run MapReduce jobs on EC2 instances, so I decided to see what it would take to get Mahout running on EMR.
I didn't manage to get it running in time for the reading group, but one Mahout issue and a few "Oh, that's the way it works"es later, I had it running.
Apparently I'm the first person to have run Mahout on Elastic MapReduce, which just shows, as my father used to say, that brute force has an elegance all its own.
If you're interested, the details are on the Mahout wiki.


Is it not possible to try this on your own project, Caroline?
Posted by anon on May 06, 2009 at 02:31 PM EDT #
Hi, anon.
We haven't really spent any time trying to get Hadoop running on Caroline (its no-fork provisions would cause problems with some of the execs that HDFS wants to do), so we had to run it on EC2.
Posted by Stephen Green on May 06, 2009 at 03:44 PM EDT #