Tuesday Jun 12, 2007

For a long time I've been involved in the JCP expert group defining JSAPI 2.0, the next revision of JSAPI. The expert group has been meeting weekly for years! It's not because we are particularly slow or lack motivation; it is because speech is hard, and defining a speech API that is complete and easy to use is very hard. The years of work are finally coming to fruition. The expert group has released the proposed final draft. Check it out on the JSAPI page.
 

Monday May 14, 2007

It was great to hear a little bit of FreeTTS during the technical keynote at JavaOne. Tor Norbye gave a demo of JRuby integration into NetBeans. He showed how you can create a Rails application using JRuby and NetBeans. In his app, he wrote a blog reader that would speak an article. He used FreeTTS to do the speech synthesis. Tor's demo is online on the JavaOne site, about 25 minutes into the video. Unfortunately, the FreeTTS audio is barely audible in the online version of the demo.

Wednesday Dec 13, 2006

Check out this recent post in Dave Berlind's TestBed Blog, "Demo out of Sun’s Labs proves the best tech of all is the one you forget is there", which describes the Sun Labs Meeting Suite, a project by the Collaborative Environments team here in Sun Labs. In his blog, Dave is brimming with enthusiasm for this project. He says:

    ...I'm fairly certain I saw for the first time since I started writing about technology in 1991, a communications technology that makes you forget that the communications taking place are being assisted by technology: and a rocket science-like technology at that. It was a real glimpse into a future where technology is a seamless and transparent part of what we do, rather than something that requires a lot of frustrating knobs and levers to make it do our bidding.

Dave's right, the Collaborative Environments team has put together a really compelling system for improving distributed meetings. The folks on this team are hyper-creative and execute really well - not only do they put together a good demo - but the systems that they build are solid - not pieced together with glue and duct tape like many demos.

Tuesday Apr 04, 2006


A few days ago I mentioned PodZinger, a system that will index podcasts and make the audio available in a search engine. A related system is CastingWords, which will provide podcast transcriptions at 42 cents per minute of audio. So with CastingWords I could get a half-hour weekly podcast transcribed for about $12.60 per episode, or a bit over $50 per month. The interesting thing is that CastingWords uses the Amazon Mechanical Turk to do the work, using humans instead of a speech recognizer to generate the transcription. There's a sample transcript at the Amazon Web Services blog.
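
The pricing is easy to sanity-check with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the 42-cent rate and the half-hour weekly show come from the post, while averaging 52 episodes over 12 months is an assumption, and the function name is made up for illustration.

```python
# Rate and episode length are the figures quoted in the post.
RATE_CENTS_PER_MINUTE = 42
EPISODE_MINUTES = 30
# Assumption: one episode per week, averaged over a year.
EPISODES_PER_MONTH = 52 / 12


def transcription_cost_dollars(minutes, rate_cents=RATE_CENTS_PER_MINUTE):
    """Cost in dollars of transcribing `minutes` of audio (hypothetical helper)."""
    return minutes * rate_cents / 100


per_episode = transcription_cost_dollars(EPISODE_MINUTES)
per_month = per_episode * EPISODES_PER_MONTH

print(f"per episode: ${per_episode:.2f}")
print(f"per month:   ${per_month:.2f}")
```

At the quoted rate a single half-hour episode comes to $12.60, so a weekly show lands somewhere above $50 a month rather than $12.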


Friday Mar 31, 2006



Finally, a good application for speech recognition ... PodZinger uses speech recognition to index podcasts and videocasts, making this whole world of spoken audio searchable. When you search on PodZinger, it will show the set of podcasts that match your query, and even allow you to directly play the content at the point that matches your query. It's audio-based passage retrieval. PodZinger is not trying to create transcripts of podcasts, but instead to make them searchable. That means the speech recognition doesn't have to be 100% accurate (or even 85% accurate), it just has to be good enough to get the 'content' words ... which tend to be longer and less confusable than all of the typical stop words like 'of', 'a', 'and' and 'the'. Still, the text passages shown by PodZinger are surprisingly understandable, and give you a good idea whether the associated podcast is interesting enough to warrant a listen.
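
To make the idea concrete, here is a minimal sketch of audio passage retrieval over recognizer output. It is not how PodZinger actually works (I have no inside knowledge of their system); it just assumes the recognizer emits words with start times, drops the stop words, and keeps an inverted index from each content word to the places it was spoken. The function names, the tiny stop-word list and the sample data are all made up for illustration.

```python
from collections import defaultdict

# Stop words named in the post; a real list would be much longer.
STOP_WORDS = {"of", "a", "and", "the"}


def build_index(episodes):
    """Map each content word to the (episode_id, start_seconds) spots where it was heard.

    `episodes` is {episode_id: [(word, start_seconds), ...]}, standing in for
    time-stamped recognizer output (an assumed format).
    """
    index = defaultdict(list)
    for episode_id, words in episodes.items():
        for word, start in words:
            token = word.lower()
            if token not in STOP_WORDS:
                index[token].append((episode_id, start))
    return index


def search(index, query):
    """Return sorted (episode_id, start_seconds) hits for any content word in the query."""
    hits = []
    for token in query.lower().split():
        if token not in STOP_WORDS:
            hits.extend(index.get(token, []))
    return sorted(hits)


# Hypothetical recognizer output for two short clips.
episodes = {
    "ep1": [("the", 0.0), ("speech", 0.4), ("recognizer", 0.9), ("of", 1.6), ("sphinx", 1.8)],
    "ep2": [("a", 0.0), ("podcast", 0.2), ("and", 0.8), ("speech", 1.1)],
}

index = build_index(episodes)
print(search(index, "the speech"))  # [('ep1', 0.4), ('ep2', 1.1)]
```

Each hit carries a time offset, which is what lets a player jump straight to the matching point in the audio. Because stop words never enter the index, a recognizer that garbles 'of' or 'the' costs nothing at search time.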

PodZinger also has a feature called the 'Zing Index', which is like the Google Zeitgeist. It reports on who and what is being talked about most on podcasts. Dick Cheney is topping this week's Zing Index.

The key to success for such an ambitious project is the quality of the speech engine. Speaker-independent, continuous speech recognition of spontaneous speech (especially with multiple speakers, background music and noise) is very difficult. Add to that the scaling problems ... trying to process 50,000 hours of speech in a week takes a lot of CPU time. This is not a problem that I'd expect a small startup company like PodZinger to be able to tackle, but it turns out PodZinger is not really a small startup ... it's tied to the venerable BBN, the research contractor that has a long history of developing speech recognition engines. BBN certainly has the know-how to deal with these issues.
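
For a rough feel of the scale, the sketch below turns 50,000 hours of audio per week into a count of CPUs that would have to run around the clock. The 50,000-hour figure is from the post; the real-time factors (CPU-hours spent per hour of audio) are purely hypothetical, since I don't know how fast BBN's recognizer runs.

```python
AUDIO_HOURS_PER_WEEK = 50_000        # figure from the post
WALL_CLOCK_HOURS_PER_WEEK = 7 * 24   # 168 hours in a week


def cpus_needed(real_time_factor):
    """CPUs running nonstop to keep up, given CPU-hours spent per hour of audio."""
    return AUDIO_HOURS_PER_WEEK * real_time_factor / WALL_CLOCK_HOURS_PER_WEEK


# Hypothetical real-time factors: 1x means one CPU-hour per hour of audio.
for rtf in (1.0, 5.0):
    print(f"{rtf:.0f}x real time: about {cpus_needed(rtf):.0f} CPUs")
```

Even a recognizer that runs in exactly real time would need close to 300 CPUs busy all week, and anything slower multiplies that number.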

Wednesday Jan 26, 2005

And so the epic tale begins ...

    Wants pawn term, dare worsted ladle gull hoe lift wetter murder inner ladle cordage, honor itch offer lodge, dock, florist. Disk ladle gull orphan worry putty ladle rat cluck wetter ladle rat hut, an fur disk raisin pimple colder Ladle Rat Rotten Hut.
Wha..? That looks like the string of words one finds in spam email that is trying hard to wind its way through the spam filters. But it's not spam and it's not just random words, it's the opening paragraph of a well-known story. This version was written by Professor H. L. Chace, who wanted to demonstrate that prosody - that is, the melody of a language - is an integral part of its meaning.

To hear this story read with the proper prosody, which will make the story much more recognizable, check out the Exploratorium page on Ladle Rat Rotten Hut.

And never forget the moral of the story:

    Yonder nor sorghum stenches shut ladle gulls stopper torque wet strainers

Thanks to Paul Martin for showing me this one.

Thursday Jan 20, 2005

An article at NewsForge, "Speak to me, Linux", is a quick survey of Linux desktop speech. It includes brief descriptions of the various recognizers (sphinx-[2,3,4]) and synthesizers (Festival, Flite and FreeTTS) as well as some of the speech apps such as KTTS and Perlbox. The article concludes: "The ease of using these speech engines and speech recognition systems could make Linux the preferred OS for the visually impaired." Hmmm ... I've never heard the terms 'ease of use', 'speech' and 'Linux' used together in one sentence before. Interesting.

This blog copyright 2008 by plamere