I attended a demo today in Saint Paul College's virtual Wonderland classroom that showed how you can do a loose integration of a speech recognizer and the Wonderland chat pane. Computer Careers instructor Eric Level has a student who is deaf in his introductory Java class, so he decided to try using Dragon NaturallySpeaking to automatically transcribe everything he said in the class. To accomplish this, he brought up Wonderland on a laptop with Dragon installed and set input focus to the chat pane. Everything he said appeared in the chat pane, as shown below.
Dragon can send its speech recognition results to any application that accepts text input, which is what makes this work.
While this technique does work, it got me thinking about how one might design a more streamlined solution. One idea is to put the recognizer on the server with the voice bridge so that everyone's audio can potentially be transcribed, not just the presenter's. This would not only eliminate the need to devote a separate computer to the recognizer, but would also make it possible to annotate the recognition stream with the names of the speakers, since the voice bridge can determine who is speaking at any given time. The downside to recognition on the server is that the audio may not be as clear as it would be if captured locally on the client. Also, it might not be as easy to train the recognizer on the presenter's voice.
From the user interface point of view, in Wonderland 0.5 it might be interesting to build a "subtitle channel" that users could subscribe to. Rather than have the recognition output typed into the chat window, which everyone sees, it could be displayed in an optional HUD panel, which could be resized, positioned, or magnified as needed. This would leave the chat pane available for back-channel communication among in-world participants.
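To make the subtitle channel idea a bit more concrete, here is a minimal Java sketch of how speaker-annotated recognition results might be delivered only to the users who subscribe. Everything in it is an assumption for illustration: `SubtitleChannel`, `Subtitle`, `subscribe`, and `publish` are hypothetical names, not Wonderland, voice bridge, or Sphinx-4 APIs, and a plain list stands in for the HUD panel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Hypothetical sketch of a subscribable subtitle channel; not a Wonderland API. */
final class SubtitleChannel {

    /** One recognized utterance, tagged with the speaker name the server reported. */
    static final class Subtitle {
        final String speaker;
        final String text;

        Subtitle(String speaker, String text) {
            this.speaker = speaker;
            this.text = text;
        }

        @Override
        public String toString() {
            return speaker + ": " + text;
        }
    }

    // Copy-on-write list so listeners can unsubscribe while a publish is in progress.
    private final List<Consumer<Subtitle>> subscribers = new CopyOnWriteArrayList<>();

    /** Registers a listener (for example, a HUD panel) and returns a handle that unsubscribes it. */
    Runnable subscribe(Consumer<Subtitle> listener) {
        subscribers.add(listener);
        return () -> subscribers.remove(listener);
    }

    /** Called by the (assumed) server-side recognizer for each recognition result. */
    void publish(String speaker, String recognizedText) {
        Subtitle subtitle = new Subtitle(speaker, recognizedText);
        for (Consumer<Subtitle> listener : subscribers) {
            listener.accept(subtitle);
        }
    }

    public static void main(String[] args) {
        SubtitleChannel channel = new SubtitleChannel();
        List<String> hudPanel = new ArrayList<>(); // stand-in for an optional HUD panel

        Runnable unsubscribe = channel.subscribe(s -> hudPanel.add(s.toString()));
        channel.publish("Eric Level", "Welcome to introductory Java.");

        unsubscribe.run(); // the user closes the subtitle panel
        channel.publish("Eric Level", "This line is not delivered to the closed panel.");

        hudPanel.forEach(System.out::println);
    }
}
```

Because the subtitles travel on their own channel, nothing is typed into the chat pane, and users who do not need the transcript never see it. A real implementation would also have to decide how recognition results get from the recognizer to the clients, which this sketch leaves out.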
Providing accessibility to Wonderland for users with a range of disabilities is an important area of focus. For anyone who might be interested in doing a project in this area, here are a few open source resources to start with:
Sphinx-4 - open source speech recognizer
eSpeak or FreeTTS - open source speech synthesizers
GNOME Accessibility Project - accessibility solutions for graphical user interfaces

I don't know how large these learn databases get, but perhaps one idea is to have the user's database accessible through either an "inventory" item carried by the avatar, or stored on the remote server and associated with the avatar name. Perhaps have two ways to populate the database: 1. upload a database already created, or 2. have an in-world trainer that can scroll text and record your voice at the push of a button.
Posted by larrytek on February 06, 2009 at 02:25 PM PST