How much translation do you need?
Over the last few days, I've been thinking to myself: how much translation do you need to get the gist of a user interface?
I mean, if you couldn't read English, but were presented with a UI that had simple dictionary lookups of English words placed in brackets after the original text, would that interface be better or worse than just seeing the English interface?
As I'm a native English speaker, this is a tough one for me to get my brain around, so as an experiment, I wrote some code that wraps calls to gettext (and other i18n message calls), tokenizes the input into words, and performs dictionary lookups on the fly. Here are the results of running gedit against a cs-CZ dictionary: would this sort of thing be helpful to non-English speakers?
[Screenshot: dynamically generated interface, from a dictionary lookup of tokenized strings]
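For readers who want to see the idea in code, here is a minimal sketch in Python. It is not the code used for the screenshot (that wraps the C-level message calls); it only illustrates the tokenize-and-look-up step. The names HINTS, annotate and hinted_gettext are made up for this example, the three-entry word list stands in for a real dictionary, and annotating only when no translation exists is an assumption of the sketch.

```python
import gettext
import re

# Hypothetical English -> Czech word list; a real run would load a full dictionary.
HINTS = {"file": "soubor", "window": "okno", "document": "dokument"}

# Very crude tokenizer: a "word" is any run of ASCII letters.
WORD = re.compile(r"[A-Za-z]+")


def annotate(message, hints=HINTS):
    """Append a bracketed dictionary hint after each word found in `hints`."""
    def add_hint(match):
        word = match.group(0)
        hint = hints.get(word.lower())
        return f"{word} [{hint}]" if hint else word
    # Punctuation, spaces and mnemonic underscores are left untouched.
    return WORD.sub(add_hint, message)


def hinted_gettext(message):
    """Ask gettext first; fall back to dictionary hints if nothing comes back."""
    translated = gettext.gettext(message)
    # gettext returns the message unchanged when it has no translation.
    return annotate(message) if translated == message else translated


print(annotate("Save File..."))
print(annotate("Close Window"))
```

Words that are missing from the dictionary are simply left alone, so the worst case is the plain English string.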
I know that the ideal state would be properly translated messages for an application, but in the absence of those, does this help at all? Going further, I can't see any reason why I couldn't do the same thing to a non-internationalised application, by wrapping calls to GUI text-drawing primitives and so on. Specifically, I'm wondering whether this makes computers more or less accessible.
Thoughts and comments welcome!
Posted by Software Documentation Weblog on August 16, 2005 at 08:12 PM IST #
Posted by Danilo Segan on August 16, 2005 at 10:31 PM IST #
Posted by Gleb on August 17, 2005 at 12:15 PM IST #
Hey - thanks for the hint! I did try running it with fr-FR as the source language (and an fr-FR -> en-US dictionary), and it actually seems to work quite well. I'm missing some heavyweight NLP methods at the moment (like stemming), which would make dictionary lookups a lot more reliable: this was pretty evident when using de-DE as the source, for example.
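To give a rough idea of what stemming buys you, here is a small Python sketch of a lookup that falls back to crude suffix stripping. It is only an illustration: the suffix list is an assumption that suits English at best (French or German would need their own rules, and a real stemmer is far more careful), and the names SUFFIXES and lookup are invented for the example.

```python
# Hypothetical suffix list for English, tried in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def lookup(word, dictionary):
    """Return the entry for `word`, retrying with common suffixes removed."""
    key = word.lower()
    if key in dictionary:
        return dictionary[key]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not mangled.
        if key.endswith(suffix) and len(key) > len(suffix) + 2:
            stem = key[: -len(suffix)]
            if stem in dictionary:
                return dictionary[stem]
    return None  # no hint available


words = {"file": "soubor", "window": "okno"}
print(lookup("Files", words))
print(lookup("Windows", words))
```

Without the fallback, both "Files" and "Windows" would miss the dictionary entirely.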
My main interest here, though, was to see if this would help non-en-US users, and the answer seems to be "yes". As for whether or not the system can try to look for the message file: yes, it can, quite easily. I'm just using techniques available with Sun's rtld - not sure if it's possible on Linux though. There is more information on the technique I used to implement this at http://developers.sun.com/solaris/articles/lib_interposers.html.
Posted by Tim Foster on August 17, 2005 at 04:05 PM IST #
Posted by Danilo Šegan on August 17, 2005 at 05:07 PM IST #
Posted by Micah Bly on August 17, 2005 at 09:18 PM IST #
Posted by Tim Foster on August 17, 2005 at 10:24 PM IST #
-------------------------------------------------------------------
Recent | splay | rescue | photo | unbutton | convert | cut.
(Type of / kind of / friend??):
- Change the announcement.
- To remove, he duplicates.
- Not to know the matter.
Start near crowd: 1
! You cannot unbutton and the kind the operation.
Help | ??? | kind.
-------------------------------------------------------------------
Um. If I was Czech, I'd probably go for "Not to know the matter"; then I'd click "help" hoping this "announcement" will be "changed" soon...
Sorry, I have used machine-translated software before, and I would not recommend it; I found it more misleading than helpful.
If you do it, you should definitely
- use a technical dictionary (!) and
- translate on phrase/sentence level (like Micah suggests),
otherwise it makes no sense. The tool is a good idea, but you cannot use a general service such as babelfish for that task if you are serious about it.
Posted by R. Kusterer on August 19, 2005 at 04:57 PM IST #
Posted by Tim Foster on August 19, 2005 at 05:02 PM IST #
Yes, because then they would just not use it instead of trustingly doing damage to their data.
It also depends on the language pair and the quality of the dictionary used. If you find a language pair + dictionary that makes your test audience happy, OK, use sentence-based MT for that pair. If the average outcome for a language pair is like in your screenshot, don't (IMHO)...
I agree with you insofar as an MT pidgin is better than nothing; I just think ambiguity is worse than pidgin. *sigh* It's about time we invented a really good interlingua specialized for typical IT phrases and vocabulary. :-) Then we could translate all dialogues and menus to this unambiguous semantic language and use MT to automatically translate that into all other languages. The result might not sound idiomatic, but at least it would not be wrong.
PS: Today's "simple math question" for adding a comment is "0+98" !? tsk...
Posted by R. Kusterer on August 21, 2005 at 05:20 PM IST #
Posted by Marcin on August 21, 2005 at 09:49 PM IST #
Marcin: This is not "machine translation" per se. Words are simply looked up in a dictionary and presented alongside the message, so the end-user does not need to do the dictionary lookup manually. The words in parentheses are just "hints".
The point is that what Tim shows above is achievable. Therefore, can we think of an application for this functionality?
There are people who speak beginner's level English and for whose mother tongue the linguistic resources are limited. Could something like this enable them to use I.T., given such scarce resources?
Posted by Simos on August 23, 2005 at 11:15 PM IST #
1) Pick all translation strings (msgid) from GNOME 2.12 (~32800 messages),
2) split to words
3) and sort by frequency (most frequent on top).
Then, the list would be translated starting from the most frequent word, giving a single word for each translation. Obviously, some words will have many meanings depending on context; we do not mind in this task as we focus on the most common meaning.
Also, this dictionary could be used for the mechanism that Tim suggests. In addition, the initial list would be helpful to localisation teams (at least for our team).
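The three steps above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: it reads single-line msgid entries only (continuation lines and msgid_plural are ignored), it drops backslash escapes and printf-style specifiers so they are not counted as words, and the names MSGID, NOISE and word_frequencies are invented for the example.

```python
import re
from collections import Counter

# Matches single-line entries such as: msgid "Open File"
MSGID = re.compile(r'^msgid\s+"(.*)"\s*$')
# Backslash escapes (\n, \") and simple format specifiers (%s, %d).
NOISE = re.compile(r"\\.|%[a-zA-Z]")
WORD = re.compile(r"[A-Za-z]+")


def word_frequencies(pot_text):
    """Return (word, count) pairs for all msgid words, most frequent first."""
    counts = Counter()
    for line in pot_text.splitlines():
        match = MSGID.match(line)
        if match:
            text = NOISE.sub(" ", match.group(1))
            counts.update(word.lower() for word in WORD.findall(text))
    return counts.most_common()


sample = '''
msgid "Open File"
msgstr ""

msgid "Save File"
msgstr ""

msgid "Close _File\\n"
msgstr ""
'''

for word, count in word_frequencies(sample):
    print(count, word)
```

Translating the resulting list from the top down covers the most frequently seen words first.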
Posted by Simos on August 23, 2005 at 11:27 PM IST #
Hey Simos, thanks for the backup! Yes, that's exactly the area I'm aiming this at: how much translation is enough (not perfect, just "enough").
The quality of the dictionary seems to be paramount, so I'm looking at that. I've just written ~150 lines of Java which, in conjunction with the filters.jar file in the Open Language Tools project, will produce such a frequency list. I will try to get my hands on the GNOME 2.12 pot files and have a go (I can only find 2.6 messages here from a very quick search).
Posted by Tim Foster on August 24, 2005 at 01:33 PM IST #
Posted by Marcin on August 27, 2005 at 07:18 PM IST #