Over the last few days, I've been thinking to myself: how much translation do you need to get the gist of a user interface?

I mean, if you couldn't read English, but were presented with a UI that has simple dictionary lookups of English words placed in brackets after the original text, would that interface be better or worse than just seeing the English interface?

As I'm a native English speaker, this is a tough one for me to get my brain around, so as an experiment, I wrote some code to wrap itself around calls to gettext (and other i18n message calls), tokenize the input into words, and perform dictionary lookups on the fly. Here are the results of me running gedit against a cz-CZ dictionary: would this sort of thing be helpful to non-English speakers?
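
To make the lookup step concrete, here is a minimal sketch in Python. It is not the interposer code described above: it only shows the tokenize-and-gloss part, and the `GLOSSARY` entries and the `annotate` function name are made up for illustration. A real run would load a full bilingual dictionary instead.

```python
import re

# Hypothetical English -> Czech glossary, for illustration only.
GLOSSARY = {
    "open": "otevřít",
    "file": "soubor",
    "save": "uložit",
}

# Naive tokenizer: a "word" is any run of ASCII letters.
WORD = re.compile(r"[A-Za-z]+")


def annotate(message, glossary=GLOSSARY):
    """Append a bracketed dictionary gloss after each word that has one."""

    def gloss(match):
        word = match.group(0)
        translation = glossary.get(word.lower())
        # Words missing from the glossary are left untouched.
        return f"{word} ({translation})" if translation else word

    return WORD.sub(gloss, message)


print(annotate("Open File..."))
```

Punctuation and unknown words pass through unchanged, so the original English string is always still visible alongside the hints.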

[Screenshot: dynamically generated interface, from a dictionary lookup of tokenized strings]

I know that the ideal state would be properly translated messages for an application, but in the absence of those, does this help at all? Going further, I can't see any reason why I couldn't do the same thing to a non-internationalised application by wrapping calls to GUI text-drawing primitives and so on. Specifically, I'm wondering if this makes computers more or less accessible.

Thoughts and comments welcome!


Comments:

Impressive, Tim. I'm not sure if it would be useful either, but I still like the way you think about it. Maybe it would be easier to evaluate if you used some language other than English as the base language (one that not many people hacking on l10n would be familiar with, such as Czech as the base language, with English translations in parentheses).

Posted by Danilo Segan on August 16, 2005 at 10:31 PM IST #

Hey Tim! Looks interesting, and an option of having something like this might be really cool. What happens if an application has a proper translation file? Will it simply use it there and fall back to your way of suggesting translations only when there's no translation file at all? The idea is simply great though! Good luck, Gleb

Posted by Gleb on August 17, 2005 at 12:15 PM IST #

Hey - thanks for the hint! I did try running it with fr-FR as a source language (and an fr-FR -> en-US dictionary), and it actually seems to work quite well. I'm missing some heavyweight NLP methods at the moment (like stemming), which would make dictionary lookups a lot more reliable: this was pretty evident when using de-DE as source, for example.

My main interest here, though, was to see if this would help non-en-US users, and the answer seems to be "yes". As regards whether or not the system can try to look for the message file: yes it can, quite easily. I'm just using techniques available with Sun's rtld - not sure if it's possible on Linux though. More information on the technique I used to implement this is at http://developers.sun.com/solaris/articles/lib_interposers.html.

Posted by Tim Foster on August 17, 2005 at 04:05 PM IST #

Yes, that's possible with GNU (not just Linux ;-) systems as well: just make use of LD_PRELOAD. FWIW, GNU gettext documentation even describes how to use it for some things (search for "preloadable_libintl.so")—I think this is also used to gather statistics on message use.

Posted by Danilo Šegan on August 17, 2005 at 05:07 PM IST #

Hmm. I think that is very hard to look at, let alone read. Wouldn't it be easier on the readers to simply machine translate the entire phrase (all phrases in the UI) before building? If you have a real translation, you would use that instead, of course. Languages don't necessarily have the same word order, and things like "at", "in", etc. are going to be problematic. I guess my suggestion would be: 1) if a real translation is available, show only that. 2) if no translation is available, machine translate in advance (on sentence level, not per word), and show the English in parens under or after it, AS AN OPTION. That way, if the MT is impossible to understand, at least you could fall back on the EN and get someone to tell you what it means.

Posted by Micah Bly on August 17, 2005 at 09:18 PM IST #

Hi Micah, Yep - ideally, if you have an MT system handy, then that would be preferable. Again, the thinking here is that there probably aren't machine translation systems available for a lot of languages that are spoken in various parts of the world (there aren't!), but maybe there are at least bilingual dictionaries available for these languages. In that case, is seeing some of an interface translated (badly) better than just seeing the original English text?

Posted by Tim Foster on August 17, 2005 at 10:24 PM IST #

So what it actually says is...
-------------------------------------------------------------------
Recent | splay | rescue | photo | unbutton | convert | cut.

(Type of / kind of / friend??):

- Change the announcement.
- To remove, he duplicates.
- Not to know the matter.
Start near crowd: 1
! You cannot unbutton and the kind the operation.

Help | ??? | kind.
-------------------------------------------------------------------

Um. If I was Czech, I'd probably go for "Not to know the matter"; then I'd click "help" hoping this "announcement" will be "changed" soon...

Sorry, I have used machine-translated software before, and I would not recommend it: I found it more misleading than helpful.

If you do it, you should definitely

  • use a technical dictionary (!) and
  • translate on phrase/sentence level (like Micah suggests),
otherwise it makes no sense. The tool is a good idea, but you cannot use a general service such as babelfish for that task if you are serious about it.

Posted by R. Kusterer on August 19, 2005 at 04:57 PM IST #

OK, so for people who don't understand a word of English, an English interface is preferable to a badly translated software interface like this?

Posted by Tim Foster on August 19, 2005 at 05:02 PM IST #

Yes, because then they would just not use it instead of trustingly doing damage to their data.

  • What happens if I accidentally sort, say, my carefully aligned trigram corpus alphabetically with your un-undoable dialogue above, because I don't even grasp that I'd better cancel it? What if I trust that "druh" just non-destructively switches, say, the kind of display?
  • I once used an MT'ed tool where the MT was inconspicuous and not unintelligible -- worse, it was Fnord. I basically thought the tool could not do what I needed. If I hadn't had the ability to understand the English version (which changed my mind in its favour), I'd not only have trashed it, but also confidently given it a bad review (which I wouldn't do if I just didn't understand a thing).

It also depends on the language pair and the quality of the dictionary used. If you find a language pair + dictionary that makes your test audience happy, OK, use sentence-based MT for that pair. If the average outcome for a language pair is like in your screenshot, don't (IMHO)...

I agree with you insofar as an MT pidgin is better than nothing; I just think ambiguity is worse than pidgin. *sigh* It's about time we invented a really good interlingua specialized in typical IT phrases and vocabulary. :-) Then we could translate all dialogues and menus to this unambiguous semantic language and use MT to automatically translate that into all other languages. The result might not sound idiomatic, but at least it would not be wrong.

PS: Today's "simple math question" for adding a comment is "0+98" !? tsk...

Posted by R. Kusterer on August 21, 2005 at 05:20 PM IST #

Actually, I think this is not a good idea. I mean the screen is cluttered, and that makes it really hard to understand. Moreover, MT will always be problematic, so I wouldn't go along that way. That has been the way to translate docs for NON-localized sw. But anyway, such notation makes docs quite unreadable in some parts (try to read a menu path with eleven pairs of brackets in it). I think good localization is localization, not machine translation and leaving the original version. Simply put, in most cases it won't do the trick. But with huge repositories of TMs (like whole set of KDE, Gnome, Mozilla, OOo etc.) you could try to pretranslate the app. This could work if you would stick to some style of menu writing (like always use KDE-style menu options). Just my 2 cents, Marcin

Posted by Marcin on August 21, 2005 at 09:49 PM IST #

I think the last two people missed the point of Tim's proof-of-concept.

Marcin: This is not "machine translation" per se. Words are simply looked up in a dictionary and presented alongside the message. The end-user just does not need to do the dictionary lookup manually. The words in parentheses are simply "hints".

The issue is that it is achievable to do what Tim shows above. Therefore, can we think of an application of this functionality?

There are people that speak beginner's level English and the linguistic resources for their mother-tongue are limited. Is there an enabler to use I.T. considering the scarce resources?

Posted by Simos on August 23, 2005 at 11:15 PM IST #

I was wondering how big a list (glossary) would be if one would
1) Pick all translation strings (msgid) from GNOME 2.12 (~32800 messages),
2) split to words
3) and sort by frequency (most frequent on top).

Then, the list would be translated starting from the most frequent, giving a single word for the translation. Obviously, some words will have many meanings depending on context; we do not mind in this task as we focus on the most common meaning.

Also, this dictionary could be used for the mechanism that Tim suggests. In addition, the initial list would be helpful to localisation teams (at least for our team).
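
The three steps above could be sketched roughly as follows in Python. This assumes the msgid strings have already been pulled out of the .pot files into a plain list; the `word_frequencies` name and the sample strings are invented for illustration, and the letters-only tokenizer is a deliberate simplification.

```python
import re
from collections import Counter


def word_frequencies(msgids):
    """Split msgid strings into words and count them, most frequent first."""
    counts = Counter()
    for msgid in msgids:
        # Lower-case so that "Open" and "open" count as the same word.
        counts.update(w.lower() for w in re.findall(r"[A-Za-z]+", msgid))
    return counts.most_common()


# Made-up sample standing in for the real message catalogue.
sample = ["Open file", "Save file", "Open recent file"]
for word, count in word_frequencies(sample):
    print(count, word)
```

The length of the returned list is the glossary size being asked about, and translating it from the top down covers the most frequently seen words first.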

Posted by Simos on August 23, 2005 at 11:27 PM IST #

Hey Simos, thanks for the backup! Yes, that's exactly the area I'm aiming this at: how much translation is enough (not perfect, just "enough").

The quality of the dictionary seems to be paramount, so I'm looking at that. I've just written ~150 lines of Java which, in conjunction with the filters.jar file in the Open Language Tools project, will produce such a frequency list. Will try to get my hands on GNOME 2.12 pot files, and have a go (can only find 2.6 messages here from a very quick search)

Posted by Tim Foster on August 24, 2005 at 01:33 PM IST #

Actually, this idea of Simos's could be just what CATs are about: providing consistent translation of similar elements. Now, I think that you could try this method with strings of words, say with 3-grams. I bet there are plenty of "Open file" entries in the GNOME UI. But you'd need a bigger TM for that. KDE is big, so I would suggest that. There are repositories of translations which lack this functionality (see pootle.wordforge.net). I think it would be highly advisable to have some top-list of UI terms (where a term could extend to some 3 or more words). Best, Marcin
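
As a rough illustration of the word-string variant, the sketch below counts n-grams instead of single words. It is only an assumption of how such a top-list might be built: the `ngram_frequencies` name and the sample strings are invented, and splitting on whitespace is a simplification.

```python
from collections import Counter


def ngram_frequencies(msgids, n=3):
    """Count every run of n consecutive words across all msgid strings."""
    counts = Counter()
    for msgid in msgids:
        words = msgid.lower().split()
        # Strings shorter than n words contribute nothing.
        counts.update(
            tuple(words[i:i + n]) for i in range(len(words) - n + 1)
        )
    return counts.most_common()


# Made-up sample; n=2 is used here so that "Open file" shows up as one term.
sample = ["Open file", "Open file as", "Save file"]
for ngram, count in ngram_frequencies(sample, n=2):
    print(count, " ".join(ngram))
```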

Posted by Marcin on August 27, 2005 at 07:18 PM IST #

This blog copyright 2009 by timf