World Views

Mutilated Words

Working in internationalization, I’m confronted with shredded words every day: “i18n” (for internationalization), “l10n” (localization), and “g11n” (globalization). And recently, an imitator showed up: “a11y” (accessibility).

I don’t understand why people use these abbreviations. They’re ugly. They confuse readers. They’re not adapted to any human language. They’re inaccessible. Nobody else mutilates words in this fashion.

Can we please stop using them?

For those nerd gods who are absolutely unable to type the complete words, two hints: First, even in the abbreviated form, the words are nouns. You can’t use them as verbs. There is no word “internationalizationed”, so there’s no “i18ned”. Second, the abbreviations are not acronyms, so they’re capitalized like nouns. “I18n” can occur at the beginning of a sentence, but not in the middle, and “I18N” is always wrong. If these rules are too complicated, just teach Emacs to expand the abbreviations.

2005-02-16 (Wednesday)

OpenSolaris uses Unicode

OpenSolaris.org is open – and guess what: It’s all Unicode! Yes, all pages are encoded in UTF-8! As far as I know, this is the first Sun-sponsored web site that’s ready for content in all languages from day 1. There actually is no non-English content yet as far as I can tell, but when it arrives, the far-too-common text data corruption won’t occur. In the meantime, the “smart quotes” give a little taste of what’s possible.

Whoever made this decision: Thank you!

2005-01-25 (Tuesday)

Internationalization at JavaOne

As you’ve probably heard, the call for papers for JavaOne 2005 is open. For the last few years, the Java Internationalization team has had a BOF session where we talked about current work and then had some time for questions and answers. Questions usually ranged widely, from font rendering issues to character conversion to unsupported locales. In some years, we also had talks on specific technologies, such as input method support, or on best practices in application development.

So, what would you like to hear about internationalization at JavaOne 2005? Any areas that are particularly difficult to navigate and where you’d like to hear an in-depth discussion? Or a general overview talk? Or internationalization hints integrated into the talks on specific technologies? Or just the BOF? Looking at the Java Internationalization home page may remind you of something...

Please reply through comments on this blog entry.

2005-01-05 (Wednesday)

インタビュー (Interview)

The Sun Developer Connection Japan has published an interview with Masayoshi and me. As far as I can tell, it’s about Java internationalization, in particular supplementary character support in J2SE 5.0. There is also some personal trivia, and Yuka adds some more in her blog.

It’s amusing to see an interview that quotes me in a language that I can’t really speak or read. We did the interview while I was in Japan in October, and apparently it took a while to condense over an hour of mostly English conversation into a page of Japanese.
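Since the interview centers on supplementary character support in J2SE 5.0, a minimal sketch of what that support means for Java code may be useful. It is an illustration, not material from the interview: it builds a string containing U+20000, a CJK ideograph outside the Basic Multilingual Plane, and shows that it occupies two UTF-16 `char` values (a surrogate pair) but counts as a single code point, using the code point methods that J2SE 5.0 added to `Character` and `String`. The class and helper names are hypothetical.

```java
// Sketch: a supplementary character (outside the BMP) in a Java String,
// examined with the code point APIs added in J2SE 5.0.
public class SupplementaryDemo {

    /** Number of Unicode code points (not UTF-16 chars) in s. */
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+20000 lies in CJK Unified Ideographs Extension B,
        // outside the Basic Multilingual Plane.
        String s = new String(Character.toChars(0x20000));

        System.out.println(s.length());                              // 2 (a surrogate pair)
        System.out.println(countCodePoints(s));                      // 1 (one character)
        System.out.println(Character.isHighSurrogate(s.charAt(0)));  // true
        System.out.printf("U+%X%n", s.codePointAt(0));               // U+20000
    }
}
```

Code that walks through text one `char` at a time would split such a character in half; iterating by code point with `codePointAt` and `Character.charCount` avoids that.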

2004-12-17 (Friday)

Searching for Java Documentation Beyond English

If you’ve been searching for Japanese documentation for J2SE 5.0, it’s finally available on java.sun.com. Translating over 10,000 web pages is a huge effort, even with the use of translation memory and other fancy technology – in the end, it’s still humans who have to understand deeply technical material in one language and render it into another without losing critical information. It’s quite amazing that the localization team got it all done within just two months of the English release. As an extra goody, we also pushed out a Japanese version of the Java Internationalization FAQ – the FAQ is not part of the J2SE documentation bundle anymore, but it used to be, and so I managed to convince the localization team that it should remain available in Japanese.

Searching all this new content unfortunately is still a problem. The engineers supporting the search functionality on the Sun developer sites have recently done quite a bit of work to enable searching for non-English content, such as consistently using UTF-8 for the search string and the results pages. You can now, for example, search for “增补字符” (“supplementary characters”) and find the Chinese version of an article about supplementary character support. However, searching for “国際化” (“internationalization”) does not find the Japanese internationalization pages or anything else. That’s because a long time ago, when the search functionality didn’t work for anything other than English, the Japanese J2SE documentation was excluded from the index, and some folks at Sun are concerned that users who don’t understand Japanese would be offended if the search engine suddenly returned lots of Japanese pages. The obvious solution is to look at the language preferences in the browser settings and return only pages that the user can actually read, but that’s not implemented yet. So, for the time being, if you want to find non-English content on java.sun.com, you’re still better off using Google.
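The suggested fix, returning only pages in languages the user can read, could be sketched roughly as follows. This is an illustration under stated assumptions, not what java.sun.com implements: it assumes the browser’s language preferences arrive as the value of the HTTP Accept-Language header, and it relies on `Locale.LanguageRange.parse` and `Locale.filterTags` (RFC 4647 basic filtering), which were only added in Java SE 8, well after this post. The list of available translations and the helper name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch: decide which translations of a search result a user can read,
// based on the browser's Accept-Language header (Java SE 8+ APIs).
public class ReadableResults {

    /**
     * Returns the language tags from 'available' that match the user's
     * Accept-Language value, ordered by the user's preference.
     */
    static List<String> readableTags(String acceptLanguage, List<String> available) {
        List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
        return Locale.filterTags(ranges, available);
    }

    public static void main(String[] args) {
        // Hypothetical translations available for one search result.
        List<String> available = Arrays.asList("en", "ja", "zh");

        // A user who prefers Japanese but also reads English.
        System.out.println(readableTags("ja,en;q=0.8", available)); // [ja, en]

        // A user who reads only German gets no hits from these pages.
        System.out.println(readableTags("de", available));          // []
    }
}
```

A search engine built this way would simply drop Japanese hits for users whose browsers don’t list Japanese, which addresses the concern about offending readers without excluding the pages from the index.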

What about other languages? Well, Sun management has noticed that there are many enthusiastic Java developers in China, and that Java documentation in Chinese is now the most requested RFE in our bug database. They also noticed (again) that translating our entire documentation is a huge effort, so it’s not clear yet if and when this can happen. In the meantime, Sun China has set up a vibrant developer web site, and Java Studio Creator is moving ahead with both Chinese localization of the tool and a Chinese version of its developer program. Sun Korea has created a Korean Java site as well, and Sun Japan has had a Japanese one for a long time. For European developers, the assumption has so far been that they usually know English well enough to get by, but as Sun reaches out to larger numbers of developers, this assumption is being rethought as well.

2004-12-03 (Friday)

Mojibake on blogs.sun.com

Bloggers on blogs.sun.com currently see broken text in many places when writing in French, Chinese, or Japanese – or even just English with “smart quotes.” Instead of the intended text, they get what Japanese users call “mojibake” (文字化け) – “changed characters” or “ghost characters.” The reason is incorrect use of character encodings in the version of the Roller software and the templates used on blogs.sun.com. In the latest version of Roller, most of these problems have been fixed through consistent use of UTF-8 everywhere, a solution that I highly recommend.
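The mechanism behind mojibake can be reproduced in a few lines of Java. The sketch below is an illustration only, not Roller’s code: it writes text as UTF-8 and reads the bytes back as ISO-8859-1, an assumed stand-in for whatever mismatched encoding a misconfigured page ends up with. The helper name is hypothetical; `StandardCharsets` requires Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: mojibake arises when bytes written in one encoding are
// decoded with another; using UTF-8 on both sides avoids it.
public class MojibakeDemo {

    /** Encodes text with one charset and decodes the bytes with another. */
    static String roundTrip(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"

        // Mismatch: the two UTF-8 bytes of "é" (0xC3 0xA9) are read as
        // two separate ISO-8859-1 characters. Prints "cafÃ©".
        System.out.println(roundTrip(original, StandardCharsets.UTF_8,
                                     StandardCharsets.ISO_8859_1));

        // Consistent UTF-8: the text survives unchanged. Prints "café".
        System.out.println(roundTrip(original, StandardCharsets.UTF_8,
                                     StandardCharsets.UTF_8));
    }
}
```

The same happens on a web page when the server or template writes UTF-8 but the browser is told (or guesses) a different encoding, which is why declaring the encoding matters.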

If you’re one of the affected authors, you can fix the problem at least for readers who access blogs.sun.com directly. Log into the site, go to the Website:Pages section, and add a <meta> tag declaring the content type immediately after the opening <head> tag of each of your page templates. For templates that begin with a DOCTYPE declaration for an XHTML type, the correct form of the tag is

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>

For templates with a DOCTYPE declaration for an HTML type or no DOCTYPE declaration at all, the correct form is

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
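If you have many templates to fix, the rule described above (XHTML form when the DOCTYPE declares an XHTML type, HTML form otherwise) could be applied mechanically. The following is a rough sketch, not a Roller feature: the class and method names are hypothetical, and it assumes simple regular-expression matching of the DOCTYPE and the opening `<head>` tag is good enough for typical blog templates.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: insert a charset <meta> tag right after the opening <head> tag,
// using the XHTML form if the DOCTYPE mentions XHTML, else the HTML form.
// Hypothetical helper; not part of Roller.
public class CharsetMetaFixer {

    static final String XHTML_META =
        "<meta http-equiv=\"Content-Type\" content=\"text/html;charset=UTF-8\"/>";
    static final String HTML_META =
        "<meta http-equiv=\"Content-Type\" content=\"text/html;charset=UTF-8\">";

    // Matches <head> or <head ...>, but not <header>.
    private static final Pattern HEAD =
        Pattern.compile("<head(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);
    // Matches a DOCTYPE declaration that mentions XHTML.
    private static final Pattern XHTML_DOCTYPE =
        Pattern.compile("<!DOCTYPE[^>]*XHTML", Pattern.CASE_INSENSITIVE);

    static String addCharsetMeta(String template) {
        Matcher head = HEAD.matcher(template);
        if (!head.find()) {
            return template; // no <head> tag found: leave the template unchanged
        }
        String meta = XHTML_DOCTYPE.matcher(template).find() ? XHTML_META : HTML_META;
        return template.substring(0, head.end()) + meta + template.substring(head.end());
    }

    public static void main(String[] args) {
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
            + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
            + "<html><head><title>Blog</title></head><body></body></html>";
        String plain = "<html><head><title>Blog</title></head><body></body></html>";

        System.out.println(addCharsetMeta(xhtml)); // gets the self-closing XHTML form
        System.out.println(addCharsetMeta(plain)); // gets the plain HTML form
    }
}
```

A check that the template doesn’t already contain a Content-Type declaration would be a sensible addition before running something like this on real templates.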

There seems to be no practical way for authors to fix the problems seen in RSS readers and aggregators such as planetsun.org. RSS feeds from blogs.sun.com that contain non-ASCII characters are fundamentally broken and are not under the authors’ control. The solution will be to upgrade blogs.sun.com to a newer version of Roller.

2004-11-14 (Sunday)

Multilingual Font Rendering

One of the most common questions I see on mailing lists and forums related to Java internationalization is “Why can’t my applet display Japanese and Chinese text when my native applications can?”

Well, if you’re using JDK 5.0, the famous “Tiger” release, there’s a good chance you don’t need to ask this question anymore. That’s because this release supports multilingual font rendering. Multi-what? OK, let’s look at a demo first – here’s one that comes with the JDK: Font2DTest.

Here’s Font2DTest running on J2RE 1.4.2 with English as the user language on Windows XP. The “CJK Unified Ideographs” block is selected: the collection of basic Chinese (“C”) characters that are also used for Japanese (“J”) and Korean (“K”).

Font2DTest on JRE 1.4.2 in English

What we see: lots of empty boxes, indicating that the Java runtime in this environment can’t display Chinese characters. Let’s try the same with Japanese as the user language:

Font2DTest on JRE 1.4.2 in Japanese

Well, that looks slightly better. The number of empty boxes has decreased significantly, and real characters have taken their place. The reason for the remaining empty boxes is that the CJK Unified Ideographs block contains the union of the basic characters used with Chinese, Japanese, and Korean. When running in a Japanese environment, JRE 1.4.2 finds glyphs for Japanese characters, but not for characters that are only used for Chinese or Korean.

Now, let’s see what happens with the mighty Tiger. And let’s try the most difficult environment first – plain English.

Font2DTest on JRE 5.0 in English

No more boxes! This is what we mean by “multilingual font rendering” – all languages together in one big happy application.

Would running in a Japanese environment still make a difference? In fact, it would:

Font2DTest on JRE 5.0 in Japanese

If you look closely, you’ll see that some of the glyphs look different between the two screen shots. That’s because in a Japanese environment the JRE gives preference to Japanese fonts (and in a Korean environment to Korean fonts), while in other environments it prefers Chinese fonts.

In which situations does your application benefit from this feature? First, the host OS needs to have the necessary fonts installed. Most operating systems these days come with fonts for a long list of European and Asian languages; you just have to ask for them during installation or through a control panel such as “Regional and Language Options” in Windows XP. Second, your application has to use Swing components or the Java 2D APIs for font rendering. If it uses AWT, results vary. One place where Swing applications may still see problems is window titles, because these are typically rendered using AWT. Finally, the application should use logical fonts: Serif, SansSerif, Monospaced, Dialog, and DialogInput. If it uses a physical font, it gets the glyphs provided by that font, and nothing else.
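To find out whether the fonts your application uses can actually render a given multilingual string, `java.awt.Font.canDisplayUpTo` is a quick check. The sketch below is illustrative: the helper name is hypothetical, the sample string (English, Japanese “こんにちは”, and Chinese “你好”) is arbitrary, and the output depends entirely on which fonts the host OS has installed, so no expected result is given.

```java
import java.awt.Font;
import java.util.Arrays;
import java.util.List;

// Sketch: check whether each logical font can render a string that
// mixes English, Japanese, and Chinese. Results depend on the fonts
// installed on the host system.
public class LogicalFontCheck {

    /** The five logical font names defined by the Java platform. */
    static final List<String> LOGICAL_FONTS = Arrays.asList(
        "Serif", "SansSerif", "Monospaced", "Dialog", "DialogInput");

    static boolean isLogicalFont(String name) {
        return LOGICAL_FONTS.contains(name);
    }

    public static void main(String[] args) {
        // "Hello", Japanese "konnichiwa", Chinese "ni hao".
        String text = "Hello \u3053\u3093\u306b\u3061\u306f \u4f60\u597d";

        for (String name : LOGICAL_FONTS) {
            Font font = new Font(name, Font.PLAIN, 12);
            // canDisplayUpTo returns -1 if every character can be displayed,
            // otherwise the index of the first character that cannot.
            int firstMissing = font.canDisplayUpTo(text);
            System.out.println(name + ": "
                + (firstMissing == -1 ? "all characters displayable"
                                      : "missing glyph at index " + firstMissing));
        }
    }
}
```

Running the same check with a physical font name instead of a logical one illustrates the last point above: the physical font can only report coverage for the glyphs it contains itself.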

There’s more information on related issues in the Internationalization FAQ. But at least for this question there’s now an easy answer: Get Tiger!

2004-11-08 (Monday)

Guten Tag. Hello. Bonjour. Buenos días. こんにちは. 你好.

My name is Norbert Lindenberg. I am the technical lead for Java internationalization at Sun, and also work with several other groups around Sun on internationalization in their areas. If you have any interest in this topic, you’ve probably come across my articles on Developing Multilingual Web Applications Using JavaServer Pages Technology and on Supplementary Characters in the Java Platform, or across the Java Internationalization home page and the Internationalization FAQ, which I maintain.

In this blog, you can expect news and trivia that complement these publications. If you have any particular topic you’d like to have addressed, feel free to comment either here or on the Feedback page (the latter has the advantage that it reaches the entire internationalization team). Note that for programming questions you’re likely to get a faster reply on the Java Internationalization Forum, and bug reports get the most attention if submitted directly into the Bug Database. Also, while I did try to learn each of the languages used above at some point in my life, I was less and less successful with the later ones, so we’ll communicate best in the first two...

[Update 2005-07-30: I’m no longer at Sun.]

2004-10-29 (Friday)
  © World Views. All rights reserved.