Tuesday Jul 08, 2008

This post follows on from Language Support On Docs.sun.com

Docs.sun.com hosts a vast amount of documentation in many languages.

However, the localized content has not always been easy to access. The docs.sun.com site just doesn't make it easy to ascertain whether a particular English book is available in your language.
Typically you have to re-navigate the product tree in your chosen language just to see if the book has been translated. This could, at best, be described as tedious.

So now, we've implemented translation linking [similar to what we did on the BigAdmin last year]. That is, every book automatically cross-links to the translations of that book.

This should significantly improve the 'findability' of our translated content.

This effort took quite some time, since the data that mapped the translations to each other was scattered across many places, in many formats.

Examples:
http://docs.sun.com/app/docs/doc/819-1337?l=es
http://docs.sun.com/app/docs/doc/817-0544?l=en

Note - we can only link from book to book; we cannot link from page to page. This is because of the way docs.sun.com dynamically serves content at runtime - with sometimes unpredictable URLs.

We have found that most of our users have some proficiency in English, though they obviously still prefer to read the content in their native language.

Unfortunately translation is not a lossless process - sometimes meaning is skewed, or even lost. It is often useful to be able to refer back to the source.

For a while now, we have been cross-linking the various language versions of a page. Now, however, we are enabling users to view the source and translated material side by side [beta version].

We believe this will improve the usability of, and confidence in, our translated content.

This feature is available from every translated page. It is currently called 'Compare Translations' - though that name will likely change.

See the following examples:

http://www.sun.com/bigadmin/hubs/multilingual/japanese/content/flash_archive.jsp
http://www.sun.com/bigadmin/hubs/multilingual/trad_chinese/content/sunmcnew/sunmcnew.jsp

Sunday Oct 28, 2007

A nice new feature on the BigAdmin is automatic translation linking.

So long as we tag our content correctly, the AJAX component will spit out the links to any available translations.

This should considerably improve the findability of localized articles, and the usability too, since users will be able to easily refer back to the source.

 

See the example on this page: http://www.sun.com/bigadmin/hubs/multilingual/japanese/content/device_driver_install.jsp

Thanks to Robert Weeks for his work on this.

Wednesday Aug 08, 2007


  • General chit-chat »

  • Recent Language Support Improvements »

  • Soon To Be Released Language Support Improvements »

  • What I'd like To See »

  • What do you want? »


General Chit-Chat


docs.sun.com is Sun's main documentation site. There has been a spate of positive postings recently about docs.sun.com.

It's been a long time coming, but in the past 6 weeks or so, the hosting infrastructure has changed - so it's much faster.

We also migrated to a new search.



Recent Language Support Improvements:

  1. UI Translation

    Russian and Brazilian Portuguese versions of the UI.

  2. Multilingual Search
    It works, for all languages. There is an issue with PDFs that have non-ASCII metadata - but that's a problem with PDFs, not the search. More on that later.

Soon To Be Released Language Support Improvements:

  1. Rendering Of Japanese Text
    Our company-wide main stylesheet doesn't help the rendering of Japanese text. This will soon be mitigated by the inclusion of jp.css, which will be added to the Japanese templates. It significantly improves the rendering of Japanese text, especially on Solaris. See the before and after shots below - taken on Firefox 2 for Solaris.
     [Screenshot, before: Before jp.css was added to docs.sun.com]
     [Screenshot, after: After jp.css was added]

    Better definition of the Japanese characters.



  2. Serving All Content As UTF-8.
    Previously content was served in whatever encoding you wanted. Yes, really. If your preferred encoding [not configurable on Internet Explorer] was, say, ISO-8859-1, then that was the encoding of the content served to you - even if you requested a Korean page. Don't believe me? See the (slightly edited) wget output below. Relevant strings are highlighted.




    Odd, yes, I know, but it worked, because docs.sun.com converts all non-ASCII characters in the source to numeric character references. So the encoding really doesn't make a difference. I still don't know why this is done - probably a throwback to old browser days. However, serving content in anything other than UTF-8 can pose problems for search, or other form-based features.

    Anyway, a part of the docs.sun.com code was reading the client's HTTP accept-charset header. This has been found, and will soon be removed - so everything will get served as UTF-8 - whether you like it or not :)
  3. Non-ASCII Metadata In PDFs

    It turns out that our PDFs [most in v1.3] had no metadata. Indexing of these files was done purely on the basis of their main body content. We had a database containing all this metadata - and decided to apply the data to all the existing PDFs, since our new search handles the presentation of PDF results slightly differently.
    This was straightforward for ASCII text. A Perl script using the PDF::API2 module did all the work.
    However it failed for non-ASCII text.


    From reading the PDF 1.3 Reference Guide, it's clear that all non-ASCII [or at least non-Western-European] metadata should be UTF-16BE encoded. We had been passing in UTF-8 strings, and didn't really know what PDF::API2 was doing with them. Well, my colleague Phil Hooper figured it out, and fixed the PDF::API2 module in the process. I believe his fix will be in release 0.62 of the module. Nice work, Phil.
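
To illustrate the point in item 2 about numeric character references, here is a minimal Python sketch. It is not the docs.sun.com code, just a demonstration under assumptions: the sample string and variable names are made up. It shows that once every non-ASCII character has been replaced by a numeric character reference, the bytes on the wire are pure ASCII, so they read the same whether the response is labelled ISO-8859-1 or UTF-8.

```python
# Sketch only: why ASCII plus numeric character references (NCRs)
# survives being served under different charsets.
import html

source = "한국어 page"  # hypothetical sample text, not real docs.sun.com content

# Replace every non-ASCII character with an NCR of the form &#NNNNN;
ascii_bytes = source.encode("ascii", "xmlcharrefreplace")

# The bytes are pure ASCII, so both decodings give the same markup.
as_latin1 = ascii_bytes.decode("iso-8859-1")
as_utf8 = ascii_bytes.decode("utf-8")

print(as_latin1)
print(as_latin1 == as_utf8)

# A browser resolves the NCRs back to the original characters.
print(html.unescape(as_utf8) == source)
```

The catch, as noted above, is that this only protects content going out. Text coming back in from a search box or a form is not NCR-encoded, so the charset starts to matter again.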

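For item 3, the difference between what we were passing in and what the PDF format expects can be sketched in a few lines of Python. This is not what PDF::API2 does internally, nor is it Phil's fix. It is only an illustration of the encoding rule: a Unicode text string in a PDF starts with the byte order mark FE FF, followed by UTF-16BE data. The helper name and the sample title are hypothetical.

```python
# Sketch only: encode a metadata value as a PDF Unicode text string,
# i.e. a UTF-16BE byte order mark (FE FF) followed by UTF-16BE data.
import codecs


def pdf_text_string(value: str) -> bytes:
    """Hypothetical helper: return the bytes for a PDF metadata string."""
    try:
        # Plain ASCII text can be stored as-is.
        return value.encode("ascii")
    except UnicodeEncodeError:
        # Anything else gets the BOM plus UTF-16BE.
        return codecs.BOM_UTF16_BE + value.encode("utf-16-be")


title = "日本語ガイド"  # hypothetical Japanese book title

utf8_bytes = title.encode("utf-8")   # what we had been passing in
pdf_bytes = pdf_text_string(title)   # what the PDF format expects

print(utf8_bytes.hex())
print(pdf_bytes.hex())
```

The two byte sequences have nothing in common, which is why handing UTF-8 to something that expects UTF-16BE produces garbage in the document properties.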

What I'd like To See


  1. Translation Linking

    It irks me that we have million$$ worth of translations on docs.sun.com - but there's no easy way to find out which books are actually available in a particular language. Or if I navigate to an English book, is it available in Korean?

    Having to navigate the product tree for each language is more than a little cumbersome.

    I think I've found an internal database that maps the relationship between English part numbers and translations. Armed with this, and provided the mappings are accurate, it should be quite straightforward to add a widget that automatically lists the available translations for a page/book/part number.
    Here's a very alpha prototype of Translation Finder.
     



  2. Extending the Translation Finder concept to search results

     If the Translation Finder widget works, then there's no reason why it couldn't be extended to search results. That is, for each search result, you're provided with links to the translations, if available. Something like below, where the flag icons depict the available translations:

    A rough mockup of what the search results might look like
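
The Translation Finder idea boils down to a lookup from an English part number to its translated part numbers, plus a bit of URL building. Here is a minimal Python sketch of that lookup. Everything in it is an assumption: the part numbers are placeholders, the mapping stands in for the internal database, and the URL pattern is simply guessed from the docs.sun.com/app/docs/doc/<part number>?l=<language> style of book links.

```python
# Sketch only: build translation links from a part-number mapping.
from urllib.parse import urlencode

BASE_URL = "http://docs.sun.com/app/docs/doc/"  # assumed book URL prefix

# Hypothetical data: English part number -> {language code: translated part number}
TRANSLATIONS = {
    "000-0001": {"ja": "000-0003", "es": "000-0002"},
}


def translation_links(part_number, mapping=TRANSLATIONS):
    """Return {language: URL} for the known translations of a book."""
    translated = mapping.get(part_number, {})
    return {
        lang: f"{BASE_URL}{translated_pn}?{urlencode({'l': lang})}"
        for lang, translated_pn in sorted(translated.items())
    }


for lang, url in translation_links("000-0001").items():
    print(lang, url)
```

A book with no entry in the mapping simply gets an empty set of links, so the widget would show nothing rather than a broken link. The same function could be called once per search result to produce the flag links described in item 2.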



Tell us what you want below » 

I'm mainly concerned with internationalization/localization features, but I'll advocate for any other general feature requests/improvements.

Friday Jun 15, 2007

We've translated some targeted sys-admin type docs into Simplified Chinese & Japanese.
Hopefully this will make it easier for some of our customers to use our products.

Right now we've published these translations to the Multilingual BigAdmin hub, but going forward we're considering trying to more properly internationalize the BigAdmin portal itself, as opposed to confining this to a hub.

I'll post more here, as that effort progresses.

http://www.sun.com/bigadmin/hubs/multilingual/



Just a word of thanks to the user Suleyman, who contributed a whole bunch of Turkish translations to the Multilingual Technology Glossary - an OpenSolaris project.

Suleyman has provided translations for many terms and their definitions.

Nice work! 

 

Monday Dec 11, 2006

So recently, I was doing some internationalization [i18n] testing on one of our customer-facing web applications.
I came across a most obscure bug that caused the Asian characters in HTML <title> tags to be garbled.

The offending <title> tag strings were all passed as parameters in a JSP include - as per:

<jsp:include page="header.jsp" flush="true" >
<jsp:param name="title" value="some, シ, sample‚ 理は, Japanese, テサ, characters" />
</jsp:include>

Content-type headers were all set to UTF-8.

Now, garbled Asian characters are standard fare in i18n. What made this one unusual was that it only occurred when you did not have "en-US" somewhere in your browser language preference list.

Anyway, having read John O'Conner's very informative article, I took a look at the sun-web.xml deployment descriptor. It had the following:

<locale-charset-map agent="Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:1.2.1)
Gecko/20030711" charset="UTF-8" locale="en_US">
  <description>Charset mapping for English</description>
</locale-charset-map>
<locale-charset-map agent="Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"
charset="UTF-8" locale="en_US">
  <description>Charset mapping for English</description>
</locale-charset-map>

So, if I've read the docs correctly, the above tells the app server which encoding
to use for request processing - depending upon the client agent, and how it is
configured.

So, if you are using the above agents on the above platforms, with "en-US" configured in your language preferences, then the application server will use UTF-8 for request processing. Otherwise it will default to the system encoding, which in my environment was ISO-8859-1.
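
The garbling itself is easy to reproduce outside the app server. The following Python sketch is only an illustration, not the application's code: it takes two of the Japanese characters from the sample title above, encodes them as UTF-8 the way the browser sends them, and then decodes the bytes as ISO-8859-1 the way a server falling back to the system encoding would.

```python
# Sketch only: what happens when UTF-8 bytes are read as ISO-8859-1.
title = "理は"  # two characters from the sample <jsp:param> value

wire_bytes = title.encode("utf-8")         # what actually travels in the request

garbled = wire_bytes.decode("iso-8859-1")  # server assumes the system encoding
correct = wire_bytes.decode("utf-8")       # server told to use UTF-8

print(repr(garbled))
print(correct == title)

# The damage is reversible as long as nothing else has touched the string.
recovered = garbled.encode("iso-8859-1").decode("utf-8")
print(recovered == title)
```

Each Japanese character becomes three unrelated Latin-1 characters, which is exactly the kind of mess that showed up in the title.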

I presume that this legacy configuration was from a time when UTF-8 was not as well supported as it is now.

Anyway, I removed the above, and replaced it all with:

<parameter-encoding default-charset="UTF-8"/>

Bug fixed.

This blog copyright 2009 by MickM