Monday Nov 13, 2006

Wow - this is a very good thing.

Having been heavily involved in the open sourcing of a (very small) project myself, I can well imagine the huge amount of work that must have gone on behind the scenes in order to get Java released: big ups to the people that made it happen!

I'm not getting to write as much Java these days as I used to, but as a programming environment, I've always really enjoyed working with Java - Netbeans made that experience better still, so here's hoping that the open sourcing of the language and platform will open the doors to an even brighter future for everyone.

Oh, interesting factoid, vaguely related to this topic: the first project that I was the lead engineer for in the Sun European Localisation group, was for JDK 1.1, back in 1996 - writing fonts.properties files for a bunch of European locales on Solaris 2.6. Wonder are any of my additions still in there ? (I'm guessing not, but you never know!)

Thursday May 11, 2006

Happy Birthday Open Language Tools!

Lucas was nice enough to point out on the Open Language Tools users@ mailing list that today is the project's 1st birthday - this is pretty cool, and it's nice to see our project is still being used by people out there!

These days, I'm not getting to do much code on the project, Boris & Michal are doing a fantastic job though. I still remain a huge supporter for the cause: Open Language Tools : let's keep making translation easier for everyone!

In related news, I see that the OpenSolaris Internationalisation & Localisation Community have released sources for OpenSolaris ON messages - anyone looking for a decent translation editor ? I know of one if you need it ? :-)

Monday Oct 17, 2005

I'm in Prague at the moment starting my last week in the G11n org at Sun. I'm going to be moving into the Operating Platforms Group next week, working initially on test development for ZFS and who knows what else after that. (hence my post of a few weeks back, when I heard I'd been offered the job)

This was a pretty difficult decision to make : I've been working on g11n-related things for the last 9 years, since I started at Sun, initially doing localisation testing, then moving into l10n engineering and release management, a little work on internationalisation and then finally into translation tools development. I've been working on this problem area for quite a while, and have been doing my very best to lose my job. Did I succeed at that ? Well, no, I don't think I did, however I am proud of many of the things I've achieved while working on g11n. In particular, I still grin from ear to ear when I think about the Open Language Tools project, which was my first foray into open source development. I've left G11n at Sun in a better state than it was in when I started, so I'm pleased with that.

The pros of the move of course outweighed the cons (which is why I'm moving). Sun, as I've mentioned before, is an amazing company, and there's no shortage of interesting projects that any engineer would give anything to be able to work on. Solaris (and OpenSolaris) is a pretty large draw for me and the opportunity to work on it was something I had to grab. So, for the next while, I get to be a complete newbie : I'm not sure how much I'm going to be able to post about the work I'm doing - they say you should blog about stuff you know - I know translation tools pretty well, but test development ? Well, it'll be a learning curve that I'm going to enjoy whooshing along - so things may go quiet here for a bit, in terms of work-related posts. We'll see how it goes.

So, this week, I'm here in the Czech Republic, giving tutorials and chatting with Boris, Michal and Zdenek about the remaining bits of translation technology that we haven't yet open sourced, and working out how we're going to do that. Let me know if you've any thoughts, questions or concerns : as I say, I'm just moving to another group, I'm not going to instantly forget everything I know about g11n, and I'll keep this blog category and will try to post occasionally here : there's not many other folks on blogs.sun.com talking about translation and internationalisation, where are all you guys ?! Come on in, the water's fine !

Wednesday Sep 21, 2005

This is the third (and possibly last) post in this series of me thinking about the question "How much translation do we need ?" For the full background, read posts about on-the-fly dictionary lookups and word/phrase frequency in GNOME.

Remember, the point of all this, is to see how to best use our translation resources. Given a small pile of resources (be it hard-cash, pizza, free-beer, whatever..) how do we use those resources most effectively to do translation ? Note that this isn't necessarily the most complete translation, nor the most accurate - I'm interested mainly in what's "good enough" here.

I remember reading a while back, some ghastly figures showing exactly how many of the features in a given needlessly bloated wordprocessing application (names withheld to protect the guilty!) that typical users ever access, and being pretty amazed at the results. I can't find the original article now, but the summary was, that there was a huge number of features that were never actually used!

So, the thing is - if there's loads of features that are never used, that in turn must mean that there's loads of translations that are never used either ! Can you see where this is going ?

What I thought would be interesting, would be to have a go at determining what strings are actually displayed in a given application, and then compare that against the software message files for that application and see what we come up with. Initially, I started wandering around the excellent GNOME Accessibility Project, thinking that I should be able to easily capture the displayed text in an application that way. So far, my results seem to have been mixed. Being of a Java persuasion, I immediately started messing about with the test programs JNav and TestAT. These seem to work up to a point, but unfortunately not all displayed strings actually appear in these programs' output (for example, they don't seem to read tooltips for me) - I could be doing something wrong here, comments welcome!

I was a bit dismayed by this - thinking that I'd have to do backflips to get this info - previously I had been using library interposers to intercept calls to gettext, which gives me some info alright, but a string that's being looked up via gettext isn't necessarily the same as a string that's displayed on screen.

But, there's more than one way to do it - DTrace to the rescue! - a 6 line script (sigh) seems to tell me what I'm after - at least for applications that draw strings using Pango, but of course, we could do similar with any text rendering mechanism or function call :

#!/usr/sbin/dtrace -qs

pid$1:*pango*:pango_layout_set_text:entry
{
  printf("called \"%s\"\n",copyinstr(arg1));
}

Of course, this will give me everything that's displayed, not just translatable strings, so it'll include user-output and other dynamically generated strings too (and indeed, they'll already have the printf format strings from the message files replaced by the actual values) but I'm making progress. Next step, compare these against message files, and find out what strings are actually called. Does this sound like fun ?
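The "compare these against message files" step could be sketched roughly as below. This is only an illustration, not code from the project: the class and method names are made up, it assumes the trace output has one `called "..."` line per string (multi-line strings are ignored), and it only finds exact matches, so it will miss any message whose printf placeholders were filled in before display. It also assumes the application was run untranslated, so that what's on screen is the msgid itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: matches strings captured by the DTrace script against msgids.
class DisplayedStrings {

    private static final String PREFIX = "called \"";

    /** Extracts the text from trace lines of the form: called "some text" */
    static Set<String> parseTrace(List<String> lines) {
        Set<String> shown = new LinkedHashSet<>();
        for (String line : lines) {
            boolean wellFormed = line.startsWith(PREFIX)
                    && line.endsWith("\"")
                    && line.length() > PREFIX.length();
            if (wellFormed) {
                shown.add(line.substring(PREFIX.length(), line.length() - 1));
            }
        }
        return shown;
    }

    /** Returns the msgids that were seen on screen, in message-file order. */
    static List<String> displayedMsgids(List<String> msgids, Set<String> shown) {
        List<String> result = new ArrayList<>();
        for (String msgid : msgids) {
            if (shown.contains(msgid)) {
                result.add(msgid);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample data only; real input would come from the trace output and a PO file.
        List<String> trace = Arrays.asList(
                "called \"Open\"", "called \"Untitled Document 1\"", "called \"Save\"");
        List<String> msgids = Arrays.asList("Open", "Save", "Print Preview");
        System.out.println("Displayed: " + displayedMsgids(msgids, parseTrace(trace)));
    }
}
```

Anything in the message file that never turns up in the result after a decent session with the application would be a candidate for "translate later".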

Given any application, I'd love to be able to say to a translator : "at least you should translate these strings first, then worry about the others..."

More effective use of translation resources - I like it.

Tuesday Sep 13, 2005

I'm awake insanely early this morning, since about 4:30am, as the missus was catching the red-eye over to London to attend a nutrition conference. I tried to get back to sleep, but it just wasn't happening. So, I decided to spend the dark hours of the morning catching up on the weekly podcasts - which are something I'm still getting used to : trying to find good tech-related content seems to be very difficult (any suggestions for good content ?)

One of the shows I've been listening to, is the BBC's Go Digital programme - which is also broadcast via the World Service. This week's show, had an item that caught my attention that I thought would be worth writing about - and the fact that it's actually on-topic for this blog is a pleasant surprise, given the recent frequency of translation/tools related posts !

The item concerned the 2nd generation of a phone, the FOMA RAKURAKU II, which is being rolled out by NTT DOCOMO in Japan and is aimed directly at elderly users (I found one reference to it at japan.c-news.net - sorry, finding more details about the phone in English seems to be difficult, but if anyone has links, I'll update this post).

Targeting its intended audience, the phone has large buttons, a very clear display and an easy-to-use interface, with text-to-speech elements built in. What I thought was interesting though, was that it has a voice-slow-down feature. Incoming conversations are recorded, and can then be played back at a slower speed, so that users of the phone can take in all of what's being said. The example used in the podcast, was someone who was in a hurry trying to give directions via mobile, and the recipient of the conversation was able to get a slowed down version of those directions (oh, please tell me that emergency-services operators (999/911/etc.) use a variant of this technology ?) Sounds like a really useful service, but apparently, the voice-slow-down feature is implemented on the server-side !

This being the case (you knew that The Network is the Computer, didn't you ?) I started thinking - this should be on every phone, since all the intelligence is server-side : being able to slow down conversations wouldn't just be useful for elderly users - if I'm trying to book a service where the operator at the far end speaks a different language than I do, the first thing I usually want, is for them to slow down ! With this technology, I could have them do that without having to embarrass them or me. Fantastic.

Thinking a bit more about that : what other services should we have on our phones that we don't already, from a multi-lingual point of view ? In terms of making phones more accessible, wouldn't the ability to have SMS messages interchanged via some machine translation system be useful ? (or even speech-to-text, machine translation and text to speech conversions ?) Yes, I really want a babelfish, but while those things are still fictional, would a simplistic implementation of existing translation technologies at least be worth exploring ? I'm sure there's areas out there which could really help users that mobile operators haven't started using yet...

btw. short status update, unrelated to this post : I'm looking at XLIFF 1.1 support in the Open Language Tools at the moment, and hope to have something before the end of the week. I'm also trying to cobble together a prototype for my next exercise in the "How much translation do we need" series, which I'll blog about once I've got something working.

Tuesday Aug 30, 2005

Following on from my bag of words post, folks suggested I look at doing n-gram word lists as well, and seeing their frequency distribution. I did that, and while I'm not sure the results are terribly interesting, it wasn't much extra work, so that's okay. I suspect my sample size was too small ...
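For anyone who hasn't met n-grams before, here's a minimal sketch of what counting them over UI messages might look like. It isn't the code used for the results above: the class name is invented, and the tokenizer is a deliberately naive split on anything that isn't a letter or digit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts word n-grams across a list of UI messages.
class NGramCounter {

    /** Returns every run of n consecutive words, joined by single spaces. */
    static List<String> ngrams(List<String> words, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++) {
            result.add(String.join(" ", words.subList(i, i + n)));
        }
        return result;
    }

    /** Counts n-grams per message; n-grams never span two messages. */
    static Map<String, Integer> count(List<String> messages, int n) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String message : messages) {
            List<String> words = new ArrayList<>();
            // Naive tokenizer: lower-case, then split on runs of non-letter, non-digit characters.
            for (String w : message.toLowerCase(Locale.ENGLISH).split("[^\\p{L}\\p{N}]+")) {
                if (!w.isEmpty()) {
                    words.add(w);
                }
            }
            for (String gram : ngrams(words, n)) {
                counts.merge(gram, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("Save the file", "Open the file", "Close the window");
        count(messages, 2).forEach((gram, c) -> System.out.println(c + "\t" + gram));
    }
}
```

Keeping each message separate means a bigram is never formed from the last word of one string and the first word of the next, which seems right for UI text where strings are unrelated to their neighbours in the file.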

The Wikipedia page on N-grams might be more interesting perhaps ?

(the post that started all this is back in the archives a bit)

Cool - Jean-Christophe Groult has written an SVG config file for the Open Language Tools XLIFF Filters ! I'm pretty excited about this, as SVG was one of the formats I'd always wanted to have a go at but just never got the chance. There's just a test version in his email announcement, so hopefully with a little help, we'll get some more heavy-duty testing of it, so that it'll be able to make it into the next release of the XLIFF filters. If you'd like to lend a hand, that'd be great!

By the way, if anyone else has XML filetypes out there that you'd like to see supported in the Open Language Tools, then have a look at xml-config.dtd and some of the sample files we ship with and perhaps submit a new config file. XHTML, anyone ?

Wednesday Aug 24, 2005

Simos was thinking more about what may or may not have been a fairly zany idea that I've been playing around with in my ample free time of late. He suggested that it might be interesting to see what sort of word distribution we have in the GNOME UI at the moment (so as to determine what words would be most beneficial in a bi-lingual dictionary that said-zany-idea uses).

I managed to dig up some sources for GNOME 2.10, and since I didn't want to build all the POT files, I just took the pa.po translations (which were listed as 100% translated), and concentrated on the msgid strings.

I wrote a quick bit of Java (~150 lines), which, using the Open Language Tools PO parser, pulls the msgids, uses a Java BreakIterator to split up words, blasts them to lower case, and writes out a frequency distribution. The program stdout is here along with an OpenOffice doc containing the list of the words, and the frequency they appear in the UI.
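A stripped-down sketch of the same idea is below. It is not the original ~150 lines: the class name is invented, the msgids are passed in as a plain list (the PO parsing step is left out entirely), and only the BreakIterator splitting, lower-casing and counting are shown.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: word frequency over a list of msgid strings.
class MsgidWordFrequency {

    /** Splits text into lower-cased words using a word-level BreakIterator. */
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            // The iterator also returns whitespace and punctuation as tokens;
            // keep only those that begin with a letter or digit.
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                result.add(token.toLowerCase(Locale.ENGLISH));
            }
        }
        return result;
    }

    /** Counts how often each word appears across all msgids. */
    static Map<String, Integer> frequencies(List<String> msgids) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String msgid : msgids) {
            for (String word : words(msgid)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> msgids = Arrays.asList("Open the file", "Close the file", "Save");
        // Print the most frequent words first.
        frequencies(msgids).entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .forEach(e -> System.out.println(e.getValue() + "\t" + e.getKey()));
    }
}
```

Real msgids also carry things like keyboard-mnemonic markers and printf placeholders, which this sketch makes no attempt to strip.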

Now, if we got the top x words translated and put into a dict formatted dictionary, then perhaps my idea of trying to bridge the digital divide by providing just enough translation wasn't as zany after all ?

As always, thoughts and comments welcome.

by the way, it's nice to know the most common English word in the GNOME 2.10 UI is "the" - who'd have thunk ;-)

update: - of course, I should have said "... top X nouns translated ..." above : translating other parts of speech probably wouldn't help in this case.

Friday Aug 19, 2005

Wow - that's embarrassing : the '06 Dublin area phone book has an internationalization bug, where names containing characters that should have a fada ( eg. á,é,í,ó and ú ) were replaced by a space. Wonder what my friend "S an" thinks of this ? Nice one éircom ! :-)

Tuesday Aug 16, 2005

Over the last few days, I've been thinking to myself : how much translation do you need to get the gist of a user interface ?

I mean, if you can't read English, but were presented with a UI that has simple dictionary lookups of English words placed in brackets after the original text, is that interface better or worse than just seeing the English interface ?

As I'm a native English speaker, this is a tough one for me to get my brain around, so as an experiment, I wrote some code to wrap itself around calls to gettext (and other i18n message calls), tokenize the input into words, and perform dictionary lookups on the fly. Here are results of me running gedit against a cz-CZ dictionary : would this sort of thing be helpful to non-English speakers ?

dynamically generated interface, from a dictionary lookup of tokenized strings
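The tokenize-and-look-up step could be sketched like this. It's an illustration only: the class name and the two-entry dictionary are made up, and putting the lookup in brackets after each individual word is an assumption about the layout. How the calls to gettext get wrapped in the first place isn't shown.

```java
import java.text.BreakIterator;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: appends a dictionary lookup in brackets after each known word.
class DictionaryAnnotator {

    private final Map<String, String> dictionary;

    DictionaryAnnotator(Map<String, String> dictionary) {
        this.dictionary = dictionary;
    }

    /** Returns the message with " (translation)" added after every word found in the dictionary. */
    String annotate(String message) {
        StringBuilder out = new StringBuilder();
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        it.setText(message);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = message.substring(start, end);
            out.append(token); // original text, spaces and punctuation are kept as-is
            String translation = dictionary.get(token.toLowerCase(Locale.ENGLISH));
            if (translation != null) {
                out.append(" (").append(translation).append(")");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Tiny sample dictionary, for illustration only.
        Map<String, String> dictionary = new HashMap<>();
        dictionary.put("file", "soubor");
        dictionary.put("window", "okno");
        DictionaryAnnotator annotator = new DictionaryAnnotator(dictionary);
        System.out.println(annotator.annotate("Open File"));
        System.out.println(annotator.annotate("New Window"));
    }
}
```

Words that aren't in the dictionary are simply left alone, so a sparse dictionary degrades to the plain English interface rather than breaking anything.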

I know that the ideal state would be properly translated messages for an application, but in the absence of those, does this help at all ? Going further, I can't see any reason why I couldn't do the same thing to a non-internationalised application - wrapping calls to GUI-text-drawing primitives, etc... Specifically, I'm wondering if this helps make computers more or less accessible.

Thoughts and comments welcome !

Tuesday Aug 09, 2005

I'm stunned, amazed and very pleased to see that Boris Kalabuhov has completed a Russian translation of our XLIFF Translation Editor User Manual ! Wow - impressive.

Monday Aug 08, 2005

On Friday, we announced version 1.1 of the Open Language Tools. Our release notes for the XLIFF Translation Editor and XLIFF Filters contain the full story, but here's a quick rundown of the changes in this release :

  • Stability and performance enhancements in the Editor
  • Initial support for the Open Document Format in the Filters
  • Improved XLIFF support in the Editor
  • Internationalisation and translation into 6 languages for the XLIFF Filters
  • Stability/Functionality bug fixes in the XLIFF Filters

So now we continue the work towards the next release of the tools - here's some of the things we're targeting :

  • Improved XLIFF compliance
  • SRX support
  • Internationalisation for the Editor
  • Ease of use for XLIFF Filter configuration
  • Mini-TM import/export to TMX
  • Trados TTX to XLIFF conversion
  • anything else ? you decide !

We're looking forward to working on the 1.2 release - there are great things ahead, and we're all pretty excited about the new features coming down the line. In the meantime, we'd love to hear any feedback you have on the 1.1 versions of the tools - let us know !

Friday Aug 05, 2005

Micah was in touch recently, saying that our segmentation of Japanese text in the filters wasn't very good. That was fair enough : the segmenters were really only ever tested for English text, and I didn't think they'd do a good job for any other language. In a vague attempt to handle non-English scripts, I did scour the Unicode character list, and include anything that sounded like it might indicate a sentence boundary : unfortunately, I got it wrong : Middle-Dot in Katakana isn't a sentence separator. Just shows how much Japanese I know !

Today, I put back some changes and mailed our dev@ list to see if these were okay - can anyone else help to test Japanese segmentation ? I'm not a Japanese speaker (though I did try learning it a few years ago : didn't have enough time or inclination to continue, unfortunately) so I'm a bit in the dark here...

Now, I had thought that using the default BreakIterator for the Japanese locale would be enough for doing sentence segmentation in Japanese, but other folks were saying this wasn't so good : can anyone explain more ? (I'm using that BreakIterator to produce more accurate wordcounts for Japanese now, by the way)
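For reference, this is roughly what using the locale's default sentence BreakIterator looks like. It's a minimal sketch with an invented class name, not the segmenter in the filters, and it makes no claim about how good the resulting Japanese boundaries are (that's exactly the open question).

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: sentence segmentation using the JDK's default rules for a locale.
class SentenceSegmenter {

    /** Splits text at the boundaries reported by the locale's sentence BreakIterator. */
    static List<String> sentences(String text, Locale locale) {
        List<String> result = new ArrayList<>();
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Trailing whitespace stays attached to the sentence it follows.
            result.add(text.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        // Two short Japanese sentences, each ending in an ideographic full stop (U+3002),
        // written with Unicode escapes so the source file stays plain ASCII.
        String japanese = "\u4eca\u65e5\u306f\u6674\u308c\u3067\u3059\u3002"
                + "\u660e\u65e5\u306f\u96e8\u3067\u3059\u3002";
        for (String sentence : sentences(japanese, Locale.JAPANESE)) {
            System.out.println("[" + sentence + "]");
        }
    }
}
```

Swapping getSentenceInstance for getWordInstance in the same loop gives word-level boundaries, which is the sort of thing a word count would be built on.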

Oh, thanks to Monma for unwittingly supplying me with some convenient Japanese text I used to see if the segmenter worked - any more opinions would be gratefully received!

Tuesday Aug 02, 2005

We're definitely nearly there, so in the meantime, have a look at our announcement of release candidate 3 of the open language tools !

Downloads available at the usual place...

(by the way, if it seems like this blog has degenerated into an announcement forum and a place for me to vent, then you're probably right : sorry about that - will try to work on some more positive, exciting and altogether happier content real soon!)

Thursday Jul 28, 2005

Gaah, too busy ! We released RC2 of the 1.1 Open Language Tools this week which is pretty cool. I unfortunately am spending this week trying to get some new keyboard layouts done for Solaris and am not able to contribute to all the excellent discussions that are going on in the Open Language Tools aliases about RTF support, Japanese-language sentence segmentation and the direction the tools are going in - hopefully I'll be back there next week...

This keyboard support work is pretty mind numbing, but the tight deadline I'm working to is making it mind numbing, but stressful work! Still, if you're likely to want to use an Albanian, Belarusian, Bulgarian, Croatian, Icelandic, Maltese, Serbian and Montenegrin, Slovakian or Slovenian keyboard in Solaris (for Xsun at least), then I guess it's all for a good cause. Hopefully I'll be able to put together a blog post about what's involved in adding keyboards to Solaris at some stage - unless someone wants to pre-empt me ?

Back to the grindstone...

Thursday Jul 21, 2005

We've now got Release Candidate 1 builds of the 1.1 releases of the Open Language Tools, both the XLIFF Translation Editor and the XLIFF Filters. Check out Boris' announcement here. There's been a lot of changes to the XLIFF Editor to make it more stable and improve user-experience over the 1.0 releases, and quite a few changes to the XLIFF Filters behind the scenes (including preliminary support for the Open Document Format) (sorry, I still need to work on the release notes for the filters)

The release candidate builds are available here : we'd really appreciate your help in testing them out in preparation for the 1.1 release, thanks !

Tuesday Jul 19, 2005

Making more progress on the tools front - Boris put a load of fixes into CVS today, and I added support for converting CSV files to XLIFF [ okay, command-line access only, for the time being, but it's better than nothing ].

Full details on the stuff we committed can, as ever, be seen in our cvs@... mailing list archive. You should be able to see the results in tomorrow's nightly builds, or if you're feeling impatient, you can always compile from the sources. Enjoy!

Saturday Jul 16, 2005

Wow - http://www.java.net has a link to our project on the front page : that's pretty excellent for our little project to have made it that far already, next step is of course, total world domination ! Guess we can keep those sunglasses on for a bit longer, and definitely need to get our 1.1 release out soon, while we're getting all this extra publicity !

Thursday Jul 14, 2005

Yep, feel free to wear sunglasses indoors all you Open Language Tools people ! We're the Java Tools community Cool Tool of the week - Fantastic !

Wednesday Jul 13, 2005

Just posted this announcement about nightly builds for our project - it's taken us a while, but we're getting there!

Monday Jul 11, 2005

Found via Simos' post to GNOME-i18n, the IOSN have released the first version of their FOSS Guide to Localisation. This is really quite an excellent document, covering all you need to know about translation, localisation and internationalisation for a lot of open source projects. Haven't had the chance to read it in-depth yet, but from a quick browse through the contents, it looks pretty comprehensive ! Congratulations to all involved !

It doesn't yet seem to mention the Open Language Tools, but perhaps we'll get a mention in the next release - I've been in touch with the IOSN guys already (before today's announcement actually !), but have just been too swamped to get around to writing the article about our tools that they asked for... Hopefully, I'll get around to it for the next release of the guide.

Tuesday Jul 05, 2005

I'm taking a few days off work at the moment (not part of the July shutdown though). Both myself and the missus have had an extremely hectic few weeks, so we're spending Monday, Tuesday and Wednesday this week hanging around the house, doing the sorts of things that we don't normally get a chance to (you know, finally getting around to that weeding, hedge trimming, me cleaning my bike, that sort of thing).

One of the things we were able to do, was go furniture shopping yesterday - together, for once! We've been meaning to get a new bed for one of the spare rooms for a while, and thought we'd try to get that sorted out.

Most double beds, it seems, come in two sizes : 4'6 and 5ft. The room we have isn't huge, so a 4'6 bed would do fine. Now, here's the great part : mattresses also come in two sizes - 4'6 and 5'. Duvets, fitted sheets, bedspreads, throws and valances (no I don't know what they are either) are also made in 4'6 and 5' sizes.

Interoperability ! Wow, what a concept - the bedding industry has this already sorted. (maybe you can see where this is going, in a sort of Rolf Harris-"Can you tell what it is yet?" stylee...)

The translation tools industry hasn't worked this out : we're definitely getting there, but sadly, not everyone's on board so far. One of the first requests we got on the Open Language Tools project, was support for the Trados TTX format. (here's the thread if you want to follow the whole discussion)

Initially, I wasn't sure what to make of this request. I mean, our philosophy states that we want to deal with open standards, and that everything we do should be based around those standards. TTX is not an open standard, it's a de-facto standard, but it's not open.

My thoughts to begin with, were that we shouldn't support the format : by making our tools deal with a standard that wasn't (so to speak) we'd end up in a constant tail-chasing exercise, having to jump every time Trados changed their format. What's worse was, that since Trados don't document the standard, I'd have to rely on sample files people mail me in order to work it out...

Now, thinking a bit more about the whole thing, I'm starting to think that it is probably more useful to try to reverse-engineer the file format, so that people can get out of the Trados-trap, and start using real open standards.

Here's the problem though, and this is why open standards matter : I spent a while trawling through Google trying to find how TTX files were structured. Of immediate interest, was a document on the Trados website called The Trados File Formats Reference Guide - but nothing there (or on the rest of the Trados website, for that matter) was of any use at all. I then spent a bit longer looking through translators' newsgroups and forums looking for something similar - no joy there either : maybe I'm just missing something obvious - is this stuff documented anywhere ? Our users can't open Trados files until I write this support, and I can't even find sample TTX files anywhere.

Here's a request : Trados, can you please document this format somewhere that's freely available to anyone who wants it ?

SDL announced recently that they're buying Trados, so I've got great hopes that good stuff will come of this - I've talked to SDL folks in the past, they're good guys and seem to grok open standards too, which is great. Let's hope they do the right thing, and both open up the specs for the Trados TTX format, and really start to push people over to using XLIFF and TMX : let's rid ourselves from these closed standards please and advance ourselves to the technological level of the bedding industry !

(If you feel like having a go at describing the TTX format, let me know, and I'll post a link to it here)

Monday Jun 27, 2005

Wow, so I've been pretty quiet here since the Open Language Tools announcement, for excellent reasons. There's been quite a lot of interest in the tools so far, which is really encouraging. There are bugs in the code, and we're making fast progress at getting them resolved. As expected, we've got a fair few eyeballs on the project already - if you feel like adding yours to the effort, we'd love to have you along !

Tuesday Jun 21, 2005

I'm delighted at last, to be able to announce the first binaries, and source code availability of the Open Language Tools project.

The aim of these tools is to make the task of translating software and documentation as easy as possible, and so allow more people to use computers than ever before. If you can't use a computer because its interface isn't translated into your language then we want to provide tools that can help.

Based around open standards such as XLIFF and TMX, we strive for interoperability with other translation tools, while at the same time creating an open-source implementation of these standards that can serve as a reference for others. The source code is released under the terms of the CDDL.

Today, we're announcing the two components of the tool set : the Open Language Tools XLIFF Translation Editor and the Open Language Tools XLIFF Filters - for screenshots, downloads, information on how to access our CVS, and a project FAQ, please visit our project home page. We've got a list of areas where we'd like help on our home page, so if you feel like lending a hand, we'd love to have you along !

Tuesday Jun 14, 2005

... well, now you can ! Congratulations to the OpenSolaris folks for getting the code out there. Now, if you're done using them, can I have some of your legal folks to finish getting our translation tools released as well ? :-)

aside #1 : actually, I'm kidding - the process for releasing our tools is well under way, suspect next week we'll have some news.

aside #2 : I was really hoping to beat OpenSolaris to a release date : chatting to some of the guys here yesterday, their response was "What, you think you'll have your stuff ready by 4pm tomorrow?" Aah well...

Tuesday May 24, 2005

So I'm still working on documentation for the release of our translation tools, oh and getting distracted on all sorts of other stuff, hence the slow progress, but I am making progress.

I've got the (currently, somewhat monochromatic) project home page started on java.net, with some information about what we're doing and why. I need to spend a little while playing around with some graphics to give some colour to the front page, though that's very low priority right now. I should also probably add some screenshots once I have them. There's already a simple FAQ up there and also some articles to introduce people to the translation editor and XLIFF filters (as if this web log for the last year wasn't enough !)

Tomorrow's tasks will include refining the package-renaming script (which now also adds the contents of a file to a java comment at the top of each piece of source, in place of a copyright statement) which I talked about a while ago, doing some test builds to make sure that everything works and then getting started on some build and installation docs. The former will probably be very easy, a README file containing the words "ant filters" and "ant transeditor" will probably do it ;-) but I might spend a bit longer on the latter though : there's still a few usability problems with the XML to XLIFF filter during installation, so it'll require a bit more work (hmm, should probably get the new OpenOffice OASIS file format working too before we release, would people like that ?)

I had been thinking that I wouldn't mention too much more about the tools on this blog till I actually had source code or binaries (at least) on the web site to show you, but it's looking like that'll take another week or so to get all the legal stuff resolved. So in the meantime, wanting to show you that I am doing something, I figure it mightn't hurt after all to give you a sneak peek at what's there at the moment.

You can get to the project (which is still inside the java.net incubator) at http://open-language-tools.dev.java.net The fact that we're still in the incubator means that we haven't got everything working as well as we'd like yet, the lack of source code is obviously quite a barrier to the project going anywhere yet, but hey - small steps.

The project is called Open Language Tools, and I hope you like it. I'd be interested in hearing any comments about what's on the site so far. As soon as we get out of the incubator, we'll be making all sorts of official announcements, so you didn't hear this from me, okay ? :-)

ps. this appears to be my 100th blog entry ! I'm sure that's a cause for celebration (or something)

Friday May 20, 2005

The New Scientist feedback page has another translation gem I thought you'd find amusing...

Generously conceding Peter Shaw's right to claim the coining of "redundant translation syndrome" and to a first in the Latin division with "seize the carpe diem" (2 April), Hal Kouns becomes a strong contender in the French division by pointing out that over the years he has received a number of requests to "please RSVP". But he has a challenger in Tom Reidy in Brussels, who read our piece on this topic only to switch on the TV news and hear a cardinal described as "parmi des 'top ten' des dix premiers candidats" for pope.

Meanwhile, Nick Papadakis storms into the Spanish division by reminding us of the La Brea Tar Pits in Los Angeles - or, translated, the The Tar Tar Pits. No doubt there will be more. (Quote Source)

Right, I'd better get back to the grindstone. By the way, I've found a way to ease the chore of writing documentation : I'm putting together a simple interface for our XLIFF filters - previously, the only way you could run them was from the command-line, eg. java blah.foo.filters.html.HtmlToXliff , but that's not terribly user-friendly, so I've got the bones of a gui that's a simple panel with a large "Drop files here" label to which you can drag & drop files that are then converted to XLIFF.

Drag and drop from native platforms to Java applications isn't as easy as I'd hoped it'd be; I seem to need to write special methods for different platforms, but so far, I've got it working fine with GNOME, Mac OS X and Windows, so that'll do for now. Of course, this code will be part of our open source release once we get that last bit of legal stuff we're waiting for. I'll let you know more as soon as I've got it!
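For the curious, here's a rough idea of the sort of thing involved. This is not the code from the tools, just a minimal sketch, assuming that the platform differences come down to which data flavor a drop arrives in: a ready-made list of files on some desktops, and a text/uri-list string on others. The class name DroppedFiles and its methods are made up for illustration.

```java
import java.awt.datatransfer.DataFlavor;
import java.awt.datatransfer.Transferable;
import java.awt.datatransfer.UnsupportedFlavorException;
import java.io.File;
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns a dropped Transferable into a list of files. */
final class DroppedFiles {

    /** Flavor for drops that arrive as a text/uri-list string (an assumption about some desktops). */
    static final DataFlavor URI_LIST_FLAVOR = createUriListFlavor();

    private DroppedFiles() {
    }

    private static DataFlavor createUriListFlavor() {
        try {
            return new DataFlavor("text/uri-list;class=java.lang.String");
        } catch (ClassNotFoundException e) {
            // java.lang.String is always loadable, so this should never happen.
            throw new IllegalStateException(e);
        }
    }

    /** Parses a text/uri-list payload: one URI per line, lines starting with '#' are comments. */
    static List<File> parseUriList(String data) {
        List<File> files = new ArrayList<>();
        for (String line : data.split("\r\n|\r|\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            try {
                // new File(URI) only accepts absolute file: URIs and decodes %20 and friends.
                files.add(new File(new URI(trimmed)));
            } catch (URISyntaxException | IllegalArgumentException e) {
                // Not a usable file: URI (an http link, say), so skip it.
            }
        }
        return files;
    }

    /** Tries the native file-list flavor first, then falls back to text/uri-list. */
    static List<File> extract(Transferable transferable)
            throws UnsupportedFlavorException, IOException {
        List<File> files = new ArrayList<>();
        if (transferable.isDataFlavorSupported(DataFlavor.javaFileListFlavor)) {
            Object data = transferable.getTransferData(DataFlavor.javaFileListFlavor);
            for (Object item : (List<?>) data) {
                files.add((File) item);
            }
        } else if (transferable.isDataFlavorSupported(URI_LIST_FLAVOR)) {
            files.addAll(parseUriList((String) transferable.getTransferData(URI_LIST_FLAVOR)));
        }
        return files;
    }
}
```

In a real panel you'd call DroppedFiles.extract(event.getTransferable()) from the drop() method of a java.awt.dnd.DropTargetListener, after accepting the drop with event.acceptDrop(DnDConstants.ACTION_COPY), and then hand each file to whichever filter matches its type.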

Monday May 16, 2005

A photo of a bottle-nosed dolphin, quite obviously not worried about writer's block

Chris wrote a post the other day called "Writer's block sucks" and I just wanted to say that I agree with him.

As noted in my last entry, I'm trying to spend time writing up documentation, articles and FAQs about our translation tools in anticipation of them being open sourced. So far, it's not going too well : as I mentioned a while back, the corollary to "Writer's block sucks" is "Everything is interesting when you should be working". I tend to just want to lark about, doing things I enjoy doing (a bit like the photo on the left there - taken by me on Milford Sound, back in March ; clearly that dolphin isn't sitting about, fretting over how to write technical documentation !)

I'm trying hard to focus, honest - but it's quite a challenge ! I've now got the site up on java.net, though we're still in an incubator area, where we'll stay at least until we've got source code available, some decent docs and a mission statement of some sort. Work in progress, but as soon as we're promoted out of the incubator, I'll make lots of noise on this blog and will try to send emails to anyone who's talked to me about the tools at all over the last year or so. Thankfully the lack of documentation isn't a bottleneck so far : I'm still hanging on for the copyright statements from Legal before I can put any source code up there, but I'm not going to let that become an excuse !

As regards writer's block sucking, during lunch last week, JohnC (one of the original authors of the translation tools, now moved onto other things) made some pretty good recommendations for books about writing. He hasn't posted reviews yet, but perhaps this post will help to extract said reviews :-). The first was The Elements of Style and I've completely forgotten the second, sorry. While I haven't yet read that book, the reviews I've seen make it seem quite a bit more approachable than the venerable Chicago Manual Of Style, which is a scary tome that is sitting here on my desk, looming darkly at me. Getting tired of my incessant whinging about having to write docs not code, DamienD flung it at me earlier in the week (it's a heavy book - I'm glad his aim was off!). I suspect that if I do take the time to read the Chicago book before writing this documentation, I'll be old and grey when finished, and the tools will never get released. Better get back to it.

Tuesday May 10, 2005

An engineer's worst nightmare ? I get to spend today writing documentation for the impending open source release of our translation tools. I'm really excited about the release, but kinda wish I'd started writing docs a long time back.

The code itself is already well documented from a programmer's point of view (aah, javadoc and Netbeans auto-comment, what would I have done without you!) and the translation editor has had an excellent user-oriented manual (in the form of a 56 page OpenOffice document) for a while, but there's not much documentation for our XLIFF filters and not much technical material covering the overall design of the system.

So today, I want to sit down and start documenting ! I think I'll try to write it from the perspective of what I'd want to know if I were a new programmer thinking about working on the tools; let's hope that's a good approach. (Definitely also need to consider the user's point of view for the XLIFF filters though - need to do a little coding in there to clean up usage messages, so perhaps I will get to write some code today after all...)

I also need to get some HTML pages together with descriptions and introductions to the tools we're open sourcing, so I get to pretend to be a marketing person for some of the day as well ! eg. Hmm, now what font shall I use ? :-)

As regards timelines for all this stuff, it shouldn't be too long now - though I don't know more than that. We've got the final clearance to release the code, so I'm now just waiting for a copyright statement that I need to stick on the top of each file, then I need to set up a java.net project, and we'll be ready to go !

Friday Apr 29, 2005

New Scientist have a rather interesting bit of feedback here about a German website that seems to have fallen foul of too much trust in the effectiveness of machine translation systems :

Searching for campsites in Germany, Candida Frith-Macdonald came across www.www.schwarzwald-camping.de. She suggests that English speakers with a few minutes to kill on a dull day could do worse than visit and click on the little flag that gives you the site in English. Or at least, she corrects herself, "something loosely approximating English". After all, how often are you offered a holiday including activities such as "Pancake shoes" and "Sausage crickets at the campfire", in surroundings "determined by natural wood and extreme substances"? (Quote source)

Worth considering the next time you use a machine translation system, I think. (Thanks to the missus for forwarding this link on to me!)

Update: maybe I'm wrong, perhaps it's just a plain old bad translation...

This blog copyright 2009 by timf