Tuesday Jun 14, 2005

Did you get the message?

In what language do you plan to run OpenSolaris?

Fundamentally, it's not hard to write a localized application, right? You just list all of the different strings that your program might output, and you provide translations for them all in the language of your choice.

Since my native language is English, I would start with a file something like this:

	# usr/lib/locale/C/LC_MESSAGES/yummy.po
	msgid "I would like another beer, please."
	msgstr ""

After bribing enough multilingual translators, or perhaps drinking all of the beers myself and surfing on over to AltaVista BabelFish, eventually I would end up with several more files. For example:

	# usr/lib/locale/es/LC_MESSAGES/yummy.po
	msgid "I would like another beer, please."
	msgstr "Quisiera otra cerveza, por favor."

	# usr/lib/locale/fr/LC_MESSAGES/yummy.po
	msgid "I would like another beer, please."
	msgstr "Je voudrais une autre bière, svp."

	# usr/lib/locale/de/LC_MESSAGES/yummy.po
	msgid "I would like another beer, please."
	msgstr "Ich möchte ein anderes Bier, bitte."

	# usr/lib/locale/zh_HK/LC_MESSAGES/yummy.po
	msgid "I would like another beer, please."
	msgstr "我會要其它啤酒, 請。"
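
To make the idea concrete, here's a toy lookup sketched in shell. It comes with some loud assumptions: every entry is a single-line msgid followed by a single-line msgstr with no leading whitespace, and po_lookup is a name I made up for illustration. Real programs call gettext(3C), which reads compiled catalogs rather than parsing .po files at run time.

```shell
# po_lookup FILE MSGID
# Print the translation of MSGID found in FILE, or MSGID itself when the
# catalog has no translation (or only an empty one) for it.
# Hypothetical helper: handles single-line msgid/msgstr pairs only.
po_lookup() {
    awk -v want="$2" '
        # Remember the text between the quotes of the most recent msgid.
        /^msgid "/  { id = substr($0, 8, length($0) - 8); next }
        # On the matching msgstr, keep its text and stop reading.
        /^msgstr "/ { if (id == want) { str = substr($0, 9, length($0) - 9); exit } }
        # Fall back to the untranslated message.
        END         { if (str != "") print str; else print want }
    ' "$1"
}
```

Pointed at the Spanish file above (minus the leading tabs), po_lookup with the beer request as its second argument should hand back the msgstr from that file; pointed at the C locale file, it just echoes the English.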

But sometimes, as they say, el diablo is in the details. Instead of "you" and "your program" and "the language of your choice," let's talk about "hundreds of contributors" and "hundreds of programs and libraries" and "all of the languages spoken by your customers."

Now we've got a much more interesting problem space; in the ON Consolidation command and library domains alone, a quick grep reveals over 22,000 unique messages:

	fivetwelve {4}% grep '^msgid' SUNW_OST_OS*.po | wc -l
	   22625
	fivetwelve {5}% 
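
Strictly speaking, that pipeline counts msgid lines, so a message that shows up in more than one file gets counted once per file. If you want distinct strings, a variation along these lines should do it. Treat it as a sketch: count_unique_msgids is just a name I picked, and it assumes each msgid fits on one line.

```shell
# count_unique_msgids FILE...
# Count distinct msgid lines across the given .po files.
count_unique_msgids() {
    # Concatenate first so that identical messages from different files
    # compare equal, then let sort -u collapse the duplicates.
    # tr strips the padding that some wc implementations add.
    cat "$@" | grep '^msgid' | sort -u | wc -l | tr -d ' '
}
```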

As you might expect, we've developed both processes and tools to help us manage this complexity. Also as you might expect, those processes and tools are somewhat arcane. And, since they're largely integrated into the build system, they're further obfuscated by some pretty heavyweight makefiles. Just for grins, let's see if we can shed a little light into the dimly lit corner of Hell in which these makefiles reside.

If you're more interested in the bigger picture, you'll want to shout out to the folks over in the Internationalization and Localization Community. But if what you really want is some quick and dirty information on how to hook into the ON makefiles, grab a cup of coffee and dig in.

We'll start with individual commands and libraries, then describe the magic that binds it all together. In general, for any directory below usr/src, you should be able to make _msg and generate the catalogs for all of the subdirectories in that part of the source tree. So, for every intermediate directory, you'll need to be sure the following macros and rules are set in the makefile:

	MSGSUBDIRS= some list of subdirectories
        
	_msg := TARGET= _msg

	_msg: $(MSGSUBDIRS)

	$(MSGSUBDIRS): FRC
		@cd $@; pwd; $(MAKE) $(TARGET)

	FRC:

What does that do? It tells us that, whenever we try to build the _msg target in this directory, we really need to build it in each of the specified subdirectories. You can find an example of this in the ON commands makefile.
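
If make syntax isn't your thing, the same descent can be sketched as a shell loop. This only illustrates the control flow; it isn't something the build uses. The name descend is made up, and having MAKE default to make is my assumption.

```shell
# descend TARGET DIR...
# Visit each DIR in turn and build TARGET there, stopping at the first failure.
descend() {
    target=$1
    shift
    for dir in "$@"; do
        # The subshell keeps the cd from leaking into the next iteration,
        # much as each make rule line runs in its own shell.
        ( cd "$dir" && pwd && ${MAKE:-make} "$target" ) || return 1
    done
}
```

Calling descend _msg fu bar from the parent directory is roughly what make _msg does with MSGSUBDIRS= fu bar.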

At some point, you'll decide that "all the messages below this point belong together, and should be collected into a single catalog." When you decide that, you'll want to specify the following in your makefile:

	POFILE= the name of your single message catalog

	$(POFILE): list of sources
		build rule

	_msg: $(MSGDOMAINPOFILE)

	include $(SRC)/Makefile.msg.targ

The included file, usr/src/Makefile.msg.targ, contains the rules that will actually take your message catalog and put it into a staging area, where it will later be combined with other catalogs from the same text domain. That's what the $(MSGDOMAINPOFILE) dependency does. You've got some other decisions to make for "list of sources" and "build rule," and this is where it gets harder to follow. Hang on; the next couple of examples will further clarify.

Your list of sources will contain some combination of shell scripts, lex code, yacc code, C code, or other message catalogs. To limit the scope of this lengthening diatribe, let's assume that you don't have any lex or yacc or shell scripts. Instead, you have two subdirectories (call 'em fu and bar, of course) with C code. For your list of sources, you're going to have two message catalogs, one per subdirectory:

	MSGSUBDIRS= fu bar

	POFILES= $(MSGSUBDIRS:%=%/%.po)

	POFILE= fubar.po

	_msg := TARGET= _msg

	$(POFILE): $(POFILES)
		$(BUILDPO.pofiles)

	$(MSGSUBDIRS): FRC
		@cd $@; pwd; $(MAKE) $(TARGET)

	_msg: $(MSGSUBDIRS) .WAIT $(MSGDOMAINPOFILE)

	FRC:

	include $(SRC)/Makefile.msg.targ

If you're a makefile geek, you can safely skip this explanation. Here's what you've just done: you've told make(1S) that you want to build a single catalog, called fubar.po, by combining the two catalogs fu/fu.po and bar/bar.po using the $(BUILDPO.pofiles) macro. To keep make(1S) from saying "I don't know how to make fu/fu.po," you've told it to wait until it's done with make _msg in fu before you try to reference fu/fu.po (and similarly for bar/bar.po). You could still shoot yourself in the foot by trying to make fubar.po, but that's why we limit ourselves to make _msg for this stuff. You can find an example of this in usr/src/cmd/lvm/Makefile.
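
The ordering is the important part, and it can be sketched in shell. Assume, purely for illustration, that combining catalogs is plain concatenation (the real $(BUILDPO.pofiles) macro may well do more), and that combine_catalogs is a made-up helper rather than anything in the tree:

```shell
# combine_catalogs OUTPUT INPUT...
# Refuse to run until every per-directory catalog exists, then rebuild
# OUTPUT from scratch by concatenating the inputs in order.
combine_catalogs() {
    out=$1
    shift
    for po in "$@"; do
        if [ ! -f "$po" ]; then
            # This is the complaint that the .WAIT ordering is there to avoid.
            echo "don't know how to make $po" >&2
            return 1
        fi
    done
    rm -f "$out"
    cat "$@" > "$out"
}
```

Run combine_catalogs fubar.po fu/fu.po bar/bar.po before the subdirectories have built their catalogs and it fails; run it afterwards and you get the combined file.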

Now let's finish with the trees in this example, so we can at least touch on the forest. Let's say that, in fu, you've got x.c, y.c, and z.c. Furthermore, if you run z.c through the C preprocessor, the message strings change. Here's the pertinent part of the fu makefile:

	MSGSRC1= x.c y.c
	MSGSRC2= z.c
	MSGFILES= $(MSGSRC1) $(MSGSRC2:%.c=%.i)

	POFILE= fu.po

	$(POFILE): $(MSGFILES)
		$(BUILDPO.msgfiles)

	_msg: $(POFILE)

	include $(SRC)/Makefile.msg.targ

What's different? First, you're not referencing the $(MSGDOMAINPOFILE) dependency. That's because you don't want fu.po to be treated as an independent message catalog; instead, it will be further processed into ../fubar.po by the previous makefile. Second, you're causing z.c to be preprocessed according to the .c.i suffix rule. Third, you're using the $(BUILDPO.msgfiles) macro instead of $(BUILDPO.pofiles). It's more efficient to call xgettext(1) just one time on three source files than it is to call it three times and then postprocess the resulting catalogs. And, as you might infer from the previous example, you can find an example of this in usr/src/cmd/lvm/util/Makefile.

So, with these examples, perusal of the make(1S) manpage, usr/src/Makefile.msg.targ, and usr/src/Makefile.master, and sufficient caffeine, you can probably figure out how to build your message catalogs. Take a deep breath, because we're about to shift our focus.

That takes care of your side of things; now what happens with those catalogs after you're done? If you were really paying attention, you probably noticed the following in usr/src/Makefile.msg.targ:

	$(MSGDOMAIN):
		$(INS.dir)

	$(MSGDOMAINPOFILE): $(MSGDOMAIN) $(POFILE)
		$(RM) $@; $(CP) $(POFILE) $@

and the following in usr/src/Makefile.master:

	#
	# For source message catalogue
	#
	.SUFFIXES: $(SUFFIXES) .i .po
	MSGROOT= $(ROOT)/catalog
	MSGDOMAIN= $(MSGROOT)/$(TEXT_DOMAIN)
	MSGDOMAINPOFILE = $(MSGDOMAIN)/$(POFILE)

What's going on now? We're taking all of the catalogs with a dependency on $(MSGDOMAINPOFILE) and we're copying them into a staging area. More specifically, that staging area lives in $(ROOT)/catalog. If you're building the fubar command message catalog on an x86 box, for example, that means you've just installed your catalog in proto/root_i386/catalog/SUNW_OST_OSCMD/fubar.po. And there it will stay, until make _msg gets invoked from usr/src/pkgdefs/SUNW0on.
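
Spelled out as shell, the staging step amounts to roughly the following. The function name stage_catalog is mine, POFILE is assumed to be a bare file name in the current directory, and mkdir -p is only a stand-in for $(INS.dir), which in the real build may also set ownership and modes.

```shell
# stage_catalog ROOT TEXT_DOMAIN POFILE
# Copy POFILE into ROOT/catalog/TEXT_DOMAIN/ and print where it landed.
stage_catalog() {
    msgroot=$1/catalog                  # MSGROOT
    msgdomain=$msgroot/$2               # MSGDOMAIN
    msgdomainpofile=$msgdomain/$3       # MSGDOMAINPOFILE
    mkdir -p "$msgdomain" || return 1   # stand-in for $(INS.dir)
    rm -f "$msgdomainpofile"            # same remove-then-copy as the rule
    cp "$3" "$msgdomainpofile" || return 1
    echo "$msgdomainpofile"
}
```

With ROOT set to proto/root_i386, stage_catalog "$ROOT" SUNW_OST_OSCMD fubar.po lands the file at the path mentioned above.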

Which brings us to usr/src/pkgdefs/SUNW0on/Makefile, which is the bridge between the tools and processes that we mentioned way back at the top of this entry. In this makefile, we find the following comment describing the _msg target:

	# The _msg target gathers the output of the top-level _msg
	# target into text-domain-specific message files under the
	# ROOTMSGDIR for packaging.

...and that's it. This takes everything from that staging area and uses it to build a package called SUNW0on. If you've read my other opening entry, then you know that "on" is short for "operating system and network code," and that we're just one of many consolidations that make up the Solaris product. All of the different consolidations send their SUNW0* packages to a group of folks that localize them and provide additional locales for Solaris.

For convenience, here's a collection of makefiles that should satisfy curiosity about the standard way to do this. These files might make useful cut and paste sources, but be careful: the messaging-related macros, rules, and includes are scattered throughout the files, and the placement (particularly of the include) is important.

  • usr/src/cmd/lvm/Makefile

    This is an example of a command- or library-level makefile. It passes the message target on to its subdirectories, then combines the resulting catalogs. It then installs the combined catalog into the staging area.

    Note that it uses catalog as the target, instead of _msg. These two target values are often used interchangeably. The important thing isn't which one you choose, but that your parent and child makefiles agree. No matter what you use in your own hierarchy, your top level must handle the _msg target, because that's what it's going to get from the ON makefiles. Supposedly, _msg was used instead of msg because the message catalog extraction was retrofitted into the makefiles, and there was concern about conflicting with existing usage.

  • usr/src/cmd/lvm/metassist/Makefile
  • usr/src/cmd/lvm/metassist/Makefile.targ

    Together, these are an example of an intermediate-level makefile. It passes the _msg target on to its subdirectories, then combines the resulting catalogs. It does not install this catalog, but relies on its parent makefile to take care of that.

  • usr/src/cmd/lvm/metassist/common/Makefile

    This is short and sweet. It uses $(MSGFILES) and $(BUILDPO.msgfiles) to create a message catalog from preprocessed C source code. It does not install this catalog, but relies on its parent makefile to take care of that.

If you're still reading, I'm amazed, but I can suggest some related questions:

  • How do you provide a translation note in your catalog?
  • When do you use gettext(3C) vs dgettext(3C) in your code?
  • How are non-LC_MESSAGES category messages handled?
  • What are some pitfalls of translating format strings?
  • Why should you use the BUILDPO.* macros as the action in your own rule instead of just depending on the pofile_* targets?
  • How come usr/src/cmd/chown/Makefile causes chown.po to be created without referencing any of this complicated crap?
  • How do I test my internationalization?

...if somebody's interested, I can follow up on these questions or any others. Or not; perhaps I'll choose something less useful and more interesting for my next ramble through the woods.

