Monday, September 22, 2008

According to the man page, iconv(3C) returns the number of non-identical conversions (non-reversible conversions, in GNU's vocabulary) it performed. But what is a non-identical/non-reversible conversion? Try the following example:

$ echo "abc测试" | iconv -f UTF-8 -t ASCII
abc??


As you can see, the last two characters are outside the ASCII range, although they are valid UTF-8 characters. In this case, iconv(3C) converts each of them to a substitute character (the non-identical-conversion character, which in some cases is '?') and returns 2. GNU's iconv, however, raises an error in this case (it returns -1 and sets errno to EILSEQ), even though its man page says EILSEQ indicates an invalid multibyte sequence in the input.

So this is not a portable way to tell whether the target encoding can represent the source content. Nor should you rely on the non-identical conversion count: an otherwise successful conversion may return -1 with errno set to E2BIG when the output buffer is exhausted, even though non-identical conversions may already have happened in the part converted so far. And there is no flag in iconv(3C)/iconv_open(3C) to control whether to perform non-identical conversions or to raise an error.
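The outcomes described above can be told apart with a small wrapper. This is only a sketch: `try_convert` and the `conv_status` names are hypothetical, and which outcome you get for unrepresentable characters depends on the iconv implementation.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of one iconv(3) call. */
enum conv_status {
    CONV_CLEAN,       /* everything converted, no substitutions reported */
    CONV_LOSSY,       /* converted, but iconv reported non-identical conversions */
    CONV_UNMAPPABLE,  /* EILSEQ: invalid input, or (GNU) unrepresentable character */
    CONV_TRUNCATED,   /* EINVAL: input ends inside a multibyte sequence */
    CONV_NO_SPACE,    /* E2BIG: output buffer exhausted */
    CONV_ERROR        /* iconv_open failed, or some other error */
};

/* Convert the NUL-terminated `input` from `from` to `to` into `out`. */
enum conv_status try_convert(const char *to, const char *from,
                             const char *input, char *out, size_t out_size)
{
    if (out_size == 0)
        return CONV_ERROR;

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return CONV_ERROR;

    /* Some systems declare the input argument as const char **;
     * this cast matches the char ** prototype. */
    char *in_ptr = (char *)input;
    size_t in_left = strlen(input);
    char *out_ptr = out;
    size_t out_left = out_size - 1;   /* keep one byte for the NUL */

    errno = 0;
    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    int saved_errno = errno;
    iconv_close(cd);
    *out_ptr = '\0';

    if (rc == (size_t)-1) {
        switch (saved_errno) {
        case EILSEQ: return CONV_UNMAPPABLE;
        case EINVAL: return CONV_TRUNCATED;
        case E2BIG:  return CONV_NO_SPACE;
        default:     return CONV_ERROR;
        }
    }
    return rc == 0 ? CONV_CLEAN : CONV_LOSSY;
}
```

A caller that only wants to know whether the text survived the conversion intact has to treat both CONV_LOSSY and CONV_UNMAPPABLE as "no", and CONV_NO_SPACE as "unknown".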

In general, iconv(3C) is not a well-defined interface.

P.S. This post summarizes a discussion between the JDS/Evolution team and myself while we were isolating an iconv(3C)-related bug.

Friday, September 19, 2008

I just saw a message on [osol-discuss] that David is working on porting GNU libc to Solaris/OpenSolaris and has made really impressive progress. Like the effort to port glibc to BSD, it brings GNU/kOpenSolaris (that is, OpenSolaris kernel + GNU userland) closer to reality.

Here are the references:
  1. http://csclub.uwaterloo.ca/~dtbartle/opensolaris
  2. https://savannah.nongnu.org/projects/glibc-bsd

Monday, August 18, 2008

GNU iconv supports more encoding conversions than Solaris iconv and performs better on some of them. For example, Solaris iconv currently does not support conversions between GB18030 and UCS-2BE, UCS-4LE/BE, or UTF-16LE/BE, and the GB18030<->UTF-8 (UCS-2LE) conversions are twice as fast in GNU iconv. GNU iconv has a star-shaped structure (with some exceptions) that uses UCS-4 as the intermediary encoding, while Solaris iconv has a peer-to-peer structure (with aliases), which makes adding a new encoding really painful.

So you may want to use the GNU iconv library. On Solaris 10/Nevada, you can download and install gnu-libiconv from www.sunfreeware.com; on OpenSolaris, you can install SUNWgnu-libiconv from pkg.opensolaris.org with pkg(1), but that package does not contain the header files.

You may notice that the function symbols in gnu-libiconv carry the prefix "lib", e.g., iconv_open -> libiconv_open. So LD_PRELOAD and RUNPATH are not sufficient to replace the iconv(3) routines in libc; you need to make sure you include the "iconv.h" from gnu-libiconv.
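One way to check at build time which header was actually picked up is to test for `_LIBICONV_VERSION`, a macro that GNU libiconv's iconv.h defines. The helper name `iconv_flavor` below is hypothetical; this is a sketch, not part of either library.

```c
#include <iconv.h>

/* Report which iconv.h this translation unit was compiled against.
 * _LIBICONV_VERSION is defined by GNU libiconv's header only. */
const char *iconv_flavor(void)
{
#ifdef _LIBICONV_VERSION
    return "GNU libiconv (iconv_open is mapped to libiconv_open)";
#else
    return "system iconv from libc";
#endif
}
```

If this reports the system iconv although gnu-libiconv is installed, the include path probably does not point at the gnu-libiconv headers; the program typically also has to be linked against the library (commonly with -liconv).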




Friday, May 30, 2008

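Richard Stallman, the spiritual leader and godfather of the free software community, gave a talk this afternoon at Tsinghua Science Park. I came away with a deeper understanding of the free software philosophy RMS advocates. A few points left a strong impression: free is for freedom, freedom has different levels, open source != free software, commercial software != proprietary software, and most Linux distributions are not entirely free anymore. However, RMS brought up the sensitive topic of Tibet during the talk, which was rather unexpected and unwelcome. In the Q&A session afterwards, RMS passed briefly over an attendee's question about it, saying only that we should go and look at the things we cannot see. I do not know whether RMS has ever been to Tibet to experience and examine it in person. It seems Westerners generally hold "preconceptions" about the Tibet question. Another small episode: one attendee had a "heated" discussion with RMS about the use of proprietary software in school education and, in front of the guru, bitterly complained that Chinese education is authoritarian and that citizens are not free and cannot even discuss freedom. His argument was so one-sided that the audience was in an uproar, and many of them shouted in unison for him to stop.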

I took some photos:

Friday, January 04, 2008

1. Resolve the dependency of gnu-gettext

In most cases, gettext(3C) on Solaris can fulfill the requirements of your application. You can make the following change in configure.in (or configure.ac):

-AM_GNU_GETTEXT
+AM_GLIB_GNU_GETTEXT
+LTLIBINTL=
+AC_SUBST(LTLIBINTL)

The source package may ship with a complete copy of gnu-gettext in its source tree (normally named 'intl'); remove it from 'SUBDIRS' in the top-level Makefile.am. Sometimes there is an 'm4' directory in the source tree that contains macro files for checking GNU libraries or GCC compiler options; in that case, remove the option '-I m4' from 'ACLOCAL_AMFLAGS' in the top-level Makefile.am.

Then execute the following steps to update m4 macros and configure script:

glib-gettextize --force
aclocal $ACLOCAL_FLAGS
autoheader
libtoolize -c --automake
automake --add-missing
autoconf


Another note: gnu-gettext cannot retrieve localized messages compiled by Solaris' msgfmt (/usr/bin/msgfmt), but Solaris' gettext works fine with messages compiled by GNU's msgfmt.

2. Build socket programs

You may find that the commonly used macro 'SUN_LEN' is not defined on Solaris. Add the following definition to your header file:

+#if defined(sun) && !defined(SUN_LEN)
+#define SUN_LEN(su) (sizeof(*(su)) - sizeof((su)->sun_path) + strlen((su)->sun_path))
+#endif


And before you run the configure script, set LDFLAGS as follows:

export LDFLAGS=-lsocket
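
To show where SUN_LEN is typically used, here is a minimal sketch that fills in a Unix-domain socket address and computes the length to pass to bind(3SOCKET) or connect(3SOCKET). The helper name `fill_unix_addr` is hypothetical; the fallback macro is the one given above.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Fallback for systems whose headers do not provide SUN_LEN. */
#ifndef SUN_LEN
#define SUN_LEN(su) (sizeof(*(su)) - sizeof((su)->sun_path) + strlen((su)->sun_path))
#endif

/* Fill `addr` for the socket file `path`.
 * Returns the address length for bind()/connect(), or 0 if `path`
 * does not fit into sun_path. */
size_t fill_unix_addr(struct sockaddr_un *addr, const char *path)
{
    if (strlen(path) >= sizeof(addr->sun_path))
        return 0;

    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    strcpy(addr->sun_path, path);   /* length was checked above */
    return SUN_LEN(addr);
}
```

The returned value is what would be passed as the third argument of bind() or connect(), cast to socklen_t.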

3. 0-sized array member in C struct

struct Foo {int bar; char data[0];};

-char data[0];
+char data[];    //change the 0-sized array to flexible array


Note that, according to the C99 standard, a flexible array member can only be placed at the end of a structure. This change does not affect the layout or size of the original data structure. (Thanks to tchaikov for providing the perfect solution!) However, if the 0-sized array member is not at the tail, you may have to use a 'union', which requires changing the accessing code.
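
For illustration, a struct with a flexible array member is usually allocated with extra room for the trailing data. The sketch below reuses the struct Foo from above; the helper `foo_new` and the choice to store the payload size in `bar` are assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

struct Foo {
    int  bar;     /* here: number of bytes stored in data (assumption) */
    char data[];  /* C99 flexible array member, must be the last member */
};

/* Allocate a Foo holding a copy of the NUL-terminated `payload`.
 * The caller releases it with free(). */
struct Foo *foo_new(const char *payload)
{
    size_t n = strlen(payload) + 1;

    /* sizeof(struct Foo) does not include data[], so add n explicitly. */
    struct Foo *f = malloc(sizeof(*f) + n);
    if (f == NULL)
        return NULL;

    f->bar = (int)n;
    memcpy(f->data, payload, n);
    return f;
}
```

The allocation arithmetic is the same as with the 0-sized array, which is why existing allocation code normally needs no change.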

4. struct initialization

struct point {int x, y, z;};
- struct point x = {x:2, z:3};
+ struct point x = {.x=2, .z=3}; // c99 extension, not supported
                                 // by sunstudio C++ compiler
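
If the same source also has to pass through a C++ compiler without designated initializers, one fallback is to zero-initialize the struct and assign the members explicitly. This is a sketch; the function name `make_point` is hypothetical.

```c
struct point { int x, y, z; };

/* Same result as {.x=2, .z=3}: y stays zero. */
struct point make_point(void)
{
    struct point p = {0};   /* zero-initializes every member */
    p.x = 2;
    p.z = 3;
    return p;
}
```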

5. alloca(3C) on Solaris

You need to include alloca.h in every source file that calls alloca(3C).
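
A minimal sketch of a source file that calls alloca(3C) with the header in place. The helper `equals_ignore_case` is hypothetical and only meant for short strings, since alloca has no way to report that the stack is exhausted.

```c
#include <alloca.h>   /* declares alloca() */
#include <ctype.h>
#include <string.h>

/* Compare `a`, lowercased, with the already-lowercase string `lower_b`.
 * The scratch copy lives on the stack and vanishes on return. */
int equals_ignore_case(const char *a, const char *lower_b)
{
    size_t n = strlen(a);
    char *tmp = alloca(n + 1);

    for (size_t i = 0; i < n; i++)
        tmp[i] = (char)tolower((unsigned char)a[i]);
    tmp[n] = '\0';

    return strcmp(tmp, lower_b) == 0;
}
```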

6. wchar_t

Do NOT assume a wide character is always a UCS-4 character. On Solaris, that is true only in UTF-8 locales.
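
C99 provides a macro for exactly this question: an implementation defines `__STDC_ISO_10646__` only if wchar_t values are ISO 10646 code points in every locale. The sketch below (with the hypothetical helper `wchar_is_iso10646`) lets code test for the guarantee instead of assuming it.

```c
#include <wchar.h>

/* Returns 1 if the implementation promises that wchar_t always holds
 * ISO 10646 (UCS) code points, 0 if no such promise is made. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}
```

When this returns 0, wide characters should be treated as opaque and converted explicitly, for example through iconv(3C), before they are interpreted as UCS-4.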

7. Use gcc if the source uses too many gcc extensions

As a last resort, use /usr/sfw/bin/gcc. The SunStudio C compiler and gcc are ABI-compatible, but the C++ compilers are not. If you are building the package on the SPARC platform, GCC4SS performs better than gcc.

This blog copyright 2009 by yongsun