According to the man page, iconv(3C) returns the number of non-identical conversions (non-reversible conversions, in GNU's vocabulary) that it performed. But what is a non-identical/non-reversible conversion? Try the following example:
$ echo "abc测试" | iconv -f UTF-8 -t ASCII
abc??
As you can see, the last two characters are outside the ASCII range, although they are valid UTF-8 characters. In this case, iconv(3C) converts each of them to a substitute character (the non-identical conversion character, which is '?' in some cases) and returns 2. GNU's iconv, however, raises an error here: it returns -1 and sets errno to EILSEQ, even though its man page says that EILSEQ indicates an invalid multibyte sequence in the input.
So this is not a portable way to tell whether the target encoding can represent the source content. You should not rely on the count of non-identical conversions either: a call that otherwise converts correctly may still return -1 with errno set to E2BIG when the output buffer is exhausted, and non-identical conversions may already have happened by then. There is also no flag in iconv(3C)/iconv_open(3C) to control whether to perform non-identical conversions or to raise an error.
In general, iconv(3C) is not a well-defined interface.
P.S. This post summarizes a discussion between the JDS/Evolution team and me while we were locating and isolating an iconv(3C)-related bug.
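To make the three outcomes concrete, here is a minimal sketch of a caller that keeps them apart. The helper name convert_string is hypothetical, and the sketch assumes the encoding names "UTF-8" and "ASCII" are accepted by the local iconv_open; some older systems declare the second argument of iconv as const char **, so the cast may need adjusting there.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <string.h>

/* Hypothetical helper, for illustration only.
 * Converts the NUL-terminated string `in` from encoding `from` to `to`,
 * writing at most outcap-1 bytes plus a terminating NUL into `out`.
 * Returns 0 on success and stores iconv's return value (the number of
 * non-identical conversions) in *nonident.
 * Returns -1 on failure with errno left as iconv set it:
 *   EILSEQ - invalid input, or (GNU) a character the target cannot hold
 *   EINVAL - incomplete multibyte sequence at the end of the input
 *   E2BIG  - output buffer exhausted
 * Stateful target encodings would also need a final flush call. */
static int convert_string(const char *from, const char *to, const char *in,
                          char *out, size_t outcap, size_t *nonident)
{
    assert(out != NULL && outcap > 0 && nonident != NULL);

    iconv_t cd = iconv_open(to, from);   /* note the order: to, then from */
    if (cd == (iconv_t)-1)
        return -1;

    char *inp = (char *)in;              /* iconv does not modify the input bytes */
    size_t inleft = strlen(in);
    char *outp = out;
    size_t outleft = outcap - 1;         /* keep one byte for the NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    int saved_errno = errno;             /* iconv_close may change errno */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        errno = saved_errno;
        return -1;
    }
    *outp = '\0';
    *nonident = rc;
    return 0;
}
```

With this helper, a portable caller has to treat both "returned 0 with a non-zero count" and "returned -1 with EILSEQ" as "the target encoding could not represent the input", which is exactly the ambiguity described above.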
Monday, September 22, 2008
Friday, September 19, 2008
Here are the references:
Monday, August 18, 2008
You may want to use the GNU iconv library. For Solaris 10/Nevada, you can download and install gnu-libiconv from www.sunfreeware.com; for OpenSolaris, you can install SUNWgnu-libiconv from pkg.opensolaris.org, but the opensolaris.org package does not contain the header files.
You may notice that the function symbols in gnu-libiconv carry the prefix "lib", e.g., iconv_open -> libiconv_open. So LD_PRELOAD and RUNPATH are not sufficient to replace the iconv(3) routines in libc; you need to make sure that you include the "iconv.h" from gnu-libiconv.
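Because the renaming happens in the header, the choice of implementation is made at compile time. The sketch below is one way to check which header was picked up; it relies on the _LIBICONV_VERSION macro that GNU libiconv's iconv.h defines, and the helper names iconv_flavor and iconv_supports are hypothetical.

```c
#include <iconv.h>

/* Hypothetical helper: reports which iconv.h this file was compiled against.
 * GNU libiconv's header defines _LIBICONV_VERSION; other headers are
 * assumed to be the system's own. */
static const char *iconv_flavor(void)
{
#ifdef _LIBICONV_VERSION
    return "GNU libiconv";
#else
    return "system iconv";
#endif
}

/* Hypothetical helper: returns 1 if the conversion from -> to is available.
 * The source is the same in both cases; under GNU libiconv's header the
 * iconv_open call below is renamed to libiconv_open by a macro. */
static int iconv_supports(const char *from, const char *to)
{
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return 0;
    iconv_close(cd);
    return 1;
}
```

If iconv_flavor() still reports the system iconv after you installed gnu-libiconv, the include path (not the runtime linker settings) is the first thing to check.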
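Compared with Solaris's iconv, GNU iconv supports more encoding conversions and performs better on some of them. For example, Solaris iconv currently does not support conversions between GB18030 and UCS-2BE, UCS-4LE/BE, or UTF-16LE/BE, while the GB18030<->UTF-8 (UCS-2LE) conversion in GNU iconv is twice as fast as in Solaris iconv. GNU iconv also has a star-shaped architecture (with some point-to-point exceptions) that uses UCS-4 as the intermediate representation, whereas Solaris iconv is point-to-point (with alias support), so adding a new encoding is rather painful.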
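Friday, May 30, 2008
Richard Stallman, the spiritual leader and godfather of the free software community, gave a talk this afternoon at Tsinghua Science Park. I came away with a deeper understanding of the free software ideas that RMS advocates. A few points left a strong impression: free is for freedom, freedom has different levels, opensource != free software, commercial software != proprietary software, most linux distribution is not entirely free anymore. During the talk, however, RMS brought up the sensitive topic of Tibet, which I found surprising and unwelcome. In the Q&A session afterwards he passed briefly over an attendee's question about it, saying only that we should go and look at the things we cannot see. I do not know whether RMS has ever been to Tibet to experience and examine it in person; it seems that Westerners generally hold "preconceptions" about the Tibet issue. Another episode: one attendee had a "heated" discussion with RMS about the use of proprietary software in school education, and went on to complain bitterly in front of him that Chinese education is authoritarian and that citizens are not free and cannot even discuss freedom. The argument was one-sided, the audience was in an uproar, and many of them shouted together for him to stop.
I took some photos:
Friday, January 4, 2008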
1. Resolve the dependency of gnu-gettext
In most cases, the gettext(3C) on Solaris can fulfill the requirements of your application. You can make the following change in configure.in (or configure.ac):
-AM_GNU_GETTEXT
+AM_GLIB_GNU_GETTEXT
+LTLIBINTL=
+AC_SUBST(LTLIBINTL)
The source package may ship with a complete gnu-gettext in its source tree (normally named 'intl'); remove it from 'SUBDIRS' in the top-level Makefile.am. Sometimes there is an 'm4' directory in the source tree that contains macro files for checking GNU libraries or GCC compiler options; in that case, remove the option '-I m4' from 'ACLOCAL_AMFLAGS' in the top-level Makefile.am.
Then execute the following steps to update the m4 macros and the configure script:
glib-gettextize --force
aclocal $ACLOCAL_FLAGS
autoheader
libtoolize -c --automake
automake --add-missing
autoconf
Another note: gnu-gettext cannot retrieve localized messages compiled by Solaris' msgfmt (/usr/bin/msgfmt), but Solaris' gettext works fine with messages compiled by GNU's msgfmt.
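A quick way to confirm that the libc gettext(3C) is enough for an application is to compile a small lookup against it. The following is only a sketch: the text domain "porting-demo" and the catalog directory are hypothetical placeholders, and the helper name translate is not from any package.

```c
#include <assert.h>
#include <libintl.h>
#include <locale.h>

/* Hypothetical text domain and catalog directory, for illustration only. */
#define DEMO_DOMAIN    "porting-demo"
#define DEMO_LOCALEDIR "/usr/share/locale"

/* Hypothetical helper: looks up msgid in the demo domain.
 * When no catalog is installed for the current locale, gettext
 * returns msgid itself, so the call is safe without any .mo files. */
static const char *translate(const char *msgid)
{
    static int initialized = 0;

    assert(msgid != NULL);
    if (!initialized) {
        setlocale(LC_ALL, "");                      /* honor LANG / LC_* */
        bindtextdomain(DEMO_DOMAIN, DEMO_LOCALEDIR);
        textdomain(DEMO_DOMAIN);
        initialized = 1;
    }
    return gettext(msgid);
}
```

If this compiles and links without any extra gettext library, the bundled 'intl' directory is very likely unnecessary for the package.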
2. Build socket programs
You may find that the commonly used macro 'SUN_LEN' is not defined on Solaris. Add the following definition to your header file:
+#if defined(sun) && !defined(SUN_LEN)
+#define SUN_LEN(su) (sizeof(*(su)) - sizeof((su)->sun_path) + strlen((su)->sun_path))
+#endif
Before you run the configure script, set LDFLAGS as follows:
export LDFLAGS=-lsocket
3. 0-sized array member in C struct
struct Foo {int bar; char data[0];};
-char data[0];
+char data[]; //change the 0-sized array to flexible array
Note that, according to the C99 standard, a flexible array member can only be placed at the end of a structure, and this change does not affect the layout or size of the original data structure. (Thanks tchaikov for providing the perfect solution!) If the 0-sized array member is not at the tail, however, you may have to use a 'union', which requires changing the accessing code.
4. struct initialization
struct point {int x, y, z;};
- struct point x = {x:2, z:3};
+ struct point x = {.x=2, .z=3}; // c99 extension, not supported
// by sunstudio C++ compiler
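Items 3 and 4 can be sketched together in plain C99. The helper names foo_new and make_point are hypothetical; the structures are the ones used above. This is C only, since the note above says the Sun Studio C++ compiler does not accept the designated-initializer form.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Foo {
    int bar;
    char data[];                 /* C99 flexible array member, must be last */
};

struct point { int x, y, z; };

/* Hypothetical helper: allocates a Foo with room for a copy of `s`.
 * The caller releases it with free(). */
static struct Foo *foo_new(int bar, const char *s)
{
    assert(s != NULL);

    size_t n = strlen(s) + 1;                    /* include the NUL */
    struct Foo *f = malloc(sizeof(*f) + n);      /* header + payload */
    if (f == NULL)
        return NULL;
    f->bar = bar;
    memcpy(f->data, s, n);
    return f;
}

/* Hypothetical helper: C99 designated initializers; members that are
 * not named (y here) are zero-initialized. */
static struct point make_point(void)
{
    struct point p = { .x = 2, .z = 3 };
    return p;
}
```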
5. alloca(3C) on Solaris
You need to include <alloca.h> in the source files where you call alloca(3C).
6. wchar_t
Do NOT assume that a wide char is always a UCS-4 character. On Solaris, that is true only in UTF-8 locales.
7. Use gcc if the source uses too many gcc extensions
The last resort is /usr/sfw/bin/gcc. The Sun Studio C compiler and gcc are ABI-compatible, but the C++ compilers are not. If you are building the package on the SPARC platform, GCC4SS performs better than gcc.
This blog copyright 2009 by yongsun

