Saturday, December 31, 2005
GLib uses UTF-8 as its internal encoding, so every GTK+/GNOME application represents text as UTF-8, and all text you get from widgets is UTF-8. To work with a legacy encoding, you need to do some conversion.
Before converting between two encodings, you first need to know which encoding the text is in. A program can guess, but unfortunately there is no perfect way to determine the encoding of a piece of text programmatically. That is why many applications present a list of encodings and let the user decide.
Filename handling is especially hard, because there is no indication whatsoever what character encoding a filename is in (it might have been created when the user was using a different locale, so filename encoding is basically unreliable and broken).
GLib cannot detect the filename encoding either, so it lets the user configure it through the environment variables G_FILENAME_ENCODING and G_BROKEN_FILENAMES. By default GLib assumes that filenames on disk are encoded in UTF-8; with these variables the user can tell GLib to use a particular encoding for filenames rather than UTF-8.
g_get_charset() obtains the character set of the current locale from the C runtime. If your application calls setlocale (LC_ALL, ""), g_get_charset() returns the encoding of the locale taken from the user's environment; if it calls setlocale (LC_ALL, "zh_CN.GB18030"), the C runtime encoding becomes GB18030 from then on, and g_get_charset() returns GB18030.
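To see what the C runtime reports after a setlocale() call, a minimal POSIX sketch can query the codeset directly. It uses nl_langinfo(CODESET) instead of GLib, on the assumption that this is roughly the information g_get_charset() exposes; the helper name runtime_codeset is made up for illustration.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a GLib function): switch the C runtime to
 * `locale_name` ("" means "take the locale from the environment") and
 * copy the resulting codeset name into `buf`.
 * Returns 0 on success, -1 if the locale is unavailable or `buf` is
 * too small. */
int runtime_codeset(const char *locale_name, char *buf, size_t buf_size)
{
    assert(locale_name != NULL && buf != NULL);

    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;                      /* locale is not installed */

    const char *codeset = nl_langinfo(CODESET);
    if (strlen(codeset) + 1 > buf_size)
        return -1;                      /* caller's buffer is too small */

    strcpy(buf, codeset);
    return 0;
}
```

Calling runtime_codeset("", buf, sizeof buf) picks up the locale from the environment. Passing "zh_CN.GB18030" should report GB18030, provided that locale is installed on the system.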
g_get_filename_charsets() determines the preferred character sets ("encodings" may be the more accurate term) used for filenames. GLib treats the first character set in the list as the filename encoding; the subsequent ones are used when trying to generate a displayable representation of a filename, see g_filename_display_name().
On Unix, the character sets are determined by consulting the environment variables G_FILENAME_ENCODING and G_BROKEN_FILENAMES. On Windows, the character set used in the GLib API is always UTF-8 and said environment variables have no effect.
G_FILENAME_ENCODING may be set to a comma-separated list of character set names. The special token "@locale" is taken to mean the character set for the current locale. If G_FILENAME_ENCODING is not set, but G_BROKEN_FILENAMES is, the character set of the current locale is taken as the filename encoding. If neither environment variable is set, UTF-8 is taken as the filename encoding, but the character set of the current locale is also put in the list of encodings.
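The three cases above can be sketched as a small decision function in plain C. This follows the rules as described in the paragraph and is not GLib's own code; the helper name resolve_filename_charsets, the fixed-size output table, and the choice to skip a duplicate UTF-8 entry are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_CHARSETS 8
#define CHARSET_LEN  32

/* Hypothetical helper (not a GLib function).
 * encoding_var:   value of G_FILENAME_ENCODING, or NULL if unset
 * broken_set:     nonzero if G_BROKEN_FILENAMES is set
 * locale_charset: character set of the current locale
 * Fills `out` and returns the number of entries; out[0] is the
 * filename encoding. */
int resolve_filename_charsets(const char *encoding_var, int broken_set,
                              const char *locale_charset,
                              char out[MAX_CHARSETS][CHARSET_LEN])
{
    int count = 0;

    assert(locale_charset != NULL && out != NULL);

    if (encoding_var != NULL && encoding_var[0] != '\0') {
        /* Case 1: comma-separated list, "@locale" means the locale charset. */
        const char *p = encoding_var;
        for (;;) {
            size_t tok_len = strcspn(p, ",");
            if (count < MAX_CHARSETS) {
                if (tok_len == 7 && strncmp(p, "@locale", 7) == 0)
                    snprintf(out[count], CHARSET_LEN, "%s", locale_charset);
                else
                    snprintf(out[count], CHARSET_LEN, "%.*s", (int)tok_len, p);
                count++;
            }
            if (p[tok_len] == '\0')
                break;
            p += tok_len + 1;           /* skip the comma */
        }
    } else if (broken_set) {
        /* Case 2: only G_BROKEN_FILENAMES is set. */
        snprintf(out[count++], CHARSET_LEN, "%s", locale_charset);
    } else {
        /* Case 3: neither is set. UTF-8 first, then the locale charset
         * (skipped here if it would just repeat UTF-8). */
        snprintf(out[count++], CHARSET_LEN, "%s", "UTF-8");
        if (strcmp(locale_charset, "UTF-8") != 0)
            snprintf(out[count++], CHARSET_LEN, "%s", locale_charset);
    }
    return count;
}
```

In a real program the first two arguments would come from getenv("G_FILENAME_ENCODING") and getenv("G_BROKEN_FILENAMES") != NULL.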
Notes:
The string parameter of g_locale_to_utf8() is a string in the encoding of the application's current locale (the C runtime locale). On Windows this means the system codepage.
g_locale_to_utf8() converts a string which is in the encoding used for strings by the C runtime in the current locale into a UTF-8 string. That encoding is usually the same as the one used by the operating system, because most applications set their locale with setlocale (LC_ALL, "").
For example, if the system locale uses GB18030 but your application calls setlocale (LC_ALL, "zh_TW.BIG5"), then the C runtime encoding is BIG5 while the OS encoding is still GB18030.
If the current C runtime encoding is already UTF-8, the string is simply duplicated.
g_locale_from_utf8() converts a string from UTF-8 to the encoding used for strings by the C runtime (usually the same as that used by the operating system) in the current locale.
g_filename_to_utf8() converts a string which is in the encoding used by GLib for filenames into a UTF-8 string. The filename encoding is the first encoding in the list returned by g_get_filename_charsets().
g_filename_from_utf8() converts a string from UTF-8 to the encoding used for filenames. The filename encoding is the first encoding in the list returned by g_get_filename_charsets().
g_filename_from_uri() converts an escaped ASCII-encoded URI to a local filename in the encoding used for filenames.
g_filename_to_uri() converts an absolute filename to an escaped ASCII-encoded URI.
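To make "escaped ASCII-encoded URI" concrete, here is a simplified sketch of the filename-to-URI direction in plain C. It is not GLib's implementation: the helper name path_to_file_uri is hypothetical, and the set of characters left unescaped (ASCII letters, digits, and "/-._~") is an assumption that is stricter than what a real implementation may allow.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a GLib function): write "file://" followed by
 * the percent-escaped bytes of `path` into `out`.
 * Returns 0 on success, -1 if `path` is not absolute or `out` is too
 * small. The bytes are escaped as-is, whatever encoding they are in. */
int path_to_file_uri(const char *path, char *out, size_t out_size)
{
    static const char prefix[] = "file://";
    size_t pos = sizeof prefix - 1;

    assert(path != NULL && out != NULL);

    if (path[0] != '/' || out_size <= pos)
        return -1;
    memcpy(out, prefix, pos);

    for (const unsigned char *p = (const unsigned char *)path; *p; p++) {
        /* Keep a conservative set of ASCII characters unescaped. */
        int keep = (*p < 0x80 && isalnum(*p)) || strchr("/-._~", *p) != NULL;
        size_t need = keep ? 1 : 3;

        if (pos + need + 1 > out_size)
            return -1;                  /* no room for this byte + NUL */
        if (keep) {
            out[pos++] = (char)*p;
        } else {
            snprintf(out + pos, 4, "%%%02X", (unsigned)*p);
            pos += 3;
        }
    }
    out[pos] = '\0';
    return 0;
}
```

Because every byte outside the kept set becomes %XX, the result is pure ASCII regardless of the filename encoding, which is why a URI can carry a filename that is not valid UTF-8.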
g_filename_display_name() converts a filename into a valid UTF-8 string. The conversion is not necessarily reversible, so you should keep the original around and use the return value of this function only for display purposes. Unlike g_filename_to_utf8(), the result is guaranteed to be non-NULL even if the filename actually isn't in the GLib filename encoding, so there is always a name to display.
If you know the whole pathname of the file you should use g_filename_display_basename(), since that allows location-based translation of filenames.
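One way to always have something to display is to keep the valid UTF-8 sequences of a name and substitute U+FFFD for every byte that does not fit. The sketch below shows only that fallback in plain C; it does not try other character sets first, and the helper names utf8_sequence_length and make_display_name are hypothetical, not GLib API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length in bytes of the valid UTF-8 sequence starting at `s`, or 0 if
 * the bytes there are not valid UTF-8. Rejects overlong forms,
 * surrogates and code points above U+10FFFF. `s` must be NUL-terminated. */
size_t utf8_sequence_length(const unsigned char *s)
{
    if (s[0] < 0x80)
        return 1;
    if (s[0] >= 0xC2 && s[0] <= 0xDF)
        return (s[1] & 0xC0) == 0x80 ? 2 : 0;
    if (s[0] >= 0xE0 && s[0] <= 0xEF) {
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80) return 0;
        if (s[0] == 0xE0 && s[1] < 0xA0) return 0;   /* overlong */
        if (s[0] == 0xED && s[1] > 0x9F) return 0;   /* surrogate */
        return 3;
    }
    if (s[0] >= 0xF0 && s[0] <= 0xF4) {
        if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
            (s[3] & 0xC0) != 0x80) return 0;
        if (s[0] == 0xF0 && s[1] < 0x90) return 0;   /* overlong */
        if (s[0] == 0xF4 && s[1] > 0x8F) return 0;   /* > U+10FFFF */
        return 4;
    }
    return 0;
}

/* Copy `name` to `out`, replacing each invalid byte with U+FFFD.
 * Returns 0 on success, -1 if `out` is too small. */
int make_display_name(const char *name, char *out, size_t out_size)
{
    static const char replacement[] = "\xEF\xBF\xBD";   /* U+FFFD */
    size_t pos = 0;

    assert(name != NULL && out != NULL);

    const unsigned char *p = (const unsigned char *)name;
    while (*p != '\0') {
        size_t len = utf8_sequence_length(p);
        const char *src = len ? (const char *)p : replacement;
        size_t n = len ? len : 3;

        if (pos + n + 1 > out_size)
            return -1;
        memcpy(out + pos, src, n);
        pos += n;
        p += len ? len : 1;             /* skip one bad byte at a time */
    }
    if (pos + 1 > out_size)
        return -1;
    out[pos] = '\0';
    return 0;
}
```

The output is lossy by design: two different broken filenames can map to the same display string, which is why the original bytes must be kept for any file operation.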
g_filename_display_basename() returns the display basename for the particular filename, guaranteed to be valid UTF-8. The display name might not be identical to the filename; for instance, there might be problems converting it to UTF-8, and some files can be translated in the display.
You must pass the whole absolute pathname to this function so that translation of well-known locations can be done.
This function is preferred over g_filename_display_name() if you know the whole path, as it allows translation.
Notes:
GLib uses the native iconv routines to do encoding conversion, or libiconv if the platform has no native iconv implementation.
g_iconv_open (to_codeset, from_codeset) also tries aliases of the codeset names, which makes the conversion more robust. (I will look into g_charset_get_aliases() later.)
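For reference, this is roughly what a one-shot conversion looks like when written directly against POSIX iconv, the layer GLib builds on. It is a minimal sketch and not GLib code: the helper name convert_text is hypothetical, the output buffer is sized with a simple 4x guess, and the target encoding is assumed to be a byte-oriented one such as UTF-8 where a single NUL byte terminates the string.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not a GLib function): convert NUL-terminated
 * `text` from `from_codeset` to `to_codeset`.
 * Returns a malloc()ed string the caller must free(), or NULL if the
 * codesets are unknown, the input is invalid, or the buffer guess was
 * too small. */
char *convert_text(const char *text, const char *to_codeset,
                   const char *from_codeset)
{
    assert(text != NULL && to_codeset != NULL && from_codeset != NULL);

    /* Same argument order as g_iconv_open(): target first, source second. */
    iconv_t cd = iconv_open(to_codeset, from_codeset);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(text);
    size_t out_size = in_left * 4 + 4;  /* assumption: 4x is enough */
    char *out = malloc(out_size);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)text;        /* iconv() does not modify the input */
    char *out_ptr = out;
    size_t out_left = out_size - 1;     /* keep room for the terminator */

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out_ptr, &out_left);  /* flush state */
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(out);
        return NULL;
    }
    *out_ptr = '\0';
    return out;
}
```

For example, convert_text(s, "UTF-8", "GB18030") would turn a GB18030 string into UTF-8, assuming the platform's iconv knows that codeset. Unlike g_iconv_open(), plain iconv_open() does no alias lookup of its own beyond what the platform provides.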
A simple example is available at http://blogs.sun.com/roller/resources/yydzero/main.c