Web blog of yydzero
Blog of Yandong Yao (姚延栋)
Saturday, December 31, 2005
Encoding Internals in GLib

1. Introduction

GLib uses UTF-8 as its internal encoding, so all GTK+/GNOME applications use UTF-8 to represent text, and any text you get from a widget is UTF-8. To work with a legacy encoding, you need to do some conversion.

Before converting between two encodings, you must first know which encoding the text is in. A program can guess, but unfortunately there is no perfect way to determine the encoding of a piece of text programmatically. For that reason many applications present a list of encodings and let the user decide.

Filename handling is especially hard, because there is no indication whatsoever what character encoding a filename is in (it might have been created when the user was using a different locale, so filename encoding is basically unreliable and broken).

GLib cannot detect the filename encoding either, so it lets the user configure it through the environment variables G_FILENAME_ENCODING and G_BROKEN_FILENAMES. By default, GLib assumes that filenames on disk are in UTF-8; through these environment variables, the user can instruct GLib to use a particular encoding for filenames rather than UTF-8.

2. Getting the Encoding with GLib Functions

2.1 g_get_charset

g_get_charset() obtains the character set for the current locale from the C runtime. If your application calls setlocale (LC_ALL, ""), g_get_charset() returns the encoding of the user's locale; if it calls setlocale (LC_ALL, "zh_CN.GB18030"), the C runtime encoding is GB18030 from then on, and g_get_charset() returns GB18030.

2.2 g_get_filename_charsets

g_get_filename_charsets() determines the preferred character sets ("encodings" may be the more accurate term) used for filenames. GLib treats the first character set in the list as the filename encoding; the subsequent ones are used when trying to generate a displayable representation of a filename, see g_filename_display_name().

On Unix, the character sets are determined by consulting the environment variables G_FILENAME_ENCODING and G_BROKEN_FILENAMES. On Windows, the character set used in the GLib API is always UTF-8 and said environment variables have no effect.

G_FILENAME_ENCODING may be set to a comma-separated list of character set names. The special token "@locale" is taken to mean the character set for the current locale. If G_FILENAME_ENCODING is not set, but G_BROKEN_FILENAMES is, the character set of the current locale is taken as the filename encoding. If neither environment variable is set, UTF-8 is taken as the filename encoding, but the character set of the current locale is also put in the list of encodings.

Notes:

3. Conversion between C Runtime encoding and UTF-8

3.1 g_locale_to_utf8

The string parameter of g_locale_to_utf8() is a string in the encoding of the application's current locale (the C runtime locale). On Windows this means the system codepage.

The function converts a string in the encoding used for strings by the C runtime in the current locale into a UTF-8 string. That encoding is usually the same as the one used by the operating system, because most applications call setlocale (LC_ALL, "") to set their locale.

For example, if the system locale uses GB18030 but your application calls setlocale (LC_ALL, "zh_TW.BIG5"), then the C runtime encoding is BIG5 while the OS encoding is GB18030.

If the current C runtime encoding is already UTF-8, the string is simply duplicated.

3.2 g_locale_from_utf8

Converts a string from UTF-8 to the encoding used for strings by the C runtime (usually the same as that used by the operating system) in the current locale.

4. Conversion between GLib filename encoding and UTF-8

4.1 g_filename_to_utf8

Converts a string which is in the encoding used by GLib for filenames into a UTF-8 string. The filename encoding is the first entry in the list returned by g_get_filename_charsets().

4.2 g_filename_from_utf8

Converts a string from UTF-8 to the encoding used for filenames. The filename encoding is the first entry in the list returned by g_get_filename_charsets().

4.3 g_filename_from_uri

Converts an escaped ASCII-encoded URI to a local filename in the encoding used for filenames.

4.4 g_filename_to_uri

Converts an absolute filename to an escaped ASCII-encoded URI.

5. Display Name

5.1 g_filename_display_name

Converts a filename into a valid UTF-8 string. The conversion is not necessarily reversible, so you should keep the original around and use the return value of this function only for display purposes. Unlike g_filename_to_utf8(), the result is guaranteed to be non-NULL even if the filename actually isn't in the GLib file name encoding (it always returns some name for display purposes).

If you know the whole pathname of the file you should use g_filename_display_basename(), since that allows location-based translation of filenames.

Parameters:

  1. filename: a pathname hopefully in the GLib file name encoding
  2. Returns: a newly allocated string containing a rendition of the filename in valid UTF-8

5.2 g_filename_display_basename

Returns the display basename for the particular filename, guaranteed to be valid UTF-8. The display name might not be identical to the filename: for instance, there might be problems converting it to UTF-8, and some files can be translated in the display.

You must pass the whole absolute pathname to this function so that translation of well-known locations can be done.

This function is preferred over g_filename_display_name() if you know the whole path, as it allows translation.

Parameters:

Notes:

6. Lower-Level Functions

GLib uses the native iconv routines, or libiconv if the platform has no native iconv implementation, to do encoding conversion.

g_iconv_open (to_codeset, from_codeset) also tries codeset aliases, which makes conversion more robust. (I will look at g_charset_get_aliases() later.)

7. Example

A simple example is available at http://blogs.sun.com/roller/resources/yydzero/main.c


Posted at 03:22 AM, December 31, 2005 by Yaodong Zero Yao in gnome
