Web blog of yydzero
Blog of 姚延栋 (Yandong Yao)
Wednesday, January 04, 2006
Display Names of Filenames in GLib

1. Introduction

GLib always uses UTF-8 as its internal encoding, and all text shown in GTK+ widgets must be UTF-8. So if you need to display a filename in a GtkLabel or GtkButton, you must convert the filename to UTF-8 first.

GLib provides a convenience function, g_filename_display_name(), which converts a filename from its on-disk encoding to UTF-8 and returns the result as a display name.

Then a problem appears: how does GLib know the encoding of a filename? The answer is that it doesn't; the user or the system administrator has to tell it. GLib provides two environment variables for this purpose: G_FILENAME_ENCODING and G_BROKEN_FILENAMES.
g_get_filename_charsets() encapsulates this lookup and returns GLib's list of filename encodings.
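To make the lookup concrete, here is a minimal plain-C sketch of how such an encoding list can be assembled. It follows the precedence documented for GLib (G_FILENAME_ENCODING takes priority over G_BROKEN_FILENAMES, and the token @locale stands for the locale charset), but it is not GLib's source: `CharsetList` and `build_filename_charsets()` are hypothetical names, and the variable values and the locale charset are passed in as arguments instead of being read with getenv() and g_get_charset().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHARSETS 8
#define MAX_NAME     32

/* Hypothetical ordered list of charset names. */
typedef struct {
    char   names[MAX_CHARSETS][MAX_NAME];
    size_t count;
} CharsetList;

/* Append the first len bytes of name; empty entries and overflow are dropped. */
static void add_charset(CharsetList *list, const char *name, size_t len)
{
    if (len == 0 || list->count >= MAX_CHARSETS)
        return;
    if (len >= MAX_NAME)
        len = MAX_NAME - 1;              /* truncate over-long names */
    memcpy(list->names[list->count], name, len);
    list->names[list->count][len] = '\0';
    list->count++;
}

/* filename_encoding: value of G_FILENAME_ENCODING, or NULL if unset.
 * broken_filenames:  value of G_BROKEN_FILENAMES, or NULL if unset.
 * locale_charset:    charset of the current locale, e.g. "GB18030". */
static CharsetList build_filename_charsets(const char *filename_encoding,
                                           const char *broken_filenames,
                                           const char *locale_charset)
{
    CharsetList list = { .count = 0 };
    assert(locale_charset != NULL);

    if (filename_encoding != NULL) {
        /* Comma-separated list; "@locale" means the locale charset. */
        const char *p = filename_encoding;
        while (*p != '\0') {
            size_t len = strcspn(p, ",");
            if (len == strlen("@locale") && strncmp(p, "@locale", len) == 0)
                add_charset(&list, locale_charset, strlen(locale_charset));
            else
                add_charset(&list, p, len);
            p += len;
            if (*p == ',')
                p++;
        }
    } else if (broken_filenames != NULL) {
        /* Filenames are assumed to be in the locale encoding only. */
        add_charset(&list, locale_charset, strlen(locale_charset));
    } else {
        /* Default: UTF-8 first, then the locale charset if it differs. */
        add_charset(&list, "UTF-8", strlen("UTF-8"));
        if (strcmp(locale_charset, "UTF-8") != 0)
            add_charset(&list, locale_charset, strlen(locale_charset));
    }
    return list;
}
```

With G_FILENAME_ENCODING=@locale,UTF-8 on a GB18030 locale the sketch yields GB18030, UTF-8; with both variables unset it yields UTF-8, GB18030. These are the two lists discussed in section 3.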

2. Current Implementation of g_filename_display_name()

g_filename_display_name() tries each encoding in the list returned by g_get_filename_charsets() in turn, converting the filename from that encoding to UTF-8 until a conversion succeeds. If every conversion fails, it calls make_valid_utf8(), which uses g_utf8_validate() to find the invalid parts and always returns a valid UTF-8 string.
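The loop can be sketched in plain C as follows. This is an illustration under stated assumptions, not GLib's code: it uses POSIX iconv() where GLib uses g_convert(), runs a UTF-8 entry through iconv like any other charset, and its stand-in for make_valid_utf8() replaces invalid bytes with '?' (GLib's replacement character may differ). The names display_name(), convert_to_utf8() and utf8_seq_len() are hypothetical, and charset names such as "GB18030" must be ones your iconv accepts.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Convert NUL-terminated `in` from charset `from` to UTF-8 with POSIX iconv.
 * Returns a malloc'd string, or NULL if the charset is unknown to iconv or
 * the bytes are not valid in that charset. */
static char *convert_to_utf8(const char *in, const char *from)
{
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(in);
    size_t out_size = 4 * in_left + 1;   /* generous bound for UTF-8 output */
    char *out = malloc(out_size);
    if (out == NULL) { iconv_close(cd); return NULL; }

    char *in_p = (char *)in;             /* iconv() takes char ** */
    char *out_p = out;
    size_t out_left = out_size - 1;      /* keep one byte for the final NUL */
    size_t rc = iconv(cd, &in_p, &in_left, &out_p, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out_p, &out_left);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1) { free(out); return NULL; }   /* invalid input */
    *out_p = '\0';
    return out;
}

/* Length of the well-formed UTF-8 sequence starting at s, or 0 if none. */
static size_t utf8_seq_len(const unsigned char *s)
{
    unsigned char c = s[0], lo = 0x80, hi = 0xBF;
    size_t n;

    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) n = 2;
    else if (c >= 0xE0 && c <= 0xEF) n = 3;
    else if (c >= 0xF0 && c <= 0xF4) n = 4;
    else return 0;

    /* Narrower second-byte ranges exclude overlongs, surrogates, > U+10FFFF. */
    if (c == 0xE0) lo = 0xA0;
    if (c == 0xED) hi = 0x9F;
    if (c == 0xF0) lo = 0x90;
    if (c == 0xF4) hi = 0x8F;
    if (s[1] < lo || s[1] > hi) return 0;
    for (size_t i = 2; i < n; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    return n;
}

/* Stand-in for GLib's private make_valid_utf8(): copy valid UTF-8 and
 * replace each byte that does not start a valid sequence with '?'. */
static char *make_valid_utf8(const char *str)
{
    const unsigned char *p = (const unsigned char *)str;
    char *out = malloc(strlen(str) + 1);  /* output is never longer */
    char *o = out;

    if (out == NULL)
        return NULL;
    while (*p != '\0') {
        size_t n = utf8_seq_len(p);
        if (n == 0) { *o++ = '?'; p++; continue; }  /* replace bad byte */
        memcpy(o, p, n);
        o += n;
        p += n;
    }
    *o = '\0';
    return out;
}

/* Try each charset in order; the first successful conversion wins. */
static char *display_name(const char *filename,
                          const char *const charsets[], size_t n_charsets)
{
    assert(filename != NULL);
    for (size_t i = 0; i < n_charsets; i++) {
        char *utf8 = convert_to_utf8(filename, charsets[i]);
        if (utf8 != NULL)
            return utf8;
    }
    return make_valid_utf8(filename);
}
```

Because the first successful conversion wins, the order of the list matters: with {"GB18030", "UTF-8"} the bytes D2 BB come back as 一, while with {"UTF-8", "GB18030"} they are kept unchanged as the UTF-8 character һ (U+04BB).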

Notes:
  1. 'Try' is the right word: GLib has no reliable way to determine the encoding of a filename, because the file may have been created under any locale and in any encoding.
  2. 'Conversion succeeds' only means that the converted result is not NULL, so the result may still be wrong. For example, converting a GB18030-encoded string from Big5 to UTF-8 (that is, treating the GB18030 bytes as Big5) may return a non-NULL value.
  3. g_utf8_validate() has limited power: it can only check whether a string is a well-formed UTF-8 byte sequence. Even if a string (especially a filename) passes the validation, that does not guarantee it was actually encoded in UTF-8.
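Note 3 can be checked with a strict UTF-8 validator. The hypothetical utf8_validate() below is a plain-C stand-in for g_utf8_validate() (same idea: reject malformed, overlong, surrogate and out-of-range sequences), not GLib's implementation. The byte pair D2 BB, which is 一 in GB18030, passes it because it is also the UTF-8 encoding of U+04BB (һ).

```c
#include <assert.h>
#include <stddef.h>

/* Length of the well-formed UTF-8 sequence starting at s, or 0 if none. */
static size_t utf8_seq_len(const unsigned char *s)
{
    unsigned char c = s[0], lo = 0x80, hi = 0xBF;
    size_t n;

    if (c < 0x80) return 1;
    if (c >= 0xC2 && c <= 0xDF) n = 2;
    else if (c >= 0xE0 && c <= 0xEF) n = 3;
    else if (c >= 0xF0 && c <= 0xF4) n = 4;
    else return 0;

    /* Narrower second-byte ranges exclude overlongs, surrogates, > U+10FFFF. */
    if (c == 0xE0) lo = 0xA0;
    if (c == 0xED) hi = 0x9F;
    if (c == 0xF0) lo = 0x90;
    if (c == 0xF4) hi = 0x8F;
    if (s[1] < lo || s[1] > hi) return 0;
    for (size_t i = 2; i < n; i++)
        if (s[i] < 0x80 || s[i] > 0xBF) return 0;
    return n;
}

/* Return 1 if str is well-formed UTF-8, else 0. This checks byte structure
 * only; it cannot tell which encoding the bytes were really written in. */
static int utf8_validate(const char *str)
{
    assert(str != NULL);
    const unsigned char *p = (const unsigned char *)str;
    while (*p != '\0') {
        size_t n = utf8_seq_len(p);
        if (n == 0)
            return 0;
        p += n;
    }
    return 1;
}
```

Short GB18030 strings built from such byte pairs therefore pass validation even though they are not UTF-8 text.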

3. Examples

The common situations are that G_FILENAME_ENCODING and G_BROKEN_FILENAMES are either both unset, or set to '@locale,UTF-8' and 'yes' respectively.

3.1 Set to '@locale,UTF-8' and 'yes'

The GLib filename encoding list is '@locale,UTF-8', and the GLib filename encoding (the first entry) is the locale encoding. For example, on a zh_CN.GB18030 locale the filename encoding list is 'GB18030,UTF-8' and the filename encoding used by g_filename_to_utf8() is GB18030. If the actual on-disk encoding of a filename is:
    1) GB18030: everything works. GLib first converts the string from GB18030 to UTF-8, and the result is correct.
    2) UTF-8: GLib first tries to convert the UTF-8 string from GB18030 to UTF-8. In most cases this returns NULL (I am not sure whether it always does). GLib then converts it from UTF-8 to UTF-8 and gets the correct result, so it displays UTF-8-encoded filenames correctly as well.
    3) Another encoding, such as Big5: GLib still first converts the Big5-encoded filename from GB18030 to UTF-8. Usually this conversion returns a result, but the result is wrong, typically garbled characters for the end user. If the conversion does fail, GLib tries to convert the Big5 string from UTF-8 to UTF-8, which usually fails too. Finally, g_filename_display_name() calls make_valid_utf8() to return a valid UTF-8 filename.
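Case 3 is easy to reproduce. Big5 double-byte codes fall inside the byte ranges GB18030 uses for its own two-byte characters, so decoding Big5 bytes as GB18030 usually succeeds and simply produces different characters. The sketch below uses POSIX iconv; the helper convert_to_utf8() is hypothetical, and the charset names "BIG5" and "GB18030" are the ones glibc accepts (other iconv implementations may spell them differently). The Big5 bytes A4 A4 mean 中, but decoded as GB18030 they become the hiragana い.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Decode NUL-terminated `in` as charset `from` and re-encode it as UTF-8.
 * Returns a malloc'd string, or NULL if iconv rejects the input. */
static char *convert_to_utf8(const char *in, const char *from)
{
    assert(in != NULL && from != NULL);
    iconv_t cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(in);
    size_t out_size = 4 * in_left + 1;   /* generous bound for UTF-8 output */
    char *out = malloc(out_size);
    if (out == NULL) { iconv_close(cd); return NULL; }

    char *in_p = (char *)in;             /* iconv() takes char ** */
    char *out_p = out;
    size_t out_left = out_size - 1;      /* keep one byte for the final NUL */
    size_t rc = iconv(cd, &in_p, &in_left, &out_p, &out_left);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &out_p, &out_left);  /* flush shift state */
    iconv_close(cd);

    if (rc == (size_t)-1) { free(out); return NULL; }   /* invalid input */
    *out_p = '\0';
    return out;
}
```

Both calls succeed, so nothing in the return value tells GLib that the second reading is the wrong one.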

3.2 Both unset

The GLib filename encoding list is 'UTF-8,@locale', i.e. filenames are assumed to be UTF-8 by default. For example, on a zh_CN.GB18030 locale the filename encoding list is 'UTF-8,GB18030' and the filename encoding is UTF-8. If the actual on-disk encoding of a filename is:
    1) GB18030: GLib first calls g_utf8_validate() to check whether this GB18030-encoded string is valid UTF-8. Usually the validation fails, but I am not sure whether some GB18030 strings are also valid UTF-8. GLib then converts the string from GB18030 to UTF-8, the result is correct, and the name is displayed correctly.
    2) UTF-8: the string passes g_utf8_validate(), so GLib duplicates it and returns it directly.
    3) Another encoding, such as Big5: as with GB18030, g_utf8_validate() fails, and GLib then converts the Big5 filename from GB18030 to UTF-8. Usually this returns a garbled string; if it returns NULL, GLib calls make_valid_utf8() to return a valid UTF-8 string.

4. Advantages and Disadvantages of Display Names

4.1 Advantage

With display names, GLib can correctly display both filenames in the current locale's encoding and UTF-8-encoded filenames. This reduces the garbled characters users see and makes GNOME friendlier to end users.

Since most users and developers work in a single locale most of the time, and in at most two (a popular legacy locale such as zh_CN.GB18030 plus a UTF-8 locale), this strategy works very well for them.

4.2 Disadvantage

Because GLib can correctly display filenames in several encodings at the same time, GNOME software (such as GTK+ and libgnomeui) must handle these filenames (not their display names) itself. If it handles them incorrectly, problems appear, and in fact some bugs have been caused by exactly this. Programmers therefore need to pay more attention here, especially when storing filenames in gconf, which always stores UTF-8 strings internally. See the sample below:

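    /* gconf stores only UTF-8 strings: store the filename directly if it
     * is already valid UTF-8, otherwise convert it from the locale
     * encoding first. */
    if (g_utf8_validate (item->filename, -1, NULL)) {
        gconf_client_set_string (capplet->client, WP_FILE_KEY,
                                 item->filename, NULL);
    } else {
        gchar *utf8_filename = g_locale_to_utf8 (item->filename, -1,
                                                 NULL, NULL, NULL);
        /* g_locale_to_utf8() returns NULL if the conversion fails. */
        if (utf8_filename != NULL)
            gconf_client_set_string (capplet->client, WP_FILE_KEY,
                                     utf8_filename, NULL);
        g_free (utf8_filename);
    }

    /* The name shown in the UI must be UTF-8 as well. */
    if (g_utf8_validate (item->fileinfo->name, -1, NULL))
        item->name = g_strdup (item->fileinfo->name);
    else
        item->name = g_locale_to_utf8 (item->fileinfo->name, -1,
                                       NULL, NULL, NULL);

    item->options = gconf_client_get_string (capplet->client,
                                             WP_OPTIONS_KEY, NULL);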
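Whatever conversion is used before calling gconf_client_set_string(), it has to be reversible, because the program must later turn the stored UTF-8 string back into the exact on-disk bytes before opening the file; the display name is meant only for showing to users and is not guaranteed to convert back. Below is a minimal sketch of that round trip, assuming a GB18030 filename encoding and using POSIX iconv in place of g_filename_to_utf8() and g_filename_from_utf8(); recode() is a hypothetical helper.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert NUL-terminated `in` from charset `from` to
 * charset `to` with POSIX iconv. Returns a malloc'd string, or NULL if the
 * charsets are unknown or the input cannot be converted exactly. */
static char *recode(const char *in, const char *from, const char *to)
{
    assert(in != NULL && from != NULL && to != NULL);
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(in);
    size_t out_size = 4 * in_left + 1;   /* ample for UTF-8 <-> GB18030 */
    char *out = malloc(out_size);
    if (out == NULL) { iconv_close(cd); return NULL; }

    char *in_p = (char *)in;             /* iconv() takes char ** */
    char *out_p = out;
    size_t out_left = out_size - 1;      /* room for the terminating NUL */
    size_t rc = iconv(cd, &in_p, &in_left, &out_p, &out_left);
    if (rc == 0)
        rc = iconv(cd, NULL, NULL, &out_p, &out_left);  /* flush shift state */
    iconv_close(cd);

    /* (size_t)-1 means an error; a positive count means some characters
     * were converted irreversibly. Either way the name would not survive. */
    if (rc != 0) { free(out); return NULL; }
    *out_p = '\0';
    return out;
}
```

Failing hard on any lossy conversion is deliberate here: a filename that cannot be recovered exactly should not be stored at all.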

Even if programmers handle these problems correctly, problems still remain, because
Posted at 02:26 AM, January 04, 2006 by Yaodong Zero Yao in gnome | Comments [3]

Comments:

Could you send me some font software similar to the TAXI receipt or Star UX500 fonts? Thanks indeed. Jimmy

Posted by jimmy on January 04, 2007, 11:33 AM CST

Could you make me a font like the TAXI receipt one, printable on dot-matrix or laser printers? Thanks indeed. Jimmy

Posted by jimmy on January 04, 2007, 11:38 AM CST

Hi Jimmy, sorry, I don't have such software, so I can't help you :(

Posted by yydzero on January 04, 2007, 07:39 PM CST
