A few days ago I looked into a bug that occurs only in the PCK locale (the Solaris name for Shift_JIS), and only for strings containing a specific Japanese character. The character I used at the time was '本'. Other Japanese strings were displayed correctly, but any string containing that character was shown in English instead of Japanese.

The root cause was that the program searches for the ASCII character '{' in a C function.
In PCK, the Japanese character '本' is encoded as below.

#root@mamushi: echo 本 | od -t x1 -C
0000000  96  7b  0a
          本  **  \n
On the other hand, '{' is encoded as below.
#root@mamushi: echo '{' | od -t x1 -C
0000000  7b  0a
           {  \n
0000002
Both contain the same byte value, 0x7b. The program processes strings one byte at a time and compares each byte with '{', so the second byte of '本' is mistaken for '{'. In the other Japanese encodings, EUC and UTF-8, ASCII characters need no special care: every byte of a multibyte character is 0x80 or higher, so none of them can collide with an ASCII value. That is why the bug was never found in those two locales. In PCK, the second byte of a two-byte character may fall in the ASCII range, and it has to be interpreted together with the first byte as one multibyte character. As a result, the program handled the string differently from what was intended.
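
The false match is easy to reproduce without any locale set up. The following is a minimal sketch, not the actual program: `pck_hon` and `naive_find_brace` are names made up for this illustration, and the byte values are taken from the od output above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* '本' in PCK is the two bytes 0x96 0x7b (see the od output above). */
static const char pck_hon[] = "\x96\x7b";

/*
 * Hypothetical byte-wise search, similar in spirit to what the buggy
 * program did: return the offset of the first 0x7b byte, or -1 if the
 * string contains none. It knows nothing about multibyte characters.
 */
static long naive_find_brace(const char *s)
{
    assert(s != NULL);
    const char *p = strchr(s, '{');   /* compares raw bytes only */
    return p ? (long)(p - s) : -1L;
}
```

Calling `naive_find_brace(pck_hon)` reports a match at offset 1, although the string contains no '{' character at all, only the second byte of '本'.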

In general, many command-line programs do no formatting or similar string operations, so they work fine on raw byte sequences instead of wide-character strings. But if a program examines characters one by one, the string needs to be converted to wide characters for that function.
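
As a rough sketch of a character-aware search, the function below walks the string one multibyte character at a time with the standard `mblen()` instead of converting the whole string with `mbstowcs()`. This is a lighter substitute for a full wide-character conversion, not the fix that was actually applied. `mb_find_ascii` is a hypothetical name, and the caller is assumed to have called `setlocale(LC_ALL, "")` beforehand so that `mblen()` follows the user's locale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: find the single-byte character `target` in `s`,
 * stepping over whole multibyte characters as defined by the current
 * locale. Returns a pointer to the match, or NULL if there is none.
 */
static const char *mb_find_ascii(const char *s, char target)
{
    assert(s != NULL);
    (void)mblen(NULL, 0);               /* reset any shift state */
    while (*s != '\0') {
        int len = mblen(s, MB_CUR_MAX); /* bytes in the next character */
        if (len < 1)
            len = 1;                    /* invalid sequence: skip one byte */
        if (len == 1 && *s == target)
            return s;                   /* a real single-byte match */
        s += len;                       /* trail bytes are never compared */
    }
    return NULL;
}
```

Under a PCK locale, `mblen()` should consume the bytes 0x96 0x7b as one two-byte character, so the 0x7b trail byte is never compared with '{'. In a locale where every character is a single byte, the function behaves like a plain byte search.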

This blog copyright 2008 by kazuhiko