A few days ago I looked into a bug that occurs only in the PCK locale,
and only for certain Japanese characters. The character I hit at the
time was '本'. Other Japanese strings were displayed correctly, but any
string containing that character was shown in English instead of
Japanese.
The root cause was that the program searches for the ASCII character '{'
byte by byte in a C function.
In PCK, the Japanese character '本' is encoded as follows.
#root@mamushi: echo 本 | od -t x1 -C
0000000 96 7b 0a
本 ** \n
On the other hand, '{' is encoded as below.
#root@mamushi: echo '{' | od -t x1 -C
0000000 7b 0a
{ \n
0000002
Both contain the same byte value, 0x7b. The program processes strings one
byte at a time and compares each byte with '{', so the second byte of '本'
is mistaken for '{'. In the other Japanese encodings, EUC and UTF-8, ASCII
characters can be handled without special care, because no byte of a
Japanese multibyte character falls in the ASCII range. That is why the bug
was not found in those two locales. In PCK, the second byte of a two-byte
character may have an ASCII value, and it is interpreted together with the
first byte as a single multibyte character. As a result, the program
handles such strings differently from what was intended.
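
The false match can be reproduced with a few lines of C. This is only a
sketch, not the code of the program in question: the function names are
made up for illustration, and the lead-byte ranges (0x81-0x9F and
0xE0-0xFC) are the usual Shift_JIS ones, which I assume also apply to PCK.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: PCK uses the usual Shift_JIS lead-byte ranges. */
int sjis_is_lead_byte(unsigned char c)
{
    return (c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC);
}

/* Byte-by-byte search, equivalent to strchr(). It also matches the
 * second byte of a two-byte character such as 0x96 0x7b. */
const char *find_byte_naive(const char *s, char target)
{
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (*s == target)
            return s;
    }
    return NULL;
}

/* Search that skips both bytes of a two-byte character, so only
 * genuine single-byte characters are compared with the target. */
const char *find_char_sjis(const char *s, char target)
{
    assert(s != NULL);
    while (*s != '\0') {
        if (sjis_is_lead_byte((unsigned char)*s)) {
            if (s[1] == '\0')
                break;      /* truncated two-byte character */
            s += 2;         /* skip lead byte and trail byte */
            continue;
        }
        if (*s == target)
            return s;
        s++;
    }
    return NULL;
}
```

For the two bytes 0x96 0x7b, find_byte_naive() reports a '{' at offset 1,
while find_char_sjis() reports no match.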
In general, many command-line programs do not format or otherwise
manipulate strings, so they work fine with raw byte sequences instead of
wide-character strings.
But if a program examines characters one by one, it needs to convert the
string to wide characters before doing so.
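
One way to do that conversion, as a sketch, is to walk the string with the
standard mbrtowc() function, which decodes one character at a time
according to the current locale. The helper name find_char_mb is
hypothetical, and the caller is assumed to have called
setlocale(LC_CTYPE, "") once at startup so that the PCK rules are in
effect.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Return a pointer to the first occurrence of the character `target`
 * in the multibyte string `s`, or NULL if it is absent or the string
 * is not valid in the current locale. */
const char *find_char_mb(const char *s, wchar_t target)
{
    mbstate_t st;
    size_t left;

    assert(s != NULL);
    memset(&st, 0, sizeof st);      /* start in the initial shift state */
    left = strlen(s);

    while (left > 0) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, s, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return NULL;            /* invalid or incomplete sequence */
        if (n == 0)
            break;                  /* embedded NUL, nothing more to scan */
        if (wc == target)
            return s;               /* whole character matched */
        s += n;                     /* step over every byte of it */
        left -= n;
    }
    return NULL;
}
```

In a PCK locale the bytes 0x96 0x7b should be decoded as one wide
character, so find_char_mb(str, L'{') never compares the trailing 0x7b
byte with '{' on its own.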