Web blog of yydzero
Blog of 姚延栋 (Yandong Yao)
Tuesday, July 18, 2006
UTF-8 Migration tool analysis

utf8-migration-tool

utf8-migration-tool changes your default locale to the UTF-8 equivalent of the locale you are currently using. It also renames files to their equivalent UTF-8 filenames. It is a Python-based tool; you can download it from http://packages.ubuntulinux.org/breezy/misc/utf8-migration-tool.
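To make "the UTF-8 equivalent of your locale" concrete, here is a minimal Python sketch of such a mapping, assuming POSIX-style locale names of the form language_TERRITORY.codeset@modifier. It is not taken from utf8-migration-tool's source; the function name `utf8_locale_for` and the handling of C/POSIX and of the `@euro` modifier are assumptions for illustration.

```python
from typing import Optional


def utf8_locale_for(locale_name: str) -> Optional[str]:
    """Map a locale such as 'zh_CN.GB18030' to a UTF-8 one such as 'zh_CN.UTF-8'.

    Hypothetical helper; returns None when there is no obvious equivalent.
    """
    # Split off an optional '@modifier' suffix, e.g. 'de_DE@euro' or 'sr_RS@latin'.
    base, sep, modifier = locale_name.partition("@")
    # Drop the legacy codeset, e.g. '.GB18030', '.eucJP' or '.ISO8859-1'.
    lang_territory = base.split(".", 1)[0]
    if lang_territory in ("", "C", "POSIX"):
        # Assumption: C/POSIX carry no language, so we do not guess one.
        return None
    result = lang_territory + ".UTF-8"
    # Assumption: '@euro' only selected ISO-8859-15, and UTF-8 already has the
    # euro sign, so it is dropped; other modifiers (e.g. '@latin') are kept.
    if sep and modifier != "euro":
        result += "@" + modifier
    return result
```

For example, `utf8_locale_for("ja_JP.eucJP")` gives `"ja_JP.UTF-8"`. A real tool would also have to check that the resulting locale is actually installed on the system.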
advantages
disadvantages

convmv

convmv is a Perl-based command-line tool that converts filenames from one encoding to another. Refer to http://j3e.de/linux/convmv/man/ for more information.
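The core of a filename conversion like convmv's can be sketched in Python as below, assuming the target is UTF-8 and the legacy encoding is known in advance. This is not convmv's code; `to_utf8_name`, `rename_to_utf8` and its `notest` parameter are hypothetical names. Like convmv, which only prints what it would do unless given `--notest`, the sketch defaults to a dry run. Names that already decode as UTF-8 are skipped to avoid double conversion, which also means a legacy name whose bytes happen to form valid UTF-8 is left alone.

```python
import os
from typing import Optional


def to_utf8_name(name: bytes, legacy_encoding: str) -> Optional[bytes]:
    """Return the UTF-8 form of a legacy-encoded filename, or None to leave it alone."""
    try:
        name.decode("utf-8")
        return None  # already valid UTF-8 (plain ASCII included): nothing to do
    except UnicodeDecodeError:
        pass
    try:
        return name.decode(legacy_encoding).encode("utf-8")
    except UnicodeDecodeError:
        return None  # not valid in the assumed legacy encoding either; don't guess


def rename_to_utf8(dirpath: bytes, name: bytes, legacy_encoding: str,
                   notest: bool = False) -> Optional[bytes]:
    """Rename dirpath/name to its UTF-8 form; only print the plan unless notest is True."""
    new_name = to_utf8_name(name, legacy_encoding)
    if new_name is None:
        return None
    old_path = os.path.join(dirpath, name)
    new_path = os.path.join(dirpath, new_name)
    if os.path.lexists(new_path):
        print(f"skip, target exists: {new_path!r}")
        return None
    # The bytes repr keeps this output ASCII-only, whatever the terminal encoding is.
    print(f"{old_path!r} -> {new_path!r}")
    if notest:
        os.rename(old_path, new_path)
    return new_name
```

Working on `bytes` rather than `str` matters here: on Unix a filename is just a byte sequence, and decoding it with the wrong encoding too early is exactly the problem being fixed.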
advantages
disadvantages
There is some discussion of this tool at http://www.linuxsir.org/bbs/printthread.php?t=168463.

Fsexam

fsexam is a GNOME application shipped with Solaris 10 that can convert both filenames and file contents from legacy encodings to UTF-8. You can get the binary from http://www.opensolaris.org/os/community/desktop/communities/jds/building/.
We plan to open-source it in the near future, once the legal review process is finished.
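Converting file contents, as fsexam also does, carries an extra risk: binary files such as ZIP archives or JPEG images must not be re-encoded. The Python sketch below is a hypothetical illustration, not fsexam's implementation. `convert_file_to_utf8` is an assumed name; it skips files containing NUL bytes (a crude binary heuristic, also an assumption), skips files that are already valid UTF-8, and writes the result through a temporary file so an interrupted run does not leave a half-written file.

```python
import os
import shutil
import tempfile


def convert_file_to_utf8(path: str, legacy_encoding: str) -> bool:
    """Re-encode a legacy-encoded text file as UTF-8 in place; return True if changed."""
    with open(path, "rb") as f:
        data = f.read()
    # Crude binary check (assumption): NUL bytes rarely occur in legacy-encoded
    # text but are common in ZIP, JPEG and other binary formats.
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
        return False  # already UTF-8 (or pure ASCII): leave it alone
    except UnicodeDecodeError:
        pass
    try:
        text = data.decode(legacy_encoding)
    except UnicodeDecodeError:
        return False  # does not match the assumed legacy encoding
    # Write to a temporary file in the same directory, then atomically replace.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(text.encode("utf-8"))
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True
```

Note that a single-byte encoding such as ISO-8859-1 decodes any byte sequence, so with such encodings the NUL check is the only thing protecting binary files; a real tool would need a better file-type test.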
advantages
disadvantages

Others

    There is an idea, posted at http://mail.nl.linux.org/linux-utf8/2001-02/msg00108.html, to use findutils to convert filenames to UTF-8, but it has not been implemented yet.
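One detail such a find-based approach has to get right is ordering: a directory must be renamed only after its contents, otherwise the paths to its children become invalid midway. `find -depth` provides that order. Below is a minimal Python sketch of the same idea using `os.walk(topdown=False)` instead of find; the function names are hypothetical, the legacy encoding is assumed to be known, and the root directory itself is never renamed.

```python
import os
from typing import List, Optional, Tuple


def _utf8_name(name: bytes, legacy_encoding: str) -> Optional[bytes]:
    """UTF-8 form of a legacy-encoded name, or None if no rename is needed or possible."""
    try:
        name.decode("utf-8")
        return None  # already UTF-8
    except UnicodeDecodeError:
        pass
    try:
        return name.decode(legacy_encoding).encode("utf-8")
    except UnicodeDecodeError:
        return None


def plan_utf8_renames(root: bytes, legacy_encoding: str) -> List[Tuple[bytes, bytes]]:
    """List (old, new) paths with every directory's contents before the directory itself."""
    plan = []
    # topdown=False yields subdirectories before their parents, like `find -depth`.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            new_name = _utf8_name(name, legacy_encoding)
            if new_name is not None:
                plan.append((os.path.join(dirpath, name), os.path.join(dirpath, new_name)))
    return plan


def apply_renames(plan: List[Tuple[bytes, bytes]]) -> List[Tuple[bytes, bytes]]:
    """Perform the renames in order; return the pairs skipped because the target exists."""
    skipped = []
    for old, new in plan:
        if os.path.lexists(new):
            skipped.append((old, new))
        else:
            os.rename(old, new)
    return skipped
```

Because each planned path is built from the ancestors' old names, and ancestors come later in the plan, every `old` path still exists at the moment it is renamed. Passing `root` as `bytes` makes `os.walk` return raw byte names, which is what allows undecodable legacy names to be handled at all.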

Conclusion

    From the above analysis, we can see that although several tools can help with migrating to UTF-8, each of them has some disadvantages. What is still needed is one tool that handles UTF-8 migration well.

    So what features does a tool need before it can be considered a 'perfect' one?

    Here is the feedback I have received so far:

    We currently plan to enhance fsexam to include these features. If you have any thoughts, please feel free to comment! It would also be great if you have other tools to share.
Posted at 02:57 AM, July 18, 2006 by Yaodong Zero Yao in solaris | Comments [8]

Comments:

I don't see why you might want a tool like convmv to convert file content. File content is almost never something one wants to convert. What will happen if you convert a zip file (OpenOffice etc.) or a jpg file from iso8859-1 to utf-8? That will be fun. Editors like Emacs and Vim can take care of the content charset by themselves.

Posted by Bjoern Jacke on September 27, 2006, 07:34 AM CDT

Yandong, I checked the site http://www.opensolaris.org/os/community/desktop/communities/jds/building/ but could not locate the source files for the fsexam tool. Do you have any update on that? Thanks

Posted by Xin Guo on September 29, 2006, 01:25 PM CDT

Hi Bjoern, actually I don't want convmv to convert content; I was just comparing these tools :). Professional users have ways to handle file content, such as running iconv directly, but many novice users would benefit from a tool that can convert file content from a legacy encoding to UTF-8 when they want to.

Posted by Yandong Yao on September 29, 2006, 08:51 PM CDT

Hi Xin, sorry, I made a mistake. Currently you can only download the binary from that link. We are in the process of open-sourcing the code, and I think you will be able to get it in the very near future.

Posted by Yandong Yao on September 29, 2006, 08:53 PM CDT

Hi Yandong, how are you? Can you provide a Linux version of the auto_ef command? It doesn't need to be exactly the same as auto_ef; I just want to detect whether a document is a CJK document or not. Can you provide the source code so that I can compile it? I need this so that CJK mail can bypass my antispam solution, which doesn't work well with CJK. Alternatively, do you have a word-breaking algorithm for CJK? If so, I could use it in my antispam solution so that it supports CJK as well. Thanks, Shu Liu

Posted by Shu Liu on November 22, 2006, 01:58 PM CST

Hi Shu, auto_ef will be open-sourced soon as part of the g11n common workspace at opensolaris.org. I will let you know once it is open.

Posted by Yandong Yao on November 22, 2006, 07:44 PM CST

Any update on auto_ef being ported to Linux?

Posted by Lloyd Budd on July 15, 2008, 04:44 PM CDT
