Utf8-migration-tool
utf8-migration-tool changes your default locale to the UTF-8
equivalent of the locale you are currently using. It also renames
your files to the equivalent UTF-8 file names. It is a Python-based
tool; you can download it from
http://packages.ubuntulinux.org/breezy/misc/utf8-migration-tool.
advantages
- Can change your locale environment to UTF-8. There is a bug, though:
after I ran this tool, the language in my ~/.dmrc became zh_CN.GB18030.UTF-8.
- Converts all the files under your home directory (file names only).
- GUI
disadvantages
- Hard to install from the source tarball.
- The user has very little control over it: it only converts the home
directory, and you can't select which files to convert, set the legacy
encoding name, or force a conversion.
- Not stable: I ran it on Ubuntu to convert some zh_CN.GB18030 files
to UTF-8, but got the error "OSError: [Errno 2] No such file or directory".
- No documentation, no localization.
- Can't convert file content. Of course you can write a script that uses
iconv to convert file content, but expecting every user to write such a
script is not a good solution.
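To give an idea of what such a script involves, here is a minimal Python sketch that does the same job iconv would do for a single file. The function name convert_file_content and the gb18030 default are my own assumptions for illustration; this is not code from utf8-migration-tool.

```python
import os
import tempfile

def convert_file_content(path, legacy_encoding="gb18030"):
    """Rewrite the file at `path` as UTF-8, decoding it from `legacy_encoding`.

    Returns True if the file was rewritten, False if it was already
    valid UTF-8 and therefore left untouched.
    """
    with open(path, "rb") as f:
        raw = f.read()

    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8; do not convert twice
    except UnicodeDecodeError:
        pass

    # Raises UnicodeDecodeError if the data is not in the assumed encoding.
    text = raw.decode(legacy_encoding)

    # Write to a temporary file in the same directory, then swap it in,
    # so a failure half-way never leaves a truncated original behind.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as out:
        out.write(text.encode("utf-8"))
    os.replace(tmp_path, path)
    return True
```

Even this small version has to decide what to do with files that are already UTF-8 and how to avoid destroying data on failure, which is exactly why it should not be left to every user.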
convmv
convmv is a Perl-based command-line tool that converts file names from
one encoding to another. See http://j3e.de/linux/convmv/man/ for more
information.
advantages
- The user has more options to control the conversion, such as recursive conversion and interactive conversion.
- Can handle symbolic links.
- Can select between Normalization Form
C (NFC) and Normalization Form D (NFD).
- Can undo what you have done.
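The NFC/NFD choice matters because the same visible file name can be stored as different byte sequences. The short Python example below uses the standard unicodedata module to show the difference; it only illustrates the two forms and says nothing about how convmv implements them.

```python
import unicodedata

# "café": NFC stores é as one code point (U+00E9),
# NFD stores it as "e" followed by a combining acute accent (U+0301).
name_nfc = unicodedata.normalize("NFC", "caf\u00e9")
name_nfd = unicodedata.normalize("NFD", "caf\u00e9")

print(len(name_nfc), len(name_nfd))   # 4 5
print(name_nfc.encode("utf-8"))       # b'caf\xc3\xa9'
print(name_nfd.encode("utf-8"))       # b'cafe\xcc\x81'
print(name_nfc == name_nfd)           # False
```

Both names look identical on screen, but a byte-wise comparison treats them as different files, so a conversion tool needs to let you pick one form.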
disadvantages
- No GUI. The large number of options gives advanced users more control, but too many options may also scare users away.
- Can't convert file content.
There is some discussion about this tool at http://www.linuxsir.org/bbs/printthread.php?t=168463.
Fsexam
fsexam is a GNOME application shipped in Solaris 10 that can
convert file names and file contents from a legacy encoding to UTF-8. You can
get the binary from
http://www.opensolaris.org/os/community/desktop/communities/jds/building/.
We will open its source code in the near future (once we have finished the legal review process).
advantages
- Has a friendly GUI; the user can control conversion preferences to some extent.
- Can undo what you have done.
- Has a report pane in which you can see what you have done.
- Provides an encoding list from which the user can select among several
candidates. This is very useful when the user doesn't know which encoding
the file name or file content is in.
- Can handle file content, and can preview it.
- Supports automatic conversion and recursive conversion.
- Good localization, good online help.
disadvantages
- No easy way to add an encoding to the encoding list, though it can be done with gconf-editor.
- Can't force a conversion when fsexam thinks the text is already UTF-8.
- Not well suited to batch conversion, as it is not a command-line tool.
- No way to try a conversion before doing the real one.
- Cannot handle symbolic links.
Others
There is an idea to use the find utilities to convert file names to UTF-8 at
http://mail.nl.linux.org/linux-utf8/2001-02/msg00108.html, but it has
no implementation yet.
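To make the general idea concrete, here is a rough Python sketch of a find-style walk that renames legacy-encoded file names to UTF-8. It is my own illustration, not the proposal from that mailing-list post; the function name convert_names and its parameters are hypothetical. It defaults to a dry run and only reports what it would rename.

```python
import os

def convert_names(root, legacy_encoding, dry_run=True):
    """Rename entries under `root` whose names are not valid UTF-8.

    Returns a list of (old_path, new_path) pairs as byte strings.
    With dry_run=True nothing on disk is changed.
    """
    planned = []
    # Walk with a bytes path so names come back as raw bytes.
    # topdown=False visits children before their parent directory, so
    # renaming a directory never invalidates paths still to be visited.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root), topdown=False):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
                continue  # already valid UTF-8
            except UnicodeDecodeError:
                pass
            try:
                new_name = name.decode(legacy_encoding).encode("utf-8")
            except UnicodeDecodeError:
                continue  # not in the assumed legacy encoding; skip it
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, new_name)
            if os.path.lexists(new):
                continue  # never overwrite an existing entry
            planned.append((old, new))
            if not dry_run:
                os.rename(old, new)  # a symbolic link itself is renamed, not its target
    return planned
```

Calling convert_names(".", "gb18030") first and inspecting the returned list before running again with dry_run=False gives a simple way to try a conversion before doing it for real.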
Conclusion
From the analysis above, we can see that although there are
several tools that can help with migrating to UTF-8, each of them has some
disadvantages. A perfect tool for UTF-8 migration is still needed.
So what features does a tool need before it becomes a 'perfect' tool?
Here is the feedback I have received so far:
- Advanced users want a command-line tool, while ordinary users
prefer GUI tools. So it would be great to provide both tools and have
them share the same code base.
- Can add encodings to and remove encodings from the encoding list.
- Use a library such as auto_ef (a Solaris library) to detect the encoding automatically.
- Enable/disable UTF-8 validation.
- Undo when an error occurs.
- A log file for post-mortem analysis.
- Support more file types, such as tar.gz, zip and mp3.
- Support symbolic links.
- Dry run before the real run.
- Better filtering, so that the user can apply the conversion to selected files.
- Enable/disable hidden file support.
- Convert both file names and file content.
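Two of these items, UTF-8 validation and an editable list of candidate encodings, can be sketched in a few lines of Python. This is only an illustration using trial decoding with Python's built-in codecs; it does not use auto_ef, and the function names and the default encoding list are my own assumptions.

```python
def is_valid_utf8(data):
    """Return True if the byte string `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

def candidate_encodings(data, encodings=("gb18030", "big5", "euc_jp", "euc_kr")):
    """Return the entries of `encodings` that can decode `data` without error.

    The list is just a parameter, so adding or removing an encoding is trivial.
    """
    matches = []
    for enc in encodings:
        try:
            data.decode(enc)
        except UnicodeDecodeError:
            continue
        matches.append(enc)
    return matches
```

Trial decoding often accepts more than one encoding for the same bytes, so the result is best treated as a list of candidates to show to the user rather than as a definite answer.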
We currently plan to enhance fsexam to include these features.
If you have any thoughts, please feel free to add a comment! It would
also be great if you have another tool to share.
Posted by Bjoern Jacke on September 27, 2006, 07:34 AM CDT #
Posted by Bjoern Jacke on September 27, 2006, 07:36 AM CDT #
Posted by Xin Guo on September 29, 2006, 01:25 PM CDT #
Posted by Yandong Yao on September 29, 2006, 08:51 PM CDT #
Posted by Yandong Yao on September 29, 2006, 08:53 PM CDT #
Posted by Shu Liu on November 22, 2006, 01:58 PM CST #
Posted by Yandong Yao on November 22, 2006, 07:44 PM CST #
Any update on auto_ef being ported to Linux?
Posted by Lloyd Budd on July 15, 2008, 04:44 PM CDT #