Your Python interpreter may have been compiled with --enable-unicode=ucs2, in which case the built-in unichr(i) function raises a ValueError if the given value is larger than 0xFFFF. The 'ucs2' here actually means UTF-16, which is a variable-length encoding: code points above 0xFFFF are stored as a pair of surrogate code units. So you need a couple of simple functions to convert between UTF-32/UCS-4 code points and UTF-16. Here is an example code snippet:

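def ucs4chr(codepoint):
    try:
        # Works directly on wide builds, and for the BMP on narrow builds.
        return unichr(codepoint)
    except ValueError:
        # Narrow build: split the code point into a high/low surrogate pair.
        hi, lo = divmod(codepoint - 0x10000, 0x400)
        return unichr(0xd800 + hi) + unichr(0xdc00 + lo)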

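def ucs4ord(s):
    # A single code unit is an ordinary (BMP) character.
    if len(s) == 1:
        return ord(s)
    # Two code units are treated as a high/low surrogate pair.
    if len(s) == 2:
        hi, lo = ord(s[0]) - 0xd800, ord(s[1]) - 0xdc00
        return hi * 0x400 + lo + 0x10000
    raise TypeError("ucs4ord() expected a valid ucs4 character")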
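
To check the surrogate arithmetic by hand, here is a small sketch that works on plain integers only, so it does not depend on unichr() or on how the interpreter was built. The names split_surrogates and join_surrogates are made up for this illustration, and the range checks are my own addition rather than part of the snippet above.

```python
def split_surrogates(codepoint):
    """Return the (high, low) UTF-16 surrogate values for a code point above 0xFFFF."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("code point outside the supplementary range")
    # 20 bits remain after subtracting 0x10000: top 10 go to hi, bottom 10 to lo.
    hi, lo = divmod(codepoint - 0x10000, 0x400)
    return 0xD800 + hi, 0xDC00 + lo


def join_surrogates(hi, lo):
    """Combine a high/low surrogate pair back into a single code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return (hi - 0xD800) * 0x400 + (lo - 0xDC00) + 0x10000


hi, lo = split_surrogates(0x1F600)
print("%#x %#x" % (hi, lo))              # 0xd83d 0xde00
print("%#x" % join_surrogates(hi, lo))   # 0x1f600
```

Note that the low surrogate has to be added back in when joining; dropping it would map every character in the same 0x400 block to the same code point.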

This blog copyright 2009 by yongsun