Osamu Sayama's Weblog
Reduce locale shared object size
The shared object for a UTF-8 locale is currently about 2 MB per locale. This size keeps growing because each new release of the Unicode standard introduces new characters.
% ls -lah /usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3 /usr/lib/locale/fr_FR.UTF-8/fr_FR.UTF-8.so.3 /usr/lib/locale/fr_CA.UTF-8/fr_CA.UTF-8.so.3
-r-xr-xr-x 1 root bin 2.4M Sep 19 04:31 /usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3
-r-xr-xr-x 1 root bin 1.7M Aug 7 19:42 /usr/lib/locale/fr_CA.UTF-8/fr_CA.UTF-8.so.3
-r-xr-xr-x 1 root bin 1.7M Aug 7 19:42 /usr/lib/locale/fr_FR.UTF-8/fr_FR.UTF-8.so.3
'locale -a|grep -i utf-8 | wc' shows 108 locales on Nevada build 100 with full locale support, and more than 400 MB is used for UTF-8 locale shared objects. This was not much of a problem on an installed system (although a patch for the locales becomes huge). It is a problem for the OpenSolaris Live CD, however, because space there is more limited. So we should try to reduce this size as much as possible.
The root cause of the size is the weight tables in _LC_collate_t lc_coll (ct_wgts* and subs_map) and the qmask index table (qidx) in _LC_ctype_t lc_ctype. Many UTF-8 locales share the same LC_CTYPE and LC_COLLATE definitions (for example, fr_FR.UTF-8 and fr_CA.UTF-8), so splitting these tables out of the locale shared object and creating separate shared objects for the ctype and collation tables can reduce the total disk size dramatically. It looks like the LC_CTYPE and LC_COLLATE tables account for about 99% of the total size of a UTF-8 locale, and about 90% of the total is LC_COLLATE.
- en_US.UTF-8
[68] | 982856| 974848|OBJT |LOCL |0 |11 |ct_wgts0
[67] | 8008| 974848|OBJT |LOCL |0 |11 |ct_wgts1
[72] | 1958144| 243713|OBJT |LOCL |0 |11 |subs_map
[81] | 2202848| 243456|OBJT |LOCL |0 |11 |qidx
- fr_FR.UTF-8 and fr_CA.UTF-8
[72] | 1120768| 607132|OBJT |LOCL |0 |11 |weightstr
[69] | 793096| 262136|OBJT |LOCL |0 |11 |ct_wgts0
[68] | 530960| 262136|OBJT |LOCL |0 |11 |ct_wgts1
[67] | 268824| 262136|OBJT |LOCL |0 |11 |ct_wgts2
[66] | 6688| 262136|OBJT |LOCL |0 |11 |ct_wgts3
[71] | 1055232| 65535|OBJT |LOCL |0 |11 |subs_map
[80] | 1728456| 65278|OBJT |LOCL |0 |11 |qidx
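The contribution of these tables can be checked by summing the size column of the listings above. This is a minimal sketch, assuming the `|`-separated layout shown (index, value, size, type, ...) with the size in the third field; `sum_sizes` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Sum the size column (third '|'-separated field) of a symbol listing
# read from standard input. sum_sizes is a hypothetical helper.
sum_sizes() {
    awk -F'|' '{ total += $3 } END { printf "%d\n", total }'
}

# Table symbols of en_US.UTF-8, copied from the listing above.
en_US_total=$(sum_sizes <<'EOF'
[68] | 982856| 974848|OBJT |LOCL |0 |11 |ct_wgts0
[67] | 8008| 974848|OBJT |LOCL |0 |11 |ct_wgts1
[72] | 1958144| 243713|OBJT |LOCL |0 |11 |subs_map
[81] | 2202848| 243456|OBJT |LOCL |0 |11 |qidx
EOF
)

# Table symbols of fr_FR.UTF-8 / fr_CA.UTF-8, copied from the listing above.
fr_total=$(sum_sizes <<'EOF'
[72] | 1120768| 607132|OBJT |LOCL |0 |11 |weightstr
[69] | 793096| 262136|OBJT |LOCL |0 |11 |ct_wgts0
[68] | 530960| 262136|OBJT |LOCL |0 |11 |ct_wgts1
[67] | 268824| 262136|OBJT |LOCL |0 |11 |ct_wgts2
[66] | 6688| 262136|OBJT |LOCL |0 |11 |ct_wgts3
[71] | 1055232| 65535|OBJT |LOCL |0 |11 |subs_map
[80] | 1728456| 65278|OBJT |LOCL |0 |11 |qidx
EOF
)

echo "en_US.UTF-8 tables: $en_US_total bytes"
echo "fr_FR.UTF-8 tables: $fr_total bytes"
```

With the numbers above this gives 2436865 bytes for en_US.UTF-8 and 1786489 bytes for fr_FR.UTF-8, which is close to the 2.4M and 1.7M file sizes reported by ls.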
As a trial, I split fr_FR.UTF-8.c, which is generated by the localedef command, into three parts: CLDR.UTF-8-ctype.c, CLDR.fr.UTF-8-collate.c and fr_FR.UTF-8.c. Then I compiled and linked them as follows.
% cc -xO3 -K PIC -G -Xa -h CLDR.UTF-8-ctype.so.3 -o CLDR.UTF-8-ctype.so.3 ./CLDR.UTF-8-ctype.c
% cc -xO3 -K PIC -G -Xa -h CLDR.fr.UTF-8-collate.so.3 -o CLDR.fr.UTF-8-collate.so.3 ./CLDR.fr.UTF-8-collate.c
% cc -xO3 -K PIC -G -Xa -h fr_FR.UTF-8.so.3 -o fr_FR.UTF-8.so.3 ./fr_FR.UTF-8.c /usr/lib/locale/common/methods_unicode.so.3 ./CLDR.UTF-8-ctype.so.3 ./CLDR.fr.UTF-8-collate.so.3 -R /usr/lib/locale/common
Then copy CLDR.UTF-8-ctype.so.3 and CLDR.fr.UTF-8-collate.so.3 to /usr/lib/locale/common, and copy fr_FR.UTF-8.so.3 to /usr/lib/locale/fr_FR.UTF-8. Here is the modified source.
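The same link and copy steps can be repeated for another locale that shares the tables, such as fr_CA.UTF-8. The following is only a dry-run sketch: `print_locale_build` is a hypothetical helper that prints the commands instead of running them, and it assumes that the locale's .c file has already been trimmed in the same way as fr_FR.UTF-8.c.

```shell
#!/bin/sh
# Dry run: print (do not execute) the commands that would build and install
# a small per-locale object linked against shared ctype/collate objects.
# print_locale_build is a hypothetical helper, not part of localedef.
print_locale_build() {
    locale=$1      # e.g. fr_CA.UTF-8
    ctype=$2       # e.g. CLDR.UTF-8-ctype.so.3
    collate=$3     # e.g. CLDR.fr.UTF-8-collate.so.3
    common=/usr/lib/locale/common

    # Same flags as the fr_FR.UTF-8 link step above.
    echo "cc -xO3 -K PIC -G -Xa -h $locale.so.3 -o $locale.so.3 ./$locale.c" \
         "$common/methods_unicode.so.3 ./$ctype ./$collate -R $common"
    echo "cp $locale.so.3 /usr/lib/locale/$locale/"
}

print_locale_build fr_CA.UTF-8 CLDR.UTF-8-ctype.so.3 CLDR.fr.UTF-8-collate.so.3
```

Printing the commands first makes it easy to check that every locale in a group names the same shared ctype and collate objects before anything under /usr/lib/locale is touched.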
% ldd /usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3 /usr/lib/locale/fr_FR.UTF-8/fr_FR.UTF-8.so.3 /usr/lib/locale/fr_CA.UTF-8/fr_CA.UTF-8.so.3
/usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3:
libc.so.1 => /lib/libc.so.1
/usr/lib/locale/common/methods_unicode.so.3
en_US.UTF-8-ctype.so.3 => /usr/lib/locale/common/en_US.UTF-8-ctype.so.3
en_US.UTF-8-collate.so.3 => /usr/lib/locale/common/en_US.UTF-8-collate.so.3
libm.so.2 => /lib/libm.so.2
/usr/lib/locale/fr_FR.UTF-8/fr_FR.UTF-8.so.3:
libc.so.1 => /lib/libc.so.1
/usr/lib/locale/common/methods_unicode.so.3
CLDR.UTF-8-ctype.so.3 => /usr/lib/locale/common/CLDR.UTF-8-ctype.so.3
CLDR.fr.UTF-8-collate.so.3 => /usr/lib/locale/common/CLDR.fr.UTF-8-collate.so.3
libm.so.2 => /lib/libm.so.2
/usr/lib/locale/fr_CA.UTF-8/fr_CA.UTF-8.so.3:
libc.so.1 => /lib/libc.so.1
/usr/lib/locale/common/methods_unicode.so.3
CLDR.UTF-8-ctype.so.3 => /usr/lib/locale/common/CLDR.UTF-8-ctype.so.3
CLDR.fr.UTF-8-collate.so.3 => /usr/lib/locale/common/CLDR.fr.UTF-8-collate.so.3
libm.so.2 => /lib/libm.so.2
% ls -lah /usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3 /usr/lib/locale/fr_FR.UTF-8/fr_FR.UTF-8.so.3 /usr/lib/locale/fr_CA.UTF-8/fr_CA.UTF-8.so.3
-rwxr-xr-x 1 root root 44K Oct 17 15:19 /usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3
-rwxr-xr-x 1 root root 14K Oct 19 08:59 /usr/lib/locale/fr_CA.UTF-8/fr_CA.UTF-8.so.3
-rwxr-xr-x 1 root root 14K Oct 17 18:58 /usr/lib/locale/fr_FR.UTF-8/fr_FR.UTF-8.so.3
% ls -lah /usr/lib/locale/common/CLDR.* /usr/lib/locale/common/en_US.UTF-8-c*
-rwxr-xr-x 1 root root 90K Oct 18 15:09 /usr/lib/locale/common/CLDR.UTF-8-ctype.so.3
-rwxr-xr-x 1 root root 1.6M Oct 18 15:09 /usr/lib/locale/common/CLDR.fr.UTF-8-collate.so.3
-rwxr-xr-x 1 root root 2.1M Oct 17 15:18 /usr/lib/locale/common/en_US.UTF-8-collate.so.3
-rwxr-xr-x 1 root root 243K Oct 17 15:18 /usr/lib/locale/common/en_US.UTF-8-ctype.so.3
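For the three locales tried here, the totals before and after the split can be compared by summing the size column of the ls output. This is a rough sketch, assuming ls -h style sizes with K and M suffixes in the fifth field; `sum_kb` is a hypothetical helper, and because ls rounds the sizes the result is only approximate.

```shell
#!/bin/sh
# Sum the human-readable size column (field 5) of "ls -lh" style lines
# read from standard input and print the total in KB.
# sum_kb is a hypothetical helper.
sum_kb() {
    awk '
    function kb(s,    n, u) {
        u = substr(s, length(s), 1)              # unit suffix, if any
        n = substr(s, 1, length(s) - 1) + 0      # numeric part
        if (u == "M") return n * 1024
        if (u == "K") return n
        return (s + 0) / 1024                    # plain byte count
    }
    { total += kb($5) }
    END { printf "%.0f\n", total }'
}

# Sizes before the split (first listing in this post).
before_kb=$(sum_kb <<'EOF'
-r-xr-xr-x 1 root bin 2.4M Sep 19 04:31 /usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3
-r-xr-xr-x 1 root bin 1.7M Aug 7 19:42 /usr/lib/locale/fr_CA.UTF-8/fr_CA.UTF-8.so.3
-r-xr-xr-x 1 root bin 1.7M Aug 7 19:42 /usr/lib/locale/fr_FR.UTF-8/fr_FR.UTF-8.so.3
EOF
)

# Sizes after the split: three small locale objects plus the shared tables.
after_kb=$(sum_kb <<'EOF'
-rwxr-xr-x 1 root root 44K Oct 17 15:19 /usr/lib/locale/en_US.UTF-8/en_US.UTF-8.so.3
-rwxr-xr-x 1 root root 14K Oct 19 08:59 /usr/lib/locale/fr_CA.UTF-8/fr_CA.UTF-8.so.3
-rwxr-xr-x 1 root root 14K Oct 17 18:58 /usr/lib/locale/fr_FR.UTF-8/fr_FR.UTF-8.so.3
-rwxr-xr-x 1 root root 90K Oct 18 15:09 /usr/lib/locale/common/CLDR.UTF-8-ctype.so.3
-rwxr-xr-x 1 root root 1.6M Oct 18 15:09 /usr/lib/locale/common/CLDR.fr.UTF-8-collate.so.3
-rwxr-xr-x 1 root root 2.1M Oct 17 15:18 /usr/lib/locale/common/en_US.UTF-8-collate.so.3
-rwxr-xr-x 1 root root 243K Oct 17 15:18 /usr/lib/locale/common/en_US.UTF-8-ctype.so.3
EOF
)

echo "before: ${before_kb} KB, after: ${after_kb} KB"
```

With only three locales the saving is modest (about 5939 KB before, about 4194 KB after), because only fr_FR.UTF-8 and fr_CA.UTF-8 share tables here. Each additional locale that reuses an existing collate object adds only its small per-locale object.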
This simple modification works fine with the current libc (no modification to libc is needed!) and meets our requirement. There are currently about 15 UTF-8 collation types and 2 ctype types, so I expect this change to reduce the size to about 1/6 ((15 collation types + 2 ctype types) / 100 UTF-8 locales). Now I'm thinking that localedef should get an option to produce three shared objects. I will try that later...
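The 1/6 estimate can be reproduced with a little arithmetic. The sketch below computes the ratio by object count, as in the text, and also a size-based ratio. The size-based figure rests on a strong assumption that is not measured here: every locale is taken to look like the French trial (about 1638 KB per collate object, 90 KB per ctype object, 14 KB per locale object, 1741 KB per original object). `count_ratio` and `size_ratio` are hypothetical helper names.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL   # keep "." as the decimal point in awk output

# Ratio by object count: (collation types + ctype types) / locales.
count_ratio() {
    awk -v coll="$1" -v ctype="$2" -v locales="$3" \
        'BEGIN { printf "%.2f\n", (coll + ctype) / locales }'
}

# Ratio by size in KB, assuming every locale resembles the French trial:
#   after  = coll * collate_kb + ctype * ctype_kb + locales * stub_kb
#   before = locales * full_kb
size_ratio() {
    awk -v coll="$1" -v ctype="$2" -v locales="$3" \
        -v collate_kb="$4" -v ctype_kb="$5" -v stub_kb="$6" -v full_kb="$7" \
        'BEGIN {
            after  = coll * collate_kb + ctype * ctype_kb + locales * stub_kb
            before = locales * full_kb
            printf "%.2f\n", after / before
        }'
}

count_ratio 15 2 100                        # prints 0.17
size_ratio 15 2 100 1638 90 14 1741         # prints 0.15
```

Both figures land near 1/6 (about 0.17), so the count-based shortcut is a reasonable first approximation as long as the per-locale objects stay small.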
Posted at 10:15 PM, October 20, 2008 by sayama in English | Comments [0]