Sunday, June 24, 2007

Rod Evans advised me NOT to use "-Bdirect" for a C++ library. Many thanks as well to Stephen Clamage (chairman of the ANSI C++ committee), who taught me to use the "-instlib" option of CC.

$ CC -flags
... ...
-instlib=<library>    Inhibit generation of instances already in <library>
... ...

This option makes the compiler scan the named library for template instances (and out-of-line copies of inline functions) and skip generating them in the current .o file. So, by specifying -instlib=./libbase.so when building libtest.so, the std::basic_string<unsigned int> instance used by the member "str" is not emitted into libtest.so, and the cyclic dependency no longer occurs.
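A toy model makes the effect concrete. The sketch below assumes the default lookup order described in the June 21 entry below (main program, libc.so, libtest.so, libbase.so) and uses "__null_string_ref_rep" as a shortened stand-in for the real mangled symbol. LoadedObject and resolve() are hypothetical names, not part of any linker API.

```cpp
#include <string>
#include <vector>

// Hypothetical toy model of a loaded object: its name and the symbols it defines.
struct LoadedObject {
    std::string name;
    std::vector<std::string> symbols;
};

// Default (non-direct) lookup: walk the search order and return the first
// object that defines the symbol, or "" if none does.
std::string resolve(const std::vector<LoadedObject>& searchOrder,
                    const std::string& symbol) {
    for (const LoadedObject& obj : searchOrder)
        for (const std::string& s : obj.symbols)
            if (s == symbol) return obj.name;
    return "";
}
```

With the duplicate instance present, resolve() returns libtest.so; once -instlib keeps the instance out of libtest.so, the same lookup lands in libbase.so, so libbase.so's reference no longer points back into libtest.so.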

I added this option to the Makefile.am of im-scim.so, and it worked as expected.

-im_scim_la_CXXFLAGS=@GTK2_CFLAGS@
+im_scim_la_CXXFLAGS=@GTK2_CFLAGS@ \
+                   -instlib=$(top_builddir)/src/.libs/libscim-1.0.so

Refer to the [osol-tools-linking] thread for details.

Thursday, June 21, 2007

libbase.cpp:

#include <stdio.h>
#include <stdlib.h>
#include <string>

class Bar
{
public:
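        std::basic_string<unsigned int> name;  // pulls in the basic_string<unsigned int> template instance
        char *buf;
        Bar ()  { buf = new char[10]; buf[0] = '\0'; }  // buf is allocated only when this constructor runs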
        ~Bar () { delete [] buf; }

};

static Bar bar;

// Reads bar.buf, which is still NULL if bar has not been constructed yet.
void __initialize_libbase ()
{
        printf ("%s\n", bar.buf);
}

libtest.cpp: 

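#include <stdio.h>      // printf() in Foo::test ()
#include <string>
#include "libbase.h"    // must declare void __initialize_libbase (); (header not shown)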

class Foo {
public:
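        std::basic_string<unsigned int> str;        // the same template instance, emitted again in libtest.so
        Foo () { __initialize_libbase (); }          // calls into libbase.so from libtest.so's .init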
        void test () { printf ("test ()!\n"); }
};

static Foo foo;

test.c:

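#include <stdio.h>
#include <dlfcn.h>
#include <link.h>

int main (int argc, char **argv)
{
        /* Loading libtest.so runs the .init sections of libtest.so and libbase.so. */
        void *handle = dlopen ("./libtest.so", RTLD_LAZY|RTLD_GLOBAL);
        if (handle == NULL) {
                fprintf (stderr, "dlopen failed: %s\n", dlerror ());
                return 1;
        }
        return 0;
}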

When the main program calls dlopen ("libtest.so"), the runtime linker resolves its dependencies and places libbase.so ahead of libtest.so in the initialization sequence (the reverse of the load order). The static constructors of a shared library run from its .init section.
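The ordering rule in the paragraph above (dependencies are initialized before the objects that need them) amounts to a depth-first post-order walk of the dependency graph. This is a simplified sketch, not ld.so.1's actual algorithm; DepGraph, visit() and initOrder() are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// object -> the objects it depends on (hypothetical representation)
using DepGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first post-order: an object is appended only after all of its
// dependencies, so dependencies come first in the .init order. The
// visited set also stops the walk on cyclic dependencies.
void visit(const std::string& obj, const DepGraph& deps,
           std::set<std::string>& visited, std::vector<std::string>& order) {
    if (!visited.insert(obj).second) return;  // seen already
    auto it = deps.find(obj);
    if (it != deps.end())
        for (const std::string& d : it->second) visit(d, deps, visited, order);
    order.push_back(obj);
}

// .init order for everything reachable from root (e.g. the dlopen()ed object).
std::vector<std::string> initOrder(const std::string& root, const DepGraph& deps) {
    std::set<std::string> visited;
    std::vector<std::string> order;
    visit(root, deps, visited, order);
    return order;
}
```

For libtest.so depending on libbase.so, initOrder("libtest.so", ...) yields libbase.so first and libtest.so second, matching the sequence described above.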

In our case, while initializing the static object "bar" in libbase.so, the runtime linker meets the symbol "xxx::__null_string_ref_rep<xxx>" (introduced by std::basic_string<unsigned int>) and looks it up in the loaded objects: first the main program, then libc.so, then libtest.so, where it finds a match and stops (even though libbase.so also defines this symbol). It then initializes libtest.so, which constructs the static object "foo". Unfortunately, Foo's constructor calls an external function in libbase.so, and that function accesses the static object "bar", which is not initialized yet (its "buf" has not been allocated).
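The sequence above can be replayed in a small simulation. It is a hypothetical model built only from what this post describes: lookups walk the search order, and binding to an object whose .init has not started runs that object's .init first. InitModel, runInit(), makeScenario() and the shortened symbol names are illustrative, not ld.so.1 internals.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of the initialization sequence described above.
struct InitModel {
    std::vector<std::string> searchOrder;                      // global lookup order
    std::map<std::string, std::set<std::string>> defines;      // object -> symbols it defines
    std::map<std::string, std::vector<std::string>> initRefs;  // symbols each .init binds
    std::set<std::string> started, finished;                   // .init progress per object
    std::vector<std::string> trace;                            // events, in order

    // Default lookup: the first object in the search order that defines sym wins.
    std::string lookup(const std::string& sym) const {
        for (const std::string& obj : searchOrder) {
            auto it = defines.find(obj);
            if (it != defines.end() && it->second.count(sym)) return obj;
        }
        return "";
    }

    // Run obj's .init. Binding to another object whose .init has not started
    // fires that object's .init first.
    void runInit(const std::string& obj) {
        if (started.count(obj)) return;  // already running or done
        started.insert(obj);
        trace.push_back("init " + obj);
        const std::vector<std::string> refs = initRefs[obj];
        for (const std::string& sym : refs) {
            const std::string def = lookup(sym);
            if (def.empty() || def == obj) continue;  // unresolved, or bound to itself
            if (started.count(def) && !finished.count(def))
                trace.push_back("UNSAFE: " + obj + " uses " + def + " before its .init finished");
            runInit(def);
        }
        finished.insert(obj);
        trace.push_back("done " + obj);
    }
};

// The libbase.so / libtest.so case above, with symbol names shortened.
InitModel makeScenario() {
    InitModel m;
    m.searchOrder = {"main", "libc.so", "libtest.so", "libbase.so"};
    m.defines["libtest.so"] = {"__null_string_ref_rep"};   // duplicate template instance
    m.defines["libbase.so"] = {"__null_string_ref_rep", "__initialize_libbase"};
    m.initRefs["libbase.so"] = {"__null_string_ref_rep"};  // while constructing bar
    m.initRefs["libtest.so"] = {"__initialize_libbase"};   // Foo's constructor
    return m;
}
```

Calling runInit("libbase.so") and then runInit("libtest.so") on makeScenario() records libtest.so using libbase.so before libbase.so's .init has finished, which corresponds to Foo's constructor reading bar.buf. Clearing the duplicate symbol from libtest.so (the effect of -instlib in the June 24 entry above) lets both inits run in order with no such event.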

So the program dumps core. That is the root cause of the Scim GtkIMModule making applications dump core.

However, if you add a main() to libtest.cpp and build it as an executable, this problem does not occur. Nor does it occur if you change the flag from RTLD_LAZY to RTLD_NOW. To resolve it, link the library with the -Bdirect option; see the new "Direct Binding" chapter of the "Linker and Libraries Guide".
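Direct binding changes where a reference is looked up. In this simplified, hypothetical model, a directly bound reference goes to the object recorded at link time, which for libbase.so's own template instance we assume is libbase.so itself; without direct binding, the global search order finds libtest.so first. Obj and bindSymbol() are illustrative names only.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical toy model contrasting default and direct symbol binding.
struct Obj {
    std::string name;
    std::vector<std::string> defines;  // symbols this object defines
};

bool definesSym(const Obj& o, const std::string& sym) {
    return std::find(o.defines.begin(), o.defines.end(), sym) != o.defines.end();
}

// Default binding walks the global search order; the first definition wins.
// Direct binding goes to the object recorded at link time. Simplification:
// we assume that is the referring object itself whenever it defines sym.
std::string bindSymbol(const std::vector<Obj>& searchOrder, const Obj& referrer,
                       const std::string& sym, bool direct) {
    if (direct && definesSym(referrer, sym)) return referrer.name;
    for (const Obj& o : searchOrder)
        if (definesSym(o, sym)) return o.name;
    return "";
}
```

Under this model, libbase.so's reference binds to libtest.so by default but to libbase.so with direct binding, so initializing bar no longer reaches into libtest.so.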

Thanks also to Rod Evans, who taught me to set the LD_DEBUG environment variable to display the runtime linker's debugging information. Refer to the [osol-tools-linking] thread for details.

Wednesday, June 20, 2007

When I built the scim-1.4.6 packages on Solaris Nevada build 66 with SunStudio 12 (or 11) and CBE 1.6, I found that gtk-query-immodules-2.0 dumped core after loading im-scim.so. dbx reported "no mapping at the fault address", as follows:

signal SEGV (no mapping at the fault address) in __rwstd::__rb_tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::pair<const std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,__rwstd::__select1st<std::pair<const std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<const std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > >::begin at line 741 in file "tree"

It's a real nightmare to tell what's wrong from this error message :( At first I thought it was an STL problem.

It took me a day to find the real cause, which turned out to be fairly simple: the static objects are not initialized in the proper order.

im-scim.so has a static object instance that calls a global function in libscim-1.0.so, and that function accesses another static object instance that has not been initialized yet. I wrote a small test program that just uses dlopen() to load im-scim.so. In the debugger I could see that the static constructor in im-scim.so is called before the static constructor in libscim-1.0.so, so the test program dumps core. I don't know whether this is a bug in libCrun or in ld.so.

I also asked about this in the SunStudio C++ forum and on [osol-tools-linking]. Here is a workaround patch. Please note that it leaves a heap object, __config_repository, un-finalized, so your configuration may not be saved.
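The patch itself is not reproduced here. One common way to get exactly this trade-off, and only an assumption about what the patch does, is the "construct on first use" idiom: replace a namespace-scope static object with a heap object that is created on first access and never deleted. ConfigRepository and config_repository() are hypothetical stand-ins, not SCIM's real code.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the object behind __config_repository.
class ConfigRepository {
public:
    std::map<std::string, std::string> values;
    // A real destructor might save values to disk (assumption).
};

// Construct on first use: the object is created the first time any code,
// including another library's static constructor, asks for it, so it is
// never used before construction. It is deliberately never deleted, so
// its destructor never runs.
ConfigRepository& config_repository() {
    static ConfigRepository* repo = new ConfigRepository();
    return *repo;
}
```

Because the object is built on demand, a static constructor elsewhere can safely call config_repository() regardless of .init order. The cost is that the destructor never runs, which would explain the warning above if saving happens at destruction time (again an assumption).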

P.S. With snv_62 the shared objects work fine; they only fail on snv_66.

This blog copyright 2009 by yongsun