Thursday June 21, 2007

libbase.cpp:

#include <stdio.h>
#include <stdlib.h>
#include <string>

class Bar
{
public:
        std::basic_string<unsigned int> name;
        char *buf;
        Bar ()  { buf = new char[10]; buf[9] = '\0';}
        ~Bar () { delete [] buf; }

};

static Bar bar;

void __initialize_libbase ()
{
        printf ("%s\n", bar.buf);
}

libtest.cpp: 

#include <stdio.h>
#include <string>
#include "libbase.h"

class Foo {
public:
        std::basic_string<unsigned int> str;
        Foo () { __initialize_libbase (); }
        void test () { printf ("test ()!\n"); }
};

static Foo foo;

test.c:

#include <dlfcn.h>
#include <link.h>

int main (int argc, char **argv)
{
        dlopen ("./libtest.so", RTLD_LAZY|RTLD_GLOBAL);
}

When the main program calls dlopen ("libtest.so"), the runtime linker resolves the dependencies and places libbase.so ahead of libtest.so in the initialization sequence (the order is the reverse of the load order). The static constructors of a shared library are run from its .init routine.

In our case, while the runtime linker is initializing the static object "bar", it encounters the symbol "xxx::__null_string_ref_rep<xxx>" (introduced by std::basic_string<unsigned int>) and looks it up in the loaded objects. It searches the main program first, then libc.so, and then finds a match in libtest.so, where it stops searching (libbase.so actually defines this symbol too). It then initializes libtest.so, which constructs the static object "foo". Unfortunately, the constructor of Foo calls an external function in libbase, and that function accesses the static instance "bar", which is not initialized yet ("buf" has not been allocated).

So it dumps core. That is the root cause of the SCIM GtkIMModule crashing applications.
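
The same failure mode can be reproduced in miniature inside a single source file, without dlopen. The sketch below is an illustration only and is not the SCIM code: g_buf, make_buf, Probe and report are names made up for this example. Objects in one file are dynamically initialized in definition order, so the constructor of probe runs before g_buf receives its value and sees the zero-initialized (null) pointer, much as the constructor of Foo sees a "bar" whose buf has not been allocated.

```cpp
#include <cstdio>

// All names here are hypothetical; the file only mimics the ordering problem.
extern char *g_buf;              // defined (and dynamically initialized) below

static char *make_buf ()
{
        char *p = new char[10];
        p[0] = '\0';
        return p;
}

struct Probe {
        bool saw_null;
        // Runs before g_buf's initializer, so g_buf is still zero-initialized.
        Probe () : saw_null (g_buf == 0) {}
};

Probe probe;                     // initialized first, because defined first
char *g_buf = make_buf ();       // initialized second

// Reports what the constructor observed and what is visible afterwards.
void report ()
{
        printf ("in Probe(): g_buf was %s\n", probe.saw_null ? "null" : "set");
        printf ("afterwards: g_buf is %s\n", g_buf ? "set" : "null");
}
```

Calling report () from main () shows the two states. In the shared-library case the order is decided by the runtime linker rather than by the position in a file, which is what makes the problem hard to spot.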

However, if you add a main() to libtest.cpp and compile it as an executable program, the problem does not occur. Changing the flag from RTLD_LAZY to RTLD_NOW also avoids it. To fix the problem, add the -Bdirect option when you link the library; refer to the new "Direct Binding" chapter of the "Linker and Libraries Guide".

I also need to thank Rod Evans, who taught me to set the LD_DEBUG environment variable to show the debugging information. See the [osol-tools-linking] thread for details.

Wednesday June 20, 2007

When I built the scim-1.4.6 packages on Solaris Nevada build 66 with SunStudio 12 (or 11) and CBE 1.6, I found that gtk-query-immodules-2.0 dumped core after loading im-scim.so. dbx reports "no mapping at the fault address", as follows:

signal SEGV (no mapping at the fault address) in __rwstd::__rb_tree<std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::pair<const std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,__rwstd::__select1st<std::pair<const std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::less<std::basic_string<char,std::char_traits<char>,std::allocator<char> > >,std::allocator<std::pair<const std::basic_string<char,std::char_traits<char>,std::allocator<char> >,std::basic_string<char,std::char_traits<char>,std::allocator<char> > > > >::begin at line 741 in file "tree"

It is a nightmare to work out what is wrong from this error message :( At first I thought it was an STL problem.

It took me a day to find the real reason, which is fairly simple: the static object instances are not initialized in the proper order.

im-scim.so contains a static object instance whose constructor calls a global function in libscim-1.0.so, and that global function accesses another static object instance that is not initialized yet. I wrote a small test program that just uses dlopen() to load im-scim.so. In the debugger I could see that the static constructor in im-scim.so is called before the static constructor in libscim-1.0.so, so the test program dumps core. I don't know whether this is a bug in libCrun or in ld.so.

I also asked this question in the SunStudio C++ forum and on [osol-tools-linking]. Here is a workaround patch; please note that it leaves an un-finalized heap object, __config_repository, so your configuration may not be saved.
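
The patch itself is not reproduced here. A common way to arrive at the trade-off just described (an object that is always constructed before use but never finalized) is the construct-on-first-use idiom, sketched below. This is an assumption about the general technique, not the contents of the patch; ConfigRepository, config_repository () and Client are hypothetical names.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for the library's repository type.
typedef std::map<std::string, std::string> ConfigRepository;

// Construct-on-first-use: the object is created the first time it is needed,
// so callers never see it uninitialized, whatever order the static
// constructors of the loaded libraries run in.
ConfigRepository &config_repository ()
{
        // Deliberately never deleted: no destructor runs at exit, so any
        // work that relies on finalization (such as saving) is skipped.
        static ConfigRepository *repo = new ConfigRepository ();
        return *repo;
}

// A static object elsewhere can now use the repository from its constructor.
struct Client {
        Client () { config_repository ()["loaded"] = "yes"; }
};

static Client client;
```

The cost is the one noted above: because the heap object is never destroyed, anything its destructor would have done does not happen.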

P.S. With snv_62 the shared objects work fine; they fail on snv_66.

Tuesday June 05, 2007

SunStudio 12 has been released. It resolves the 2nd problem listed in my blog post "2 Tips of C++ Programming with const", but VLAs in C++ are still not supported :(

Friday July 28, 2006

1. Implicit type conversion and copy constructor

If you want to define your own copy constructor, define it like this:

class Foo {
public:
  Foo (const Foo& obj) {}
};

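Pay attention to the const in the parameter declaration. If your copy constructor omits the const (Sun's C++ compiler does NOT complain about the declaration itself), an implicit type conversion through a converting constructor will fail to compile, even though the copy constructor is not actually called during the conversion. The conversion produces a temporary Foo, and initializing the result from that temporary requires a copy constructor that can bind to it; a non-const Foo& cannot.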

class Bar {};

class Foo {
public:
  Foo (Foo& obj) {}
  Foo (Bar *obj) {}
};

Foo test () { return new Bar(); }

On Linux with g++, you get the following error messages:
  error: no matching function for call to ‘Foo::Foo(Foo)’
  ... ...
  error:   initializing temporary from result of ‘Foo::Foo(Bar*)’

On Solaris with the Sun Studio C++ compiler, you get:
  Error: Cannot use Bar* to initialize Foo without "Foo::Foo(const Foo&)"
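
For comparison, here is a minimal sketch of the version that compiles once the copy constructor takes a const reference. Bar is an empty placeholder class, and the member "from", the object g_bar and the function make_foo () are assumptions added for illustration (the snippet above allocates with new Bar(); a static object is used here only to avoid a leak).

```cpp
class Bar {};

class Foo {
public:
        const Bar *from;                               // remembers the source
        Foo (const Foo &obj) : from (obj.from) {}      // can bind to temporaries
        Foo (Bar *obj) : from (obj) {}                 // converting constructor
};

static Bar g_bar;

// Bar* is converted to a temporary Foo via Foo(Bar*), and the return value
// is initialized from that temporary; the const Foo& parameter allows this.
Foo make_foo () { return &g_bar; }
```

Under C++17's guaranteed copy elision the non-const version should also compile, since no copy is required there; that is worth verifying against the compiler in use.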

2. The reference of pointer and "this"

Look at this piece of code:

class Foo;

void test (Foo* &obj) {}
//void test (Foo* const &obj) {}

class Foo {
public:
  void bar () { test (this); }
};

On Linux with g++, you get the following error messages:
  invalid initialization of non-const reference of type ‘Foo*&’ from a temporary of type ‘Foo* const’
  in passing argument 1 of ‘void test(Foo*&)’

On Solaris with the Sun Studio C++ compiler, you get:
  Error: Formal argument obj of type Foo*& in call to test(Foo*&) requires an lvalue.

As the Sun Studio message says, the call requires an lvalue: 'this' is not an lvalue, so it cannot bind to the non-const reference Foo*&. If we uncomment the 2nd test () (the commented-out line), then in bar () we must call test ((Foo *const) this); or else comment out the 1st test (), to get the code to compile.

In a non-const member function, 'this' behaves like a Foo *const (the pointer itself cannot be modified); in a const member function, it behaves like a const Foo *const.
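
Two ways around the error, as a sketch: take the pointer by const reference, or copy 'this' into a named local variable, which is an lvalue. The function names test_const_ref and test_ref and the members bar and baz are made up for this example.

```cpp
class Foo;

// Accepts rvalues such as 'this', because the reference is to a const pointer.
const Foo *test_const_ref (Foo *const &obj) { return obj; }

// Needs an lvalue; it may redirect the caller's pointer.
void test_ref (Foo *&obj) { obj = 0; }

class Foo {
public:
        // 'this' binds directly to Foo *const &.
        bool bar () { return test_const_ref (this) == this; }

        // 'self' is a named copy of 'this', so it can bind to Foo *&.
        // Only the copy is modified; 'this' itself is untouched.
        bool baz ()
        {
                Foo *self = this;
                test_ref (self);
                return self == 0;
        }
};
```

The second form is only useful when the callee really has to take Foo *&; any change it makes to the pointer affects the local copy, not the object's own 'this'.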

This blog copyright 2009 by yongsun