« November 2009
SunMonTueWedThuFriSat
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
     
       
Today
XML

Blog::Navigation

Blog::Editing

Bookmarks::Blogroll

Bookmarks::News

Blog::Referers

Today's Page Hits: 6

Site notes

This page validates as XHTML 1.0, and will look much better in a browser that supports web standards, but it is accessible to any browser or Internet device. It was created using techniques detailed at glish.com/css/.

Powered by Roller Weblogger.
All | About | General | Java | Music
20050614 Tuesday June 14, 2005
Runtime Dynamic Linker Performance Runtime Dynamic Linker performance Last summer, before I started working at sun full time, I worked on an experimental project to improve the performance of the Solaris run time dynamic linker. The idea behind dynamic linking is that the vast amount of code in a system can be shared by multiple programs. For example, xterm and gaim both use functions like strlen() defined in the C standard library so it is much more efficient for these programs to use a single copy of libc. Dynamic linking also allows a library implementers to change the implementation details of a library as long as the interface remains the same. For example, we can implement a faster version of strlen() in libc and programs will not need to be recompiled to take advantage of it.

Dynamic linking requires that the names of symbols be resolved to addresses at runtime as opposed to compile time. This is accomplished by a special library /lib/ld.so.1 which we call RTLD, the Run Time Link eDitor. This is equivalent to linux's /lib/ld-linux.so.1 as Linux used Solaris as a model for its dynamic library system. RTLD is loaded by the kernel when it exec()s an ELF executable.

Dynamic linking is very efficient for medium sized programs. In almost all cases the performance win for only keeping a single copy of code in memory outweighs the costs of having to resolve symbols at runtime. On medium sized programs with thousands of symbols this cost is insignificant. On small programs (like /bin/date) this cost is significant because there is nothing else to do.

Megaprogams like Staroffice and Mozilla have hundreds of libraries and link against about 200,000 symbols. These programs spend several seconds of startup time in RTLD. As you will see, much of this time is spent doing work that is usually unnecessary.

RTLD maintains a linked list of all the libraries in use. When RTLD needs to find a symbol it walks the link map checking each library for a version of that symbol. Depending on the kind of symbol, it might start at the head of the list, or at the library where the relocation will take place. Because RTLD will take the first symbol it finds that matches, it is possible for libraries to override other symbols. The exact semantics of where the list link traversal starts and how it is ordered ensure that the RightThing happens given the variety of bindings a symbol may have.

For example the link map created during execution of xhelloworld.c might look like: (this is highly simplified)
a.out --> libX11 --> libsocket --> libnsl --> libm --> libc.

When a.out references XCreateSimpleWindow, RTLD searches for that symbol in a.out but doesn't find it, then it goes and searches for it in libX11 and finds it. But when a.out references something like printf(), RTLD has to search a.out, then libX11, then libsocket, and so on until it finally finds it in libc().

This lets you override things when you need to, but it is not optimized for the common case. Interposition is rare and we can expect that most of the time a symbol will appear exactly once in a link map.

We tried to develop an algorithm with the same semantics that sacrificed performance in the case where there is interposition to gain performance in the common case. We already have a technique called direct binding where ld caches that information when the binary is created. Direct binding forces you to take an all or nothing approach when running the executable. It also requires that -Bdirect be specified when the program is linked. If you need to interpose on even one symbol at runtime you must tell RTLD to ignore every direct binding hint. You might to this to temporarily operate a buggy application under a special version of malloc().

What we decided to try -- and the word try needs emphasis because it failed and was not integrated into solaris -- was to make a system wide table of symbols that was dynamically and lock-lessly updated. RTLD could consult this table for a list of every library exporting a particular symbol and then quickly find the right library on its link map. If a RTLD discovered that it linked against a library that was not in the cache, then it would still have to check that library every time it had the opportunity to interpose on a symbol. It would also notify LSD (the linker symbol-caching daemon) of the "cache miss" so that LSD could add the library to the table.

We took advantage of Total Store Ordering to update the table atomically without locks. The table was a fixed size and included two hash tables and a "heap" for storing the symbol lists. One hash table stored dev/inode pairs for every library known to LSD. RTLD used this table to know which libraries were present in LSD's symbol table and which would need to be checked manually. The second table was a hash table of symbol names. Hash collisions were resolved with a linked list. Every symbol had a linked list of objects (defined by dev/inode pair) that exported that symbol as well as the index into that object's symbol table.

When a process started, RTLD mmap()ed the database and as it loaded libraries, it marked those libraries that were present in the cache. When it needed to lookup a symbol, it used LSD's database to find the libraries contained the symbol. Then it found which of those libraries was first on the visible portion of the link map and checked any libraries in between the current head and the library it found that were not in the cache. This logic guaranteed that RTLD behaved exactly as it should unless the cache was toxic.

This increased the speed of lookups by about 25 percent. When firefox-bin referenced printf() in the old system, RTLD would have to search hundreds of objects before it finally found printf(). In the new system it would search firefox-bin, not find it, and then instantly fall back to libc.

It also added more complication and failure modes to the system. We decided drop the project because the added complexity was not outweighed by the performance benefits. RTLD went 25 percent faster, but most systems spend less than one percent of their time in RTLD. The overhead of using LSD actually penalized small programs (like running /bin/date in a loop) that we originally wished to optimize.

OpenSolaris
Solaris ~

Copyright (C) 2003, David Stafford's Weblog