A Peek into Solaris Sustaining engineer's life...

Thoughts of an engineer(Raja Gopal Andra) working on two worlds

All | General | Music | Solaris | Sports

20050614 Tuesday June 14, 2005

Application(popper) mis-behaving when using multiple LDAP servers Application(popper) mis-behaving when using multiple LDAP servers  
 

One of the interesting bugs (4809808) I handled was related to
an application mis-behaving when multiple LDAP servers were used
in a Native LDAP client. NS_LDAP_SERVERS parameter in ldap client
profile indicates the list of LDAP servers used. Services has done
pretty good analysis and provided us with following information:

---x---
Customer is using the mailing application qpopper 4.0.4 (please see
http://www.eudora.com/qpopper) on a Solaris 9 system that has been
configured as a Native LDAP client.

This program performs some getpwnam() or getspnam() calls that fail,
when there has been defined more than one Directory Server (either by
IP-addresses or hostnames) in the parameter NS_LDAP_SERVERS
of /var/ldap/ldap_client_file. It works fine, when only one Directory Server
is defined in this parameter.

In case of failure, the following messages become logged to /var/adm/messages:

Jan 24 12:10:00 hostname /src/net/qpopper/qpopper4.0.4/popper/popper[3011]: [ID
293258 local0.error] libsldap: Status: 0  Mesg: Invalid server (10.10.10.10,) in
 NS_LDAP_SERVERS
Jan 24 12:10:00 hostname last message repeated 1 time
Jan 24 12:10:00 hostname /src/net/qpopper/qpopper4.0.4/popper/popper[3011]: [ID
293258 local0.error] libsldap: Status: 2  Mesg: Unable to load configuration '/v
ar/ldap/ldap_client_file' ('Invalid server (10.10.10.10,) in NS_LDAP_SERVERS').

As the message "Invalid server (10.10.10.10,)" shows, there is a comma at
the end of the string that contains the IP-address of the 1rst Directory Server.

Also in a truss can be seen when the function __s_val_serverList() is calling
__s_api_isipv4(), the function __s_api_isipv4() is examining a string in the
length of the IP-address plus the comma. This means the wrong string (incl.
the trailing comma) has already been handed over to __s_val_serverList()
when this function became called.

/etc/nsswitch.conf contains "files ldap" only for passwd and group. All the other
databases are defined for "files" only:
passwd:     files ldap
group:      files ldap
hosts:      files dns
ipnodes:    files
----x---
 

I downloaded the qpopper application and was able to reproduce the problem
in-house. This made my life easier to troubleshoot the issue. Once I was able
to reproduce the problem, I built debug libsldap.so.1 binary which pointed out
to the parsing code in libsldap which builds list of servers. Scanning the
code, i didn't find anything wrong with that part of code. Then I used the dbx
debugger to further debug the issue. After a little bit of intro from the
dbx expert, Binu Jose philip, and excellent help feature that dbx provides,
I was able to use dbx very easily. I used the step tracing feature and checked
the threadlist when the wrong server address was passed from __s_val_serverList()
to the function it calls. I started popper in dbx and stopped at __s_api_isipv4().
The stack looks like:

(/ws/on81-tools/SUNWspro/SC6.1/bin/../WS6U1/bin/sparcv9/dbx) where
  [1] __s_api_isipv4(addr = 0x6d5a8 "129.158.233.109,"), line 1536 in
"ns_common.c"
=>[2] __s_val_serverList(i = NS_LDAP_FILE_VERSION_P, def = 0xff029b00, param =
0xffbfbf54, errbuf = 0xffbfaf90 ""), line 3496 in "ns_config.c"
  [3] __ns_ldap_setParamValue(ptr = 0x70560, type = NS_LDAP_SERVERS_P, data =
0xffbfc848, error = 0xffbfdcc8), line 2444 in "ns_config.c"
  [4] set_default_value(configptr = 0x70560, name = 0xffbfc838
"NS_LDAP_SERVERS", value = 0xffbfc848 "129.158.233.109, 129.158.233.128", error
= 0xffbfdcc8), line 1288 in "ns_config.c"
  [5] SetDoorInfo(buffer = 0x6de48 "NS_LDAP_FILE_VERSION=2.0", errorp =
0xffbfdcc8), line 3234 in "ns_config.c"
  [6] LoadCacheConfiguration(error = 0xffbfdcc8), line 3272 in "ns_config.c"
  [7] __s_api_loadrefresh_config(), line 1332 in "ns_config.c"
  [8] init_search_state_machine(), line 735 in "ns_reads.c"
  [9] __ns_ldap_list(service = 0xff066b7c "shadow", filter = 0xffbfdfc0
"(&(objectClass=shadowAccount)(uid=tempuser1))", init_filter_cb = 0xff053c10 =
&_merge_SSD_filter(), attribute = 0xff06710c, auth = (nil), flags = 0, rResult =
0x6dd5c, errorp = 0xffbfde4c, callback = (nil), userdata = 0xffbfdec0), line
2206 in "ns_reads.c"
  [10] _nss_ldap_lookup(be = 0x6dd40, argp = 0xffbfe1a0, database = 0xff066b7c
"shadow", searchfilter = 0xffbfdfc0
"(&(objectClass=shadowAccount)(uid=tempuser1))", domain = (nil), init_filter_cb
= 0xff053c10 = &_merge_SSD_filter(), userdata = 0xffbfdec0), line 108 in
"ldap_common.c"
  [11] getbynam(be = 0x6dd40, a = 0xffbfe1a0), line 240 in "getspent.c"
  [12] _nss_search(0x2, 0xff051f38, 0x6dce8, 0xffbfe1a0, 0x0, 0xff1bcf14), at
0xff149298
  [13] getspnam_r(0xffbfe7ec, 0x6d8b4, 0x6d8d8, 0x400, 0x0, 0x0), at 0xff196a90
  [14] auth_user(p = 0xffbfe7c4, pw = 0xffbff92c), line 627 in "pop_pass.c"
  [15] pop_pass(p = 0xffbfe7c4), line 1378 in "pop_pass.c"
  [16] qpopper(argc = 2, argv = 0xffbffc3c), line 334 in "popper.c"
  [17] motherforker(newsockfd = -1, sockfd = 5), line 945 in "main.c"
  [18] main(argc = 2, argv = 0xffbffc3c), line 619 in "main.c"
(/ws/on81-tools/SUNWspro/SC6.1/bin/../WS6U1/bin/sparcv9/dbx)

Please note that this is stack from a debug version of libsldap.so.1, and is
the reason for us to see the symbols getting resolved automatically. Once
I was here, i was able to suspect that something wrong is going on while
copying over NS_LDAP_SERVERS which made me check if popper application
has it's own version of strlcpy(). Running nm(1) on qpopper binary showed it's
version of strlcpy() which made me easy to write up the following evaluation

---x---

The application popper is defining strlcpy() which is behaving
differently(wrongly) from our strlcpy() which is the cause of
problem.

The strlcpy() from application(popper) is as shown below:

size_t
strlcpy ( char *strdest, const char *strsource, size_t bufsize )
{
    register char       *dst = strdest;
    register const char *src = strsource;
    register size_t      len = 0;

    /*
     * Copy characters as long as they fit
     */
    for ( len = bufsize; len > 0; len-- )
    {
        *dst++ = *src;
        if ( *src++ == '\0' )
            break;
    }

    /*
     * Add null if we stopped early
     */
    if ( len == 0 && bufsize != 0 )
    {
        *dst = '\0';
    }

    /*
     * We always return length of what we would have
     * copied if the buffer were infinite.
     */
    while ( *src != '\0' )
        src++; /* make sure we are at end of source string */
    return ( src - strsource );
}
 

Example showing the difference in behaviour:

strlcpy( dst, src, 16)

assume src ="129.158.233.109, 129.158.233.128"

with the above strlcpy() code, "129.158.233.109," is stored in
'dst' where as with our standard code, "129.158.233.109" is stored
in 'dst'. This is why we are seeing the problem only with
popper application.

Applications shouldn't re-define standard functions like strlcpy()
and in doing so we might see issues like this which would also
be hard to track down.

----x----

Once the strlcpy() function in the qpopper was renamed, things started
working fine. So, this bug taught me about how not to re-write standard
functions and also taught nice tracing features available in dbx. I hope
this is an interesting bug to share with OpenSolaris release.
 
 
Technorati Tag:
Technorati Tag: (2005-06-14 08:18:35.0) Permalink Comments [0]

Calendar

« June 2005
SunMonTueWedThuFriSat
   
1
2
3
4
5
6
7
8
9
10
11
12
13
15
16
17
18
19
21
22
23
24
25
26
27
28
29
30
  
       
Today

RSS Feeds

XML
All
/General
/Music
/Solaris
/Sports

Search

Links




Navigation



Referers

Today's Page Hits: 8