
Tuesday June 14, 2005
Application(popper) mis-behaving when using multiple LDAP servers
Application(popper) mis-behaving when using multiple LDAP servers
One of the interesting bugs (4809808)
I handled was related to
an application mis-behaving when multiple
LDAP servers were used
in a Native LDAP client. NS_LDAP_SERVERS
parameter in ldap client
profile indicates the list of LDAP
servers used. Services has done
pretty good analysis and provided
us with following information:
---x---
Customer is using the mailing application
qpopper 4.0.4 (please see
http://www.eudora.com/qpopper)
on a Solaris 9 system that has been
configured as a Native LDAP client.
This program performs some getpwnam()
or getspnam() calls that fail,
when there has been defined more than
one Directory Server (either by
IP-addresses or hostnames) in the
parameter NS_LDAP_SERVERS
of /var/ldap/ldap_client_file. It
works fine, when only one Directory Server
is defined in this parameter.
In case of failure, the following messages
become logged to /var/adm/messages:
Jan 24 12:10:00 hostname /src/net/qpopper/qpopper4.0.4/popper/popper[3011]:
[ID
293258 local0.error] libsldap: Status:
0 Mesg: Invalid server (10.10.10.10,) in
NS_LDAP_SERVERS
Jan 24 12:10:00 hostname last message
repeated 1 time
Jan 24 12:10:00 hostname /src/net/qpopper/qpopper4.0.4/popper/popper[3011]:
[ID
293258 local0.error] libsldap: Status:
2 Mesg: Unable to load configuration '/v
ar/ldap/ldap_client_file' ('Invalid
server (10.10.10.10,) in NS_LDAP_SERVERS').
As the message "Invalid server (10.10.10.10,)"
shows, there is a comma at
the end of the string that contains
the IP-address of the 1rst Directory Server.
Also in a truss can be seen when the
function __s_val_serverList() is calling
__s_api_isipv4(), the function __s_api_isipv4()
is examining a string in the
length of the IP-address plus the
comma. This means the wrong string (incl.
the trailing comma) has already been
handed over to __s_val_serverList()
when this function became called.
/etc/nsswitch.conf contains "files
ldap" only for passwd and group. All the other
databases are defined for "files"
only:
passwd: files
ldap
group:
files ldap
hosts:
files dns
ipnodes: files
----x---
I downloaded the qpopper application
and was able to reproduce the problem
in-house. This made my life easier
to troubleshoot the issue. Once I was able
to reproduce the problem, I built
debug libsldap.so.1 binary which pointed out
to the parsing code in libsldap which
builds list of servers. Scanning the
code, i didn't find anything wrong
with that part of code. Then I used the dbx
debugger to further debug the issue.
After a little bit of intro from the
dbx expert, Binu
Jose philip, and excellent help feature that dbx provides,
I was able to use dbx very easily.
I used the step tracing feature and checked
the threadlist when the wrong server
address was passed from __s_val_serverList()
to the function it calls. I started
popper in dbx and stopped at __s_api_isipv4().
The stack looks like:
(/ws/on81-tools/SUNWspro/SC6.1/bin/../WS6U1/bin/sparcv9/dbx)
where
[1] __s_api_isipv4(addr = 0x6d5a8
"129.158.233.109,"), line 1536 in
"ns_common.c"
=>[2] __s_val_serverList(i = NS_LDAP_FILE_VERSION_P,
def = 0xff029b00, param =
0xffbfbf54, errbuf = 0xffbfaf90 ""),
line 3496 in "ns_config.c"
[3] __ns_ldap_setParamValue(ptr
= 0x70560, type = NS_LDAP_SERVERS_P, data =
0xffbfc848, error = 0xffbfdcc8), line
2444 in "ns_config.c"
[4] set_default_value(configptr
= 0x70560, name = 0xffbfc838
"NS_LDAP_SERVERS", value = 0xffbfc848
"129.158.233.109, 129.158.233.128", error
= 0xffbfdcc8), line 1288 in "ns_config.c"
[5] SetDoorInfo(buffer = 0x6de48
"NS_LDAP_FILE_VERSION=2.0", errorp =
0xffbfdcc8), line 3234 in "ns_config.c"
[6] LoadCacheConfiguration(error
= 0xffbfdcc8), line 3272 in "ns_config.c"
[7] __s_api_loadrefresh_config(),
line 1332 in "ns_config.c"
[8] init_search_state_machine(),
line 735 in "ns_reads.c"
[9] __ns_ldap_list(service
= 0xff066b7c "shadow", filter = 0xffbfdfc0
"(&(objectClass=shadowAccount)(uid=tempuser1))",
init_filter_cb = 0xff053c10 =
&_merge_SSD_filter(), attribute
= 0xff06710c, auth = (nil), flags = 0, rResult =
0x6dd5c, errorp = 0xffbfde4c, callback
= (nil), userdata = 0xffbfdec0), line
2206 in "ns_reads.c"
[10] _nss_ldap_lookup(be =
0x6dd40, argp = 0xffbfe1a0, database = 0xff066b7c
"shadow", searchfilter = 0xffbfdfc0
"(&(objectClass=shadowAccount)(uid=tempuser1))",
domain = (nil), init_filter_cb
= 0xff053c10 = &_merge_SSD_filter(),
userdata = 0xffbfdec0), line 108 in
"ldap_common.c"
[11] getbynam(be = 0x6dd40,
a = 0xffbfe1a0), line 240 in "getspent.c"
[12] _nss_search(0x2, 0xff051f38,
0x6dce8, 0xffbfe1a0, 0x0, 0xff1bcf14), at
0xff149298
[13] getspnam_r(0xffbfe7ec,
0x6d8b4, 0x6d8d8, 0x400, 0x0, 0x0), at 0xff196a90
[14] auth_user(p = 0xffbfe7c4,
pw = 0xffbff92c), line 627 in "pop_pass.c"
[15] pop_pass(p = 0xffbfe7c4),
line 1378 in "pop_pass.c"
[16] qpopper(argc = 2, argv
= 0xffbffc3c), line 334 in "popper.c"
[17] motherforker(newsockfd
= -1, sockfd = 5), line 945 in "main.c"
[18] main(argc = 2, argv =
0xffbffc3c), line 619 in "main.c"
(/ws/on81-tools/SUNWspro/SC6.1/bin/../WS6U1/bin/sparcv9/dbx)
Please note that this is stack from
a debug version of libsldap.so.1, and is
the reason for us to see the symbols
getting resolved automatically. Once
I was here, i was able to suspect
that something wrong is going on while
copying over NS_LDAP_SERVERS which
made me check if popper application
has it's own version of strlcpy().
Running nm(1) on qpopper binary showed it's
version of strlcpy() which made me
easy to write up the following evaluation
---x---
The application popper is defining
strlcpy() which is behaving
differently(wrongly) from our strlcpy()
which is the cause of
problem.
The strlcpy() from application(popper)
is as shown below:
size_t
strlcpy ( char *strdest, const
char *strsource, size_t bufsize )
{
register char
*dst = strdest;
register const
char *src = strsource;
register size_t
len = 0;
/*
* Copy
characters as long as they fit
*/
for ( len =
bufsize; len > 0; len-- )
{
*dst++ = *src;
if ( *src++ == '\0' )
break;
}
/*
* Add
null if we stopped early
*/
if ( len ==
0 && bufsize != 0 )
{
*dst = '\0';
}
/*
* We always
return length of what we would have
* copied
if the buffer were infinite.
*/
while ( *src
!= '\0' )
src++; /* make sure we are at end of source string */
return ( src
- strsource );
}
Example showing the difference in
behaviour:
strlcpy( dst, src, 16)
assume src ="129.158.233.109, 129.158.233.128"
with the above strlcpy() code, "129.158.233.109,"
is stored in
'dst' where as with our standard
code, "129.158.233.109" is stored
in 'dst'. This is why we are seeing
the problem only with
popper application.
Applications shouldn't re-define
standard functions like strlcpy()
and in doing so we might see issues
like this which would also
be hard to track down.
----x----
Once the strlcpy() function in the
qpopper was renamed, things started
working fine. So, this bug taught
me about how not to re-write standard
functions and also taught nice tracing
features available in dbx. I hope
this is an interesting bug to share
with OpenSolaris release.
Technorati Tag: OpenSolaris
Technorati Tag: Solaris
(2005-06-14 08:18:35.0)
Permalink