Wednesday Jan 10, 2007
After a few days working this out, I thought I'd share my thoughts on why the locking information returned by pfiles(1) should be taken with a pinch of salt.
On Solaris, if we run pfiles(1) on a process that has open files it checks each file to see if it has any locks. The way it does this is to use fcntl(2) and F_GETLK. This is more complex than it first appears as it runs the system call from the context of the process being inspected. Suffice to say, it produces output like this:
# pfiles 102226
102226: /home/foo/lock /home/foo/testfile
Current rlimit: 256 file descriptors
0: S_IFCHR mode:0620 dev:301,0 ino:12582918 uid:54321 gid:7 rdev:24,1
O_RDWR|O_NOCTTY|O_LARGEFILE
/devices/pseudo/pts@0:1
1: S_IFCHR mode:0620 dev:301,0 ino:12582918 uid:54321 gid:7 rdev:24,1
O_RDWR|O_NOCTTY|O_LARGEFILE
/devices/pseudo/pts@0:1
2: S_IFCHR mode:0620 dev:301,0 ino:12582918 uid:54321 gid:7 rdev:24,1
O_RDWR|O_NOCTTY|O_LARGEFILE
/devices/pseudo/pts@0:1
3: S_IFREG mode:0666 dev:314,5 ino:2557867 uid:54321 gid:10 size:343
O_RDWR
advisory write lock set by system 0x1 process 102225
/home/foo/testfile
#
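For illustration, the F_GETLK query behind that "advisory write lock" line might look roughly like the sketch below when a process issues it directly. This is not the pfiles(1) source: describe_lock() is a made-up name, the lock is always labelled "advisory", and l_sysid is assumed to exist only on Solaris (hence the #if).

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Ask whether a write lock over the whole file would be blocked and, if so,
 * describe the blocking lock in a pfiles-like format.
 * Returns -1 on fcntl() error, 0 if nothing would block, 1 if buf was filled.
 */
int
describe_lock(int fd, char *buf, size_t len)
{
	struct flock fl;
	int n;

	memset(&fl, 0, sizeof (fl));
	fl.l_type = F_WRLCK;	/* a write lock conflicts with any other lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;		/* 0 means "to end of file" */

	buf[0] = '\0';
	if (fcntl(fd, F_GETLK, &fl) == -1)
		return (-1);
	if (fl.l_type == F_UNLCK)
		return (0);	/* no conflicting lock was found */

	/* Assumption: always say "advisory"; mandatory locks are not detected. */
	n = snprintf(buf, len, "advisory %s lock set by",
	    fl.l_type == F_RDLCK ? "read" : "write");
#if defined(__sun)
	/* l_sysid is a Solaris extension; a zero value is simply not printed. */
	if (fl.l_sysid != 0 && n > 0 && (size_t)n < len)
		n += snprintf(buf + n, len - n, " system 0x%X",
		    (unsigned)fl.l_sysid);
#endif
	if (n > 0 && (size_t)n < len)
		(void) snprintf(buf + n, len - n, " process %ld",
		    (long)fl.l_pid);
	return (1);
}
```

Called on a file nobody else has locked, this returns 0 and leaves buf empty. Whatever lands in l_pid and l_sysid otherwise is decided by the file system underneath, which is the subject of the rest of this entry.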
All fine and dandy, but what is meant by "system" and is that process ID actually useful?
To understand this we need to look much closer at how these values are set and in particular whether we're using NFSv3 or NFSv4. I'm ignoring NFSv2 but that will behave similarly to NFSv3.
There are two sides to this: who sets the lock and who checks it with pfiles(1).
For NFSv3 we use lockd/statd and the locking code passes the PID of the locking process to the server. There is no 'system id' passed to the server. The NFSv3 server passes the PID to the underlying file system in the flock structure of the VOP_FRLOCK() call.
For NFSv4 it is considerably more complicated but essentially it passes what should be an opaque 'lock_owner' structure to the server. This can only be understood by the client that sent it. The NFSv4 server creates a unique identifier which it assigns to the PID member of the flock structure. As with NFSv3 this flock structure is passed to the underlying file system.
The pfiles(1) output is based on an F_GETLK call for the open file. This returns a structure with l_pid and l_sysid. The values of these are as follows:
For NFSv3 the locking code returns the PID passed to the underlying file system in the original VOP_FRLOCK() call. For locks set by NFSv3 this is the true PID of the locking process. The sysid may be set to zero, in which case it is ignored by pfiles(1). If the lock is held by a process on this machine the sysid will be set to the system id of the NFS server plus 0x4000. For locks set by NFSv4 the PID is not a real PID but the unique identifier created on the server at lock time.
For NFSv4, if we're the client that set the lock we can return sensible values for the PID and system ID - though the latter is *always* the client itself. If we're not the client we fabricate the PID and sysid as follows:
- sysid: we take the sysid of the server and add 1
- pid: we add up all the bytes in the clientid and owner
	/*
	 * Construct a new sysid which should be different from
	 * sysids of other systems.
	 */

	flk->l_sysid++;
	flk->l_pid = lo_to_pid(&lockt_denied->owner);
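The "add up all the bytes" step can be pictured with a small userland sketch. The struct and function names here are hypothetical stand-ins, not the kernel's lock_owner4 or lo_to_pid(), and the field layout is an assumption made only for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for an NFSv4 lock owner: a client id plus opaque bytes. */
struct lock_owner_sketch {
	uint64_t		clientid;
	const unsigned char	*owner_val;
	size_t			owner_len;
};

/* Add up every byte of the clientid and of the opaque owner. */
pid_t
owner_to_pid_sketch(const struct lock_owner_sketch *lo)
{
	const unsigned char *cp = (const unsigned char *)&lo->clientid;
	pid_t pid = 0;
	size_t i;

	for (i = 0; i < sizeof (lo->clientid); i++)
		pid += cp[i];
	for (i = 0; i < lo->owner_len; i++)
		pid += lo->owner_val[i];
	return (pid);
}
```

A byte sum like this is small and far from unique: owners { 1, 2, 3 } and { 3, 2, 1 } under the same clientid give the same value. It also has no connection to any real process, so a fabricated "process" number should not be looked up with ps(1).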
So, with that in mind we have the following result:
Lock set over NFSv3, pfiles(1) run on an NFSv3 client:
Process ID is correct, no system identification. eg:
advisory write lock set by process 102250
Or perhaps:
advisory write lock set by system 0x4001 process 102265

Lock set over NFSv4, pfiles(1) run on an NFSv3 client:
Process ID is unique but meaningless to the client. No system identification. eg:
advisory write lock set by process 776458

Lock set over NFSv3, pfiles(1) run on an NFSv4 client:
The lock owner will not match on the client, so the result is fabricated. eg:
advisory write lock set by system 0x2 process 512

Lock set over NFSv4, pfiles(1) run on an NFSv4 client:
If the lock owner matches, we return the true PID of the process running on this system. The sysid is that of the server as understood by the client. eg:
advisory write lock set by system 0x1 process 102313
If the lock owner doesn't match the client the result is fabricated. eg:
advisory write lock set by system 0x2 process 1580
While checking this out I found and logged two curious bugs in this area.
If pfiles(1) is run on an x64 box the 64-bit version is run. If the target process is 32-bit it mixes up the process and system id: in the first output below the PID (2371, which is 0x943 in hex) is reported as the system id. Compare these two outputs:
# /usr/bin/amd64/pfiles 2353 | grep advisory
advisory write lock set by system 0x943
# /usr/bin/i86/pfiles 2353 | grep advisory
advisory write lock set by system 0x1 process 2371
#
Jan 11th - the following is not a bug
The other bug is that for failed lock attempts on NFSv4 or NFSv3 the PID is not set. In my test code I preset l_pid to 12345.
My code output:
F_GETLK returned type 2, sysid 1, pid 2371
F_SETLK returned type 2, sysid 1, pid 12345
truss -t fcntl -vall output
fcntl(3, F_GETLK, 0x08047C60) = 0
typ=F_WRLCK whence=SEEK_SET start=0 len=0 sys=1 pid=2371
fcntl(3, F_SETLK, 0x08047C60) Err#11 EAGAIN
typ=F_WRLCK whence=SEEK_SET start=0 len=0 sys=1 pid=12345
Notice the PID is wrong when the lock fails but is correct when tested?
The fcntl(2) man page clearly states that the l_pid and l_sysid values are only used for F_GETLK calls:
The l_pid and l_sysid fields are used only with F_GETLK or
F_GETLK64 to return the process ID of the process holding a
blocking lock and to indicate which system is running that
process.
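The behaviour in the output above can be reproduced on a local file with a sketch along these lines. It is not the original test code: probe_held_lock() and struct lock_probe are invented names, and a forked child stands in for the other lock holder.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

struct lock_probe {
	pid_t	holder;		/* PID of the child that holds the lock */
	pid_t	getlk_pid;	/* l_pid after F_GETLK */
	int	setlk_errno;	/* errno from the failed F_SETLK */
	pid_t	setlk_pid;	/* l_pid after the failed F_SETLK */
};

/* Prepare a whole-file write lock request with l_pid preset to a sentinel. */
static void
whole_file_wrlck(struct flock *fl, pid_t sentinel)
{
	memset(fl, 0, sizeof (*fl));
	fl->l_type = F_WRLCK;
	fl->l_whence = SEEK_SET;
	fl->l_pid = sentinel;
}

/* Returns 0 if the experiment ran, -1 if it could not be set up. */
int
probe_held_lock(const char *path, pid_t sentinel, struct lock_probe *out)
{
	struct flock fl;
	int ready[2], fd, rc = -1;
	char c;

	if (pipe(ready) == -1)
		return (-1);
	if ((out->holder = fork()) == -1)
		return (-1);
	if (out->holder == 0) {
		/* Child: take the lock, tell the parent, wait to be killed. */
		close(ready[0]);
		fd = open(path, O_RDWR);
		whole_file_wrlck(&fl, 0);
		if (fd == -1 || fcntl(fd, F_SETLK, &fl) == -1)
			_exit(1);
		if (write(ready[1], "x", 1) != 1)
			_exit(1);
		for (;;)
			pause();
	}
	close(ready[1]);
	if (read(ready[0], &c, 1) == 1 && (fd = open(path, O_RDWR)) != -1) {
		whole_file_wrlck(&fl, sentinel);
		if (fcntl(fd, F_GETLK, &fl) == 0) {
			out->getlk_pid = fl.l_pid;	/* filled in by F_GETLK */
			whole_file_wrlck(&fl, sentinel);
			if (fcntl(fd, F_SETLK, &fl) == -1) {
				out->setlk_errno = errno;
				out->setlk_pid = fl.l_pid; /* still the sentinel */
				rc = 0;
			}
		}
		close(fd);
	}
	close(ready[0]);
	kill(out->holder, SIGKILL);
	(void) waitpid(out->holder, NULL, 0);
	return (rc);
}
```

With a sentinel of 12345, getlk_pid should come back as the child's PID while setlk_pid is still 12345 after the failed F_SETLK, in line with the man page text quoted above. The errno is allowed to be either EAGAIN or EACCES.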
pfiles(1) fails to show locks held by the process being examined because of the way it works. It effectively hijacks that process and makes the fcntl(2) call within its context, so a lock the process holds itself never blocks the F_GETLK request and is not reported. Processes can re-lock files they already have open and locked, as per the fcntl(2) man page:
There will be at most one type of lock set for each byte in
the file. Before a successful return from an F_SETLK,
F_SETLK64, F_SETLKW, or F_SETLKW64 request when the calling
process has previously existing locks on bytes in the region
specified by the request, the previous lock type for each
byte in the specified region will be replaced by the new
lock type.
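A minimal sketch of that effect, using a made-up helper name: the process takes a write lock and then asks F_GETLK about the very same range.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Take a write lock on the whole file, then ask F_GETLK about the same
 * range from the same process.
 * Returns the l_type reported by F_GETLK, or -1 on error.
 */
int
query_own_lock(int fd)
{
	struct flock fl;

	memset(&fl, 0, sizeof (fl));
	fl.l_type = F_WRLCK;
	fl.l_whence = SEEK_SET;	/* l_start = 0, l_len = 0: the whole file */
	if (fcntl(fd, F_SETLK, &fl) == -1)
		return (-1);

	/* Same request again, this time only as a question. */
	fl.l_type = F_WRLCK;
	if (fcntl(fd, F_GETLK, &fl) == -1)
		return (-1);
	return (fl.l_type);
}
```

The expected answer is F_UNLCK: the caller's own lock would simply be replaced, so nothing blocks the request. A query made from inside the lock holder therefore sees no lock at all.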
The output of pfiles(1) is tremendously useful but the locking information is not necessarily accurate or useful.
If you're running pfiles(1) against an NFSv4 file then if the process ID matches a local process it's more than likely the process holding the lock. Otherwise, it's meaningless.
If you're running pfiles(1) against an NFSv3 file then it's harder. Again if the process ID makes sense on this client then it's probably correct. Otherwise you could check it against known NFS clients of the same file system ... but that's time consuming.
To properly identify lock owners some mdb(1) or dtrace(1M) is required - and this blog entry is already too long.
I've since logged a man page bug for this:
6512137 pfiles(1)/fcntl(2): Process ID (PID) returned for NFS locked files is ambiguous
Posted by Peter Harvey on January 11, 2007 at 11:49 AM GMT #
I've now logged a bug on the 64-bit/32-bit problem.
6512145 pfiles(1) (64-bit) locking output is broken for 32-bit targets
Posted by Peter Harvey on January 11, 2007 at 12:44 PM GMT #
My client application is connecting to a remote server using TCP/IP. When the connection is established and communication is going on between my application and the remote server, I am seeing the following lines in the pfiles output:
4: S_IFIFO mode:0666 dev:306,77000 ino:37354 uid:23718 gid:23718 size:0
O_RDWR
/dataserver/bis2_server/tmp/fifo.7998
5: S_IFSOCK mode:0666 dev:349,0 ino:58451 uid:0 gid:0 size:0
O_RDWR
SOCK_STREAM
SO_SNDBUF(49152),SO_RCVBUF(1002016),IP_NEXTHOP(0.15.74.3)
sockname: AF_INET 150.252.11.19 port: 33764
peername: AF_INET 16.101.119.5 port: 4712
When the server was brought down, my application is supposed to close the existing connection and try to reconnect. When the server was down I am seeing the following lines in the pfiles output
4: S_IFIFO mode:0666 dev:306,77000 ino:37354 uid:23718 gid:23718 size:0
O_RDWR
/dataserver/bis2_server/tmp/fifo.7998
5: S_IFSOCK mode:0666 dev:349,0 ino:58451 uid:0 gid:0 size:0
O_RDWR
SOCK_STREAM
SO_SNDBUF(49152),SO_RCVBUF(1002016),IP_NEXTHOP(0.15.74.3)
sockname: AF_INET 150.252.11.19 port: 33764
The socket is not fully closed. Is it a half-closed or half-open connection? How do I resolve this issue?
Thanks in advance.
Naan
Posted by Naanthan on October 09, 2007 at 01:50 PM BST #
What do you mean by 'server was brought down'? Network cable pulled? Orderly shutdown? Power off?
The client-side state of a socket when the server is somehow disconnected depends on whether the server notified the client, how it notified the client, the flags associated with the socket and so on.
The half-closed socket is well documented in books like "UNIX Network Programming" (W. Richard Stevens). In my Volume 1, Second Edition sections 5.12, 5.14 and 6.6 are of interest.
However, as you point out, this is perhaps more like a half-open state. pfiles(1) effectively calls getpeername(3SOCKET) on behalf of the examined process and I'm wondering how a half-close could leave the socket in this state, ie unable to identify the peername. Weird.
I'd be using truss(1) on the application, netstat(1M) to check the system view of the open ports and snoop(1M) (or Wireshark) to observe the network traffic.
If this is Solaris 10 or Nevada then DTrace(1M) is another useful observability tool.
Posted by Peter Harvey on October 10, 2007 at 04:14 PM BST #