星期三 四月 12, 2006

[Follow up] Looking into a core dump of deadlock

[follow up last post]
We will see how the above deadlock illustrated in core dump generated after deadlock happens.
<SUT> mdb 0

/* let's find the hang dladm */
> ::ps ! grep dladm
R  12476  12268  12268    725      0 0x42004000 fffffe87705e23d0 dladm

/* see what are its threads */
> fffffe87705e23d0::ps -t
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R  12476  12268  12268    725      0 0x42004000 fffffe87705e23d0 dladm
        T  0xffffffffa69be740 <TS_SLEEP>

0xffffffffa69be740 is in state <TS_SLEEP>, let's see what it is waiting for:

> 0xffffffffa69be740::findstack -v
stack pointer for thread ffffffffa69be740: fffffe8000a2e570
[ fffffe8000a2e570 _resume_from_idle+0xf8() ]
  fffffe8000a2e5b0 swtch+0x185()
  fffffe8000a2e650 turnstile_block+0x80d(0, 0, ffffffff93785580,
  fffffffffbc039d8, 0, 0)
  fffffe8000a2e6b0 rw_enter_sleep+0x186(ffffffff93785580, 0)
  fffffe8000a2e6f0 mac_rx_remove+0x32(ffffffff93785310, fffffe86d2d1e368)
  fffffe8000a2e720 aggr_port_delete+0x3d(ffffffff8614ccb0)
  fffffe8000a2e780 aggr_grp_rem_port+0x1e4(ffffffff85798720, ffffffff8614ccb0,
  fffffe8000a2e7e4)
  fffffe8000a2e800 aggr_grp_rem_ports+0x182(1, 1, ffffffff9d1faeb8)
  fffffe8000a2e840 aggr_ioc_remove+0xc5(ffffffffa27164c0, 100000)
  fffffe8000a2e890 aggr_ioctl+0x9b(fffffe8692c06108, ffffffffa27164c0)
  fffffe8000a2e8b0 aggr_wput+0x2e(fffffe8692c06108, ffffffffa27164c0)
  fffffe8000a2e920 putnext+0x246(fffffe80caeb3658, ffffffffa27164c0)
  fffffe8000a2e9f0 strdoioctl+0x3bb(fffffe8692bfee68, fffffe8000a2ea68, 100003,
  1, ffffffff9ec93f40, fffffe8000a2ee9c)
  fffffe8000a2ed20 strioctl+0x3a73(ffffffff94512e80, 5308, 8037490, 100003, 1,
  ffffffff9ec93f40, fffffe8000a2ee9c)
  fffffe8000a2ed70 spec_ioctl+0x83(ffffffff94512e80, 5308, 8037490, 100003,
  ffffffff9ec93f40, fffffe8000a2ee9c)
  fffffe8000a2edc0 fop_ioctl+0x36(ffffffff94512e80, 5308, 8037490, 100003,
  ffffffff9ec93f40, fffffe8000a2ee9c)

It sleeps on a reader/writer lock 0xffffffff93785580. Let's see further in what state the lock is:

> ffffffff93785580::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
ffffffff93785580        READERS=1  B011 ffffffffa69be740 (W)
                                     ||
                 WRITE_WANTED -------+|
                  HAS_WAITERS --------+
>

It has a reader already, since "READERS=1". Let's see what is that:

> ffffffff93785580::kgrep | ::whatis
fffffe8000359b28 is in thread fffffe8000359c80's stack
fffffe8000a2e5d8 is in thread ffffffffa69be740's stack
fffffe8000a2e638 is in thread ffffffffa69be740's stack
fffffe8000a2e670 is in thread ffffffffa69be740's stack
fffffe8000a2e6a8 is in thread ffffffffa69be740's stack
ffffffffa2f3ce48 is ffffffffa2f3ce38+10, bufctl ffffffffa2f5b478 allocated from
turnstile_cache
ffffffffa69be7c0 is ffffffffa69be740+80, allocated as a thread structure

We already know 0xffffffffa69be740 is thread of dladm. So what is 0xfffffe8000359c80's stack looks like?

> fffffe8000359c80::findstack -v
stack pointer for thread fffffe8000359c80: fffffe80003592b0
[ fffffe80003592b0 resume_from_intr+0xbb() ]
  fffffe80003592f0 swtch+0xad()
  fffffe8000359390 turnstile_block+0x80d(0, 1, ffffffff85798720,
  fffffffffbc039d8, 0, 0)
  fffffe80003593f0 rw_enter_sleep+0x1fb(ffffffff85798720, 1)
  fffffe8000359430 aggr_m_tx+0x2d(ffffffff85798720, ffffffff98778be0)
  fffffe8000359450 dls_tx+0x20(ffffffff9540ca88, ffffffff98778be0)
  fffffe8000359480 str_mdata_fastpath_put+0x2c(ffffffff85be6338,
  ffffffff98778be0)
  fffffe8000359520 tcp_send_data+0x6d7(fffffe80caedbf80, ffffffffa12eb0f8,
  ffffffff98778be0)
  fffffe8000359600 tcp_send+0x87b(ffffffffa12eb0f8, fffffe80caedbf80, 5b4, 28,
  14, 0, fffffe80003596ac, fffffe80003596b0, fffffe80003596b4, fffffe8000359678
  , 4fe574, 7fffffff)
  fffffe80003596d0 tcp_wput_data+0x6db(fffffe80caedbf80, 0, 0)
  fffffe8000359820 tcp_rput_data+0x2dbc(fffffe80caedbd80, ffffffffa0ecd8c0,
  ffffffff843def40)
  fffffe80003598b0 squeue_drain+0x212(ffffffff843def40, 4, 2fc20433e599)
  fffffe8000359930 squeue_enter_chain+0x3bb(ffffffff843def40, ffffffffa66b4f60,
  ffffffff98ede720, 10, 1)
  fffffe8000359a00 ip_input+0x780(ffffffff85fadae8, fffffe8692bf3088,
  ffffffffa66b4f60, e)
  fffffe8000359ab0 i_dls_link_ether_rx+0x1ae(ffffffff93784290, fffffe8692bf3088
  , ffffffffa66b4f60)
  fffffe8000359b00 mac_rx+0x7a(ffffffff85798750, fffffe8692bf3088,
  ffffffffa66b4f60)
  fffffe8000359b70 aggr_recv_cb+0x1b9(ffffffff8614ccb0, fffffe8692bf3088,
  ffffffffa66b4f60)
  fffffe8000359bc0 mac_rx+0x7a(ffffffffa2112e00, fffffe8692bf3088,
  ffffffffa66b4f60)
  fffffe8000359c00 e1000g_intr+0xd2(fffffe813c11f000)
  fffffe8000359c60 av_dispatch_autovect+0x83(19)
  fffffe8000359c70 intr_thread+0x50()
>

This guy sleeps on another rwlock 0xffffffff85798720.  Let's take a look:

> ffffffff85798720::rwlock
            ADDR      OWNER/COUNT FLAGS          WAITERS
ffffffff85798720 ffffffffa69be740  B101 fffffe8000359c80 (R)
                                    | | fffffe800016dc80 (R)
                 WRITE_LOCKED ------+ |
                  HAS_WAITERS --------+

Current thread(e1000g receive) is trying to take a lock as READER which is already held by 0xffffffffa69be740 (which is dladm's as mentioned before, held as WRITER).

While thread 0x0xffffffffa69be740 is blocked on lock 0xffffffff93785580, it tries to acquire the lock as RW_WRITER. (see from the dis/src code)

mac_rx_remove:                  pushq  %rbp
mac_rx_remove+1:                movq   %rsp,%rbp
mac_rx_remove+4:                subq   $0x18,%rsp
mac_rx_remove+8:                pushq  %r12
mac_rx_remove+0xa:              pushq  %r13
mac_rx_remove+0xc:              pushq  %r14
mac_rx_remove+0xe:              movq   %rdi,-0x8(%rbp)
mac_rx_remove+0x12:             movq   %rdi,%r14
mac_rx_remove+0x15:             movq   %rsi,-0x10(%rbp)
mac_rx_remove+0x19:             movq   %rsi,%r12
mac_rx_remove+0x1c:             movq   %r14,%r13
mac_rx_remove+0x1f:             addq   $0x270,%r13
mac_rx_remove+0x26:             movq   %r13,%rdi
mac_rx_remove+0x29:             xorl   %esi,%esi
mac_rx_remove+0x2b:             xorl   %eax,%eax
mac_rx_remove+0x2d:             call   -0x38746f        <rw_enter> /* the first "rw_enter" comparing with source code */
mac_rx_remove+0x32:             movq   %r14,%r8


But mac_rx has held it as RW_READER already.
> mac_rx::dis
mac_rx:                         pushq  %rbp
mac_rx+1:                       movq   %rsp,%rbp
mac_rx+4:                       subq   $0x20,%rsp
mac_rx+8:                       pushq  %r12
mac_rx+0xa:                     pushq  %r13
mac_rx+0xc:                     pushq  %r14
mac_rx+0xe:                     pushq  %r15
mac_rx+0x10:                    movq   %rdi,-0x8(%rbp)
mac_rx+0x14:                    movq   %rsi,-0x10(%rbp)
mac_rx+0x18:                    movq   %rsi,%r15
mac_rx+0x1b:                    movq   %rdx,-0x18(%rbp)
mac_rx+0x1f:                    movq   %rdx,%r13
mac_rx+0x22:                    movq   0x10(%rdi),%r14
mac_rx+0x26:                    movq   %r14,%r12
mac_rx+0x29:                    addq   $0x270,%r12
mac_rx+0x30:                    movq   %r12,%rdi
mac_rx+0x33:                    movl   $0x1,%esi
mac_rx+0x38:                    xorl   %eax,%eax
mac_rx+0x3a:                    call   -0x3879f3        <rw_enter>
mac_rx+0x3f:                    movq   0x278(%r14),%r14
mac_rx+0x46:                    testq  %r14,%r14
mac_rx+0x49:                    je     +0x45    <mac_rx+0x8e>
mac_rx+0x4b:                    movq   (%r14),%r8
mac_rx+0x4e:                    testq  %r8,%r8
mac_rx+0x51:                    je     +0x11    <mac_rx+0x62>
mac_rx+0x53:                    movq   %r13,%rdi
mac_rx+0x56:                    xorl   %eax,%eax
mac_rx+0x58:                    call   -0x1a7153        <copymsgchain>
mac_rx+0x5d:                    movq   %rax,%rdx
mac_rx+0x60:                    jmp    +0x5     <mac_rx+0x65>
mac_rx+0x62:                    movq   %r13,%rdx
mac_rx+0x65:                    testq  %rdx,%rdx
mac_rx+0x68:                    je     +0x12    <mac_rx+0x7a>
mac_rx+0x6a:                    movq   0x8(%r14),%r8
mac_rx+0x6e:                    movq   0x10(%r14),%rdi
mac_rx+0x72:                    movq   %r15,%rsi
mac_rx+0x75:                    xorl   %eax,%eax
mac_rx+0x77:                    call   *%r8


So this is what the deadlock looks like with a core dump file.


Technorati Tags:

评论:

发表一条评论:
  • HTML语法: 禁用