"Everybody's lost but me", is one of my favorite quotes from Indian Jones "The Last Crusade". Yet how often do we feel like no one knows what is going on but ourself. Why? Usually because your the person who has the technical expertise. With OpenSolaris you don't have to be lost. Since this is in my Debugging Tips section, I would like for everyone to get the expertise in at least being able to start understanding problems.
So today's tip is with looking at what is transpiring on your server as it relates to Storage Area Networks. Under Solaris 10, there is a feature within the leadville stack that traces fiber channel events. It's scattered all through fp driver source call FP_TRACE.
Looking at a live system, you can retrieve this data using mdb (Modular Debugger)
# mdb -k
Loading modules: [ unix krtld genunix dtrace specfs ufs sd isp pcisch sgsbbc md ip sctp usba wrsm fcp fctl qlc nca ssd zfs random ptm nfs crypto sppp logindmux cpc wrsmd fcip ipc ]
Using the fptrace macro.
> ::fptrace
fp trace buffer contents
[Wed Oct 22 14:31:50 2008] 0=>fp(1)::fp_fabric_login Begin; port=60003ab3000, job=6000584ff18
[Wed Oct 22 14:31:50 2008] 1=>fp(3)::fp_fabric_login Begin; port=60005837000, job=60005851f20
[Wed Oct 22 14:31:50 2008] 2=>fp(1)::fp_fabric_login End; port=60003ab3000, job=6000584ff18
[Wed Oct 22 14:31:50 2008] 3=>fp(1)::fp_flogi_intr; port=60003ab3000, pkt=600058399c8, state=1
[Wed Oct 22 14:31:50 2008] 4=>fp(3)::fp_fabric_login End; port=60005837000, job=60005851f20
[Wed Oct 22 14:31:50 2008] 5=>fp(3)::fp_flogi_intr; port=60005837000, pkt=6000583b9c8, state=1
[Wed Oct 22 14:31:50 2008] 6=>fp(3)::fp_plogi_intr: port=60005837000, job=60005851f20, d_id=fffffc, jcount=1 pkt=6000583b6f0, state=1
[Wed Oct 22 14:31:50 2008] 7=>fp(1)::fp_plogi_intr: port=60003ab3000, job=6000584ff18, d_id=fffffc, jcount=1 pkt=600058396f0, state=1
[Wed Oct 22 14:31:50 2008] 8=>fp(3)::GAN response; port=60005837000, d_id=20700
[Wed Oct 22 14:31:50 2008] 9=>fp(3)::GAN response details; port=60005837000, d_id=20700, type_id=7ba68800, pwwn=21 0 0 e08b e c2 8d, nwwn=20 0 0 e0 8b e c2 8d
[Wed Oct 22 14:31:50 2008] 10=>fp(3)::GAN response; port=60005837000, d_id=20000
[Wed Oct 22 14:31:50 2008] 11=>fp(3)::GAN response details; port=60005837000, d_id=20000, type_id=7ba68800, pwwn=20 3 0 3ba 4d 36 95, nwwn=10 0 0 3 ba 4d 36 95
[Wed Oct 22 14:31:50 2008] 12=>fp(3)::GAN response; port=60005837000, d_id=20600
[Wed Oct 22 14:31:50 2008] 13=>fp(1)::GAN response; port=60003ab3000, d_id=10700
[Wed Oct 22 14:31:50 2008] 14=>fp(1)::GAN response details; port=60003ab3000, d_id=10700, type_id=7ba68800, pwwn=21 0 0 e0 8b 9 e8 e9, nwwn=20 0 0 e0 8b 9 e8 e9
[Wed Oct 22 14:31:50 2008] 15=>fp(1)::GAN response; port=60003ab3000, d_id=10000
[Wed Oct 22 14:31:50 2008] 16=>fp(1)::GAN response details; port=60003ab3000, d_id=10000, type_id=7ba68800, pwwn=20 3 0 3ba 4e 85 83, nwwn=10 0 0 3 ba 4e 85 83
[Wed Oct 22 14:31:50 2008] 17=>fp(3)::GAN response; port=60005837000, d_id=20700
[Wed Oct 22 14:31:50 2008] 18=>fp(3)::GAN response details; port=60005837000, d_id=20700, type_id=7ba68800, pwwn=21 0 0 e0 8b e c2 8d, nwwn=20 0 0 e0 8b e c2 8d
[Wed Oct 22 14:31:50 2008] 19=>fp(3)::GAN PD stuffing; pd=6000392c0c0, port_id=70201, sym_len=0 fc4-type=20010000
[Wed Oct 22 14:31:50 2008] 20=>fp(1)::GAN response; port=60003ab3000, d_id=10600
[Wed Oct 22 14:31:50 2008] 21=>fp(3)::GAN response; port=60005837000, d_id=20000
[Wed Oct 22 14:31:50 2008] 22=>fp(3)::GAN response details; port=60005837000, d_id=20000, type_id=7ba68800, pwwn=20 3 0 3ba 4d 36 95, nwwn=10 0 0 3 ba 4d 36 95
[Wed Oct 22 14:31:50 2008] 23=>fp(3)::GAN PD stuffing; pd=60003bd9780, port_id=201, sym_len=0 fc4-type=10010000
[Wed Oct 22 14:31:50 2008] 24=>fp(3)::GAN response; port=60005837000, d_id=20600
>> More [<space>, <cr>, q, n, c, a] ?
MDB gives you the capability of scrolling through the events like more does. You can also pull the data from a core dump, either live core or a post mordem dump.
# mdb unix.0 vmcore.0
Loading modules: [ krtld genunix specfs dtrace ufs ssd fcp fctl qlc pcisch sd ip hook neti sctp arp usba nca random md logindmux ptm cpc sppp crypto fcip wrsmd zfs nfs ipc ]
> ::fptrace
fp trace buffer contents
[Fri Sep 12 2008 13:09:34:745:629:500] 0=>fp(1)::fp_get_lilpmap Begin; port=60032c18000, map=60032c180f0
[Fri Sep 12 2008 13:09:34:745:720:800] 1=>fp(1)::fp_get_lilpmap End; port=60032c18000, map=60032c180f0
[Fri Sep 12 2008 13:09:34:745:726:300] 2=>fp(1)::fp_fabric_login Begin; port=60032c18000, job=60032c95f18
[Fri Sep 12 2008 13:09:34:745:755:500] 3=>fp(1)::fp_fabric_login End; port=60032c18000, job=60032c95f18
[Fri Sep 12 2008 13:09:34:746:029:800] 4=>fp(1)::fp_flogi_intr; port=60032c18000, pkt=60032c8d9c8, state=e
.......
[Fri Nov 14 2008 00:45:57:990:286:400] 3299=>fp(0)::fp_unsol_cb: s_id=3703ef, d_id=375c00, type=1, r_ctl=22, f_ctl=290000
seq_id=0, df_ctl=0, seq_cnt=0, ox_id=ffff, rx_id=ffff ro=0, buffer[0]:5
[Fri Nov 14 2008 00:45:57:990:323:200] 3300=>fp(0)::Accepting LOGO; d_id=3703ef, pd=60032cfcac0, tolerance=0 retain=1
[Fri Nov 14 2008 00:45:57:990:391:400] 3301=>fp(0)::fp_ulp_devc fired; port=60032dc1000, len=1
[Fri Nov 14 2008 00:45:57:990:559:100] 3302=>fp(0)::fp_plogi_group begin; port=60032dc1000, job=60032c95428
[Fri Nov 14 2008 00:45:57:990:567:600] 3303=>fp(0)::fp_plogi_group contd; port=60032dc1000, pd=60032cfcac0
[Fri Nov 14 2008 00:45:58:045:465:100] 3304=>fp(0)::fp_plogi_intr: port=60032dc1000, job=60032c95428, d_id=3703ef, jcount=1 pkt=60032dc96f0, state=e
[Fri Nov 14 2008 00:45:58:045:564:200] 3305=>fp(0)::PLOGI to 3703ef failed. state=e reason=5.
[Fri Nov 14 2008 00:45:58:045:723:700] 3306=>fp(0)::fp_plogi_group end: port=60032dc1000, job=60032c95428
[Fri Nov 14 2008 00:45:58:045:729:900] 3307=>fp(0)::fp_plogi_group begin; port=60032dc1000, job=60032c94ee8
Very cool. Looks somewhat similar to a portlogdump from the brocade switches. Besides, now that solaris is open, you can actually see what things mean. Take for instance fp_unsol_cb. Take a look at opensolaris code and you see what fp_unsol_cb (line 10454) does. And how we are tracing.
10468 FP_TRACE(FP_NHEAD1(1, 0), "fp_unsol_cb: s_id=%x,"
10469 " d_id=%x, type=%x, r_ctl=%x, f_ctl=%x"
10470 " seq_id=%x, df_ctl=%x, seq_cnt=%x, ox_id=%x, rx_id=%x"
10471 " ro=%x, buffer[0]:%x", buf->ub_frame.s_id, buf->ub_frame.d_id,
10472 buf->ub_frame.type, buf->ub_frame.r_ctl, buf->ub_frame.f_ctl,
10473 buf->ub_frame.seq_id, buf->ub_frame.df_ctl, buf->ub_frame.seq_cnt,
10474 buf->ub_frame.ox_id, buf->ub_frame.rx_id, buf->ub_frame.ro,
10475 buf->ub_buffer[0]);
You can now go through the source code and see what things mean. Now there shouldn't be anymore need for you to quote the Young Indiana Jones, "Everybody's lost but me". At least those who read my blog.