[Open-FCoE] A sync bug between openfc and scst module??

charles zhuang charlesz at opengridcomputing.com
Wed Sep 24 20:15:11 UTC 2008


OK, I solved (worked around) this problem myself, and I believe my
suspicion was correct. I put a small delay (1 ms) in the target's
fc_sess.c where it processes the PLOGI request. The delay lets
scst_register finish first, before the fcoe receive thread is dispatched
to process the PLOGI request. The flow then works out perfectly.
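For readers who want to see the shape of this workaround, here is a
minimal user-space sketch. It is not the actual fc_sess.c change, which
is not shown in this post. The names demo_delay_ms and demo_recv_plogi
are hypothetical, and nanosleep stands in for whatever sleep call the
kernel code uses (msleep(1) would be a common kernel-side choice, but
that is an assumption).

```c
#include <errno.h>
#include <time.h>

/* Sleep for about `ms` milliseconds, resuming if a signal interrupts us.
 * Returns 0 on success, -1 on any other nanosleep failure. */
int demo_delay_ms(long ms)
{
    struct timespec req = { ms / 1000, (ms % 1000) * 1000000L };
    struct timespec rem;

    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;
        req = rem;          /* sleep for the time that is still left */
    }
    return 0;
}

/* Hypothetical stand-in for the PLOGI request handler. */
static int plogi_handled;

int demo_recv_plogi(void)
{
    /* Workaround: give target registration a chance to finish first. */
    if (demo_delay_ms(1) != 0)
        return -1;
    plogi_handled++;        /* placeholder for the real PLOGI processing */
    return plogi_handled;
}
```

A delay like this only narrows the timing window. It does not by itself
guarantee that registration has completed before the request is handled.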
It looks like the focus on this list is on the initiator side, so I am
just posting my findings so that people will be aware of them. The scst
module cmd threads are based on the number of processors, so the more
processors and the faster the Ethernet NIC, the more easily you will see
this synchronization problem. That explains why my environment with quad
processors and a 10G NIC sees the problem, while a dual-processor system
with a 1G NIC is fine.
The good news is that p2p mode is working with the latest rearch code,
both back-to-back and with a switch in the middle.

Charles
-----Original Message-----
From: devel-bounces at open-fcoe.org [mailto:devel-bounces at open-fcoe.org]
On Behalf Of charles zhuang
Sent: Wednesday, September 24, 2008 11:28 AM
To: devel at open-fcoe.org
Subject: [Open-FCoE] A sync bug between openfc and scst module??

Hi,
I am running into this problem (the target crashes after the PRLI phase)
and suspect it is a sync issue between the open fcoe module and the scst
module. Can someone review this and give me feedback?
The following trace log shows the problem:
Sep 23 16:26:17 vic100 kernel: local_port  10102 event none state FLOGI
-> ready
Sep 23 16:26:17 vic100 kernel: fc_seq_start: exch    4 f_ctl 800000 seq
0 f_ctl      0
Sep 23 16:26:17 vic100 kernel: fcoe_xmit       010102 -> 010101 xids
0004 0004 ELS rep ELS LS_ACC
Sep 23 16:26:17 vic100 kernel: local_port  10102 event none state ready
-> ready
Sep 23 16:26:17 vic100 kernel: [6044]: ENTRY scst_register
Sep 23 16:26:17 vic100 kernel: fcoe_rcv: skb_info: len:162 data_len:0
head:ffff8102219250f8 data:ffff810221925110 tail:00000000000000ba
end:0000000000000100 sum:0 dev:eth2
Sep 23 16:26:17 vic100 kernel: fcoe_percpu_receive_thread: skb_info:
len:162 data_len:0 head:ffff8102219250f8 data:ffff810221925110
tail:ffff8102000000ba end:ffff810200000100 sum:0 dev:eth2
Sep 23 16:26:17 vic100 kernel: fc_sess_recv_plogi_req: incoming PLOGI
from  10101 wwpn 2000000743054364 state INIT - accept
Sep 23 16:26:17 vic100 kernel: fc_seq_start: exch   19 f_ctl 800000 seq
0 f_ctl      0
Sep 23 16:26:17 vic100 kernel: fcoe_xmit       010102 -> 010101 xids
0005 0019 ELS rep ELS LS_ACC
Sep 23 16:26:17 vic100 kernel: sess to  10101 event FC_EV_ACC state
SESS_ST_INIT -> SESS_ST_PLOGI_RECV
Sep 23 16:26:17 vic100 kernel: fcoe_rcv: skb_info: len:66 data_len:0
head:ffff8102193cbd28 data:ffff8102193cbd40 tail:000000000000005a
end:0000000000000080 sum:0 dev:eth2
Sep 23 16:26:17 vic100 kernel: fcoe_percpu_receive_thread: skb_info:
len:66 data_len:0 head:ffff8102193cbd28 data:ffff8102193cbd40
tail:ffff81020000005a end:ffff810200000080 sum:0 dev:eth2
Sep 23 16:26:17 vic100 kernel: fcs_local_port_prli_accept: PRLI
callback. remote  10101 local  10102
Sep 23 16:26:17 vic100 kernel: [6046]: ENTRY scst_register_session
Sep 23 16:26:17 vic100 kernel: [6046]: ENTRY scst_alloc_session
Sep 23 16:26:17 vic100 kernel: [6046]: EXIT scst_alloc_session
Sep 23 16:26:17 vic100 kernel: [6046]: scst_register_session:5031:Adding
sess ffff8102099eacb0 to scst_sess_init_list
Sep 23 16:26:17 vic100 kernel: [6046]: EXIT scst_register_session
Sep 23 16:26:17 vic100 kernel: fcs_local_port_prli_accept: accept remote
fid  10101
Sep 23 16:26:17 vic100 kernel: [6024]: scst_mgmt_thread:5162:Removing
sess ffff8102099eacb0 from scst_sess_init_list
Sep 23 16:26:17 vic100 kernel: [6024]: ENTRY scst_init_session
Sep 23 16:26:17 vic100 kernel: fc_seq_start: exch    6 f_ctl 800000 seq
0 f_ctl      0
Sep 23 16:26:17 vic100 kernel: [6024]: scst_init_session:4923:c.z.,
session:0xffff8102099eacb0, init name:2000000743054364,
tgt:0x0000000000000000, phase:0
 
To summarize, here is the flow as I understand it.
The target openfc module sends an ELS LS_ACC to the initiator, telling
it that it is ready for PLOGI. After that, it calls scst_register to
register a scst_tgt to the session list. However, before scst_register
has returned with the newly created scst_tgt, the session has already
gone through the PLOGI and PRLI requests, and fcs_local_port_prli_accept
calls into scst_register_session. The scst_mgmt_thread then removes the
sess from the list and calls into scst_init_session. At that point
sess->tgt is still NULL because scst_register has not yet returned.
If my understanding is correct, I think scst_register_session should not
add the session to the session list until scst_register has successfully
created the scst_tgt.
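
The ordering proposed above can be sketched in user space with a mutex
and a condition variable. This is only an illustration under stated
assumptions, not SCST code: demo_tgt, demo_sess, demo_register_target,
demo_register_session and demo_run are all hypothetical names, and POSIX
threads stand in for the kernel threads in the trace. The session side
blocks until the target has been published, so it never picks up a NULL
target even when it starts first.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for scst_tgt and the session; not SCST types. */
struct demo_tgt { int id; };
struct demo_sess { struct demo_tgt *tgt; };

static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reg_done = PTHREAD_COND_INITIALIZER;
static struct demo_tgt the_tgt;
static struct demo_tgt *registered_tgt; /* NULL until registration ends */

/* Plays the role of scst_register: publish the target, wake waiters. */
static void demo_register_target(void)
{
    pthread_mutex_lock(&reg_lock);
    the_tgt.id = 1;
    registered_tgt = &the_tgt;
    pthread_cond_broadcast(&reg_done);
    pthread_mutex_unlock(&reg_lock);
}

/* Plays the role of scst_register_session: wait until the target exists
 * before attaching it to the session. */
static void *demo_register_session(void *arg)
{
    struct demo_sess *sess = arg;

    pthread_mutex_lock(&reg_lock);
    while (registered_tgt == NULL)
        pthread_cond_wait(&reg_done, &reg_lock);
    sess->tgt = registered_tgt;
    pthread_mutex_unlock(&reg_lock);
    return NULL;
}

/* Start the session thread before the target is registered, mimicking
 * the ordering in the trace. Returns 0 if the session ends up with a
 * valid target, -1 otherwise. */
int demo_run(void)
{
    struct demo_sess sess = { NULL };
    pthread_t t_sess;

    pthread_mutex_lock(&reg_lock);
    registered_tgt = NULL;              /* reset for repeated runs */
    pthread_mutex_unlock(&reg_lock);

    if (pthread_create(&t_sess, NULL, demo_register_session, &sess) != 0)
        return -1;
    demo_register_target();             /* registration finishes later */
    if (pthread_join(t_sess, NULL) != 0)
        return -1;
    return (sess.tgt != NULL && sess.tgt->id == 1) ? 0 : -1;
}
```

Calling demo_run() from a small main() and checking for a return value
of 0 exercises the sketch. Unlike a fixed delay, the wait here does not
depend on timing, processor count, or NIC speed.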
 
Thanks for your help.
Charles