[Open-FCoE] [PATCH] libfc, fcoe: fixed locking issues with lport->lp_mutex around lport->link_status
vasu.dev at linux.intel.com
Fri Jan 16 02:59:01 UTC 2009
Joe Eykholt wrote:
> Vasu Dev wrote:
>> The fcoe_xmit could call fc_pause in case the pending skb queue len is larger
>> than FCOE_MAX_QUEUE_DEPTH, the fc_pause was trying to grab lport->lp_muex to
>> change lport->link_status and that caused these issues :-
>> 1. The fcoe_xmit was getting called with bh disabled, thus casing
>> "BUG: scheduling while atomic" when grabbing lport->lp_muex while bh disabled.
>> 2. fc_linkup and fc_linkdown function calls lport_enter function with
>> lport->lp_mutex held and these enter function in turn calls fcoe_xmit to send
>> lport related FC frame, for instance fc_linkup calling fc_lport_enter_flogi
>> to send flogi req. In this call flow grabbing same lport->lp_mutex in
>> fc_puase from fcoe_xmit would cause deadlock.
>> The lport->lp_mutex required to set FC_PAUSE in fcoe_xmit path but
>> I don't see FC_PAUSE directly used anywhere beside just setting and clear this
>> bit in lport->link_status. Shouldn't we report SCSI_MLQUEUE_DEVICE_BUSY or
>> SCSI_MLQUEUE_HOST_BUSY for any new command while FC_PUASE set?
> Absolutely. It used to try to do that. It might not have ever worked, though.
> This is important for reliable non-drop behavior. If we don't stop SCSI, it
> could allocate a lot of frames which would stay in the fcoe queue. Pause
> could last a long time, in theory. Good catch.
Added pause check in resubmitted patch
http://www.open-fcoe.org/pipermail/devel/2009-January/001505.html , see
few more comments below.
>> I'm not sure that lock is really required for lport->link_status but for now
>> changed this to atomic to avoid above listed two issues but I'll look into
>> eliminating this atomic since this is used from fc_queuecommand. If FC_PAUSE
>> is really required then I could add a separate field for this to eliminate
>> locking need for least FC_PAUSE bit setting.
> We should put it in a separate field now, which would eliminate the
> need for locking or lock it under the lport lock.
Make sense as I also indicated this before, done!
> See below for more comments, but I think the bottom line is that this patch
> should be redone such that pause is a separate field and only used in the fcoe
> module. The link_status field shouldn't have pause in it.
Since this pause checking is required for above discussed
SCSI_MLQUEUE_HOST_BUSY indication, therefore leaving this in fc_lport
will be quick to check pause state in fast path for each scsi cmd. So
added this field to fc_lport.
Thanks for reviewing this patch which had many opens from me, your
remaining review comments are also very good but mostly around atomic
uses and conditional if statements which are not applicable to
resubmitted patch any more since no more atomic used and use of bit
fields simplified conditional if statements in resubmitted patch.
More information about the devel