[Open-FCoE] [RESUBMIT-v2 PATCH] libfc, fcoe: fixed locking issues with lport->lp_mutex around lport->link_status
jeykholt at cisco.com
Mon Apr 13 17:27:31 UTC 2009
Mike Christie wrote:
> Vasu Dev wrote:
>> The fcoe_xmit could call fc_pause when the pending skb queue length exceeded
>> FCOE_MAX_QUEUE_DEPTH; fc_pause then tried to grab lport->lp_mutex to
>> change lport->link_status, and that had these issues:
>> 1. fcoe_xmit was getting called with bh disabled, thus causing
>> "BUG: scheduling while atomic" when grabbing lport->lp_mutex with bh disabled.
>> 2. fc_linkup and fc_linkdown call the lport_enter functions with
>> lport->lp_mutex held, and these enter functions in turn call fcoe_xmit to send
>> lport-related FC frames, e.g. fc_linkup => fc_lport_enter_flogi to send the FLOGI
>> req. In this case grabbing the same lport->lp_mutex again in fc_pause from
>> fcoe_xmit would cause a deadlock.
>> The lport->lp_mutex was used for setting FC_PAUSE in the fcoe_xmit path, but
>> the FC_PAUSE bit was not used anywhere besides being set and cleared
>> in lport->link_status, so a separate qfull field was added to fc_lport
>> to eliminate the need for lport->lp_mutex to track the pending queue full
>> condition and in turn avoid the two locking issues described above.
>> Also added a check for lp->qfull in fc_fcp_lport_queue_ready to return
>> SCSI_MLQUEUE_HOST_BUSY when lp->qfull is set, to prevent more scsi-ml cmds
>> while lp->qfull is set.
>> This patch eliminated FC_LINK_UP and FC_PAUSE and instead used dedicated
>> fields in fc_lport for these, which simplified all related conditional checks.
>> Also removed the fc_pause and fc_unpause functions and instead used the newly
>> added lport->qfull directly in fcoe.
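The backpressure scheme the patch describes can be modeled in a small userspace sketch. This is illustrative only, not the driver code: the struct layout, the QUEUE_READY value, and fcoe_check_queue are assumptions; FCOE_MAX_QUEUE_DEPTH, qfull, and SCSI_MLQUEUE_HOST_BUSY follow the patch description.

```c
#include <stdbool.h>

/* Hypothetical model of the patch's lockless qfull logic: the transmit
 * path sets lport->qfull when its pending queue passes a depth threshold,
 * and the SCSI queue-ready check refuses new commands while it is set.
 * No mutex is needed because qfull is a simple flag, not a bit shared
 * with other state in link_status. */

#define FCOE_MAX_QUEUE_DEPTH 256
#define SCSI_MLQUEUE_HOST_BUSY 0x1055   /* as in the kernel's scsi.h */
#define QUEUE_READY 0                   /* illustrative "ok" value */

struct fc_lport {
    int fcoe_pending_queue_len;  /* frames waiting for the netdev */
    bool qfull;                  /* replaces the old FC_PAUSE bit */
    bool link_up;                /* replaces the old FC_LINK_UP bit */
};

/* Called from the transmit path: just set or clear the flag. */
static void fcoe_check_queue(struct fc_lport *lp)
{
    lp->qfull = lp->fcoe_pending_queue_len > FCOE_MAX_QUEUE_DEPTH;
}

/* Called from queuecommand: push back on scsi-ml while qfull is set. */
static int fc_fcp_lport_queue_ready(struct fc_lport *lp)
{
    if (!lp->link_up || lp->qfull)
        return SCSI_MLQUEUE_HOST_BUSY;
    return QUEUE_READY;
}
```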
> I am not sure what it is about this patch exactly (I ended up here using
> git revert), but I am hitting a performance regression when doing WRITEs
> (READs are fine).
> Doing something like this (you may also need to change the IO scheduler
> to noop (echo noop > /sys/block/sdb/queue/scheduler):
> disktest -PT -T30 -h1 -K128 -B256k -ID /dev/sdb -D 0:100
> will get me 192 MB/s WRITE throughput without the patch. With the patch
> that drops down to 8 MB/s.
> We are hitting the qfull test (if I just remove it in fc_fcp.c then I
> get 192 MB/s again), which is probably causing queuecommand to return
> host busy. That may in turn cause IOs to sit in the queue until the plug
> threshold is hit or until an IO is completed, which would also cause the
> blk/scsi queue to be run again. Without the patch we would fill up the
> fcp/fcoe queue until the scsi_host_template->can_queue (we are using
> 1024 right now) is hit or until the device queue depth is hit (I have
> this set to just 32). With this patch I only get a couple of IOs running.
> I think if you want to return SCSI_MLQUEUE_HOST_BUSY when qfull is set,
> then you want to add some code to unplug the queue when it is no longer set.
> When qfull clears it does not mean that a command has completed (a
> command completion will force the blk/scsi queues to be run, which in
> turn causes queuecommand to be called), so you will not get the
> blk/scsi queues running again quickly.
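Mike's unplug suggestion can be sketched as a tiny userspace model: kick the block/SCSI queue on the qfull 1 -> 0 transition, since no command completion may arrive to do it. The names below are hypothetical; in the kernel the kick would be something along the lines of re-running the host's block/SCSI queues.

```c
#include <stdbool.h>

/* Illustrative-only model: if queuecommand returns host-busy on qfull,
 * the driver should re-run the queue itself when qfull clears, instead
 * of waiting for a completion or the plug threshold. */

struct lport_model {
    bool qfull;
    int queue_runs;   /* counts how many times the queue was re-run */
};

static void run_scsi_queue(struct lport_model *lp)
{
    lp->queue_runs++;  /* stands in for re-running the blk/scsi queues */
}

/* Update qfull and, on the full -> not-full transition only, kick the
 * queue so held-back commands are re-issued promptly. */
static void set_qfull(struct lport_model *lp, bool qfull)
{
    bool was_full = lp->qfull;

    lp->qfull = qfull;
    if (was_full && !qfull)
        run_scsi_queue(lp);
}
```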
> Another alternative is to not return SCSI_MLQUEUE_HOST_BUSY when qfull
> is set. You can then just add some code in the t.eh_timed_out to check
> if qfull was hit recently, and if it was, then give the command some more
> time. Or can we just remove the qfull check? What was it for? Is qfull going
> to be set for long periods of time? Do commands sit in the fc/fcoe layer
> for a long time before being transmitted?
> What do you think?
Your explanation does make sense.
The reason for qfull is that there are limits to how much we can send on
the net without getting frames dropped. Once the ring is full in the
driver, and the queues at the netdev layer are full, frames will get
dropped, and it's better to tell SCSI to stop issuing requests when this
happens. The queue will not get unpaused until we manage to send something
more or until the polling timer goes off.
We should make the fcoe_watchdog() timer go faster while the queue is full.
Right now it waits up to a full second, which is obviously too long. Maybe
something like 1 or 2 jiffies would be better.
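The adaptive polling interval suggested above amounts to a one-line policy. A minimal sketch, assuming a 1000-jiffy HZ and a hypothetical helper name (the real driver would feed this into its timer rearm):

```c
#include <stdbool.h>

#define HZ 1000  /* assume 1000 jiffies per second for this model */

/* Poll hard (every 2 jiffies) while the queue is backed up, so qfull
 * clears quickly once frames drain; otherwise fall back to the current
 * once-per-second watchdog cadence. Helper name is illustrative. */
static unsigned long fcoe_watchdog_interval(bool qfull)
{
    return qfull ? 2 : HZ;
}
```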
There's no indication now of when the netdev queue becomes drained enough.
We could engineer something in the sk_buff free callback, if the LLD
had control over the skb allocation like it used to. Maybe, just when
there's a full or nearly-full queue, set up a callback to unclog it.
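The callback idea could look roughly like the model below. This is a hypothetical sketch, not driver code: in the kernel the hook would hang off skb->destructor, while here it is a plain function, and the struct and watermark are invented for illustration.

```c
#include <stdbool.h>

/* Model of the skb-free-callback idea: tag transmitted frames with a
 * destructor while the queue is clogged, and have the destructor clear
 * qfull once the pending count drains below a low watermark, so SCSI
 * can resume without waiting for the polling timer. */

#define QUEUE_LOW_WATERMARK 64

struct fcoe_port_model {
    int pending;          /* frames still owned by the netdev layer */
    bool qfull;
};

/* Run as each frame is freed by the (modeled) net stack. */
static void fcoe_skb_destructor(struct fcoe_port_model *port)
{
    if (--port->pending < QUEUE_LOW_WATERMARK)
        port->qfull = false;  /* queue has drained enough; unclog */
}
```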