[Open-FCoE] [RESUBMIT-v2 PATCH] libfc, fcoe: fixed locking issues with lport->lp_mutex around lport->link_status
vasu.dev at linux.intel.com
Wed Apr 15 21:30:49 UTC 2009
On Mon, 2009-04-13 at 10:27 -0700, Joe Eykholt wrote:
> Mike Christie wrote:
> > Vasu Dev wrote:
> >> The fcoe_xmit could call fc_pause when the pending skb queue len was larger
> >> than FCOE_MAX_QUEUE_DEPTH; fc_pause was trying to grab lport->lp_mutex to
> >> change lport->link_status, and that had these issues :-
> >> 1. The fcoe_xmit was getting called with bh disabled, thus causing
> >> "BUG: scheduling while atomic" when grabbing lport->lp_mutex with bh disabled.
> >> 2. fc_linkup and fc_linkdown call the lport_enter functions with
> >> lport->lp_mutex held, and these enter functions in turn call fcoe_xmit to send
> >> lport related FC frames, e.g. fc_linkup => fc_lport_enter_flogi to send a FLOGI
> >> req. In this case grabbing the same lport->lp_mutex again in fc_pause from
> >> fcoe_xmit would cause deadlock.
> >> The lport->lp_mutex was used for setting FC_PAUSE in the fcoe_xmit path, but
> >> the FC_PAUSE bit was not used anywhere beside setting and clearing this
> >> bit in lport->link_status. Instead, a separate qfull field was added to fc_lport
> >> to eliminate the need for lport->lp_mutex to track the pending queue full
> >> condition, and in turn avoid the two locking issues described above.
> >> Also added a check for lp->qfull in fc_fcp_lport_queue_ready to return
> >> SCSI_MLQUEUE_HOST_BUSY and prevent more scsi-ml cmds
> >> while lp->qfull is set.
> >> This patch eliminated FC_LINK_UP and FC_PAUSE and instead used dedicated
> >> fields in fc_lport for this, which simplified all related conditional
> >> code.
> >> Also removed the fc_pause and fc_unpause functions and instead used the newly
> >> added lport->qfull directly in fcoe.
> > Hey,
> > I am not sure what it is about this patch exactly (I ended up here using
> > git revert), but I am hitting a performance regression when doing WRITEs
> > (READs are fine).
> > I am not sure what is is about this patch exactly (ended up here using
> > git revert), but I am hitting a performance regression when doing WRITES
> > (READs are fine).
Before this patch we had "BUG: scheduling while atomic" due to grabbing
lport->lp_mutex to set/reset the FC_PAUSE flag, and the lport->lp_mutex use
around FC_PAUSE also had the possibility of deadlock as explained in point 2
of the patch description; mainly these fixes required this patch.
The removed FC_PAUSE was the equivalent of the added qfull, but it was never
used beyond setting the FC_PAUSE bit flag when the fcoe_pending_queue.qlen
backlog exceeded the configured threshold FCOE_MAX_QUEUE_DEPTH
(256). This patch does the same for qfull, but it also uses the added qfull to
return SCSI_MLQUEUE_HOST_BUSY when qfull=1. Should we do that or not? I
think we should, as Joe also mentioned below; otherwise the
fcoe_pending_queue.qlen backlog build-up will consume additional system
resources, especially with heavy IO on several luns, and then eh will
lead to thrash. We do have other controlling limits on resource
usage, namely cmd_per_lun 32 and can_queue 1024, but their current values
seem a bit aggressive for FC devices and perhaps lead to an adverse
impact on performance with the added SCSI_MLQUEUE_HOST_BUSY return on
qfull=1; more details below on these current aggressive values.
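To make the gating concrete, this is roughly what the check amounts to in the
queuecommand path (a simplified sketch only, not the verbatim fc_fcp.c code):

	#include <scsi/scsi_cmnd.h>
	#include <scsi/scsi_host.h>
	#include <scsi/libfc.h>

	/* Simplified sketch of the qfull gating discussed in this thread. */
	static inline int fc_fcp_lport_queue_ready_sketch(struct fc_lport *lp)
	{
		/* ready only when the link is up and fcoe is not backlogged */
		return lp->link_up && !lp->qfull;
	}

	static int fc_queuecommand_sketch(struct scsi_cmnd *sc_cmd,
					  void (*done)(struct scsi_cmnd *))
	{
		struct fc_lport *lp = shost_priv(sc_cmd->device->host);

		if (!fc_fcp_lport_queue_ready_sketch(lp))
			return SCSI_MLQUEUE_HOST_BUSY;	/* scsi-ml retries later */

		/* ... allocate the fcp pkt and send the command here ... */
		return 0;
	}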
I couldn't get git-revert to work completely for this patch due to several
conflicts from recent major code changes to the files modified by this patch,
mainly the removed fcoe transport code and the added fip feature code.
Instead I ended up testing with and without the added qfull check that
returns SCSI_MLQUEUE_HOST_BUSY, to understand the impact on performance of
this patch's added SCSI_MLQUEUE_HOST_BUSY return on qfull.
> > Doing something like this (you may also need to change the IO scheduler
> > to noop (echo noop > /sys/block/sdb/queue/scheduler)):
> > disktest -PT -T30 -h1 -K128 -B256k -ID /dev/sdb -D 0:100
> > will get me 192 MB/s WRITE throughput without the patch. With the patch
> > that drops down to 8 MB/s.
I don't see that much difference with or w/o qfull on the above test; I see
hardly any difference on slower tgts (2%), but yes, a big drop on a RAM based
fast tgt when qfull hits, as much as 50% on the above test. So
there is a drop when qfull hits, which makes sense for a faster tgt with your
above test sending 256K block writes from 128 threads with the current
cmd_per_lun=32. Can currently available FC line rates keep up with this
pace at this test's IO size with the cmd_per_lun=32 limit?
I checked that the lpfc and qla FC HBAs use cmd_per_lun of 3, so I tried the
same, and with that the qfull condition never hits and performance
improved significantly on the RAM based tgt; it performed +35% better
relative to its performance with cmd_per_lun=32 and the qfull check ignored,
for your above test.
So maybe it is time to tune cmd_per_lun, and it seems cmd_per_lun=3 would
be better to align with other FC HBA settings and current FC netdev line
rates. What do you think? Possibly also tune the can_queue value from 1024?
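For reference, the tuning I'm proposing is just a host template change along
these lines (values illustrative; the template name here is a placeholder,
not the exact one in fcoe.c):

	#include <scsi/scsi_host.h>

	/* Illustrative host template snippet only; name and omitted fields
	 * are placeholders, not the exact fcoe.c template. */
	static struct scsi_host_template fcoe_sht_sketch = {
		.name		= "FCoE Driver",
		.can_queue	= 1024,	/* current value, possibly too aggressive */
		.cmd_per_lun	= 3,	/* proposed: align with lpfc/qla defaults */
	};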
> > We are hitting the qfull test (if I just remove it in fc_fcp.c then I
> > get 192 MB/s again) which is probably causing queuecommand to return
> > host busy. That may in turn cause IOs to sit in the queue until the plug
> > threshold is hit or until an IO is completed, which would also cause the
> > blk/scsi queue to be run again. Without the patch we would fill up the
> > fcp/fcoe queue until the scsi_host_template->can_queue (we are using
> > 1024 right now) is hit or until the device queue depth is hit (I have
> > this set to just 32). With this patch I only get a couple IOs running.
> > I think if you want to return SCSI_MLQUEUE_HOST_BUSY when qfull is set,
> > then you want to add some code to unplug the queue when it is not set.
I'm not sure what you mean by "unplug the queue" and how to do that?
> > When it clears it does not mean that a command has completed (a
> > command completion will force the blk/scsi queues to be run (this
> > will cause the queuecommand to be called)), so you will not get the
> > blk/scsi queues running again quickly.
fcoe_watchdog will flush the fcoe_pending_queue.qlen backlog and that will
cause pending cmd completion, but this timer is slow, as Joe mentioned
below, so once qfull hits this leads to slower performance; however we can
mostly avoid the qfull condition with cmd_per_lun=3 as explained above.
In any case eh will also cause IO completion; will that force the blk/scsi
queues to run, or is there an explicit api to trigger this quickly once
qfull clears?
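For context, the watchdog flush amounts to something like the sketch below
(the per-interface struct and names are simplified for illustration; in
fcoe.c the backlog is the fcoe_pending_queue in the softc, and skb->dev is
assumed to have been set when the frame was queued):

	#include <linux/skbuff.h>
	#include <linux/netdevice.h>
	#include <scsi/libfc.h>

	#define FCOE_MAX_QUEUE_DEPTH	256

	/* Simplified per-interface context just for this sketch. */
	struct fcoe_sketch_port {
		struct fc_lport		*lport;
		struct sk_buff_head	fcoe_pending_queue;
	};

	static void fcoe_watchdog_flush_sketch(struct fcoe_sketch_port *port)
	{
		struct sk_buff *skb;

		while ((skb = skb_dequeue(&port->fcoe_pending_queue)) != NULL) {
			/* hold an extra reference so the skb can be requeued
			 * on failure; dev_queue_xmit() consumes one reference
			 * either way */
			skb_get(skb);
			if (dev_queue_xmit(skb) != 0) {
				skb_queue_head(&port->fcoe_pending_queue, skb);
				break;
			}
			kfree_skb(skb);	/* drop our extra reference on success */
		}

		/* once the backlog drops back under the threshold, let
		 * scsi-ml queuecommand accept new commands again */
		if (skb_queue_len(&port->fcoe_pending_queue) < FCOE_MAX_QUEUE_DEPTH)
			port->lport->qfull = 0;
	}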
> > Another alternative is to not return SCSI_MLQUEUE_HOST_BUSY when qfull
> > is set. You can then just add some code in the t.eh_timed_out to check
> > if qfull was hit recently and if it was then give the command some more
> > time.
This solution won't prevent scsi-ml from sending more commands down to
fcoe than the FC line rate can handle, so this will cause
fcoe_pending_queue.qlen backlog build-up and frequent deferred requests in
the added t.eh_timed_out. It could be added, but preventing further
fcoe_pending_queue.qlen backlog build-up will throttle traffic at the source
and would avoid the need for extra steps such as this in t.eh_timed_out.
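Just so we're talking about the same thing, Mike's alternative would look
roughly like this (a sketch only; the last_qfull_sketch timestamp is a
hypothetical variable added for illustration, it is not an existing fc_lport
member, and it would have to be set from fcoe_xmit when the backlog fills):

	#include <linux/blkdev.h>
	#include <linux/jiffies.h>
	#include <scsi/scsi_cmnd.h>
	#include <scsi/scsi_host.h>
	#include <scsi/libfc.h>

	/* hypothetical: jiffies stamp of the last time qfull was hit */
	static unsigned long last_qfull_sketch;

	static enum blk_eh_timer_return fc_eh_timed_out_sketch(struct scsi_cmnd *sc_cmd)
	{
		struct fc_lport *lp = shost_priv(sc_cmd->device->host);

		/* if the backlog is (or recently was) full, give the command
		 * more time instead of escalating to the error handler */
		if (lp->qfull || time_before(jiffies, last_qfull_sketch + 10 * HZ))
			return BLK_EH_RESET_TIMER;

		return BLK_EH_NOT_HANDLED;	/* fall back to normal eh */
	}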
How about reducing can_queue along with the SCSI_MLQUEUE_HOST_BUSY return?
That won't help for this single lun test here, though, so can we
dynamically adjust cmd_per_lun also? If we can, then that would be ideal
instead of a full stop via the SCSI_MLQUEUE_HOST_BUSY return.
The fcoe still needs to be fixed for can_queue ramp up; how about adding
code in fcoe_watchdog to ramp up can_queue on every run where qfull=0?
But the can_queue ramp up logic needs to be fixed in libfc first.
If I remember correctly, Mike, you mentioned once implementing a
generic solution for can_queue ramp up and down for all HBAs; not sure if
I missed any recent update on that on the linux-scsi mail list or elsewhere.
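The ramp up I have in mind is basically the following (a sketch only; the
increment and the FCOE_CAN_QUEUE_MAX ceiling are made-up values, and real
code would need locking around can_queue):

	#include <linux/kernel.h>
	#include <scsi/scsi_host.h>
	#include <scsi/libfc.h>

	#define FCOE_CAN_QUEUE_MAX	1024	/* made-up ceiling for this sketch */

	static void fcoe_can_queue_ramp_up_sketch(struct fc_lport *lp)
	{
		struct Scsi_Host *shost = lp->host;

		if (lp->qfull)
			return;		/* still backlogged, do not grow */

		if (shost->can_queue < FCOE_CAN_QUEUE_MAX)
			shost->can_queue = min(shost->can_queue + 16,
					       FCOE_CAN_QUEUE_MAX);
	}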
> > Or can we just remove the qfull check? What was it for? Is qfull going
> > to be set for long periods of time? Do commands sit in the fc/fcoe layer
> > for a long time before being transmitted?
> > What do you think?
> Your explanation does make sense.
> The reason for qfull is that there are limits to how much we can send on
> the net without getting frames dropped. Once the ring is full in the
> driver, and the queues at the netdev layer are full, frames will get
> dropped, and it's better to tell SCSI to stop issuing requests if this happens.
> The queue will not get unpaused until we manage to send something
> more or until the polling timer goes off.
> We should make the fcoe_watchdog() timer go faster while the queue is full. Right now
> it waits up to a full second, which is obviously wrong. Maybe something
> like 1 or 2 jiffies would be better.
In that case we should start fcoe_watchdog only when qfull is set,
otherwise it will mostly do nothing, and it should be made per interface.
I'll do that change, but if can_queue ramp up is added here, then
1 or 2 jiffies might be too quick to start the can_queue ramp up; or maybe
can_queue ramp up and down should be done completely in libfc, based on IO
success and failure rate conditions.
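Something along these lines for the timer is what I have in mind (a sketch
only; the timer would live in the per-interface fcoe structure, and
FCOE_QFULL_TIMER_DELAY is a made-up name):

	#include <linux/timer.h>
	#include <linux/jiffies.h>
	#include <scsi/libfc.h>

	#define FCOE_QFULL_TIMER_DELAY	2	/* jiffies, per Joe's suggestion */

	/* arm the per-interface backlog timer only while qfull is set;
	 * when qfull is clear the timer is simply left unarmed */
	static void fcoe_qfull_timer_arm_sketch(struct fc_lport *lp,
						struct timer_list *timer)
	{
		if (lp->qfull)
			mod_timer(timer, jiffies + FCOE_QFULL_TIMER_DELAY);
	}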