[Open-FCoE] [RESUBMIT-v2 PATCH] libfc, fcoe: fixed locking issues with lport->lp_mutex around lport->link_status

Joe Eykholt jeykholt at cisco.com
Wed Apr 15 21:43:54 UTC 2009

Vasu Dev wrote:
> On Mon, 2009-04-13 at 10:27 -0700, Joe Eykholt wrote:
>> Mike Christie wrote:
>>> Vasu Dev wrote:
>>>> The fcoe_xmit could call fc_pause in case the pending skb queue len is larger
>>>> than FCOE_MAX_QUEUE_DEPTH, the fc_pause was trying to grab lport->lp_muex to
>>>> change lport->link_status and that had these issues :-
>>>> 1. The fcoe_xmit was getting called with bh disabled, thus causing
>>>> "BUG: scheduling while atomic" when grabbing lport->lp_muex with bh disabled.
>>>> 2. fc_linkup and fc_linkdown function calls lport_enter function with
>>>> lport->lp_mutex held and these enter function in turn calls fcoe_xmit to send
>>>> lport related FC frame, e.g. fc_linkup => fc_lport_enter_flogi to send flogi
>>>> req. In this case grabbing the same lport->lp_mutex again in fc_puase from
>>>> fcoe_xmit would cause deadlock.
>>>> The lport->lp_mutex was used for setting FC_PAUSE in fcoe_xmit path but
>>>> FC_PAUSE bit was not used anywhere beside just setting and clear this
>>>> bit in lport->link_status, instead used a separate field qfull in fc_lport
>>>> to eliminate need for lport->lp_mutex to track pending queue full condition
>>>> and in turn avoid above described two locking issues.
>>>> Also added check for lp->qfull in fc_fcp_lport_queue_ready to trigger
>>>> SCSI_MLQUEUE_HOST_BUSY when lp->qfull is set to prevent more scsi-ml cmds
>>>> while lp->qfull is set.
>>>> This patch eliminated FC_LINK_UP and FC_PAUSE and instead used dedicated
>>>> fields in fc_lport for this, this simplified all related conditional
>>>> code.
>>>> Also removed fc_pause and fc_unpause functions and instead used newly added
>>>> lport->qfull directly in fcoe.
>>> Hey,
>>> I am not sure what is is about this patch exactly (ended up here using 
>>> git revert), but I am hitting a performance regression when doing WRITES 
>>> (READs are fine).
> Before this patch we had "BUG: scheduling while atomic" due to grabbing
> lport->lp_mutex to set/reset FC_PAUSE flag, the lport->lp_mutex use
> around FC_PAUSE also had possibility of deadlock as explained in point 2
> of patch description, mainly these fixes required this patch.
> Removed FC_PAUSE was equivalent of added qfull but was never used before
> beside just setting  FC_PAUSE bit flag when fcoe_pending_queue.qlen
> backlog exceeded previously configured threshold FCOE_MAX_QUEUE_DEPTH
> (256) and this patch does same for qfull but also used added qfull to
> return SCSI_MLQUEUE_HOST_BUS on qfull=1, should we do that or not ? I
> think we should as Joe also mentioned below, otherwise
> fcoe_pending_queue.qlen backlog build up will consume additional system
> resources especially with heavy on IO on several luns and then eh will
> lead to thrash, however we do have other controlling limits on resource
> usages as cmd_per_lun 32 and can_queue 1024 but their current values
> seems bit aggressive for FC devices and perhaps leading to adverse
> impact on performance with added return SCSI_MLQUEUE_HOST_BUS on
> qfull=1, more details below on these current aggressive values.
> I couldn't git-revert working for this patch completely due to several
> conflicts from recent major code changes to files modified by this patch
> and those were mainly for removed fcoe transport code  and added fip
> feature code. Instead I ended up doing testing with and without added
> qfull check to return SCSI_MLQUEUE_HOST_BUSY to understand impact on
> performance from this patch's added SCSI_MLQUEUE_HOST_BUSY return on
> qfull=1.
>>> Doing something like this (you may also need to change the IO scheduler 
>>> to noop (echo noop > /sys/block/sdb/queue/scheduler):
>>> disktest -PT -T30 -h1 -K128 -B256k -ID /dev/sdb  -D 0:100
>>> will get me 192 MB/s WRITE throughput without the patch. With the patch 
>>> that drops down to 8 MB/s.
> I don't see this much difference with or w/o qfull on above test, I see
> hardly any difference on slower tgts (2%) but yes big drop on RAM based
> fast tgt when qfull hits with drop as much as 50% on above test. So
> there is drop when qfull hits which make sense with faster tgt with your
> above test sending 256K blocks writes from 128 threads with current
> cmd_per_lun=32. Can currently available FC line rates keep up with this
> pace with this test's IO size with cmd_per_lun=32 limit?
> I checked lpfc and qla FC HBA are using cmd_per_lun as 3, so tried same
> and with that noticed qfull condition never hits and performance
> improved significantly on RAM bases tgt, they performed +35% better
> relative to their performance with cmd_per_lun=32 with ignoring qfull
> check for your above test.
> So may it is time to tune cmd_per_lun and it seems cmd_per_lun=3 would
> be better to align with other FC HBA setting and current FC netdev line
> rate. What you think ? Possibly also tune can_queue value from 1024 ? 
>>> We are hitting the qfull test (if I just remove it in fc_fcp.c then I 
>>> get 192 MB/s again) which is probably causing queuecommand to return 
>>> host busy. That may in turn cause IOs to sit in the queue until the plug 
>>> threshhold is hit or until a IO is completed which would also cause the 
>>> blk/scsi queue to be run again. Without the patch we would fill up the 
>>> fcp/fcoe queue until the scsi_host_template->can_queue (we are using 
>>> 1024 right now) is hit or until the device queue depth is hit (I have 
>>> this set to just 32). With this patch I only get a couple IOs running.
>>> I think if you want to return SCSI_MLQUEUE_HOST_BUSY when qfull is set, 
>>> then you want to add some code to unplug the queue when it is not set.
> I'm not sure what you meant by unplug the queue and how to do that ?
>>> When it clears it does not mean that has a command has completed  (a 
>>> command completion will force the blk/scsi queues to be run (this is 
>>> will cause the queuecommand to be called)), so you will not get the 
>>> blk/scsi queues running again quickly.
> fcoe_watchdog will flush fcoe_pending_queue.qlen backlog and that will
> cause pending cmd completion but this timer is slow as Joe mentioned
> below so once qfull hits this leads to slower performance but can mostly
> avoid qfull condition with cmd_per_lun=3 as explained above.
> In any case eh will also cause IO completion, will that force blk/scsi
> queues to run or any explicit api to trigger this quickly once
>>> Another alternative is to not return SCSI_MLQUEUE_HOST_BUSY when qfull 
>>> is set. You can then just add some code in the t.eh_timed_out to check 
>>> if qfull was hit recently and if it was then give the command some more 
>>> time.
> This solution won't prevent scsi-ml from sending more commands down to
> fcoe than FC line rate can handle, so this will cause
> fcoe_pending_queue.qlen backlog build up and frequent defer request in
> added t.eh_timed_out. However this can be added but preventing more
> fcoe_pending_queue.qlen backlog build will throttle traffic at source
> and would avoid need for extra steps such as this in t.eh_timed_out.
> How about reducing can_queue along with SCSI_MLQUEUE_HOST_BUSY return,
> however that won't help for this single lun test here, so can we
> dynamically adjust cmd_per_lun also? If we can then that would be ideal
> instead full stop by use of SCSI_MLQUEUE_HOST_BUSY return.
> The fcoe still need to be fixed for can_queue ramp up, how about adding
> code in fcoe_watchdog to ramp up can_queue when qfull=0 on every run
> but still can_queue ramp up logic need to be fixed in libfc first.
> If I remember correctly, Mike you mentioned once about implementing
> generic solution on can_queue ramp up and down for all HBA, not sure if
> I missed any recent update on that on linux-scsi mail list or somewhere
> else.
>>> Or can we just remove the qfull check? What was it for? Is qfull going 
>>> to be set for long periods of time? Do commands sit in the fc/fcoe layer 
>>> for a long time before being transmitted?
>>> What do you think?
>>> _______________________________________________
>>> devel mailing list
>>> devel at open-fcoe.org
>>> http://www.open-fcoe.org/mailman/listinfo/devel
>> Your explanation does make sense.
>> The reason for qfull is that there are limits to how much we can send on
>> the net without getting frames dropped.  Once the ring is full in the
>> driver, and the queues at the netdev layer are full, frames will get
>> dropped, and its better to tell SCSI to stop issuing requests if this
>> persists.
>> The queue will not get unpaused until we manage to send something
>> more or until the polling timer goes off.
>> We should make fcoe_watchdog() timer go faster while the queue is full.  Right now
>> it waits up to a full second which is obviously wrong.  Maybe something
>> like 1 or 2 jiffies would be better.
> In that case we should start fcoe_watchdog only when qfull is set
> otherwise mostly this will do nothing and should be made per interface.
> I'll do that change, but in case can_queue ramp up added here then this
> 1 or 2 jiffies might be too quick to start can_queue ramp up or may be
> can_queue ramp up and down should be completely in libfc based on IO
> success and failure rates conditions.
> 	Vasu

Making the timer a little smarter is good, but I like the idea that it
should be rare to hit qfull, and as you say, perhaps cmd_per_lun is too high.


More information about the devel mailing list