[Open-FCoE] [RESUBMIT-v2 PATCH] libfc, fcoe: fixed locking issues with lport->lp_mutex around lport->link_status

Mike Christie michaelc at cs.wisc.edu
Wed Apr 22 06:50:59 UTC 2009


Vasu Dev wrote:
> On Mon, 2009-04-20 at 23:47 -0500, Mike Christie wrote:
>> Mike Christie wrote:
>>> It makes a giant improvement. I am only down about 5%. There is a weird 
>>> period during the test startup where it takes a couple seconds to get to 
>>> its max throughput. If I just set it to 3 commands there is no delay seen.
>> I meant the max throughput I can get with your patch, which was 5%
>> lower than the 3-command case or the case without the qfull change.
> 
> Good to know that this is better now, and thanks for the quick
> turnaround on this. I also did some measurements using the previously
> mentioned single-LUN test "disktest -PT -T30 -h1 -K128 -B256k -ID
> /dev/sdx  -D 0:100", with and without the recently submitted patches
> improving fcoe pending queue handling, on top of this commit of the
> fcoe-fixes tree:
> 
> 	commit b4e00c6751c5c38a8763859aacbbeae5ba39af1a
> 	Author: Abhijeet Joglekar <abjoglek at cisco.com>
> 	Date:   Fri Apr 3 14:58:30 2009 -0700
>     	libfc: whenever queueing delete ev for rport, set state to NONE
> 
> I did three measurements for each case listed below and then selected
> the best of the three, to understand the relative impact on performance
> and whether there is any need to tune cmd_per_lun or remove the qfull
> check. The cases were:
> 
>  	1. No parameter change.
>  	2. Ignore the added qfull check.
>  	3. Reduce cmd_per_lun to 3 from the current 32.
>  
> I'm using only the relative % change from the lowest measured value to
> judge the impact of any change to cmd_per_lun or the qfull check. I
> think relative values are more helpful here; besides, I just used my
> devel setup, which is not tuned for best performance, so absolute
> performance values wouldn't mean much.
> 
> I used two types of LUNs with the same 10G NIC and a Cisco Nexus
> switch, one CLARiiON LUN and one RAM-based LUN. Here are my findings,
> with LUN details:
> 
> 1. LUN: DGC, Model: RAID 5, ver: 0326
> 
> Case:            fcoe-fixes tip    tip + fcoe queue handling patches
> No change:       36.034%           42.35% (highest)
> Ignore qfull:    38.10%            39.29%
> cmd_per_lun=3:   0% (lowest)       5.78%
> 
> 2. LUN: SANBlaze, Model: VLUN FC RAMDisk, Rev: 4.1
> 
> Case:            fcoe-fixes tip    tip + fcoe queue handling patches
> No change:       0% (lowest)       22.11% (close to highest)
> Ignore qfull:    6.52%             23.24% (highest)
> cmd_per_lun=3:   13.63%            22.44% (close to highest)
> 
> I see case one (no parameter change) with the fcoe queue handling
> patches applied as the win for both LUNs for the above mentioned
> disktest.
> 
> However, this is just the single-LUN case with 256K writes, so using
> cmd_per_lun=3 might still be needed when using too many LUNs (>=256).
> How about I make another patch with a tunable cmd_per_lun fcoe
> parameter with a default value of 3? I'm suggesting a default of 3
> since other FC HBAs use this value, besides it will likely be helpful

You saw in the previous mail where I said they set cmd_per_lun to 3, but 
they end up setting the queue depth to 30 (lpfc) or 32 (qla2xxx) right 
away, right? When we scan the device it uses 3, but after that and 
before we start IO, the FC driver sets the queue depth higher in the 
slave_configure callout.

So if you set cmd_per_lun to 3, you also want to add a slave_configure 
callout that increases this to a normal default. Three commands is not 
the queue depth we normally end up with for any FC driver; three 
commands is very low.
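
A rough sketch of that idea (hypothetical function name and depth
value, not actual fcoe code; the 32 is just assumed to match what
qla2xxx ends up with):

  #include <scsi/scsi.h>
  #include <scsi/scsi_device.h>
  #include <scsi/scsi_host.h>

  #define EXAMPLE_FCOE_QUEUE_DEPTH 32   /* assumed default, like qla2xxx */

  /* hypothetical callout: raise the depth that cmd_per_lun set at scan time */
  static int example_fcoe_slave_configure(struct scsi_device *sdev)
  {
          if (sdev->tagged_supported)
                  scsi_adjust_queue_depth(sdev, MSG_SIMPLE_TAG,
                                          EXAMPLE_FCOE_QUEUE_DEPTH);
          else
                  scsi_adjust_queue_depth(sdev, 0, EXAMPLE_FCOE_QUEUE_DEPTH);
          return 0;
  }

  static struct scsi_host_template example_fcoe_sht = {
          /* ... */
          .cmd_per_lun     = 3,   /* depth used at scan time, raised above */
          .slave_configure = example_fcoe_slave_configure,
          /* ... */
  };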

I am not 100% sure, but I think if you are sending too many IOs at the 
device level then FC boxes will return QUEUE_FULL. If we get that then 
libfc ramps down as you know.
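
For reference, the generic midlayer pattern for that ramp-down (an
illustrative snippet with made-up function names, not the libfc code
itself):

  #include <linux/types.h>
  #include <scsi/scsi.h>
  #include <scsi/scsi_cmnd.h>
  #include <scsi/scsi_device.h>

  /* illustrative only: lower the per-LUN depth when the target says QUEUE FULL */
  static void example_handle_scsi_status(struct scsi_cmnd *sc, u8 sam_status)
  {
          struct scsi_device *sdev = sc->device;

          if (sam_status == SAM_STAT_TASK_SET_FULL)
                  /*
                   * scsi_track_queue_full() reduces sdev->queue_depth once
                   * the device keeps reporting QUEUE FULL.
                   */
                  scsi_track_queue_full(sdev, sdev->queue_depth - 1);
  }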



> with too many LUNs or other IO sizes. Although I do see a negative perf
> impact with 3 on the first LUN above, you noticed 5% better with
> cmd_per_lun=3, so I guess your LUN could be close to LUN 2 above, since
> that case has close numbers for all three cases with the patches
> applied (right column).
> 
> I'm not sure if I need to remove the qfull check at this point by looking at the

I think lport->qfull and what we ended up talking about in this thread 
(logical unit queue_depth) are two different things. lport->qfull is for 
when we are over-queueing at the host port level. Over-queueing on a 
device is a separate issue. lport->qfull would go with shost->can_queue 
modifications.
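
To make the distinction concrete, a hypothetical snippet (not the
actual libfc/fcoe queuecommand path):

  #include <scsi/scsi.h>
  #include <scsi/scsi_cmnd.h>
  #include <scsi/libfc.h>

  /* hypothetical gate, just to show which knob belongs to which level */
  static int example_queuecommand_gate(struct fc_lport *lport,
                                       struct scsi_cmnd *sc)
  {
          /*
           * Host/port level: lport->qfull means the port itself is
           * over-queued, so tuning it goes with shost->can_queue.
           */
          if (lport->qfull)
                  return SCSI_MLQUEUE_HOST_BUSY;

          /*
           * Device level is separate: the midlayer already caps
           * outstanding commands per LUN at sc->device->queue_depth,
           * which cmd_per_lun/slave_configure and QUEUE FULL ramp-down
           * control.
           */
          return 0;
  }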


> data above; however, I'm fine either way, but would prefer to remove
> this check only once we are sure it is hurting perf even with the
> recently added fcoe queue handling improvements.
> 
> What do you think, especially about the suggested tunable cmd_per_lun
> with a default value of 3?
> 
> 	Vasu
> 



