[Open-FCoE] [RFC PATCH 00/16] rport state machine and locking

Mike Christie michaelc at cs.wisc.edu
Wed Sep 10 17:41:26 UTC 2008


Robert Love wrote:
> On Wed, 2008-09-10 at 10:03 -0500, Mike Christie wrote:
>> Robert Love wrote:
>>> On Tue, 2008-09-09 at 10:22 -0700, Robert Love wrote:
>>>> The following series implements a change to the rport state machine and
>>>> applies a locking strategy to that state machine.
>>>>
>>>> 1) A callback function is added to the LP for RP events (READY, FAILED, LOGO)
>>>> 2) GNN_ID and GPN_ID are removed, rport state machine starts at PLOGI
>>>>    - uses a dummy rport until PLOGI response
>>>> 3) A READY state is added to the RP state machine
>>>> 4) The RP state machine locking has been given a locking strategy
>>>> 5) RP only has one error/retry handling routine
>>>> 6) Naming conventions are changed
>>>>     rport (remote port) = fc_rport pointer
>>>>     rdata (remote port private) = fc_rport_libfc_priv pointer
>>>>
>>>> I think these patches still need a little cleaning up, but they're mostly good.
>>>> I'm curious what everyone thinks about the folowing changes to:
>>>>
>>>> a) the locking policy
>>>> b) the event callback mechanism
>>>> c) the consolidation of fc_rport_error and fc_rport_retry.
>>>>
>>>> All comments are welcome. If there are no major issues I'll do a little cleanup,
>>>> drop the RFC, resubmit and then commit.
>>> I have three problems with this patch-set right now.
>>>
>>> 1. Need to delete dummy rports when there is a timeout scenario since
>>> they're not tracked by the FC transport class.
>>>
>>> 2. Should delay the retry a bit if there is a resource allocation error.
>>>
>>> 3. Can't hold locks while calling some functions.
>>>
>>> It's this last one that's going to be difficult. My problem is that I
>>> want to call fc_remote_port_add/delete while I'm holding the rport lock.
>>> Those functions expect to be called without a lock held. Also, I haven't
>>> finished the lport locking, but I expect that the event_callback
>>> function will want to lock the lport so that it can change the lport's
>>> state.
>>>
>>> In all of these cases I want to call a function while I'm holding the
>>> rport lock and those functions expect to be called without the lock
>>> held. Right now my patches drop the lock momentarily before calling one
>>> of these functions and then re-grab the lock when the function returns.
>>> This is not ideal. If I keep it this way I'd need to re-check the state
>>> once I re-grab the lock and it'll get ugly.
>>>
>> For the fc class rport issues the problem is that those functions can 
>> use GFP_KERNEL or flush a work queue? Do we have to use spin locks for 
> 
> I believe so. The fc_remote_port_add() function uses GFP_KERNEL.
> 
>> the rport and lrport locking? Why not use a mutex or semaphore, since we 
>> have process context and it is not performance critical?
> 
> That's a good idea. I changed my patches to use a semaphore and it's
> working just fine. However, I don't understand the contexts that we
> might be called in from SCSI. Does SCSI always call us from process
> context? For recv and sysfs we should always be in process context, but
> SCSI I'm not sure...
> 

Did you change the rport and lport or just the rport locking?

For normal IO paths we can be called from a softirq or thread. This path 
never takes the rport or lport lock though.

The scsi eh paths (abort, lun reset, host reset, etc) will always be 
called from a thread. host reset will call lport_reset. I think in the 
code with your patches that code might be missing a lock. the 
tt->lport_reset is set to fc_lport_enter_reset and I thought you want to 
hold the lport lock.

If you are changing the rport and lrport locking then don't forget about 
checking out the lrport code. I think you want to change that to use a 
workqueue like how the rport code is, because fc_rport_timeout is run 
from a work queue can be use a semaphore, but fc_lport_timeout is run 
from a timer (softirq/bottom-half) context so that cannot sleep.

I think actually my comment about having process context is wrong for 
the fc_exch.c code paths too. fc_exch_timeout would need to be converted 
to use the workqueue like the rport code too. fc_exch_timeout is run 
from a timer and can call into lport/rport/ns code that takes a lock.



More information about the devel mailing list