[Open-FCoE] [RFC PATCH 00/16] rport state machine and locking

Robert Love robert.w.love at intel.com
Wed Sep 10 17:48:54 UTC 2008

On Wed, 2008-09-10 at 12:41 -0500, Mike Christie wrote:
> Robert Love wrote:
> > On Wed, 2008-09-10 at 10:03 -0500, Mike Christie wrote:
> >> Robert Love wrote:
> >>> On Tue, 2008-09-09 at 10:22 -0700, Robert Love wrote:
> >>>> The following series implements a change to the rport state machine and
> >>>> applies a locking strategy to that state machine.
> >>>>
> >>>> 1) A callback function is added to the LP for RP events (READY, FAILED, LOGO)
> >>>> 2) GNN_ID and GPN_ID are removed, rport state machine starts at PLOGI
> >>>>    - uses a dummy rport until PLOGI response
> >>>> 3) A READY state is added to the RP state machine
> >>>> 4) The RP state machine locking has been given a locking strategy
> >>>> 5) RP only has one error/retry handling routine
> >>>> 6) Naming conventions are changed
> >>>>     rport (remote port) = fc_rport pointer
> >>>>     rdata (remote port private) = fc_rport_libfc_priv pointer
> >>>>
> >>>> I think these patches still need a little cleaning up, but they're mostly good.
> >>>> I'm curious what everyone thinks about the folowing changes to:
> >>>>
> >>>> a) the locking policy
> >>>> b) the event callback mechanism
> >>>> c) the consolidation of fc_rport_error and fc_rport_retry.
> >>>>
> >>>> All comments are welcome. If there are no major issues I'll do a little cleanup,
> >>>> drop the RFC, resubmit and then commit.
> >>> I have three problems with this patch-set right now.
> >>>
> >>> 1. Need to delete dummy rports when there is a timeout scenario since
> >>> they're not tracked by the FC transport class.
> >>>
> >>> 2. Should delay the retry a bit if there is a resource allocation error.
> >>>
> >>> 3. Can't hold locks while calling some functions.
> >>>
> >>> It's this last one that's going to be difficult. My problem is that I
> >>> want to call fc_remote_port_add/delete while I'm holding the rport lock.
> >>> Those functions expect to be called without a lock held. Also, I haven't
> >>> finished the lport locking, but I expect that the event_callback
> >>> function will want to lock the lport so that it can change the lport's
> >>> state.
> >>>
> >>> In all of these cases I want to call a function while I'm holding the
> >>> rport lock and those functions expect to be called without the lock
> >>> held. Right now my patches drop the lock momentarily before calling one
> >>> of these functions and then re-grab the lock when the function returns.
> >>> This is not ideal. If I keep it this way I'd need to re-check the state
> >>> once I re-grab the lock and it'll get ugly.
> >>>
> >> For the fc class rport issues the problem is that those functions can 
> >> use GFP_KERNEL or flush a work queue? Do we have to use spin locks for 
> > 
> > I believe so. The fc_remote_port_add() function uses GFP_KERNEL.
> > 
> >> the rport and lrport locking? Why not use a mutex or semaphore, since we 
> >> have process context and it is not performance critical?
> > 
> > That's a good idea. I changed my patches to use a semaphore and it's
> > working just fine. However, I don't understand the contexts that we
> > might be called in from SCSI. Does SCSI always call us from process
> > context? For recv and sysfs we should always be in process context, but
> > SCSI I'm not sure...
> > 
> Did you change the rport and lport or just the rport locking?
Well, currently I just changed the rport code. I'm trying to get through
that fist and then I'll do the lport stuff (reorg and locking). You're
right that it's all tied together though.

> For normal IO paths we can be called from a softirq or thread. This path 
> never takes the rport or lport lock though.
Excellent, I was just about to go through those paths to make sure.

> The scsi eh paths (abort, lun reset, host reset, etc) will always be 
> called from a thread. host reset will call lport_reset. I think in the 

Yeah, I just saw this too.

> code with your patches that code might be missing a lock. the 

I'm sure that it is. :) I'll get to it. I've got a start on the lport
stuff, but it still needs a fair amount of work.

> tt->lport_reset is set to fc_lport_enter_reset and I thought you want to 
> hold the lport lock.
> If you are changing the rport and lrport locking then don't forget about 
> checking out the lrport code. I think you want to change that to use a 
> workqueue like how the rport code is, because fc_rport_timeout is run 
> from a work queue can be use a semaphore, but fc_lport_timeout is run 
> from a timer (softirq/bottom-half) context so that cannot sleep.
Good point. I'll make this change...

> I think actually my comment about having process context is wrong for 
> the fc_exch.c code paths too. fc_exch_timeout would need to be converted 
> to use the workqueue like the rport code too. fc_exch_timeout is run 
> from a timer and can call into lport/rport/ns code that takes a lock.

I'll look into this too. Thanks for the comments.

More information about the devel mailing list