[Open-FCoE] [PATCH 00/28] rport/lport/ns rework

Mike Christie michaelc at cs.wisc.edu
Wed Oct 1 20:37:05 UTC 2008

Mike Christie wrote:
> Robert Love wrote:
>> The following series implements a change to the rport and lport state 
>> machines and
>> applies a locking strategy to that state machine.
>> 1) A callback function is added to the LP for RP events (READY, 
>> 2) GNN_ID and GPN_ID are removed, rport state machine starts at PLOGI
>>    - uses a dummy rport until PLOGI response
>> 3) A READY state is added to the RP state machine
>> 4) The RP and LP state machines locking has been given a strategy
>> 5) RP only has one error/retry handling routine
>> 6) LP only has one error/retry handling routine
>> 7) Naming conventions are changed
>>     rport (remote port) = fc_rport pointer
>>     rdata (remote port private) = fc_rport_libfc_priv pointer
>>     lport (local port) = fc_lport_pointer
>> 8) All interaction with scsi_transport_fc happens in fc_rport.c. Users of
>>    the RP state machine only create dummy rports and call rport_login(),
>>    the RP state machine deals with adding to the transport layer and
>>    deleting correctly.
>> There are some issues that I've found while working on this patch that 
>> will
>> need to be fixed.
>> 1) probably not cancelling work threads appropriately
>> 2) What happens if discovery fails? Currently it's silent.
>>    Should discovery be pulled into the lport state machine?
>> 3) reset is broken
>> 4) There needs to be a little more commenting about the locking
>> 5) Some of the #define variables need to be named better. In particular
>>    I don't like LPORT_RPORT_EV_* or the function lport_rport_event().
> 6). I think my mutex idea was a dud or maybe it needs two more changes. 
> The mutex is a problem currently because fc_exch.c can call back into 
> fc_rport.c or fc_lport.c by calling the ep->resp function. This is 
> causing problems when fc_exch_timeout calls the resp handler from the 
> timer context. We talked about this before and we can handle it by 
> queueing the timeout in delayed work.
> A new problem is due to the termiate_rport_io callback. If 
> fc_rport.c:fc_rport_terminate_io is run from scsi_transport_fc.c's host 
> work queue (work is queue by calling fc_queue_work), then it calls
> exch_mgr_reset and this calls a ep->resp function which calls back into 
> fc_rport.c then it could try to grab the rp_mutex. If another thread is 
> calling fc_remote_port_add on the same rport, that thread will be 
> holding the rp_mutex, and so when fc_remote_port_add calls fc_flush_work 
> we will lock up because the flush call will be waiting on the workqueue 
> that is calling fc_rport_terminate_io which is waiting for the mutex we 
> are holding.

Do we need to hold the mutex to sync up additions and deltions? This is 
one of the reasons right? For this second problem I think we can just 
take a page out of the other fc LLDs and what I stole from them for 
iscsi. It looks like what other LLDs do (at least qla2xxx and maybe 
lpfc) is that they have a single thread which will handle these events 
so all the addition and deletion calls to the fc class are synchronized 
through it. For iscsi we had less baggage and have a similar problem, so 
I just made a common event thread in the transport class to handle this 
for all iscsi drivers so each LLD did not have to duplicate the threading.

Does this help or is it way off? I just started going over the changes 
with Chris's changes too so I am not sure.

More information about the devel mailing list