[Open-FCoE] [PATCH 00/28] rport/lport/ns rework
michaelc at cs.wisc.edu
Wed Oct 1 20:23:45 UTC 2008
Robert Love wrote:
> The following series implements a change to the rport and lport state machines and
> applies a locking strategy to those state machines.
> 1) A callback function is added to the LP for RP events (READY, FAILED, LOGO)
> 2) GNN_ID and GPN_ID are removed, rport state machine starts at PLOGI
> - uses a dummy rport until PLOGI response
> 3) A READY state is added to the RP state machine
> 4) The RP and LP state machines locking has been given a strategy
> 5) RP only has one error/retry handling routine
> 6) LP only has one error/retry handling routine
> 7) Naming conventions are changed
> rport (remote port) = fc_rport pointer
> rdata (remote port private) = fc_rport_libfc_priv pointer
> lport (local port) = fc_lport pointer
> 8) All interaction with scsi_transport_fc happens in fc_rport.c. Users of
> the RP state machine only create dummy rports and call rport_login(),
> the RP state machine deals with adding to the transport layer and
> deleting correctly.
> There are some issues that I've found while working on this patch that will
> need to be fixed.
> 1) probably not cancelling work threads appropriately
> 2) What happens if discovery fails? Currently it's silent.
> Should discovery be pulled into the lport state machine?
> 3) reset is broken
> 4) There needs to be a little more commenting about the locking
> 5) Some of the #define variables need to be named better. In particular
> I don't like LPORT_RPORT_EV_* or the function lport_rport_event().
6) I think my mutex idea was a dud, or maybe it just needs two more changes.
The mutex is a problem currently because fc_exch.c can call back into
fc_rport.c or fc_lport.c by calling the ep->resp function. This is
causing problems when fc_exch_timeout calls the resp handler from the
timer context. We talked about this before and we can handle it by
queueing the timeout in delayed work.
A new problem is due to the terminate_rport_io callback. If
fc_rport.c:fc_rport_terminate_io is run from scsi_transport_fc.c's host
work queue (the work is queued by calling fc_queue_work), it calls
exch_mgr_reset, which calls an ep->resp function that calls back into
fc_rport.c, where it could try to grab the rp_mutex. If another thread is
calling fc_remote_port_add on the same rport, that thread will be
holding the rp_mutex, so when fc_remote_port_add calls fc_flush_work
we will lock up: the flush call will be waiting on the workqueue
that is running fc_rport_terminate_io, which in turn is waiting for the
mutex we hold.