[Open-FCoE] [PATCH v2] libfc: rport retry on LS_RJT from certain ELS
vasu.dev at linux.intel.com
Fri Jan 23 00:53:32 UTC 2009
Abhijeet Joglekar wrote:
> The problem seems to be that outstanding exchanges on a rogue remote
> port are not getting cleaned up when we unload the libFC module.
> We add the remote port to the discovery object's peer list only when the
> remote port moves from rogue -> real. During the transition though, the
> rogue port is not part of any list. Thus, if a port is in this
> transition state, and we try to remove the libFC module, the
> fc_disc_stop_rports() function does not find the rogue port and so does
> not flush its retry work.
> My guess is that the exch_mgr_reset(0,0) is resetting and freeing all
> exchanges, but just after that, retry work timeout fires for the rogue
> port and it creates another exchange which hangs around.
If this retry timer fires after the libfc stack is fully unloaded, it
will cause more serious issues, such as a kernel crash, than just an
exchange or stale rogue rport memory leak. Good catch. Not keeping track
of the rogue rport anywhere is the main issue here, as you also mention
below.
> What was the original motivation behind not keeping the rogue port in
> the disc object's peer list?
Not sure why we left the rogue port untracked; to me, keeping all rports
tracked in a single list, whether rogue or real, makes sense.
> Can we add the rogue port to the disc list
> right when it is created?
Yep, that would be the right thing to do. I think this could be done
around the fc_rport_rogue_create call, and in any case that call needs
to be fixed since it breaks libfc's cross LP/RP/EM/DISC block calling
convention. I mean, function calls across these libfc blocks should go
via lport->tt for portability, but currently fc_disc.c and fc_lport.c
call fc_rport_rogue_create directly.
Another reason to do it around the fc_rport_rogue_create call is that
currently only fc_disc_rport_callback adds the rport to the list, but
fc_rport_rogue_create is also called from fc_lport.c with
fc_lport_rport_callback, and fc_lport_rport_callback shouldn't access
fc_disc->rports directly to add an rport.
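To illustrate the kind of indirection meant here, below is a minimal
user-space sketch, not the actual libfc code. All names in it
(lport_template, rport_create, default_rport_create, rport_init,
disc_new_target) are hypothetical stand-ins chosen for the example. It
only shows one block reaching rport creation through a function pointer
held by the lport rather than calling the creation routine directly.

```c
#include <assert.h>
#include <stddef.h>

struct lport;

/* Hypothetical, simplified rport; not the real libfc structure. */
struct rport {
	unsigned int port_id;
	int rogue;			/* 1 until the port becomes real */
};

/* Hypothetical function template, standing in for lport->tt. */
struct lport_template {
	struct rport *(*rport_create)(struct lport *lp, unsigned int port_id);
};

struct lport {
	struct lport_template tt;
	struct rport slot;		/* one slot is enough for the sketch */
	int created;			/* number of rports created so far */
};

/* Default implementation supplied by the rport block. */
static struct rport *default_rport_create(struct lport *lp,
					  unsigned int port_id)
{
	lp->slot.port_id = port_id;
	lp->slot.rogue = 1;
	lp->created++;
	return &lp->slot;
}

/* Fill in the template entry only if no override was installed. */
static void rport_init(struct lport *lp)
{
	if (!lp->tt.rport_create)
		lp->tt.rport_create = default_rport_create;
}

/* The discovery block never names the creation routine directly. */
static struct rport *disc_new_target(struct lport *lp, unsigned int port_id)
{
	assert(lp->tt.rport_create != NULL);
	return lp->tt.rport_create(lp, port_id);
}
```

With this shape, the list insertion could live inside the single creation
routine behind the template entry, so neither the disc nor the lport
callback would need to touch the rport list on its own.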
> Its trans_State would identify whether it's a
> rogue or a real.
Yep, and this could be used as required, for instance to figure out how
to delete a created rport.
> This way, lookup will find all remote ports (rogue or
> real) and clean up exchanges on all.
That all makes sense.
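A rough user-space sketch of the idea discussed above, assuming a plain
singly linked list and a boolean in place of the real retry work item.
The structures and function names (disc, rport, rport_rogue_create,
rport_make_real, disc_stop_rports) are simplified stand-ins for
illustration, not the libfc definitions. The point is only that an rport
placed on the list at creation time stays visible to teardown while it is
still rogue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the rport transition state. */
enum trans_state { TRANS_ROGUE, TRANS_REAL };

struct rport {
	struct rport *next;		/* link in the discovery list */
	enum trans_state state;		/* rogue until it becomes real */
	bool retry_pending;		/* models queued retry work */
};

struct disc {
	struct rport *rports;		/* every rport, rogue or real */
};

/* Create a rogue rport and put it on the list right away. */
static struct rport *rport_rogue_create(struct disc *disc)
{
	struct rport *rp;

	assert(disc != NULL);
	rp = calloc(1, sizeof(*rp));
	if (!rp)
		return NULL;
	rp->state = TRANS_ROGUE;
	rp->next = disc->rports;
	disc->rports = rp;
	return rp;
}

/* Promotion only flips the state; list membership does not change. */
static void rport_make_real(struct rport *rp)
{
	rp->state = TRANS_REAL;
}

/*
 * Teardown: walk every tracked rport, cancel any pending retry, and
 * free it. Returns the number of retries that were cancelled.
 */
static int disc_stop_rports(struct disc *disc)
{
	int cancelled = 0;
	struct rport *rp = disc->rports;

	while (rp) {
		struct rport *next = rp->next;

		if (rp->retry_pending) {
			rp->retry_pending = false;
			cancelled++;
		}
		free(rp);
		rp = next;
	}
	disc->rports = NULL;
	return cancelled;
}
```

In the real stack the cancel step would have to flush the delayed work
before any exchange reset, so that no retry can create a new exchange
afterwards; the boolean here only models that ordering.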
> BTW, before applying Chris's LS_RJT retry patch, a reject to a Plogi
> from a libFC initiator to another libFC initiator wasn't resulting in a
> retry, so the rogue port would get deleted right away after 1 plogi try
> and so we were not hitting this issue. After I applied the patch, and
> increased the number of plogi retries, I started hitting this issue.
This patch increased the probability of hitting this issue, but the
issue was already there because rogue rports are not tracked on any list
from which they could later be purged when the libfc stack is unloading.
I mean, we were already calling fc_rport_error when fc_frame_alloc or
elsct_send failed, even before this latest patch from Chris.
> -- abhijeet