[Open-FCoE] [PATCH v2] libfc: rport retry on LS_RJT from certain ELS

Vasu Dev vasu.dev at linux.intel.com
Fri Jan 23 00:53:32 UTC 2009


Abhijeet Joglekar wrote:
> The problem seems to be that outstanding exchanges on a rogue remote
> port are not getting cleaned up when we unload the libFC module. 
>
> We add the remote port to the discovery object's peer list only when the
> remote port moves from rogue -> real. During the transition though, the
> rogue port is not part of any list. Thus, if a port is in this
> transition state, and we try to remove the libFC module, the
> fc_disc_stop_rports() function does not find the rogue port and so does
> not flush its retry work.
>
> My guess is that the exch_mgr_reset(0,0) is resetting and freeing all
> exchanges, but just after that, retry work timeout fires for the rogue
> port and it creates another exchange which hangs around.
>   

If this retry timer fires after the libfc stack is fully unloaded, it
will cause more serious problems, such as a kernel crash, than just a
leaked exchange or stale rogue rport memory. Good catch. Not tracking
the rogue rport anywhere is the main issue here, as you also mention below.

> What was the original motivation behind not keeping the rogue port in
> the disc object's peer list? 

Not sure why we left the rogue port untracked; to me, keeping all rports
tracked in a single list, whether rogue or real, makes sense.

> Can we add the rogue port to the disc list
> right when it is created? 

Yep, that would be the right thing to do. I think this could be done
around the fc_rport_rogue_create call, and the way fc_rport_rogue_create
is called needs to be fixed anyway, since it breaks the libfc convention
for calls across the LP/RP/EM/DISC blocks. I mean function calls across
these libfc blocks should go through lport->tt for portability, but
currently fc_disc.c and fc_lport.c call fc_rport_rogue_create directly.

Another reason to do it around the fc_rport_rogue_create call is that
currently only fc_disc_rport_callback adds the rport to the list, but
fc_rport_rogue_create is also called from fc_lport.c with
fc_lport_rport_callback, and fc_lport_rport_callback shouldn't access
fc_disc->rports directly to add an rport.
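
To illustrate what I mean, here is a rough userspace sketch, not the
actual libfc code. All structure and function names in it (fn_template,
rport_rogue_create, disc_new_target, lport_enter_dns and so on) are
hypothetical stand-ins. It assumes creation is reached only through a
function-template hook and that the hook itself adds the new rogue rport
to the single tracked list, so both callers get tracking without touching
the list directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the real libfc structures. */
enum rport_trans_state { RPORT_ROGUE, RPORT_REAL };

struct rport {
	unsigned int port_id;
	enum rport_trans_state trans_state;
	struct rport *next;
};

struct lport;

/* Stand-in for the per-lport function template reached via lport->tt. */
struct fn_template {
	struct rport *(*rport_create)(struct lport *lp, unsigned int port_id);
};

struct lport {
	struct fn_template tt;
	struct rport *rports;	/* every rport, rogue or real */
};

/* Create a rogue rport and track it immediately, at creation time. */
static struct rport *rport_rogue_create(struct lport *lp, unsigned int port_id)
{
	struct rport *rp = calloc(1, sizeof(*rp));

	if (!rp)
		return NULL;
	rp->port_id = port_id;
	rp->trans_state = RPORT_ROGUE;
	rp->next = lp->rports;
	lp->rports = rp;
	return rp;
}

static void lport_init(struct lport *lp)
{
	assert(lp != NULL);
	lp->rports = NULL;
	lp->tt.rport_create = rport_rogue_create;
}

/* Caller in the discovery block: goes through the template only. */
static struct rport *disc_new_target(struct lport *lp, unsigned int port_id)
{
	return lp->tt.rport_create(lp, port_id);
}

/* Caller in the local port block: same hook, no direct list access. */
static struct rport *lport_enter_dns(struct lport *lp, unsigned int port_id)
{
	return lp->tt.rport_create(lp, port_id);
}

/* Lookup sees rogue and real rports alike. */
static struct rport *rport_lookup(struct lport *lp, unsigned int port_id)
{
	struct rport *rp;

	for (rp = lp->rports; rp; rp = rp->next)
		if (rp->port_id == port_id)
			return rp;
	return NULL;
}

static size_t rport_count(const struct lport *lp)
{
	const struct rport *rp;
	size_t n = 0;

	for (rp = lp->rports; rp; rp = rp->next)
		n++;
	return n;
}
```

With this shape neither caller needs to know where the list lives; only
the creation hook does.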

> Its trans_State would identify whether it's a
> rogue or a real. 

Yep, and this could be used as required, for instance to figure out how
to delete a created rport.
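
As a rough sketch of that idea only (the names rport_delete_path,
DELETE_FREE_LOCAL and DELETE_VIA_TRANSPORT are made up for illustration,
and the "registered" flag is just a stand-in for whatever a real rport
would need to undo), the state could select the delete path like this:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; NOT the real libfc definitions. */
enum rport_trans_state { RPORT_ROGUE, RPORT_REAL };
enum delete_path { DELETE_FREE_LOCAL, DELETE_VIA_TRANSPORT };

struct rport {
	enum rport_trans_state trans_state;
	int registered;		/* stand-in: 1 once the rport became real */
};

static struct rport *rport_alloc(enum rport_trans_state state)
{
	struct rport *rp = calloc(1, sizeof(*rp));

	if (!rp)
		return NULL;
	rp->trans_state = state;
	rp->registered = (state == RPORT_REAL);
	return rp;
}

/* Pick the delete path purely from the transition state. */
static enum delete_path rport_delete_path(const struct rport *rp)
{
	assert(rp != NULL);
	return rp->trans_state == RPORT_ROGUE ?
		DELETE_FREE_LOCAL : DELETE_VIA_TRANSPORT;
}

/* Delete an rport and report which path was taken. */
static enum delete_path rport_delete(struct rport *rp)
{
	enum delete_path path = rport_delete_path(rp);

	if (path == DELETE_VIA_TRANSPORT)
		rp->registered = 0;	/* stand-in for unregistering first */
	free(rp);
	return path;
}
```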

> This way, lookup will find all remote ports (rogue or
> real) and clean up exchanges on all.
>
>   

All makes sense.
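
A toy userspace model of the unload ordering, to show why tracking
matters. Everything in it is a simplified assumption: disc_stop_rports,
exch_mgr_reset and retry_timer_fire are hypothetical stand-ins for the
real stop, exchange reset and retry work, and a boolean flag replaces the
delayed work item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real libfc structures. */
struct exch_mgr { int live_exchanges; };

struct rport {
	bool retry_pending;	/* stand-in for queued retry work */
	struct rport *next;
};

struct disc { struct rport *rports; };

static void disc_track(struct disc *d, struct rport *rp)
{
	assert(d != NULL && rp != NULL);
	rp->next = d->rports;
	d->rports = rp;
}

/* Cancel retry work, but only for rports the list knows about. */
static void disc_stop_rports(struct disc *d)
{
	struct rport *rp;

	for (rp = d->rports; rp; rp = rp->next)
		rp->retry_pending = false;
}

static void exch_mgr_reset(struct exch_mgr *em)
{
	em->live_exchanges = 0;
}

/* A retry that is still pending allocates a fresh exchange. */
static void retry_timer_fire(struct rport *rp, struct exch_mgr *em)
{
	if (!rp->retry_pending)
		return;
	rp->retry_pending = false;
	em->live_exchanges++;
}

/*
 * Unload sequence followed by a late timer on one rport.
 * Returns the number of exchanges left behind.
 */
static int unload_then_fire(struct disc *d, struct exch_mgr *em,
			    struct rport *late)
{
	disc_stop_rports(d);
	exch_mgr_reset(em);
	retry_timer_fire(late, em);
	return em->live_exchanges;
}
```

In this model a tracked rport leaves nothing behind after unload, while
an untracked one with a pending retry leaves one exchange, which matches
the leak described above.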

> BTW, before applying Chris's LS_RJT retry patch, a reject to a Plogi
> from a libFC initiator to another libFC initiator wasn't resulting in a
> retry, so the rogue port would get deleted right away after 1 plogi try
> and so we were not hitting this issue. After I applied the patch, and
> increased the number of plogi retries, I started hitting this issue.
>   

This patch increased the probability of hitting this issue, but the
issue was already there, because rogue rports were not tracked on any
list that could later be used to purge them when the libfc stack
unloads. I mean we were already calling fc_rport_error when
fc_frame_alloc or elsct_send failed, even before this latest patch from Chris.

> -- abhijeet