[Open-FCoE] [PATCH v2] libfc: rport retry on LS_RJT fromcertain ELS

Abhijeet Joglekar ajoglekar at nuovasystems.com
Wed Jan 21 23:38:42 UTC 2009


> -----Original Message-----
> From: Abhijeet Joglekar
> Sent: Wednesday, January 21, 2009 2:25 PM
> To: 'Robert Love'
> Cc: devel at open-fcoe.org
> Subject: RE: [Open-FCoE] [PATCH v2] libfc: rport retry on LS_RJT
> fromcertain ELS
> 
> > -----Original Message-----
> > From: Robert Love [mailto:robert.w.love at linux.intel.com]
> > Sent: Wednesday, January 21, 2009 10:14 AM
> > To: Abhijeet Joglekar
> > Cc: devel at open-fcoe.org
> > Subject: Re: [Open-FCoE] [PATCH v2] libfc: rport retry on LS_RJT
> > fromcertain ELS
> >
> > On Tue, 2009-01-20 at 18:33 -0800, Abhijeet Joglekar wrote:
> > > Chris,
> > >
> > > I am getting exchange leak, and hence a crash while unloading
libfc
> > > module after applying this patch. Are you seeing this in your
tests?
> > >
> > I've created a fcoe-fixes.git tree that I'm going to use to send
patches
> > up to linux-scsi. It's based on Linus' tree and I rebased yesterday
> > (maybe the day before) to 2.6.29-rc2  and applied all of our (FCoE
devel
> > list) outstanding patches. There might be a leak since my test
scripts
> > don't test for it, but my test scrips have done a lot of loading and
> > unloading of the modules and I don't see a crash.
> >
> > How are you noticing the leak?
> 
> fc_destroy_exch_mgr() is complaining during module unload that all
> exchanges in "libfc_em" are not free.
> 
> I am not hitting the WARN_ON in fc_exch_mgr_free() where it checks for
the
> total_exches to be zero. So the exch ptr is getting removed from the
> manager exchs array and list, but probably has a non-zero reference
count
> and so does not get released. Maybe we need to put a WARN_ON() when an
ep
> is removed from the mp list, but its reference count is non-zero and
thus
> cannot be freed.
> 
> I am not current though on the libFC patches that have gone in after
the
> 1st week of December, so might be missing some patch (bug fix) that
maybe
> got exposed after applying only the rport retry on LS_RJT patch.
> 
> -- abhijeet

The problem seems to be that outstanding exchanges on a rogue remote
port are not getting cleaned up when we unload the libFC module. 

We add the remote port to the discovery object's peer list only when the
remote port moves from rogue -> real. During the transition though, the
rogue part is not part of any list. Thus, if a port is in this
transition state, and we try to remove the libFC module, the
fc_disc_stop_rports() function does not find the rogue port and so does
not flush its retry work. 

My guess is that the exch_mgr_rest(0,0) is resetting and freeing all
exchanges, but just after that, retry work timeout fires for the rogue
port and it creates another exchange which hangs around.

What was the original motivation behind not keeping the rogue port in
the disc object's peer list? Can we add the rogue port to the disc list
right when it is created? Its trans_State would identify whether it's a
rogue or a real. This way, lookup will find all remote ports (rogue or
real) and clean up exchanges on all.

BTW, before applying Chris's LS_RJT retry patch, a reject to a Plogi
from a libFC initiator to another libFC initiator wasn't resulting in a
retry, so the rogue port would get deleted right away after 1 plogi try
and so we were not hitting this issue. After I applied the patch, and
increased the number of plogi retries, I started hitting this issue.

-- abhijeet



More information about the devel mailing list