[Open-FCoE] [PATCH v2] libfc: rport retry on LS_RJT from certain ELS

Robert Love robert.w.love at linux.intel.com
Tue Jan 27 22:00:49 UTC 2009


On Tue, 2009-01-27 at 12:24 -0800, Robert Love wrote:
> On Thu, 2009-01-22 at 16:53 -0800, Vasu Dev wrote:
> > Abhijeet Joglekar wrote:

<snip>

> > > BTW, before applying Chris's LS_RJT retry patch, a reject to a Plogi
> > > from a libFC initiator to another libFC initiator wasn't resulting in a
> > > retry, so the rogue port would get deleted right away after 1 plogi try
> > > and so we were not hitting this issue. After I applied the patch, and
> > > increased the number of plogi retries, I started hitting this issue.
> > >   
> > 
> > This patch increased the probability of hitting this issue but this 
> > issue was already there due to untracked rogue rport on any list to 
> > later purge them when libfc stack is unloading. I mean we were calling 
> > fc_rport_error in case the fc_frame_alloc or elsct_send failed even 
> > before this latest patch from Chris.
> 
> I'm a bit confused about the scenario. The "transition state" that you
> guys have been talking about, are you talking about the rogue state (1)
> or the time between an RTV response and a fc_remote_port_add() (2)?
> 
> (1)
> If it's the rogue state then rogue ports are bound to exchanges and
> unloading the module should cause an EM reset to send the CLOSED event
> to the rport. Is there a reference counting problem here?
> 
> (2)
> If you mean after a valid RTV response, but before the
> fc_remote_port_add() to the FC transport class then the retry timer
> shouldn't fire. The rogue rport wouldn't be bound to anything, but we'd
> be in process context and we'd then try adding to the transport class.
> 
> If this is the scenario, then I think we care about fc_host locking and
> not the disc->rports list. It would be a timing issue as to whether the
> real rport was added to the transport before the fc_host was freed or
> after. For either case I would guess that there is locking in the FC
> transport to prevent problems, but maybe there is a defect.
> 
> I have a patch-set that adds the rogue rport to the disc->rports list
> just after it's created, but I'm not sure that it solves your problem. 
> 
> Can you help me understand the scenario a little better?
> 
I talked to Vasu and he explained that the critical piece of information
is that a timeout is occurring, so there is scheduled retry work, but
the rport isn't bound to anything while the timer is ticking. I have
patches that add rogue rports to the disc->rports list, and they'll
likely fix the problem, but I haven't been able to reproduce the
scenario yet. I'm not sure I'll be able to reproduce it without
hacking the code a bit to force a retry.
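
To make the scenario concrete, here is a rough user-space model of the
idea, not libfc code. Every name in it (model_rport, model_disc,
model_disc_add, model_rport_retry, model_disc_stop) is hypothetical, and
a plain flag stands in for the real retry timer and work item. It only
illustrates why an rport that is on the list can have its pending retry
cancelled at teardown, while one that was never added cannot be found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model types; these are NOT the libfc structures. */
struct model_rport {
	struct model_rport *next;	/* link on the discovery list */
	bool retry_pending;		/* stands in for the scheduled retry work */
};

struct model_disc {
	struct model_rport *rports;	/* stands in for disc->rports */
};

/* Track an rport (rogue or real) as soon as it is created. */
static void model_disc_add(struct model_disc *d, struct model_rport *rp)
{
	rp->next = d->rports;
	d->rports = rp;
}

/* Model an LS_RJT (or send failure) arming the retry timer. */
static void model_rport_retry(struct model_rport *rp)
{
	rp->retry_pending = true;
}

/*
 * Model teardown: walk the list, unlink each rport and cancel its
 * pending retry. Returns how many retries were cancelled. Rports that
 * were never added to the list are invisible here.
 */
static int model_disc_stop(struct model_disc *d)
{
	int cancelled = 0;

	while (d->rports) {
		struct model_rport *rp = d->rports;

		d->rports = rp->next;
		rp->next = NULL;
		if (rp->retry_pending) {
			rp->retry_pending = false;
			cancelled++;
		}
	}
	return cancelled;
}
```

In this model, an rport that has a retry armed but was never passed to
model_disc_add() still has retry_pending set after model_disc_stop(),
which is the analogue of the timer firing after the stack is unloaded.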



