[Open-FCoE] libfc: exchange circular lock dependency detected.

Robert Love robert.w.love at intel.com
Wed Jul 15 21:38:05 UTC 2009


On Wed, 2009-07-15 at 14:20 -0700, Vasu Dev wrote:
> On Fri, 2009-07-10 at 16:21 -0700, Vasu Dev wrote:
> > On Fri, 2009-07-10 at 15:21 -0700, Joe Eykholt wrote:
> > > This is with the latest fcoe-next.git tree.
> > > 
> > > I see this circular dependency message when running my tests.  I haven't
> > > completely figured it out, but I think there must be cases where ex lock
> > > is held while em lock is acquired, 
> > 
> > That is the case of fc_exch_em_alloc() for new exch allocation, the exch
> > lock is held along with em lock while adding new exch to exches array.
> > 
> > > and that's a problem.
> > 
> > This shouldn't be the issue here since fc_exch_em_alloc() is the only
> > case when these two locks held together, so no possibility of circular
> > locking due to any messed up order of taking these two locks together
> > somewhere else.
> > 
> > > It may also be caused by flushing fc_exch_timeout() work while holding the
> > > em lock it sometimes needs.
> > > 
> > > I also noticed that fc_exch_timeout is calling fc_exch_rrq while holding
> > > an ex_lock and that does a new fc_exch_alloc() ... which gets the em_lock,
> > > and a different ex_lock so that maybe what it's complaining about.
> > > 
> > 
> > Yeah this what this log trace is suggesting and I also got same trace
> > with my exch pool patches ( pasted below, that also suggested exact
> > above call flow could lead to circular locking on exch_lock). However I
> > don't see how since fc_exch_rrq will be called with exch lock held but
> > that same exch's lock won't be held again when sending out fc_exch_rrq
> > in that call flow. The exch lock held before fc_exch_rrq and during
> > fc_exch_seq_send() in that flow are on two different exch instances but
> > perhaps checking engine is deducing as if same exch lock is held twice
> > on same exch instance. So perhaps false alarm from lock checking engine
> > or I'm missing something here, but just to avoid this nasty warning,
> > perhaps exch lock should be dropped before calling fc_exch_rrq() but I
> > need to think more about that since that might have some other issues.
> > 
> 
> Dropping exch lock before calling  fc_exch_rrq() seems to be the best
> fix but with one caveat to handle fc_exch_reset case before RRQ retry
> attempted, I handled fc_exch_reset case by checking additional flags in
> case of grabbing exch lock again to schedule RRQ retry.
> 
> The fix is:-
> 
> diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c
> index a074562..c261893 100644
> --- a/drivers/scsi/libfc/fc_exch.c
> +++ b/drivers/scsi/libfc/fc_exch.c
> @@ -445,9 +445,9 @@ static void fc_exch_timeout(struct work_struct *work)
>         e_stat = ep->esb_stat;
>         if (e_stat & ESB_ST_COMPLETE) {
>                 ep->esb_stat = e_stat & ~ESB_ST_REC_QUAL;
> +               spin_unlock_bh(&ep->ex_lock);
>                 if (e_stat & ESB_ST_REC_QUAL)
>                         fc_exch_rrq(ep);
> -               spin_unlock_bh(&ep->ex_lock);
>                 goto done;
>         } else {
>                 resp = ep->resp;
> @@ -1672,14 +1672,14 @@ static void fc_exch_rrq(struct fc_exch *ep)
>         struct fc_lport *lp;
>         struct fc_els_rrq *rrq;
>         struct fc_frame *fp;
> -       struct fc_seq *rrq_sp;
>         u32 did;
> 
>         lp = ep->lp;
> 
>         fp = fc_frame_alloc(lp, sizeof(*rrq));
>         if (!fp)
> -               return;
> +               goto retry;
> +
>         rrq = fc_frame_payload_get(fp, sizeof(*rrq));
>         memset(rrq, 0, sizeof(*rrq));
>         rrq->rrq_cmd = ELS_RRQ;
> @@ -1695,13 +1695,20 @@ static void fc_exch_rrq(struct fc_exch *ep)
>                        fc_host_port_id(lp->host), FC_TYPE_ELS,
>                        FC_FC_FIRST_SEQ | FC_FC_END_SEQ | FC_FC_SEQ_INIT, 0);
> 
> -       rrq_sp = fc_exch_seq_send(lp, fp, fc_exch_rrq_resp, NULL, ep,
> -                                 lp->e_d_tov);
> -       if (!rrq_sp) {
> -               ep->esb_stat |= ESB_ST_REC_QUAL;
> -               fc_exch_timer_set_locked(ep, ep->r_a_tov);
> +       if (fc_exch_seq_send(lp, fp, fc_exch_rrq_resp, NULL, ep, lp->e_d_tov))
> +               return;
> +
> +retry:
> +       spin_lock_bh(&ep->ex_lock);
> +       if (ep->state & (FC_EX_RST_CLEANUP | FC_EX_DONE)) {
> +               spin_unlock_bh(&ep->ex_lock);
> +               /* drop hold for rec qual */
> +               fc_exch_release(ep);
>                 return;
>         }
> +       ep->esb_stat |= ESB_ST_REC_QUAL;
> +       fc_exch_timer_set_locked(ep, ep->r_a_tov);
> +       spin_unlock_bh(&ep->ex_lock);
>  }
> 
> 
> I tested this fix by triggering ABTS/RRQ and no more circular lock
> warning but not able to trigger circular locking warning without above
> fix also. I've not changed kernel configuration since I got previously
> sent log of circular locking warning on same kernel, so same kernel
> config with all lock checking enabled. I also tried without the fix
> after reboot but still no more same warning, so I'm not sure how to
> force lock checking code to always check checking circular locking for
> exch loch in fc_exch_rrq call flow, any suggestion ?
>  
> Let me know your comments on above fix. I'll provide this fix against
> fcoe-fixes tree once that tree is updated to .31 kernel which is
> currently on 2.6.30-rc4.
> 	
I've updated fcoe-fixes.git to be based on the current scsi-rc-fixes.git
(2.6.31-rc2).





More information about the devel mailing list