[Open-FCoE] deadlock in testing (Re: [RFC PATCH V3 11/13] fcoe: use RTNL mutex to serialize create/destroy and protect hostlist)

Chris Leech christopher.leech at intel.com
Fri Jul 10 00:11:23 UTC 2009

On Thu, Jul 09, 2009 at 04:44:02PM -0700, Joe Eykholt wrote:
> Chris Leech wrote:
> > On Thu, Jul 09, 2009 at 03:47:57PM -0700, Joe Eykholt wrote:
> >> Chris Leech wrote:
> >>> I ran a parallel create/destroy/remove test overnight and something
> >>> deadlocked.  Running with lockdep enabled gives a reproducible warning,
> >>> but I'm having trouble making sense of it.  I'm not sure I understand
> >>> what the "events" lock is here.
> >> I'm not sure why it says "events" either.  I think it has something
> >> to do with flush_work() calling lock_map_acquire/release to indicate
> >> that the work items it will wait on may need locks.
> >>
> >> I see one problem, though.  fcoe_ctlr_destroy() is doing a
> >> flush_work and it uses the general work thread.  So does
> >> linkwatch_event(), which needs rtnl_lock().  So the flush_work()
> >> may hang forever if there's a linkwatch_event queued.  Shoot.
> >>
> >> Ways to fix it:
> >> 1) have FIP use its own work queue.
> >> 2) separately flush the FIP work queue while not holding rtnl_lock.
> >> 3) go back to using a separate mutex for fcoe create/delete, but
> >>     use rtnl_lock for the hostlist to protect the notification.
> >> 4) something better?
> > 
> > What about switching from flush_work to cancel_work_sync?  We don't care
> > if the queued work gets run or not, just that it's off the queue.
> No, we really need it flushed so it doesn't call up into libfc after
> we free the lport.
> > From what I can tell, flush_work attempts to move the work struct to the
> > front of the queue in order to prevent these kinds of deadlocks.  But,
> > the worker thread might already be blocked on the mutex before
> > flush_work gets called.
> > 
> > cancel_work_sync will wait for completion if that work struct is
> > currently being run, otherwise it will delete it off the queue.  So it
> > either gets flushed, or if the work thread is blocked we just remove our
> > work items and keep going.
> > 
> > ... testing ...
> > 
> > Which seems to fix the linkwatch problem, but there are more issues
> > buried in here.  If we flush fip->recv_work under rtnl, it has the
> > potential to call fip->update_mac and fcoe_update_src_mac also wants to
> > take rtnl.
> Yeah.  All this is leading me to think that solution 3 above is the best.

I do prefer a separate mutex over a custom work queue (your option
1), just because it seems silly to create more kernel threads for this.

The main issue with that was calling destroy from the netdev notifier
where rtnl is already held.  I solved that before by using schedule_work
to defer the fcoe destroy operation.  I think that was Rob's main
objection.  And thinking about it now, I was probably missing a
flush/cancel in a few places: what happens if the netdev is removed, we
queue the destroy, and then, before it runs, either a user-initiated
destroy or an fcoe module remove happens?
