[Open-FCoE] deadlock in testing (Re: [RFC PATCH V3 11/13] fcoe: use RTNL mutex to serialize create/destroy and protect hostlist)
jeykholt at cisco.com
Mon Jul 13 22:52:39 UTC 2009
Chris Leech wrote:
> On Thu, Jul 09, 2009 at 05:39:24PM -0700, Joe Eykholt wrote:
>> I forgot that you're now trying to destroy from the notifier.
>> Is that really necessary? I can see how it's handy, but making
>> the administrator shut down the fcoe instances or do 'rmmod fcoe'
>> isn't so bad and might actually prevent accidents.
>> I don't think there's a clean way of doing the destroy from the notifier
>> because of this flush issue and the callbacks you mentioned. What could
>> be possible is to disassociate from the netdev somehow and let the
>> destroy happen asynchronously later, or just leave fcoe around but
>> unusable. If we did arrange for a work item to do the destroy, would that
>> let the rmmod of the NIC driver work? Does rmmod wait long enough
>> for the work item to do the destroy? I guess you must've tried this
>> and had it work.
>> I'd vote for scrapping the ability to unload the NIC driver while fcoe is
>> using it.
> The main problem I see with this is the belief that you can prevent the
> netdev from going away (the device itself, not the module). Even with
> the driver module referenced to prevent unloading, it is possible to hot
> remove the device itself.
Since the netdev is refcounted, I would think fcoe could hold onto it
at least for a while after the ixgbe driver detaches? It seems like
dev_queue_xmit and other netdev routines will handle things correctly.
One problem would be that it might not be findable by name anymore
so the fcoe instance couldn't be destroyed via /sys. So, I guess we
should solve it as you're trying to do.
> Regardless, the locking around trying to solve all of these issues is
> certainly more complicated than I thought at first. I think it might be
> worth a recap of what's been learned recently.
> 1) The RTNL mutex cannot be held over the entire path of an FCoE
> interface destroy, because the destroy path needs to flush several work
> queue items.
> The exact rules to keep in mind here are:
> a) You can never call flush_scheduled_work() or flush_work() on the
> default/event queue while holding RTNL. The linkwatch events take RTNL
> in the workqueue thread.
> b) You can call cancel_work_sync() holding RTNL, if the work item you
> are canceling does not take RTNL.
> "b" doesn't help us because the FIP receive work might call
> fcoe_update_src_mac() which takes RTNL.
> 2) Using a new mutex in the fcoe module does allow for proper managing
> of create, destroy, module load and module unload. If it were just
> these cases, it might be possible to remove the hostlist lock and rely
> on just this mutex.
Agreed. I like using rtnl_lock for the hostlist lock, just so the netdev
notifier doesn't need another lock. But that does run into problem 1a)
above if you want to do a synchronous destroy in the notifier, which
you can't because of a) below.
> This still leaves some open issues in trying to handle netdev
> unregistered events:
> a) Because the netdev notifier is called with RTNL held, the interface
> destroy path cannot be called directly from here.
All we need to do is cease using the interface. We can destroy the
instance later. The netdev layer should (IMO) protect against that.
I looked a bit at the code, and it seems like it would work to hold
the netdev from fcoe_create to either fcoe_destroy or the unregister
notification. What do other L2 protocols do when the device goes away?
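Holding the netdev for that window might look roughly like this (a non-compilable sketch against the 2.6.30-era netdev API; the function bodies are simplified and partly hypothetical):

```c
/* Pin the device from create ... */
static int fcoe_create(struct net_device *netdev)
{
	dev_hold(netdev);		/* take a netdev reference */
	/* ... set up the fcoe instance ... */
	return 0;
}

/* ... until destroy or the unregister notification.  In this era the
 * notifier's ptr argument is the struct net_device itself. */
static int fcoe_device_notification(struct notifier_block *nb,
				    unsigned long event, void *ptr)
{
	struct net_device *netdev = ptr;

	if (event == NETDEV_UNREGISTER) {
		/* stop using the interface now; defer the full destroy */
		dev_put(netdev);	/* drop our reference */
	}
	return NOTIFY_DONE;
}
```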
> b) If some sort of delayed destroy were used here, the device should at
> least be removed from the hostlist right away to prevent racing between
> multiple destroy calls from different causes.
> That causes the hostlist
> lock to remain separate from the new create/destroy mutex in order to
> prevent circular locking issues with RTNL.
Right. One of the main reasons for a create/destroy mutex is to keep
create from happening during module unload.
> c) The destroy can't be made asynchronous using schedule_work(), because
> it then attempts to flush the very workqueue thread it's running in. It
> might be possible if it is safe to change all those flushes to
> cancels.
Agreed. I don't know how hard it would be to make all of the work
cancel-able, but that's a good thing to look into.
An alternative would be to refcount the fcoe instance, and do a get when
scheduling a work item, and a put in the work function when finished.
That way the instance would stay around until the work completes.
The work item would have to take a lock (rtnl_lock) and make
sure the instance is still "alive".
Then we could do the flush on module unload after dropping the locks,
and on other fcoe_destroys we wouldn't have to flush before returning.
It seems unnecessarily complex but I haven't thought of anything simpler yet.