[Open-FCoE] deadlock in testing (Re: [RFC PATCH V3 11/13] fcoe: use RTNL mutex to serialize create/destroy and protect hostlist)
jeykholt at cisco.com
Fri Jul 10 00:39:24 UTC 2009
Chris Leech wrote:
> On Thu, Jul 09, 2009 at 04:44:02PM -0700, Joe Eykholt wrote:
>> Chris Leech wrote:
>>> On Thu, Jul 09, 2009 at 03:47:57PM -0700, Joe Eykholt wrote:
>>>> Chris Leech wrote:
>>>>> I ran a parallel create/destroy/remove test overnight and something
>>>>> deadlocked. Running with lockdep enabled gives a reproducible warning,
>>>>> but I'm having trouble making sense of it. I'm not sure I understand
>>>>> what the "events" lock is here.
>>>> I'm not sure why it says "events" either. I think it has something
>>>> to do with flush_work() calling lock_map_acquire/release to indicate
>>>> that the work items it will wait on may need locks.
>>>> I see one problem, though. fcoe_ctlr_destroy() is doing a
>>>> flush_work and it uses the general work thread. So does
>>>> linkwatch_event(), which needs rtnl_lock(). So the flush_work()
>>>> may hang forever if there's a linkwatch_event queued. Shoot.
>>>> Ways to fix it:
>>>> 1) have FIP use its own work queue.
>>>> 2) separately flush the FIP work queue while not holding rtnl_lock.
>>>> 3) go back to using a separate mutex for fcoe create/delete, but
>>>> use rtnl_lock for the hostlist to protect the notification.
>>>> 4) something better?
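[Editor's sketch of the deadlock described above and of option 1; `fip_wq` and the call sites are hypothetical names, assuming the workqueue API of that era:]

```c
/*
 * The hang, roughly:
 *
 *   destroy path                        "events" worker thread
 *   ------------                       ----------------------
 *   rtnl_lock();
 *   fcoe_ctlr_destroy(fip);
 *     flush_work(&fip->recv_work);      linkwatch_event() runs first:
 *       -> waits for the worker           rtnl_lock();  <- blocks on us
 *   ... neither side makes progress.
 *
 * Option 1: give FIP its own workqueue so flushing it never waits
 * behind linkwatch_event() on the shared events queue.
 */
static struct workqueue_struct *fip_wq;		/* hypothetical name */

static int fip_wq_create(void)
{
	fip_wq = create_singlethread_workqueue("fip");
	return fip_wq ? 0 : -ENOMEM;
}

	/* queue FIP work on the private thread ... */
	queue_work(fip_wq, &fip->recv_work);

	/* ... then flushing it under rtnl is safe, provided no FIP
	 * work item itself takes rtnl (update_mac may, as noted
	 * later in the thread) */
	flush_workqueue(fip_wq);
```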
>>> What about switching from flush_work to cancel_work_sync? We don't care
>>> if the queued work gets run or not, just that it's off the queue.
>> No, we really need it flushed so it doesn't call up into libfc after
>> we free the lport.
>>> From what I can tell, flush_work attempts to move the work struct to the
>>> front of the queue in order to prevent these kinds of deadlocks. But,
>>> the worker thread might already be blocked on the mutex before
>>> flush_work gets called.
>>> cancel_work_sync will wait for completion if that work struct is
>>> currently being run, otherwise it will delete it off the queue. So it
>>> either gets flushed, or if the work thread is blocked we just remove our
>>> work items and keep going.
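[Editor's note: the difference being relied on here, in kernel terms; a hedged sketch, with `fip->recv_work` borrowed from the patch under discussion:]

```c
	/* flush_work() guarantees the item has *run* before returning,
	 * so it can block forever if the worker thread is stuck behind
	 * a lock we hold.  cancel_work_sync() dequeues a still-pending
	 * item without running it, and only waits if the item is
	 * already executing: */
	rtnl_lock();

	if (cancel_work_sync(&fip->recv_work)) {
		/* the work was pending or running; either way it is
		 * idle now.  But note Joe's point above: if libfc
		 * needed the work to have actually run, cancelling
		 * is not a substitute for flushing. */
	}

	rtnl_unlock();
```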
>>> ... testing ...
>>> Which seems to fix the linkwatch problem, but there are more issues
>>> buried in here. If we flush fip->recv_work under rtnl, it has the
>>> potential to call fip->update_mac and fcoe_update_src_mac also wants to
>>> take rtnl.
>> Yeah. All this is leading me to think that solution 3 above is the best.
> I do prefer having a separate mutex to a custom work queue (your option
> 1), just because it seems silly to create more kernel threads for this.
> The main issue with that was calling destroy from the netdev notifier
> where rtnl is already held. I solved that before by using schedule_work
> to defer the fcoe destroy operation. I think that was Rob's main
> objection. And thinking about it now I was probably missing a
> flush/cancel in a few places; what happens if the netdev is removed, we
> queue the destroy, and then before it runs either a user initiated
> destroy or an fcoe module remove happens?
Those would have to flush the event, as you mentioned.
I forgot that you're now trying to destroy from the notifier.
Is that really necessary? I can see how it's handy, but making
the administrator shut down the fcoe instances or do 'rmmod fcoe'
isn't so bad and might actually prevent accidents.
I don't think there's a clean way of doing the destroy from the notifier
because of this flush issue and the callbacks you mentioned. What could
be possible is to disassociate from the netdev somehow and let the
destroy happen asynchronously later, or just leave fcoe around but
If we did arrange to do a work item to do the destroy, would that
let the rmmod of the NIC driver work? Does rmmod wait long enough
for the work item to do the destroy? I guess you must've tried this
and had it work.
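[Editor's note: for completeness, option 3 from earlier in the thread might be shaped roughly like this. An illustrative sketch only; `fcoe_config_mutex` and the helpers are assumed names. The point is that a private mutex serializes create/destroy while rtnl is taken only around the hostlist update, never across a flush:]

```c
static DEFINE_MUTEX(fcoe_config_mutex);		/* assumed name */

static int fcoe_destroy(struct net_device *netdev)
{
	struct fcoe_interface *fcoe;
	int rc = 0;

	mutex_lock(&fcoe_config_mutex);		/* serializes create/destroy */
	rtnl_lock();				/* only for the hostlist */
	fcoe = fcoe_hostlist_lookup(netdev);	/* hypothetical helper */
	if (fcoe)
		list_del(&fcoe->list);
	rtnl_unlock();

	if (fcoe)
		fcoe_interface_cleanup(fcoe);	/* flushes work with no rtnl
						 * held, so linkwatch_event()
						 * can still make progress */
	else
		rc = -ENODEV;
	mutex_unlock(&fcoe_config_mutex);
	return rc;
}
```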
I'd vote for scrapping the ability to unload the NIC driver while fcoe is using it.