[Open-FCoE] libfc: exchange circular lock dependency detected.

Vasu Dev vasu.dev at linux.intel.com
Fri Jul 10 23:21:36 UTC 2009


On Fri, 2009-07-10 at 15:21 -0700, Joe Eykholt wrote:
> This is with the latest fcoe-next.git tree.
> 
> I see this circular dependency message when running my tests.  I haven't
> completely figured it out, but I think there must be cases where ex lock
> is held while em lock is acquired, 

That is the case in fc_exch_em_alloc() for new exch allocation: the exch
lock is held along with the em lock while adding the new exch to the
exches array.

> and that's a problem.

This shouldn't be the issue here, since fc_exch_em_alloc() is the only
place where these two locks are held together, so there is no
possibility of circular locking from the two locks being taken in a
different order somewhere else.

> It may also be caused by flushing fc_exch_timeout() work while holding the
> em lock it sometimes needs.
> 
> I also noticed that fc_exch_timeout is calling fc_exch_rrq while holding
> an ex_lock and that does a new fc_exch_alloc() ... which gets the em_lock,
> and a different ex_lock so that maybe what it's complaining about.
> 

Yeah, that is what this log trace is suggesting, and I also got the same
trace with my exch pool patches (pasted below; it likewise suggested
that exactly the above call flow could lead to circular locking on the
exch lock). However, I don't see how, since fc_exch_rrq() is called with
an exch lock held, but that same exch's lock won't be taken again when
sending out the RRQ in that call flow. The exch lock held before
fc_exch_rrq() and the one taken during fc_exch_seq_send() in that flow
are on two different exch instances, but perhaps the lock checking
engine is deducing that the same exch lock is taken twice on the same
instance. So this is perhaps a false alarm from the lock checking
engine, or I'm missing something here. Just to avoid this nasty warning,
perhaps the exch lock should be dropped before calling fc_exch_rrq(),
but I need to think more about that since it might introduce other
issues.
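
For reference, here is the same user-space model extended to the timeout
path we are discussing; rrq_send() is a hypothetical stand-in for the
fc_exch_rrq()/fc_exch_seq_send() path (it only shows the locking, not the
frame handling), and the "dropped" variant is just a sketch of the idea
above, not a proposed patch:

/* Hypothetical stand-in for fc_exch_rrq()/fc_exch_seq_send(): allocating
 * the RRQ exchange takes the EM/pool lock (and the new exch's lock). */
static void rrq_send(struct exch_mgr *mp)
{
	struct exch *rrq_ep = exch_em_alloc(mp);

	(void)rrq_ep;		/* frame build/send omitted */
}

/* Current ordering: the timed-out exch's lock is held across the RRQ,
 * so the chain is exch lock -> em lock, the reverse of the allocation path. */
static void exch_timeout_current(struct exch_mgr *mp, struct exch *ep)
{
	pthread_mutex_lock(&ep->ex_lock);
	/* ... timeout handling ... */
	rrq_send(mp);		/* takes em lock with ex_lock held */
	pthread_mutex_unlock(&ep->ex_lock);
}

/* The idea above: drop the exch lock before sending the RRQ, so the
 * em lock is never taken while any exch lock is held on this path. */
static void exch_timeout_dropped(struct exch_mgr *mp, struct exch *ep)
{
	pthread_mutex_lock(&ep->ex_lock);
	/* ... timeout handling ... */
	pthread_mutex_unlock(&ep->ex_lock);
	rrq_send(mp);		/* no exch lock held here */
}

Even though the two exch locks involved are different instances, lockdep
tracks them as a single lock class, which is why it reports the reverse
ordering as a cycle.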

Here is a similar log with the exch pool patches, which coincidentally
happened today. I was wondering whether it was caused by my patches, but
your response confirms that isn't the case, since it happens without
them as well.

 =======================================================
[239459.637422] [ INFO: possible circular locking dependency detected ]
[239459.637426] 2.6.31-rc2-upstream #3
[239459.637428] -------------------------------------------------------
[239459.637431] events/0/51 is trying to acquire lock:
[239459.637433]  (&pool->lock){+.-...}, at: [<ffffffffa02bf4d2>]
fc_exch_em_alloc+0x6d/0x20c [libfc]
[239459.637448]
[239459.637449] but task is already holding lock:
[239459.637451]  (&ep->ex_lock){+.-...}, at: [<ffffffffa02c0037>]
fc_exch_timeout+0x24/0xea [libfc]
[239459.637463]
[239459.637464] which lock already depends on the new lock.
[239459.637465]
[239459.637467]
[239459.637468] the existing dependency chain (in reverse order) is:
[239459.637470]
[239459.637471] -> #1 (&ep->ex_lock){+.-...}:
[239459.637476]        [<ffffffff81066e19>] check_prev_add+0x146/0x1c7
[239459.637483]        [<ffffffff810672e3>] validate_chain+0x449/0x4e3
[239459.637487]        [<ffffffff8106766f>] __lock_acquire+0x2f2/0x363
[239459.637492]        [<ffffffff810677aa>] lock_acquire+0xca/0xee
[239459.637496]        [<ffffffff8150348c>] _spin_lock_bh+0x31/0x40
[239459.637501]        [<ffffffffa02bf548>] fc_exch_em_alloc+0xe3/0x20c
[libfc]
[239459.637510]        [<ffffffffa02bf6aa>] fc_exch_alloc+0x39/0x57
[libfc]
[239459.637519]        [<ffffffffa02bfab7>] fc_exch_seq_send+0x2a/0x1cb
[libfc]
[239459.637528]        [<ffffffffa02c1002>] fc_elsct_send+0x4f3/0x509 
[239459.637537]        [<ffffffffa02c1ad4>] fc_lport_enter_flogi
+0xb1/0xc8 [libfc]
[239459.637546]        [<ffffffffa02c1dd7>] fc_linkup+0x5b/0x68 [libfc]
[239459.637556]        [<ffffffffa02d6c66>] fcoe_ctlr_link_up+0x8b/0xa4
[libfcoe]
[239459.637562]        [<ffffffffa02ddc91>] fcoe_if_create+0x403/0x444
[fcoe]
[239459.637570]        [<ffffffffa02ddd75>] fcoe_create+0xa3/0xd1 [fcoe]
[239459.637577]        [<ffffffff81054fbd>] param_attr_store+0x22/0x30
[239459.637582]        [<ffffffff81055010>] module_attr_store+0x21/0x25
[239459.637587]        [<ffffffff81127f0d>] sysfs_write_file+0xf6/0x132
[239459.637594]        [<ffffffff810d665b>] vfs_write+0xa9/0x106
[239459.637600]        [<ffffffff810d676e>] sys_write+0x45/0x69
[239459.637604]        [<ffffffff8100baeb>] system_call_fastpath
+0x16/0x1b
[239459.637611]        [<ffffffffffffffff>] 0xffffffffffffffff
[239459.637622]
[239459.637623] -> #0 (&pool->lock){+.-...}:
[239459.637627]        [<ffffffff81066d19>] check_prev_add+0x46/0x1c7
[239459.637632]        [<ffffffff810672e3>] validate_chain+0x449/0x4e3
[239459.637636]        [<ffffffff8106766f>] __lock_acquire+0x2f2/0x363
[239459.637641]        [<ffffffff810677aa>] lock_acquire+0xca/0xee
[239459.637645]        [<ffffffff8150348c>] _spin_lock_bh+0x31/0x40
[239459.637649]        [<ffffffffa02bf4d2>] fc_exch_em_alloc+0x6d/0x20c
[libfc]
[239459.637658]        [<ffffffffa02bf6aa>] fc_exch_alloc+0x39/0x57
[libfc]
[239459.637667]        [<ffffffffa02bfab7>] fc_exch_seq_send+0x2a/0x1cb
[libfc]
[239459.637675]        [<ffffffffa02bff05>] fc_exch_rrq+0xe6/0x10c
[libfc]
[239459.637684]        [<ffffffffa02c006d>] fc_exch_timeout+0x5a/0xea
[libfc]
[239459.637693]        [<ffffffff81052764>] run_workqueue+0x112/0x20c
[239459.637697]        [<ffffffff81052948>] worker_thread+0xea/0xfb
[239459.637701]        [<ffffffff81056bde>] kthread+0x85/0x8d
[239459.637706]        [<ffffffff8100cbfa>] child_rip+0xa/0x20
[239459.637710]        [<ffffffffffffffff>] 0xffffffffffffffff
[239459.637714]
[239459.637714] other info that might help us debug this:
[239459.637716]
[239459.637719] 3 locks held by events/0/51:
[239459.637721]  #0:  (events){+.+.+.}, at: [<ffffffff81052713>]
run_workqueue+0xc1/0x20c
[239459.637728]  #1:  (&(&ep->timeout_work)->work){+.+...}, at:
[<ffffffff81052713>] run_workqueue+0xc1/0x20c
[239459.637735]  #2:  (&ep->ex_lock){+.-...}, at: [<ffffffffa02c0037>]
fc_exch_timeout+0x24/0xea [libfc]
[239459.637747]
[239459.637747] stack backtrace:
[239459.637751] Pid: 51, comm: events/0 Not tainted 2.6.31-rc2-upstream
#3
[239459.637753] Call Trace:
[239459.637758]  [<ffffffff8106692d>] print_circular_bug_tail+0x71/0x7c
[239459.637763]  [<ffffffff81066d19>] check_prev_add+0x46/0x1c7
[239459.637768]  [<ffffffff810672e3>] validate_chain+0x449/0x4e3
[239459.637772]  [<ffffffff8106766f>] __lock_acquire+0x2f2/0x363
[239459.637777]  [<ffffffff810677aa>] lock_acquire+0xca/0xee
[239459.637785]  [<ffffffffa02bf4d2>] ? fc_exch_em_alloc+0x6d/0x20c
[libfc]
[239459.637792]  [<ffffffff810a66d7>] ? mempool_alloc+0x5d/0x114
[239459.637796]  [<ffffffff8150348c>] _spin_lock_bh+0x31/0x40
[239459.637804]  [<ffffffffa02bf4d2>] ? fc_exch_em_alloc+0x6d/0x20c
[libfc]
[239459.637813]  [<ffffffffa02bf4d2>] fc_exch_em_alloc+0x6d/0x20c
[libfc]
[239459.637819]  [<ffffffff813e0bb0>] ? __alloc_skb+0x41/0x160
[239459.637827]  [<ffffffffa02bf6aa>] fc_exch_alloc+0x39/0x57 [libfc]
[239459.637836]  [<ffffffffa02bfab7>] fc_exch_seq_send+0x2a/0x1cb
[libfc]
[239459.637844]  [<ffffffffa02bef86>] ? fc_exch_rrq_resp+0x0/0x104
[libfc]
[239459.637853]  [<ffffffffa02bff05>] fc_exch_rrq+0xe6/0x10c [libfc]
[239459.637861]  [<ffffffffa02c0037>] ? fc_exch_timeout+0x24/0xea
[libfc]
[239459.637870]  [<ffffffffa02c006d>] fc_exch_timeout+0x5a/0xea [libfc]
[239459.637878]  [<ffffffffa02c0013>] ? fc_exch_timeout+0x0/0xea [libfc]
[239459.637883]  [<ffffffff81052764>] run_workqueue+0x112/0x20c
[239459.637887]  [<ffffffff81052713>] ? run_workqueue+0xc1/0x20c
[239459.637891]  [<ffffffff81052948>] worker_thread+0xea/0xfb
[239459.637895]  [<ffffffff81056f1f>] ? autoremove_wake_function
+0x0/0x34
[239459.637900]  [<ffffffff8105285e>] ? worker_thread+0x0/0xfb
[239459.637904]  [<ffffffff81056bde>] kthread+0x85/0x8d
[239459.637908]  [<ffffffff8100cbfa>] child_rip+0xa/0x20
[239459.637913]  [<ffffffff8100c5bc>] ? restore_args+0x0/0x30
[239459.637917]  [<ffffffff81056b38>] ? kthreadd+0x107/0x128
[239459.637922]  [<ffffffff81056b59>] ? kthread+0x0/0x8d
[239459.637926]  [<ffffffff8100cbf0>] ? child_rip+0x0/0x20



> Is this explained by recent changes in this area or am I just seeing this due to
> changes in my configuration (I have tapes again) and/or debug features?
> 
> 
> [  116.871020] =======================================================
> [  116.872004] [ INFO: possible circular locking dependency detected ]
> [  116.872004] 2.6.31-rc2-rp7 #1
> [  116.872004] -------------------------------------------------------
> [  116.872004] events/1/16 is trying to acquire lock:
> [  116.872004]  (&mp->em_lock){+.-...}, at: [<ffffffffa00570bd>] fc_exch_alloc+0xa1/0x26e [libfc]
> [  116.872004]
> [  116.872004] but task is already holding lock:
> [  116.872004]  (&ep->ex_lock){+.-...}, at: [<ffffffffa005814a>] fc_exch_timeout+0x25/0x20f [libfc]
> [  116.872004]
> [  116.872004] which lock already depends on the new lock.
> [  116.872004]
> [  116.872004]
> [  116.872004] the existing dependency chain (in reverse order) is:
> [  116.872004]
> [  116.872004] -> #1 (&ep->ex_lock){+.-...}:
> [  116.872004]        [<ffffffff81066e10>] __lock_acquire+0xa5e/0xbe6
> [  116.872004]        [<ffffffff81067051>] lock_acquire+0xb9/0xdd
> [  116.872004]        [<ffffffff81508971>] _spin_lock_bh+0x31/0x3d
> [  116.872004]        [<ffffffffa0057138>] fc_exch_alloc+0x11c/0x26e [libfc]
> [  116.872004]        [<ffffffffa0057f79>] fc_exch_seq_send+0x2a/0x1d6 [libfc]
> [  116.872004]        [<ffffffffa005888e>] fc_elsct_send+0x4a8/0x4be [libfc]
> [  116.872004]        [<ffffffffa0059be9>] fc_lport_enter_flogi+0xb4/0xcd [libfc]
> [  116.872004]        [<ffffffffa0059eeb>] fc_linkup+0x5b/0x68 [libfc]
> [  116.872004]        [<ffffffffa006da1a>] fcoe_ctlr_link_up+0x8b/0xa4 [libfcoe]
> [  116.872004]        [<ffffffffa0074d1f>] fcoe_create+0x7ce/0x856 [fcoe]
> [  116.872004]        [<ffffffff81054e48>] param_attr_store+0x25/0x35
> [  116.872004]        [<ffffffff81054e9d>] module_attr_store+0x21/0x25
> [  116.872004]        [<ffffffff81124af2>] sysfs_write_file+0xe4/0x119
> [  116.872004]        [<ffffffff810d554c>] vfs_write+0xab/0x105
> [  116.872004]        [<ffffffff810d566a>] sys_write+0x47/0x6e
> [  116.872004]        [<ffffffff8100baeb>] system_call_fastpath+0x16/0x1b
> [  116.872004]        [<ffffffffffffffff>] 0xffffffffffffffff
> [  116.872004]
> [  116.872004] -> #0 (&mp->em_lock){+.-...}:
> [  116.872004]        [<ffffffff81066d04>] __lock_acquire+0x952/0xbe6
> [  116.872004]        [<ffffffff81067051>] lock_acquire+0xb9/0xdd
> [  116.872004]        [<ffffffff81508971>] _spin_lock_bh+0x31/0x3d
> [  116.872004]        [<ffffffffa00570bd>] fc_exch_alloc+0xa1/0x26e [libfc]
> [  116.872004]        [<ffffffffa0057f79>] fc_exch_seq_send+0x2a/0x1d6 [libfc]
> [  116.872004]        [<ffffffffa0058276>] fc_exch_timeout+0x151/0x20f [libfc]
> [  116.872004]        [<ffffffff81052600>] worker_thread+0x1fa/0x30a
> [  116.872004]        [<ffffffff81056a41>] kthread+0x88/0x90
> [  116.872004]        [<ffffffff8100cbfa>] child_rip+0xa/0x20
> [  116.872004]        [<ffffffffffffffff>] 0xffffffffffffffff
> [  116.872004]
> [  116.872004] other info that might help us debug this:
> [  116.872004]
> [  116.872004] 3 locks held by events/1/16:
> [  116.872004]  #0:  (events){+.+.+.}, at: [<ffffffff810525a9>] worker_thread+0x1a3/0x30a
> [  116.872004]  #1:  (&(&ep->timeout_work)->work){+.+...}, at: [<ffffffff810525a9>] worker_thread+0x1a3/0x30a
> [  116.872004]  #2:  (&ep->ex_lock){+.-...}, at: [<ffffffffa005814a>] fc_exch_timeout+0x25/0x20f [libfc]
> [  116.872004]
> [  116.872004] stack backtrace:
> [  116.872004] Pid: 16, comm: events/1 Not tainted 2.6.31-rc2-rp7 #1
> [  116.872004] Call Trace:
> [  116.872004]  [<ffffffff8106603c>] print_circular_bug_tail+0x71/0x7c
> [  116.872004]  [<ffffffff81066d04>] __lock_acquire+0x952/0xbe6
> [  116.872004]  [<ffffffff81067051>] lock_acquire+0xb9/0xdd
> [  116.872004]  [<ffffffffa00570bd>] ? fc_exch_alloc+0xa1/0x26e [libfc]
> [  116.872004]  [<ffffffffa0056c84>] ? fc_exch_rrq_resp+0x0/0x106 [libfc]
> [  116.872004]  [<ffffffff81508971>] _spin_lock_bh+0x31/0x3d
> [  116.872004]  [<ffffffffa00570bd>] ? fc_exch_alloc+0xa1/0x26e [libfc]
> [  116.872004]  [<ffffffffa00570bd>] fc_exch_alloc+0xa1/0x26e [libfc]
> [  116.872004]  [<ffffffffa0056c84>] ? fc_exch_rrq_resp+0x0/0x106 [libfc]
> [  116.872004]  [<ffffffffa0057f79>] fc_exch_seq_send+0x2a/0x1d6 [libfc]
> [  116.872004]  [<ffffffffa0058276>] fc_exch_timeout+0x151/0x20f [libfc]
> [  116.872004]  [<ffffffff81052600>] worker_thread+0x1fa/0x30a
> [  116.872004]  [<ffffffff810525a9>] ? worker_thread+0x1a3/0x30a
> [  116.872004]  [<ffffffff81506d7c>] ? thread_return+0x3e/0xc3
> [  116.872004]  [<ffffffffa0058125>] ? fc_exch_timeout+0x0/0x20f [libfc]
> [  116.872004]  [<ffffffff81056d84>] ? autoremove_wake_function+0x0/0x38
> [  116.872004]  [<ffffffff81052406>] ? worker_thread+0x0/0x30a
> [  116.872004]  [<ffffffff81056a41>] kthread+0x88/0x90
> [  116.872004]  [<ffffffff8100cbfa>] child_rip+0xa/0x20
> [  116.872004]  [<ffffffff8103b1f8>] ? finish_task_switch+0x3b/0xd5
> [  116.872004]  [<ffffffff8100c5bc>] ? restore_args+0x0/0x30
> [  116.872004]  [<ffffffff810569b9>] ? kthread+0x0/0x90
> [  116.872004]  [<ffffffff8100cbf0>] ? child_rip+0x0/0x20