[Open-FCoE] [RFC PATCH 00/28] rport state machine and locking

Robert Love robert.w.love at intel.com
Thu Sep 11 23:35:05 UTC 2008


The following series reworks the rport and lport state machines and applies a
locking strategy to both of them.

1) A callback function is added to the LP for RP events (READY, FAILED, LOGO);
   a sketch of what I mean follows this list
2) GNN_ID and GPN_ID are removed; the rport state machine now starts at PLOGI
   - a dummy rport is used until the PLOGI response arrives
3) A READY state is added to the RP state machine
4) A locking strategy is applied to the RP and LP state machines
5) The RP has only one error/retry handling routine
6) The LP has only one error/retry handling routine
7) Naming conventions are changed:
    rport (remote port) = fc_rport pointer
    rdata (remote port private) = fc_rport_libfc_priv pointer
    lport (local port) = fc_lport pointer
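
To make item 1 concrete, here is roughly the shape of the event callback I
have in mind. This is a sketch for review only; the enum values and the ops
structure layout are illustrative assumptions and may not match the patches
exactly (fc_lport_rport_event itself shows up in the lockdep trace below).

enum fc_rport_event {
	RPORT_EV_READY,		/* rport completed login and is usable       */
	RPORT_EV_FAILED,	/* login failed after retries were exhausted */
	RPORT_EV_LOGO,		/* rport logged out / is going away          */
};

struct fc_rport_operations {
	/* invoked by the rport state machine to notify the owning lport */
	void (*event_callback)(struct fc_lport *lport,
			       struct fc_rport *rport,
			       enum fc_rport_event event);
};

/* lport side: react to rport events under the lport lock.  (As the lockdep
 * trace below shows, this is where lp_mutex gets taken.) */
static void fc_lport_rport_event(struct fc_lport *lport,
				 struct fc_rport *rport,
				 enum fc_rport_event event)
{
	mutex_lock(&lport->lp_mutex);
	switch (event) {
	case RPORT_EV_READY:
		/* e.g. the dNS rport is ready -> continue lport login */
		break;
	case RPORT_EV_FAILED:
	case RPORT_EV_LOGO:
		/* e.g. retry or tear down state that depends on the rport */
		break;
	}
	mutex_unlock(&lport->lp_mutex);
}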

I think these patches still need a little cleaning up, but they're mostly in
good shape. I'm particularly curious what everyone thinks about:

a) the locking policy
b) the event callback mechanism
c) the consolidation of fc_rport_error and fc_rport_retry (sketched below)
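
For (c), this is roughly what a single combined error/retry path looks like.
Again just a sketch for discussion: the helper names (fc_rport_do_retry,
fc_rport_enter_failed) and the retries/retry_limit fields are assumptions,
not necessarily the exact code in the patches.

/* Single error/retry handler for the RP state machine.
 * Sketch only -- caller holds rdata->rp_mutex. */
static void fc_rport_error_retry(struct fc_rport *rport, int err)
{
	struct fc_rport_libfc_priv *rdata = rport->dd_data;

	/* Only transient errors (e.g. exchange timeouts) are worth
	 * retrying, and only while retries remain. */
	if (err == -ETIMEDOUT && rdata->retries < rdata->retry_limit) {
		rdata->retries++;
		fc_rport_do_retry(rdata);	/* re-send from current state */
		return;
	}

	/* Hard failure or retries exhausted: fail the rport; the FAILED
	 * event callback (item 1 above) tells the lport about it. */
	fc_rport_enter_failed(rport);
}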

...

There is remaining work to be done:
1) Ensure dummy rports are being deleted
2) Timeout handling needs to be reviewed
3) fc_ns.c -> fc_disc.c
4) Check ptp rport creation
5) Work threads are probably not being cancelled appropriately
6) What happens if discovery fails? Currently it fails silently
7) Circular locking issue (see below)

...

I'm not sure about this circular locking warning...

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.27-rc5 #144
-------------------------------------------------------
fcoethread/0/6855 is trying to acquire lock:
 (&lport->lp_mutex){--..}, at: [<ffffffffa001b2a0>] fc_lport_rport_event+0x23/0x66 [libfc]

but task is already holding lock:
 (&rdata->rp_mutex#2){--..}, at: [<ffffffffa001d361>] fc_rport_plogi_resp+0x14d/0x297 [libfc]

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&rdata->rp_mutex#2){--..}:
       [<ffffffff8024f05f>] __lock_acquire+0x117c/0x1494
       [<ffffffffa001d361>] fc_rport_plogi_resp+0x14d/0x297 [libfc]
       [<ffffffff8024f3fc>] lock_acquire+0x85/0xa9
       [<ffffffffa001d361>] fc_rport_plogi_resp+0x14d/0x297 [libfc]
       [<ffffffff80583d6a>] mutex_lock_nested+0xf6/0x25d
       [<ffffffffa001d361>] fc_rport_plogi_resp+0x14d/0x297 [libfc]
       [<ffffffffa001d361>] fc_rport_plogi_resp+0x14d/0x297 [libfc]
       [<ffffffff802351b6>] local_bh_enable_ip+0xc0/0xc3
       [<ffffffffa001d214>] fc_rport_plogi_resp+0x0/0x297 [libfc]
       [<ffffffffa0018deb>] fc_exch_recv+0x800/0xd3a [libfc]
       [<ffffffffa002a237>] fcoe_percpu_receive_thread+0x67/0x514 [fcoe]
       [<ffffffff802351b6>] local_bh_enable_ip+0xc0/0xc3
       [<ffffffffa002a693>] fcoe_percpu_receive_thread+0x4c3/0x514 [fcoe]
       [<ffffffffa002a1d0>] fcoe_percpu_receive_thread+0x0/0x514 [fcoe]
       [<ffffffff80242724>] kthread+0x47/0x73
       [<ffffffff8058506e>] trace_hardirqs_on_thunk+0x3a/0x3f
       [<ffffffff8020c379>] child_rip+0xa/0x11
       [<ffffffff8020b9af>] restore_args+0x0/0x30
       [<ffffffff802426b8>] kthreadd+0x189/0x1ae
       [<ffffffff802426dd>] kthread+0x0/0x73
       [<ffffffff8020c36f>] child_rip+0x0/0x11
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #1 (&rdata->rp_mutex){--..}:
       [<ffffffff8024f05f>] __lock_acquire+0x117c/0x1494
       [<ffffffffa001c084>] fc_rport_login+0x1b/0x4a [libfc]
       [<ffffffff8024f3fc>] lock_acquire+0x85/0xa9
       [<ffffffffa001c084>] fc_rport_login+0x1b/0x4a [libfc]
       [<ffffffff80583d6a>] mutex_lock_nested+0xf6/0x25d
       [<ffffffffa001c084>] fc_rport_login+0x1b/0x4a [libfc]
       [<ffffffff8024b9f9>] static_obj+0x51/0x68
       [<ffffffffa001c084>] fc_rport_login+0x1b/0x4a [libfc]
       [<ffffffffa001b26b>] fc_lport_enter_dns+0xaf/0xc1 [libfc]
       [<ffffffffa0017d8d>] fc_exch_mgr_delete_ep+0x16/0x60 [libfc]
       [<ffffffffa001b4e1>] fc_lport_flogi_resp+0x1fe/0x25a [libfc]
       [<ffffffff8024d64c>] trace_hardirqs_on_caller+0xfa/0x125
       [<ffffffffa001b2e3>] fc_lport_flogi_resp+0x0/0x25a [libfc]
       [<ffffffffa0018deb>] fc_exch_recv+0x800/0xd3a [libfc]
       [<ffffffff804e7804>] netdev_run_todo+0x1f4/0x200
       [<ffffffffa002a693>] fcoe_percpu_receive_thread+0x4c3/0x514 [fcoe]
       [<ffffffffa002a1d0>] fcoe_percpu_receive_thread+0x0/0x514 [fcoe]
       [<ffffffff80242724>] kthread+0x47/0x73
       [<ffffffff8058506e>] trace_hardirqs_on_thunk+0x3a/0x3f
       [<ffffffff8020c379>] child_rip+0xa/0x11
       [<ffffffff8020b9af>] restore_args+0x0/0x30
       [<ffffffff802426b8>] kthreadd+0x189/0x1ae
       [<ffffffff802426dd>] kthread+0x0/0x73
       [<ffffffff8020c36f>] child_rip+0x0/0x11
       [<ffffffffffffffff>] 0xffffffffffffffff

-> #0 (&lport->lp_mutex){--..}:
       [<ffffffff8024d971>] print_circular_bug_header+0xc8/0xcf
       [<ffffffff8024ed85>] __lock_acquire+0xea2/0x1494
       [<ffffffff8024b35c>] save_trace+0x37/0x91
       [<ffffffff8024f35e>] __lock_acquire+0x147b/0x1494
       [<ffffffff8024f3fc>] lock_acquire+0x85/0xa9
       [<ffffffffa001b2a0>] fc_lport_rport_event+0x23/0x66 [libfc]
       [<ffffffff80583d6a>] mutex_lock_nested+0xf6/0x25d
       [<ffffffffa001b2a0>] fc_lport_rport_event+0x23/0x66 [libfc]
       [<ffffffff8024d4a5>] mark_held_locks+0x5c/0x72
       [<ffffffff8043757f>] fc_remote_port_rolechg+0x52/0x173
       [<ffffffffa001b2a0>] fc_lport_rport_event+0x23/0x66 [libfc]
       [<ffffffffa001d437>] fc_rport_plogi_resp+0x223/0x297 [libfc]
       [<ffffffff802351b6>] local_bh_enable_ip+0xc0/0xc3
       [<ffffffffa001d214>] fc_rport_plogi_resp+0x0/0x297 [libfc]
       [<ffffffffa0018deb>] fc_exch_recv+0x800/0xd3a [libfc]
       [<ffffffffa002a237>] fcoe_percpu_receive_thread+0x67/0x514 [fcoe]
       [<ffffffff802351b6>] local_bh_enable_ip+0xc0/0xc3
       [<ffffffffa002a693>] fcoe_percpu_receive_thread+0x4c3/0x514 [fcoe]
       [<ffffffffa002a1d0>] fcoe_percpu_receive_thread+0x0/0x514 [fcoe]
       [<ffffffff80242724>] kthread+0x47/0x73
       [<ffffffff8058506e>] trace_hardirqs_on_thunk+0x3a/0x3f
       [<ffffffff8020c379>] child_rip+0xa/0x11
       [<ffffffff8020b9af>] restore_args+0x0/0x30
       [<ffffffff802426b8>] kthreadd+0x189/0x1ae
       [<ffffffff802426dd>] kthread+0x0/0x73
       [<ffffffff8020c36f>] child_rip+0x0/0x11
       [<ffffffffffffffff>] 0xffffffffffffffff

other info that might help us debug this:

1 lock held by fcoethread/0/6855:
 #0:  (&rdata->rp_mutex#2){--..}, at: [<ffffffffa001d361>] fc_rport_plogi_resp+0x14d/0x297 [libfc]

stack backtrace:
Pid: 6855, comm: fcoethread/0 Not tainted 2.6.27-rc5 #144

Call Trace:
 [<ffffffff8024db42>] print_circular_bug_tail+0xb1/0xba
 [<ffffffff8024d971>] print_circular_bug_header+0xc8/0xcf
 [<ffffffff8024ed85>] __lock_acquire+0xea2/0x1494
 [<ffffffff8024b35c>] save_trace+0x37/0x91
 [<ffffffff8024f35e>] __lock_acquire+0x147b/0x1494
 [<ffffffff8024f3fc>] lock_acquire+0x85/0xa9
 [<ffffffffa001b2a0>] fc_lport_rport_event+0x23/0x66 [libfc]
 [<ffffffff80583d6a>] mutex_lock_nested+0xf6/0x25d
 [<ffffffffa001b2a0>] fc_lport_rport_event+0x23/0x66 [libfc]
 [<ffffffff8024d4a5>] mark_held_locks+0x5c/0x72
 [<ffffffff8043757f>] fc_remote_port_rolechg+0x52/0x173
 [<ffffffffa001b2a0>] fc_lport_rport_event+0x23/0x66 [libfc]
 [<ffffffffa001d437>] fc_rport_plogi_resp+0x223/0x297 [libfc]
 [<ffffffff802351b6>] local_bh_enable_ip+0xc0/0xc3
 [<ffffffffa001d214>] fc_rport_plogi_resp+0x0/0x297 [libfc]
 [<ffffffffa0018deb>] fc_exch_recv+0x800/0xd3a [libfc]
 [<ffffffffa002a237>] fcoe_percpu_receive_thread+0x67/0x514 [fcoe]
 [<ffffffff802351b6>] local_bh_enable_ip+0xc0/0xc3
 [<ffffffffa002a693>] fcoe_percpu_receive_thread+0x4c3/0x514 [fcoe]
 [<ffffffffa002a1d0>] fcoe_percpu_receive_thread+0x0/0x514 [fcoe]
 [<ffffffff80242724>] kthread+0x47/0x73
 [<ffffffff8058506e>] trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff8020c379>] child_rip+0xa/0x11
 [<ffffffff8020b9af>] restore_args+0x0/0x30
 [<ffffffff802426b8>] kthreadd+0x189/0x1ae
 [<ffffffff802426dd>] kthread+0x0/0x73
 [<ffffffff8020c36f>] child_rip+0x0/0x11

The issue is that rp_mutex is still held when the event callback into the
lport occurs, and the lport event handler then takes lp_mutex. That gives an
rp_mutex -> lp_mutex ordering, while the login path (lport holding lp_mutex
and calling fc_rport_login) already established lp_mutex -> rp_mutex, so
lockdep flags the inversion. I'm not sure what to do about this yet. I'll
continue to investigate; suggestions are welcome. One possible approach is
sketched below.
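
One option would be to drop rp_mutex around the callback; another would be to
defer the callback to a work item so it never runs with rp_mutex held. A
rough sketch of the deferral idea follows -- the event/event_work fields, the
rdata->rport back-pointer and fc_rport_work() are illustrative assumptions,
not code from this series.

/* Called with rdata->rp_mutex held. */
static void fc_rport_enter_ready(struct fc_rport *rport)
{
	struct fc_rport_libfc_priv *rdata = rport->dd_data;

	rdata->rp_state = RPORT_ST_READY;
	rdata->event = RPORT_EV_READY;
	schedule_work(&rdata->event_work);	/* notify the lport later */
}

/* Runs from the workqueue with no locks held, so the lport handler can take
 * lp_mutex without lockdep ever seeing rp_mutex -> lp_mutex. */
static void fc_rport_work(struct work_struct *work)
{
	struct fc_rport_libfc_priv *rdata =
		container_of(work, struct fc_rport_libfc_priv, event_work);

	rdata->ops->event_callback(rdata->local_port, rdata->rport,
				   rdata->event);
}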

All comments are welcome. If there are no major issues I'll do a little cleanup,
drop the RFC, resubmit and then commit.
-- 
//Rob