Hi
The machine gets stuck after unregistering a bonding interface while the nvmet_rdma module is loaded.
Scenario:
# modprobe nvmet_rdma
# modprobe -r bonding
# modprobe bonding -v mode=1 miimon=100 fail_over_mac=0
# ifdown eth4
# ifdown eth5
# ip addr add 15.209.12.173/8 dev bond0
# ip link set bond0 up
# echo +eth5 > /sys/class/net/bond0/bonding/slaves
# echo +eth4 > /sys/class/net/bond0/bonding/slaves
# echo -eth4 > /sys/class/net/bond0/bonding/slaves
# echo -eth5 > /sys/class/net/bond0/bonding/slaves
# echo -bond0 > /sys/class/net/bonding_masters
dmesg:
kernel: [78348.225556] bond0 (unregistering): Released all slaves
kernel: [78358.339631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78368.419621] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78378.499615] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78388.579625] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78398.659613] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78408.739655] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78418.819634] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78428.899642] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78438.979614] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78449.059619] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78459.139626] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78469.219623] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78479.299619] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78489.379620] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78499.459623] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78509.539631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78519.619629] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
The following upstream commits fix this issue:
commit a3dd7d0022c347207ae931c753a6dc3e6e8fcbc1
Author: Max Gurtovoy <email address hidden>
Date: Wed Feb 28 13:12:38 2018 +0200
nvmet-rdma: Don't flush system_wq by default during remove_one
The .remove_one function is called for any ib_device removal.
In case the removed device has no reference in our driver, there
is no need to flush the system work queue.
Reviewed-by: Israel Rukshin <email address hidden>
Signed-off-by: Max Gurtovoy <email address hidden>
Reviewed-by: Sagi Grimberg <email address hidden>
Signed-off-by: Keith Busch <email address hidden>
Signed-off-by: Jens Axboe <email address hidden>
diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index aa8068f..a59263d 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1469,8 +1469,25 @@ static struct nvmet_fabrics_ops nvmet_rdma_ops = {
 static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
 {
 	struct nvmet_rdma_queue *queue, *tmp;
+	struct nvmet_rdma_device *ndev;
+	bool found = false;
+
+	mutex_lock(&device_list_mutex);
+	list_for_each_entry(ndev, &device_list, entry) {
+		if (ndev->device == ib_device) {
+			found = true;
+			break;
+		}
+	}
+	mutex_unlock(&device_list_mutex);
+
+	if (!found)
+		return;
 
-	/* Device is being removed, delete all queues using this device */
+	/*
+	 * IB Device that is used by nvmet controllers is being removed,
+	 * delete all queues using this device.
+	 */
 	mutex_lock(&nvmet_rdma_queue_mutex);
 	list_for_each_entry_safe(queue, tmp, &nvmet_rdma_queue_list,
 				 queue_list) {
commit 9bad0404ecd7594265cef04e176adeaa4ffbca4a
Author: Max Gurtovoy <email address hidden>
Date: Wed Feb 28 13:12:39 2018 +0200
nvme-rdma: Don't flush delete_wq by default during remove_one
The .remove_one function is called for any ib_device removal.
In case the removed device has no reference in our driver, there
is no need to flush the work queue.
Reviewed-by: Israel Rukshin <email address hidden>
Signed-off-by: Max Gurtovoy <email address hidden>
Reviewed-by: Sagi Grimberg <email address hidden>
Signed-off-by: Keith Busch <email address hidden>
Signed-off-by: Jens Axboe <email address hidden>
diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f5f460b..250b277 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2024,6 +2024,20 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
 static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
 {
 	struct nvme_rdma_ctrl *ctrl;
+	struct nvme_rdma_device *ndev;
+	bool found = false;
+
+	mutex_lock(&device_list_mutex);
+	list_for_each_entry(ndev, &device_list, entry) {
+		if (ndev->dev == ib_device) {
+			found = true;
+			break;
+		}
+	}
+	mutex_unlock(&device_list_mutex);
+
+	if (!found)
+		return;
 
 	/* Delete all controllers using this device */
 	mutex_lock(&nvme_rdma_ctrl_mutex);