Activity log for bug #1764982

Date Who What changed Old value New value Message
2018-04-18 09:12:46 Talat Batheesh bug added bug
2018-04-18 09:13:02 Talat Batheesh summary machine stuck and bonding not working well when nvmet_rdma module is loaded [bionic] machine stuck and bonding not working well when nvmet_rdma module is loaded
2018-04-18 09:13:23 Talat Batheesh bug added subscriber Noa Spanier
2018-04-18 09:30:10 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2018-04-18 09:30:12 Ubuntu Kernel Bot tags bionic
2018-04-20 18:14:45 Joseph Salisbury linux (Ubuntu): importance Undecided High
2018-04-20 18:14:52 Joseph Salisbury nominated for series Ubuntu Bionic
2018-04-20 18:14:52 Joseph Salisbury bug task added linux (Ubuntu Bionic)
2018-04-20 18:15:15 Joseph Salisbury linux (Ubuntu Bionic): status Incomplete Triaged
2018-04-23 17:37:54 Joseph Salisbury linux (Ubuntu Bionic): assignee Joseph Salisbury (jsalisbury)
2018-04-23 17:37:58 Joseph Salisbury linux (Ubuntu Bionic): status Triaged In Progress
2018-05-07 21:33:41 Joseph Salisbury description

Old value: the original bug description (scenario, dmesg output, and the two upstream commits), reproduced in full under "== Original Bug Description ==" in the new value.

New value:

== SRU Justification ==
This bug causes the machine to get stuck and bonding to not work when the nvmet_rdma module is loaded.
Both of these commits are in mainline as of v4.17-rc1.

== Fixes ==
a3dd7d0022c3 ("nvmet-rdma: Don't flush system_wq by default during remove_one")
9bad0404ecd7 ("nvme-rdma: Don't flush delete_wq by default during remove_one")

== Regression Potential ==
Low. Limited to nvme driver and tested by Mellanox.

== Test Case ==
A test kernel was built with these patches and tested by the original bug reporter. The bug reporter states the test kernel resolved the bug.

== Original Bug Description ==
Hi,

The machine gets stuck after unregistering the bonding interface while the nvmet_rdma module is loaded.

scenario:

 # modprobe nvmet_rdma
 # modprobe -r bonding
 # modprobe bonding -v mode=1 miimon=100 fail_over_mac=0
 # ifdown eth4
 # ifdown eth5
 # ip addr add 15.209.12.173/8 dev bond0
 # ip link set bond0 up
 # echo +eth5 > /sys/class/net/bond0/bonding/slaves
 # echo +eth4 > /sys/class/net/bond0/bonding/slaves
 # echo -eth4 > /sys/class/net/bond0/bonding/slaves
 # echo -eth5 > /sys/class/net/bond0/bonding/slaves
 # echo -bond0 > /sys/class/net/bonding_masters

dmesg:

kernel: [78348.225556] bond0 (unregistering): Released all slaves
kernel: [78358.339631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78368.419621] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78378.499615] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78388.579625] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78398.659613] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78408.739655] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78418.819634] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78428.899642] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78438.979614] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78449.059619] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78459.139626] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78469.219623] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78479.299619] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78489.379620] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78499.459623] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78509.539631] unregister_netdevice: waiting for bond0 to become free. Usage count = 2
kernel: [78519.619629] unregister_netdevice: waiting for bond0 to become free. Usage count = 2

The following upstream commits fix this issue:

commit a3dd7d0022c347207ae931c753a6dc3e6e8fcbc1
Author: Max Gurtovoy <maxg@mellanox.com>
Date: Wed Feb 28 13:12:38 2018 +0200

    nvmet-rdma: Don't flush system_wq by default during remove_one

    The .remove_one function is called for any ib_device removal.
    In case the removed device has no reference in our driver, there
    is no need to flush the system work queue.

    Reviewed-by: Israel Rukshin <israelr@mellanox.com>
    Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index aa8068f..a59263d 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -1469,8 +1469,25 @@ static struct nvmet_fabrics_ops nvmet_rdma_ops = {
 static void nvmet_rdma_remove_one(struct ib_device *ib_device, void *client_data)
 {
        struct nvmet_rdma_queue *queue, *tmp;
+       struct nvmet_rdma_device *ndev;
+       bool found = false;
+
+       mutex_lock(&device_list_mutex);
+       list_for_each_entry(ndev, &device_list, entry) {
+               if (ndev->device == ib_device) {
+                       found = true;
+                       break;
+               }
+       }
+       mutex_unlock(&device_list_mutex);
+
+       if (!found)
+               return;
 
-       /* Device is being removed, delete all queues using this device */
+       /*
+        * IB Device that is used by nvmet controllers is being removed,
+        * delete all queues using this device.
+        */
        mutex_lock(&nvmet_rdma_queue_mutex);
        list_for_each_entry_safe(queue, tmp, &nvmet_rdma_queue_list,
                                 queue_list) {

commit 9bad0404ecd7594265cef04e176adeaa4ffbca4a
Author: Max Gurtovoy <maxg@mellanox.com>
Date: Wed Feb 28 13:12:39 2018 +0200

    nvme-rdma: Don't flush delete_wq by default during remove_one

    The .remove_one function is called for any ib_device removal.
    In case the removed device has no reference in our driver, there
    is no need to flush the work queue.

    Reviewed-by: Israel Rukshin <israelr@mellanox.com>
    Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Keith Busch <keith.busch@intel.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>

diff --git a/drivers/nvme/host/rdma.c b/drivers/nvme/host/rdma.c
index f5f460b..250b277 100644
--- a/drivers/nvme/host/rdma.c
+++ b/drivers/nvme/host/rdma.c
@@ -2024,6 +2024,20 @@ static struct nvmf_transport_ops nvme_rdma_transport = {
 static void nvme_rdma_remove_one(struct ib_device *ib_device, void *client_data)
 {
        struct nvme_rdma_ctrl *ctrl;
+       struct nvme_rdma_device *ndev;
+       bool found = false;
+
+       mutex_lock(&device_list_mutex);
+       list_for_each_entry(ndev, &device_list, entry) {
+               if (ndev->dev == ib_device) {
+                       found = true;
+                       break;
+               }
+       }
+       mutex_unlock(&device_list_mutex);
+
+       if (!found)
+               return;
 
        /* Delete all controllers using this device */
        mutex_lock(&nvme_rdma_ctrl_mutex);
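
Both commits in the description above add the same guard: look the removed ib_device up in the driver's own device list and return early if it is not there, so the work queue is only flushed for devices the driver actually references. The following is a minimal userspace sketch of that pattern, not the kernel code. The struct layout, the fixed-size tracked[] array, and the names track_device, device_is_tracked, flush_work_queue and remove_one are all hypothetical stand-ins, and the kernel's locking around the list is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct ib_device. */
struct ib_device {
    const char *name;
};

/* Hypothetical stand-in for the driver's device_list. */
#define MAX_TRACKED 8
static const struct ib_device *tracked[MAX_TRACKED];
static size_t tracked_count;

/* Counts how often the (simulated) expensive flush ran. */
static int flush_calls;

/* Record that the driver holds a reference to dev. */
static bool track_device(const struct ib_device *dev)
{
    if (tracked_count == MAX_TRACKED)
        return false;
    tracked[tracked_count++] = dev;
    return true;
}

/* Linear search, mirroring the list walk added by the patches. */
static bool device_is_tracked(const struct ib_device *dev)
{
    for (size_t i = 0; i < tracked_count; i++) {
        if (tracked[i] == dev)
            return true;
    }
    return false;
}

/* Simulated work-queue flush; here it only bumps a counter. */
static void flush_work_queue(void)
{
    flush_calls++;
}

/*
 * Removal callback with the early-return guard.
 * Returns true if teardown (including the flush) ran, false if the
 * device was unknown to the driver and nothing was done.
 */
static bool remove_one(const struct ib_device *dev)
{
    if (!device_is_tracked(dev))
        return false;

    /* Per-device teardown would go here. */
    flush_work_queue();
    return true;
}
```

In this model, calling remove_one on a device that was never passed to track_device leaves flush_calls at zero, while calling it on a tracked device runs the flush once.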
2018-05-23 09:08:34 Stefan Bader linux (Ubuntu Bionic): status In Progress Fix Committed
2018-05-24 18:06:13 Brad Figg tags bionic bionic verification-needed-bionic
2018-06-07 12:22:17 Talat Batheesh tags bionic verification-needed-bionic bionic verification-done-bionic
2018-06-11 15:08:06 Launchpad Janitor linux (Ubuntu Bionic): status Fix Committed Fix Released
2018-06-11 15:08:06 Launchpad Janitor cve linked 2018-1092
2018-06-11 15:08:06 Launchpad Janitor cve linked 2018-3639
2018-06-11 15:08:06 Launchpad Janitor cve linked 2018-8087
2018-06-14 12:16:29 Launchpad Janitor linux (Ubuntu): status In Progress Fix Released