Comment 5 for bug 1556549

Rahul Sharma (rahulsharmaait) wrote :

Hi Matt,

On debugging further, I found that the issue occurred because the openvswitch service was failing on that particular node with the error:
ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)\n
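
For anyone hitting the same symptom, a quick way to confirm that it is ovsdb-server itself that is unreachable (rather than nova or neutron misbehaving) is something along these lines; treat it as a sketch, since the exact unit names depend on the distribution/packaging:

# any ovs-vsctl call reproduces the same "database connection failed" error while ovsdb-server is down
ovs-vsctl show
# openvswitch daemons (unit names are distro-specific; on RHEL/CentOS-style nodes typically ovsdb-server and ovs-vswitchd)
systemctl status ovsdb-server ovs-vswitchd
# the database socket the error message refers to
ls -l /var/run/openvswitch/db.sock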

However, since new instance creation requests kept arriving, nova-compute kept creating the qvb and qvo interfaces successfully and then, from what I can tell, failed at the subsequent ovs-vsctl step (a rough sketch of that sequence follows the log excerpt below). For example, the logs show that it failed for this request:

2016-03-08 01:34:45.083 4444 INFO nova.virt.libvirt.driver [req-c78b194d-6fa0-4cc2-8751-0c9d3fc43bea c665814ae07a4f71b666d04fcb99c2e9 a0288bedbb884e07bc0c602e7a343de8 - - -] [instance: ce125391-b07f-4100-8046-51b982c17553] Creating image
2016-03-08 01:35:03.595 4444 ERROR nova.network.linux_net [req-c78b194d-6fa0-4cc2-8751-0c9d3fc43bea c665814ae07a4f71b666d04fcb99c2e9 a0288bedbb884e07bc0c602e7a343de8 - - -] Unable to execute ['ovs-vsctl', '--timeout=120', '--', '--if-exists', 'del-port', u'qvo2188d93e-29', '--', 'add-port', 'br-int', u'qvo2188d93e-29', '--', 'set', 'Interface', u'qvo2188d93e-29', u'external-ids:iface-id=2188d93e-2945-4f11-80d8-525e8d81957b', 'external-ids:iface-status=active', u'external-ids:attached-mac=fa:16:3e:2d:51:19', 'external-ids:vm-uuid=ce125391-b07f-4100-8046-51b982c17553']. Exception: Unexpected error while running command.
Command: sudo nova-rootwrap /etc/nova/rootwrap.conf ovs-vsctl --timeout=120 -- --if-exists del-port qvo2188d93e-29 -- add-port br-int qvo2188d93e-29 -- set Interface qvo2188d93e-29 external-ids:iface-id=2188d93e-2945-4f11-80d8-525e8d81957b external-ids:iface-status=active external-ids:attached-mac=fa:16:3e:2d:51:19 external-ids:vm-uuid=ce125391-b07f-4100-8046-51b982c17553
Exit code: 1
Stdout: u''
Stderr: u'ovs-vsctl: unix:/var/run/openvswitch/db.sock: database connection failed (Protocol error)\n'
2016-03-08 01:35:03.596 4444 ERROR nova.compute.manager [req-c78b194d-6fa0-4cc2-8751-0c9d3fc43bea c665814ae07a4f71b666d04fcb99c2e9 a0288bedbb884e07bc0c602e7a343de8 - - -] [instance: ce125391-b07f-4100-8046-51b982c17553] Instance failed to spawn
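
For context on why the interfaces survive even though the plug fails: nova's hybrid OVS VIF plugging first creates the qbr linux bridge and the qvb/qvo veth pair with plain Linux tooling, and only the final step talks to the OVS database. A simplified sketch of that sequence (not the exact rootwrap commands; the port name is taken from the log above):

# 1. linux bridge used for iptables-based security groups - no OVS involved, succeeds
brctl addbr qbr2188d93e-29
# 2. veth pair linking that bridge towards OVS - no OVS involved, succeeds
ip link add qvb2188d93e-29 type veth peer name qvo2188d93e-29
brctl addif qbr2188d93e-29 qvb2188d93e-29
# 3. attach the qvo end to br-int - this is the ovs-vsctl call that fails with
#    "database connection failed", and steps 1-2 are not rolled back
ovs-vsctl -- --if-exists del-port qvo2188d93e-29 -- add-port br-int qvo2188d93e-29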

However, if I check for qvo2188d93e-29, the interface is still present on the node:
[root@compute-42 rahul]# ifconfig qvo2188d93e-29
qvo2188d93e-29: flags=4419<UP,BROADCAST,RUNNING,PROMISC,MULTICAST> mtu 9000
        inet6 fe80::dcf4:caff:fef0:8e5 prefixlen 64 scopeid 0x20<link>
        ether de:f4:ca:f0:08:e5 txqueuelen 1000 (Ethernet)
        RX packets 15 bytes 1206 (1.1 KiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 8 bytes 648 (648.0 B)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Due to this, the compute node ended up with more than 350 stale qvo/qvb pairs. I filed this bug because this behavior leaves the compute node in a messy state even though openvswitch cannot connect to its database. Also, the neutron agent is still reported as up in this case.
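
Once openvswitch is healthy again, something like the following can be used to list the qvo devices that exist on the node but are not attached to br-int, which is essentially what these leftovers are (a rough cleanup sketch, not tested against this environment):

# qvo devices present on the node but unknown to br-int
ports=$(ovs-vsctl list-ports br-int)
for dev in $(ip -o link show | awk -F': ' '{print $2}' | cut -d@ -f1 | grep '^qvo'); do
    echo "$ports" | grep -qx "$dev" || echo "stale veth: $dev"
done

Deleting either end of a veth pair removes both ends, so removing the stale qvb/qvo device (and the matching qbr bridge) should be enough, but only after double-checking it does not belong to an instance actually running on the node.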

Please find logs for nova-compute attached.