Instance cannot lease IP on CentOS with Neutron GRE

Bug #1348649 reported by Tatyanka
This bug affects 2 people
Affects: Mirantis OpenStack
Status: Fix Released
Importance: Critical
Assigned to: Ilya Shakhat

Bug Description

[root@nailgun ~]# cat /etc/fuel/version.yaml
VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "5.1"
  api: "1.0"
  build_number: "351"
  build_id: "2014-07-24_02-01-14"
  astute_sha: "fd9b8e3b6f59b2727b1b037054f10e0dd7bd37f1"
  fuellib_sha: "8bffb2a4723109614aeaabaabffa3c94a1b72705"
  ostf_sha: "81b019a502711dfcd935a981b292e88eb956b141"
  nailgun_sha: "744e17cc03207c46ecd79f4ac78fde98f75aec2f"
  fuelmain_sha: "103ce9abd6e2632ec1029d1aa3e918517417cba3"

Steps to Reproduce:

1. Deploy simple cluster on centos with Neutron GRE
2. Run ostf tests

Actual result:
The instance connectivity test failed; the instance console.log shows that the instance cannot get an IP (see the paste below):

http://paste.openstack.org/show/88165/

`neutron agent-list` shows that all agents are alive:
[root@node-1 ~]# neutron agent-list
+--------------------------------------+--------------------+--------------------------+-------+----------------+
| id | agent_type | host | alive | admin_state_up |
+--------------------------------------+--------------------+--------------------------+-------+----------------+
| 04731637-5c00-4ad3-aa61-4661ddb0da3a | DHCP agent | node-1.test.domain.local | :-) | True |
| 18e84e2b-aca8-4bf0-b2d5-200a3a762743 | L3 agent | node-1.test.domain.local | :-) | True |
| 6864c707-5fdf-4243-9250-40080c7e3fac | Open vSwitch agent | node-3.test.domain.local | :-) | True |
| bf66fff3-4922-4ae0-87b9-53e74e48d64d | Open vSwitch agent | node-2.test.domain.local | :-) | True |
| cc7d124d-528f-46ad-828e-f437d16399fd | Open vSwitch agent | node-1.test.domain.local | :-) | True |
| eac4f484-fefc-4d6e-af16-19a64a6b6f78 | Metadata agent | node-1.test.domain.local | :-) | True |
+--------------------------------------+--------------------+--------------------------+-------+----------------+
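In larger deployments it can help to flag dead agents automatically instead of eyeballing the table. The following is an illustrative Python parser for `neutron agent-list` table output like the one above; `dead_agents` is a hypothetical helper that just parses the ASCII table, not a Neutron API:

```python
def dead_agents(agent_list_output):
    """Return (agent_type, host) pairs whose 'alive' column is not ':-)'
    in `neutron agent-list` table output."""
    dead = []
    for line in agent_list_output.splitlines():
        # Split a table row like "| id | type | host | alive | admin |"
        cols = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip separator lines (1 field) and the header row ("id")
        if len(cols) == 5 and cols[0] != "id" and cols[3] != ":-)":
            dead.append((cols[1], cols[2]))
    return dead
```

In the output shown above, `dead_agents` would return an empty list, since every agent reports `:-)`.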

The following namespaces are present on the controller:
[root@node-1 log]# ip netns list
qdhcp-17ddd25e-4640-4b7a-a5f9-2aaddc5f9d39
qrouter-5c171c7e-9396-41b0-8d3d-7264868c7fa8
[root@node-1 log]# ip netns exec qdhcp-17ddd25e-4640-4b7a-a5f9-2aaddc5f9d39 ip a
22: tap41bdd3fd-7b: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
    link/ether fa:16:3e:a6:67:44 brd ff:ff:ff:ff:ff:ff
    inet 192.168.111.2/24 brd 192.168.111.255 scope global tap41bdd3fd-7b
    inet6 fe80::f816:3eff:fea6:6744/64 scope link
       valid_lft forever preferred_lft forever
23: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever

I tried restarting all the agents, but it made no difference.

The same problem occurs on http://jenkins-product.srt.mirantis.net:8080/job/fuel_master.centos.bvt_1/201/console

Tags: neutron
Revision history for this message
Tatyanka (tatyana-leontovich) wrote :
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Sergey Vasilenko (xenolog)
status: New → Confirmed
Revision history for this message
Sergey Vasilenko (xenolog) wrote :

astute.yaml contains
role: controller.

I guess that in the manifests we have a part of the code guarded by:
if $primary_controller {
......
}

Changed in fuel:
assignee: Sergey Vasilenko (xenolog) → Sergey Kolekonov (kolekonov-s-deactivatedaccount-deactivatedaccount)
Changed in fuel:
assignee: Sergey Kolekonov (kolekonov-s-deactivatedaccount-deactivatedaccount) → Sergey Kolekonov (skolekonov)
Revision history for this message
Artem Panchenko (apanchenko-8) wrote :

I tried to investigate the same issue with Neutron GRE:

http://jenkins-product.srt.mirantis.net:8080/job/fuel_master.centos.bvt_1/201/testReport/junit/(root)/deploy_neutron_gre/deploy_neutron_gre/

and found that the GRE tunnels from the compute nodes to the controller were down:

node-1 (controller): 10.108.7.2
node-2 (compute): 10.108.7.3
node-3 (compute): 10.108.7.4

http://paste.openstack.org/show/88190/

So I just restarted 'neutron-openvswitch-agent' on the compute nodes, and the tunnels to the controller came up:

http://paste.openstack.org/show/88191/

Then I restarted the instance, and it became accessible via its private and floating IPs; the OSTF test 'Check network connectivity from instance via floating IP' also passed.

Changed in fuel:
status: Confirmed → Triaged
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/110140

Changed in fuel:
assignee: Sergey Kolekonov (skolekonov) → Vladimir Kuklin (vkuklin)
status: Triaged → In Progress
Changed in fuel:
importance: High → Critical
assignee: Vladimir Kuklin (vkuklin) → MOS Neutron (mos-neutron)
Changed in fuel:
assignee: MOS Neutron (mos-neutron) → Ilya Shakhat (shakhat)
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Look into the earlier logs for the neutron openvswitch agent on node-2 (10.108.5.2/var/log/docker-logs/remote/node-2...).

The GRE interface is created:

<167>Jul 28 16:37:16 node-2 neutron-openvswitch-agent 2014-07-28 16:37:16.374 17387 DEBUG neutron.agent.linux.utils [-]
Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', '--', '--may-exist', 'add-port', 'br-tun', 'gre-0a6c0702', '--', 'set', 'Interface', 'gre-0a6c0702', 'type=gre', 'options:remote_ip=10.108.7.2', 'options:local_ip=10.108.7.3', 'options:in_key=flow', 'options:out_key=flow']
Exit code: 0

and then the tunneling bridge is destroyed:

<167>Jul 28 16:37:17 node-2 neutron-openvswitch-agent 2014-07-28 16:37:17.634 17387 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', '--', '--if-exists', 'del-br', 'br-tun'] create_process /usr/lib/python2.6/site-packages/neutron/agent/linux/utils.py:48
<167>Jul 28 16:37:17 node-2 neutron-openvswitch-agent 2014-07-28 16:37:17.669 17387 DEBUG neutron.agent.linux.ovsdb_monitor [-] Output received from ovsdb monitor: {"data":[["cf16004c-4ff5-471b-b897-ee8bb90cefe4","delete","br-tun",65534],["1eb65dfc-b0a9-411e-ac9b-0c8d6c9019d2","delete","patch-int",1],["0533c85c-e099-435e-9382-69d255f8b2c2","delete","gre-0a6c0704",3],["b391cf59-a61e-43b2-a268-969f136f4583","delete","gre-0a6c0702",2]],"headings":["row","action","name","ofport"]}
 _read_stdout /usr/lib/python2.6/site-packages/neutron/agent/linux/ovsdb_monitor.py:53
<167>Jul 28 16:37:17 node-2 neutron-openvswitch-agent 2014-07-28 16:37:17.678 17387 DEBUG neutron.agent.linux.utils [-]
Command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ovs-vsctl', '--timeout=10', '--', '--if-exists', 'del-br', 'br-tun']
Exit code: 0

This looks really strange.
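For reference, the add-port command from the log above can be reconstructed in Python. `add_gre_port_cmd` is a hypothetical helper that only builds the argument list matching the logged command; the real agent wraps it in sudo/neutron-rootwrap and executes it:

```python
def add_gre_port_cmd(bridge, port, remote_ip, local_ip):
    """Build the ovs-vsctl argument list equivalent to the logged command
    that creates a flow-keyed GRE tunnel port on the tunneling bridge."""
    return ["ovs-vsctl", "--timeout=10", "--",
            # idempotent: do nothing if the port already exists
            "--may-exist", "add-port", bridge, port, "--",
            "set", "Interface", port, "type=gre",
            "options:remote_ip=%s" % remote_ip,
            "options:local_ip=%s" % local_ip,
            # flow-based tunnel keys, set per-flow by the agent
            "options:in_key=flow", "options:out_key=flow"]
```

With the addresses from the log (controller 10.108.7.2, compute 10.108.7.3), this reproduces the port `gre-0a6c0702` on `br-tun`; one second later the same bridge is deleted with `--if-exists del-br br-tun`, taking the tunnel with it.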

Revision history for this message
OSCI Robot (oscirobot) wrote :

Package neutron has been built from changeset: http://gerrit.mirantis.com/19602
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable-19602/centos

Revision history for this message
OSCI Robot (oscirobot) wrote :

Package neutron has been built from changeset: http://gerrit.mirantis.com/19602
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-19602/ubuntu

Revision history for this message
Ilya Shakhat (shakhat) wrote :

The issue is reproducible on CentOS nodes in compute roles only.

The logs show that the OVS agent initializes all OVS objects, but two seconds later re-creates them from scratch. The GRE port, however, is created only initially and is not re-created. The re-creation is triggered by code that checks whether OVS was restarted (done by monitoring the existence of an OVS flow in an artificial table; see https://review.openstack.org/95060). Tunnels are synced only during agent start. The proposed fix is to sync tunnels every time an OVS restart is detected.

The reason why OVS restarts is not known, but most likely the issue is in OVS itself. The issue is reproducible only on this specific configuration, most probably because the OVS agent starts too soon and is hit by the OVS restart.
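The failure mode and the proposed fix can be sketched with simulated objects. `FakeOvs` and `Agent` below are illustrative stand-ins, not the real agent code: the canary-flow check and the tunnel sync mirror the mechanism described above, but all names are invented for the sketch.

```python
class FakeOvs:
    """Simulates the OVS state the agent observes."""
    def __init__(self):
        self.canary_flow = False   # artificial flow used as a restart marker
        self.tunnels = set()

    def restart(self):
        # An OVS restart wipes both the canary flow and all tunnel ports.
        self.canary_flow = False
        self.tunnels.clear()


class Agent:
    """Simplified OVS agent loop with an optional restart-time tunnel sync."""
    def __init__(self, ovs, sync_tunnels_on_restart):
        self.ovs = ovs
        self.sync_tunnels_on_restart = sync_tunnels_on_restart
        self._setup()

    def _setup(self):
        self.ovs.canary_flow = True    # install the restart-marker flow
        self._sync_tunnels()           # tunnels are synced only at start

    def _sync_tunnels(self):
        self.ovs.tunnels.add("gre-0a6c0702")  # tunnel to the controller

    def loop_iteration(self):
        if not self.ovs.canary_flow:            # restart detected
            self.ovs.canary_flow = True         # bridges/flows re-created...
            if self.sync_tunnels_on_restart:    # ...and, with the fix,
                self._sync_tunnels()            # tunnels re-synced too
```

Without the fix, an OVS restart after agent startup leaves the re-created bridge with no GRE ports, which is exactly the state observed on the compute nodes; with the fix, the next loop iteration restores the tunnel.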

Revision history for this message
Ilya Shakhat (shakhat) wrote :
Changed in mos:
status: New → In Progress
importance: Undecided → Critical
assignee: nobody → Ilya Shakhat (shakhat)
milestone: none → 5.1
Revision history for this message
Ilya Shakhat (shakhat) wrote :
Revision history for this message
Ilya Shakhat (shakhat) wrote :
Revision history for this message
OSCI Robot (oscirobot) wrote :

Package neutron has been built from changeset: http://gerrit.mirantis.com/19605
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable-19605/centos

Revision history for this message
OSCI Robot (oscirobot) wrote :

Package neutron has been built from changeset: http://gerrit.mirantis.com/19605
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable-19605/ubuntu

Revision history for this message
OSCI Robot (oscirobot) wrote :

Package neutron has been built from changeset: http://gerrit.mirantis.com/19605
RPM Repository URL: http://osci-obs.vm.mirantis.net:82/centos-fuel-5.1-stable/centos

Revision history for this message
OSCI Robot (oscirobot) wrote :

Package neutron has been built from changeset: http://gerrit.mirantis.com/19605
DEB Repository URL: http://osci-obs.vm.mirantis.net:82/ubuntu-fuel-5.1-stable/ubuntu

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Vladimir Kuklin (<email address hidden>) on branch: master
Review: https://review.openstack.org/110140

no longer affects: fuel
Ilya Shakhat (shakhat)
Changed in mos:
status: In Progress → Fix Committed
Revision history for this message
Artem Panchenko (apanchenko-8) wrote :

Verified on:

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "5.1"
  api: "1.0"
  build_number: "377"
  build_id: "2014-07-31_02-07-54"
  astute_sha: "b16efcec6b4af1fb8669055c053fbabe188afa67"
  fuellib_sha: "9a64237c70fc464ef0b11ac7c0bad34e8c202135"
  ostf_sha: "b4c5efa51909404fd9ec1d0bbc38a31b200e1d6d"
  nailgun_sha: "2df0d1ebe4fc7e8870768176b3dd6fcffcfbe261"
  fuelmain_sha: "63d0775708b0f5fa4d6d1e09a316d9c26f7e5444"

Changed in mos:
status: Fix Committed → Fix Released
Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :

The issue is reproduced again with the following cluster configuration:
Simple, Ubuntu, Neutron GRE, 1 controller, 2 computes

The instance can't get an IP - http://paste.openstack.org/show/88165/

Logs are attached

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :
Changed in mos:
status: Fix Released → Confirmed
Revision history for this message
Ilya Shakhat (shakhat) wrote :

The newly observed issue is not related to the original one:
 * GRE endpoints are configured successfully
 * there is no evidence of tunnel reconfiguration

Revision history for this message
Alexander Ignatov (aignatov) wrote :

As Ilya said in the previous comments, reopening this issue is not quite correct: the original bug was fixed and is not related to the new one. I found a duplicate of the new issue (https://bugs.launchpad.net/mos/+bug/1352203) and suggest tracking all activity there.

Changed in mos:
status: Confirmed → Fix Released