Bug #1863110 “2/3 snat namespace transitions to master” : Bugs : neutron

Slawek Kaplonski (slaweq) wrote on 2020-02-15:

#1

Hi,

It is the keepalived process that decides which node is master and which is backup. Can you check the keepalived logs? Maybe they contain some information about the cause of this problem.

tags:

added: l3-ha

Brian Haley (brian-haley) wrote on 2020-02-17:

#2

I think we've seen this with an old version of keepalived, can you verify you have a new(er) version?

Marek Grudzinski (ivve) wrote on 2020-02-19:

#3

Hello Slawek & Brian,

Yes, but the logs say nothing of value, and I don't understand why both are becoming master. I will paste the keepalived logs below.

The keepalived version is the one bundled with the kolla neutron L3 agent container. The neutron version is 14.0.2, and the keepalived package is listed below. I could try to repackage the container with a newer custom version. Do you have any recommendation on which version to use?

ii keepalived 1:1.3.9-1ubuntu0.18.04.2 amd64 Failover and monitoring daemon for LVS clusters

Marek Grudzinski (ivve) wrote on 2020-02-19:

#4

2020-02-18 08:21:27.455 3129610 INFO neutron.common.config [-] Logging enabled!
2020-02-18 08:21:27.455 3129610 INFO neutron.common.config [-] /var/lib/kolla/venv/bin/neutron-keepalived-state-change version 14.0.2
2020-02-18 08:21:27.456 3129610 DEBUG neutron.common.config [-] command line: /var/lib/kolla/venv/bin/neutron-keepalived-state-change --router_id=abfd6fde-a4b5-436e-9eb4-7da3ee926279 --namespace=snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279 --conf_dir=/var/lib/neutron/ha_confs/abfd6fde-a4b5-436e-9eb4-7da3ee926279 --monitor_interface=ha-5952f8d5-dd --monitor_cidr=169.254.0.92/24 --pid_file=/var/lib/neutron/external/pids/abfd6fde-a4b5-436e-9eb4-7da3ee926279.monitor.pid.neutron-keepalived-state-change-monitor --state_path=/var/lib/neutron --user=42435 --group=42435 setup_logging /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/common/config.py:103
2020-02-18 08:21:27.463 3129726 DEBUG neutron.agent.common.async_process [-] Launching async process [ip netns exec snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279 ip -o monitor address]. start /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/common/async_process.py:112
2020-02-18 08:21:27.464 3129726 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'ip', '-o', 'monitor', 'address'] create_process /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:87
2020-02-18 08:21:27.472 3129726 DEBUG neutron.agent.linux.utils [-] Found cmdline ['ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'ip', '-o', 'monitor', 'address'] for rocess with PID 3129727. get_cmdline_from_pid /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:339
2020-02-18 08:21:28.473 3129726 DEBUG neutron.agent.linux.utils [-] Found cmdline ['ip', '-o', 'monitor', 'address'] for rocess with PID 3129727. get_cmdline_from_pid /var/lib/kolla/venv/local/lib/python2.7/site-packages/neutron/agent/linux/utils.py:339
Process runs with uid/gid: 42435/42435
Running privsep helper: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'privsep-helper', '--privsep_context', 'neutron.privileged.default', '--privsep_sock_path', '/tmp/tmpVR89Yr/privsep.sock']
Spawned new privsep daemon via rootwrap
Accepted privsep connection to /tmp/tmpVR89Yr/privsep.sock
privsep daemon starting
privsep process running with uid/gid: 0/0
privsep process running with capabilities (eff/prm/inh): CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/CAP_DAC_OVERRIDE|CAP_DAC_READ_SEARCH|CAP_NET_ADMIN|CAP_SYS_ADMIN/none
privsep daemon running as pid 3129957
privsep log: /var/lib/kolla/venv/local/lib/python2.7/site-packages/psycopg2/__init__.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: <http://initd.org/psycopg/docs/install.html#binary-install-from-pypi>.
privsep log:   """)
Initial status of router abfd6fde-a4b5-436e-9eb4-7da3ee926279 is backup
Wrote router abfd6fde-a4b5-436e-9eb4-7da3ee926279 state master
Notified agent router abfd6fde-a4b5-436e-9eb4-7da3ee926279, state master
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-U', '-I', 'sg-0fe6aa27-fe', '-c', '1', '-w', '1.5', '10.13.38.68']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-U', '-I', 'qg-1a97376b-5d', '-c', '1', '-w', '1.5', '10.25.1.187']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-U', '-I', 'qg-1a97376b-5d', '-c', '1', '-w', '1.5', '10.25.1.214']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-A', '-I', 'qg-1a97376b-5d', '-c', '1', '-w', '1.5', '10.25.1.214']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-A', '-I', 'sg-0fe6aa27-fe', '-c', '1', '-w', '1.5', '10.13.38.68']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-A', '-I', 'qg-1a97376b-5d', '-c', '1', '-w', '1.5', '10.25.1.187']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-U', '-I', 'sg-0fe6aa27-fe', '-c', '1', '-w', '1.5', '10.13.38.68']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-U', '-I', 'qg-1a97376b-5d', '-c', '1', '-w', '1.5', '10.25.1.187']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-U', '-I', 'qg-1a97376b-5d', '-c', '1', '-w', '1.5', '10.25.1.214']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-A', '-I', 'sg-0fe6aa27-fe', '-c', '1', '-w', '1.5', '10.13.38.68']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-A', '-I', 'qg-1a97376b-5d', '-c', '1', '-w', '1.5', '10.25.1.187']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-A', '-I', 'qg-1a97376b-5d', '-c', '1', '-w', '1.5', '10.25.1.214']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-U', '-I', 'sg-0fe6aa27-fe', '-c', '1', '-w', '1.5', '10.13.38.68']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-U', '-I', 'qg-1a97376b-5d', '-c', '1', '-w', '1.5', '10.25.1.187']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-A', '-I', 'sg-0fe6aa27-fe', '-c', '1', '-w', '1.5', '10.13.38.68']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-U', '-I', 'qg-1a97376b-5d', '-c', '1', '-w', '1.5', '10.25.1.214']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-A', '-I', 'qg-1a97376b-5d', '-c', '1', '-w', '1.5', '10.25.1.187']
Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'snat-abfd6fde-a4b5-436e-9eb4-7da3ee926279', 'arping', '-A', '-I', 'qg-1a97376b-5d', '-c', '1', '-w', '1.5', '10.25.1.214']
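
To compare what each of the three nodes recorded, the state messages in this log ("Initial status of router ... is ..." and "Wrote router ... state ...") can be pulled out with a short script. This is only a rough sketch: the function names and the idea of passing one log text per node are assumptions for illustration, not part of neutron.

```python
import re

# Patterns for the two state messages that appear in the log above.
INITIAL = re.compile(r"Initial status of router (?P<router>[0-9a-f-]+) is (?P<state>\w+)")
WROTE = re.compile(r"Wrote router (?P<router>[0-9a-f-]+) state (?P<state>\w+)")


def router_states(log_text):
    """Return {router_id: [state, ...]} in the order the states were logged."""
    states = {}
    for line in log_text.splitlines():
        match = INITIAL.search(line) or WROTE.search(line)
        if match:
            states.setdefault(match.group("router"), []).append(match.group("state"))
    return states


def masters(per_node_logs, router_id):
    """List the nodes whose last logged state for router_id is master.

    per_node_logs is a hypothetical mapping {node_name: log_text}.
    """
    result = []
    for node, text in per_node_logs.items():
        history = router_states(text).get(router_id, [])
        if history and history[-1] == "master":
            result.append(node)
    return sorted(result)
```

If `masters(...)` returns more than one node for the same router, those nodes each believe they hold the master role, which is the situation described in this report.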

LIU Yulong (dragon889) wrote on 2020-02-19:

#5

So I guess the VRRP heartbeats may have been dropped between the hosts for their LVS clusters.
Could you paste the default security group rules of these cluster hosts?
Or the port security or allowed address pair settings?

Marek Grudzinski (ivve) wrote on 2020-02-19:

#6

Hello Liu Yulong,

This issue concerns physical nodes acting as OpenStack controller nodes. They do not have any security group rules.
Besides, this happens roughly 50% of the time when a large stack is created with multiple instances using SNAT rather than floating IPs. Any form of firewall issue would result in consistent errors.
It seems that keepalived does not always respect the nopreempt option and releases master during setup of the 3 snat namespaces, even if it transitions to master first.
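
One thing that may be worth ruling out is the generated keepalived configuration itself, since keepalived documents that nopreempt only takes effect when the instance's initial state is BACKUP. The following is a minimal sketch, assuming the usual one-statement-per-line keepalived.conf layout. The function name and the sample values in the usage note are hypothetical.

```python
import re


def check_nopreempt(conf_text):
    """Return {vrrp_instance_name: [problem, ...]} for a keepalived.conf text."""
    report = {}
    name, depth, statements = None, 0, []
    for raw in conf_text.splitlines():
        # keepalived treats both '#' and '!' as comment markers.
        line = re.split(r"[#!]", raw, maxsplit=1)[0].strip()
        if not line:
            continue
        if name is None:
            match = re.match(r"vrrp_instance\s+(\S+)\s*\{", line)
            if match:
                name, depth, statements = match.group(1), 1, []
            continue
        before = depth
        depth += line.count("{") - line.count("}")
        if depth == 0:
            # Closing brace of the vrrp_instance block: evaluate it.
            report[name] = _problems(statements)
            name = None
        elif before == 1 and depth == 1:
            # Only keep statements directly inside the instance block.
            statements.append(line.split())
    return report


def _problems(statements):
    keywords = {s[0]: s[1:] for s in statements}
    problems = []
    if "nopreempt" not in keywords:
        problems.append("nopreempt is not set")
    state = (keywords.get("state") or ["<unset>"])[0]
    if state != "BACKUP":
        problems.append("initial state is %s, nopreempt requires BACKUP" % state)
    return problems
```

Calling `check_nopreempt(open(path).read())` on each node's generated keepalived.conf would show whether any instance lacks the option. An empty problem list for every instance would suggest the configuration is as intended and the cause lies elsewhere.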
