l3 agent HA communication failure

Bug #1648823 reported by Ian Banks
This bug affects 3 people
Affects: neutron
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

An OpenStack environment was built using OpenStack-Ansible (OSA) on Mitaka with the neutron_l3_agent in HA mode. This was functioning correctly using network namespaces for routers. Within each namespace, keepalived created an 'ha' virtual interface to track the status of the other instance of the virtual router. This worked correctly: the 'ha' virtual interface within the 'master' router namespace could ping the 'ha' virtual interface within the 'backup' router namespace, and when the master went offline keepalived would successfully transition the backup to master and bring up the virtual IP addresses within that network namespace virtual router.

We upgraded the environment to Newton via the guide at http://docs.openstack.org/developer/openstack-ansible/newton/upgrade-guide/manual-upgrade.html. After this was done, the network namespace virtual routers (specifically the 'ha' track interfaces) were no longer able to communicate with each other, resulting in both transitioning to 'master' and bringing up duplicate IP addresses. This caused intermittent connectivity to public floating IPs and also from the routers to instances over the VXLAN network.
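
For reference, a minimal diagnostic sketch of how the HA-network reachability can be checked from one network node. The router UUID, namespace name and peer address below are placeholders, not values from this deployment; the peer address lives on the HA network (169.254.192.0/18 by default via l3_ha_net_cidr).

#!/usr/bin/env python3
"""Hedged sketch: ping the peer 'ha' interface from inside one HA router
namespace. The router UUID and peer address are placeholders; the real
values come from `neutron router-list` and `ip netns exec ... ip addr`."""

import subprocess

ROUTER_ID = "11111111-2222-3333-4444-555555555555"   # placeholder router UUID
NAMESPACE = "qrouter-" + ROUTER_ID
PEER_HA_IP = "169.254.192.2"                         # placeholder peer HA address

def netns_exec(namespace, cmd):
    """Run a command inside the given network namespace."""
    return subprocess.run(["ip", "netns", "exec", namespace] + cmd,
                          capture_output=True, text=True)

# List the namespace's ha-/qr-/qg- ports and their addresses.
print(netns_exec(NAMESPACE, ["ip", "-o", "addr"]).stdout)

# If this ping fails from both network nodes, VRRP adverts are not passing
# and both keepalived instances will claim master (the symptom reported here).
result = netns_exec(NAMESPACE, ["ping", "-c", "3", "-W", "2", PEER_HA_IP])
print(result.stdout or result.stderr)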

******** l3_agent.ini configuration ********

# General
[DEFAULT]
verbose = True
debug = False

# While this option is deprecated in Liberty, if we remove it then it takes
# a default value of 'br-ex', which we do not want. We therefore leave it
# in place for now and can remove it in Mitaka.
external_network_bridge =
gateway_external_network_id =

use_namespaces = True
router_delete_namespaces = True

# Drivers
interface_driver = neutron.agent.linux.interface.BridgeInterfaceDriver

# Agent mode (legacy only)
agent_mode = legacy

# Conventional failover
allow_automatic_l3agent_failover = True

# HA failover
ha_confs_path = /var/lib/neutron/ha_confs
ha_vrrp_advert_int = 2
ha_vrrp_auth_password = bee916a2589b14dd7f
ha_vrrp_auth_type = PASS
handle_internal_only_routers = False
send_arp_for_ha = 3

# Metadata
enable_metadata_proxy = True

******** keepalived.conf configuration ********

vrrp_instance VR_1 {
    state BACKUP
    interface ha-42c56d27-10
    virtual_router_id 1
    priority 50
    garp_master_delay 60
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass bee916a2589b14dd7f
    }
    track_interface {
        ha-42c56d27-10
    }
    virtual_ipaddress {
        169.254.0.1/24 dev ha-42c56d27-10
    }
    virtual_ipaddress_excluded {
        10.0.0.1/8 dev qr-8deaf807-bb
        xx.xx.xx.xx/22 dev qg-6e4ebe51-94
        xx.xx.xx.xx/32 dev qg-6e4ebe51-94
        xxxx::xxxx:xxxx:xxxx:xxxx/64 dev qg-6e4ebe51-94 scope link
        xxxx::xxxx:xxxx:xxxx:xxxx/64 dev qr-8deaf807-bb scope link
    }
    virtual_routes {
        0.0.0.0/0 via xx.xx.xx.xx dev qg-6e4ebe51-94
    }
}
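
As a side note, one way to confirm the split-brain is to look at the state neutron records for each HA router under ha_confs_path (/var/lib/neutron/ha_confs in the l3_agent.ini above). A minimal sketch follows, assuming each per-router directory contains a 'state' file; verify the layout on your own nodes.

#!/usr/bin/env python3
"""Hedged sketch: print the recorded keepalived state for every HA router
on this node. Assumes a <ha_confs_path>/<router_id>/state file exists."""

import os

HA_CONFS_PATH = "/var/lib/neutron/ha_confs"   # ha_confs_path from l3_agent.ini

for router_id in sorted(os.listdir(HA_CONFS_PATH)):
    state_file = os.path.join(HA_CONFS_PATH, router_id, "state")
    if not os.path.isfile(state_file):
        continue
    with open(state_file) as fh:
        state = fh.read().strip()
    # 'master' on both network nodes for the same router means the VRRP
    # adverts are not getting through and the VIPs are duplicated.
    print(router_id, state)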

Tags: l3-ha
James Denton (james-denton) wrote:

Do you know if l2population is in use here after the upgrade? Are only the HA networks affected, or are you also unable to ping from the router or DHCP namespaces to the instances?

tags: added: l3-ha
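
For anyone reproducing this, a rough sketch of how both questions can be checked from a network node. The ML2 config path, namespace names and instance address are placeholders and may differ under OSA.

#!/usr/bin/env python3
"""Hedged sketch for the questions above: report whether l2population is in
the ML2 mechanism drivers and try pings from the router and DHCP namespaces
to an instance. Paths, namespace names and the target IP are placeholders."""

import configparser
import subprocess

ML2_CONF = "/etc/neutron/plugins/ml2/ml2_conf.ini"      # adjust for your layout
INSTANCE_IP = "10.0.0.10"                                # placeholder guest IP
NAMESPACES = [
    "qrouter-11111111-2222-3333-4444-555555555555",     # placeholder router ns
    "qdhcp-66666666-7777-8888-9999-000000000000",       # placeholder DHCP ns
]

cfg = configparser.ConfigParser()
cfg.read(ML2_CONF)
drivers = cfg.get("ml2", "mechanism_drivers", fallback="") if cfg.has_section("ml2") else ""
print("l2population enabled:", "l2population" in drivers)

for ns in NAMESPACES:
    result = subprocess.run(
        ["ip", "netns", "exec", ns, "ping", "-c", "3", "-W", "2", INSTANCE_IP],
        capture_output=True, text=True)
    print(ns, "->", INSTANCE_IP, "OK" if result.returncode == 0 else "FAILED")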