Incorrect admin iface auto-assignment

Bug #1557580 reported by Aleksandr Didenko
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Fix Released | High | Aleksey Kasatkin |
7.0.x | Fix Released | High | Alexey Stupnikov |
8.0.x | Fix Released | High | Alexey Stupnikov |

Bug Description

Nodes from a non-default nodegroup that are connected to the DHCP/PXE network through any interface other than the first one ($interfaces[0], which in Fuel-7.0 and older is "eth0") are misconfigured during provisioning. This happens because nailgun sends wrong provisioning info to fuel-agent via astute. Example:

Node meta, where we can see that eth4 is used for PXE:
        "bus_info": "0000:02:00.0",
        "current_speed": 1000,
        "driver": "tg3",
        "ip": "192.168.1.53",
        "mac": "99:55:aa:55:dd:cc",
        "max_speed": 1000,
        "name": "eth4",
        "netmask": "255.255.255.0",

Provisioning info:
2016-03-15T08:37:51 debug: [732] 9f1764ce-18d8-4090-bb28-7edd5aa398d2: uploading provision data: {"profile":"ubuntu_1404_x86_64",
...
"eth4":{"static":"0","mac_address":"99:55:aa:55:dd:cc"},
"eth0":{"ip_address":"192.168.1.53","dns_name":"node2.domain.tld","netmask":"255.255.255.0","static":"0","mac_address":"55:bb:00:99:55:44"}
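
The mismatch between the node meta and the provisioning info can be checked mechanically. The following is only a rough sketch: the values are copied from the excerpts above, while the helper name `find_admin_ip_mismatch` and the flat dict layout are assumptions made for illustration, not part of nailgun or astute.

```python
# PXE interface as reported in the node meta (values from the excerpt above).
node_meta_pxe = {"name": "eth4", "ip": "192.168.1.53", "mac": "99:55:aa:55:dd:cc"}

# Per-interface settings as they appear in the uploaded provision data.
provision_interfaces = {
    "eth4": {"static": "0", "mac_address": "99:55:aa:55:dd:cc"},
    "eth0": {
        "ip_address": "192.168.1.53",
        "dns_name": "node2.domain.tld",
        "netmask": "255.255.255.0",
        "static": "0",
        "mac_address": "55:bb:00:99:55:44",
    },
}


def find_admin_ip_mismatch(meta_nic, interfaces):
    """Hypothetical helper: list interfaces that carry the PXE IP
    but whose MAC differs from the one reported in the node meta."""
    return [
        name
        for name, cfg in interfaces.items()
        if cfg.get("ip_address") == meta_nic["ip"]
        and cfg.get("mac_address") != meta_nic["mac"]
    ]


mismatched = find_admin_ip_mismatch(node_meta_pxe, provision_interfaces)
print(mismatched)  # ['eth0']
```

Here the admin IP ends up on eth0 even though the node actually booted over eth4.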

This appears to be caused by a double network assignment in the DB: the fuelweb_admin network from the non-default nodegroup is assigned to both eth4 and eth0 in this example:

network_groups
 id | name
 24 | fuelweb_admin

node_nic_interfaces
 id | node_id | name | mac
 384 | 51 | eth0 | 55:bb:00:99:55:44
 380 | 51 | eth4 | 99:55:aa:55:dd:cc

net_nic_assignments
 id | network_id | interface_id
 413 | 24 | 380
 418 | 24 | 384
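
To spot this kind of double assignment without reading the tables by eye, something like the sketch below could be used. The rows are copied from the tables above; the function name `find_double_assignments` and the list-of-dicts representation are assumptions for illustration only and do not reflect how nailgun queries its DB.

```python
from collections import defaultdict

# Rows copied from the node_nic_interfaces and net_nic_assignments tables above.
node_nic_interfaces = [
    {"id": 384, "node_id": 51, "name": "eth0", "mac": "55:bb:00:99:55:44"},
    {"id": 380, "node_id": 51, "name": "eth4", "mac": "99:55:aa:55:dd:cc"},
]
net_nic_assignments = [
    {"id": 413, "network_id": 24, "interface_id": 380},
    {"id": 418, "network_id": 24, "interface_id": 384},
]


def find_double_assignments(nics, assignments):
    """Hypothetical helper: return {(node_id, network_id): [nic names]}
    for every network assigned to more than one NIC of the same node."""
    nic_by_id = {nic["id"]: nic for nic in nics}
    grouped = defaultdict(list)
    for row in assignments:
        nic = nic_by_id[row["interface_id"]]
        grouped[(nic["node_id"], row["network_id"])].append(nic["name"])
    return {key: sorted(names) for key, names in grouped.items() if len(names) > 1}


duplicates = find_double_assignments(node_nic_interfaces, net_nic_assignments)
print(duplicates)  # {(51, 24): ['eth0', 'eth4']}
```

Network 24 (fuelweb_admin) shows up on two NICs of node 51, which matches the symptom described above.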

Steps to reproduce:
1. Create new env with one additional node group
2. Setup virtual machines in that second node group (virtual rack) to use non-eth0 (not first) interface for PXE
3. Setup dhcrelay to bootstrap nodes from non-default node group
4. Add 1 controller and 1 compute from non-default nodegroup
5. Deploy changes

Expected result:
Deployment successful

Actual result:
Nodes get stuck on provisioning and then fail with a timeout/error

Found on Fuel-7.0 GA, but should affect 8.0 and master as well.

description: updated
tags: added: customer-found
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/292994

Changed in fuel:
status: New → In Progress
Revision history for this message
Aleksandr Didenko (adidenko) wrote : Re: Wrong admin interface for nodes from non-default nodegroup and non-eth0 PXE interface

Here's how it looks in the UI on the 8.0 GA version

Revision history for this message
Aleksandr Didenko (adidenko) wrote :

I've adapted the fix proposed by Aleksey to 8.0 and installed it on a running Fuel node. After that I was able to successfully provision a node with the admin net on interface number 3 (the node from the screenshot above). So the fix works.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/292994
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=2ad1ef0e8dcfc5f68178ddf99dbed6285cff8850
Submitter: Jenkins
Branch: master

commit 2ad1ef0e8dcfc5f68178ddf99dbed6285cff8850
Author: Aleksey Kasatkin <email address hidden>
Date: Tue Mar 15 17:39:11 2016 +0200

    Do not include Admin network twice

    Admin network was included into networks list twice in case of using
    multiple node groups. Because of that the allocation algorithm had
    an assumption that Admin network was not allocated to NIC and allocated
    it twice, first time - to the right NIC, second - to the first NIC always.
    The issue arrived when node was in non-default node group and was booted
    not from the first NIC.
    It is fixed now.

    Closes-Bug: #1557580
    Change-Id: I64a9ecdad1e68e4f0819940f0266699c394e9caf
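
The failure mode described in the commit message can be illustrated with a toy model. This is not the nailgun code: `allocate`, `dedupe`, the NIC list and the "management" network are all made up for illustration, and the order-preserving de-duplication merely stands in for "do not include Admin network twice".

```python
def allocate(networks, nics, pxe_nic):
    """Toy allocator (assumption, not nailgun's algorithm): the admin
    network goes to the PXE NIC once; every other entry, including a
    repeated admin entry, falls back to the first NIC."""
    assignments = []
    admin_placed = False
    for net in networks:
        if net == "fuelweb_admin" and not admin_placed:
            assignments.append((net, pxe_nic))
            admin_placed = True
        else:
            assignments.append((net, nics[0]))
    return assignments


def dedupe(networks):
    """Drop repeated entries while keeping the original order."""
    return list(dict.fromkeys(networks))


nics = ["eth0", "eth1", "eth4"]
# The admin network appears twice, as in the multiple node groups case.
networks = ["fuelweb_admin", "management", "fuelweb_admin"]

buggy = allocate(networks, nics, pxe_nic="eth4")
fixed = allocate(dedupe(networks), nics, pxe_nic="eth4")

print(buggy)  # admin lands on eth4 and then again on eth0
print(fixed)  # admin lands on eth4 only
```

With the duplicate entry, the admin network is placed on eth4 and then a second time on eth0; with a de-duplicated list it is placed on the PXE NIC only.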

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Alexey Stupnikov (astupnikov) wrote : Re: Wrong admin interface for nodes from non-default nodegroup and non-eth0 PXE interface

STEPS-TO-REPRODUCE (you should probably use bare-metal lab):
  - install fuel controller on server with 2 NICs
  - create new environment
  - add another node group, configure added networks
  - connect slave node to PXE network in second node group and boot slave node from its second NIC

Result: according to the output of the 'fuel node --node-id 1 --network --download' command, the fuelweb_admin network is assigned to multiple NICs of the slave node

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/295832

Revision history for this message
Alexey Stupnikov (astupnikov) wrote : Re: Wrong admin interface for nodes from non-default nodegroup and non-eth0 PXE interface

I have manually confirmed patch for MOS 8.0 environment.

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Message #5 describes the steps to reproduce for MOS 8.0. The 7.0 case is a bit more complicated and hard to describe here. I have all the config files, so please contact me if you need them.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/296511

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/7.0)

Reviewed: https://review.openstack.org/296511
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=fa493c391b0e4e7922c5b3af5ec2607b81d4dd1c
Submitter: Jenkins
Branch: stable/7.0

commit fa493c391b0e4e7922c5b3af5ec2607b81d4dd1c
Author: Aleksey Kasatkin <email address hidden>
Date: Tue Mar 15 17:39:11 2016 +0200

    Do not include Admin network twice

    Admin network was included into networks list twice in case of using
    multiple node groups. Because of that the allocation algorithm had
    an assumption that Admin network was not allocated to NIC and allocated
    it twice, first time - to the right NIC, second - to the first NIC always.
    The issue arrived when node was in non-default node group and was PXE
    booted not from the first NIC.
    It is fixed now.

    Conflicts:
     nailgun/nailgun/network/manager.py

    Closes-Bug: #1557580
    Change-Id: I64a9ecdad1e68e4f0819940f0266699c394e9caf

tags: added: on-verification
Revision history for this message
Alexander Gromov (agromov) wrote : Re: Wrong admin interface for nodes from non-default nodegroup and non-eth0 PXE interface

Verified on MOS 7.0 + mu3

Revision history for this message
Alexander Gromov (agromov) wrote :

Steps to reproduce (as described above):
  - install fuel controller on server with 2 NICs
  - create new environment
  - add another node group, configure added networks
  - connect slave node to PXE network in second node group and boot slave node from its second NIC

tags: removed: on-verification
summary: - Wrong admin interface for nodes from non-default nodegroup and non-eth0
- PXE interface
+ Incorrect admin iface auto-assignment
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-docs (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/305755

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-docs (master)

Reviewed: https://review.openstack.org/305755
Committed: https://git.openstack.org/cgit/openstack/fuel-docs/commit/?id=5560bb841039fbf9bbd1136ed2cd7f625b4f2c67
Submitter: Jenkins
Branch: master

commit 5560bb841039fbf9bbd1136ed2cd7f625b4f2c67
Author: Evgeny Konstantinov <email address hidden>
Date: Thu Apr 14 14:00:51 2016 +0300

    Add fixed double admin network allocation to nic to fuel mitaka relnotes

    Change-Id: I4825bb4fe9e6109335f84c6cd90c93841e97ef80
    Related-Bug: #1557580

tags: added: on-verification
tags: removed: on-verification
tags: added: on-verification
Revision history for this message
Aleksandr Didenko (adidenko) wrote :

Verified on RC2 (9.0 #495)

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/8.0)

Reviewed: https://review.openstack.org/295832
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=87b40cbaeacf564473f818374fca7d302f819388
Submitter: Jenkins
Branch: stable/8.0

commit 87b40cbaeacf564473f818374fca7d302f819388
Author: Aleksey Kasatkin <email address hidden>
Date: Tue Mar 15 17:39:11 2016 +0200

    Do not include Admin network twice

    Admin network was included into networks list twice in case of using
    multiple node groups. Because of that the allocation algorithm had
    an assumption that Admin network was not allocated to NIC and allocated
    it twice, first time - to the right NIC, second - to the first NIC always.
    The issue arrived when node was in non-default node group and was booted
    not from the first NIC. This issue was fixed.

    Closes-Bug: #1557580
    Change-Id: I64a9ecdad1e68e4f0819940f0266699c394e9caf

tags: added: on-verification
Revision history for this message
Vladimir Jigulin (vjigulin) wrote :

Reproduced on 8.0
Verified on 8.0+mu3 proposed

tags: removed: on-verification