Compute node confuses eth interface number during deploy

Bug #1394466 reported by Sergey Galkin
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Fix Committed | High | Sebastian Kalinowski |
5.1.x | Fix Committed | High | Sebastian Kalinowski |

Bug Description

astute_sha: 65eb911c38afc0e23d187772f9a05f703c685896
auth_required: true
build_id: 2014-11-18_22-00-23
build_number: '114'
feature_groups:
- mirantis
fuellib_sha: 5a5275370b33ab3b9a403728a1c7ad173289e4a0
fuelmain_sha: e556f0e1b00c30ec5c4b374ca2878c047c8686c2
nailgun_sha: b0add09c4361fee8fc70637c9a6ef42fbe738abe
ostf_sha: 82465a94eed4eff1fc8d8e1f2fb7e9993c22f068
production: docker
release: '6.0'

Steps to reproduce
1. Start deployment with 3 controllers in HA, and 97 compute nodes with ceph on CentOS with Neutron GRE

Deployment failed because one node went offline

Log from jenkins job
01:31:27 2014-11-20 01:31:27,681 - python.http - DEBUG - PUT [{'interfaces': [{u'name': u'eth0', u'state': u'up', u'mac': u'00:25:90:e3:43:08', u'max_speed': 1000, u'current_speed': 1000, u'assigned_networks': [{u'id': 2, u'name': u'public'}], u'type': u'ether', u'id': 182}, {u'name': u'eth1', u'state': u'up', u'mac': u'00:25:90:e3:43:09', u'max_speed': 1000, u'current_speed': 1000, u'assigned_networks': [], u'type': u'ether', u'id': 181}, {u'name': u'eth2', u'state': u'up', u'mac': u'0c:c4:7a:1d:f1:e4', u'max_speed': 10000, u'current_speed': 10000, u'assigned_networks': [{u'id': 1, u'name': u'fuelweb_admin'}, {u'id': 3, u'name': u'management'}, {u'id': 4, u'name': u'storage'}], u'type': u'ether', u'id': 180}, {u'name': u'eth3', u'state': u'up', u'mac': u'0c:c4:7a:1d:f1:e5', u'max_speed': 10000, u'current_speed': 10000, u'assigned_networks': [], u'type': u'ether', u'id': 179}], 'id': 45}] to http://172.16.44.10:8000/api/nodes/interfaces

Output from Fuel
[root@fuel init.d]# fuel nodes | grep error
45 | error | compute_77 | 1 | 10.20.1.42 | 0c:c4:7a:1d:f1:e4 | ceph-osd, compute | | False | 1

As the attached IPMI screenshot shows, eth0 and eth1 are swapped with eth2 and eth3

Tags: scale
Sergey Galkin (sgalkin) wrote :
Sergey Galkin (sgalkin)
summary: - Compute node confuses number of eth interfaces during deploy
+ Compute node confuses eth interface number during deploy
Mike Scherbakov (mihgen) wrote :

I've heard about the same behavior (a node goes offline and the deployment fails) from a few sources already. Sergey, thanks for catching the interfaces issue! I hope we will be able to find a way to fix it in 6.0.

Changed in fuel:
milestone: none → 6.0
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
Matthew Mosesohn (raytrac3r) wrote :

Passing to fuel-python as Sergey Galkin wrote me saying he couldn't do it himself

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Fuel Python Team (fuel-python)
Sergey Galkin (sgalkin) wrote :
Dima Shulyak (dshulyak) wrote :

I see why the node is offline: netcfg/choose_interface for Cobbler is misconfigured:

  'netcfg/choose_interface': '00:25:90:e3:43:08'

It should map to 0c:c4:7a:1d:f1:e4. We can fix that,

but right now I am struggling to understand
what will happen with the network configuration (which is done by the l23network module).
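
To illustrate the mismatch: the interface used for provisioning should be the one that carries the fuelweb_admin network. The sketch below is not Nailgun code. `admin_interface_mac` is a hypothetical helper, and the interface list is abridged from the PUT payload in the bug description.

```python
# Interface data abridged from the PUT payload in the Jenkins log.
interfaces = [
    {"name": "eth0", "mac": "00:25:90:e3:43:08", "assigned_networks": ["public"]},
    {"name": "eth1", "mac": "00:25:90:e3:43:09", "assigned_networks": []},
    {"name": "eth2", "mac": "0c:c4:7a:1d:f1:e4",
     "assigned_networks": ["fuelweb_admin", "management", "storage"]},
    {"name": "eth3", "mac": "0c:c4:7a:1d:f1:e5", "assigned_networks": []},
]


def admin_interface_mac(interfaces, admin_network="fuelweb_admin"):
    """Return the MAC of the interface carrying the admin network.

    Hypothetical helper for illustration only.
    """
    for iface in interfaces:
        if admin_network in iface["assigned_networks"]:
            return iface["mac"].lower()
    raise LookupError("no interface is assigned to %r" % admin_network)


print(admin_interface_mac(interfaces))  # 0c:c4:7a:1d:f1:e4
```

With this data the result is the eth2 MAC, not 00:25:90:e3:43:08 (eth0), which is the value that ended up in netcfg/choose_interface.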

Changed in fuel:
status: New → Confirmed
Dima Shulyak (dshulyak) wrote :

We need to change code at:

  https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/network/manager.py#L682-L690

to always perform matching in lower case.
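
A minimal sketch of why the case matters, assuming the stored MACs are lowercase and the incoming value may not be. The dictionary stands in for the DB, and both function names are hypothetical; this is not the code at the link above.

```python
# Stand-in for the DB: MACs are stored as lowercase strings.
db_interfaces = {
    "00:25:90:e3:43:08": "eth0",
    "0c:c4:7a:1d:f1:e4": "eth2",
}


def find_interface_exact(mac):
    """Naive lookup: misses when the caller passes an uppercase MAC."""
    return db_interfaces.get(mac)


def find_interface_lowercase(mac):
    """Lowercase the input first so it matches the stored form."""
    return db_interfaces.get(mac.lower())


reported = "0C:C4:7A:1D:F1:E4"  # assumed input: the eth2 MAC in uppercase
print(find_interface_exact(reported))      # None
print(find_interface_lowercase(reported))  # eth2
```

When the exact lookup misses, a caller that falls back to some other interface can end up with the wrong MAC, which would be consistent with the symptom described in this report.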

Changed in fuel:
status: Confirmed → Triaged
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Sebastian Kalinowski (prmtl)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/136331

no longer affects: fuel/5.1.x
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/136331
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=dfe79d35a47a28873f271131cf2fdae7052e0713
Submitter: Jenkins
Branch: master

commit dfe79d35a47a28873f271131cf2fdae7052e0713
Author: Sebastian Kalinowski <email address hidden>
Date: Fri Nov 21 14:06:43 2014 +0100

    Always match lowercase MAC

    In some places we are making comparisons between MACs
    from different sources like:
    * from DB - those are lowercase strings
    * from other inputs - where we do not know if MAC will
      be lowercase, uppercase (or mixed) string
    We need to always lowercase string in DB queries.

    Also new utils function was created: is_same_mac to compare
    if MACs are equal. It uses netaddr.EUI internally.

    Change-Id: Idae5bc6009f857e72712afac04ffe174e73c7a87
    Closes-Bug: #1394466
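
For readers without the patch at hand, here is a rough stand-in for the comparison described in the commit message. The merged helper uses netaddr.EUI; this sketch uses only the standard library, so the body and `normalize_mac` are assumptions rather than the actual implementation.

```python
import re


def normalize_mac(mac):
    """Reduce a MAC string to 12 lowercase hex digits, ignoring separators."""
    digits = re.sub(r"[^0-9a-fA-F]", "", mac).lower()
    if len(digits) != 12:
        raise ValueError("not a 48-bit MAC address: %r" % mac)
    return digits


def is_same_mac(mac_a, mac_b):
    """Stand-in comparison; the real helper relies on netaddr.EUI."""
    return normalize_mac(mac_a) == normalize_mac(mac_b)


print(is_same_mac("0C-C4-7A-1D-F1-E4", "0c:c4:7a:1d:f1:e4"))  # True
print(is_same_mac("00:25:90:e3:43:08", "0c:c4:7a:1d:f1:e4"))  # False
```

Comparing a normalized form covers differences in separators as well as case, which plain lowercasing alone would not.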

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/136804

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/5.1)

Reviewed: https://review.openstack.org/136804
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=24b956d739e8e9d8f728701522c6fa8364526c45
Submitter: Jenkins
Branch: stable/5.1

commit 24b956d739e8e9d8f728701522c6fa8364526c45
Author: Sebastian Kalinowski <email address hidden>
Date: Fri Nov 21 14:06:43 2014 +0100

    Always match lowercase MAC

    In some places we are making comparisons between MACs
    from different sources like:
    * from DB - those are lowercase strings
    * from other inputs - where we do not know if MAC will
      be lowercase, uppercase (or mixed) string
    We need to always lowercase string in DB queries.

    Also new utils function was created: is_same_mac to compare
    if MACs are equal. It uses netaddr.EUI internally.

    Change-Id: Idae5bc6009f857e72712afac04ffe174e73c7a87
    Closes-Bug: #1394466
    (cherry picked from commit dfe79d35a47a28873f271131cf2fdae7052e0713)

Leontiy Istomin (listomin) wrote :

Please, take a look at the screenshot and snapshot. The bug has been reproduced again.
Ubuntu+HA+Neutron-gre+Ceph-for-all
controllers: 3
computes: 47

[root@fuel ~]# fuel --fuel-version
api: '1.0'
astute_sha: f7cda2171b0b677dfaeb59693d980a2d3ee4c3e0
auth_required: true
build_id: 2015-01-25_20-50-01
build_number: '48'
feature_groups:
- mirantis
fuellib_sha: 9aa913096fb93ea4847ee14bfaf33597326886f3
fuelmain_sha: 0c4bec04c2ff7c75170108dd342cbd804a110678
nailgun_sha: c0a6ed9319048e907d7f696398262108b319bf11
ostf_sha: 3b57985d4d2155510894a1f6d03b478b201f7780
production: docker
release: 6.0.1
release_versions:
  2014.2-6.0.1:
    VERSION:
      api: '1.0'
      astute_sha: f7cda2171b0b677dfaeb59693d980a2d3ee4c3e0
      build_id: 2015-01-25_20-50-01
      build_number: '48'
      feature_groups:
      - mirantis
      fuellib_sha: 9aa913096fb93ea4847ee14bfaf33597326886f3
      fuelmain_sha: 0c4bec04c2ff7c75170108dd342cbd804a110678
      nailgun_sha: c0a6ed9319048e907d7f696398262108b319bf11
      ostf_sha: 3b57985d4d2155510894a1f6d03b478b201f7780
      production: docker
      release: 6.0.1

screenshot is attached
https://drive.google.com/a/mirantis.com/file/d/0Bx4ptZV1Jt7hRkItWmRjcmVGQ28/view?usp=sharing

Łukasz Oleś (loles) wrote :

Snapshot from env

Alexander Nevenchannyy (anevenchannyy) wrote :

Hi, folks.

This looks like a problem with udev.
Please remove /etc/udev/rules.d/70-persistent-net.rules from a node where this bug is reproduced and restart the host. Repeat this 5 times, with a screenshot of the ifconfig output each time.
If I'm right, we will see the ethX-to-MAC binding change spontaneously.
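
Instead of comparing screenshots by eye, the rules file regenerated after each reboot could be diffed. This is a rough sketch, assuming the usual generated rule lines with ATTR{address}=="..." and NAME="..."; the function names and the sample rule text are made up for illustration, reusing MACs from this report.

```python
import re

# Matches the MAC and the interface name in one generated rule line.
RULE_RE = re.compile(r'ATTR\{address\}=="([0-9A-Fa-f:]{17})".*?NAME="([^"]+)"')


def parse_persistent_net_rules(text):
    """Map interface name -> MAC from the text of 70-persistent-net.rules."""
    bindings = {}
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = RULE_RE.search(line)
        if match:
            mac, name = match.groups()
            bindings[name] = mac.lower()
    return bindings


def changed_bindings(before, after):
    """Return {name: (old_mac, new_mac)} for names whose MAC differs."""
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }


# Made-up sample: the same two NICs, swapped after a reboot.
before = parse_persistent_net_rules(
    'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:25:90:e3:43:08", NAME="eth0"\n'
    'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="0c:c4:7a:1d:f1:e4", NAME="eth2"\n'
)
after = parse_persistent_net_rules(
    'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="0c:c4:7a:1d:f1:e4", NAME="eth0"\n'
    'SUBSYSTEM=="net", ACTION=="add", ATTR{address}=="00:25:90:e3:43:08", NAME="eth2"\n'
)
print(changed_bindings(before, after))
```

An empty result across all reboots would argue against the udev theory; any entry shows which name moved to which MAC.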

arogusskiy (arogusskiy) wrote :

Aleksander ( <email address hidden> ) suspects that it was udev

Leontiy Istomin (listomin) wrote :

I performed a test:
Fresh installation of Fuel. 25 bare-metal nodes were discovered. On one of the nodes I could see the following:
http://paste.openstack.org/show/162677/
Then I cleaned the /etc/udev/rules.d/70-persistent-net.rules file and rebooted the node. The 2nd and 4th interfaces were swapped:
http://paste.openstack.org/show/162678/

fuel version: http://paste.openstack.org/show/162679/

Leontiy Istomin (listomin) wrote :

ok, previous test doesn't make sense. Another one:

fresh installation of fuel. 25 baremetal nodes have been discovered and deployed. On one of the compute nodes I could see the following:
http://paste.openstack.org/show/162747/
Then I cleaned /etc/udev/rules.d/70-persistent-net.rules file and rebooted the node. I cleaned and rebooted 5 times:
http://paste.openstack.org/show/162748/
http://paste.openstack.org/show/162749/
http://paste.openstack.org/show/162753/
http://paste.openstack.org/show/162755/
http://paste.openstack.org/show/162760/

No changes

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/6.0.x
Alexander Evseev (aevseev) wrote :

Same (?) bug with Fuel 6.1:

# fuel --fuel-version
DEPRECATION WARNING: /etc/fuel/client/config.yaml exists and will be used as the source for settings. This behavior is deprecated. Please specify the path to your custom settings file in the FUELCLIENT_CUSTOM_SETTINGS environment variable.
api: '1.0'
astute_sha: 04ebab96d57b0e8acbf2d7f3ba05e4fbf31b741e
auth_required: true
build_id: 2015-04-29_07-55-19
build_number: '361'
feature_groups:
- mirantis
fuel-library_sha: 0e5b82d24853304befb22145ac4aaf3545d295e1
fuel-ostf_sha: b38602c841deaa03ddffc95c02f319360462cbe3
fuelmain_sha: ee112acfdd0f9017ef40be53e8e51bb5c429e97c
nailgun_sha: e660b1c09d7d4d07bdd48d424ce9aed3b6facd6e
openstack_version: 2014.2.2-6.1
production: docker
python-fuelclient_sha: 8cd6cf575d3c101dee1032abb6877dfa8487e077
release: '6.1'
release_versions:
  2014.2.2-6.1:
    VERSION:
      api: '1.0'
      astute_sha: 04ebab96d57b0e8acbf2d7f3ba05e4fbf31b741e
      build_id: 2015-04-29_07-55-19
      build_number: '361'
      feature_groups:
      - mirantis
      fuel-library_sha: 0e5b82d24853304befb22145ac4aaf3545d295e1
      fuel-ostf_sha: b38602c841deaa03ddffc95c02f319360462cbe3
      fuelmain_sha: ee112acfdd0f9017ef40be53e8e51bb5c429e97c
      nailgun_sha: e660b1c09d7d4d07bdd48d424ce9aed3b6facd6e
      openstack_version: 2014.2.2-6.1
      production: docker
      python-fuelclient_sha: 8cd6cf575d3c101dee1032abb6877dfa8487e077
      release: '6.1'

All nodes virtualized by KVM: Fuel-master, 1 Controller and 1 Compute. Neutron+VLAN, Ceph for all, Fedora LT kernel.

Removing the udev rules and the HWADDR line from /etc/sysconfig/network-scripts/ifcfg-eth0 makes no difference.

Alexander Evseev (aevseev) wrote :

Deployment proceeded after running "dhclient eth0 eth1" on both nodes.
