Ocata -> Pike upgrade of an environment with Ceph nodes fails due to the 'Check legacy Ceph hieradata' task failing on OSD nodes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Won't Fix | High | Unassigned | stein-rc1
Bug Description
Description of problem:
Ocata -> Pike upgrade of an environment with Ceph nodes fails due to the 'Check legacy Ceph hieradata' task failing on OSD nodes.
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Deploy Ocata with 3 controllers + 2 computes + 3 ceph nodes
timeout 100m openstack overcloud deploy \
--templates /usr/share/
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /usr/share/
-e /home/stack/
-e /usr/share/
-e /home/stack/
-e /home/stack/
-e /home/stack/
-e /home/stack/
-e /usr/share/
-e /home/stack/
-e /home/stack/
-e /home/stack/
--log-file overcloud_
Content of: /home/stack/
parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: true
  CinderEnableNfsBackend: false
  NovaEnableRbdBackend: true
  GlanceBackend: rbd
  CinderRbdPoolName: "volumes"
  NovaRbdPoolName: "vms"
  GlanceRbdPoolName: "images"
  ExtraConfig:
    ceph::profile::params::osd_pool_default_pg_num: 32
    ceph::profile::params::osd_pool_default_pgp_num: 32
    ceph::profile::params::osds:
      '/dev/vdb': {}
2. Run major upgrade composable steps:
openstack overcloud deploy \
--templates /usr/share/
--stack overcloud \
--libvirt-type kvm \
--ntp-server clock.redhat.com \
-e /usr/share/
-e /home/stack/
-e /usr/share/
-e /home/stack/
-e /home/stack/
-e /home/stack/
-e /home/stack/
-e /usr/share/
-e /home/stack/
-e /home/stack/
-e /home/stack/
-e /usr/share/
-e /home/stack/
-e /home/stack/
-e /home/stack/
Note the /home/stack/ environment file passed above, which contains:
parameter_defaults:
ExtraConfig:
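For context, a minimal sketch of what such an ExtraConfig override file could look like, assuming the intent was to blank the legacy puppet-ceph keys (the file name below is hypothetical; the actual path and contents are truncated in this report):

  # Hypothetical reconstruction of the override environment file; the real
  # path and contents are truncated above. ExtraConfig: {} mirrors the
  # working adjustment shown later in this report.
  cat > /home/stack/ceph-extraconfig-override.yaml <<'EOF'
  parameter_defaults:
    ExtraConfig: {}
  EOF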
Actual results:
Upgrade fails because the 'Check legacy Ceph hieradata' task fails on the OSD nodes
overcloud.
resource_type: OS::Heat:
physical_
status: CREATE_FAILED
status_reason: |
Error: resources[1]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2
deploy_stdout: |
...
TASK [Gathering Facts] *******
ok: [localhost]
TASK [Check legacy Ceph hieradata] *******
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "test \"nil\" == \"$(hiera -c /etc/puppet/
to retry, use: --limit @/var/lib/
PLAY RECAP *******
localhost : ok=1 changed=0 unreachable=0 failed=1
(truncated, view all with --long)
deploy_stderr: |
overcloud.
resource_type: OS::Heat:
physical_
status: CREATE_FAILED
status_reason: |
CREATE aborted
deploy_stdout: |
None
deploy_stderr: |
None
overcloud.
resource_type: OS::Heat:
physical_
status: CREATE_FAILED
status_reason: |
CREATE aborted
deploy_stdout: |
...
TASK [Gathering Facts] *******
ok: [localhost]
TASK [Check legacy Ceph hieradata] *******
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "test \"nil\" == \"$(hiera -c /etc/puppet/
to retry, use: --limit @/var/lib/
PLAY RECAP *******
localhost : ok=1 changed=0 unreachable=0 failed=1
(truncated, view all with --long)
deploy_stderr: |
Heat Stack update failed.
Expected results:
Upgrade succeeds.
Additional info:
It looks like the ExtraConfig override did not get applied on the nodes:
cat controller-
{
  "ceph::profile::params::osd_pool_default_pg_num": 32,
  "ceph::profile::params::osd_pool_default_pgp_num": 32,
  "ceph::profile::params::osds": {
    "/dev/vdb": {}
  }
}
cat ceph-0/
{
  "ceph::profile::params::osd_pool_default_pg_num": 32,
  "ceph::profile::params::osd_pool_default_pgp_num": 32,
  "ceph::profile::params::osds": {
    "/dev/vdb": {}
  }
}
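For what it's worth, the failing validation can be reproduced by hand on an OSD node. A sketch, assuming the truncated command above is checking the ceph::profile::params::osds key against the string "nil":

  # Run the same hiera lookup the upgrade task performs (command truncated above).
  hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds
  # On the failing nodes this prints the legacy OSD hash, e.g. {"/dev/vdb"=>{}},
  # rather than nil, so a check of the form
  #   test "nil" == "$(hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds)"
  # exits non-zero and the task fails.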
Changed in tripleo: milestone: none → rocky-1
Changed in tripleo: milestone: rocky-1 → rocky-2
Changed in tripleo: milestone: rocky-2 → rocky-3
Changed in tripleo: milestone: rocky-3 → rocky-rc1
Changed in tripleo: milestone: rocky-rc1 → stein-1
Changed in tripleo: milestone: stein-1 → stein-2
Changed in tripleo: milestone: stein-2 → stein-3
Changed in tripleo: milestone: stein-3 → stein-rc1
Instead of overriding ExtraConfig in an additional environment file, I tried removing the old hieradata from the existing environment file, and this allowed me to move forward.
Original environment file containing the legacy hieradata:
parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: true
  CinderEnableNfsBackend: false
  NovaEnableRbdBackend: true
  GlanceBackend: rbd
  CinderRbdPoolName: "volumes"
  NovaRbdPoolName: "vms"
  GlanceRbdPoolName: "images"
  ExtraConfig:
    ceph::profile::params::osd_pool_default_pg_num: 32
    ceph::profile::params::osd_pool_default_pgp_num: 32
    ceph::profile::params::osds:
      '/dev/vdb': {}
Adjusted file used during the upgrade, which allows it to pass:
parameter_defaults:
  CinderEnableIscsiBackend: false
  CinderEnableRbdBackend: true
  CinderEnableNfsBackend: false
  NovaEnableRbdBackend: true
  GlanceBackend: rbd
  CinderRbdPoolName: "volumes"
  NovaRbdPoolName: "vms"
  GlanceRbdPoolName: "images"
  ExtraConfig: {}
  CephPoolDefaultPgNum: 32
  CephAnsibleDisksConfig:
    devices:
      - '/dev/vdb'
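The adjusted file replaces the legacy puppet-ceph hieradata with ceph-ansible style parameters (CephPoolDefaultPgNum for the pg_num settings, CephAnsibleDisksConfig for the OSD devices). As a sanity check after the stack update, the legacy key should resolve to nil on the OSD nodes; a sketch, assuming the same hiera setup as above:

  # After updating the stack with the adjusted file, the legacy key is gone:
  hiera -c /etc/puppet/hiera.yaml ceph::profile::params::osds
  # Expected output: nil
  # With that, the 'Check legacy Ceph hieradata' validation passes and the
  # upgrade can proceed.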