N ->O upgrade: after running major-upgrade-composable-steps.yaml nova-api cannot connect to Galera on 2/3 controllers
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | Critical | Sofer Athlan-Guyot | |
Bug Description
Reported originally there: https:/
newton -> ocata upgrade:
after running major-upgrade-
connect to MySQL on 2/3 controllers. This results in timeouts and 500
ERROR (ClientException): Unknown Error (HTTP 504) when calling the
nova api, making it very difficult to manage the nova instances prior
to upgrading the compute nodes.
Even after running the upgrade converge step, 2/3 controllers cannot
reach MySQL, leaving the upgraded environment in a semi-working state.
Steps to Reproduce:
1. Deploy newton with 3 ctrl, 2 computes, 3 ceph nodes
2. Upgrade to ocata
Actual results:
controller-1 and controller-2 report in /var/log/
2017-03-22 17:25:43.189 377885 WARNING oslo_db.
Running:
mysql -u nova_api -p -h <galera_vip> -e 'show grants;'
works fine from every node in the cluster, so this is not a network connectivity issue.
Changed in tripleo:
milestone: ongoing → pike-1
So after a discussion with Damien and Michele, the problem was found.
It appears that when the nova cell is created, the connection parameters present in /etc/nova/nova.conf on the node where the command is run are hardcoded into the database. Running it on controller0, for instance, will give you this in the database:
'mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge@10.0.0.19/nova?bind_address=10.0.0.20'
This is later used to create a connection to the database for nova cell information. This obviously fails on the 2 other nodes, as they don't have the 10.0.0.20 address.
To prevent this issue, this workaround has been implemented: https://review.openstack.org/#/c/436192/ which removes the bind_address parameter from the configuration line.
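To illustrate why the stored URL is host-specific and what dropping the parameter changes, here is a minimal Python sketch. It is not the code from the review above. The helper name `strip_bind_address` and the password `secret` are placeholders made up for this example; only the URL shape comes from this report.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_bind_address(connection_url: str) -> str:
    """Return connection_url without any bind_address query parameter.

    Hypothetical helper for illustration only; it is not part of nova
    or tripleo.
    """
    parts = urlsplit(connection_url)
    # Keep every query parameter except bind_address, preserving order.
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "bind_address"
    ]
    # safe="/" keeps file paths such as /etc/my.cnf.d/tripleo.cnf readable.
    return urlunsplit(parts._replace(query=urlencode(kept, safe="/")))


# URL shape taken from this report; "secret" is a placeholder password.
stored = "mysql+pymysql://nova:secret@10.0.0.19/nova?bind_address=10.0.0.20"
print(strip_bind_address(stored))
```

Without `bind_address`, the same connection string no longer depends on a local address that exists on only one controller.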
The sequence of events seems correct:
1. update hiera data;
2. create nova cell with database option;
From the journalctl logs:
Mar 22 19:23:44 overcloud-controller-0.localdomain os-collect-config[4197]: [2017-03-22 19:23:44,806] (heat-config) [DEBUG] Running /usr/libexec/heat-config/hooks/hiera < /var/lib/heat-config/deployed/e4e9bd8e-4b7b-41da-b040-d4f563f2fd48.json
controller-0.localdomain os-collect-config[4197]: [2017-03-22 19:23:44,852] (heat-config) [DEBUG] Running heat-config-notify /var/lib/heat-config/deployed/e4e9bd8e-4b7b-41da-b040-d4f563f2fd48.json < /var/lib/heat-config/deployed/e4e9bd8e-4b7b-41da-b040-d4f563f2fd48.notify.json
Mar 22 19:23:44 overcloud-
$ grep nova::database_connection /var/lib/heat-config/deployed/e4e9bd8e-4b7b-41da-b040-d4f563f2fd48.json
"nova::database_connection": "mysql+pymysql://nova:c2cdagE8PyAbnpers3AD88Hge@10.0.0.19/nova?read_default_file=/etc/my.cnf.d/tripleo.cnf&read_default_group=tripleo",
[root@overcloud-controller-0 e]# journalctl | grep 'nova-manage cell_v2'
controller-0.localdomain ansible-command[440226]: Invoked with warn=True executable=None _uses_shell=False _raw_params=nova-manage cell_v2 map_cell0 removes=None creates=None chdir=None
controller-0.localdomain ansible-command[440632]: Invoked with warn=True executable=None _uses_shell=True _raw_params=nova-manage cell_v2 create_cell --name='default' --database_connection=$(hiera nova::database_connection) removes=None creates=None chdir=None
controller-0.localdomain ansible-command[443480]: Invoked with warn=True executable=None _uses_shell=False _raw_params=nova-manage cell_v2 map_cell_and_hosts removes=None creates=None chdir=None
controller-0.localdomain ansible-command[443950]: Invoked with warn=True executable=None _uses_shell=True _raw_params=nova-manage cell_v2 list_cells | sed -e '1,3d' -e '$d' | awk -F ' *| *' '$2 == "default" {print $4}' removes=None creates=None chdir=None
controller-0.localdomain ansible-command[444382]: Invoked with warn=True executable=None _uses_shell=False _raw_params=nova-manage cell_v2 map_instances --cell_uuid 7f04f00d-4b9d-478d-941f-d93a7be145e7 removes=None creates=None chdir=None
Mar 22 19:39:52 overcloud-
Mar 22 19:39:55 overcloud-
Mar 22 19:40:09 overcloud-
Mar 22 19:40:12 overcloud-
Mar 22 19:40:15 overcloud-
INSERT INTO `cell_mappings` VALUES ('2017-03-22 19:39:54',NULL,2,'00000000-0000-0000-0000-000000000000','cell0','none:/...