HA cluster deployment failed with "Lost connection to MySQL server at 'reading initial communication packet', system error: 0"

Bug #1349450 reported by Andrey Sledzinskiy
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mirantis OpenStack
Confirmed
High
Vladimir Kuklin

Bug Description

{
    "build_id": "2014-07-28_02-01-14",
    "ostf_sha": "8c328521b1444f22c50463b9432193e20ed33813",
    "build_number": "366",
    "auth_required": true,
    "api": "1.0",
    "nailgun_sha": "83cc9ed44ebc8dd97248483b6d414ebbc4cff3c0",
    "production": "docker",
    "fuelmain_sha": "9adfbf5a52cedbdd16ec1a74f6c44c5b3419b87c",
    "astute_sha": "aa5aed61035a8dc4035ab1619a8bb540a7430a95",
    "feature_groups": [
        "mirantis"
    ],
    "release": "5.1",
    "fuellib_sha": "d1c7f67b3cf51978d3178c8666ea398f2477dcb5"
}

Steps:
1. Create next cluster - Ubuntu, HA, Neutron GRE, 3 controller, 2 compute
2. Deploy cluster

Expected - deployment passed
Actual - deployment failed with errors in neutron-server log on controller-1 (node-1):
neutron.openstack.common.rpc.amqp [req-f9b91851-e51a-41b5-bc00-ad223654bdbd None] Exception during message handling
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp Traceback (most recent call last):
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/neutron/openstack/common/rpc/amqp.py", line 462, in _process_data
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp **args)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/neutron/common/rpc.py", line 45, in dispatch
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp neutron_ctxt, version, method, namespace, **kwargs)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/neutron/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 219, in report_state
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp self.plugin.create_or_update_agent(context, agent_state)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 180, in create_or_update_agent
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp return self._create_or_update_agent(context, agent)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 159, in _create_or_update_agent
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp context, agent['agent_type'], agent['host'])
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/neutron/db/agents_db.py", line 136, in _get_agent_by_type_and_host
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp Agent.host == host).one()
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2306, in one
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp ret = list(self)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2349, in __iter__
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp return self._execute_and_instances(context)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2362, in _execute_and_instances
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp close_with_result=True)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/query.py", line 2353, in _connection_from_session
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp **kw)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 799, in connection
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp close_with_result=close_with_result)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 803, in _connection_for_bind
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp return self.transaction._connection_for_bind(engine)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/orm/session.py", line 299, in _connection_for_bind
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp conn = bind.contextual_connect()
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/base.py", line 1661, in contextual_connect
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp self.pool.connect(),
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 272, in connect
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp return _ConnectionFairy(self).checkout()
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 425, in __init__
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp rec = self._connection_record = pool._do_get()
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 778, in _do_get
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp con = self._create_connection()
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 225, in _create_connection
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp return _ConnectionRecord(self)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 318, in __init__
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp self.connection = self.__connect()
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/pool.py", line 368, in __connect
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp connection = self.__pool._creator()
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/strategies.py", line 80, in connect
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp return dialect.connect(*cargs, **cparams)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/sqlalchemy/engine/default.py", line 283, in connect
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp return self.dbapi.connect(*cargs, **cparams)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/MySQLdb/__init__.py", line 81, in Connect
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp return Connection(*args, **kwargs)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 193, in __init__
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp super(Connection, self).__init__(*args, **kwargs2)
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp OperationalError: (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None
2014-07-28 11:55:31.404 23309 TRACE neutron.openstack.common.rpc.amqp

Looks like issue with connection (logs are attached)

Revision history for this message
Andrey Sledzinskiy (asledzinskiy) wrote :
Revision history for this message
Sergii Golovatiuk (sgolovatiuk) wrote :

This problem belongs to mysqldump which is used for SST for Galera. Here is the log

node-3:/var/log/puppet.log
Mon Jul 28 11:49:54 +0000 2014 /Stage[main]/Galera/Service[mysql-service] (info): Starting to evaluate the resource

node-1: neutron-server.log
2014-07-28T12:44:15.293475+01:00 warning: 2014-07-28 11:44:15.292 23135 WARNING neutron.openstack.common.db.sqlalchemy.session [req-21cbcff2-c7e8-4352-8784-3de095ee0dcc ] Database server has gone away: (2006, 'MySQL server has gone away')

Since node-1 was donor and non blocking SST (xtrabackup) was used this means that during SST MySQL was not available and was turned off by HAProxy clustercheck script. However, this didn't affect on neutron as it restored its operations after SST.

Changed in fuel:
importance: Medium → High
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

We need a reproducer here. Looks like a connectivity issue.

Changed in fuel:
status: New → Incomplete
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

looks pretty much like sqlalchemy issue

Changed in fuel:
status: Incomplete → Confirmed
Changed in mos:
assignee: nobody → Dmitry Mescheryakov (dmitrymex)
milestone: none → 5.1
status: New → Confirmed
importance: Undecided → High
no longer affects: fuel
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

Actually, it does not. Looks like Galera was down at the time Neutron tried to reach the database.

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Seems like the root cause is that mcollective lost connection on two nodes. Excerpts from 10.108.5.2/var/log/docker-logs/astute/astute.log:

2014-07-28T12:51:22 debug: [373] Data received by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"3", "progress"=>4, "status"=>"deploying", "role"=>"controller"}]}
2014-07-28T12:52:06 debug: [373] Retry #1 to run mcollective agent on nodes: '3'
2014-07-28T12:52:49 debug: [373] Retry #2 to run mcollective agent on nodes: '3'
2014-07-28T12:53:31 debug: [373] Retry #3 to run mcollective agent on nodes: '3'
2014-07-28T12:54:14 debug: [373] Retry #4 to run mcollective agent on nodes: '3'
2014-07-28T12:54:56 debug: [373] Retry #5 to run mcollective agent on nodes: '3'
2014-07-28T12:55:38 debug: [373] Data received by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"3", "status"=>"error", "error_type"=>"deploy", "role"=>"controller"}]}
2014-07-28T12:55:38 debug: [373] Data send by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"3", "status"=>"error", "error_type"=>"deploy", "role"=>"controller"}]}

2014-07-28T12:51:22 debug: [373] Data received by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"3", "progress"=>4, "status"=>"deploying", "role"=>"controller"}]}
2014-07-28T12:52:06 debug: [373] Retry #1 to run mcollective agent on nodes: '3'
2014-07-28T12:52:49 debug: [373] Retry #2 to run mcollective agent on nodes: '3'
2014-07-28T12:53:31 debug: [373] Retry #3 to run mcollective agent on nodes: '3'
2014-07-28T12:54:14 debug: [373] Retry #4 to run mcollective agent on nodes: '3'
2014-07-28T12:54:56 debug: [373] Retry #5 to run mcollective agent on nodes: '3'
2014-07-28T12:55:38 debug: [373] Data received by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"3", "status"=>"error", "error_type"=>"deploy", "role"=>"controller"}]}
2014-07-28T12:55:38 debug: [373] Data send by DeploymentProxyReporter to report it up: {"nodes"=>[{"uid"=>"3", "status"=>"error", "error_type"=>"deploy", "role"=>"controller"}]}

Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Vladimir, please confirm

Changed in mos:
assignee: Dmitry Mescheryakov (dmitrymex) → Vladimir Kuklin (vkuklin)
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Symptoms look the same as in https://bugs.launchpad.net/fuel/+bug/1356954 . Vladimir, if you agree, please make this bug a duplicate of 1356954.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.