Activity log for bug #1455489

Date Who What changed Old value New value Message
2015-05-15 13:12:02 Vladimir Kozhukalov bug added bug
2015-05-15 15:12:17 Dmitry Pyzhov fuel: milestone 7.0
2015-05-18 23:32:05 Mike Scherbakov fuel: milestone 7.0 6.1
2015-05-19 08:40:58 Kamil Sambor fuel: assignee Fuel Python Team (fuel-python) Kamil Sambor (ksambor)
2015-05-19 10:31:07 Kamil Sambor fuel: status Triaged In Progress
2015-05-19 10:32:43 Kamil Sambor fuel: assignee Kamil Sambor (ksambor) Vladimir Kozhukalov (kozhukalov)
2015-05-19 16:24:01 OpenStack Infra fuel: status In Progress Fix Committed
2016-04-13 14:07:01 Vladimir Kuklin nominated for series fuel/mitaka
2016-04-13 14:07:01 Vladimir Kuklin bug task added fuel/mitaka
2016-04-13 14:07:01 Vladimir Kuklin nominated for series fuel/newton
2016-04-13 14:07:01 Vladimir Kuklin bug task added fuel/newton
2016-04-13 14:07:11 Vladimir Kuklin nominated for series fuel/6.1.x
2016-04-13 14:07:11 Vladimir Kuklin bug task added fuel/6.1.x
2016-04-13 14:07:29 Vladimir Kuklin fuel/6.1.x: milestone 6.1-updates
2016-04-13 14:07:38 Vladimir Kuklin fuel/newton: milestone 6.1 10.0
2016-04-13 14:07:40 Vladimir Kuklin fuel/mitaka: milestone 9.0
2016-04-13 14:07:45 Vladimir Kuklin fuel/mitaka: importance Undecided Critical
2016-04-13 14:07:47 Vladimir Kuklin fuel/6.1.x: importance Undecided Critical
2016-04-13 14:07:54 Vladimir Kuklin fuel/6.1.x: assignee Fuel Python Team (fuel-python)
2016-04-13 14:07:59 Vladimir Kuklin fuel/mitaka: assignee Fuel Python Team (fuel-python)
2016-04-13 14:08:01 Vladimir Kuklin fuel/6.1.x: importance Critical High
2016-04-13 14:08:03 Vladimir Kuklin fuel/mitaka: importance Critical High
2016-04-13 14:08:05 Vladimir Kuklin tags customer-found
2016-04-13 15:22:41 Dmitry Pyzhov fuel/newton: status Fix Committed Confirmed
2016-04-13 15:23:05 Dmitry Pyzhov bug task deleted fuel/newton
2016-04-13 15:23:57 Dmitry Pyzhov tags customer-found customer-found fuel-py
2016-04-13 15:24:01 Dmitry Pyzhov tags customer-found fuel-py customer-found fuel-python
2016-04-13 15:24:06 Dmitry Pyzhov tags customer-found fuel-python area-python customer-found
2016-04-14 07:21:15 OpenStack Infra fuel: status Confirmed In Progress
2016-04-14 07:21:15 OpenStack Infra fuel: assignee Vladimir Kozhukalov (kozhukalov) Georgy Kibardin (gkibardin)
2016-04-14 07:21:51 Georgy Kibardin fuel/mitaka: assignee Fuel Python Team (fuel-python) Georgy Kibardin (gkibardin)
2016-04-14 07:22:06 Georgy Kibardin fuel/mitaka: status New In Progress
2016-04-14 09:02:39 Bartosz Kupidura fuel/6.1.x: status New Confirmed
2016-05-10 18:12:34 OpenStack Infra fuel: status In Progress Fix Committed
2016-05-11 15:11:43 OpenStack Infra fuel/mitaka: status In Progress Fix Committed
2016-05-25 12:43:38 Alexei Sheplyakov nominated for series fuel/8.0.x
2016-05-25 12:43:38 Alexei Sheplyakov bug task added fuel/8.0.x
2016-05-25 12:43:45 Alexei Sheplyakov fuel/8.0.x: status New Confirmed
2016-05-25 12:43:49 Alexei Sheplyakov fuel/8.0.x: importance Undecided High
2016-05-25 12:44:07 Alexei Sheplyakov fuel/mitaka: status Fix Committed Confirmed
2016-05-25 12:46:43 Alexei Sheplyakov summary Cloud-init must restart mcollective after configuring Nodes stuck in "provisioning" state despite successful reboot into newly installed OS
2016-05-25 13:00:48 Alexei Sheplyakov description Currently, cloud-init configures mcollective and then runs the following command service mcollective start But if mcollective was already installed into an image, System V subsystem starts it before mcollective and starting it again does give nothing. So, in order to make sure that mcollective is able to see config file changes made by cloud-init it must be restarted. Fortunately, nailgun-agent slightly changes the config and then restarts mcollective. This bug is motivated by the results of investigation done for this one https://bugs.launchpad.net/fuel/+bug/1454741 According to the output of `fuel nodes' almost half of nodes stuck in the `provisioning' state. However in fact those node have successfully rebooted into the newly installed OS a long time ago (~ 20 -- 30 minutes). The problem is that the mcollective service uses a wrong config file which specifies activemq as a messaging backend, as a result the node can't notify fuel that provisioning has successfuly completed. The /var/log/mcollective.log on such a node looks like # Logfile created on 2016-05-24 11:51:19 +0000 by logger.rb/31641 I, [2016-05-24T11:51:19.708422 #7999] INFO -- : config.rb:146:in `loadconfig' PLMC1: The Marionette Collective version 2.3.3 started by /usr/sbin/mcollectived using config file /etc/mcollective/server.cfg I, [2016-05-24T11:51:19.708508 #7999] INFO -- : mcollectived:35:in `<main>' The Marionette Collective 2.3.3 started logging at info level W, [2016-05-24T11:51:19.709309 #7999] WARN -- : activemq.rb:274:in `connection_headers' Connecting without STOMP 1.1 heartbeats, if you are using ActiveMQ 5.8 or newer consider setting plugin.activemq.heartbeat_interval I, [2016-05-24T11:51:19.709467 #7999] INFO -- : activemq.rb:113:in `on_connecting' TCP Connection attempt 0 to stomp://mcollective@stomp1:6163 I, [2016-05-24T11:51:19.720273 #7999] INFO -- : activemq.rb:128:in `on_connectfail' TCP Connection to stomp://mcollective@stomp1:6163 failed on attempt 0 I, [2016-05-24T11:51:19.730528 #7999] INFO -- : activemq.rb:113:in `on_connecting' TCP Connection attempt 1 to stomp://mcollective@stomp1:6163 I, [2016-05-24T11:51:19.733415 #7999] INFO -- : activemq.rb:128:in `on_connectfail' TCP Connection to stomp://mcollective@stomp1:6163 failed on attempt 1 I, [2016-05-24T11:51:19.753646 #7999] INFO -- : activemq.rb:113:in `on_connecting' TCP Connection attempt 2 to stomp://mcollective@stomp1:6163 [repeated many many times] There's also an error message regarding a failed mcollective configuration attempt in /var/log/cloud-init-output.log on the affected nodes: Reading package lists... Building dependency tree... Reading state information... mcollective is already the newest version. 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded. start: Job is already running: mcollective 2016-05-24 11:51:19,768 - util.py[WARNING]: Running mcollective (<module 'cloudinit.config.cc_mcollective' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_mcollective.pyc'>) failed Currently, cloud-init configures mcollective and then runs the following command service mcollective start However nailgun-agent also adjusts mcollective config (/etc/mcollective/server.cfg) and restarts the service. If nailgun-agent (or init) wins the race and starts mcollective before cloud-init have configured it the above start command does nothing. As a result the node can't report that the provisioning has been completed successfully, so deployment never starts. In order to make sure that mcollective is able to see config file changes made by cloud-init it must be restarted.
2016-05-25 13:01:35 Alexei Sheplyakov description According to the output of `fuel nodes' almost half of nodes stuck in the `provisioning' state. However in fact those node have successfully rebooted into the newly installed OS a long time ago (~ 20 -- 30 minutes). The problem is that the mcollective service uses a wrong config file which specifies activemq as a messaging backend, as a result the node can't notify fuel that provisioning has successfuly completed. The /var/log/mcollective.log on such a node looks like # Logfile created on 2016-05-24 11:51:19 +0000 by logger.rb/31641 I, [2016-05-24T11:51:19.708422 #7999] INFO -- : config.rb:146:in `loadconfig' PLMC1: The Marionette Collective version 2.3.3 started by /usr/sbin/mcollectived using config file /etc/mcollective/server.cfg I, [2016-05-24T11:51:19.708508 #7999] INFO -- : mcollectived:35:in `<main>' The Marionette Collective 2.3.3 started logging at info level W, [2016-05-24T11:51:19.709309 #7999] WARN -- : activemq.rb:274:in `connection_headers' Connecting without STOMP 1.1 heartbeats, if you are using ActiveMQ 5.8 or newer consider setting plugin.activemq.heartbeat_interval I, [2016-05-24T11:51:19.709467 #7999] INFO -- : activemq.rb:113:in `on_connecting' TCP Connection attempt 0 to stomp://mcollective@stomp1:6163 I, [2016-05-24T11:51:19.720273 #7999] INFO -- : activemq.rb:128:in `on_connectfail' TCP Connection to stomp://mcollective@stomp1:6163 failed on attempt 0 I, [2016-05-24T11:51:19.730528 #7999] INFO -- : activemq.rb:113:in `on_connecting' TCP Connection attempt 1 to stomp://mcollective@stomp1:6163 I, [2016-05-24T11:51:19.733415 #7999] INFO -- : activemq.rb:128:in `on_connectfail' TCP Connection to stomp://mcollective@stomp1:6163 failed on attempt 1 I, [2016-05-24T11:51:19.753646 #7999] INFO -- : activemq.rb:113:in `on_connecting' TCP Connection attempt 2 to stomp://mcollective@stomp1:6163 [repeated many many times] There's also an error message regarding a failed mcollective configuration attempt in /var/log/cloud-init-output.log on the affected nodes: Reading package lists... Building dependency tree... Reading state information... mcollective is already the newest version. 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded. start: Job is already running: mcollective 2016-05-24 11:51:19,768 - util.py[WARNING]: Running mcollective (<module 'cloudinit.config.cc_mcollective' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_mcollective.pyc'>) failed Currently, cloud-init configures mcollective and then runs the following command service mcollective start However nailgun-agent also adjusts mcollective config (/etc/mcollective/server.cfg) and restarts the service. If nailgun-agent (or init) wins the race and starts mcollective before cloud-init have configured it the above start command does nothing. As a result the node can't report that the provisioning has been completed successfully, so deployment never starts. In order to make sure that mcollective is able to see config file changes made by cloud-init it must be restarted. According to the output of `fuel nodes' almost half of nodes stuck in the `provisioning' state. However in fact those node have successfully rebooted into the newly installed OS a long time ago (~ 20 -- 30 minutes). The problem is that the mcollective service uses a wrong config file which specifies activemq as a messaging backend, as a result the node can't notify fuel that provisioning has successfuly completed. The /var/log/mcollective.log on such a node looks like # Logfile created on 2016-05-24 11:51:19 +0000 by logger.rb/31641 I, [2016-05-24T11:51:19.708422 #7999] INFO -- : config.rb:146:in `loadconfig' PLMC1: The Marionette Collective version 2.3.3 started by /usr/sbin/mcollectived using config file /etc/mcollective/server.cfg I, [2016-05-24T11:51:19.708508 #7999] INFO -- : mcollectived:35:in `<main>' The Marionette Collective 2.3.3 started logging at info level W, [2016-05-24T11:51:19.709309 #7999] WARN -- : activemq.rb:274:in `connection_headers' Connecting without STOMP 1.1 heartbeats, if you are using ActiveMQ 5.8 or newer consider setting plugin.activemq.heartbeat_interval I, [2016-05-24T11:51:19.709467 #7999] INFO -- : activemq.rb:113:in `on_connecting' TCP Connection attempt 0 to stomp://mcollective@stomp1:6163 I, [2016-05-24T11:51:19.720273 #7999] INFO -- : activemq.rb:128:in `on_connectfail' TCP Connection to stomp://mcollective@stomp1:6163 failed on attempt 0 I, [2016-05-24T11:51:19.730528 #7999] INFO -- : activemq.rb:113:in `on_connecting' TCP Connection attempt 1 to stomp://mcollective@stomp1:6163 I, [2016-05-24T11:51:19.733415 #7999] INFO -- : activemq.rb:128:in `on_connectfail' TCP Connection to stomp://mcollective@stomp1:6163 failed on attempt 1 I, [2016-05-24T11:51:19.753646 #7999] INFO -- : activemq.rb:113:in `on_connecting' TCP Connection attempt 2 to stomp://mcollective@stomp1:6163 [repeated many many times] There's also an error message regarding a failed mcollective configuration attempt in /var/log/cloud-init-output.log on the affected nodes: Reading package lists... Building dependency tree... Reading state information... mcollective is already the newest version. 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded. start: Job is already running: mcollective 2016-05-24 11:51:19,768 - util.py[WARNING]: Running mcollective (<module 'cloudinit.config.cc_mcollective' from '/usr/lib/python2.7/dist-packages/cloudinit/config/cc_mcollective.pyc'>) failed Currently, cloud-init configures mcollective and then runs the following command service mcollective start However nailgun-agent also adjusts mcollective config (/etc/mcollective/server.cfg) and restarts the service. If nailgun-agent (or init) wins the race and starts mcollective before cloud-init have configured it the above start command does nothing. As a result the node can't report that the provisioning has been completed successfully, so deployment never starts. In order to make sure that mcollective is able to see config file changes made by cloud-init it must be restarted. * Steps to reproduce: - Deploy a minimal cluster: 3 controllers, 2 computes, 3 ceph-osd nodes * Expected results: - Provisioning and deployment completes normally within the sane time (~ 30 -- 60 minutes) * Actual results: - Quite a number of nodes (almost a half) stuck in the `provisioning' state * Workaround: - manually restart the mcollective service on the affected nodes
2016-05-25 13:01:43 Alexei Sheplyakov fuel/8.0.x: milestone 8.0-updates
2016-05-25 14:16:04 Dmitry Pyzhov fuel/8.0.x: assignee MOS Maintenance (mos-maintenance)
2016-05-25 14:35:04 Georgy Kibardin fuel/mitaka: status Confirmed Fix Committed
2016-05-27 16:09:44 Sergii Rizvan fuel/8.0.x: assignee MOS Maintenance (mos-maintenance) Sergii Rizvan (srizvan)
2016-05-27 16:09:53 Sergii Rizvan fuel/8.0.x: milestone 8.0-updates 8.0-mu-2
2016-06-01 11:20:52 Sergii Rizvan fuel/8.0.x: milestone 8.0-mu-2 8.0-updates
2016-06-07 12:30:55 Andrey Lavrentyev tags area-python customer-found area-python customer-found on-verification
2016-06-08 10:45:34 Andrey Lavrentyev tags area-python customer-found on-verification area-python customer-found
2016-06-08 10:45:44 Andrey Lavrentyev fuel/mitaka: status Fix Committed Fix Released
2016-06-09 10:15:21 Sergii Rizvan fuel/8.0.x: status Confirmed In Progress
2016-06-09 10:15:30 Sergii Rizvan fuel/8.0.x: milestone 8.0-updates 8.0-mu-2
2016-06-24 07:39:49 Sergii Rizvan fuel/8.0.x: status In Progress Fix Committed
2016-06-24 09:53:21 Ekaterina Shutova tags area-python customer-found area-python customer-found on-verification
2016-06-29 10:34:57 Vitaly Sedelnik nominated for series fuel/7.0.x
2016-06-29 10:34:57 Vitaly Sedelnik bug task added fuel/7.0.x
2016-06-29 10:35:04 Vitaly Sedelnik fuel/7.0.x: status New Confirmed
2016-06-29 10:35:10 Vitaly Sedelnik fuel/7.0.x: importance Undecided High
2016-06-29 10:35:17 Vitaly Sedelnik fuel/7.0.x: assignee Sergii Rizvan (srizvan)
2016-06-29 10:35:24 Vitaly Sedelnik fuel/7.0.x: milestone 7.0-mu-5
2016-06-29 10:35:27 Vitaly Sedelnik fuel/6.1.x: milestone 6.1-updates 6.1-mu-7
2016-06-29 10:35:37 Vitaly Sedelnik fuel/6.1.x: assignee Fuel Python (Deprecated) (fuel-python) Sergii Rizvan (srizvan)
2016-06-30 09:17:45 Ekaterina Shutova fuel/8.0.x: status Fix Committed Fix Released
2016-06-30 09:17:59 Ekaterina Shutova tags area-python customer-found on-verification area-python customer-found
2016-07-05 10:53:37 Sergii Rizvan fuel/7.0.x: status Confirmed In Progress
2016-07-05 10:53:40 Sergii Rizvan fuel/6.1.x: status Confirmed In Progress
2016-07-08 04:57:02 Sergii Rizvan fuel/6.1.x: status In Progress Won't Fix
2016-07-08 04:57:05 Sergii Rizvan fuel/7.0.x: status In Progress Won't Fix
2016-07-08 04:57:12 Sergii Rizvan fuel/6.1.x: milestone 6.1-mu-7 6.1-updates
2016-07-08 04:57:18 Sergii Rizvan fuel/7.0.x: milestone 7.0-mu-5 7.0-updates
2017-04-10 10:14:43 Dmitry Belyaninov tags area-python customer-found area-python customer-found on-verification
2017-04-13 07:53:12 Dmitry Belyaninov fuel: status Fix Committed Fix Released
2017-04-13 07:53:21 Dmitry Belyaninov tags area-python customer-found on-verification area-python customer-found