Heat stack with "wait_condition" fails with stack creation error

Bug #1612204 reported by Anastasia Kuznetsova
This bug affects 3 people
Affects              Status          Importance   Assigned to   Milestone
Fuel for OpenStack   Fix Committed   High         MOS Heat
Mitaka               Fix Released    High         MOS Heat

Bug Description

Detailed bug description:
One of the Heat OSTF tests fails on SWARM:

- Check creation of stack with Wait Condition/Handle resources (fuel_health.tests.tests_platform.test_heat.HeatSmokeTests.test_wait_condition). Test status: failure, message: Time limit exceeded while waiting for stack status becoming "CREATE_COMPLETE" to finish. Please refer to OpenStack logs for more details.
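The failure message comes from a wait loop that polls the stack status until a time limit expires. A minimal sketch of such a loop is shown here for illustration; it is not the OSTF implementation, and `wait_for_stack_status`, `StackTimeout`, and the `get_status` callable are hypothetical names:

```python
import time


class StackTimeout(Exception):
    """Raised when the stack does not reach the expected status in time."""


def wait_for_stack_status(get_status, expected="CREATE_COMPLETE",
                          timeout=600.0, interval=5.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns `expected`.

    get_status is a hypothetical callable returning the current stack
    status string. A *_FAILED status aborts immediately instead of
    waiting for the whole timeout.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == expected:
            return status
        if status.endswith("_FAILED"):
            raise RuntimeError("stack entered %s" % status)
        if clock() >= deadline:
            raise StackTimeout(
                "time limit exceeded waiting for %s (last status: %s)"
                % (expected, status))
        sleep(interval)
```

A stack that stays in CREATE_IN_PROGRESS, as reported in this bug, ends in `StackTimeout`, while a stack that reaches CREATE_FAILED is reported right away.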

Steps to reproduce:
1. Create cluster
2. Add 1 node with controller role and mongo
3. Add 1 node with compute role
4. Enable the "Install Ceilometer" option
5. Deploy the cluster
6. Verify Heat, Ceilometer services
7. Run OSTF platform tests

Expected result:
All OSTF tests pass

Observed result:
"test_wait_condition" failed

There are no tracebacks or anything suspicious in the Heat logs, just:
<131>Aug 11 04:25:36 node-1 heat-engine: 2016-08-11 04:25:36.011 30973 ERROR heat.engine.stack [-] Unexpected exception in create:
<135>Aug 11 04:25:36 node-1 heat-engine: 2016-08-11 04:25:36.030 30973 DEBUG oslo_messaging._drivers.amqpdriver [-] CAST unique_id: d12ad526b2e341d78403723e004b2ea2 size: 11323 NOTIFY exchange: heat topic: notifications.error _send /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:480
<134>Aug 11 04:25:36 node-1 heat-engine: 2016-08-11 04:25:36.060 30973 INFO heat.engine.stack [-] Stack CREATE FAILED (ost1_test-heat-stack-1601873178):
<135>Aug 11 04:25:36 node-1 heat-engine: 2016-08-11 04:25:36.068 30973 DEBUG oslo_messaging._drivers.amqpdriver [req-d297987d-df72-4493-912a-ba3d087e1c56 - heatSimple - default default] REPLY msg_id: dd668e608a3a4c408c345859fee3e593 size: 146 reply queue: reply_88a6f868b8dd49e8abb4609711309939 time elapsed: 0.0601863840002s _send_reply /usr/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py:96
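To pull only the ERROR records out of a log like the excerpt above, a small parser can help. This is a sketch that assumes every line follows the syslog-prefixed layout shown in the excerpt; the regular expression and the names `parse_line` and `errors_only` are illustrative, not part of Heat or Fuel:

```python
import re

# Layout assumed from the excerpt above:
# <pri>Mon DD HH:MM:SS host program: timestamp pid LEVEL logger [context] message
LOG_LINE = re.compile(
    r"^<(?P<pri>\d+)>\w{3}\s+\d+ [\d:]+ (?P<host>\S+) (?P<program>\S+): "
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} [\d:.]+) (?P<pid>\d+) "
    r"(?P<level>[A-Z]+) (?P<logger>\S+) \[(?P<context>[^\]]*)\] ?(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def errors_only(lines):
    """Keep only the parsed records whose level is ERROR."""
    records = (parse_line(line) for line in lines)
    return [rec for rec in records if rec and rec["level"] == "ERROR"]
```

Lines that do not match the assumed layout (for example, continuation lines of a traceback) are skipped rather than guessed at.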

Description of the environment:
9.1 snapshot 119
Link to failed job: https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.services_ha_one_controller/24/testReport/(root)/deploy_heat_ha_one_controller_neutron/

Changed in fuel:
assignee: nobody → MOS Heat (mos-heat)
milestone: none → 9.1
tags: added: system-tests
Oleksii Chuprykov (ochuprykov) wrote :

Could you please provide an output of heat stack-show command?

Anastasia Kuznetsova (akuznetsova) wrote :
Vitalii Gridnev (vgridnev) wrote :

From the Sahara logs in the diagnostic snapshot it's clear that the stack was always in the CREATE_IN_PROGRESS state. Additionally, in 9.x Sahara always creates wait conditions for all nodes in a cluster, so if wait conditions are not really working, that will definitely affect Sahara, since we consume that feature.

[0] https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.services_ha_one_controller/24/testReport/(root)/deploy_sahara_ha_one_controller_tun/

Changed in fuel:
importance: Undecided → High
tags: added: area-heat
Oleksii Chuprykov (ochuprykov) wrote :

I think this is a problem with SSL. You should use the correct templates.
https://review.openstack.org/#/c/330603/

Alexander Nagovitsyn (gluk12189) wrote :
Changed in fuel:
status: New → Confirmed
Anastasia Kuznetsova (akuznetsova) wrote :

Added tag "swarm-blocker" because it affects 3 swarm system tests (one heat, two sahara tests)

tags: added: swarm-blocker
summary: - Heat OSTF test "test_wait_condition" fails with stack creation error
+ Heat stack with "wait_condition" fails with stack creation error
Timur Nurlygayanov (tnurlygayanov) wrote :
Changed in fuel:
status: Confirmed → In Progress
Alexander Nagovitsyn (gluk12189) wrote :

We have a problem with the network creation command in the template.
The Heat team is working on it.

Peter Razumovsky (prazumovsky) wrote :

It seems we don't need to add a custom -k argument to the test, but should instead add insecure=True to the [clients_heat] section of heat.conf. So it seems Heat is misconfigured, and this should be fixed in the Puppet manifests by adding the insecure option during configuration, i.e.

[clients_heat]
insecure=true
url=...

Peter Razumovsky (prazumovsky) wrote :

I will write to the Puppet team about that.

Denis Egorenko (degorenko) wrote :

I'm sure that it's not a good idea to use insecure for the Heat client in the SSL case.

Sergey Kraynev (skraynev) wrote :

Denis, let me explain a bit.

This option was added to heat.conf just to make working with WaitConditions easier.
It does not affect other security aspects of Heat itself.

It was added upstream to handle the following situation. A user has a template that works on OpenStack without SSL.
Then he wants to use SSL (or copies an example template from somewhere where SSL was not used).
He tries to create a stack with this template and gets an unexpected error, due to an internal misunderstanding in Heat.

How did the user have to deal with this before?
There are two options:
1. Manually add the --insecure option to the template (P.S. I forgot to say that the insecure config option just triggers adding this option to the WaitConditionHandle resource).

2. Use the full approach with certificates: generate them, then upload one into the Heat template and another to the controller with the Heat services. (Honestly, I have not heard of anybody following this way. Most people prefer option #1.)

So now we have option 3:
enable insecure WHEN SSL is ENABLED (please make it depend on the SSL option, don't set it to true for everything; that's not bad, but looks weird :) )
So this bug is about option 3.
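Option 3 as proposed in this comment can be sketched as follows. This is only an illustration using Python's configparser as a stand-in for the Puppet manifests that would do this in a real deployment; `render_heat_conf` and its `ssl_enabled` flag are hypothetical names, and only the [clients_heat] section and the insecure option come from the thread:

```python
import configparser
import io


def render_heat_conf(conf_text, ssl_enabled):
    """Return heat.conf text with [clients_heat]/insecure tied to SSL.

    Assumption: insecure is set to true only when SSL is enabled and is
    removed otherwise. Note that configparser drops comments when it
    rewrites the file.
    """
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section("clients_heat"):
        parser.add_section("clients_heat")
    if ssl_enabled:
        parser.set("clients_heat", "insecure", "true")
    else:
        parser.remove_option("clients_heat", "insecure")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Tying the option to the SSL flag avoids setting it on deployments where it has no effect, which is the dependency requested in the comment.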

Alex Schultz (alex-schultz) wrote :

My personal preference is to add -k to the template to minimize risk, but I'm not sure why the host it's running on does not have the proper CA in the system for validation. Where is the curl command ultimately run and are we not publishing the self signed CA everywhere? Or is this a custom SSL setup where the user does not provide a CA bundle? Adding insecure=True disables SSL certificate verification, so while the traffic is still encrypted, it does not prevent MITM attacks.
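The difference between verified and unverified TLS can be shown with Python's standard ssl module. This is a generic sketch of what "insecure" means for a client, not Heat's client code; `make_context` is a hypothetical helper:

```python
import ssl


def make_context(insecure=False):
    """Build a client TLS context, optionally without certificate checks."""
    ctx = ssl.create_default_context()
    if insecure:
        # check_hostname must be turned off before verify_mode can be
        # lowered to CERT_NONE.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx
```

With `insecure=True` the connection is still encrypted, but the peer certificate is never validated, so the client cannot tell the real endpoint from an impostor.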

Sergey Kraynev (skraynev) wrote :

Alex,
"Everything that is done is done for the best".

I re-reviewed all the Heat code again, plus the patch, and unfortunately I have to say
that we should not change it.
This option also changes the parameter for internal clients, which is not a secure approach.
So generally you are right.

However, everything said about the wait condition curl is true.
>> Where is the curl command ultimately run and are we not publishing the self signed CA everywhere?

This curl request will be executed in a VM launched in the Heat stack, i.e. a usual VM booted by Nova. So the issue here is that the user would have to put the OpenStack certificate inside his own custom VM. Obviously he does not have this certificate, because that would not be secure either.

I thought it was a good solution for us, but unfortunately it just reuses the existing "insecure" option, which is not what we want.

We will fix the tests.

Evgeny Sikachev (esikachev) wrote :

Hi! This commit https://github.com/openstack/heat/commit/59fc53a66c4dec45e4d150bce0a1d674477f710c has broken Sahara and WC (if we understand it correctly). In the instance logs we can see:

Traceback (most recent call last):
  File "/var/lib/cloud/instance/scripts/loguserdata.py", line 32, in <module>
    import pkg_resources
ImportError: No module named 'pkg_resources'

When we revert this commit, everything works fine.

To reproduce this issue, you can create a stack with one instance (with a WC) and open the instance logs.
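A quick way to confirm the ImportError on an instance is to check whether the module can be found at all. The sketch below uses only the standard library; `missing_modules` is a hypothetical helper, and it has to be run with the same interpreter that executes loguserdata.py on the instance:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pkg_resources is the module named in the traceback above.
    print(missing_modules(["pkg_resources"]))
```

An empty list means the import should succeed under that interpreter; a list containing pkg_resources matches the failure reported here.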

Peter Razumovsky (prazumovsky) wrote :

https://review.openstack.org/#/c/330603/ has been merged in master.
The cherry-pick for mitaka is waiting in the wings: https://review.openstack.org/#/c/352416/

Anastasia Kuznetsova (akuznetsova) wrote :

This bug is no longer a blocker, because the two affected Sahara tests were fixed by another patch within another related bug.

tags: removed: swarm-blocker
tags: added: swarm-blocker
tags: removed: swarm-blocker
no longer affects: fuel/newton
Alexander Nagovitsyn (gluk12189) wrote :

Fix merged in stable/mitaka
https://review.openstack.org/#/c/352416/4

Changed in fuel:
status: In Progress → Fix Committed
Alexander Nagovitsyn (gluk12189) wrote :