messages from _heat_stacks_get workflow are probably too large

Bug #1774958 reported by Jiri Tomasek
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | High | Unassigned |

Bug Description

When running the deploy_plan workflow, the tripleo.v1.stack_heat_stacks_get workflow is run periodically to get the stack status from Heat. It publishes the stack as its output and sends the stack in a Zaqar message. This is good because the client does not have to call the Heat API again to get the stack information.

Unfortunately, this causes an error in the send_message task (tripleo.v1.stack_heat_stacks_get -> tripleo.messaging.v1.send -> send_message task): "ZaqarAction.queue_post failed: Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 1048576."
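
As a rough illustration of the limit quoted in the error, the sketch below estimates the serialized size of a payload and compares it with 1048576 bytes. The helper names are hypothetical, and it assumes the payload is sent as UTF-8 JSON; Zaqar measures the posted message collection with its own serialization, so this is only an approximation.

```python
import json

# Limit quoted in the Zaqar error message, in bytes.
MAX_POST_SIZE = 1048576


def payload_size(payload):
    """Return the size in bytes of the payload serialized as UTF-8 JSON.

    Assumption: this approximates what Zaqar counts; the real check is
    done server-side on the whole message collection.
    """
    return len(json.dumps(payload).encode("utf-8"))


def fits_in_post(payload, limit=MAX_POST_SIZE):
    """Return True if the serialized payload is within the size limit."""
    return payload_size(payload) <= limit
```

A check like this could be run before posting, so that an oversized payload is trimmed or rejected early instead of failing inside the send_message task.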

IMHO we should strip the outputs from the stack, since they take up most of the space and are not needed for tracking the stack status. Outputs are not included in the Heat API's stack list call, so maybe this can be tweaked on the Mistral heat.stacks_get action side (which perhaps makes multiple Heat API calls to fetch all the data, including outputs).
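
A minimal sketch of the stripping idea, assuming the stack is available as a plain dict with a top-level "outputs" key (as in the Heat stack show response). The function name is hypothetical and this is not the actual tripleo-common change.

```python
def strip_outputs(stack):
    """Return a shallow copy of the stack dict without its "outputs" key.

    Assumption: `stack` is a plain dict; the original is left unmodified.
    """
    return {key: value for key, value in stack.items() if key != "outputs"}


# Example with made-up data: status fields survive, outputs are dropped.
example = {
    "stack_name": "overcloud",
    "stack_status": "CREATE_COMPLETE",
    "outputs": [{"output_key": "SomeOutput", "output_value": "..."}],
}
trimmed = strip_outputs(example)
```

The fields needed for status tracking (such as stack_status) are kept, so a client following the deployment should not need the removed data.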

Also, for some reason, tripleo.messaging.v1.send stays in the RUNNING state forever instead of failing because of the above error, so deploy_plan never finishes either. Here is the task list of the tripleo.messaging.v1.send workflow execution:
(undercloud) [stack@undercloud tripleo-common]$ mistral task-list 4469cd99-6560-46a8-8652-a735a87fe581
+--------------------------------------+------------------+---------------------------+--------------------+--------------------------------------+---------+------------------------------+---------------------+---------------------+
| ID | Name | Workflow name | Workflow namespace | Execution ID | State | State info | Created at | Updated at |
+--------------------------------------+------------------+---------------------------+--------------------+--------------------------------------+---------+------------------------------+---------------------+---------------------+
| 3994cb6d-4b9c-4c3c-9659-2d834e5b3600 | merge_payload | tripleo.messaging.v1.send | | 4469cd99-6560-46a8-8652-a735a87fe581 | SUCCESS | None | 2018-06-04 07:49:09 | 2018-06-04 07:49:10 |
| a9e8a051-42d1-44ff-b384-0f5c2b40b253 | prepare_messages | tripleo.messaging.v1.send | | 4469cd99-6560-46a8-8652-a735a87fe581 | SUCCESS | None | 2018-06-04 07:49:10 | 2018-06-04 07:49:12 |
| b8193b55-8bac-4474-b66a-23d9a7e17466 | branch_workflow | tripleo.messaging.v1.send | | 4469cd99-6560-46a8-8652-a735a87fe581 | SUCCESS | None | 2018-06-04 07:49:12 | 2018-06-04 07:49:13 |
| 5a851b61-4fe6-447d-a4b4-9bd964868e67 | complete_swift | tripleo.messaging.v1.send | | 4469cd99-6560-46a8-8652-a735a87fe581 | SUCCESS | None | 2018-06-04 07:49:13 | 2018-06-04 07:49:14 |
| 611c79a5-272f-4000-855b-8df16e60468a | send_message | tripleo.messaging.v1.send | | 4469cd99-6560-46a8-8652-a735a87fe581 | ERROR | Failed to run action [act... | 2018-06-04 07:49:13 | 2018-06-04 07:49:30 |
| 9092d105-e79b-4d75-8b55-534749c02294 | check_status | tripleo.messaging.v1.send | | 4469cd99-6560-46a8-8652-a735a87fe581 | ERROR | Failed by tasks: [u'send_... | 2018-06-04 07:49:14 | 2018-06-04 07:49:31 |
+--------------------------------------+------------------+---------------------------+--------------------+--------------------------------------+---------+------------------------------+---------------------+---------------------+

Tags: workflows
Jiri Tomasek (jtomasek) wrote :

Apparently the heat.stacks_get action has a resolve_outputs=true input, so setting it to false should resolve one of the problems.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-common (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/572396

Changed in tripleo:
milestone: rocky-2 → rocky-3
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/572807

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/572396
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=ce0db0722ac4a5ffbeea6b641f0e0fc35c9b746b
Submitter: Zuul
Branch: master

commit ce0db0722ac4a5ffbeea6b641f0e0fc35c9b746b
Author: Dougal Matthews <email address hidden>
Date: Tue Jun 5 14:37:58 2018 +0100

    Remove the output from the heat.stacks_get action result

    The output can be huge and far too big to store in Mistral or send via
    Zaqar. This is a workaroud while we investigate why resolve_outputs
    doesn't seem to be working in the Mistral action.

    Related-Bug: 1774958
    Change-Id: I8491df32194546098eb1cfad2df90f0829684a76

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/572807
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=21a66f2bddb47212ba87155f60a1a6b843e81c33
Submitter: Zuul
Branch: master

commit 21a66f2bddb47212ba87155f60a1a6b843e81c33
Author: Thomas Herve <email address hidden>
Date: Wed Jun 6 17:24:09 2018 +0200

    Don't resolve outputs when getting stacks

    A few workflows are retrieving the overcloud stack with its outputs,
    while not using them. This can make for a big result, so pass the
    resolve_outputs flag to disable it.

    Change-Id: I5deac641dbbc5552bb52f3e27a13b9dee5c1be4a
    Related-Bug: #1774958
    Depends-On: https://review.openstack.org/573306

Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Dougal Matthews (d0ugal)
Changed in tripleo:
status: Triaged → Fix Released