When the deploy_plan workflow is running, the tripleo.v1.stack_heat_stacks_get workflow is run periodically to get the stack status from Heat. It publishes the stack as output and sends it in a Zaqar message. This is good because the client does not have to call the Heat API again to get the stack information.
Unfortunately, this causes an error in the send_message task (tripleo.v1.stack_heat_stacks_get -> tripleo.messaging.v1.send -> send_message task): "ZaqarAction.queue_post failed: Error response from Zaqar. Code: 400. Title: Invalid API request. Description: Message collection size is too large. Max size 1048576."
IMHO we should try to strip the outputs from the stack, as they take up most of the space and are not important for stack status tracking. Outputs are not included in the stack list Heat API call, so maybe this can be tweaked on the Mistral heat.stacks_get action side (which maybe makes multiple Heat API calls to fetch all data, including outputs).
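To illustrate the idea, here is a rough Python sketch of dropping the outputs from a stack and checking the serialized size against the limit quoted in the error. It is not the actual tripleo-common or Mistral code: the helper names (strip_outputs, payload_size) and the sample stack are made up for illustration, and the size check only approximates how Zaqar measures a message collection.

```python
import json

# Limit quoted in the Zaqar error message above.
ZAQAR_MAX_BYTES = 1048576


def strip_outputs(stack):
    """Return a shallow copy of a stack dict without its 'outputs' key.

    Hypothetical helper; assumes the stack is a plain dict.
    """
    return {key: value for key, value in stack.items() if key != "outputs"}


def payload_size(payload):
    """Approximate size in bytes of the payload when serialized as JSON."""
    return len(json.dumps(payload).encode("utf-8"))


if __name__ == "__main__":
    # Made-up stack with large outputs, only to exercise the helpers.
    stack = {
        "stack_name": "overcloud",
        "stack_status": "CREATE_IN_PROGRESS",
        "outputs": [
            {"output_key": "key%d" % i, "output_value": "x" * 2048}
            for i in range(600)
        ],
    }
    print("with outputs fits:", payload_size(stack) <= ZAQAR_MAX_BYTES)
    print("without outputs fits:",
          payload_size(strip_outputs(stack)) <= ZAQAR_MAX_BYTES)
```

The status fields needed for tracking are kept, and only the bulky outputs are removed before the message is built.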
Also, for some reason tripleo.messaging.v1.send stays in the RUNNING state forever instead of failing due to the above error, which means deploy_plan never finishes either. Here is the task list of that workflow execution:
(undercloud) [stack@undercloud tripleo-common]$ mistral task-list 4469cd99-6560-46a8-8652-a735a87fe581
+--------------------------------------+------------------+---------------------------+--------------------+--------------------------------------+---------+------------------------------+---------------------+---------------------+
| ID | Name | Workflow name | Workflow namespace | Execution ID | State | State info | Created at | Updated at |
+--------------------------------------+------------------+---------------------------+--------------------+--------------------------------------+---------+------------------------------+---------------------+---------------------+
| 3994cb6d-4b9c-4c3c-9659-2d834e5b3600 | merge_payload | tripleo.messaging.v1.send | | 4469cd99-6560-46a8-8652-a735a87fe581 | SUCCESS | None | 2018-06-04 07:49:09 | 2018-06-04 07:49:10 |
| a9e8a051-42d1-44ff-b384-0f5c2b40b253 | prepare_messages | tripleo.messaging.v1.send | | 4469cd99-6560-46a8-8652-a735a87fe581 | SUCCESS | None | 2018-06-04 07:49:10 | 2018-06-04 07:49:12 |
| b8193b55-8bac-4474-b66a-23d9a7e17466 | branch_workflow | tripleo.messaging.v1.send | | 4469cd99-6560-46a8-8652-a735a87fe581 | SUCCESS | None | 2018-06-04 07:49:12 | 2018-06-04 07:49:13 |
| 5a851b61-4fe6-447d-a4b4-9bd964868e67 | complete_swift | tripleo.messaging.v1.send | | 4469cd99-6560-46a8-8652-a735a87fe581 | SUCCESS | None | 2018-06-04 07:49:13 | 2018-06-04 07:49:14 |
| 611c79a5-272f-4000-855b-8df16e60468a | send_message | tripleo.messaging.v1.send | | 4469cd99-6560-46a8-8652-a735a87fe581 | ERROR | Failed to run action [act... | 2018-06-04 07:49:13 | 2018-06-04 07:49:30 |
| 9092d105-e79b-4d75-8b55-534749c02294 | check_status | tripleo.messaging.v1.send | | 4469cd99-6560-46a8-8652-a735a87fe581 | ERROR | Failed by tasks: [u'send_... | 2018-06-04 07:49:14 | 2018-06-04 07:49:31 |
+--------------------------------------+------------------+---------------------------+--------------------+--------------------------------------+---------+------------------------------+---------------------+---------------------+
Apparently the heat.stacks_get action has a resolve_outputs=true input, so passing false should resolve one of the problems (the oversized message).