No valid host nova error results in uninformative stack status
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Heat | New | Undecided | Unassigned | | |
Bug Description
I hit this in TripleO CI. The root cause appears to be Nova returning a 500 with a "no valid host found" error, but Heat then obfuscates this pretty well:
What you see first as a user is:
2016-05-20 08:36:33.000 | | Controller | 26791d29-
So a ResourceGroup containing nested stacks (which contain servers) failed to create, and the reason given is "Unknown, Code: Unknown". Great! :(
There's no indication of the reason until you look in the logs:
2016-05-20 08:32:45.692 20437 ERROR heat.engine.
Nova returned a 500, but it did include a reasonable error message.
This causes the stack resource to fail, at which point all useful information about why the error happened is lost:
2016-05-20 08:32:47.844 20439 ERROR heat.engine.
So I guess the question is: how do we surface deeply nested errors in a way that's more useful than "Unknown"?
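One way to frame the question: when a failure propagates up through nested stacks, walk down the tree of failed resources and report the reason from the deepest one, since that is usually where the real error (here, the Nova message) lives. The sketch below is a minimal illustration under stated assumptions; `Resource` and `deepest_failure` are hypothetical names, not Heat's real classes or API, and the status strings only mirror Heat's `CREATE_FAILED` style. The reason strings in the example tree are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Resource:
    """Hypothetical, simplified stand-in for a Heat resource (not Heat's real class)."""
    name: str
    status: str                      # e.g. "CREATE_FAILED" or "CREATE_COMPLETE"
    status_reason: str = ""
    children: List["Resource"] = field(default_factory=list)  # resources in a nested stack


def deepest_failure(res: Resource,
                    path: Tuple[str, ...] = ()) -> Optional[Tuple[Tuple[str, ...], str]]:
    """Return (path, status_reason) of the most deeply nested failed resource.

    Returns None if `res` itself did not fail. When several children failed,
    the one with the longest failure path wins (ties go to the first child).
    """
    path = path + (res.name,)
    if not res.status.endswith("FAILED"):
        return None
    best = None
    for child in res.children:
        found = deepest_failure(child, path)
        if found is not None and (best is None or len(found[0]) > len(best[0])):
            best = found
    # No failed child explains this failure, so treat this resource as the root cause.
    return best if best is not None else (path, res.status_reason)


# Illustrative tree shaped like the report: group -> member "0" -> server.
server = Resource("server", "CREATE_FAILED", "No valid host was found.")
member = Resource("0", "CREATE_FAILED", "Unknown", [server])
group = Resource("Controller", "CREATE_FAILED", "Unknown, Code: Unknown", [member])

path, reason = deepest_failure(group)
print("/".join(path), "->", reason)  # prints: Controller/0/server -> No valid host was found.
```

Preferring the longest failed path is a heuristic: the outer "Unknown" reasons are discarded because a failed child exists that explains them, while the resource path is kept so the user can still see where in the nesting the error occurred.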
Changed in heat:
milestone: none → no-priority-tag-bugs
I looked into it, and it appears only one server actually ends up with that error message. See http://logs.openstack.org/81/266881/10/check-tripleo/gate-tripleo-ci-f22-nonha/9eb113c/logs/undercloud/var/log/heat/heat-engine.txt.gz#_2016-05-20_08_30_40_620
ResourceGroup only surfaces the first error, which is why you get that message at the top.
I don't know if we can aggregate errors, but I believe we're doing our best here.
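If aggregation were attempted, one possible shape is to collect the reasons from every failed member instead of only the first, grouping identical reasons so a large group does not produce an unbounded message. This is a hypothetical sketch, not how ResourceGroup currently behaves; `summarize_member_failures` and its `(name, status, status_reason)` input format are assumptions made for illustration, and the example reasons are illustrative.

```python
from typing import Dict, List, Tuple


def summarize_member_failures(members: List[Tuple[str, str, str]]) -> str:
    """Hypothetical helper: build one message covering every failed group member.

    `members` holds (name, status, status_reason) triples. Members that failed
    with the same reason are grouped, so ten identical failures give one entry.
    Returns an empty string if no member failed.
    """
    by_reason: Dict[str, List[str]] = {}  # dicts keep insertion order (Python 3.7+)
    for name, status, reason in members:
        if status.endswith("FAILED"):
            by_reason.setdefault(reason, []).append(name)
    failed = sum(len(names) for names in by_reason.values())
    if failed == 0:
        return ""
    parts = [f"{', '.join(names)}: {reason}" for reason, names in by_reason.items()]
    return f"{failed} of {len(members)} members failed; " + "; ".join(parts)


# Illustrative input: the first failure is uninformative, the second is not.
members = [
    ("0", "CREATE_FAILED", "Unknown"),
    ("1", "CREATE_FAILED", "No valid host was found."),
    ("2", "CREATE_COMPLETE", ""),
]
print(summarize_member_failures(members))
# prints: 2 of 3 members failed; 0: Unknown; 1: No valid host was found.
```

The point of the example is that the informative reason survives even when the first failed member only reports "Unknown"; the cost is a longer status message, which grouping by reason keeps in check.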