Autoscaling doesn't detect failed instance creation
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Heat | Fix Released | High | Thomas Herve | |
Bug Description
When nova fails to create an instance, autoscaling doesn't propagate the error. It adds the instance to the instance list in the DB anyway, so you have no chance to retry creating the instance (e.g. by triggering another scaling event/alarm).
So our instance list in the DB is wrong, and inconsistent with what actually exists in nova.
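To make the expected behaviour concrete, here is a minimal Python sketch. It is not Heat's actual code: `scale_up`, `create_instance`, `InstanceCreateError` and `ScalingError` are all hypothetical names used for illustration. The idea is only that an instance whose creation failed is never recorded, and that the failure reaches the caller so a later scaling event can retry.

```python
class InstanceCreateError(Exception):
    """Hypothetical error standing in for a nova create failure."""


class ScalingError(Exception):
    """Hypothetical error raised when at least one instance failed to create."""

    def __init__(self, created, failed):
        super().__init__("%d instance(s) failed to create" % len(failed))
        self.created = created  # names that really exist and may be recorded
        self.failed = failed    # (name, reason) pairs that must not be recorded


def scale_up(instance_list, new_names, create_instance):
    """Create instances and record only those that were actually created.

    instance_list   -- names already recorded (stand-in for the DB list)
    new_names       -- names of the instances to add
    create_instance -- hypothetical callable; raises InstanceCreateError on failure
    """
    created = list(instance_list)
    failed = []
    for name in new_names:
        try:
            create_instance(name)
        except InstanceCreateError as exc:
            # Remember the failure, but do not add the name to the list.
            failed.append((name, str(exc)))
            continue
        created.append(name)
    if failed:
        # Propagate the error instead of silently carrying on.
        raise ScalingError(created, failed)
    return created
```

A caller would store `exc.created` (not the requested names) as the instance list when `ScalingError` is raised, which keeps the recorded list consistent with what exists in nova.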
This also creates bad entries in the haproxy.cfg on the loadbalancer:
backend servers
    balance roundrobin
    option http-server-close
    option forwardfor
    option httpchk
    timeout check 5s
    server server1 0.0.0.0:80 check inter 30s fall 5 rise 3
    server server2 0.0.0.0:80 check inter 30s fall 5 rise 3
    server server3 0.0.0.0:80 check inter 30s fall 5 rise 3
which obviously isn't going to work (we actually create a bad IP for the first server too, but I'll raise a separate bug for that).
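As an illustration of how such entries could be kept out of the generated config, the following Python sketch skips addresses that are empty, malformed, or the unspecified address 0.0.0.0 before emitting `server` lines. The function name `render_server_lines` is hypothetical and this is not how Heat's LoadBalancer resource is implemented; the sketch assumes IPv4 addresses and reuses the check options from the config above.

```python
import ipaddress


def render_server_lines(addresses, port=80):
    """Return haproxy 'server' lines for usable IPv4 addresses only.

    addresses -- iterable of IP strings, possibly containing empty
                 or 0.0.0.0 entries left behind by failed instances
    """
    lines = []
    for ip in addresses:
        try:
            parsed = ipaddress.IPv4Address(ip)
        except ValueError:
            # Empty or malformed address: the instance has no usable IP.
            continue
        if parsed.is_unspecified:
            # 0.0.0.0 is a placeholder, not a reachable backend.
            continue
        index = len(lines) + 1
        lines.append(
            "server server%d %s:%d check inter 30s fall 5 rise 3"
            % (index, ip, port)
        )
    return lines
```

Filtering at render time only hides the symptom on the loadbalancer; the instance list itself still needs to exclude instances that were never created.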
Here's an example in engine.log of things going wrong - note the nova error, then we go ahead and update the LoadBalancer anyway...
2013-06-18 11:50:48.158 2992 ERROR heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.158 2992 TRACE heat.engine.
2013-06-18 11:50:48.206 2992 DEBUG heat.engine.
2013-06-18 11:50:48.217 2992 DEBUG heat.engine.
2013-06-18 11:50:48.233 2992 INFO heat.engine.
Changed in heat:
importance: Undecided → Critical
status: New → Triaged
Changed in heat:
milestone: none → havana-2
Changed in heat:
assignee: nobody → Thomas Herve (therve)
Changed in heat:
importance: Critical → High
Changed in heat:
status: Fix Committed → Fix Released
Changed in heat:
milestone: havana-2 → 2013.2
Assuming this is occurring on a scaling event, rather than stack create/update, we have always suppressed these errors.