backtrace on console 3-5 minutes after HA test completes

Bug #1072940 reported by Angus Salkeld
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Heat
Fix Released
Medium
Steven Hardy
Grizzly
Fix Released
Medium
Steven Hardy

Bug Description

[root@bigiron heat]# ./run_tests.sh -a tag='HA'
Running tests
HaFunctionalTest
test_instance

OK 222.77
Failure
runTest SKIP: unable to import cfn helper, skipping

Ran 2 tests in 223.181s

OK (SKIP=1)
Slowest 1 tests took 222.77 secs:
222.77 test_instance (heat.tests.functional.test_WordPress_Single_Instance_With_HA.HaFunctionalTest)

This seems to work correctly

Then after about 3-5 minutes, engine prints the following backtrace to stdout:

Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/eventlet/greenpool.py", line 80, in _spawn_n_impl
func(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/heat-6-py2.7.egg/heat/engine/parser.py", line 440, in create
self.state_set(stack_status, reason)
File "/usr/lib/python2.7/site-packages/heat-6-py2.7.egg/heat/engine/parser.py", line 402, in state_set
stack.update_and_save({'status': new_status,
AttributeError: 'NoneType' object has no attribute 'update_and_save'

The engine.log then contains:

2012-09-06 13:50:29 DEBUG [heat.manager] Running periodic task EngineManager._periodic_watcher_task
2012-09-06 13:50:55 ERROR [heat.engine.resources] create WaitCondition "WaitCondition"
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/heat-6-py2.7.egg/heat/engine/resources.py", line 249, in create
self.handle_create()
File "/usr/lib/python2.7/site-packages/heat-6-py2.7.egg/heat/engine/wait_condition.py", line 96, in handle_create
meta = handle.metadata
File "/usr/lib/python2.7/site-packages/heat-6-py2.7.egg/heat/engine/resources.py", line 46, in get
rs = db_api.resource_get(resource.stack.context, resource.id)
File "/usr/lib/python2.7/site-packages/heat-6-py2.7.egg/heat/db/api.py", line 65, in resource_get
return IMPL.resource_get(context, resource_id)
File "/usr/lib/python2.7/site-packages/heat-6-py2.7.egg/heat/db/sqlalchemy/api.py", line 65, in resource_get
raise NotFound("resource with id %s not found" % resource_id)
NotFound: resource with id 26 not found
2012-09-06 13:50:55 ERROR [heat.engine.resources] DB error (IntegrityError) (1452, 'Cannot add or update a child row: a foreign key constraint fails (heat.resource, CONSTRAINT resource_ibfk_2 FOREIGN KEY (stack_id) REFERENCES stack (id))') 'INSERT INTO resource (created_at, updated_at, state, name, nova_instance, state_description, rsrc_metadata, stack_id) VALUES (%s, %s, %s, %s, %s, %s, %s, %s)' (datetime.datetime(2012, 9, 6, 20, 50, 55, 653377), None, 'CREATE_FAILED', 'WaitCondition', None, None, '{}', 8L)

shardy:
======
I think this is the same problem described in #261, and is related to #264

The reason we're seeing this in the test is that utils.Instance.wait_for_provisioning doesn't actually check that the stack is CREATE_COMPLETE, only that the instance is provisioned (which due to #264 can mean the test proceeds while the stack is still CREATE_IN_PROGRESS state)

I guess we need to look at the best fix for #261, and I have a fix (or at least improvement) for #264, but to fix this specific issue with the tests, we just need to add another test to ensure the stack is CREATE_COMPLETE before we do the tests, hence avoiding any risk of deleting while still in CREATE_IN_PROGRESS state

Actually, correction to above, utils.Stack.create does check for CREATE_COMPLETE, so I'm guessing this problem only occurs with the IHA test (not the HA one), steve can you confirm that is the case?

I think the problem is as described above, only the waitcondition ends up still in-progress after the IHA instance replacement has happened

Angus Salkeld (asalkeld)
Changed in heat:
status: New → Triaged
importance: Undecided → Medium
Steven Dake (sdake)
Changed in heat:
milestone: none → grizzly-3
Revision history for this message
Steven Hardy (shardy) wrote :

So I'm pretty sure the root cause of this is the waitcondition greenthread didn't get cancelled on stack delete, and there have been several fixes in that area recently (in particular bug 1096150), so I think this is fixed, but doing some testing to confirm, and also to check the behavior if you delete a stack while an IHA replacement is in progress.

Changed in heat:
status: Triaged → In Progress
Revision history for this message
Steven Hardy (shardy) wrote :

So after some testing, it seems that the WaitCondition from the initial stack create will get cancelled now with the fix for bug 1096150, but we also need the fix for bug 1097847 or the WaitCondition for the replacement instance after the IHA HeartbeatFailureAlarm is not cancelled, leading to a similar problem.

Revision history for this message
Steven Hardy (shardy) wrote :

Ok, I've re-tested now both the fixes mentioned in #2 are merged, and I can no longer reproduce this problem (tested with test_WordPress_Single_Instance_With_IHA.py which previously would reliably reproduce the issue).

So closing this bug as fix committed.

Changed in heat:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in heat:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in heat:
milestone: grizzly-3 → 2013.1
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.