Kolla uses dumb-init[1] as PID 1 for service containers. When a container is stopped or restarted, SIGTERM is sent to dumb-init, which forwards it to all children in the root session. So when a service has child processes (workers), they appear to also receive SIGTERM directly and are killed abruptly[2], rather than the parent waiting for them to finish.
[1] https://github.com/openstack/kolla/blob/master/docker/base/Dockerfile.j2#L403
[2] https://github.com/openstack/oslo.service/blob/master/oslo_service/service.py#L623
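The effect can be reproduced outside Kolla with a minimal, POSIX-only Python sketch. Everything in it is hypothetical: the "service" and "worker" roles, NUM_WORKERS, demo(), and the use of SIGUSR1 as the parent's "please stop" request are illustrations, not oslo.service or dumb-init code. The top-level process plays the init role: in "group" mode it signals the service's whole process group (dumb-init's documented default), and in "single" mode it signals only the direct child (what --single-child is documented to do). Synchronization is a crude time.sleep, which is good enough for a demo but not for real code.

```python
import os
import signal
import time

NUM_WORKERS = 4  # hypothetical; mirrors the 4 heat-engine workers in the logs


def run_worker():
    """Hypothetical worker: SIGUSR1 is the parent's 'please stop' request."""
    signal.signal(signal.SIGTERM, signal.SIG_DFL)  # a direct SIGTERM terminates us
    signal.signal(signal.SIGUSR1, lambda signum, frame: os._exit(0))  # clean exit
    while True:
        signal.pause()


def run_service(report_fd):
    """Hypothetical service parent that forks workers and stops them in order."""
    os.setsid()  # own session and process group; the workers inherit it
    workers = []
    for _ in range(NUM_WORKERS):
        pid = os.fork()
        if pid == 0:
            os.close(report_fd)
            run_worker()
        workers.append(pid)

    def on_sigterm(signum, frame):
        time.sleep(0.1)  # stand-in for the parent's own cleanup
        for pid in workers:
            try:
                os.kill(pid, signal.SIGUSR1)  # ask each worker to stop
            except ProcessLookupError:
                pass
        outcomes = []
        for pid in workers:
            _, status = os.waitpid(pid, 0)
            if os.WIFSIGNALED(status):
                outcomes.append("killed by signal %d" % os.WTERMSIG(status))
            else:
                outcomes.append("exited %d" % os.WEXITSTATUS(status))
        os.write(report_fd, ("\n".join(outcomes) + "\n").encode())
        os._exit(0)

    signal.signal(signal.SIGTERM, on_sigterm)
    while True:
        signal.pause()


def demo(mode):
    """Play the init process: 'group' signals the whole process group,
    'single' signals only the direct child. Returns per-worker outcomes."""
    read_fd, write_fd = os.pipe()
    service = os.fork()
    if service == 0:
        os.close(read_fd)
        run_service(write_fd)
    os.close(write_fd)
    time.sleep(0.5)  # crude: let the service fork workers and install handlers
    if mode == "group":
        os.killpg(service, signal.SIGTERM)  # pgid == service pid after setsid()
    else:
        os.kill(service, signal.SIGTERM)
    os.waitpid(service, 0)
    with os.fdopen(read_fd) as report:
        return report.read().splitlines()


if __name__ == "__main__":
    print("group: ", demo("group"))
    print("single:", demo("single"))
```

In "group" mode each worker gets SIGTERM at the same moment as the parent and dies before the parent can stop it, so the parent only reaps "killed by signal 15", much like the first log below. In "single" mode only the parent sees SIGTERM and every worker exits cleanly on the parent's request.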
Here is what I've noticed with the heat-engine container:
1. Stopping/restarting the container
.7/site-packages/heat/engine/service.py:2343
2018-10-23 11:36:19.944 27 DEBUG heat.engine.service [-] Attempting to stop engine service... _stop_rpc_server /usr/lib/python2.7/site-packages/heat/engine/service.py:424
2018-10-23 11:36:19.945 28 DEBUG heat.engine.service [-] Attempting to stop engine service... _stop_rpc_server /usr/lib/python2.7/site-packages/heat/engine/service.py:424
2018-10-23 11:36:19.946 26 DEBUG heat.engine.service [-] Attempting to stop engine service... _stop_rpc_server /usr/lib/python2.7/site-packages/heat/engine/service.py:424
2018-10-23 11:36:19.946 29 DEBUG heat.engine.service [-] Attempting to stop engine service... _stop_rpc_server /usr/lib/python2.7/site-packages/heat/engine/service.py:424
2018-10-23 11:36:19.950 6 INFO oslo_service.service [-] Caught SIGTERM, stopping children
2018-10-23 11:36:19.951 6 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2018-10-23 11:36:19.951 6 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
2018-10-23 11:36:19.951 6 DEBUG oslo_service.service [-] Stop services. stop /usr/lib/python2.7/site-packages/oslo_service/service.py:699
2018-10-23 11:36:19.951 6 INFO heat.engine.service [-] All threads were gone, terminating engine
2018-10-23 11:36:19.952 6 DEBUG oslo_service.service [-] Killing children. stop /usr/lib/python2.7/site-packages/oslo_service/service.py:704
2018-10-23 11:36:19.952 6 INFO oslo_service.service [-] Waiting on 4 children to exit
2018-10-23 11:36:19.981 6 INFO oslo_service.service [-] Child 27 killed by signal 15
2018-10-23 11:36:19.995 6 INFO oslo_service.service [-] Child 29 killed by signal 15
2018-10-23 11:36:20.006 6 INFO oslo_service.service [-] Child 28 killed by signal 15
2018-10-23 11:36:20.009 6 INFO oslo_service.service [-] Child 26 killed by signal 15
2. When SIGTERM (kill) is sent only to the main heat-engine process
.7/site-packages/heat/engine/service.py:2343
2018-10-23 12:00:56.221 6 INFO oslo_service.service [-] Caught SIGTERM, stopping children
2018-10-23 12:00:56.221 6 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2018-10-23 12:00:56.221 6 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
2018-10-23 12:00:56.222 6 DEBUG oslo_service.service [-] Stop services. stop /usr/lib/python2.7/site-packages/oslo_service/service.py:699
2018-10-23 12:00:56.222 6 INFO heat.engine.service [-] All threads were gone, terminating engine
2018-10-23 12:00:56.222 6 DEBUG oslo_service.service [-] Killing children. stop /usr/lib/python2.7/site-packages/oslo_service/service.py:704
2018-10-23 12:00:56.222 6 INFO oslo_service.service [-] Waiting on 4 children to exit
2018-10-23 12:00:56.223 28 DEBUG heat.engine.service [-] Attempting to stop engine service... _stop_rpc_server /usr/lib/python2.7/site-packages/heat/engine/service.py:424
2018-10-23 12:00:56.224 27 DEBUG heat.engine.service [-] Attempting to stop engine service... _stop_rpc_server /usr/lib/python2.7/site-packages/heat/engine/service.py:424
2018-10-23 12:00:56.224 25 DEBUG heat.engine.service [-] Attempting to stop engine service... _stop_rpc_server /usr/lib/python2.7/site-packages/heat/engine/service.py:424
2018-10-23 12:00:56.224 26 DEBUG heat.engine.service [-] Attempting to stop engine service... _stop_rpc_server /usr/lib/python2.7/site-packages/heat/engine/service.py:424
2018-10-23 12:01:03.638 25 INFO heat.engine.service [-] Engine service is stopped successfully
2018-10-23 12:01:03.638 25 DEBUG heat.engine.service [-] Attempting to stop engine listener... stop /usr/lib/python2.7/site-packages/heat/engine/service.py:306
2018-10-23 12:01:08.626 25 INFO heat.engine.service [-] Engine listener is stopped successfully
2018-10-23 12:01:08.627 25 INFO heat.engine.worker [-] Stopping engine_worker in engine e0a6bf2e-99d3-461d-abc6-ca44b9e1a2ef.
2018-10-23 12:01:08.678 25 INFO heat.engine.service [-] Waiting stack None processing to be finished
2018-10-23 12:01:08.722 25 INFO heat.engine.service [-] Stack None processing was finished
2018-10-23 12:01:08.732 27 INFO heat.engine.service [-] Engine service is stopped successfully
2018-10-23 12:01:08.732 25 INFO heat.engine.service [req-ff93becb-f183-44b7-b3f5-b8dfa47526d0 - - - - -] Service cafb13f5-241e-49dc-9830-3bb175286474 is deleted
2018-10-23 12:01:08.732 27 DEBUG heat.engine.service [-] Attempting to stop engine listener... stop /usr/lib/python2.7/site-packages/heat/engine/service.py:306
2018-10-23 12:01:08.732 25 INFO heat.engine.service [req-ff93becb-f183-44b7-b3f5-b8dfa47526d0 - - - - -] All threads were gone, terminating engine
2018-10-23 12:01:08.733 25 DEBUG oslo_concurrency.lockutils [-] Acquired semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:212
2018-10-23 12:01:08.733 25 DEBUG oslo_concurrency.lockutils [-] Releasing semaphore "singleton_lock" lock /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:228
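To compare the two runs per worker, a small hypothetical helper can pull out which PIDs logged "Engine service is stopped successfully" and which were reported by the parent as "Child N killed by signal S". The regular expressions are assumptions derived only from the log lines quoted above, not from any documented heat or oslo.service log format.

```python
import re

# Patterns assumed from the log lines quoted in this report only.
KILLED = re.compile(r"oslo_service\.service \[-\] Child (\d+) killed by signal (\d+)")
STOPPED = re.compile(
    r"^\S+ \S+ (\d+) INFO heat\.engine\.service .*Engine service is stopped successfully"
)


def shutdown_outcomes(log_text):
    """Map each worker PID to how its shutdown ended, according to the log."""
    outcomes = {}
    for line in log_text.splitlines():
        killed = KILLED.search(line)
        if killed:
            # The parent reporting a child killed by a signal overrides anything else.
            outcomes[int(killed.group(1))] = "killed by signal " + killed.group(2)
            continue
        stopped = STOPPED.search(line)
        if stopped:
            # Do not overwrite a "killed" result recorded earlier for the same PID.
            outcomes.setdefault(int(stopped.group(1)), "stopped gracefully")
    return outcomes
```

Fed the four "Child N killed by signal 15" lines from the first run, it maps PIDs 26 to 29 to "killed by signal 15"; fed the second run, it reports PIDs 25 and 27 as "stopped gracefully" within the quoted excerpt.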
I'm not sure whether running dumb-init with --single-child would help, whether we have to do some signal rewriting, or whether something needs to be fixed in oslo.service.
There is an experiment in kolla at https://review.openstack.org/#/c/612887/, but I'm not sure whether it would actually fix the issue.