I have a theory that the threading used in swift-object-replicator may not be waiting for the rsync processes to terminate properly when they are in the non-blocking I/O run state, as is often the case in the environment where we are having issues. I believe the cause is that many writes arriving from other nodes over the inbound rsync connections are causing a denial of service to the outbound rsync traffic at the disk controller. I think the "killing coros" section of code may need some improvement to ensure the rsyncs are culled before they return. We are ending up with a lot of zombie rsync processes under the swift-object-replicator parent.
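To illustrate the kind of change I have in mind, here is a minimal sketch of terminating a child process and then explicitly waiting on it so the parent reaps it. This is not the replicator's actual code: it uses the standard library subprocess module rather than the replicator's own threading, and the names kill_and_reap, spawn_sleeper and the grace parameter are hypothetical. A sleeping Python child stands in for rsync.

```python
import subprocess
import sys


def kill_and_reap(proc, grace=5.0):
    """Terminate a child and wait on it so it cannot linger as a zombie.

    Hypothetical helper for illustration; not taken from the Swift source.
    """
    if proc.poll() is None:           # child is still running
        proc.terminate()              # polite request first (SIGTERM on POSIX)
        try:
            proc.wait(timeout=grace)  # wait() is what reaps the child
        except subprocess.TimeoutExpired:
            proc.kill()               # escalate (SIGKILL on POSIX)
            proc.wait()               # reap; may block while the child is stuck in I/O
    return proc.returncode


def spawn_sleeper():
    # Stand-in for an rsync child: a process that just sleeps for a minute.
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )


if __name__ == "__main__":
    child = spawn_sleeper()
    print("exit status:", kill_and_reap(child))
```

The point is that sending a signal is not enough on its own; the parent has to wait on the child afterwards or the exited process stays in the table as a zombie. Note that a process stuck in uninterruptible disk I/O will not act on the kill until that I/O completes, so the final wait can block for a while in an environment like ours.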