Comment 0 for bug 1844697

Sorin Sbarnea (ssbarnea) wrote : periodic: container build job can fail during push

```
ERROR:kolla.common.utils.glance-api:Unknown error when pushing
2019-09-19 01:13:00 | Traceback (most recent call last):
2019-09-19 01:13:00 | File "/home/zuul/workspace/venv_build/lib/python2.7/site-packages/kolla/image/build.py", line 309, in run
2019-09-19 01:13:00 | self.push_image(image)
2019-09-19 01:13:00 | File "/home/zuul/workspace/venv_build/lib/python2.7/site-packages/kolla/image/build.py", line 335, in push_image
2019-09-19 01:13:00 | for response in self.dc.push(image.canonical_name, **kwargs):
2019-09-19 01:13:00 | File "/usr/lib/python2.7/site-packages/docker/api/client.py", line 334, in _stream_helper
2019-09-19 01:13:00 | for chunk in json_stream(self._stream_helper(response, False)):
2019-09-19 01:13:00 | File "/usr/lib/python2.7/site-packages/docker/utils/json_stream.py", line 66, in split_buffer
2019-09-19 01:13:00 | for data in stream_as_text(stream):
2019-09-19 01:13:00 | File "/usr/lib/python2.7/site-packages/docker/utils/json_stream.py", line 22, in stream_as_text
2019-09-19 01:13:00 | for data in stream:
2019-09-19 01:13:00 | File "/usr/lib/python2.7/site-packages/docker/api/client.py", line 340, in _stream_helper
2019-09-19 01:13:00 | data = reader.read(1)
2019-09-19 01:13:00 | File "/usr/lib/python2.7/site-packages/urllib3/response.py", line 459, in read
2019-09-19 01:13:00 | raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
2019-09-19 01:13:00 | File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2019-09-19 01:13:00 | self.gen.throw(type, value, traceback)
2019-09-19 01:13:00 | File "/usr/lib/python2.7/site-packages/urllib3/response.py", line 365, in _error_catcher
2019-09-19 01:13:00 | raise ReadTimeoutError(self._pool, None, 'Read timed out.')
2019-09-19 01:13:00 | ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.
```

http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-centos-7-master-containers-build-push/51d361e/logs/build.log.txt.gz

Apparently the code does not have any retry mechanism. I think it should retry at least 3 times within 10 minutes before failing, so that we avoid failing the entire job just because an external service was restarted or network connectivity is a bit flaky.

The retry should be implemented around https://github.com/openstack/kolla/blob/master/kolla/image/build.py#L305-L324
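
To make the proposal concrete, here is a minimal sketch of the kind of retry wrapper I have in mind. It is not kolla code: `retry_call` and all of its parameters are hypothetical names chosen for illustration, and it uses Python 3's `time.monotonic`, whereas the job in the log above runs on Python 2.7.

```python
import time


def retry_call(func, attempts=3, deadline=600.0, delay=5.0,
               retry_on=(Exception,), sleep=time.sleep,
               clock=time.monotonic):
    """Call func() until it succeeds or the retry budget is used up.

    Hypothetical helper, not part of kolla. It gives up when either
    `attempts` calls have failed or another wait of `delay` seconds
    would go past `deadline` seconds since the first call.
    """
    start = clock()
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            # Would sleeping again take us beyond the overall time budget?
            out_of_time = (clock() - start) + delay > deadline
            if attempt == attempts or out_of_time:
                raise  # surface the last error to the caller
            sleep(delay)
```

Something like `retry_call(lambda: self.push_image(image), retry_on=(ReadTimeoutError,))` around the push call in `run` might then be enough, assuming the urllib3 `ReadTimeoutError` seen in the traceback is the exception we want to retry on. Restricting `retry_on` to transient errors keeps genuine failures (for example bad credentials) from being retried pointlessly.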