Apparently the code does not have any retry mechanism. I think it should retry at least 3 times within 10 minutes before failing, so we can avoid failing the entire job just because an external service was restarted or the network connectivity is a bit flaky.
```
ERROR:kolla.common.utils.glance-api:Unknown error when pushing
2019-09-19 01:13:00 | Traceback (most recent call last):
2019-09-19 01:13:00 |   File "/home/zuul/workspace/venv_build/lib/python2.7/site-packages/kolla/image/build.py", line 309, in run
2019-09-19 01:13:00 |     self.push_image(image)
2019-09-19 01:13:00 |   File "/home/zuul/workspace/venv_build/lib/python2.7/site-packages/kolla/image/build.py", line 335, in push_image
2019-09-19 01:13:00 |     for response in self.dc.push(image.canonical_name, **kwargs):
2019-09-19 01:13:00 |   File "/usr/lib/python2.7/site-packages/docker/api/client.py", line 334, in _stream_helper
2019-09-19 01:13:00 |     for chunk in json_stream(self._stream_helper(response, False)):
2019-09-19 01:13:00 |   File "/usr/lib/python2.7/site-packages/docker/utils/json_stream.py", line 66, in split_buffer
2019-09-19 01:13:00 |     for data in stream_as_text(stream):
2019-09-19 01:13:00 |   File "/usr/lib/python2.7/site-packages/docker/utils/json_stream.py", line 22, in stream_as_text
2019-09-19 01:13:00 |     for data in stream:
2019-09-19 01:13:00 |   File "/usr/lib/python2.7/site-packages/docker/api/client.py", line 340, in _stream_helper
2019-09-19 01:13:00 |     data = reader.read(1)
2019-09-19 01:13:00 |   File "/usr/lib/python2.7/site-packages/urllib3/response.py", line 459, in read
2019-09-19 01:13:00 |     raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
2019-09-19 01:13:00 |   File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
2019-09-19 01:13:00 |     self.gen.throw(type, value, traceback)
2019-09-19 01:13:00 |   File "/usr/lib/python2.7/site-packages/urllib3/response.py", line 365, in _error_catcher
2019-09-19 01:13:00 |     raise ReadTimeoutError(self._pool, None, 'Read timed out.')
2019-09-19 01:13:00 | ReadTimeoutError: UnixHTTPConnectionPool(host='localhost', port=None): Read timed out.
```
http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-centos-7-master-containers-build-push/51d361e/logs/build.log.txt.gz
The retry should be implemented around https://github.com/openstack/kolla/blob/master/kolla/image/build.py#L305-L324
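To make the proposal concrete, here is a rough sketch of what "at least 3 attempts within 10 minutes" could look like. This is not kolla code: `retry_call` and all of its parameters are hypothetical names picked for illustration, and the 30 second delay is an arbitrary assumption. It uses `time.monotonic`, which needs Python 3.3+; on the Python 2.7 environment from the traceback, `time.time` would have to be passed in instead.

```python
import time


def retry_call(func, attempts=3, deadline_seconds=600, delay_seconds=30,
               retry_on=(Exception,), sleep=time.sleep, clock=time.monotonic):
    """Hypothetical helper: call func() until it succeeds or we give up.

    Gives up (re-raising the last error) when `attempts` calls have failed,
    or when waiting for another attempt would overrun `deadline_seconds`.
    `sleep` and `clock` are injectable so the helper can be tested quickly.
    """
    start = clock()
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            elapsed = clock() - start
            out_of_attempts = attempt == attempts
            out_of_time = elapsed + delay_seconds > deadline_seconds
            if out_of_attempts or out_of_time:
                # Re-raise the original error so the job still fails loudly.
                raise
            sleep(delay_seconds)
```

One detail from the traceback matters here: the `ReadTimeoutError` is raised while iterating over the response stream of `self.dc.push(...)`, not when the call is made. So whatever gets wrapped has to include the whole loop that consumes the stream, for example a small function that performs the push and iterates over the responses. Passing a narrower `retry_on` (presumably urllib3's `ReadTimeoutError` and similar connection errors) would avoid retrying on failures that are not transient.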