Subcloud add failure due to bootstrap replay failure

Bug #2039863 reported by Rupanjan Chakraborty
Affects: StarlingX
Status: In Progress
Importance: Undecided
Assigned to: Rupanjan Chakraborty
Milestone: (none)

Bug Description

Brief Description

Many subclouds in the scale lab could not be re-added due to a bootstrap replay failure.

Severity

Major

Steps to Reproduce

To check whether this is reproducible in DC1000-2, we had to apply a change and restart the dcmanager service. As a result, all subclouds that were in the middle of bootstrap failed.

Re-add the 250 subclouds.

Expected Behavior

Subclouds can be re-added without issues.

Actual Behavior

Some subclouds failed to bootstrap with the following error:

      Failed to provision the initial system config.
      Traceback (most recent call last):
        File "/tmp/.ansible-sysadmin/tmp/ansible-tmp-1694037682.4409835-4193870-129384925261196/populate_initial_config.py", line 1326, in <module>
          populate_service_parameter_config(client)
        File "/tmp/.ansible-sysadmin/tmp/ansible-tmp-1694037682.4409835-4193870-129384925261196/populate_initial_config.py", line 1046, in populate_service_parameter_config
          populate_docker_kube_config(client)
        File "/tmp/.ansible-sysadmin/tmp/ansible-tmp-1694037682.4409835-4193870-129384925261196/populate_initial_config.py", line 838, in populate_docker_kube_config
          client.sysinv.service_parameter.delete(parameter.uuid)
        File "/usr/lib/python3/dist-packages/cgtsclient/v1/service_parameter.py", line 45, in delete
          return self._delete(self._path(parameter_id))
        File "/usr/lib/python3/dist-packages/cgtsclient/common/base.py", line 95, in _delete
          self.api.raw_request('DELETE', url)
        File "/usr/lib/python3/dist-packages/cgtsclient/common/http.py", line 224, in raw_request
          return self._http_request(url, method, **kwargs)
        File "/usr/lib/python3/dist-packages/cgtsclient/common/http.py", line 186, in _http_request
          raise exceptions.from_response(
      cgtsclient.exc.HTTPBadRequest: Failure deleting configmap: Kubernetes is not configured. API operations will not be available.
    stdout_lines:

An example of these failed subclouds is subcloud104. The Ansible bootstrap was terminated in the middle of the following task, which should not affect the bootstrap replay.

TASK [bootstrap/bringup-essential-services : Check controller-0 is in online state] ***
Wednesday 06 September 2023 21:31:46 +0000 (0:00:01.269) 0:36:06.602 ***
FAILED - RETRYING: Check controller-0 is in online state (15 retries left).
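
The traceback above shows the replay failing inside populate_docker_kube_config, where client.sysinv.service_parameter.delete(parameter.uuid) raises HTTPBadRequest because Kubernetes is not yet configured on the subcloud. As an illustration only, the sketch below shows one hypothetical way such a delete could be made tolerant of that specific error. It is not the fix proposed to ansible-playbooks, and whether skipping the delete is safe is not established by this report. The HTTPBadRequest class here is a local stand-in for cgtsclient.exc.HTTPBadRequest, and delete_parameter_tolerant is a made-up helper name.

```python
class HTTPBadRequest(Exception):
    """Local stand-in for cgtsclient.exc.HTTPBadRequest (assumption:
    defined here only so the sketch runs without cgtsclient installed)."""


# Substring taken from the error message in the traceback.
K8S_NOT_CONFIGURED = "Kubernetes is not configured"


def delete_parameter_tolerant(delete_fn, parameter_uuid):
    """Hypothetical helper: call delete_fn(parameter_uuid) and tolerate
    only the 'Kubernetes is not configured' failure.

    Returns True if the delete succeeded, False if it was skipped.
    Any other HTTPBadRequest is re-raised unchanged.
    """
    try:
        delete_fn(parameter_uuid)
    except HTTPBadRequest as exc:
        if K8S_NOT_CONFIGURED in str(exc):
            # Assumption: on a replay, Kubernetes may not be up yet, so
            # the configmap cleanup cannot run; skip instead of aborting.
            return False
        raise
    return True
```

In the replayed playbook this would wrap the existing call, for example delete_parameter_tolerant(client.sysinv.service_parameter.delete, parameter.uuid). Matching on the message text is fragile, so a real change would more likely address why the delete is attempted before Kubernetes is configured.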

Reproducibility

100% reproducible

System Configuration

Distributed Cloud

Alarms

N/A

Test Activity

Developer Testing

Workaround

AWS subcloud: Delete the VM, redeploy the VM, delete the subcloud, and re-add it.

Hardware subcloud: Delete the subcloud and re-add it with the reinstall option.

Changed in starlingx:
assignee: nobody → Rupanjan Chakraborty (rchakrab)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress