Stx-openstack apply-fail after swact standby controller, lock, unlock standby controller
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Critical | Gustavo Santos |
Bug Description
Bug Description
Brief Description
-----------------
Stx-openstack goes to apply-failed after swacting to the standby controller and then locking and unlocking the (new) standby controller. This is visible on Standard and Standard-EXT configurations on baremetal.
Severity
--------
<Critical: System/Feature is not usable due to the defect>
Steps to Reproduce
------------------
Swact standby controller
Lock standby controller
Unlock standby controller
Check: system application-list | grep openstack
Normally the application should be in the "applied" state, but it ends up "apply-failed".
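The status check in the steps above can be scripted. A minimal sketch, assuming a hypothetical pipe-delimited table layout for the `system application-list` output (the real CLI columns may differ):

```python
def openstack_apply_status(listing: str) -> str:
    """Return the status column for stx-openstack from the table printed
    by `system application-list`.
    Assumed columns: application | version | manifest | manifest file | status | progress
    """
    for line in listing.splitlines():
        cols = [c.strip() for c in line.split("|")]
        cols = [c for c in cols if c]          # drop empty edge cells
        if cols and cols[0] == "stx-openstack":
            return cols[4]                     # the status column
    return "not-found"
```

A sanity/reproduction script could poll this until the status settles on "applied" or "apply-failed".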
Expected Behavior
------------------
stx-openstack should apply fine, without any error
Actual Behavior
----------------
stx-openstack apply fails
Reproducibility
---------------
reproduced 3 times in a row.
System Configuration
-------
Multi-node system, dedicated storage, on baremetal
Branch/Pull Time/Commit
-------
master
Last Pass
---------
20210226T024233Z
Timestamp/Logs
--------------
will be attached
Test Activity
--------------
Sanity
Workaround
----------
-
CVE References
Alexandru Dimofte (adimofte) wrote : | #1 |
Dan Voiculeasa (dvoicule) wrote : | #2 |
Changed in starlingx: | |
importance: | Undecided → Critical |
status: | New → Triaged |
tags: | added: stx.5.0 stx.apps |
Ghada Khalil (gkhalil) wrote : | #3 |
stx.5.0 / critical - sanity issue introduced by recent commit
Bob Church (rchurch) wrote : | #4 |
- LP1917308.txt (40.6 KiB, text/plain)
Attaching key logs from this occurrence.
But I don't see any evidence of a network disconnect that would cause this. Right before this happens, controller-1 has just come online and finished DRBD syncing. It's possible we have a stale TCP connection to the helm postgres DB in the tiller container. The postgres logs report nothing unusual. Maybe we are running out of connections? Looking at the tiller process running in the container, there are quite a few threads running. I'm not sure if this is normal behavior.
I could not reproduce this in my local setup
Bob Church (rchurch) wrote : | #5 |
It is possible that the 5 second timeout on the tiller command is not long enough given the current responsiveness of the system:
(truncated log excerpts)
and that the command is being prematurely terminated before it could complete.
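The suspected failure mode above can be sketched as a wrapper that distinguishes "the command timed out" from "the command failed", so a caller can retry with a longer deadline instead of declaring an apply failure. This is an illustrative sketch, not the sysinv code:

```python
import subprocess

def run_with_timeout(cmd, timeout_s):
    """Run a CLI command under a deadline.
    Returns ("ok" | "error" | "timeout", stdout) so a timeout is a
    distinct, retryable outcome rather than a hard failure."""
    try:
        res = subprocess.run(cmd, capture_output=True, text=True,
                             timeout=timeout_s)
        return ("ok" if res.returncode == 0 else "error", res.stdout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child when the deadline expires
        return ("timeout", "")
```

With a fixed 5s deadline on a busy system, the tiller command lands in the "timeout" branch even though it would eventually have succeeded.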
tags: | added: stx.containers |
chen haochuan (martin1982) wrote : | #6 |
(truncated log excerpts from the containers logs)
chen haochuan (martin1982) wrote : | #7 |
I could reproduce this issue.
Deploy a duplex system with the latest ISO; stx-openstack applies successfully.
on controller-0
$ system host-swact 1
on controller-1
$ system host-swact 2
then on controller-0(active controller)
$ system application-apply stx-openstack
Application apply fail
chen haochuan (martin1982) wrote : | #8 |
On the system where I reproduced this:
./sysinv.
And postgres listens on 192.188.204.2:5432:
[sysadmin@
tcp 0 0 0.0.0.0:5432 0.0.0.0:* LISTEN 1357087/postgres
tcp6 0 0 :::5432 :::* LISTEN 1357087/postgres
[sysadmin@
nfsnobo+ 130497 0.1 0.4 213352 126384 ? Ssl 03:35 0:57 /tiller --storage=sql --sql-dialect=
postgres 1357087 0.0 0.1 312924 34880 ? S< 14:17 0:00 /usr/bin/postgres -D /var/lib/
And the cluster IP "172.16.192.72" is the armada-api pod's address.
./pods/
[sysadmin@
armada-
So after the swact, tiller in the armada-api pod could not access the postgres service.
chen haochuan (martin1982) wrote : | #9 |
After waiting for a while, it recovers.
Changed in starlingx: | |
assignee: | nobody → Gustavo Santos (gooshtavow) |
Gustavo Santos (gooshtavow) wrote : | #10 |
The armada-api pod, which runs helm 2, goes up with the following command when starting the tiller container:
tiller --storage=sql --sql-dialect=
Where 192.168.204.1 is the active controller's floating IP address. This creates a socket connecting the pod to the currently active controller. After performing a swact, this socket becomes invalid, because it still points to the now inactive controller, and that is why the broken pipe error happens.
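The broken-pipe recovery pattern this implies can be sketched as follows; `connect` and `query` are stand-ins for the real DB client calls, not StarlingX functions:

```python
def run_query(connect, query, max_retries=1):
    """Run `query` over a connection from `connect()`.
    On a broken pipe (a stale socket still bound to the previously
    active controller), re-establish the connection and retry: the new
    socket resolves the floating IP to the now-active controller."""
    conn = connect()
    for attempt in range(max_retries + 1):
        try:
            return query(conn)
        except (BrokenPipeError, ConnectionResetError):
            if attempt == max_retries:
                raise
            conn = connect()  # fresh socket to the current active controller
```

Without the reconnect step, every call after the swact keeps hitting the dead socket and fails exactly as described above.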
Yvonne Ding (yding) wrote : | #11 |
The issue can be reproduced on AIO-SX after lock/unlock controller with "20210401T032802Z" load.
2021-04-01 17:11:01.794 95532 ERROR sysinv. (truncated traceback)
Gustavo Santos (gooshtavow) wrote : | #12 |
A code review with a possible fix has been opened for this bug: https:/
Changed in starlingx: | |
status: | Triaged → Fix Released |
Alexandru Dimofte (adimofte) wrote : | #13 |
I checked again today(20210408T
sysinv 2021-04-08 17:40:06.274 920503 INFO sysinv.helm.utils [-] Caught HelmTillerFailure exception. Retrying... Exception: Helm operation failure: Failed to obtain pending charts list: Helm operation failure: Error: write tcp 172.16.
command terminated with exit code 1
sysinv 2021-04-08 17:40:06.691 920503 INFO sysinv.helm.utils [-] Caught HelmTillerFailure exception. Retrying... Exception: Helm operation failure: Failed to obtain pending charts list: Helm operation failure: Error: write tcp 172.16.
command terminated with exit code 1
sysinv 2021-04-08 17:40:06.692 920503 ERROR sysinv.
command terminated with exit code 1
: HelmTillerFailure: Helm operation failure: Failed to obtain pending charts list: Helm operation failure: Error: write tcp 172.16.
command terminated with exit code 1
2021-04-08 17:40:06.692 920503 ERROR sysinv. (truncated traceback)
Gustavo Santos (gooshtavow) wrote : | #14 |
Alexandru, can you provide a little more information about the system you've tested this on and if you got the error more than once? I wasn't able to reproduce the issue in several attempts on two different systems and I'm wondering why you're still getting the error.
Alexandru Dimofte (adimofte) wrote : | #15 |
Today I checked again if this issue is still there and I tested using a baremetal Standard configuration.
The steps were:
system host-swact controller-0
ssh controller-1
system host-lock controller-0
system host-unlock controller-0
watch system application-list (in 5-6 minutes stx-openstack will try a reapply but will fail)
Alexandru Dimofte (adimofte) wrote : | #16 |
- Added collected logs from today, baremetal standard configuration (119.9 MiB, application/x-tar)
Alexandru Dimofte (adimofte) wrote : | #17 |
I manually re-tested this bug today on baremetal: Duplex, Standard, and Standard External. I reproduced it on Standard External only.
Alexandru Dimofte (adimofte) wrote : | #18 |
Gustavo Santos (gooshtavow) wrote : | #19 |
Alexandru, I have opened a code review (https:/
Ghada Khalil (gkhalil) wrote : | #20 |
Re-opening as there seems to be more code reviews required to address this issue.
Once a fix is merged in stx master, it will also need to be cherrypicked to the r/stx.5.0 release.
Changed in starlingx: | |
status: | Fix Released → In Progress |
Ghada Khalil (gkhalil) wrote : | #21 |
Note:
This seems to be a generic issue with the containerized application framework after a swact.
https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master) | #22 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit ad8567f06485a10
Author: Gustavo Santos <email address hidden>
Date: Tue Apr 13 16:09:21 2021 -0300
Restart tiller on openstack pending install check
This is another attempt at fixing the same bug as the merged review
https:/
there were reports indicating that the bug would still occur on certain
setups.
This patch explicitly forces a tiller restart when catching the first
HelmTillerFailure, instead of
only trying to rerun the 'helm list' command, which was believed to be
a reliable workaround to the problem, but didn't solve it in every
possible scenario.
Closes-Bug: #1917308
Signed-off-by: Gustavo Santos <email address hidden>
Change-Id: I38667609173ca5
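The restart-then-retry behavior described in the commit message can be sketched like this; the helper names are illustrative, not the actual sysinv code:

```python
class HelmTillerFailure(Exception):
    """Stand-in for the sysinv HelmTillerFailure exception."""

def pending_charts(list_charts, restart_tiller, retries=2):
    """Fetch the pending charts list.
    On a HelmTillerFailure, restart tiller before retrying, instead of
    only re-running the same command against a possibly stale
    connection (the earlier workaround that did not cover every case)."""
    for attempt in range(retries + 1):
        try:
            return list_charts()
        except HelmTillerFailure:
            if attempt == retries:
                raise
            restart_tiller()  # forces a fresh connection to postgres
```

The key difference from the first fix is the explicit `restart_tiller()` on the failure path, so the retry runs against a freshly established backend connection.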
Changed in starlingx: | |
status: | In Progress → Fix Released |
Ghada Khalil (gkhalil) wrote : | #23 |
@Gustavo, please cherrypick your changes to the r/stx.5.0 release asap.
tags: | added: stx.cherrypickneeded |
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (r/stx.5.0) | #24 |
Fix proposed to branch: r/stx.5.0
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (r/stx.5.0) | #25 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: r/stx.5.0
commit 70df83f1f949f76
Author: Gustavo Santos <email address hidden>
Date: Tue Apr 13 16:09:21 2021 -0300
Restart tiller on openstack pending install check
This is another attempt at fixing the same bug as the merged review
https:/
there were reports indicating that the bug would still occur on certain
setups.
This patch explicitly forces a tiller restart when catching the first
HelmTillerFailure, instead of
only trying to rerun the 'helm list' command, which was believed to be
a reliable workaround to the problem, but didn't solve it in every
possible scenario.
Closes-Bug: #1917308
Signed-off-by: Gustavo Santos <email address hidden>
Change-Id: I38667609173ca5
(cherry picked from commit ad8567f06485a10
tags: |
added: in-r-stx50 removed: stx.cherrypickneeded |
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to integ (master) | #26 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ansible-playbooks (master) | #27 |
Related fix proposed to branch: master
Review: https:/
Angie Wang (angiewang) wrote : | #28 |
Just a note: helm uses the sqlx package to establish the connection to the postgres backend, and sqlx uses the Golang postgres driver. The "broken pipe" issue is an issue in the Golang postgres driver - https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to integ (master) | #29 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit b3540ccfdfa6956
Author: Robert Church <email address hidden>
Date: Wed May 12 22:36:23 2021 -0400
Update the liveness probe to verify postgres connectivity
Change the tillerLivenessP
postgres backend. We will override the periodSeconds and
failureThre
the tiller pod over a swact when the postgres DB/server moves from one
controller to the other.
This will help guarantee that the tiller connection is always
re-established if the connectivity to the postgres backend fails.
Change-Id: I7fbed33a8c821f
Related-Bug: #1917308
Signed-off-by: Robert Church <email address hidden>
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ansible-playbooks (master) | #30 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit d5460198dc0310a
Author: Robert Church <email address hidden>
Date: Wed May 12 22:45:38 2021 -0400
Adjust armada's tiller container liveness probe
With the liveness probe update in the armada helm chart to test the
connectivity to the postgres backend, adjust the periodSeconds and
failureThre
postgres switching from one controller to another.
Reviewing logs from various H/W labs it appears that average postgres
swact time ranges from 9s-20s, with the mean ~15s.
Times can be observed with:
(truncated log timestamps)
Set the periodSeconds to 4 and the failureThreshold to 2 so that if the
postgres server is not accessible, the tiller container will be
restarted within the 9s minimum swact time. This will ensure that the
next time tiller is required by Armada or used by the helmv2-cli that
the connection to postgres backend has been re-established.
Change-Id: I7454a737771d9a
Depends-On: https:/
Related-Bug: #1917308
Signed-off-by: Robert Church <email address hidden>
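The probe-timing arithmetic behind the chosen values can be checked directly. This is a worked sketch of the commit's reasoning (kubelet declares the container unhealthy after failureThreshold consecutive failed probes, one per periodSeconds), not code from the repo:

```python
# Values from the commit: probe every 4s, restart after 2 consecutive failures.
PERIOD_SECONDS = 4
FAILURE_THRESHOLD = 2

def worst_case_detection_s(period, threshold):
    """Seconds until kubelet restarts the container once the postgres
    backend becomes unreachable: one failed probe per period, times the
    number of consecutive failures required."""
    return period * threshold

# 4 * 2 = 8s, inside the 9s minimum observed swact time, so tiller is
# restarted before anything can use its stale connection; a single
# transient probe failure does not trigger a restart.
assert worst_case_detection_s(PERIOD_SECONDS, FAILURE_THRESHOLD) < 9
```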
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to integ (master) | #31 |
Related fix proposed to branch: master
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to integ (master) | #32 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: master
commit 4e1aa82e96d9b4c
Author: Robert Church <email address hidden>
Date: Sat May 15 16:24:29 2021 -0400
Update postgres liveness check to support IPv6 addresses
Templating will add square brackets for IPv6 addresses which are
interpreted as an array vs. a string. Quote this so that it is interpreted
correctly.
Change-Id: I2b705015a74ea2
Related-Bug: #1917308
Signed-off-by: Robert Church <email address hidden>
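The IPv6 formatting issue the commit fixes can be illustrated with a small helper (illustrative only; the actual fix is quoting in the Helm chart template):

```python
import ipaddress

def pg_endpoint(host: str, port: int) -> str:
    """Format host:port for a connection check. IPv6 literals need
    square brackets; when such a value is templated into YAML unquoted,
    the brackets parse as a flow sequence (array) instead of a string,
    which is exactly the bug being fixed."""
    if ipaddress.ip_address(host).version == 6:
        return f"[{host}]:{port}"
    return f"{host}:{port}"
```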
Ghada Khalil (gkhalil) wrote : | #33 |
The additional commits above will need to be merged in the r/stx.5.0 branch
tags: |
added: stx.cherrypickneeded removed: in-r-stx50 |
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ansible-playbooks (r/stx.5.0) | #34 |
Related fix proposed to branch: r/stx.5.0
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to integ (r/stx.5.0) | #35 |
Related fix proposed to branch: r/stx.5.0
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to integ (r/stx.5.0) | #36 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: r/stx.5.0
commit 106331ecec1a77f
Author: Robert Church <email address hidden>
Date: Wed May 12 22:36:23 2021 -0400
Update the liveness probe to verify postgres connectivity
Change the tillerLivenessP
postgres backend. We will override the periodSeconds and
failureThre
the tiller pod over a swact when the postgres DB/server moves from one
controller to the other.
This will help guarantee that the tiller connection is always
re-established if the connectivity to the postgres backend fails.
Change-Id: I7fbed33a8c821f
Related-Bug: #1917308
Signed-off-by: Robert Church <email address hidden>
(cherry picked from commit b3540ccfdfa6956
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to integ (r/stx.5.0) | #37 |
Related fix proposed to branch: r/stx.5.0
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ansible-playbooks (r/stx.5.0) | #38 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: r/stx.5.0
commit 4555715323b2561
Author: Robert Church <email address hidden>
Date: Wed May 12 22:45:38 2021 -0400
Adjust armada's tiller container liveness probe
With the liveness probe update in the armada helm chart to test the
connectivity to the postgres backend, adjust the periodSeconds and
failureThre
postgres switching from one controller to another.
Reviewing logs from various H/W labs it appears that average postgres
swact time ranges from 9s-20s, with the mean ~15s.
Times can be observed with:
2021-
2021-
Set the periodSeconds to 4 and the failureThreshold to 2 so that if the
postgres server is not accessible, the tiller container will be
restarted within the 9s minimum swact time. This will ensure that the
next time tiller is required by Armada or used by the helmv2-cli that
the connection to postgres backend has been re-established.
Change-Id: I7454a737771d9a
Depends-On: https:/
Related-Bug: #1917308
Signed-off-by: Robert Church <email address hidden>
(cherry picked from commit d5460198dc0310a
OpenStack Infra (hudson-openstack) wrote : Related fix merged to integ (r/stx.5.0) | #39 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: r/stx.5.0
commit 821de96615cb6f9
Author: Robert Church <email address hidden>
Date: Sat May 15 16:24:29 2021 -0400
Update postgres liveness check to support IPv6 addresses
Templating will add square brackets for IPv6 addresses which are
interpreted as an array vs. a string. Quote this so that it is interpreted
correctly.
Change-Id: I2b705015a74ea2
Related-Bug: #1917308
Signed-off-by: Robert Church <email address hidden>
(cherry picked from commit 4e1aa82e96d9b4c
Ghada Khalil (gkhalil) wrote : | #40 |
Adding in-r-stx50 as the latest commits have been merged in the r/stx.5.0 release branch
tags: |
added: in-r-stx50 removed: stx.cherrypickneeded |
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ansible-playbooks (f/centos8) | #41 |
Related fix proposed to branch: f/centos8
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8) | #42 |
Fix proposed to branch: f/centos8
Review: https:/
OpenStack Infra (hudson-openstack) wrote : | #43 |
Fix proposed to branch: f/centos8
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to integ (f/centos8) | #44 |
Related fix proposed to branch: f/centos8
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ansible-playbooks (f/centos8) | #45 |
Related fix proposed to branch: f/centos8
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ansible-playbooks (f/centos8) | #46 |
Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ansible-playbooks (f/centos8) | #47 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: f/centos8
commit 4e96b762f549aad
Author: Mihnea Saracin <email address hidden>
Date: Sat May 22 15:48:19 2021 +0000
Revert "Restore host filesystems with collected sizes"
This reverts commit 255488739efa4ac
Reason for revert: Did a rework to fix https:/
Change-Id: Iea79701a874eff
Depends-On: I55ae6954d24ba3
commit c064aacc377c8bd
Author: Angie Wang <email address hidden>
Date: Fri May 21 21:28:02 2021 -0400
Ensure apiserver keys are present before extract from tarball
This is to fix the upgrade playbook issue that happens during
AIO-SX upgrade from stx4.0 to stx5.0 which introduced by
https:/
The apiserver keys are not available in stx4.0 side so we need
to ensure the keys under /etc/kubernetes/pki are present in the
backed-up tarball before extracting, otherwise playbook fails
because the keys are not found in the archive.
Change-Id: I8602f07d1b1041
Closes-Bug: 928925
Signed-off-by: Angie Wang <email address hidden>
commit 0261f22ff7c23d2
Author: Don Penney <email address hidden>
Date: Thu May 20 23:09:07 2021 -0400
Update SX to DX migration to wait for coredns config
This commit updates the SX to DX migration playbook to wait after
modifying the system mode to duplex until the runtime manifest that
updates coredns config has completed. The playbook will wait for up to
20 minutes to allow for the possibility that sysinv has multiple
runtime manifests queued up, each of which could take several minutes.
Depends-On: https:/
Depends-On: https:/
Change-Id: I3bf94d3493ae20
Closes-Bug: 1929148
Signed-off-by: Don Penney <email address hidden>
commit 7c4f17bd0d92fc1
Author: Daniel Safta <email address hidden>
Date: Wed May 19 09:08:16 2021 +0000
Fixed missing apiserver-
When controller-1 is the active controller
the backup archive does not contain
/etc/
This change adds a new task which brings
the certs from /etc/kubernetes/pki
Closes-bug: 1928925
Signed-off-by: Daniel Safta <email address hidden>
Change-Id: I3c68377603e1af
commit e221ef8fbe51aa6
Author: David Sullivan <email address hidden>
Date: Wed May 19 16:01:27 2021 -0500
Support boo...
tags: | added: in-f-centos8 |
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8) | #48 |
Fix proposed to branch: f/centos8
Review: https:/
OpenStack Infra (hudson-openstack) wrote : | #49 |
Fix proposed to branch: f/centos8
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (f/centos8) | #50 |
Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https:/
OpenStack Infra (hudson-openstack) wrote : Related fix merged to integ (f/centos8) | #51 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: f/centos8
commit b310077093fd567
Author: Mihnea Saracin <email address hidden>
Date: Sat May 22 18:19:54 2021 +0300
Fix resize of filesystems in puppet logical_volume
After system reinstalls there is stale data on the disk
and puppet fails when resizing, reporting some wrong filesystem
types. In our case docker-lv was reported as drbd when
it should have been xfs.
This problem was solved in some cases e.g:
when doing a live fs resize we wipe the last 10MB
at the end of partition:
https:/
Our issue happened here:
https:/
Resize can happen at unlock when a bigger size is detected for the
filesystem and the 'logical_volume' will resize it.
To fix this we have to wipe the last 10MB of the partition after the
'lvextend' cmd in the 'logical_volume' module.
Tested the following scenarios:
B&R on SX with default sizes of filesystems and cgts-vg.
B&R on SX with with docker-lv of size 50G, backup-lv also 50G and
cgts-vg with additional physical volumes:
- name: cgts-vg
- path: /dev/disk/
size: 50
type: partition
- path: /dev/disk/
size: 30
type: partition
- path: /dev/disk/
type: disk
B&R on DX system with backup of size 70G and cgts-vg
with additional physical volumes:
physicalVol
- path: /dev/disk/
size: 50
type: partition
- path: /dev/disk/
size: 30
type: partition
- path: /dev/disk/
type: disk
Closes-Bug: 1926591
Change-Id: I55ae6954d24ba3
Signed-off-by: Mihnea Saracin <email address hidden>
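The wipe described in that commit (zeroing the tail of the partition after 'lvextend' so stale filesystem signatures don't survive) can be sketched as below. This is a simplified illustration: the real fix targets the logical volume device from puppet, and a block device's size comes from the block layer rather than os.path.getsize().

```python
import os

def wipe_tail(path: str, mib: int = 10) -> None:
    """Zero the last `mib` MiB of `path` so leftover filesystem
    signatures from a previous install are not detected after a
    resize (e.g. docker-lv being misreported as drbd instead of xfs)."""
    size = os.path.getsize(path)
    n = min(size, mib * 1024 * 1024)
    with open(path, "r+b") as f:
        f.seek(size - n)
        f.write(b"\0" * n)
```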
commit 322557053045895
Author: Mihnea Saracin <email address hidden>
Date: Thu May 20 14:33:58 2021 +0300
Execute once the ceph services script on AIO
The MTC client manages ceph services via ceph.sh which
is installed on all node types in
/etc/
Since the AIO controllers have both controller and worker
personalities, the MTC client will execute the ceph script
twice (/etc/service.
/etc/
This behavior will generate some issues.
We fix this by exiting the ceph script if it is the one from
/etc/
Closes-Bug: 1928934
Change-Id: I3e4dc313cc3764
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (f/centos8) | #52 |
Reviewed: https:/
Committed: https:/
Submitter: "Zuul (22348)"
Branch: f/centos8
commit 9e420d9513e5faf
Author: Bin Qian <email address hidden>
Date: Mon May 31 14:45:52 2021 -0400
Add more logging to run docker login
Add error log for running docker login. The new log could
help identify docker login failure.
Closes-Bug: 1930310
Change-Id: I8a709fb6665de8
Signed-off-by: Bin Qian <email address hidden>
commit 31c77439d2cea59
Author: albailey <email address hidden>
Date: Fri May 28 13:42:42 2021 -0500
Fix controller-0 downgrade failing to kill ceph
kill_
file that does not exist in an AIO-DX environment.
We no longer invoke kill_ceph_
AIO SX or DX env.
This allows: "system host-downgrade controller-0"
to proceed in an AIO-DX environment where that second
controller (controller-0) was upgraded.
Partial-Bug: 1929884
Signed-off-by: albailey <email address hidden>
Change-Id: I633853f7531773
commit 0dc99eee608336f
Author: albailey <email address hidden>
Date: Fri May 28 11:05:43 2021 -0500
Fix file permissions failure during duplex upgrade abort
When issuing a downgrade for controller-0 in a duplex upgrade
abort and rollback scenario, the downgrade command was failing
because the sysinv API does not have root permissions to set
a file flag.
The fix is to use RPC so the conductor can create the flag
and allow the downgrade for controller-0 to get further.
Partial-Bug: 1929884
Signed-off-by: albailey <email address hidden>
Change-Id: I913bcad73309fe
commit 7ef3724dad17375
Author: Chen, Haochuan Z <email address hidden>
Date: Tue May 25 16:16:29 2021 +0800
Fix bug rook-ceph provision with multi osd on one host
Test case:
1, deploy simplex system
2, apply rook-ceph with below override value
value.yaml
cluster:
storage:
nodes:
- name: controller-0
devices:
- name: sdb
- name: sdc
3, reboot
Without this fix, only osd pod could launch successfully after boot
as vg start with ceph could not correctly add in sysinv-database
Closes-bug: 1929511
Change-Id: Ia5be599cd168d1
Signed-off-by: Chen, Haochuan Z <email address hidden>
commit 23505ba77d76114
Author: Angie Wang <email address hidden>
Date: Tue May 25 18:49:21 2021 -0400
Fix issue in partition data migration script
The created partition dictionary partition_map is not
an ordered dict so we need to sort it by its key -
device node when iterating it to adjust the device
nodes/paths for user created extra partitions to ensure
the number of device node...
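The sort described in that commit can be sketched as follows (illustrative helper, not the migration script itself):

```python
def ordered_partitions(partition_map: dict) -> list:
    """Iterate a device-node -> partition-info map in device order.
    A plain dict built from discovery output reflects insertion order,
    not device order, so sort by the device-node key before adjusting
    device nodes/paths. (A lexical sort is enough up to partition 9;
    beyond that a natural sort would be needed.)"""
    return sorted(partition_map.items())
```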
OpenStack Infra (hudson-openstack) wrote : Change abandoned on config (f/centos8) | #53 |
Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https:/
OpenStack Infra (hudson-openstack) wrote : | #54 |
Change abandoned by "Chuck Short <email address hidden>" on branch: f/centos8
Review: https:/
Did a short investigation since https://review.opendev.org/c/starlingx/config/+/773451 landed.
There is a small error observed in the logs introduced by that commit, but it is not the cause of the issue observed here. This will be the fix for that error:

diff --git a/sysinv/sysinv/sysinv/sysinv/conductor/manager.py b/sysinv/sysinv/sysinv/sysinv/conductor/manager.py
index b5189f65..6fb2616e 100644
--- a/sysinv/sysinv/sysinv/sysinv/conductor/manager.py
+++ b/sysinv/sysinv/sysinv/sysinv/conductor/manager.py
@@ -11908,8 +11908,8 @@ class ConductorManager(service.PeriodicService):
                     LOG.exception("Failed to regenerate the overrides for app %s. %s" %
                                   (app.name, e))
                 else:
-                    LOG.info("{} app active:{} status:{} does not warrant re-apply",
-                             app.name, app.active, app.status)
+                    LOG.info("{} app active:{} status:{} does not warrant re-apply"
+                             "".format(app.name, app.active, app.status))

     def app_lifecycle_actions(self, context, rpc_app, hook_info):
         """Perform any lifecycle actions for the operation and timing supplied.
--
2.30.0
Back to the issue:
Seems armada/kubernetes related.
sysinv 2021-03-01 11:36:32.372 2356122 INFO sysinv.conductor.kube_app [-] lifecycle hook for application stx-openstack (1.0-78-centos-stable-versioned) started {'lifecycle_type': u'manifest', 'relative_timing': u'pre', 'mode': u'auto', 'operation': u'apply', 'extra': {'was_applied': True}}.
sysinv 2021-03-01 11:36:32.372 2356122 INFO k8sapp_openstack.lifecycle.lifecycle_openstack [-] Wait if there are openstack charts in pending install...
sysinv 2021-03-01 11:36:32.781 2356122 ERROR sysinv.conductor.kube_app [-] Helm operation failure: Failed to obtain pending charts list: Helm operation failure: Error: write tcp 172.16.192.176:45960->10.10.59.10:5432: write: broken pipe
command terminated with exit code 1
: HelmTillerFailure: Helm operation failure: Failed to obtain pending charts list: Helm operation failure: Error: write tcp 172.16.192.176:45960->10.10.59.10:5432: write: broken pipe
command terminated with exit code 1
2021-03-01 11:36:32.781 2356122 ERROR sysinv.conductor.kube_app Traceback (most recent call last):
var/log/containers$ grep -R "10.10.59.10" | grep armada-api
armada-api-b86d46465-xdbjt_armada_tiller-a00cf66fa21b19f28771a99a2aa85643c1fbfd2ed9d19d0f10c2a8ac7925cc1b.log:2021-03-01T10:44:38.71962272Z stderr F [storage/driver] 2021/03/01 10:44:38 list: failed to list: write tcp 172.16.192.176:60758->10.10.59.10:5432: write: broken pipe
armada-api-b86d46465-xdbjt_armada_tiller-a00cf66fa21b19f28771a99a2aa85643c1fbfd2ed9d19d0f10c2a8ac7925cc1b.log:2021-03-01T11:36:32.776510152Z stderr F [storage/driver] 2021/03/01 11:36:32 list: failed to list: write tcp 172.16.192.176:45960->10.10.59.10:5432: write: broken pipe
armada-api-b86d46465-xdbjt_armada_tiller-a00cf66fa21b19f28771a99a2aa85643c1fbfd2ed9d19d0f10c2a8ac7925cc1b.log:2021-03-01T11:38:56.600564874Z stderr F [storage/driver] 2021/03/01 11:38:56 list: failed to list: write tcp 172.16.192.176:35854->10.10.59.10:5432: write: broken pipe
armada-api-b86d46465-xdbjt_armada_til...