TLS-E re-deployments slow down linearly by 1-2 minutes at every redeploy with no changes

Bug #1923468 reported by Michele Baldessari
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Triaged
Medium
Unassigned

Bug Description

In our master dev env for BGP we noticed that redeploy times increased over time even with no changes. To verify this I let a for loop deploy command run over the weekend:
- Started with
real 53m59.700s
user 25m39.266s
sys 12m28.188s

- After ~44 deploys we are at
real 93m21.663s
user 53m34.944s
sys 12m46.968s

Roughly we seem to be increasing deployment time constantly by about ~1 minute (or more) on average.
So every deployment is one minute slower than the previous one.

Initially I suspected heat but quickly ruled that out, since the first deployment took ~18minutes in heat:
deploy-log-1.log:2021-04-10 17:03:22.647 37082 WARNING tripleoclient.v1.overcloud_deploy.DeployOvercloud [-] WARNING: Nova endpoint not available. Assuming --deployed-server: keystoneauth1.exceptions.catalog.EndpointNotFound: public endpoint for compute service not found
deploy-log-1.log:2021-04-10 17:21:40Z [overcloud]: UPDATE_COMPLETE Stack UPDATE completed successfully

And so did the 32nd deployment (18min 36sec):
deploy-log-32.log:2021-04-12 08:09:39.279 326159 WARNING tripleoclient.v1.overcloud_deploy.DeployOvercloud [-] WARNING: Nova endpoint not available. Assuming --deployed-server: keystoneauth1.exceptions.catalog.EndpointNotFound: public endpoint for compute service not found
deploy-log-32.log:2021-04-12 08:28:15Z [overcloud]: UPDATE_COMPLETE Stack UPDATE completed successfully

It seems to me that one of the non-constant tasks is the tripleo_ipa_registration (we have 9 nodes in the tls-e overcloud). In fact during the first deployment those tasks took slightly more than one second:
deploy-log-1.log:2021-04-10 17:27:29.239801 | 525400fc-0754-896c-4655-000000008f30 | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:04:16.084463 | 1.10s
deploy-log-1.log:2021-04-10 17:27:38.621757 | 525400fc-0754-896c-4655-000000008f66 | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:04:25.466413 | 1.16s
deploy-log-1.log:2021-04-10 17:27:46.969275 | 525400fc-0754-896c-4655-000000008f9c | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:04:33.813940 | 1.04s
deploy-log-1.log:2021-04-10 17:27:55.205791 | 525400fc-0754-896c-4655-000000008fd2 | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:04:42.050446 | 1.10s
deploy-log-1.log:2021-04-10 17:28:04.534810 | 525400fc-0754-896c-4655-000000009008 | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:04:51.379475 | 1.21s
deploy-log-1.log:2021-04-10 17:28:13.127911 | 525400fc-0754-896c-4655-00000000903e | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:04:59.972576 | 1.33s
deploy-log-1.log:2021-04-10 17:28:21.710408 | 525400fc-0754-896c-4655-000000009074 | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:05:08.555068 | 1.21s
deploy-log-1.log:2021-04-10 17:28:45.781958 | 525400fc-0754-896c-4655-0000000090aa | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:05:32.626613 | 1.32s
deploy-log-1.log:2021-04-10 17:29:10.627339 | 525400fc-0754-896c-4655-0000000090e0 | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:05:57.472003 | 1.57s

But during the 32nd run we see it is a lot slower:
deploy-log-32.log:2021-04-12 08:35:17.888455 | 525400fc-0754-c315-6286-000000008edd | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:05:26.714234 | 67.46s
deploy-log-32.log:2021-04-12 08:36:35.054186 | 525400fc-0754-c315-6286-000000008f13 | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:06:43.879965 | 69.11s
deploy-log-32.log:2021-04-12 08:37:50.542878 | 525400fc-0754-c315-6286-000000008f49 | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:07:59.368654 | 68.56s
deploy-log-32.log:2021-04-12 08:39:06.560776 | 525400fc-0754-c315-6286-000000008f7f | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:09:15.386554 | 69.00s
deploy-log-32.log:2021-04-12 08:40:23.704068 | 525400fc-0754-c315-6286-000000008fb5 | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:10:32.529827 | 69.96s
deploy-log-32.log:2021-04-12 08:41:41.123423 | 525400fc-0754-c315-6286-000000008feb | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:11:49.949190 | 70.11s
deploy-log-32.log:2021-04-12 08:42:58.608076 | 525400fc-0754-c315-6286-000000009021 | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:13:07.433841 | 70.08s
deploy-log-32.log:2021-04-12 08:44:32.637128 | 525400fc-0754-c315-6286-000000009057 | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:14:41.462907 | 71.09s
deploy-log-32.log:2021-04-12 08:46:06.608770 | 525400fc-0754-c315-6286-00000000908d | TIMING | tripleo_ipa_registration : get host raw data and keytab info | undercloud | 0:16:15.434548 | 70.95s

I checked and indeed that seems to be the only task that keeps increasing in time at every redeployment without changes. And since it is an external task and hence ran serially it exacerbates the slowdown.

Revision history for this message
Ade Lee (alee-3) wrote :

The operation that is being done there is to get host data to confirm whether or not the host is registered and if it already has a keytab.

I suspect that because we're issuing more certs each time, the amount of data that is returned is increasing and that this is increasing the response time from IPA. Verifying this on Michele's system now.

But certainly this can be improved -- we don't need all that data -- in fact, we can probably do a couple of ldap searches to get exactly the attributes and data we need using searches that are already indexed. This could make things much faster - and more consistent. Investigating ..

Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Changed in tripleo:
milestone: xena-1 → xena-2
Changed in tripleo:
milestone: xena-2 → xena-3
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.