Intervals should be staggered

Bug #1788518 reported by Ryan Finnie
This bug affects 6 people
Affects                       Status          Importance   Assigned to      Milestone
Landscape Client              Fix Committed   High         Simon Poirier
landscape-client (Ubuntu)     Fix Released    Undecided    Unassigned
  Xenial                      Fix Released    Undecided    Unassigned
  Bionic                      Fix Released    Undecided    Unassigned
  Disco                       Fix Released    Undecided    Unassigned

Bug Description

[Impact]

 * Restarting a host or cloud running many VMs or containers managed by
   Landscape synchronizes their monitoring tasks. This can generate
   significant recurring load.

 * The fix adds a configurable, randomized initial delay to scheduled
   tasks.
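A minimal Python sketch of the initial-delay approach. It assumes, without confirmation from this report, that the value given to `--stagger-launch` (0.5 in the test case) is a ratio between 0 and 1 of each task's interval; `initial_delay` and `run_times` are hypothetical names, not landscape-client's actual code.

```python
import random


def initial_delay(interval, stagger_ratio, rng=random):
    """Return a random delay in [0, stagger_ratio * interval) seconds.

    Hypothetical helper: only the first run of a periodic task is pushed
    back by this amount.
    """
    if not 0 <= stagger_ratio <= 1:
        raise ValueError("stagger_ratio must be between 0 and 1")
    return rng.random() * stagger_ratio * interval


def run_times(interval, stagger_ratio, count, rng=random):
    """Return the first `count` run times, in seconds after client start.

    After the staggered first run, the task keeps its fixed interval.
    """
    start = initial_delay(interval, stagger_ratio, rng)
    return [start + n * interval for n in range(count)]


# A 30-minute monitor with a ratio of 0.5 first runs somewhere in the
# first 15 minutes, then every 30 minutes after that.
rng = random.Random(42)
times = run_times(30 * 60, 0.5, 3, rng)
print([round(t) for t in times])
```

Only the first run moves; every later run keeps the task's normal fixed interval, which is why the existing scheduling logic does not need to change.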

[Test Case]

sudo landscape-config --stagger-launch 0.5 --log-level debug
# accept client in server interface
sudo tail -f /var/log/landscape/monitor.log
# Check for debug log entries mentioning monitors getting delayed

[Regression Potential]

 * Although changing the scheduler for non-fixed loop intervals could
   have been slightly better at avoiding load peaks on shared hosts, the
   current approach has been deemed safer, as it only adds an initial
   delay to tasks. The existing scheduling logic is otherwise unchanged.

 * External scripts expecting monitors to run instantly on
   landscape-client restart could be affected. I'm not aware of
   any such case, as the async nature of the code never made that
   guarantee.

 * The stagger interval is configurable with a flag and a configuration
   entry to work around regressions.

[Original Description]

The various intervals appear to be statically determined from program start. As a practical example, a rebooted VM host with a large number of guests means all the guests start landscape-client at about the same time. Then every 30 minutes, package-monitor runs (which is a rather heavy operation). Multiply by a hundred or so and you've got a large flood on the host.

When the client starts up, it would be good to have the initial schedule for each timer be ((interval / 2) + random(interval)), so the collective load is spread out over the entire interval.
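The proposed formula can be sketched in Python as below, reading `random(interval)` as a uniform value in [0, interval). The helpers are hypothetical, and the 100-guest, 30-minute scenario mirrors the example in the description; the sketch counts how many guests start the task in the same minute of each cycle.

```python
import random
from collections import Counter


def proposed_initial_delay(interval, rng=random):
    """First-run delay of interval/2 + random(interval).

    random(interval) is read as a uniform value in [0, interval), so the
    first run lands somewhere in [interval/2, 3*interval/2).
    """
    return interval / 2 + rng.random() * interval


def peak_starts_per_minute(first_runs, interval):
    """Largest number of guests starting the task in the same minute of
    the repeating cycle, assuming a fixed interval after the first run."""
    buckets = Counter(int(t % interval) // 60 for t in first_runs)
    return max(buckets.values())


rng = random.Random(1)
interval = 30 * 60                      # package-monitor every 30 minutes
synced = [interval] * 100               # every guest booted at the same moment
staggered = [proposed_initial_delay(interval, rng) for _ in range(100)]
print(peak_starts_per_minute(synced, interval))     # 100: all in one minute
print(peak_starts_per_minute(staggered, interval))
```

Because the first run can fall anywhere within a full interval-wide window, the guests' phases cover the whole cycle instead of piling into a single minute.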


Changed in landscape-client:
importance: Undecided → High
tags: added: is
Joel Sing (jsing) wrote :

In addition to (or instead of) spreading the initial schedule, it would be beneficial to apply a random adjustment (e.g. between -10% and +10%) to the next interval on each run. Instead of running at hard intervals, the runs become skewed but remain at the given interval on average. This avoids synchronisation and consistent load spikes if the initial offset happens to line up with another fixed processing schedule (e.g. a cronjob). The same applies to load distribution on the server side (if applicable).
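The per-run skew suggested above can be sketched as follows; `jittered_interval` and `run_schedule` are hypothetical helpers, and the +/- 10% default simply follows the example in the comment.

```python
import random


def jittered_interval(interval, jitter=0.10, rng=random):
    """Next delay: interval scaled by a random factor in [1 - jitter, 1 + jitter].

    The factor is uniform and centred on 1, so the mean delay stays at
    `interval`.
    """
    if not 0 <= jitter < 1:
        raise ValueError("jitter must be in [0, 1)")
    return interval * (1 + rng.uniform(-jitter, jitter))


def run_schedule(interval, runs, jitter=0.10, rng=random):
    """Return cumulative run times when every delay is jittered."""
    t, times = 0.0, []
    for _ in range(runs):
        t += jittered_interval(interval, jitter, rng)
        times.append(t)
    return times


rng = random.Random(7)
times = run_schedule(30 * 60, 48, rng=rng)    # about one day of 30-minute runs
gaps = [b - a for a, b in zip([0.0] + times, times)]
print(round(min(gaps)), round(max(gaps)), round(times[-1] / 48))
```

Each individual gap stays between 27 and 33 minutes, while the average gap remains close to the configured 30 minutes.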

no longer affects: ubuntu
Changed in landscape-client:
status: New → Triaged
Ryan Finnie (fo0bar) wrote :

The attached patch spreads out the various reactor calls by +/- 10% and was used as a cowboy in our use case. It's not production-ready, in that it should be a configurable option (and it would also need to do the initial stagger, which I couldn't figure out how to do -- mine works over time, not immediately), but it should be a good starting point.

Ryan Finnie (fo0bar) wrote :

Sorry, I should clarify that this patch does NOT solve the problem over time, since the +/- 10% theoretically averages out to zero. What solved our synchronized load spikes was pushing out the patched landscape-client via Landscape, so the units updated themselves over an N-hour window, which spread out the initial timers.

Simon Poirier (simpoir)
Changed in landscape-client:
assignee: nobody → Simon Poirier (simpoir)
status: Triaged → In Progress
Junien Fridrick (axino) wrote :

Hi, what's the progress here? Maintaining our own landscape-client package is painful.

Thanks

Adam Collard (adam-collard) wrote :

Hi Junien,

There's a branch in review for this at https://github.com/CanonicalLtd/landscape-client/pull/63

As Ryan noted, the patch he attached will tend towards all the landscape-clients averaging out to the same intervals.

Any feedback you have on the above PR is welcome.

Simon Poirier (simpoir)
Changed in landscape-client:
status: In Progress → Fix Committed
Simon Poirier (simpoir)
description: updated
Changed in landscape-client (Ubuntu):
status: New → Fix Released
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Ryan, or anyone else affected,

Accepted landscape-client into disco-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/landscape-client/18.01-0ubuntu7.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-disco to verification-done-disco. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-disco. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in landscape-client (Ubuntu Disco):
status: New → Fix Committed
tags: added: verification-needed verification-needed-disco
Brian Murray (brian-murray) wrote :

Hello Ryan, or anyone else affected,

Accepted landscape-client into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/landscape-client/18.01-0ubuntu3.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in landscape-client (Ubuntu Bionic):
status: New → Fix Committed
tags: added: verification-needed-bionic
Simon Poirier (simpoir) wrote :

Verified bionic-proposed and disco-proposed by launching a couple of instances and verifying that the monitor runs are scattered in the logs.

tags: added: verification-done-bionic verification-done-disco
removed: verification-needed-bionic verification-needed-disco
Brian Murray (brian-murray) wrote :

Hello Ryan, or anyone else affected,

Accepted landscape-client into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/landscape-client/16.03-0ubuntu2.16.04.7 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-xenial to verification-done-xenial. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-xenial. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in landscape-client (Ubuntu Xenial):
status: New → Fix Committed
tags: added: verification-needed-xenial
Simon Poirier (simpoir) wrote :

I verified the packages in xenial-proposed, and the logs indicated scattered start delays, as expected.

tags: added: verification-done verification-done-xenial
removed: verification-needed verification-needed-xenial
Łukasz Zemczak (sil2100) wrote : Update Released

The verification of the Stable Release Update for landscape-client has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package landscape-client - 18.01-0ubuntu3.4

---------------
landscape-client (18.01-0ubuntu3.4) bionic; urgency=medium

  * d/p/product-name-vminfo-1828217.patch: Add product_name to things scanned
    for vm_info (LP: #1828217)
  * d/landscape-client.postinst: Set default value if data_path is
    missing. (LP: #1728681)
  * d/p/stagger-launch-1788518.patch: Add option to stagger launch of broker
    plugins. (LP: #1788518)
  * d/landscape-client.init: Fix init script stop action (LP: #1833137)

 -- Simon Poirier <email address hidden> Thu, 27 Jun 2019 11:07:30 -0400

Changed in landscape-client (Ubuntu Bionic):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package landscape-client - 18.01-0ubuntu7.1

---------------
landscape-client (18.01-0ubuntu7.1) disco; urgency=medium

  * d/p/product-name-vminfo-1828217.patch: Add product_name to things scanned
    for vm_info (LP: #1828217)
  * d/landscape-client.postinst: Set default value if data_path is
    missing. (LP: #1728681)
  * d/p/stagger-launch-1788518.patch: Add option to stagger launch of broker
    plugins. (LP: #1788518)
  * d/landscape-client.init: Fix init script stop action (LP: #1833137)

 -- Simon Poirier <email address hidden> Thu, 27 Jun 2019 11:07:30 -0400

Changed in landscape-client (Ubuntu Disco):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package landscape-client - 16.03-0ubuntu2.16.04.7

---------------
landscape-client (16.03-0ubuntu2.16.04.7) xenial; urgency=medium

  * d/p/product-name-vminfo-1828217.patch: Add product_name to things scanned
    for vm_info (LP: #1828217)
  * d/landscape-client.postinst: Set default value if data_path is
    missing. (LP: #1728681)
  * d/p/stagger-launch-1788518.patch: Add option to stagger launch of broker
    plugins. (LP: #1788518)
  * d/landscape-client.init: Fix init script stop action (LP: #1833137)

 -- Simon Poirier <email address hidden> Fri, 28 Jun 2019 12:18:32 -0400

Changed in landscape-client (Ubuntu Xenial):
status: Fix Committed → Fix Released