Intervals should be staggered
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Landscape Client | Fix Committed | High | Simon Poirier | |
| landscape-client (Ubuntu) | Fix Released | Undecided | Unassigned | |
| Xenial | Fix Released | Undecided | Unassigned | |
| Bionic | Fix Released | Undecided | Unassigned | |
| Disco | Fix Released | Undecided | Unassigned | |
Bug Description
[Impact]
* Restarting hosts/clouds of multiple VMs or containers managed by
Landscape leads to synchronized monitoring tasks. This can generate
significant recurring load.
* The fix adds a configurable randomized interval to scheduled tasks.
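For illustration only, here is a minimal sketch of the staggering idea in plain Python. The function name and the fraction-of-the-interval semantics of the stagger value are assumptions for the sketch, not the actual landscape-client implementation (which is Twisted-based and differs in detail):

```python
import random

def staggered_initial_delay(interval, stagger_launch=0.1):
    """Return a random start delay of up to stagger_launch * interval
    seconds, so identical clients rebooted together do not all fire
    their monitoring tasks at the same moment."""
    return random.random() * stagger_launch * interval

# Example: a 30-minute package-monitor interval with a stagger value of
# 0.5 would start somewhere within the first 15 minutes.
delay = staggered_initial_delay(30 * 60, stagger_launch=0.5)
print("first run delayed by %.0f seconds" % delay)
```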
[Test Case]
sudo landscape-config --stagger-launch 0.5 --log-level debug
# accept client in server interface
sudo tail -f /var/log/
# Check for debug log entries mentioning monitors getting delayed
[Regression Potential]
* Although changing the scheduler to use non-fixed loop intervals could
have been slightly better at avoiding load peaks on shared hosts, the
current approach has been deemed safer as it simply adds an initial
delay to tasks. The existing scheduling logic is unchanged.
* External scripts expecting monitors to run instantly on
landscape-client restart could be affected. I'm not aware of
any such case, as the async nature of the code never made that
guarantee.
* The stagger interval is configurable with a flag and a configuration
entry to work around regressions.
[Original Description]
The various intervals appear to be statically determined from program start. As a practical example, a rebooted VM host with a large number of guests means all the guests start landscape-client at about the same time. Then every 30 minutes, package-monitor runs (which is a rather heavy operation). Multiply by a hundred or so and you've got a large flood on the host.
When the client starts up, it would be good to have the initial schedule for each timer be ((interval / 2) + random(interval)), so the collective load is spread out over the entire interval.
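Purely to illustrate the arithmetic of that suggestion (reading random(interval) as a uniform value in [0, interval)), a short Python sketch:

```python
import random

def first_run_offset(interval):
    """Reporter's suggestion: schedule the first run of each timer at
    (interval / 2) + random(interval), i.e. uniformly within
    [interval/2, 1.5*interval), so a fleet of clients started together
    spreads its load over a full interval."""
    return interval / 2 + random.uniform(0, interval)

# Example: with the 30-minute package-monitor interval, the first run
# lands somewhere between 15 and 45 minutes after start-up.
print(first_run_offset(30 * 60) / 60, "minutes")
```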
Related branches
- Andreas Hasenack (community): Approve
- git-ubuntu developers: Pending requested
- Diff: 377 lines (+326/-1), 6 files modified:
  - debian/changelog (+12/-0)
  - debian/landscape-client.init (+2/-1)
  - debian/landscape-client.postinst (+3/-0)
  - debian/patches/product-name-vminfo-1828217.patch (+49/-0)
  - debian/patches/series (+2/-0)
  - debian/patches/stagger-launch-1788518.patch (+258/-0)

- Andreas Hasenack (community): Approve
- Canonical Server: Pending requested
- Diff: 338 lines (+287/-1), 6 files modified:
  - debian/changelog (+12/-0)
  - debian/landscape-client.init (+2/-1)
  - debian/landscape-client.postinst (+3/-0)
  - debian/patches/product-name-vminfo-1828217.patch (+49/-0)
  - debian/patches/series (+2/-0)
  - debian/patches/stagger-launch-1788518.patch (+219/-0)

- Andreas Hasenack (community): Approve
- git-ubuntu developers: Pending requested
- Diff: 377 lines (+326/-1), 6 files modified:
  - debian/changelog (+12/-0)
  - debian/landscape-client.init (+2/-1)
  - debian/landscape-client.postinst (+3/-0)
  - debian/patches/product-name-vminfo-1828217.patch (+49/-0)
  - debian/patches/series (+2/-0)
  - debian/patches/stagger-launch-1788518.patch (+258/-0)

- Andreas Hasenack (community): Approve
- Canonical Server: Pending requested
- git-ubuntu developers: Pending requested
- Diff: 377 lines (+326/-1), 6 files modified:
  - debian/changelog (+12/-0)
  - debian/landscape-client.init (+2/-1)
  - debian/landscape-client.postinst (+3/-0)
  - debian/patches/product-name-vminfo-1828217.patch (+49/-0)
  - debian/patches/series (+2/-0)
  - debian/patches/stagger-launch-1788518.patch (+258/-0)
Changed in landscape-client:
  importance: Undecided → High
  tags: added: is
no longer affects: ubuntu
Changed in landscape-client:
  status: New → Triaged
Changed in landscape-client:
  assignee: nobody → Simon Poirier (simpoir)
  status: Triaged → In Progress
Changed in landscape-client:
  status: In Progress → Fix Committed
  description: updated
Changed in landscape-client (Ubuntu):
  status: New → Fix Released
In addition to (or instead of) spreading the initial schedule, it would be beneficial to apply a random percentage (e.g. between -10% and +10%) to the next interval on each run. Instead of running at hard intervals, the runs become skewed but remain at the given interval on average. This avoids synchronisation and consistent load spikes if the initial offset happens to match another fixed processing schedule (e.g. from a cron job). The same applies to load distribution on the server side (if applicable).
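A rough Python sketch of that per-run jitter idea (the ±10% figure is the commenter's example; this is not what the committed fix implements):

```python
import random

def jittered_interval(interval, jitter=0.10):
    """Skew each run by a random factor in [1 - jitter, 1 + jitter] so
    the average period stays at `interval` but successive runs no longer
    line up with other fixed schedules (e.g. cron jobs)."""
    return interval * random.uniform(1 - jitter, 1 + jitter)

# Example: a nominal 30-minute timer fires every 27 to 33 minutes.
print(jittered_interval(30 * 60) / 60, "minutes")
```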