Time To SSH Regression

Bug #2039505 reported by Brett Holman
This bug affects 1 person
Affects: cloud-init (Ubuntu)
Status: Fix Committed
Importance: Critical
Assigned to: Brett Holman

Bug Description

=== Begin SRU Template ===
[Impact]
In 23.3.1, systemd unit ordering changes moved the configuration Before=systemd-user-sessions.service out of the earlier cloud-init.service boot stage and into the subsequent cloud-config.service boot stage, to ensure all users have been configured before a login prompt is provided on the console. This was originally intended to fix LP: #2013403 for emulated riscv environments. However, the general time to SSH/login cost is currently too great to leave the change active in most images where snap seeding is being performed on first boot.

Leaving this extended login delay unresolved also breaks tooling that uses `uvt-kvm wait`, which is present in some continuous integration testing.

The fix is to revert https://github.com/canonical/cloud-init/commit/b3c9b6a7.

[Test Case]
Launch a daily image and a daily image with proposed cloud-init, and compare the following data points:

  - time to SSH: number of retries required to successfully SSH into the VM
  - validate time to SSH by sampling `systemd-analyze critical-chain systemd-user-sessions.service` to see the total time until login was unblocked from systemd's perspective
  - `systemctl show -p Before,After cloud-config.service cloud-init.service --no-pager`: confirm which unit is ordered Before=systemd-user-sessions.service
  - `systemd-analyze blame`: validate that systemd-user-sessions is not one of the longest blocks to boot
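
The ordering check in the third data point can be scripted. The following is a minimal sketch, assuming that `systemctl show` prints one block of KEY=VALUE lines per unit, in the order the units were given, separated by blank lines. The helper names and the SAMPLE text are hypothetical and abbreviated for illustration; they are not captured from a real image.

```python
import re

TARGET = "systemd-user-sessions.service"


def parse_show_output(text, units):
    """Map each unit name to a {property: [values]} dict.

    Assumes one KEY=VALUE block per unit, in argument order,
    with blank lines between blocks.
    """
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    if len(blocks) != len(units):
        raise ValueError(f"expected {len(units)} blocks, got {len(blocks)}")
    parsed = {}
    for unit, block in zip(units, blocks):
        props = {}
        for line in block.splitlines():
            key, _, value = line.partition("=")
            props[key.strip()] = value.split()
        parsed[unit] = props
    return parsed


def units_ordered_before(parsed, target=TARGET):
    """Return the units whose Before= list contains `target`."""
    return sorted(u for u, p in parsed.items() if target in p.get("Before", []))


# Hypothetical, abbreviated output for:
#   systemctl show -p Before,After cloud-config.service cloud-init.service --no-pager
SAMPLE = """\
Before=cloud-final.service shutdown.target
After=snapd.seeded.service cloud-config.target

Before=systemd-user-sessions.service cloud-config.target
After=cloud-init-local.service
"""

UNITS = ["cloud-config.service", "cloud-init.service"]

if __name__ == "__main__":
    print(units_ordered_before(parse_show_output(SAMPLE, UNITS)))
```

With the revert applied, only cloud-init.service would be expected in the result; if cloud-config.service appears, login is still ordered after the later boot stage.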

[Regression Potential]
This is a revert to functionality that was working for years. It will regress emulated RISC-V users per LP: #2013403: they may see a login prompt before cloud-config completes, and that prompt could reject their correctly configured password as invalid until cloud-config.service completes setup on first boot.

[Other info]
LP: #2013403
LP: #2039441

[Original Description]

Affected version: 23.3

Commit b3c9b6a7[1] introduced a dependency on snapd.seeded.service for systemd-user-sessions.service. This adds a 9.65s delay to user login (ssh or local tty). The delay was missed during performance testing[2] because the image used had already seeded snapd (testing was re-run on a dirty image via `cloud-init clean`).
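
To illustrate why moving the ordering constraint delays login, here is a toy model, assuming a simplified dependency graph and made-up durations. Only the snapd.seeded.service → cloud-config.service → systemd-user-sessions.service chain comes from this report; the other edge, the function name and all timings are hypothetical.

```python
def earliest_start(unit, after, duration):
    """Earliest time `unit` can start if every After= dependency must finish first."""
    deps = after.get(unit, ())
    return max(
        (earliest_start(d, after, duration) + duration.get(d, 0.0) for d in deps),
        default=0.0,
    )


# Hypothetical durations in seconds, chosen only for illustration.
DURATION = {
    "cloud-init.service": 2.0,
    "snapd.seeded.service": 10.0,
    "cloud-config.service": 1.0,
}

# Simplified After= edges shared by both scenarios (assumption).
COMMON = {"cloud-config.service": ["cloud-init.service", "snapd.seeded.service"]}

# 23.3.1: login is ordered after cloud-config.service.
REGRESSED = {**COMMON, "systemd-user-sessions.service": ["cloud-config.service"]}

# After the revert: login is ordered after cloud-init.service only.
REVERTED = {**COMMON, "systemd-user-sessions.service": ["cloud-init.service"]}

if __name__ == "__main__":
    for name, graph in (("regressed", REGRESSED), ("reverted", REVERTED)):
        start = earliest_start("systemd-user-sessions.service", graph, DURATION)
        print(f"{name}: login unblocked at {start:.1f}s")
```

In this model the regressed ordering cannot unblock login until snap seeding and cloud-config.service have both finished, while the reverted ordering waits only for cloud-init.service.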

This was discovered while investigating LP: #2039441.

[1] https://github.com/canonical/cloud-init/commit/b3c9b6a79c85ebc8c87908383c34b0314c2205b6
[2] https://github.com/canonical/cloud-init/pull/2111#issuecomment-1616634930

Chad Smith (chad.smith)
Changed in cloud-init (Ubuntu):
status: New → Fix Committed
Chad Smith (chad.smith) wrote :

The downstream commit[1] for Ubuntu, which reverts this change, has been merged. It will be released as cloud-init version 23.3.2-0ubuntu0~23.10.1.

[1] https://github.com/canonical/cloud-init/commit/052d898023fbd6f7d87338e31f6cca6535cccef7

When cloud-init merges the ability to avoid snapd.seeded.service costs in cloud-config.service, this change will be re-applied.

Changed in cloud-init (Ubuntu):
importance: Undecided → High
importance: High → Critical
assignee: nobody → Brett Holman (holmanb)
Chad Smith (chad.smith)
description: updated