underscores should be stripped from hostnames generated for apt config
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-init | Fix Released | High | Dan Watkins |
cloud-init (Ubuntu) | Fix Released | Undecided | Unassigned |
Focal | Fix Committed | Undecided | Unassigned |
Bug Description
In a ticket filed in the Ubuntu RT instance we were made aware of an issue where if a cloud is configured with an “_” in the region name, cloud-init will generate an apt configuration that also includes that “_” in the name.
So for example if the region name is zone_01, apt will be configured to use zone_01.
On Friday March 13th we deployed some new archive servers on 18.04 using Apache 2.4.29-1ubuntu4.13. This version of Apache has stricter protocol options than previous versions, per https:/
Could cloud-init be updated to remove non-permitted characters including “_” per https:/
Changed in cloud-init:
assignee: nobody → Dan Watkins (daniel-thewatkins)
status: Triaged → In Progress
Changed in cloud-init:
status: In Progress → Fix Committed
Changed in cloud-init (Ubuntu):
status: New → Fix Committed
The current behaviour is that cloud-init will use the patterns defined in /etc/cloud/cloud.cfg:
primary:
  - http://%(ec2_region)s.ec2.archive.ubuntu.com/ubuntu/
  - http://%(availability_zone)s.clouds.archive.ubuntu.com/ubuntu/
  - http://%(region)s.clouds.archive.ubuntu.com/ubuntu/
to determine the mirror to use. In this case, it will be one of the two latter patterns, depending on exactly how the data source in question presents "zone_01". Either way, the problem is the same.
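To make the substitution concrete, here is a minimal sketch of how such patterns expand, assuming plain Python %-formatting against a dictionary of data source values. The helper name `expand_patterns` and the `{"region": "zone_01"}` input are hypothetical, and the pattern strings are my reading of the cloud.cfg excerpt above; this is not cloud-init's own code.

```python
# Patterns as read from the cloud.cfg excerpt in this bug (an assumption
# about the exact strings, not copied from a live system).
PRIMARY_PATTERNS = [
    "http://%(ec2_region)s.ec2.archive.ubuntu.com/ubuntu/",
    "http://%(availability_zone)s.clouds.archive.ubuntu.com/ubuntu/",
    "http://%(region)s.clouds.archive.ubuntu.com/ubuntu/",
]


def expand_patterns(patterns, values):
    """Return every pattern that can be filled from `values` (hypothetical helper)."""
    expanded = []
    for pattern in patterns:
        try:
            expanded.append(pattern % values)
        except KeyError:
            # The data source did not supply the variable this pattern needs.
            continue
    return expanded


# A data source that only reports a region named "zone_01".
candidates = expand_patterns(PRIMARY_PATTERNS, {"region": "zone_01"})
print(candidates)
```

The underscore passes straight through the substitution, so the only candidate produced here is a URL whose hostname starts with zone_01.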
Once this mirror URL is generated, cloud-init tests that it _resolves_ before using it. This is where the problem lies: *.clouds.archive.ubuntu.com will always resolve, but the newly-deployed Apache servers will no longer serve every domain that resolves. Arguably this is a misconfiguration of the archive servers (why resolve something that you can't serve?), but cloud-init should handle this case gracefully regardless.
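The gap between "resolves" and "is served" can be illustrated with a small sketch. It assumes the check amounts to a DNS lookup of the URL's hostname; `is_resolvable` and `wildcard_resolver` are hypothetical names, and the fake resolver only stands in for wildcard DNS so the example runs without network access.

```python
import socket
from urllib.parse import urlparse


def is_resolvable(url, resolver=socket.getaddrinfo):
    """Return True if the URL's hostname resolves (says nothing about HTTP)."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        resolver(host, None)
    except (socket.gaierror, UnicodeError):
        return False
    return True


def wildcard_resolver(host, port):
    """Stand-in for wildcard DNS: any name under the suffix resolves."""
    if host.endswith(".clouds.archive.ubuntu.com"):
        return [("placeholder-address",)]
    raise socket.gaierror("name not known")


url = "http://zone_01.clouds.archive.ubuntu.com/ubuntu/"
print(is_resolvable(url, resolver=wildcard_resolver))
```

Under a wildcard record the check passes for any label, valid or not, which is why a resolution test alone cannot catch a hostname the web server will refuse.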
There are (at least) a couple of ways in which we could address this issue in cloud-init:
(a) rewrite the generated URL (or the variables which we are substituting into the pattern) to only include valid URI characters
(b) modify cloud-init to check that mirrors are accessible via HTTP (rather than simply resolvable)
While both of these would address the immediate issue, implementing only (b) would mean that all instances in such zones would fall back to using archive.ubuntu.com, so I think we should do some form of (a) regardless.
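A rough sketch of what option (a) might look like follows. It assumes that invalid characters are replaced with hyphens rather than dropped, and that leading and trailing hyphens are then trimmed from each label; both are illustrative choices, and `sanitize_label` and `sanitize_mirror_url` are hypothetical names rather than the fix that landed.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Anything outside letters, digits and hyphen is treated as invalid in a label.
_INVALID = re.compile(r"[^a-z0-9-]")


def sanitize_label(label):
    """Replace invalid characters with '-' and trim hyphens at the edges."""
    return _INVALID.sub("-", label.lower()).strip("-")


def sanitize_mirror_url(url):
    """Return `url` with each hostname label sanitized.

    Assumes a DNS-name host (not an IP literal) and no userinfo; non-ASCII
    names are not handled here and would need converting first.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if not host:
        return url
    labels = [sanitize_label(label) for label in host.split(".")]
    netloc = ".".join(label for label in labels if label)
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(sanitize_mirror_url("http://zone_01.clouds.archive.ubuntu.com/ubuntu/"))
```

Replacing rather than deleting keeps zone_01 and zone01 distinct after sanitizing, though whether that matters depends on how the archive's DNS is set up.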
One obvious downside to (b) is that it will add an HTTP request to each boot on a Debian/Ubuntu host; this could be a concern from a client boot speed perspective and, perhaps more importantly, from a server load perspective. (My gut feel is that the cost in both cases wouldn't be noticeable: most Debian/Ubuntu instances that come up will perform many HTTP requests to the archive hosts, so one more isn't likely to be noticed. We should consider this more deeply before we implement this, however.)
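For comparison, option (b) could be sketched as a single HEAD request with a short timeout, using only the standard library. The function name `is_reachable`, the 5 second default, and the injectable `opener` parameter are all assumptions made for illustration; a HEAD request is used here to keep the extra load on the archive small.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if a HEAD request to `url` gets a 2xx or 3xx response."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # HTTPError (e.g. a 400 from a strict server) is a URLError subclass,
        # so rejected hostnames, timeouts and connection errors all land here.
        return False
```

A caller would try each candidate mirror in order and fall back to the default archive when none is reachable, which is the fallback behaviour discussed above.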
(As an aside, we should do some research to confirm that the non-ASCII encoding described in the linked RFC 3986 section won't be affected by our filtering. For example, if we currently rely on the libraries we use to convert non-ASCII hostnames to the defined percent-encoding, then we would regress non-ASCII hostnames by applying a naive filter before we pass the name to those libraries.)
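The ordering concern can be shown with a small sketch. Note that it uses Python's built-in "idna" codec (Punycode conversion) rather than percent-encoding, on the assumption that an ASCII form is what DNS needs in the end; the function names and the example hostnames are hypothetical.

```python
import re

# Characters allowed in an ASCII hostname, dots included as label separators.
_INVALID_ASCII = re.compile(r"[^A-Za-z0-9.-]")


def naive_filter(hostname):
    """Filter first: every non-ASCII character is lost."""
    return _INVALID_ASCII.sub("-", hostname)


def idna_then_filter(hostname):
    """Convert to the ASCII (IDNA) form first, then filter what remains."""
    ascii_name = hostname.encode("idna").decode("ascii")
    return _INVALID_ASCII.sub("-", ascii_name)


print(naive_filter("m\u00fcnchen.example"))      # m-nchen.example
print(idna_then_filter("m\u00fcnchen.example"))  # xn--mnchen-3ya.example
print(idna_then_filter("zone_01.example"))       # zone-01.example
```

Filtering before conversion silently turns the non-ASCII name into a different host, while converting first keeps it intact and still removes the underscore.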