cloud-init on EC2 trying to resolve wrong name for IMDS
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
cloud-init (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Below is using cloud-init 23.2.2-
The metadata_urls in cloud-init/
"http://
"http://[fd00:ec2::254]",
"http://
The hostname in the last entry is meant to be served by the VPC-local AWS DNS server, and always resolves to 169.254.169.254. However, the name as given in the list above, "instance-data.", ends in a '.', which marks it as fully qualified. In AWS that name is not fully qualified: it resolves only when the region-specific local domain is appended:
[This is on an EC2 instance in a 10.37.64.0/22 network. Thus, the AWS DNS server is at 10.37.64.2.]
$ resolvectl status
Global
Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=
resolv.conf mode: stub
Link 2 (eth0)
Current Scopes: DNS
Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=
Current DNS Server: 10.37.64.2
DNS Servers: 10.37.64.2
DNS Domain: us-east-
$ resolvectl query www.google.com.
www.google.com.: 2607:f8b0:
-- Information acquired via protocol DNS in 3.3ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network
$ resolvectl query instance-data
instance-data: 169.254.169.254 -- link: eth0
-- Information acquired via protocol DNS in 24.8ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network
$ resolvectl query instance-data.
instance-data.: resolve call failed: No appropriate name servers or networks for name found
The three queries show that a fully qualified name with a trailing '.' resolves correctly, that "instance-data" resolves when the search domain is applied, and that "instance-data." fails because "instance-data" on its own is not a fully qualified name.
This results in failed lookups and UrlErrors bubbling up to the cloud-init.log:
2023-10-11 03:39:59,555 - util.py[DEBUG]: Resolving URL: http://
2023-10-11 03:39:59,555 - util.py[DEBUG]: Resolving URL: http://[fd00:ec2::254] took 0.000 seconds
2023-10-11 03:39:59,555 - util.py[DEBUG]: Resolving URL: http://
2023-10-11 03:39:59,555 - DataSourceEc2.
2023-10-11 03:39:59,556 - DataSourceEc2.
2023-10-11 03:39:59,556 - url_helper.
2023-10-11 03:39:59,707 - url_helper.
2023-10-11 03:40:49,557 - url_helper.
2023-10-11 03:40:49,557 - url_helper.
2023-10-11 03:40:50,559 - url_helper.
2023-10-11 03:40:50,712 - url_helper.
2023-10-11 03:41:58,562 - url_helper.
Note that cloud-init almost certainly won't actually connect to port 8773, but it should at least be able to resolve the hostname. The DataSourceEc2 class will filter out metadata servers that don't resolve, but it feels like "instance-data" is being dropped from consideration unnecessarily.
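To illustrate the filtering effect described above, here is a minimal sketch. It is not cloud-init's actual code: `resolvable_urls` and its `resolver` parameter are hypothetical names, and the only real calls used are `socket.getaddrinfo` and `urllib.parse.urlsplit`. It keeps a URL only if its host resolves, so a name that fails lookup solely because of its trailing dot is dropped.

```python
import socket
from urllib.parse import urlsplit


def resolvable_urls(urls, resolver=socket.getaddrinfo):
    """Return the URLs whose host resolves; drop those whose lookup fails.

    `resolver` is injectable (hypothetical parameter) so the behaviour can
    be exercised without a live DNS server.
    """
    kept = []
    for url in urls:
        # hostname strips brackets from IPv6 literals but keeps a trailing dot.
        host = urlsplit(url).hostname
        try:
            resolver(host, None)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        kept.append(url)
    return kept
```

On an instance configured like the one in this report, one would expect `http://instance-data.` to be filtered out while `http://instance-data` is kept, although the real result depends on the local resolver configuration.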
(I wonder if this was originally done to support EC2-Classic? Detection of Classic instances is handled elsewhere, and AWS dropped support for Classic networking in 2022, having migrated all such instances to a VPC. So if "instance-data." is a remnant of that era, it should be migrated as well by removing the trailing dot.)
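The suggested change amounts to dropping the trailing dot from the hostname. As a rough sketch of that normalization, the hypothetical helper `drop_trailing_dot` below (not part of cloud-init) rewrites a URL so that its host is no longer marked as fully qualified. It assumes URLs without user-info, as in the metadata list, and the example URLs are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit


def drop_trailing_dot(url):
    """Return `url` with any trailing '.' removed from its hostname."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith("."):
        # Nothing to do (this also covers IPv6 literals such as [fd00:ec2::254]).
        return url
    # Rebuild the network location; user-info is not preserved (assumption:
    # metadata URLs never carry credentials).
    netloc = host.rstrip(".")
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```

With this helper, `drop_trailing_dot("http://instance-data.")` yields a name that the system resolver can complete with the search domain, which is what the `resolvectl query instance-data` output above relies on.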