cloud-init on EC2 trying to resolve wrong name for IMDS

Bug #2039723 reported by Tired Sysadmin
Affects              Status  Importance  Assigned to  Milestone
cloud-init (Ubuntu)  New     Undecided   Unassigned

Bug Description

The output below is from cloud-init 23.2.2-0ubuntu0~22.04.1, although the relevant upstream code has not changed.

The metadata_urls list in cloud-init/cloudinit/sources/DataSourceEc2.py consists of:

    "http://169.254.169.254",
    "http://[fd00:ec2::254]",
    "http://instance-data.:8773",

The hostname in the last entry is meant to be served by the AWS VPC-local DNS server, and it always resolves to 169.254.169.254. However, the name as written in the list above, "instance-data.", ends in a '.', which marks it as fully qualified, and in AWS that name is not fully qualified. It only resolves once the region-specific local domain is appended:

[This is on an EC2 instance in a 10.37.64.0/22 network. Thus, the AWS DNS server is at 10.37.64.2.]

$ resolvectl status
Global
       Protocols: -LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
resolv.conf mode: stub

Link 2 (eth0)
    Current Scopes: DNS
         Protocols: +DefaultRoute +LLMNR -mDNS -DNSOverTLS DNSSEC=no/unsupported
Current DNS Server: 10.37.64.2
       DNS Servers: 10.37.64.2
        DNS Domain: us-east-2.compute.internal

$ resolvectl query www.google.com.
www.google.com.: 2607:f8b0:4009:81b::2004 -- link: eth0
                 142.250.191.228 -- link: eth0
                 (www.google.com)

-- Information acquired via protocol DNS in 3.3ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network

$ resolvectl query instance-data
instance-data: 169.254.169.254 -- link: eth0
               (instance-data.us-east-2.compute.internal)

-- Information acquired via protocol DNS in 24.8ms.
-- Data is authenticated: no; Data was acquired via local or encrypted transport: no
-- Data from: network

$ resolvectl query instance-data.
instance-data.: resolve call failed: No appropriate name servers or networks for name found

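The three queries show that a genuinely fully-qualified name still resolves with a trailing '.', that "instance-data" resolves only through the search domain (as instance-data.us-east-2.compute.internal), and that "instance-data." therefore fails: it is not a fully-qualified name.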
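To make the difference concrete, here is a minimal sketch of how a resolver might expand a name using its search list. It is a simplified model, not systemd-resolved's actual logic: it ignores ndots and other resolver options, and the function name candidate_names is made up for this illustration. The search domain is the one from the resolvectl status output above.

```python
def candidate_names(name, search_domains):
    """Return the absolute names a resolver would try, in order (simplified)."""
    if name.endswith("."):
        # A trailing dot marks the name as already absolute, so the
        # search list is never consulted.
        return [name]
    # Relative name: try each search domain first, then the bare name.
    expanded = [f"{name}.{domain}." for domain in search_domains]
    return expanded + [f"{name}."]


search = ["us-east-2.compute.internal"]
print(candidate_names("instance-data", search))
print(candidate_names("instance-data.", search))
```

Under this model, only the form without the trailing dot ever produces instance-data.us-east-2.compute.internal as a candidate, which matches the resolvectl results above.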

This results in failed lookups and UrlErrors bubbling up to the cloud-init.log:

2023-10-11 03:39:59,555 - util.py[DEBUG]: Resolving URL: http://169.254.169.254 took 0.001 seconds
2023-10-11 03:39:59,555 - util.py[DEBUG]: Resolving URL: http://[fd00:ec2::254] took 0.000 seconds
2023-10-11 03:39:59,555 - util.py[DEBUG]: Resolving URL: http://instance-data.:8773 took 0.000 seconds
2023-10-11 03:39:59,555 - DataSourceEc2.py[DEBUG]: Removed the following from metadata urls: ['http://instance-data.:8773']
2023-10-11 03:39:59,556 - DataSourceEc2.py[DEBUG]: Fetching Ec2 IMDSv2 API Token
2023-10-11 03:39:59,556 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/latest/api/token' with {'url': 'http://169.254.169.254/latest/api/token', 'stream': False, 'allow_redirects': True, 'method': 'PUT', 'timeout': 50.0, 'headers': {'User-Agent': 'Cloud-Init/23.2.2-0ubuntu0~22.04.1', 'X-aws-ec2-metadata-token-ttl-seconds': 'REDACTED'}} configuration
2023-10-11 03:39:59,707 - url_helper.py[DEBUG]: [0/1] open 'http://[fd00:ec2::254]/latest/api/token' with {'url': 'http://[fd00:ec2::254]/latest/api/token', 'stream': False, 'allow_redirects': True, 'method': 'PUT', 'timeout': 50.0, 'headers': {'User-Agent': 'Cloud-Init/23.2.2-0ubuntu0~22.04.1', 'X-aws-ec2-metadata-token-ttl-seconds': 'REDACTED'}} configuration
2023-10-11 03:40:49,557 - url_helper.py[WARNING]: Calling 'None' failed [50/120s]: unexpected error [sequence item 0: expected str instance, UrlError found]
2023-10-11 03:40:49,557 - url_helper.py[DEBUG]: Please wait 1 seconds while we wait to try again
2023-10-11 03:40:50,559 - url_helper.py[DEBUG]: [0/1] open 'http://169.254.169.254/latest/api/token' with {'url': 'http://169.254.169.254/latest/api/token', 'stream': False, 'allow_redirects': True, 'method': 'PUT', 'timeout': 68.0, 'headers': {'User-Agent': 'Cloud-Init/23.2.2-0ubuntu0~22.04.1', 'X-aws-ec2-metadata-token-ttl-seconds': 'REDACTED'}} configuration
2023-10-11 03:40:50,712 - url_helper.py[DEBUG]: [0/1] open 'http://[fd00:ec2::254]/latest/api/token' with {'url': 'http://[fd00:ec2::254]/latest/api/token', 'stream': False, 'allow_redirects': True, 'method': 'PUT', 'timeout': 68.0, 'headers': {'User-Agent': 'Cloud-Init/23.2.2-0ubuntu0~22.04.1', 'X-aws-ec2-metadata-token-ttl-seconds': 'REDACTED'}} configuration
2023-10-11 03:41:58,562 - url_helper.py[WARNING]: Calling 'None' failed [119/120s]: unexpected error [sequence item 0: expected str instance, UrlError found]
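As an aside, the "sequence item 0: expected str instance, UrlError found" text in the warnings has the shape of the TypeError that Python's str.join raises when it is handed non-string items. The log does not show where in cloud-init this happens, so the following is only a sketch of how such a message can arise. The UrlError class here is a stand-in defined for the example, and summarize and summarize_safely are hypothetical names, not cloud-init functions.

```python
class UrlError(IOError):
    """Stand-in for cloud-init's UrlError; defined here for illustration only."""


def summarize(failures):
    # str.join requires every item to be a str; exception objects are not.
    return ", ".join(failures)


def summarize_safely(failures):
    # Converting each item to str first avoids the TypeError.
    return ", ".join(str(f) for f in failures)


failures = [UrlError("timed out")]
try:
    summarize(failures)
except TypeError as exc:
    message = str(exc)
    print(message)

print(summarize_safely(failures))
```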

Note that it almost certainly won't actually connect to port 8773, but it should at least be able to resolve the hostname. The DataSourceEc2 class will filter out metadata servers that don't resolve, but it feels like "instance-data" is being dropped from consideration unnecessarily.
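The filtering step can be sketched roughly as follows. This is not the DataSourceEc2 code, just a minimal illustration of "drop URLs whose host does not resolve". The names system_resolves, split_resolvable and fake_vpc_resolver are invented for the example, and the fake resolver simply hard-codes the resolvectl results shown earlier so the sketch can run without access to the AWS DNS server.

```python
import socket
from urllib.parse import urlparse


def system_resolves(hostname):
    """Return True if the system resolver finds an address for hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def split_resolvable(urls, resolves=system_resolves):
    """Partition urls into (kept, removed) by whether their host resolves."""
    kept, removed = [], []
    for url in urls:
        host = urlparse(url).hostname
        (kept if host and resolves(host) else removed).append(url)
    return kept, removed


def fake_vpc_resolver(hostname):
    # Assumption: mimics the results in this report, where IP literals and
    # the relative name resolve but the trailing-dot form does not.
    return hostname in {"169.254.169.254", "fd00:ec2::254", "instance-data"}


urls = [
    "http://169.254.169.254",
    "http://[fd00:ec2::254]",
    "http://instance-data.:8773",
]
kept, removed = split_resolvable(urls, fake_vpc_resolver)
print("kept:", kept)
print("removed:", removed)
```

With the same fake resolver, "http://instance-data:8773" (no trailing dot) would be kept rather than removed, which is the behaviour change this report is asking about.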

(I wonder if this was originally done to support EC2-Classic? Detection of Classic instances is handled elsewhere, and AWS dropped support for Classic networking in 2022, having migrated all such instances to a VPC. So if "instance-data." is a remnant of that era, it should be migrated too, by removing the trailing dot.)
