Add virtual host style bucket support

Bug #1871745 reported by Andrey Grebennikov
This bug affects 4 people
Affects: Ceph RADOS Gateway Charm
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned

Bug Description

The S3 API provides two mechanisms for accessing buckets: path style requests and virtual host style requests. The charm today supports path style requests but does not support virtual host style requests out of the box.

One option to overcome this today is to use the charm's config-flags and specify the "rgw resolve cname" and "rgw dns name" options.

For example:

$ juju config ceph-radosgw config-flags="{'global': {'rgw resolve cname': 'true', 'rgw dns name': 's3.project.serverstack'}}"

One use case for this is to provide static s3 website hosting.
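
With those options set, an S3 client can use virtual host style addressing against the gateway (provided DNS and certificates cover the per-bucket names, as discussed below). A minimal sketch with boto3, assuming the endpoint from the example above and placeholder credentials:

import boto3
from botocore.client import Config

# Placeholder credentials; the endpoint matches the config-flags example above.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.project.serverstack",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    config=Config(s3={"addressing_style": "virtual"}),
)

# Requests go to https://my-bucket.s3.project.serverstack/... which only works
# when the gateway recognises the per-bucket hostname ("rgw dns name").
s3.list_objects_v2(Bucket="my-bucket")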

======== Previous Description ==========
In order for the RadosGW to properly serve S3 requests via HTTPS, the config should contain the following options:

rgw resolve cname = true
rgw dns name = ceph-s3.com    # DNS name of the S3 endpoint

Currently the charm doesn't support these settings.

https://www.redhat.com/en/blog/https-ization-ceph-object-storage-public-endpoint

Changed in charm-ceph-radosgw:
status: New → Triaged
importance: Undecided → Wishlist
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

I took a look at this in detail and here is what I found:

There are two styles of S3 endpoints that RadosGW supports, mimicking AWS:

* path style (marked as preferred in the rgw docs) https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html#path-style-access
  * https://rgw.example.com/<bucket-name>/<key-name>
* virtualhost style https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html#virtual-hosted-style-access
  * https://<bucket-name>.rgw.example.com/<key-name>

Additionally, there are website endpoints which optionally add another component to an FQDN of a bucket:

https://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteEndpoints.html
http://<bucket-name>.<custom-s3-website-name>.rgw.example.com

This is also supported in radosgw:
https://github.com/ceph/ceph/blob/v16.0.0/src/common/legacy_config_opts.h#L1288

The options described affect the way radosgw looks up a bucket based on request parameters; certain lookup code paths are skipped if the required config is not set.

So I think there are several distinct items here:

1) set rgw_dns_name correctly:

ceph.conf needs the following to make this work:

rgw dns name = {{ os_public_hostname }}

NOTE: a single hostname is taken from the rgw_dns_name config; other hostnames are taken from the zone group "hostnames" config, which can be updated dynamically for a zone group (maybe we can use this to include os_internal_hostname and os_admin_hostname too):
https://github.com/ceph/ceph/blob/v16.0.0/src/rgw/rgw_rest.cc#L209-L210

"rgw dns name" will automatically be included into a zone group's "hostnames" config upon a restart:
https://docs.ceph.com/docs/master/radosgw/multisite/#set-a-zone-group

2) Attempt to resolve the "Host" header in an HTTP request as a CNAME record (this supports "vanity domains" and could possibly handle multi-site failover by creating CNAME RRs named after a remote site but pointing to the local site):

rgw resolve cname = true in ceph.conf;

For example:

CNAME RR: { NAME: the-best-bucket.example, RDATA: bucket-42.rgw.example }

An HTTP client request will contain "Host: the-best-bucket.example"; radosgw will resolve it to "bucket-42.rgw.example" via the CNAME record and find the right bucket.

https://github.com/ceph/ceph/blob/v16.0.0/src/rgw/rgw_rest.cc#L2082-L2098
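
To illustrate how the hostname set from item 1 and the CNAME resolution from item 2 combine into a bucket lookup, here is a pure-Python sketch (it is not the actual radosgw code; the hostnames and the stub CNAME table are made up, mirroring the example above):

# Hostnames the gateway treats as its own: rgw_dns_name plus the
# zone group's "hostnames" list (see item 1).
KNOWN_HOSTNAMES = {"rgw.example", "s3.project.serverstack"}

# Stand-in for the DNS lookup performed when "rgw resolve cname = true".
CNAME_TABLE = {"the-best-bucket.example": "bucket-42.rgw.example"}

def lookup_bucket(host_header):
    """Return the bucket implied by the Host header, or None for path style."""
    host = host_header.lower().rstrip(".")
    # 1) vanity domain: follow the CNAME first (item 2)
    host = CNAME_TABLE.get(host, host)
    # 2) strip a known hostname suffix; what remains is the bucket name
    for known in KNOWN_HOSTNAMES:
        if host == known:
            return None  # path style request; the bucket is in the URL path
        if host.endswith("." + known):
            return host[: -(len(known) + 1)]
    return None

assert lookup_bucket("the-best-bucket.example") == "bucket-42"
assert lookup_bucket("bucket-7.s3.project.serverstack") == "bucket-7"
assert lookup_bucket("s3.project.serverstack") is None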

3) support the vhost style of addressing S3 buckets

GET / HTTP/1.1
Host: bucket-key.rgw.example.com

vs

GET /bucket-key HTTP/1.1
Host: rgw.example.com

For that, there is more work to do:

* request wildcard certificates from vault for a subdomain such as *.rgw.example.com or *.<os-public-hostname>;
* document that the wildcard certificates are needed when ssl_ca option is used instead of Vault;
* validate, functionally test and document the steps so that this works in a multi-site replication configuration.

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Based on prior discussions we are going to implement the path-based approach only for now.

Implementation considerations:

* Deployments where os-{public,admin,internal} hostnames are not set are supported and this situation needs to be handled too. "rgw_dns_name" config does not support IP addresses or multiple hostnames;

* Both single-unit and HA deployments are supported (where units have different hostnames and can be multi-homed where each IP has its own hostname);

* not every environment has a proper DNS setup but the ones we test and deploy in mostly do (MAAS, OpenStack with ML2 DNS);

* there are multiple hostname config options supported by the charm and all of them should be added to the zone group after it is created;

* "rgw_dns_name" is only added to the in-memory set of hostnames that a given radosgw daemon looks at, not to the zone group config: https://github.com/ceph/ceph/blob/v16.0.0/src/rgw/rgw_rest.cc#L209-L210

This means that if we want to manage multiple hostnames in the zone group state (public, internal, admin) we need to add all of them and not just rely on config.

Whether the addition of hostnames to the zone group is idempotent is to be determined (this is required for error handling and for deciding which unit is going to manage the zone group config); see the sketch after this list for a read-merge-write flow that is idempotent by construction.

* One config option needs to be added to the charm (resolve-cname=<true|false>);

* The following approach for handling hostnames could be adopted:

1. # obtain unit_ip_in_the_public_network_space via network-get and do a reverse resolution
# unit_public_hostname = socket.getnameinfo((unit_ip_in_the_public_network_space, 0), 0)[0]
rgw dns name = {{ unit_public_hostname }}

2. add the values of os-{public,admin,internal}-hostname config keys into the zone group specified in the "zone-group" charm config (when it is created).
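
A rough Python sketch of steps 1 and 2 above (this is not charm code; it assumes radosgw-admin's documented "zonegroup get", "zonegroup set" and "period update --commit" behaviour, and the hostname list passed in would come from the os-{public,admin,internal}-hostname options plus the unit's reverse-resolved public address):

import json
import socket
import subprocess

def unit_public_hostname(unit_ip):
    """Reverse-resolve the unit's public-network-space address (step 1)."""
    return socket.getnameinfo((unit_ip, 0), 0)[0]

def ensure_zonegroup_hostnames(zonegroup, wanted):
    """Idempotently add the wanted hostnames to the zone group (step 2)."""
    current = json.loads(subprocess.check_output(
        ["radosgw-admin", "zonegroup", "get", "--rgw-zonegroup=" + zonegroup]))
    merged = sorted(set(current.get("hostnames", [])) | set(wanted))
    if merged == sorted(current.get("hostnames", [])):
        return  # already in sync, safe to call repeatedly from any unit
    current["hostnames"] = merged
    subprocess.run(
        ["radosgw-admin", "zonegroup", "set", "--rgw-zonegroup=" + zonegroup],
        input=json.dumps(current).encode(), check=True)
    subprocess.run(["radosgw-admin", "period", "update", "--commit"], check=True)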

Testing considerations:

1. A scenario without HA:

* deploy a bundle with a single unit of ceph-radosgw and tls enabled (via vault) and the resolve-cname configuration option set to "true";
* create a bucket via the S3 API;
* read a hostname from a radosgw unit (via `hostname -f` over juju run) and access the bucket using the hostname of the radosgw unit: https://<unit-hostname-in-the-public-space>/bucket-name;

2. The HA scenario:

* deploy a bundle with 3 units of ceph-radosgw and hacluster with tls enabled (via vault), the resolve-cname configuration option set to "true", and os-public-hostname set to some value (this needs DNS configuration, which is tricky in the CI environment; alternatively, /etc/hosts on the units in the model could be modified via the Zaza test code);
* create a bucket via the S3 API;
* read the hostname from a radosgw unit (via hostname -f over juju run) and access the bucket using a hostname of a single unit in a URL: https://<unit-hostname-in-the-public-space>/bucket-name;
* access the bucket via os-public-hostname used in the URL: https://<os-public-hostname>/bucket-name (a minimal sketch of these access checks follows);
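
A minimal sketch of the access checks in both scenarios (not actual Zaza test code; the hostnames, credentials and CA path are placeholders):

import boto3

CREDS = dict(aws_access_key_id="ACCESS_KEY",        # placeholder
             aws_secret_access_key="SECRET_KEY")    # placeholder
BUCKET = "vhost-style-test"

# 1. create the bucket via the S3 API
boto3.client("s3", endpoint_url="https://s3.project.serverstack",
             verify="/path/to/vault-ca.pem", **CREDS).create_bucket(Bucket=BUCKET)

# 2. access the same bucket through each hostname under test: a unit's FQDN
#    (from `hostname -f` over juju run) and, for HA, os-public-hostname.
for hostname in ["unit-0.project.serverstack", "s3.project.serverstack"]:
    client = boto3.client("s3", endpoint_url="https://" + hostname,
                          verify="/path/to/vault-ca.pem", **CREDS)
    client.head_bucket(Bucket=BUCKET)  # raises if the gateway cannot find the bucket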

Revision history for this message
Billy Olsen (billy-olsen) wrote :

I have deployed a 3-node radosgw cluster with TLS configured and I am able to create buckets, etc. via s3cmd just fine.

I believe the s3cfg file just needs to be properly set up in order to use the s3 interface to a cluster with https configured.

Since the charm doesn't set up the radosgw to handle the hostname bucket access method (e.g. %(bucket)s.s3.example.com), the client will not be able to access it this way. If using the s3cmd client, this can be handled by configuring the host_bucket parameter to not use %(bucket)s in the configuration, e.g.:

host_base = s3.project.serverstack
host_bucket = s3.project.serverstack
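
The boto3 equivalent of that workaround is to force path style addressing on the client (a sketch; the endpoint and credentials are placeholders):

import boto3
from botocore.client import Config

# Path style requests (https://s3.project.serverstack/<bucket>/<key>) avoid
# the need for per-bucket hostnames; credentials are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.project.serverstack",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    config=Config(s3={"addressing_style": "path"}),
)
s3.list_objects_v2(Bucket="my-bucket")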

Host buckets (bucket names as part of the domain name) are interesting and should be considered for a future improvement. I'm going to update the title and description of the bug to make this a bit more clear.

However, to be clear: s3 access with TLS enabled does work against a charmed ceph-radosgw deployment.

summary: - The charm needs to support proper https/s3 settings
+ Add virtual host style bucket support
Revision history for this message
Billy Olsen (billy-olsen) wrote :

For the future, should s3 website hosting come up: https://gist.github.com/robbat2/ec0a66eed28e5f0e1ef7018e9c77910c

description: updated
Revision history for this message
Tom Haddon (mthaddon) wrote :

IS would be interested in this as well, as there are a number of applications (e.g. Discourse) which require this. We can use S3 for now but it would be nice to be able to use radosgw.
