Download large objects concurrently
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
python-swiftclient | Confirmed | Wishlist | Unassigned |
Bug Description
When uploading a segmented object, we use concurrent connections to boost throughput. We should do something similar during downloads. There are at least two ways we could do this:
Option 1: we could do something like what we do for --skip-identical -- tack on a ?multipart-manifest=get query parameter to fetch the manifest, then issue a concurrent GET for each segment it lists.
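For concreteness, here's a rough sketch of what option 1 might look like, assuming an already-configured swiftclient Connection; `download_slo`, `conn_kwargs`, and the worker count are hypothetical names for illustration, not existing swiftclient API:

```python
import json
from concurrent.futures import ThreadPoolExecutor

from swiftclient.client import Connection


def download_slo(conn_kwargs, container, obj, out_path, workers=4):
    # Fetch the manifest itself rather than the concatenated object.
    conn = Connection(**conn_kwargs)
    _headers, body = conn.get_object(
        container, obj, query_string='multipart-manifest=get')
    manifest = json.loads(body)

    # Each manifest entry carries the segment's path ("name") and size
    # ("bytes"), so we can compute where every segment lands in the file.
    offsets, total = [], 0
    for entry in manifest:
        offsets.append(total)
        total += entry['bytes']

    with open(out_path, 'wb') as f:
        f.truncate(total)  # preallocate; sparse on most filesystems

    def fetch_segment(job):
        offset, entry = job
        # One connection per worker so the GETs actually run in parallel.
        seg_conn = Connection(**conn_kwargs)
        seg_container, seg_name = entry['name'].lstrip('/').split('/', 1)
        _h, data = seg_conn.get_object(seg_container, seg_name)
        with open(out_path, 'r+b') as f:
            f.seek(offset)
            f.write(data)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fetch_segment, zip(offsets, manifest)))
```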
Option 2: issue the GET normally, then check the Content-Length of the object we're getting. If it's larger than some threshold (1GB? probably want it configurable), issue several ranged GETs and close the original connection after reading the first "segment". Note that this segment size may not line up with the segment size used during upload.
The second option has the benefit of not needing any special knowledge about the object, which means (1) it should work for any new types of large objects we might come up with and (2) it may be beneficial even for regular objects, as the subsequent connections may pull from different replicas. However, it will litter operators' swift logs with "Client disconnected" warnings.
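Below is a minimal sketch of option 2, written with plain requests against an assumed storage URL and X-Auth-Token rather than swiftclient internals; the 1 GiB threshold and 256 MiB range size are illustrative placeholders, not existing options:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

THRESHOLD = 1024 ** 3          # only parallelize objects larger than ~1 GiB
RANGE_SIZE = 256 * 1024 ** 2   # bytes fetched per connection


def download(url, token, out_path, workers=4):
    headers = {'X-Auth-Token': token}
    resp = requests.get(url, headers=headers, stream=True)
    total = int(resp.headers['Content-Length'])

    if total <= THRESHOLD:
        # Small object: just stream the response we already have.
        with open(out_path, 'wb') as f:
            for chunk in resp.iter_content(64 * 1024):
                f.write(chunk)
        return

    # Read the first "segment" from the original connection, then abandon
    # it -- this close is what logs "Client disconnected" on the servers.
    with open(out_path, 'wb') as f:
        f.truncate(total)
        done = 0
        for chunk in resp.iter_content(64 * 1024):
            f.write(chunk)
            done += len(chunk)
            if done >= RANGE_SIZE:
                break
    resp.close()

    def fetch_range(start):
        end = min(start + RANGE_SIZE, total) - 1
        r = requests.get(url, stream=True,
                         headers=dict(headers,
                                      Range='bytes=%d-%d' % (start, end)))
        with open(out_path, 'r+b') as f:
            f.seek(start)
            for chunk in r.iter_content(64 * 1024):
                f.write(chunk)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fetch_range, range(done, total, RANGE_SIZE)))
```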
I don't think the Range-request approach is likely to yield much in the way of results - the backend isn't doing anything to load-balance the disks servicing the requests - if you make three requests, you're just as likely to stomp on yourself as to hit a different replica on a new disk.
If swiftclient wants to download the entirety of an SLO, it seems not unreasonable to me that, on a GET response bearing the X-Static-Large-Object header, we could weigh the tradeoff of closing the current connection and instead downloading each segment into the gaps of a preallocated sparse file.
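To make the sparse-file idea concrete, here's a small sketch of the mechanics, assuming a POSIX filesystem; `preallocate` and `write_segment` are hypothetical helpers:

```python
import os


def preallocate(path, size):
    # truncate() extends the file to `size` bytes without allocating
    # blocks on most filesystems, leaving a "hole" the segments fill in.
    with open(path, 'wb') as f:
        f.truncate(size)


def write_segment(path, offset, data):
    # os.pwrite writes at an absolute offset without moving a shared
    # file position, so concurrent segment downloads can't interleave
    # and can complete in any order.
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, offset)
    finally:
        os.close(fd)
```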