Error when using S3 multipart upload - TypeError: cannot use a string pattern on a bytes-like object

Bug #1930640 reported by Jon Yoon
This bug affects 1 person
Affects             Status   Importance  Assigned to  Milestone
Duplicity           Invalid  Medium      Unassigned
duplicity (Ubuntu)  New      Undecided   Unassigned

Bug Description

I followed the solution in https://bugs.launchpad.net/ubuntu/+source/duplicity/+bug/1908971 (I was running duplicity 0.8.11.1612-1 as well), but after upgrading to 0.8.19, the problem still persists for me when trying to use S3 multipart uploads.

Note: This problem doesn't appear to exist in duplicity 0.7.17 running on python2.7, which is making me wonder if this is a python3.8 related issue.

user@host:~$ lsb_release -rd
Description: Ubuntu 20.04.2 LTS
Release: 20.04

user@host:~$ apt-cache policy duplicity
duplicity:
  Installed: 0.8.19-ppa202104291804~ubuntu20.04.1
  Candidate: 0.8.19-ppa202104291804~ubuntu20.04.1
  Version table:
 *** 0.8.19-ppa202104291804~ubuntu20.04.1 500
        500 http://ppa.launchpad.net/duplicity-team/duplicity-release-git/ubuntu focal/main amd64 Packages
        100 /var/lib/dpkg/status
     0.8.11.1612-1 500
        500 http://us-west-2.ec2.archive.ubuntu.com/ubuntu focal/main amd64 Packages

Start duply v2.2, time is 2021-06-02 22:58:11.
Using profile '/root/.duply/artifactory_backup'.
Using installed duplicity version 0.8.19, python 3.8.5 (/usr/bin/python3), gpg 2.2.19 (Home: /root/.gnupg), awk 'GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)', grep 'grep (GNU grep) 3.4', bash '5.0.17(1)-release (x86_64-pc-linux-gnu)'.
Checking TEMP_DIR '/tmp' is a folder and writable (OK)
Test - En/Decryption skipped. (GPG disabled)

--- Start running command PRE at 22:58:11.550 ---
Skipping n/a script '/root/.duply/artifactory_backup/pre'.
--- Finished state OK at 22:58:11.563 - Runtime 00:00:00.013 ---

--- Start running command BKP at 22:58:11.595 ---
Using archive dir: /root/.cache/duplicity/duply_artifactory_backup
Using backup name: duply_artifactory_backup
GPG binary is gpg, version (2, 2, 19)
Import of duplicity.backends.adbackend Succeeded
Import of duplicity.backends.azurebackend Succeeded
Import of duplicity.backends.b2backend Succeeded
Import of duplicity.backends.boxbackend Failed: No module named 'boxsdk'
Import of duplicity.backends.cfbackend Succeeded
Import of duplicity.backends.dpbxbackend Succeeded
Import of duplicity.backends.gdocsbackend Succeeded
Import of duplicity.backends.gdrivebackend Succeeded
Import of duplicity.backends.giobackend Succeeded
Import of duplicity.backends.hsibackend Succeeded
Import of duplicity.backends.hubicbackend Succeeded
Import of duplicity.backends.idrivedbackend Succeeded
Import of duplicity.backends.imapbackend Succeeded
Import of duplicity.backends.jottacloudbackend Succeeded
Import of duplicity.backends.lftpbackend Succeeded
Import of duplicity.backends.localbackend Succeeded
Import of duplicity.backends.mediafirebackend Succeeded
Import of duplicity.backends.megabackend Succeeded
Import of duplicity.backends.megav2backend Succeeded
Import of duplicity.backends.megav3backend Succeeded
Import of duplicity.backends.multibackend Succeeded
Import of duplicity.backends.ncftpbackend Succeeded
Import of duplicity.backends.onedrivebackend Succeeded
Import of duplicity.backends.par2backend Succeeded
Import of duplicity.backends.pcabackend Succeeded
Import of duplicity.backends.pydrivebackend Succeeded
Import of duplicity.backends.rclonebackend Succeeded
Import of duplicity.backends.rsyncbackend Succeeded
Import of duplicity.backends.s3_boto3_backend Succeeded
Multiprocessing is not supported on linux, will use threads instead.
Import of duplicity.backends.s3_boto_backend Succeeded
Import of duplicity.backends.ssh_paramiko_backend Succeeded
Import of duplicity.backends.ssh_pexpect_backend Succeeded
Import of duplicity.backends.swiftbackend Succeeded
Import of duplicity.backends.sxbackend Succeeded
Import of duplicity.backends.tahoebackend Succeeded
Import of duplicity.backends.webdavbackend Succeeded
Setting multipart boto backend process pool to 4 processes
Reading globbing filelist /root/.duply/artifactory_backup/exclude
Main action: inc
Acquiring lockfile b'/root/.cache/duplicity/duply_artifactory_backup/lockfile'
================================================================================
duplicity 0.8.19
Args: /usr/bin/duplicity --name duply_artifactory_backup --no-encryption --verbosity 9 --full-if-older-than 7D --volsize 1024 --s3-use-multiprocessing --s3-multipart-chunk-size 512 --allow-source-mismatch --exclude-filelist /root/.duply/artifactory_backup/exclude /MyDirectory s3://s3-us-west-2.amazonaws.com/BUCKET-NAME/artifactory_backup
Linux ip-172-30-115-92 5.4.0-1047-aws #49-Ubuntu SMP Wed Apr 28 22:47:04 UTC 2021 x86_64 x86_64
/usr/bin/python3 3.8.5 (default, May 27 2021, 13:30:53)
[GCC 9.3.0]

===============

Log errors:

Attempt 4 failed. BackendException: Multipart upload failed. Aborted.
Writing duplicity-full.20210602T225820Z.vol1.difftar.gz
Uploading s3://s3-us-west-2.amazonaws.com/BUCKET-NAME/artifactory_backup/duplicity-full.20210602T225820Z.vol1.difftar.gz to STANDARD Storage
Uploading 1073705932 bytes in 2 chunks
Waiting for the pool to finish processing 2 tasks
Thread-29: Uploading chunk 1
Thread-30: Uploading chunk 2
Thread-30: Upload of chunk 2 failed. Retrying 4 more times...
Thread-30: Uploading chunk 2
Thread-29: Upload of chunk 1 failed. Retrying 4 more times...
Thread-29: Uploading chunk 1
Thread-30: Upload of chunk 2 failed. Retrying 3 more times...
Thread-30: Uploading chunk 2
Thread-29: Upload of chunk 1 failed. Retrying 3 more times...
Thread-29: Uploading chunk 1
Thread-30: Upload of chunk 2 failed. Retrying 2 more times...
Thread-30: Uploading chunk 2
Thread-29: Upload of chunk 1 failed. Retrying 2 more times...
Thread-29: Uploading chunk 1
Thread-30: Upload of chunk 2 failed. Retrying 1 more times...
Thread-30: Uploading chunk 2
Thread-29: Upload of chunk 1 failed. Retrying 1 more times...
Thread-29: Uploading chunk 1
Thread-30: Upload of chunk 2 failed. Retrying 0 more times...
Thread-30: Uploading chunk 2
Thread-29: Upload of chunk 1 failed. Retrying 0 more times...
Thread-29: Uploading chunk 1
Thread-30: Upload of chunk 2 failed. Aborting...
Thread-29: Upload of chunk 1 failed. Aborting...
Part upload not successful, aborting multipart upload.
A process pool already exists. Destroying previous pool.
Setting multipart boto backend process pool to 4 processes
Done waiting for the pool to finish processing
Backtrace of previous error: Traceback (innermost last):
  File "/usr/lib/python3/dist-packages/duplicity/backend.py", line 384, in inner_retry
    return fn(self, *args)
  File "/usr/lib/python3/dist-packages/duplicity/backend.py", line 555, in put
    self.__do_put(source_path, remote_filename)
  File "/usr/lib/python3/dist-packages/duplicity/backend.py", line 541, in __do_put
    self.backend._put(source_path, remote_filename)
  File "/usr/lib/python3/dist-packages/duplicity/backends/_boto_single.py", line 265, in _put
    self.upload(source_path.name, key, headers)
  File "/usr/lib/python3/dist-packages/duplicity/backends/_boto_multi.py", line 192, in upload
    raise BackendException(u"Multipart upload failed. Aborted.")
 duplicity.errors.BackendException: Multipart upload failed. Aborted.
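
The "2 chunks" figure in the log above is consistent with a simple ceiling division. The following is a minimal sketch, assuming `--s3-multipart-chunk-size 512` is interpreted as 512 MiB; the helper name `chunk_count` is hypothetical and this is not duplicity's own code.

```python
import math

MIB = 1024 * 1024  # assumption: the chunk-size option is given in MiB


def chunk_count(total_bytes: int, chunk_size_mib: int) -> int:
    """Number of parts needed to upload total_bytes in chunks of chunk_size_mib MiB."""
    chunk_bytes = chunk_size_mib * MIB
    # Always at least one part, even for a tiny file.
    return max(1, math.ceil(total_bytes / chunk_bytes))


volume_bytes = 1073705932  # volume size reported in the log
print(chunk_count(volume_bytes, 512))  # prints 2, matching "in 2 chunks"
```

With `--volsize 1024` each volume is just under 1 GiB, so a 512 MiB chunk size yields two parts per volume under this assumption.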

=========

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/duplicity/backends/_boto_multi.py", line 223, in _upload
    mp.upload_part_from_file(fd, offset + 1, cb=_upload_callback,
  File "/usr/lib/python3/dist-packages/boto/s3/multipart.py", line 257, in upload_part_from_file
    key.set_contents_from_file(fp, headers=headers, replace=replace,
  File "/usr/lib/python3/dist-packages/boto/s3/key.py", line 1307, in set_contents_from_file
    self.send_file(fp, headers=headers, cb=cb, num_cb=num_cb,
  File "/usr/lib/python3/dist-packages/boto/s3/key.py", line 760, in send_file
    self._send_file_internal(fp, headers=headers, cb=cb, num_cb=num_cb,
  File "/usr/lib/python3/dist-packages/boto/s3/key.py", line 932, in _send_file_internal
    self.content_type = mimetypes.guess_type(self.path)[0]
  File "/usr/lib/python3.8/mimetypes.py", line 292, in guess_type
    return _db.guess_type(url, strict)
  File "/usr/lib/python3.8/mimetypes.py", line 117, in guess_type
    scheme, url = urllib.parse._splittype(url)
  File "/usr/lib/python3.8/urllib/parse.py", line 1008, in _splittype
    match = _typeprog.match(url)
TypeError: cannot use a string pattern on a bytes-like object
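
The final frame of the traceback applies a compiled str pattern to a value that is evidently bytes. The sketch below reproduces that class of error in isolation. The pattern and the file name are illustrative assumptions, not the exact ones used by `urllib.parse` or duplicity.

```python
import re

# Illustrative str pattern in the spirit of a URL-scheme matcher;
# not the exact pattern used inside urllib.parse.
scheme_pattern = re.compile(r"^([^/:]+):")


def match_error(value):
    """Return the TypeError message raised by matching, or None if no error."""
    try:
        scheme_pattern.match(value)
        return None
    except TypeError as exc:
        return str(exc)


# Hypothetical file name, once as bytes and once as str.
bytes_result = match_error(b"example.vol1.difftar.gz")
str_result = match_error("example.vol1.difftar.gz")
print(bytes_result)  # cannot use a string pattern on a bytes-like object
print(str_result)    # None
```

The same name passed as str raises nothing, which suggests the failure depends on the type of the path reaching `mimetypes.guess_type`, not on its contents.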

Kenneth Loafman (kenneth-loafman) wrote :

It's not in duplicity, but in urllib. Do this to upgrade all modules we use:

    $ sudo python3 -m pip install -U -r requirements.txt

where requirements.txt is from:

    https://git.launchpad.net/duplicity/plain/requirements.txt

Let me know how it goes.

Changed in duplicity (Ubuntu):
status: New → Incomplete
Jon Yoon (jony-labkey) wrote :

Unfortunately, the same issue is coming up, even after installing the requirements via pip.

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/duplicity/backends/_boto_multi.py", line 223, in _upload
    mp.upload_part_from_file(fd, offset + 1, cb=_upload_callback,
  File "/usr/lib/python3/dist-packages/boto/s3/multipart.py", line 257, in upload_part_from_file
    key.set_contents_from_file(fp, headers=headers, replace=replace,
  File "/usr/lib/python3/dist-packages/boto/s3/key.py", line 1307, in set_contents_from_file
    self.send_file(fp, headers=headers, cb=cb, num_cb=num_cb,
  File "/usr/lib/python3/dist-packages/boto/s3/key.py", line 760, in send_file
    self._send_file_internal(fp, headers=headers, cb=cb, num_cb=num_cb,
  File "/usr/lib/python3/dist-packages/boto/s3/key.py", line 932, in _send_file_internal
    self.content_type = mimetypes.guess_type(self.path)[0]
  File "/usr/lib/python3.8/mimetypes.py", line 292, in guess_type
    return _db.guess_type(url, strict)
  File "/usr/lib/python3.8/mimetypes.py", line 117, in guess_type
    scheme, url = urllib.parse._splittype(url)
  File "/usr/lib/python3.8/urllib/parse.py", line 1008, in _splittype
    match = _typeprog.match(url)
TypeError: cannot use a string pattern on a bytes-like object

There were some errors in the install of packages via pip, but I don't think any of these would have caused this particular issue:

ERROR: launchpadlib 1.10.13 requires testresources, which is not installed.
ERROR: pyopenssl 20.0.1 has requirement cryptography>=3.2, but you'll have cryptography 2.8 which is incompatible.
ERROR: mediafire 0.6.0 has requirement requests<=2.11.1,>=2.4.1, but you'll have requests 2.25.1 which is incompatible.
ERROR: debtcollector 2.2.0 has requirement pbr!=2.1.0,>=2.0.0, but you'll have pbr 1.10.0 which is incompatible.
ERROR: oslo-i18n 5.0.1 has requirement pbr!=2.1.0,>=2.0.0, but you'll have pbr 1.10.0 which is incompatible.
ERROR: stevedore 3.3.0 has requirement pbr!=2.1.0,>=2.0.0, but you'll have pbr 1.10.0 which is incompatible.
ERROR: oslo-utils 4.9.0 has requirement pbr!=2.1.0,>=2.0.0, but you'll have pbr 1.10.0 which is incompatible.
ERROR: os-service-types 1.7.0 has requirement pbr!=2.1.0,>=2.0.0, but you'll have pbr 1.10.0 which is incompatible.
ERROR: keystoneauth1 4.3.1 has requirement pbr!=2.1.0,>=2.0.0, but you'll have pbr 1.10.0 which is incompatible.
ERROR: oslo-serialization 4.1.0 has requirement pbr!=2.1.0,>=2.0.0, but you'll have pbr 1.10.0 which is incompatible.
ERROR: python-keystoneclient 4.2.0 has requirement pbr!=2.1.0,>=2.0.0, but you'll have pbr 1.10.0 which is incompatible.

Shouldn't it also be urllib3, since the above is running on Python 3.8.5? I do have the urllib3 package:

pip3 install urllib3
Requirement already satisfied: urllib3 in /usr/lib/python3/dist-packages (1.25.8)

Kenneth Loafman (kenneth-loafman) wrote :

Yes, it should be urllib3. We keep the older boto+s3:// around because it is still needed.

Try using the boto3+s3:// backend instead.

Jon Yoon (jony-labkey) wrote :

I've tried using boto3+s3:// as well before and it was unsuccessful. Swapping it out with this newer version of duplicity produced an error regarding the HeadBucket operation.

See new failure below. I turned up the verbosity to 9 for this one.

Note: My company is using Duplicity 0.8.11 and Duply on other servers (running Ubuntu 20.04 with Python 3.8.5) without issue. However, this machine is the only one where we needed multipart chunks, because its backup is far larger than those on our other servers.

I know that on a different server (Ubuntu 16.04) running Duplicity 0.7.19 on Python 2.7, the multipart chunking worked without issue, but since we can't run that version of Python or Duplicity on this newer version of Ubuntu, I'm not sure what else we need to do to get 0.8.19 to work here.

====================

Start duply v2.2, time is 2021-06-04 21:26:37.
Using profile '/root/.duply/artifactory_backup'.
Using installed duplicity version 0.8.19, python 3.8.5 (/usr/bin/python3), gpg 2.2.19 (Home: /root/.gnupg), awk 'GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)', grep 'grep (GNU grep) 3.4', bash '5.0.17(1)-release (x86_64-pc-linux-gnu)'.
Checking TEMP_DIR '/tmp' is a folder and writable (OK)
Test - En/Decryption skipped. (GPG disabled)

--- Start running command PRE at 21:26:37.615 ---
Skipping n/a script '/root/.duply/artifactory_backup/pre'.
--- Finished state OK at 21:26:37.628 - Runtime 00:00:00.012 ---

--- Start running command BKP at 21:26:37.659 ---
Using archive dir: /root/.cache/duplicity/duply_artifactory_backup
Using backup name: duply_artifactory_backup
GPG binary is gpg, version (2, 2, 19)
Import of duplicity.backends.adbackend Succeeded
Import of duplicity.backends.azurebackend Succeeded
Import of duplicity.backends.b2backend Succeeded
Import of duplicity.backends.boxbackend Succeeded
Import of duplicity.backends.cfbackend Succeeded
Import of duplicity.backends.dpbxbackend Succeeded
Import of duplicity.backends.gdocsbackend Succeeded
Import of duplicity.backends.gdrivebackend Succeeded
Import of duplicity.backends.giobackend Succeeded
Import of duplicity.backends.hsibackend Succeeded
Import of duplicity.backends.hubicbackend Succeeded
Import of duplicity.backends.idrivedbackend Succeeded
Import of duplicity.backends.imapbackend Succeeded
Import of duplicity.backends.jottacloudbackend Succeeded
Import of duplicity.backends.lftpbackend Succeeded
Import of duplicity.backends.localbackend Succeeded
Import of duplicity.backends.mediafirebackend Succeeded
Import of duplicity.backends.megabackend Succeeded
Import of duplicity.backends.megav2backend Succeeded
Import of duplicity.backends.megav3backend Succeeded
Import of duplicity.backends.multibackend Succeeded
Import of duplicity.backends.ncftpbackend Succeeded
Import of duplicity.backends.onedrivebackend Succeeded
Import of duplicity.backends.par2backend Succeeded
Import of duplicity.backends.pcabackend Succeeded
Import of duplicity.backends.pydrivebackend Succeeded
Import of duplicity.backends.rclonebackend Succeeded
Import of duplicity.backends.rsyncbackend Succeeded
Import of duplicity.backends.s3_boto3_backe...


Changed in duplicity (Ubuntu):
status: Incomplete → In Progress
Changed in duplicity:
status: New → In Progress
assignee: nobody → Kenneth Loafman (kenneth-loafman)
milestone: none → 0.8.20
importance: Undecided → Medium
Changed in duplicity (Ubuntu):
status: In Progress → New
Kenneth Loafman (kenneth-loafman) wrote :

Are there any funny or escaped characters in BUCKET_NAME (from the first log/traceback)?

We pass the URL as a unicode string. Somehow boto/urllib thinks it's a byte string.
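
One defensive way to avoid a str/bytes mismatch of this kind is to normalise the path to str before asking `mimetypes` about it. This is only a sketch of the idea, not a patch to boto or duplicity; the wrapper name `guess_type_any` and the file name are hypothetical.

```python
import mimetypes
import os


def guess_type_any(path):
    """Guess (type, encoding) for a path given as str or bytes."""
    # os.fsdecode turns bytes into str using the filesystem encoding
    # and returns str input unchanged.
    text_path = os.fsdecode(path)
    return mimetypes.guess_type(text_path)


_, encoding = guess_type_any(b"example.vol1.difftar.gz")
print(encoding)  # gzip
```

Whether such a conversion belongs in the caller or in the library is a design decision this thread does not settle.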

Kenneth Loafman (kenneth-loafman) wrote :

Jon Yoon (jony-labkey) wrote :

No funny characters in the bucket name.

It's literally just alpha characters and dashes: labkey-artifactory-backup

And the folder it drops into uses an underscore: artifactory_backup

I'll recheck that link. There were a few that I already saw, but were not applicable to my situation.

Jon Yoon (jony-labkey) wrote :

After a little more digging, I believe part of my issue with the boto3+s3 option was that I was still using the full URL rather than just the bucket name, so rather than:

boto3+s3://BUCKET_NAME/folder

I was still using:

boto3+s3://s3-us-west-2.amazonaws.com/BUCKET_NAME/folder

The bucket-only form appears to be working: files are showing up in S3, and nothing has produced an error after six hours (and counting) of backing up.

It's still odd, though, that the old-style URL s3://s3-us-west-2.amazonaws.com/BUCKET_NAME/folder works in Duplicity 0.7.19 with Python 2.7, while 0.8.19 (and 0.8.11) don't seem to jibe with it.

But since the boto3+s3://bucket_name works with the multipart chunking, I'll just stick with this instead.
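
The difference between the two URL forms can be seen by splitting them with the standard library. This sketch assumes that the backend takes the first URL component as the bucket name, which is an assumption about its behaviour and not something confirmed in this thread. The helper name `split_backend_url` is hypothetical.

```python
from urllib.parse import urlsplit


def split_backend_url(url: str) -> dict:
    """Split a backend URL into scheme, first component, and remaining path."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "first_component": parts.netloc,  # assumed to be read as the bucket
        "remainder": parts.path.lstrip("/"),
    }


short_form = split_backend_url("boto3+s3://BUCKET_NAME/folder")
long_form = split_backend_url(
    "boto3+s3://s3-us-west-2.amazonaws.com/BUCKET_NAME/folder"
)
print(short_form["first_component"])  # BUCKET_NAME
print(long_form["first_component"])   # s3-us-west-2.amazonaws.com
```

Under that assumption, the long form would name the endpoint host as the bucket, which would be consistent with the HeadBucket error reported earlier.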

Jon Yoon (jony-labkey) wrote :

Confirmed everything appears to be working. We can consider this closed.

Kenneth Loafman (kenneth-loafman) wrote :

Thanks for your patience with this!

Changed in duplicity:
assignee: Kenneth Loafman (kenneth-loafman) → nobody
milestone: 0.8.20 → none
status: In Progress → Invalid