Large backup signature and manifest files should be split with --volsize too

Bug #385495 reported by stagenex
This bug affects 87 people
Affects     Status        Importance   Assigned to        Milestone
Duplicity   In Progress   High         Kenneth Loafman
Déjà Dup    Triaged       High         Unassigned

Bug Description

With the new 0.6.0 release, the signature and manifest archives should be split to respect the volume size command-line option (--volsize).

Without this, it is not possible to back up to backends with limited file sizes, such as IMAP or FTP.

Revision history for this message
Ross Patterson (rossp) wrote :

Here's some history from the previous tracker:

Kenneth Loafman <email address hidden> writes:

> Kenneth Loafman <email address hidden> writes:
> >
> > Ross Patterson <email address hidden> writes:
> >
> > > Follow-up Comment #2, bug #25542 (project duplicity):
> > >
> > > I was thinking of attempting this but since I'm cutting my teeth
> > > on this and Duplicity seems to have a pretty unique layout for a
> > > python project, I'd love some guidance.
> > >
> > > It looks like breaking up the signature file while writing it to
> > > the remote should be mostly a matter of duplicating the relevant
> > > bits of the "while not at_end:" loop logic from
> > > duplicity-bin:275(write_multivol) to the FileobjHooked.to_remote
> > > method at duplicity/dup_temp.py:169(to_remote). Does that sound
> > > correct?
> > >
> > > What I'm having a harder time finding in the code is the right
> > > place to hook into for re-assembling split signatures on restore
> > > or inspection of the remote archive. Can anyone offer any
> > > pointers on that?
>
> > Follow-up Comment #3, bug #25542 (project duplicity):
> >
> > Wait until 0.6.0 is out and it will be a lot easier. The sig and
> > manifest file handling has changed a fair bit, so the split and
> > reassembly will be a lot easier.
> >
> > Plus, I've got a version of file_naming.py that handles volume
> > numbers in sig files, just not released yet.
>
> Follow-up Comment #7, bug #25542 (project duplicity):
>
> I read comment 2 and you were right about where to break it up, in
> dup_temp.py. I have not thought about where to reassemble it, but it
> would probably be best to look at the place it's downloaded into a
> temp. Catch it there and reassemble into a single file. That way you
> won't have to mess with collections.py.
>
> Since the ~/.duplicity directory is expected to be around between
> backups now, the only reason to download it from remote is if that dir
> has been destroyed somehow, so even a special case might do it, kind
> of a preparatory step before duplicity begins. The local copy would
> still be one file, assuming it was still there.
>
> I really have not thought about this much. Backing up to IMAP is such
> a special use case for duplicity that it just has not been on my
> mental radar.
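
For readers following along, here is a minimal sketch of the "while not at_end:" volume-splitting pattern discussed above. All names are illustrative only; the real logic lives in duplicity-bin's write_multivol.

    # Minimal sketch of the volume-splitting loop; not duplicity's actual code.
    VOLSIZE = 25 * 1024 * 1024  # bytes per volume, analogous to --volsize

    def write_multivol(src, volume_path_for):
        """Copy the stream src into numbered files of at most VOLSIZE bytes."""
        vol_num = 0
        at_end = False
        while not at_end:
            vol_num += 1
            written = 0
            with open(volume_path_for(vol_num), "wb") as vol:
                while written < VOLSIZE:
                    chunk = src.read(min(64 * 1024, VOLSIZE - written))
                    if not chunk:  # source exhausted
                        at_end = True
                        break
                    vol.write(chunk)
                    written += len(chunk)
        return vol_num

    # e.g. write_multivol(open("x.sigtar", "rb"),
    #                     lambda n: "x.sigtar.vol%d" % n)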

Revision history for this message
Ross Patterson (rossp) wrote :

I've got signature splitting working on a branch with some caveats documented in http://bazaar.launchpad.net/~rossp/duplicity/signature-volsize/revision/575. Since the duplicity code and I aren't getting along, I won't be working on this further or using duplicity, but I wanted to leave my contributions here in case someone can pick them up and do something with them. But they should certainly be fully reviewed and the problems addressed before merging.

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote : Re: [Bug 385495] Re: Large backup signature and manifest files should be split with --volsize too

Ross Patterson wrote:
> I've got signature splitting working on a branch with some caveats
> documented in http://bazaar.launchpad.net/~rossp/duplicity/signature-
> volsize/revision/575. Since the duplicity code and I aren't getting
> along, I won't be working on this further or using duplicity, but I
> wanted to leave my contributions here in case someone can pick them up
> and do something with them. But they should certainly be fully reviewed
> and the problems addressed before merging.

Thanks for the effort. It's all appreciated and I'll take a look at it.
Sorry to hear you won't be using duplicity. Maybe later.

I'm not sure how the signature volume size problem should be handled.
That's one of the issues to address later. So far, the restriction on
file sizes has only hit IMAP users. I don't know of any serious size
restrictions on other protocols, but there may be some out there. The
usage of duplicity has been moving up the food chain, stressing larger
volume sizes to support larger backups with fewer volumes.

That said, one quick fix to the volume size problem is to output a
signature file with every data volume. That absolutely guarantees the
signature size stays smaller than the volume size, and it's easy to
implement. However, the number of files open during incrementals goes
from 1 sigtar per incremental to N sigtars per incremental. That would
break the max-open-files limits for a lot of the folks who want to live
dangerously and have few full backups but lots of incrementals. Like
most engineering problems, it's all in the tradeoffs.

Revision history for this message
Dan Carleton (dacc) wrote :

Is there any plan to fix this for 0.7? It's impacting large S3 full backups:

WARNING 1
. Upload 's3://s3.amazonaws.com/rieke-duplicity/duplicity-full-signatures.20100620T182934Z.sigtar.gz'
failed (attempt #5, reason: error: [Errno 32] Broken pipe)

WARNING 1
. Giving up trying to upload
s3://s3.amazonaws.com/rieke-duplicity/duplicity-full-signatures.20100620T182934Z.sigtar.gz
after 5 attempts

ERROR 23 BackendException
. BackendException: Error uploading
s3://s3.amazonaws.com/rieke-duplicity/duplicity-full-signatures.20100620T182934Z.sigtar.gz

("duplicity-full-signatures.20100620T182934Z.sigtar.gz" is a bit over 5GB)

Revision history for this message
Christian Dysthe (christian-dysthe) wrote :

It's affecting my large backups to 4shared.com over WebDAV as well.

Revision history for this message
arneko (goblinpower) wrote :

Isn't this also at odds with duplicity's stated aim of being a "bandwidth efficient [..] backup"? Having to upload e.g. 5 GB of data at once is not easy, since it requires a very stable connection etc.

My ISP offers me a huge WebDAV server, but it only allows filesizes up to ~500 MB.

Imagine my excitement when I found duplicity, helping me over this restriction.
And now imagine my disappointment when I found out about sigtars being too large ;-)

It would be great if this could be implemented somehow...

Revision history for this message
Jehan (jbruggem) wrote :

I concur on the importance of this bug. This limitation has been an issue for me too.

Revision history for this message
Michal Papis (mpapis) wrote :

Hi, I have found another issue with non-split signatures: Python 2.4 has a 2 GB limit for strings, and that covers only half of my signatures.

This means the signature file from my full backup contains only 49% of my files; the first incremental backup picks up another 49% of the files, and the last 2% finally lands in the second incremental backup's signatures.

In my view, signatures should be split into an array of strings of at most volsize each, and restoring them should likewise go through an array rather than a single string. That would overcome the 2 GB limitation and let it work correctly.
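
A minimal sketch of the chunked approach Michal suggests, assuming plain file objects (this is not duplicity's code):

    CHUNK = 16 * 1024 * 1024  # 16 MB, safely below any 2 GB string limit

    def copy_in_chunks(src, dst):
        """Stream src to dst without ever holding one giant string."""
        while True:
            block = src.read(CHUNK)  # bounded read
            if not block:
                break
            dst.write(block)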

Revision history for this message
bJXjLjEHIaWT0tFd (bjxjljehiawt0tfd-deactivatedaccount) wrote :

I push my backups out over a somewhat dodgy WiFi connection and am hit by this on every other full backup run as well.

Revision history for this message
Gerlag (pietje-puk) wrote :

Hi, I have the same problem. I did some small pre-tests over a 'tiny' OpenVPN link to a small Windows box. Then I did a full, time-consuming backup of all my data, resulting in a >2 GB signature file, which caused a failure. I didn't expect the signature file to be so large. I would suggest at least mentioning this in the manual under 'volsize' or 'Operation and Data Formats'.

Revision history for this message
robe (r-evert) wrote :

Hi, my signature file is about 60 MB, even with compression. Duplicity downloads this file before every backup (the first point of failure) and re-uploads it after every backup. This renders duplicity useless over low bandwidth (ADSL).

The file should not be downloaded every time. What about storing it locally, signing it, and uploading the signature? Then download the signature and check whether the local file is the same; only if not, download the signatures. After that, only a signature delta should need to be uploaded.

Is there anything wrong with these thoughts?

-rob
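
A minimal sketch of the scheme rob describes, assuming a hypothetical backend object with get()/get_file() methods (none of this is duplicity's actual API):

    import hashlib

    def sigtar_digest(path):
        """Hash the local sigtar in bounded blocks."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(1 << 20), b""):
                h.update(block)
        return h.hexdigest()

    def sync_sigtar(backend, local_path):
        # Fetch only the small published hash first...
        remote_digest = backend.get("sigtar.sha256").decode().strip()
        # ...and pay for the big download only when it differs.
        if sigtar_digest(local_path) != remote_digest:
            backend.get_file("sigtar.gz", local_path)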

Revision history for this message
robe (r-evert) wrote :

Sorry for the second post: adding --compress-algo=bzip2 to the GPG options improves the situation, as the signature file is about half as big afterwards. Of course you need GPG compression.

Revision history for this message
Jens Finkhäuser (finkhaeuser-consulting) wrote :

Other people have raised excellent points about why the signature file should be split, too. Here's another: for large but nearly full volumes, the signature file can actually fill the volume, making backups of the whole volume impossible.

Granted, that's a fairly extreme case, but one I'm running into...

Revision history for this message
Wolfram Riedel (taisto-web) wrote :

Using duplicity 0.6.14 on lucid, I'm experiencing this issue with large backups. Similar to what Michal Papis wrote, a big chunk of my data gets backed up twice because duplicity doesn't recognize/remember all the files it has already processed. For my fairly full 500 GB hard drive, the full backup takes about 407 GB (incl. a 2.1 GB sig). The first incremental backup takes another 124 GB (incl. a 1.4 GB sig), which is data stored twice (minus a couple hundred MB of actual changes). The second incremental after that is only 12 MB (incl. a 5 MB sig). Haven't tried the workaround with the compression option yet.

Changed in duplicity:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
D. Grady (fehknt) wrote :

I was running two duplicity instances in parallel, compressing to a 14G tmpfs on /tmp for speed, but I ran out of space with an 8.3G full signature file and a 5.8G full signature file. Having the signature files be a predictable size would be a big advantage. I'm already splitting the backup into many smaller sub-sets; I don't really want to go finer than that, but I guess I could as a workaround. For now I unmounted the tmpfs and am letting it back up to disk prior to network transfer to get a full set. I'm not sure if this costs me more speed than the two parallel runs gain me.

Revision history for this message
Philipp (philipp30) wrote :

I am affected by this bug too.

duplicity --version
duplicity 0.6.08b

I got a full signature file as large as 6.57 GB (1.14 TB source file size) with a 250 MB volume size. Bad for FTP backups. :-(

Revision history for this message
Martin R. Siegert (martin-siegert) wrote :

I also suffer from un-split (i.e. big) signature files.

In my case I "lose" all files that were recorded beyond the 2 GB position in the compressed signature file, as outlined in
https://bugs.launchpad.net/duplicity/+bug/914504/comments/1

Technical background:
At least in my case, Python 2.4 for Solaris 10 seems to have some kind of 2 GB limit (32-bit?)
(cf. comments 8 and 14 on this bug).
I am investigating this, as I have seen others (using Ubuntu instead of Solaris) reporting sigtar.gz files of up to 43 GB in size.

Revision history for this message
Alphazo (alphazo) wrote :

I'm also impacted by this limitation and reported it in a different bug report before realizing it was linked to this one.

Revision history for this message
derp herp (junkmail-trash) wrote :

Seeing this now on S3. Any fix planned?

Revision history for this message
Alexander Fortin (alexander-fortin) wrote :

Hi folks, I think there may be another problem. I'm testing duplicity 0.6.08b with S3 (European bucket), uploading just a small file (61 kB). It works fine as long as I use --no-encryption, but it breaks with these errors as soon as I use --encrypt-key:

Upload 's3+http://bucket/system-2012-02-01/duplicity-full.20120201T145356Z.vol1.difftar.gpg' failed (attempt #1, reason: error: [Errno 32] Broken pipe)
Upload 's3+http://bucket/system-2012-02-01/duplicity-full.20120201T145356Z.vol1.difftar.gpg' failed (attempt #2, reason: error: [Errno 104] Connection reset by peer)
Upload 's3+http://bucket/system-2012-02-01/duplicity-full.20120201T145356Z.vol1.difftar.gpg' failed (attempt #3, reason: error: [Errno 104] Connection reset by peer)
Upload 's3+http://bucket/system-2012-02-01/duplicity-full.20120201T145356Z.vol1.difftar.gpg' failed (attempt #4, reason: error: [Errno 32] Broken pipe)

I specified --gpg-options "--compress-algo=bzip2" too, with no luck.

Revision history for this message
Alexander Fortin (alexander-fortin) wrote :

Sorry guys, I guess it was some outage on S3, because now it has started to work just fine. Wrong timing to test duplicity, I guess...

Revision history for this message
andrey_campbell (andreycampbell) wrote :

Any progress on this? This is a potential deal breaker for many people now, especially when we have so many choices for the backup destination (all kinds of cloud services, each with their own quirks, especially quirks related to maximum file size).

Thanks!

Revision history for this message
andrey_campbell (andreycampbell) wrote :

And I should add that this doesn't seem very difficult to implement - someone above apparently actually already implemented a patch for this.

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote : Re: [Bug 385495] Re: Large backup signature and manifest files should be split with --volsize too

The patch was not complete. I'm willing to accept completed patches.

Plus, I'm thinking there should be a 1:1 correlation between difftar and
sigtar files. That would make some things a lot easier.

On Sat, May 12, 2012 at 9:45 AM, andreic <email address hidden> wrote:

> And I should add that this doesn't seem very difficult to implement -
> someone above apparently actually already implemented a patch for this.
>

Revision history for this message
Richard Merren (richard-merren) wrote :

I have finally determined that this is what is causing my backups to error out. My sigtar file is reaching 1.8G and 1.9G on an NTFS USB drive. Presumably when it hits 2G it throws an error ("Error splicing file: Input/output error" pops up), though I have not been able to find a log file with details to confirm my suspicion. It looks like this has not been addressed in some time.

This is a deal-breaker for me. I cannot use the duplicity/deja-dup solution in Ubuntu, or recommend it to others, until this issue is resolved. Backing up to a preformatted (and therefore NTFS) external USB drive is a fairly common use case, and 2 GB files should be avoided (or broken up on detecting this error, or at least an option to avoid them should be included).

Revision history for this message
Martin R. Siegert (martin-siegert) wrote : Re: [Bug 385495] Re: Large backup signature and manifest files should be split with --volsize too

Hello Richard,

> [...] My sigtar file is reaching 1.8G and 1.9G on an NTFS USB
> drive. Presumably when it hits 2G it throws an error [...]

> Using backup with a preformatted (and therefor NTFS) external
> USB drive is a fairly common use case, and 2GB files should be
> avoided[...]

Are you sure your drive is NTFS-formatted? NTFS usually has no 2 GB file size limit, but FAT does.

Also, preformatted drives (except for macOS) usually come FAT-formatted for easy sharing between Linux _and_ Windows, while (older) Linux distributions cannot write to NTFS out of the box.

Best, Martin

Changed in duplicity:
importance: Medium → High
assignee: nobody → Kenneth Loafman (kenneth-loafman)
Changed in duplicity:
milestone: none → 0.6.22
Revision history for this message
Richard Merren (richard-merren) wrote :

Martin: Very good question. The drive is definitely NTFS (or it was... I wiped it to try reformatting with ext4, but NTFS was confirmed with the Disk Tool beforehand). Oddly, NTFS is supposed to handle files much larger than 2 GB, but I have found a few other references online to people hitting 2 GB file size limits on NTFS-3G under various distros, though nobody has an explanation beyond pointing fingers at FUSE or the kernel. There are also a number of bug reports and message board comments on various distros about the "splicing" error I am getting, related to either large file sizes or sparsely populated files on NTFS.

It is possible that the 2 GB failure is a hardware problem with my disk; I am having trouble formatting it with ext4 when plugged into two separate computers. It is only about 9 months old, but I have had previous large USB drives fail early as well. Still, I think it is a reasonable request (even if I didn't ask for it reasonably before ;-) ) that the signature file be limited in size just like the content files.

Revision history for this message
Tõnis B (bramanis) wrote :

I'm also experiencing this problem with a 200+ GB backup.

I see this will be fixed in the next release, but what about a patch for those affected by the problem?

When is 0.6.22 coming out?

Thanks

Revision history for this message
mbn18 (miki) wrote :

Having the same problem with a backup of 350 GB and a signature file of 9 GB.

It's impossible to reliably back up the set.

Hope 0.6.22 will be released soon

Thanks

Changed in duplicity:
milestone: 0.6.22 → 0.6.23
Revision history for this message
Tõnis B (bramanis) wrote : Re: [Bug 385495] Re: Large backup signature and manifest files should be split with --volsize too

Hello

Did You include this in the 0.6.23 release?

Regards
Tõnis Bramanis

On 17.08.2013 13:57, Kenneth Loafman wrote:
> ** Changed in: duplicity
> Milestone: 0.6.22 => 0.6.23
>

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

Not yet. I need to move the milestone again.

On Wed, Feb 12, 2014 at 7:42 AM, Tõnis Bramanis <email address hidden> wrote:

> Hello
>
> Did You include this in the 0.6.23 release?
>
> Regards
> Tõnis Bramanis
>
> On 17.08.2013 13:57, Kenneth Loafman wrote:
> > ** Changed in: duplicity
> > Milestone: 0.6.22 => 0.6.23
> >
>

Revision history for this message
Antoine (antoine+lauchpad) wrote :

Hello,

I am also having the same problem with backup of 900GB and large signature files (>10 GB).

If the full fix for "Large backup signature and manifest files should be split with --volsize too" is too complex to get into the next release, there is at least, for Amazon S3, a hack around the issue:

In December 2010, Amazon S3 announced an increase in the maximum size of an object from 5 gigabytes to 5 terabytes (http://aws.amazon.com/about-aws/whats-new/2010/12/09/Announcing-Amazon-S3-Large-Object-Support/).
What is needed to use this feature is to upload S3 objects in parts; parts can be uploaded independently, in any order, and in parallel (http://docs.aws.amazon.com/AmazonS3/latest/dev/UploadingObjects.html).

boto supports multipart uploads, but they are not transparent (i.e. calling key.set_contents_from_filename(...) will not use boto's multipart upload capability for large files).

Best,
Antoine
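
A rough sketch of the non-transparent boto (v2) multipart upload Antoine describes; bucket and key names here are placeholders:

    import io
    from boto.s3.connection import S3Connection

    PART = 100 * 1024 * 1024  # 100 MB parts; S3 requires parts of at least 5 MB

    def multipart_upload(bucket_name, key_name, path):
        bucket = S3Connection().get_bucket(bucket_name)  # creds from environment
        mp = bucket.initiate_multipart_upload(key_name)
        try:
            with open(path, "rb") as f:
                part_num = 1
                while True:
                    data = f.read(PART)
                    if not data:
                        break
                    mp.upload_part_from_file(io.BytesIO(data), part_num=part_num)
                    part_num += 1
            mp.complete_upload()
        except Exception:
            mp.cancel_upload()  # don't leave a dangling (billable) upload
            raise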

Revision history for this message
Antti Peltonen (bcow) wrote :

Hi,

I too am hitting this issue. When can we expect a release that splits up large signature files as well?

Revision history for this message
Gaurav Ashtikar (gau1991) wrote :

Hi, I am also facing this issue for backups greater than 400 GB.
I know this is a very old thread, but it is affecting so many people.

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

I am working on this for the 0.7 series. It will break backwards
compatibility, so be ready.

On Fri, Jun 13, 2014 at 12:44 AM, Gaurav Ashtikar <email address hidden> wrote:

> Hi, I am also facing this issue for backups greater than 400 GB.
> I know this is a very old thread, but it is affecting so many people.
>

Revision history for this message
hippich (hippich) wrote :

Is there a GitHub or other repo where I can try the current 0.7 version? Trying to find a way around this.

Revision history for this message
Remy van Elst (raymii) wrote :

I also have a scenario where this issue bites me: a rather large backup set (1.6 TB) on a node with an 8 GB root partition and a 4 TB /home partition. The /root/.cache/duplicity folder fills up the root disk with the sigtar/difftar files. After changing /root/.cache to a symlink to /home/tmp/.cache and setting TMPDIR to /home/tmp, the backup now runs without issues.

Revision history for this message
Andy Skalet (ubuntucom-1) wrote :

This bites us as well. Identical issue to Dan Carleton's back in 2010: our duplicity-full-signatures file is a little over 5 GB, and S3 gives us a broken pipe, retry after retry, until we fail.

Has anyone found a workaround in the multipart parameters or timeouts for this? And I take it this was not fixed in 0.6.23, since I'm running 0.6.24?

Great package otherwise, thanks for the work on it, but this issue is a deal breaker for us at the moment.

Revision history for this message
Andy Skalet (ubuntucom-1) wrote :

When I made my previous comment I didn't realize that duplicity supports S3's multipart uploads.

I added --s3-use-multiprocessing to turn this on, which has allowed my initial full backup to complete.

I still vote for this bug and will continue to follow it :)

Changed in duplicity:
assignee: Kenneth Loafman (kenneth-loafman) → nobody
Revision history for this message
Pavel Karoukin (pavelkaroukin) wrote :

Kenneth, you mentioned you were working on fixing this issue in the 0.7 series. Is there any code I can help test?

Revision history for this message
Remy van Elst (raymii) wrote :

Is there any progress on this issue? Would throwing money at the problem work? It is a dealbreaker for some of my backups...

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

There is progress on this, and the need is greater now that the new
librsync has come out.

It will come out in the 0.8 series along with full support for the new
librsync hash. With the new hash the table size will double, thus doubling
the need for a solution to this issue. That will force full backups on the
transition to 0.8. We'll still be able to read 0.7-and-below formats, but
not write them.

On Wed, Mar 11, 2015 at 9:55 AM, Remy van Elst <email address hidden>
wrote:

> Is there any progress on this issue? Would throwing money at the problem
> work? It is a dealbreaker for some of my backups...
>

Revision history for this message
Remy van Elst (raymii) wrote :

Good to hear there is progress. Do you have, by any chance, a roadmap or estimate of when this will be ready? Should I be thinking about 1 month, 3, or maybe more? It would help me plan time and management resources for my projects. And would there be any way to help progress on this? As said, possibly money?

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

I am shooting for 3-4 months from now with something to test in the 3rd
month.

On Thu, Mar 12, 2015 at 4:17 AM, Remy van Elst <email address hidden>
wrote:

> Good to hear there is progress. Do you have, by any chance, a roadmap or
> estimate of when this will be ready? Should I be thinking about 1 month,
> 3, or maybe more? It would help me plan time and management resources for
> my projects. And would there be any way to help progress on this? As
> said, possibly money?
>

Revision history for this message
Remy van Elst (raymii) wrote :

OK, thank you very much for the indication :)

Revision history for this message
andy (andy-bear) wrote :

Hello, I've had a hard time troubleshooting duplicity on Synology just now. Maybe this can help somebody else looking for a solution, and some clever dev here could fix or work around it ;-).

The latest Synology ipkg versions at this moment are:
- duplicity: 0.6.21
- rsync: 3.0.9 protocol version 30
- NcFTP: 3.2.4
- Python: 2.7.9

Running duplicity with the FTP backend on a big (>500MB) source dir produces an unusable archive locally (read: not only remotely, so NcFTP is not the bottleneck): the first run seems OK (although it isn't: the sigtar is truncated at 2 GB), and every following incremental backup endlessly re-creates another full backup set, containing another broken sigtar.

On the next incremental backup, duplicity tells you it can't gpg-decrypt the sigtar: it deletes the local cached sigtar and fetches the encrypted one from the remote, which has to be decrypted locally. That fails, since the script doesn't hold the gpg encryption key, and it would be useless anyway, as the sigtar from the remote is broken too.

I'm not sure where the bottleneck is: is it Python or duplicity internally? FTP should not matter, as it also happens locally (I deployed to the FTP remote my locally created seed disk, where the sigtar was already truncated at 2 GB).

I've googled a lot and found the wildest theories, all wrong it seems; in the end, once I realized it always comes down to the 2 GB boundary, I found this forum.

As a workaround, a "max-sigtar" switch would be fine; but what is the real trouble with those 2 GB? The local fs is ext4, so that can't be it. The Synology also handles huge files without any problems (e.g. some internal Linux stuff/tools/libs). So it looks like a Python/duplicity issue to me.

Another almost-free workaround could be: duplicity backs up in one pass only as much as fits under some configured boundary (e.g. 2 GB). Instead of implementing some fancy, complex sigtar-splitting logic now, duplicity would simply treat the files that would make the sigtar grow too big as "not there". That would produce a not-so-complete FULL/INC backup set, which is just fine: it could warn the user, "please re-run to complete". One could simulate this behavior with --exclude/--include combinations, but that is a killing exercise ;-/. Duplicity could handle this _very_ easily, I guess: just stop processing the input if size(sigtar) > $configured and emit a warning.

Btw: duplicity -v9 --dry-run always tells me that the '.manifest.gpg is not part of a known set; creating new set', but this seems not to influence the success or failure of the backup process. When there were no changes it doesn't upload anything, and when there were changes it seems to upload only the needed stuff - but it leaves a bad taste somehow.

Thank you for any input,
Andy
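
A minimal sketch of that stop-early idea, with entirely hypothetical names (nothing like this exists in duplicity today):

    class SigtarBudget:
        """Track sigtar growth and signal when a configured cap is reached."""

        def __init__(self, max_bytes=2 * 1024 ** 3 - 1):  # stay under 2 GB
            self.max_bytes = max_bytes
            self.written = 0

        def record(self, nbytes):
            self.written += nbytes

        def exhausted(self):
            return self.written >= self.max_bytes

    # The backup loop would consult budget.exhausted() before adding each
    # file and, once True, skip the rest and warn "please re-run to complete".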

Revision history for this message
Remy van Elst (raymii) wrote :

Is there already a bit of progress? I'd be happy to test code.

Revision history for this message
Joris van Eijden (joris-vaneijden) wrote :

It's 2016 now and we're also running into this issue, using openstack objectstore (5GB limit).

Is there any workaround available?

Or anything we can do to help move this along?

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

The only workaround is to back up in smaller chunks, or to use something
like 'split' before sending it to openstack.

It is coming along. Just getting over some health issues now.

On Mon, Jan 4, 2016 at 8:19 AM, Joris van Eijden <<email address hidden>
> wrote:

> It's 2016 now and we're also running into this issue, using openstack
> objectstore (5GB limit).
>
> Is there any workaround available?
>
> Or anything we can do to help move this along?
>

Changed in duplicity:
milestone: 0.6.23 → 0.8.00
assignee: nobody → Kenneth Loafman (kenneth-loafman)
Revision history for this message
Joris van Eijden (joris-vaneijden) wrote :

I worked around it by setting the volume size to 250 and max-blocksize to 20480.
Now it takes 42 hours to create a full backup, but at least it completes.

No idea if things can be improved with different values.
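
For reference, that workaround maps to an invocation along these lines (--volsize is given in MB; the source dir and target URL here are placeholders):

    duplicity --volsize 250 --max-blocksize 20480 /data ftp://user@backup.example.com/target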

Revision history for this message
JP (0j-p) wrote :

That helped a bunch: with a 500 MB volume size and max-blocksize 20480, 330 GB of files gave a 527 MB signature file.
An earlier backup with the default volume size and blocksize, on less data, gave a 2 GB signature file.

Changed in duplicity:
status: Confirmed → In Progress
Revision history for this message
Nick (n6ck) wrote :

I also tried setting a different volume size and blocksize, but after 2 or 3 weeks it stopped working again. Will this issue be fixed in any of the 0.7.x releases?

Revision history for this message
seenxu (seenxu) wrote :

So far, the workaround for me is to split the backup into several parts.

Revision history for this message
Nick (n6ck) wrote :

Thanks, I will try to split the backups into several parts, but it would still be nice to know if this issue will be fixed in one of the 0.7.x releases.

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

It's going to be fixed in one of the 0.8 releases. I'm working on it now.

Stay with the smaller backups for now.

On Wed, Feb 15, 2017 at 1:39 AM, Nick <email address hidden> wrote:

> Thanks, I will try to split the backups into several parts, but it would
> still be nice to know if this issue will be fixed in one of the 0.7.x
> releases.
>

Revision history for this message
Nick (n6ck) wrote :

Thank you very much for the update! Will try with the smaller backups for now.

Revision history for this message
Aleksey Yakovenko (firaxis) wrote :

I was using the paramiko+scp backend and encountered this issue, but switching to the pexpect+scp backend worked for me. As a bonus, it is much faster (about 6 times) than paramiko. Not sure it's a proper solution, but it works for me.

Revision history for this message
Josh (joshhansen) wrote :

Hi, just ran into this on a 500GB backup using Backblaze B2 as the backend. duplicity 0.7.14.

My signature file is 12 GB. B2 actually supports files up to 10 TB in size but requires special handling for files over 5 GB; see https://www.backblaze.com/b2/docs/large_files.html

What are the prospects of resolving this in the near future?

Revision history for this message
vixns (stephane-cottin-d) wrote :

We faced this problem with the swift backend; sometimes sigtar files are >5 GB, the OpenStack limit for an object PUT.

Because we cannot wait for a stable 0.8, we found a workaround for 0.7.x.

The attached file is a modified swiftbackend.py that uses a SwiftService instead of a simple Connection object for all operations.
It enables static large object support by default, raising the limit to ~1 TB (1000 * 1 GB segments).

The default segment size is 1 GB; it can be overridden via the SWIFT_SEGMENT_SIZE env var.
When SWIFT_SEGMENT_SIZE = 0, static large objects are disabled.

Tested with duplicity 0.7.16 and python-swiftclient 3.5.0
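
The core of that change looks roughly like this (a sketch using python-swiftclient's SwiftService; container and object names are placeholders):

    from swiftclient.service import SwiftService, SwiftUploadObject

    SEGMENT = 1024 * 1024 * 1024  # 1 GB segments, like the SWIFT_SEGMENT_SIZE default

    def upload_large(container, path, name):
        with SwiftService() as swift:  # auth taken from the OS_* env vars
            results = swift.upload(
                container,
                [SwiftUploadObject(path, object_name=name)],
                options={"segment_size": SEGMENT, "use_slo": True},
            )
            for result in results:  # one result dict per segment/object
                if not result["success"]:
                    raise RuntimeError(result.get("error"))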

Revision history for this message
Erik M (goodbyte100) wrote :

Like #57, I also had problems with paramiko+scp, but I switched to paramiko+sftp, which solved my problems. From what I can see in the sources, the paramiko+scp backend does not do chunked uploads; instead, the whole file is read into memory and then uploaded. That works fine for normal volumes of only 200 MB but fails for big signature files.
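
A minimal sketch of the streaming behavior that makes paramiko+sftp work here (host and paths are placeholders):

    import paramiko

    def sftp_upload(host, user, local, remote):
        ssh = paramiko.SSHClient()
        ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        ssh.connect(host, username=user)
        try:
            sftp = ssh.open_sftp()
            sftp.put(local, remote)  # streams in fixed-size requests, constant memory
            sftp.close()
        finally:
            ssh.close()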

Revision history for this message
Vej (vej) wrote :

Hello.

I marked this as affecting Déjà Dup because of the duplicate, which had been sitting in our bug tracker.

Changed in deja-dup:
status: New → Confirmed
status: Confirmed → Triaged
importance: Undecided → High
Changed in duplicity:
milestone: 0.8.00 → 0.8.01
Changed in duplicity:
milestone: 0.8.01 → none
Revision history for this message
Tõnis B (bramanis) wrote :

Does that mean it won't be fixed?

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

No, just that I can't commit to a revision at this point.

Revision history for this message
Arthur Andersen (leocgit) wrote :

I have a 400 GB directory that is backed up by duplicity, incrementally and encrypted, to AWS S3. The sigtar file is almost 10 GB in size, causing the backup to fail with `Broken pipe` (I suspect due to network connection loss, timeouts, or limitations).

As of now I cannot split the backup into smaller chunks. Is there a workaround I could use until duplicity splits sigtar files as well?

Revision history for this message
IvanRicotti (ivan.ricotti) wrote :

Hello, thanks for your work on this tool.
I'm stuck uploading a huge sigtar file to S3, and I cannot split the backup.
Any workaround?
Any plan for a fix in an upcoming release?
Thank you!

Revision history for this message
Kenneth Loafman (kenneth-loafman) wrote :

Work is in progress, just going very slowly.

The only workaround is to do the backup locally, then use split to cut the file into pieces smaller than the S3 limit.
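
A sketch of that split step in Python (equivalent to coreutils' split -b); note the pieces must be concatenated back together before duplicity can read them again:

    GB = 1024 ** 3

    def split_file(path, chunk=4 * GB, bufsize=1 << 20):
        """Cut path into path.000, path.001, ... of at most chunk bytes each."""
        with open(path, "rb") as src:
            part = 0
            while True:
                out = None
                written = 0
                while written < chunk:
                    data = src.read(min(bufsize, chunk - written))
                    if not data:
                        break
                    if out is None:
                        out = open("%s.%03d" % (path, part), "wb")
                    out.write(data)
                    written += len(data)
                if out is None:
                    return part  # number of pieces written
                out.close()
                part += 1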

Revision history for this message
Eric G. (eric-from-paris-75) wrote :

Same issue here... a `Broken pipe` error when uploading a huge sigtar file with the S3 backend :(

Revision history for this message
Eric G. (eric-from-paris-75) wrote :

After some tests and research, I found a workaround for me. Maybe it can help somebody else...

I am using Scaleway Object Storage (S3-compatible).
I set volsize to 1 GB but always got a "Broken pipe" error when uploading large sigtar files (>8 GB).

I tried the boto backend with "--s3-use-multiprocessing" with no luck (got a bytes/string TypeError in _boto_multi.py).

So I switched to the boto3 backend to get multipart uploads working:
"Boto3 always attempts to multiprocessing when it is believed it will be more efficient"
Source: http://duplicity.nongnu.org/vers8/duplicity.1.html

But I still got the same error as at the beginning: "Broken pipe".

After some research I found a limitation with Scaleway:
"Object Storage supports multipart upload. We recommend uploading by chunks, in a limit of 1000 chunks per upload and 5TB per object."
Source: https://www.scaleway.com/en/faq/how-can-i-upload-large-objects/

The default boto3 multipart_chunksize is 8 MB, and my sigtar file is 8.8 GB, which needs more than 1000 chunks at that size. So I raised the multipart_threshold and multipart_chunksize boto3 settings from 8 MB to 50 MB and everything works fine! :)
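
A sketch of that tweak using boto3 directly (bucket and key are placeholders); 8.8 GB in 50 MB chunks is roughly 180 parts, comfortably under the 1000-chunk limit:

    import boto3
    from boto3.s3.transfer import TransferConfig

    MB = 1024 * 1024
    config = TransferConfig(multipart_threshold=50 * MB,
                            multipart_chunksize=50 * MB)

    s3 = boto3.client("s3")
    s3.upload_file("duplicity-full-signatures.sigtar.gpg",
                   "my-bucket",
                   "backups/duplicity-full-signatures.sigtar.gpg",
                   Config=config)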

Revision history for this message
Sean Danischevsky (seaniedan) wrote (last edit ):

Confirming this is an issue for me backing up 650 GB of data via rsync on a Raspberry Pi running a 32-bit OS.

The filesystem is ext3, and I don't know how to use "the filesystem's large file option to allow >2gb file size" (https://gitlab.com/duplicity/duplicity/-/issues/67), so I'm stuck. My sigtar files max out at 2^31 bytes.

Hoping that Kenneth is on the case and doing well after the trip to the hospital!

p.s. Both OSes (on the machine to be backed up and on the backup machine) are 32-bit. Would changing either to a 64-bit OS help, or do both need to be 64-bit?
