git-ubuntu

import: proposed changes

Bug #1731554 reported by Nish Aravamudan on 2017-11-10

This bug affects 1 person

Affects     Status  Importance  Assigned to  Milestone
git-ubuntu  New     Undecided   Unassigned

Bug Description

Our importer algorithm is a bit out of date and inefficient due to cruft. I'm hesitant to change it without the tests in place, but here is my proposal. We might also reject this bug if we decide not to do it.

1) Rather than do an import-attempt-per-new-publishing-record from Launchpad, obtain a list of <version> "objects" where an object is only in the list if <version> has not been imported before (no such tag) or <version> has been imported before but with different contents, using the cache to determine what set of Launchpad publishes to consider. This means we'll have a sequence of SPI objects, in version order, I think, which we will call new_spis.

Notes: If a given version appears twice, it is because it was re-published with different contents. We currently do not have a clean way to distinguish more than two uploads of the same version with different contents. While that should not happen, I don't think we should rely on ftpmaster or Launchpad behavior in the algorithm (as it leads to special casing). We currently use an orphan tag the first time this happens, but if it happens again we do not, and I believe we emit an error. I wonder if we can just not use the orphan tag; perhaps we should instead have levels of versioned tags (e.g. import/<version>/0, import/<version>/1), which we iterate over to see if the treeishes match? This can be in a function that looks for a single-level tag preferentially, else iterates, and returns a found tag or None. The semantics of our tags are different now, as we only expect to see one tagged commit for any given version?
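A minimal sketch of the tag lookup described in the notes above. The function name `find_import_tag` and the `tags` mapping (tag name to tree hash, standing in for a real repository lookup) are assumptions made for illustration, not existing git-ubuntu APIs.

```python
from typing import Mapping, Optional


def find_import_tag(
    tags: Mapping[str, str], version: str, treeish: str
) -> Optional[str]:
    """Return the import tag for `version` whose tree matches `treeish`.

    `tags` is a hypothetical mapping of tag name -> tree hash of the
    tagged commit. Returns None if no matching tag is found.
    """
    # Prefer the single-level tag, e.g. import/<version>.
    single = f"import/{version}"
    if tags.get(single) == treeish:
        return single

    # Otherwise walk import/<version>/0, import/<version>/1, ... and
    # stop at the first index that does not exist.
    n = 0
    while True:
        name = f"import/{version}/{n}"
        if name not in tags:
            return None
        if tags[name] == treeish:
            return name
        n += 1
```

One thing to keep in mind if this scheme is pursued: git will not store both import/<version> and import/<version>/0 as refs in the same repository, so any given version would have to use one form or the other.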

2) For each spi in new_spis, import it patches-unapplied. No branches are manipulated in this step.

3) For each spi in order, import it patches-applied. No branches are manipulated in this step.

4) For the current archive status of the source package in each combination from the itertools.product of series and pocket, update the corresponding branch pointer.

Notes: this makes the devel case and the normal case identical, I think. We could just abstract out that function to take a series and the corresponding launchpad data?

Effectively, we update the commit graph and then we forcibly update where the branches are in the commit graph. From an efficiency perspective, we can probably limit 4) to the series (or even series+pocket) seen in 1)?
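A rough sketch of steps 1 and 4, to make the shape of the proposal concrete. Everything here is hypothetical: the `Spi` class, the `imported` mapping (version to the set of tree hashes already tagged), the `current` callback, and the `<series>-<pocket>` branch naming are stand-ins, not the real importer's data structures. The input is assumed to be in version order already.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Spi:
    """Hypothetical stand-in for one published source package record."""
    version: str
    treeish: str  # hash of the unpacked source tree


def select_new_spis(
    spis: Iterable[Spi], imported: Dict[str, Set[str]]
) -> List[Spi]:
    """Step 1: keep an SPI only if its version has no import yet, or
    has been imported before but with different contents."""
    seen = set()
    new_spis = []
    for spi in spis:
        key = (spi.version, spi.treeish)
        already_imported = spi.treeish in imported.get(spi.version, set())
        if already_imported or key in seen:
            continue  # same version and contents: nothing new to import
        seen.add(key)
        new_spis.append(spi)
    return new_spis


def update_branches(
    series_names: Iterable[str],
    pockets: Iterable[str],
    current: Callable[[str, str], Optional[str]],
    branches: Dict[str, str],
) -> None:
    """Step 4: force each series/pocket branch to the commit matching
    what the archive currently publishes there.

    `current(series, pocket)` returns a commit hash, or None if the
    package is not published in that series/pocket.
    """
    for series, pocket in itertools.product(series_names, pockets):
        commit = current(series, pocket)
        if commit is not None:
            branches[f"{series}-{pocket}"] = commit
```

Steps 2 and 3 would sit between these two calls as two loops over new_spis that only create commits and tags. Because update_branches takes the series list as an argument, limiting step 4 to the series seen in step 1 is just a matter of passing a shorter list.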

Tags: hash-abi-break import spec

Related branches

~nacc/git-ubuntu:lp1731554-importer-rework-v2

Ready for review for merging into git-ubuntu:master

Server Team CI bot: Needs Fixing (continuous-integration) on 2018-06-15

git-ubuntu developers: Pending requested 2018-06-15

~nacc/git-ubuntu:lp1731554-importer-rework

Superseded for merging into git-ubuntu:master

git-ubuntu developers: Pending requested 2017-11-16

Nish Aravamudan (nacc) on 2017-11-10

tags:

added: hash-abi-break

Robie Basak (racb) on 2017-11-28

tags:

added: import

Robie Basak (racb) on 2018-05-24

tags:

added: spec

Nish Aravamudan (nacc) wrote on 2018-06-15:

Note the intention here is to *not* break the hash abi. There are two changes that do, but they are actually unrelated (I need to clean up my MP, but ENOTIME): 1) branch name in the commit messages (this gets dropped, because the import functions no longer know about the branches) and 2) the order of import tag creation, once we decide in the spec how to handle it (e.g., if something is actually published in ubuntu before debian, we might naively expect the ubuntu one to be import and the debian one to be reimport, but that's not how the code works currently).

I think another big speedup can probably be done by doing the applied import right after the unapplied.

Nish Aravamudan (nacc) wrote on 2018-06-15:

err, wrong on that last sentence, because the applied branches can be at different history points than the unapplied branches, so it's not purely based upon the spi. I wonder if we can be smarter here, but I'm not context-switched enough.
