import: proposed changes

Bug #1731554 reported by Nish Aravamudan
This bug affects 1 person
Affects: git-ubuntu
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Our importer algorithm is a bit out of date and inefficient due to cruft. I'm hesitant to change it without tests in place, but here is my conception. We might also reject this bug if we decide not to do it.

1) Rather than attempting one import per new publishing record from Launchpad, obtain a list of <version> "objects", where an object is in the list only if <version> has not been imported before (no such tag) or has been imported before but with different contents. Use the cache to determine which set of Launchpad publishes to consider. This gives us a sequence of SPI objects, in version order, I think, which we will call new_spis.
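The selection in step 1 could look roughly like the sketch below. It is only an illustration: `Spi`, `select_new_spis`, and the `imported` mapping (version to the set of tree hashes already tagged) are hypothetical names, not the importer's real data structures, and the input is assumed to arrive already in version order.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Spi(NamedTuple):
    """Hypothetical stand-in for a source package publishing record."""
    version: str
    tree_hash: str  # identifies the contents of this upload


def select_new_spis(
    published: Iterable[Spi],
    imported: Dict[str, Set[str]],
) -> List[Spi]:
    """Return the records that still need importing, in input order.

    A record is kept if its version was never imported, or was imported
    before but with different contents. Identical (version, contents)
    pairs published more than once are only kept once.
    """
    seen = set()
    new_spis = []
    for spi in published:
        key = (spi.version, spi.tree_hash)
        if key in seen:
            continue  # same version and contents published again
        seen.add(key)
        if spi.tree_hash in imported.get(spi.version, set()):
            continue  # already imported with identical contents
        new_spis.append(spi)
    return new_spis
```

With this shape, a version that shows up twice in new_spis has necessarily been published with two different sets of contents.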

Notes: If a given version appears twice, it is because it was re-published with different contents. We currently do not have a clean way to distinguish more than two uploads of the same version with different contents. While that should not happen, I don't think we should rely on ftpmaster or Launchpad behavior in the algorithm (as it leads to special casing). We currently use an orphan tag the first time this happens, but if it happens again we don't, and I believe we emit an error. I wonder if we can just not use the orphan tag; perhaps we should have levels of versioned tags (e.g. import/<version>/0, import/<version>/1) which we iterate over to see if the tree-ishes match? This can be in a function that looks for a single-level tag preferentially, else iterates, and returns a found tag or None. The semantics of our tags are different now, as we only expect to see one tagged commit for any given version?
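A minimal sketch of that lookup function, assuming tags can be modelled as a plain mapping from tag name to tree hash. The function name `find_import_tag` and the `tags` dictionary are hypothetical; a real implementation would query the repository instead.

```python
from typing import Dict, Optional


def find_import_tag(
    tags: Dict[str, str],
    version: str,
    tree_hash: str,
) -> Optional[str]:
    """Return the name of the import tag for version whose tree matches.

    The single-level tag import/<version> is checked first. If it does
    not match, the numbered tags import/<version>/0, import/<version>/1,
    ... are tried in order until one matches or the sequence ends.
    """
    single = f"import/{version}"
    if tags.get(single) == tree_hash:
        return single
    n = 0
    while True:
        name = f"import/{version}/{n}"
        if name not in tags:
            return None  # ran out of numbered tags without a match
        if tags[name] == tree_hash:
            return name
        n += 1
```

A return value of None would then mean "this version with these contents has not been imported yet", and the next free index is where a new tag would go.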

2) For each spi in new_spis, import it patches-unapplied. No branches are manipulated in this step.

3) For each spi in order, import it patches-applied. No branches are manipulated in this step.

4) For the current archive status of the source package in each (series, pocket) pair from itertools.product, update the corresponding branch pointer.

Notes: this makes the devel case and the normal case identical, I think. We could just abstract out that function to take a series and the corresponding launchpad data?

Effectively, we update the commit graph and then we forcibly update where the branches are in the commit graph. From an efficiency perspective, we can probably limit 4) to the series (or even series+pocket) seen in 1)?
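The overall shape of steps 2) to 4) might be sketched as follows. Everything here is illustrative: the commit graph is a dictionary, commits are placeholder strings, and the function name, parameters, and branch naming scheme are assumptions made for the sketch rather than the importer's actual code.

```python
import itertools
from typing import Dict, Iterable, List, Tuple


def run_import(
    new_spis: List[str],
    archive_state: Dict[Tuple[str, str], str],
    series_list: Iterable[str],
    pockets: Iterable[str],
    commits: Dict[Tuple[str, str], str],
) -> Dict[str, str]:
    """Grow the commit graph first, then move branch pointers.

    new_spis is a list of versions to import, archive_state maps
    (series, pocket) to the version currently published there, and
    commits maps (kind, version) to a commit id for everything
    imported so far (it is updated in place).
    """
    # Step 2: import patches-unapplied; no branches are touched.
    for version in new_spis:
        commits[("unapplied", version)] = f"unapplied-{version}"
    # Step 3: import patches-applied; no branches are touched.
    for version in new_spis:
        commits[("applied", version)] = f"applied-{version}"
    # Step 4: forcibly point each branch at the currently published version.
    branches = {}
    for series, pocket in itertools.product(series_list, pockets):
        version = archive_state.get((series, pocket))
        if version is None:
            continue  # nothing published in this series/pocket
        branches[f"ubuntu/{series}-{pocket}"] = commits[("unapplied", version)]
        branches[f"applied/ubuntu/{series}-{pocket}"] = commits[("applied", version)]
    return branches
```

Restricting `series_list` and `pockets` to the values seen in step 1 would be the efficiency tweak mentioned above.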


Nish Aravamudan (nacc)
tags: added: hash-abi-break
Robie Basak (racb)
tags: added: import
Robie Basak (racb)
tags: added: spec
Nish Aravamudan (nacc) wrote :

Note the intention here is to *not* break the hash abi. There are two changes that do, but they are actually unrelated (I need to clean up my MP, but ENOTIME): 1) the branch name in the commit messages (this gets dropped, because the import functions no longer know about the branches) and 2) the order of import tag creation, once we decide in the spec how to handle it (e.g., if something is actually published in ubuntu before debian, we might naively expect the ubuntu one to be import and the debian one to be reimport, but that's not how the code works currently).

I think another big speedup can probably be had by doing the applied import right after the unapplied.

Nish Aravamudan (nacc) wrote :

err, wrong on that last sentence, because the applied branches can be at different history points than the unapplied branches, so it's not purely based upon the spi. I wonder if we can be smarter here, but I'm not context-switched enough.
