import: proposed changes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
git-ubuntu | New | Undecided | Unassigned |
Bug Description
Our importer algorithm is a bit out of date and inefficient due to accumulated cruft. I'm hesitant to change it without tests in place, but here is my conception. We might also reject this bug if we decide not to do it.
1) Rather than do an import-
Notes: If a given version appears twice, it is because it was re-published with different contents. We currently have no clean way to distinguish more than two uploads of the same version with different contents. While that should not happen, I don't think the algorithm should rely on ftpmaster or Launchpad behavior (doing so leads to special casing). We currently use an orphan tag the first time this happens, but if it happens again we don't, and I believe we emit an error. I wonder if we can drop the orphan tag and instead have levels of versioned tags (e.g. import/<version>/0, import/<version>/1) that we iterate over to see whether the treeishes match? This could live in a function that looks for a single-level tag first, otherwise iterates over the levels, and returns the tag it finds or None. Would this change the semantics of our tags, given that we currently expect to see only one tagged commit for any given version?
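A minimal sketch of the tag lookup proposed in the note above. It assumes tags have already been read into a plain dict mapping each tag name to the tree hash its commit points at (a stand-in for querying the repository). `find_import_tag` and the `import/<version>/<n>` layout are hypothetical, taken from the proposal rather than from existing git-ubuntu code. Real tag names would also need the version escaped, because Git ref names cannot contain characters such as `:` or `~` that appear in Debian versions; that step is omitted here.

```python
from typing import Mapping, Optional


def find_import_tag(
    tags: Mapping[str, str], version: str, tree_hash: str
) -> Optional[str]:
    """Return the import tag for `version` whose tree matches `tree_hash`.

    `tags` maps tag names to tree hashes (hypothetical stand-in for the repo).
    """
    # Prefer the plain single-level tag, e.g. import/1.0-1.
    single = f"import/{version}"
    if tags.get(single) == tree_hash:
        return single
    # Otherwise walk import/<version>/0, import/<version>/1, ... and stop
    # at the first missing level.
    level = 0
    while True:
        name = f"import/{version}/{level}"
        if name not in tags:
            return None
        if tags[name] == tree_hash:
            return name
        level += 1
```

Git cannot store both `import/<version>` and `import/<version>/0` as tags at once (a ref name cannot also be a directory of refs), so any one version uses only one of the two layouts; checking the single-level name first simply handles versions tagged under either layout.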
2) For each spi in new_spis, import it patches-unapplied. No branches are manipulated in this step.
3) For each spi in order, import it patches-applied. No branches are manipulated in this step.
4) For the current archive status of the source package in each (series, pocket) pair from itertools.product(series, pockets), update the corresponding branch pointer.
Notes: I think this makes the devel case and the normal case identical. Could we just abstract that out into a function that takes a series and the corresponding Launchpad data?
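A minimal sketch of steps 2) to 4) as described above. The callbacks `import_unapplied`, `import_applied` and `published_version` are hypothetical stand-ins for the real importer functions and Launchpad queries, and the sketch assumes each patches-applied import is built on that version's patches-unapplied commit. Branches are only touched in the final loop, and devel is just one more series in the product.

```python
import itertools
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# (patches-unapplied commit, patches-applied commit) for one version.
Commits = Tuple[str, str]


def import_and_update(
    new_versions: List[str],
    known: Dict[str, Commits],
    series_list: Iterable[str],
    pockets: Iterable[str],
    import_unapplied: Callable[[str], str],
    import_applied: Callable[[str, str], str],
    published_version: Callable[[str, str], Optional[str]],
) -> Dict[Tuple[str, str], Commits]:
    """Grow the commit graph first, then move branch pointers (hypothetical)."""
    commits = dict(known)
    unapplied: Dict[str, str] = {}
    # Step 2: patches-unapplied imports only; no branches are touched.
    for version in new_versions:
        unapplied[version] = import_unapplied(version)
    # Step 3: patches-applied imports, each on top of its unapplied commit.
    for version in new_versions:
        base = unapplied[version]
        commits[version] = (base, import_applied(version, base))
    # Step 4: every (series, pocket) pair is handled the same way, devel
    # included; the branch points at whatever is published there now.
    branches: Dict[Tuple[str, str], Commits] = {}
    for series, pocket in itertools.product(series_list, pockets):
        version = published_version(series, pocket)
        if version is not None:
            branches[(series, pocket)] = commits[version]
    return branches
```

Because the import functions never see branch names, the same function covers the devel and stable cases; `known` carries commits for versions imported in earlier runs.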
Effectively, we update the commit graph and then we forcibly update where the branches are in the commit graph. From an efficiency perspective, we can probably limit 4) to the series (or even series+pocket) seen in 1)?
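A sketch of the efficiency idea above: restrict step 4's product to the series, or the exact (series, pocket) pairs, that appeared among the new publishing entries found in step 1. `Publication` and `pairs_to_update` are hypothetical names. This restriction is only safe if every event that can move a branch shows up as a new entry in step 1.

```python
import itertools
from typing import Iterable, List, NamedTuple, Set, Tuple


class Publication(NamedTuple):
    """Hypothetical record of one newly seen publishing-history entry."""
    version: str
    series: str
    pocket: str


def pairs_to_update(
    series_list: Iterable[str],
    pockets: Iterable[str],
    new_pubs: Iterable[Publication],
    by_series_only: bool = False,
) -> List[Tuple[str, str]]:
    """Filter the (series, pocket) product down to what step 1 saw.

    With by_series_only=True every pocket of a touched series is kept
    (the coarser option); otherwise only the exact pairs seen are kept.
    """
    new_pubs = list(new_pubs)
    seen_pairs: Set[Tuple[str, str]] = {(p.series, p.pocket) for p in new_pubs}
    seen_series: Set[str] = {p.series for p in new_pubs}
    result = []
    for series, pocket in itertools.product(series_list, pockets):
        if by_series_only:
            if series in seen_series:
                result.append((series, pocket))
        elif (series, pocket) in seen_pairs:
            result.append((series, pocket))
    return result
```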
Related branches
- Server Team CI bot: Needs Fixing (continuous-integration)
- git-ubuntu developers: Pending requested

Diff: 4178 lines (+2634/-771), 9 files modified:
  - doc/SPECIFICATION.importer (+72/-0)
  - gitubuntu/dsc.py (+38/-0)
  - gitubuntu/git_repository.py (+23/-11)
  - gitubuntu/importer.py (+519/-586)
  - gitubuntu/importer_test.py (+1803/-43)
  - gitubuntu/importerutils.py (+50/-0)
  - gitubuntu/importlocal.py (+0/-2)
  - gitubuntu/importppa.py (+5/-16)
  - gitubuntu/source_information.py (+124/-113)

- git-ubuntu developers: Pending requested

Diff: 1780 lines (+765/-608), 6 files modified:
  - gitubuntu/dsc.py (+38/-0)
  - gitubuntu/git_repository.py (+18/-5)
  - gitubuntu/importer.py (+567/-493)
  - gitubuntu/importerutils.py (+26/-0)
  - gitubuntu/importppa.py (+2/-4)
  - gitubuntu/source_information.py (+114/-106)
tags: added: hash-abi-break, import, spec
Note the intention here is to *not* break the hash ABI. There are two changes that do break it, but they are actually unrelated (I need to clean up my MP, but ENOTIME): 1) the branch name in the commit messages (this gets dropped, because the import functions no longer know about the branches), and 2) the order of import tag creation, once we decide in the spec how to handle it (e.g., if something is actually published in Ubuntu before Debian, we might naively expect the Ubuntu one to get the import tag and the Debian one the reimport tag, but that's not how the code currently works).
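To make the first point concrete: a Git commit id is the SHA-1 of the raw commit object, which includes the message and the parent ids. Dropping the branch name from a message therefore changes that commit's id and, transitively, the id of every commit built on it. A minimal sketch for SHA-1 repositories; the identity and message strings below are made up for illustration and are not the importer's real format.

```python
import hashlib
from typing import Sequence


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object id as Git computes it: header "<kind> <size>\\0" + body."""
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parents: Sequence[str], ident: str, message: str) -> str:
    """Id of a commit that uses `ident` for both author and committer."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {ident}")
    lines.append(f"committer {ident}")
    body = "\n".join(lines) + "\n\n" + message
    return git_object_id("commit", body.encode("utf-8"))


# Hypothetical messages: same tree, parents and identity; only the branch
# name in the message differs.
EMPTY_TREE = git_object_id("tree", b"")
IDENT = "Importer <importer@example.com> 1500000000 +0000"
old_root = commit_id(EMPTY_TREE, [], IDENT, "Import version 1.0-1 to ubuntu/bionic\n")
new_root = commit_id(EMPTY_TREE, [], IDENT, "Import version 1.0-1\n")
# A child commit embeds its parent's id, so the change propagates downward.
old_child = commit_id(EMPTY_TREE, [old_root], IDENT, "Import version 1.0-2\n")
new_child = commit_id(EMPTY_TREE, [new_root], IDENT, "Import version 1.0-2\n")
```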
I think another big speedup could come from doing the applied import right after the unapplied one.
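A sketch of that interleaved ordering. It assumes (this is not stated above) that the saving comes from unpacking each source package once and reusing the extracted tree for both imports. `extract`, `import_unapplied` and `import_applied` are hypothetical callbacks, not existing git-ubuntu functions.

```python
from typing import Callable, Dict, Iterable, Tuple


def import_interleaved(
    versions: Iterable[str],
    extract: Callable[[str], str],
    import_unapplied: Callable[[str, str], str],
    import_applied: Callable[[str, str, str], str],
) -> Dict[str, Tuple[str, str]]:
    """Do both imports per version before moving on (hypothetical)."""
    results: Dict[str, Tuple[str, str]] = {}
    for version in versions:
        # Unpack once and hand the same tree to both imports.
        tree = extract(version)
        unapplied = import_unapplied(version, tree)
        applied = import_applied(version, tree, unapplied)
        results[version] = (unapplied, applied)
    return results
```

Compared with two separate passes, each version is extracted once instead of twice; the commit graph that results is the same as long as each applied import depends only on its own unapplied commit.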