Comment 9 for bug 1730734

Robie Basak (racb) wrote :

I've drafted a specification for this as follows:

---
Commit metadata in git’s data model comprises:
  - A person’s name
  - A person’s email address
  - A timestamp expressed as a number of seconds from the Unix epoch
  - A timezone offset
for each of:
  - The author
  - The committer

Commit metadata generated by the importer is synthesized deterministically from the corresponding source package data instance and its underlying source package.

All author fields are derived directly from the corresponding fields of the sign-off line of the first entry in the debian/changelog file of the underlying source package. An absent name in the changelog entry is interpreted as the empty string. Any failure to parse the changelog sign-off line must result in an import failure.

The committer timestamp is converted from the timestamp of the source package data instance.

The committer name and email address are statically set to usd-importer and <email address hidden> respectively.

Rationale: this presents users looking at git history with useful metadata in an appropriate mapping while also maintaining the determinism necessary for commit hash stability.
---
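
To make the authorship rule concrete, here is a minimal sketch of how a sign-off line could be mapped onto the four author fields. This is not the importer's actual code: the names Ident, SignoffParseError, parse_signoff and git_ident are hypothetical, the regular expression only covers the usual " -- Name <email>  date" trailer shape, and treating a "-0000" offset as UTC is my own assumption.

```python
import re
from datetime import timezone
from email.utils import parsedate_to_datetime
from typing import NamedTuple


class Ident(NamedTuple):
    """The four per-person fields in git's commit metadata."""
    name: str
    email: str
    timestamp: int  # seconds since the Unix epoch
    offset: str     # timezone offset in git's +HHMM / -HHMM form


class SignoffParseError(Exception):
    """Hypothetical error type; raising it stands for an import failure."""


# Trailer shape: " -- Name <email>  RFC 2822 date" (two spaces before the date).
_TRAILER = re.compile(r"^ -- (?P<name>.*?) ?<(?P<email>[^<>]*)>  (?P<date>.+)$")


def parse_signoff(line: str) -> Ident:
    match = _TRAILER.match(line)
    if match is None:
        raise SignoffParseError(f"unparseable sign-off line: {line!r}")
    try:
        when = parsedate_to_datetime(match.group("date"))
    except (TypeError, ValueError) as exc:
        raise SignoffParseError(f"unparseable date in: {line!r}") from exc
    if when.tzinfo is None:
        # A "-0000" offset parses to a naive datetime; assume UTC here.
        when = when.replace(tzinfo=timezone.utc)
    total_minutes = int(when.utcoffset().total_seconds()) // 60
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    return Ident(
        name=match.group("name"),  # an absent name becomes the empty string
        email=match.group("email"),
        timestamp=int(when.timestamp()),
        offset=f"{sign}{hours:02d}{minutes:02d}",
    )


def git_ident(ident: Ident) -> str:
    """Render the fields the way they appear in a raw git commit object."""
    return f"{ident.name} <{ident.email}> {ident.timestamp} {ident.offset}"
```

Because the result depends only on the text of the sign-off line, repeated imports of the same source package would produce the same author fields, which is the property the rationale asks for. A real implementation would probably lean on an existing changelog parser rather than a hand-written regular expression.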

"Source package data instance" means SourcePackageRelease. I'm defining "the timestamp of the source package data instance" as the earliest timestamp among all the SourcePackagePublishingHistory entries for that SourcePackageRelease that are available to us. I might need to think about SourcePackagePublishingHistory entries that are invisible to us though (such as in a pocket copy from a PPA), by defining exactly what "available to us" should mean.
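
A minimal sketch of that timestamp rule, under stated assumptions: Publication is a hypothetical stand-in for a SourcePackagePublishingHistory record (I have not pinned down which Launchpad date field would back its timestamp), and "available to us" is simply whatever list the caller passes in.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Publication:
    """Hypothetical stand-in for one SourcePackagePublishingHistory record."""
    pocket: str
    timestamp: datetime  # assumed to be timezone-aware


def committer_timestamp(publications: Iterable[Publication]) -> int:
    """Return the earliest publication time as seconds since the Unix epoch."""
    visible = list(publications)
    if not visible:
        # With no visible records there is nothing to derive a timestamp from.
        raise ValueError("no publication records available")
    earliest = min(p.timestamp for p in visible)
    return int(earliest.timestamp())
```

Note that the result is only stable if the set of visible records always includes the earliest one, which is why the definition of "available to us" matters for commit hash stability.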

The committer name and email address should probably be changed to something more Ubuntu-general now.

Feedback appreciated.