Thoughts on file ids

Jelmer Vernooij jelmer at samba.org
Sun May 8 11:53:43 UTC 2011


On Fri, 2011-05-06 at 11:06 -0400, Aaron Bentley wrote:
> On 11-05-05 11:07 AM, Jelmer Vernooij wrote:
> > I wonder if it would make sense to have a process before transform
> > operations to find renames/copies - was that what you had in mind? Such
> > a process in its simplest form could just return the existing file ids.
> No, that wasn't something I had in mind.  Finding renames is one thing,
> but merge-across-copies, and the inverse, merge-across-joins, is evil
> and would require lots of work.
> 
> I have thought about implementing merge-by-path, though.
What I mean is allowing a process before delta/transform operations that
assigns short-lived (i.e. only relevant to that action) file ids to each
file in the relevant trees.

That sort of thing would allow the implementation of things like
merge-by-path, or other more advanced mechanisms (Git's algorithm of "if
X percent of two files matches, it's probably the same file"), without
affecting the storage layer.
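
To make that a bit more concrete, here is a rough sketch of two such
pre-passes. It is not bzrlib code: trees are modelled as plain dicts
mapping path to file text, and the function names, the "tmp-N" id format
and the 0.6 threshold are all made up for illustration. The first
function is merge-by-path; the second is a crude similarity match along
the lines of the Git heuristic, using difflib from the standard library.

```python
import difflib


def assign_ids_by_path(trees):
    """Merge-by-path: a file's short-lived id is derived from its path."""
    return [{path: "path:" + path for path in tree} for tree in trees]


def assign_ids_by_similarity(base, other, threshold=0.6):
    """Pair up files in `other` with files in `base`.

    Files at the same path share an id. A path that only exists in
    `other` takes the id of the most similar unmatched file in `base`,
    provided the similarity ratio reaches `threshold` (hypothetical value).
    """
    base_ids = {path: "tmp-%d" % i for i, path in enumerate(sorted(base))}
    # First pass: identical paths keep the same id.
    other_ids = {path: base_ids[path] for path in other if path in base}
    # Second pass: try to explain new paths as renames of vanished ones.
    unclaimed = sorted(set(base) - set(other))
    next_id = len(base_ids)
    for path in sorted(set(other) - set(base)):
        best, best_ratio = None, threshold
        for candidate in unclaimed:
            ratio = difflib.SequenceMatcher(
                None, base[candidate], other[path]).ratio()
            if ratio >= best_ratio:
                best, best_ratio = candidate, ratio
        if best is None:
            # No plausible match: treat the file as newly added.
            other_ids[path] = "tmp-%d" % next_id
            next_id += 1
        else:
            other_ids[path] = base_ids[best]
            unclaimed.remove(best)
    return base_ids, other_ids
```

The ids only have to be consistent across the trees taking part in that
one operation, so nothing here needs to be stored anywhere.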

> >> Absent (3), we'd probably just use the path for (1).  Using the path for
> >> (2) would mean that renaming files without changing their contents would
> >> take more space than it does with file-ids.
> > For (2), it doesn't necessarily have to be the path if we're not using a
> > file id - it could be a checksum, or perhaps even the file id of another
> > file in a parallel import. Whatever it is, it should be a repository
> > implementation detail not exposed at the higher level API / UI level.
> The tuples we use for versionedfiles are already repository
> implementation details, aren't they?
They are now, but that's a relatively recent change.

> Mind you, there's also the per-file graph, which I don't think you've
> really discussed here.
I think the per-file graph should just be considered a sort of sparse
version of the revision graph.
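
One way to read "sparse" here, as a sketch only: keep the revisions that
changed the file and connect each of them to the nearest ancestors that
also changed it. The function and argument names below are hypothetical,
the revision graph is a plain dict of revision to parent list, and the
result is not pruned, so a listed parent may itself be an ancestor of
another listed parent.

```python
def per_file_graph(revision_parents, touched):
    """Derive a per-file graph from the revision graph.

    revision_parents: dict mapping revision id -> list of parent ids.
    touched: set of revision ids that changed the file.
    Returns a dict mapping each touched revision to the nearest
    ancestors (along each parent edge) that also touched the file.
    """
    cache = {}

    def nearest(rev):
        # The closest revisions at or above `rev` that touched the file.
        if rev in touched:
            return [rev]
        if rev not in cache:
            found = []
            for parent in revision_parents.get(rev, []):
                for r in nearest(parent):
                    if r not in found:
                        found.append(r)
            cache[rev] = found
        return cache[rev]

    graph = {}
    for rev in touched:
        parents = []
        for parent in revision_parents.get(rev, []):
            for r in nearest(parent):
                if r not in parents:
                    parents.append(r)
        graph[rev] = parents
    return graph
```

With a history r1 <- r2 <- {r3, r4} <- r5 in which only r1, r4 and r5
change the file, r4 ends up with parent r1, and r5 with parents r1 (via
r3) and r4.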

Cheers,

Jelmer