Thoughts on file ids
Aaron Bentley
aaron at aaronbentley.com
Mon May 9 13:42:51 UTC 2011
On 11-05-08 07:53 AM, Jelmer Vernooij wrote:
> On Fri, 2011-05-06 at 11:06 -0400, Aaron Bentley wrote:
>> On 11-05-05 11:07 AM, Jelmer Vernooij wrote:
>>> I wonder if it would make sense to have a process before transform
>>> operations to find renames/copies - was that what you had in mind? Such
>>> a process in its simplest form could just return the existing file ids.
>> No, that wasn't something I had in mind. Finding renames is one thing,
>> but merge-across-copies, and the inverse, merge-across-joins, is evil
>> and would require lots of work.
>>
>> I have thought about implementing merge-by-path, though.
> What I mean is allowing a process before delta/transform operations that
> assigns short-lived (i.e. only relevant to that action) file ids to each
> file in the relevant trees.
I think that might make sense, but it's also worth seeing if that could
be merged with TreeTransform trans_ids, because they have a similar
lifetime and purpose.
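
To make the idea of short-lived ids concrete, here is a minimal sketch of a pre-merge step that assigns ids by path. It is not bzrlib code: the function name `assign_ephemeral_ids`, the `tmp-N` id format, and the modelling of a tree as a plain list of paths are all assumptions made for illustration. Files at the same path in different trees receive the same id, which is roughly what merge-by-path would need.

```python
from itertools import count


def assign_ephemeral_ids(*trees):
    """Map each path to a short-lived id shared across the given trees.

    Each tree is modelled as an iterable of path strings. This is a
    hypothetical stand-in for illustration, not a bzrlib Tree.
    """
    counter = count()
    ids_by_path = {}
    for tree in trees:
        for path in sorted(tree):
            # The same path in any tree maps to the same id, so files are
            # treated as comparable purely because their paths match.
            if path not in ids_by_path:
                ids_by_path[path] = "tmp-%d" % next(counter)
    return ids_by_path


this_tree = ["README", "src/main.py"]
other_tree = ["README", "src/util.py"]
ids = assign_ephemeral_ids(this_tree, other_tree)
```

The mapping only lives for the duration of one operation and is never stored, which is the property it would share with trans_ids.
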
> That sort of thing would allow the implementation of things like
> merge-by-path, or other more advanced mechanisms (Git's algorithm of "if
> X percent of two files matches, it's probably the same file"), without
> affecting the storage layer.
Sure, but you could also achieve this kind of thing by rewriting the
file-ids in one of the trees, e.g. using a PreviewTree.
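
The similarity heuristic quoted above could be sketched roughly as follows. This is an illustration only, not Git's actual algorithm and not bzrlib code: the function name `match_by_similarity`, the default threshold of 0.6, and the representation of a tree as a dict of path to text are assumptions. It uses `difflib.SequenceMatcher` from the Python standard library to score candidate pairs.

```python
from difflib import SequenceMatcher


def match_by_similarity(old_files, new_files, threshold=0.6):
    """Pair each old path with the new path it most plausibly became.

    old_files and new_files map path -> text content (a hypothetical
    stand-in for real trees). Paths present in both are matched to
    themselves; the rest are paired by content similarity.
    """
    matches = {}
    # Only new paths with no same-named old file are rename candidates.
    unmatched_new = {p: t for p, t in new_files.items() if p not in old_files}
    for old_path, old_text in old_files.items():
        if old_path in new_files:
            matches[old_path] = old_path
            continue
        best_path, best_ratio = None, threshold
        for new_path, new_text in unmatched_new.items():
            ratio = SequenceMatcher(None, old_text, new_text).ratio()
            if ratio >= best_ratio:
                best_path, best_ratio = new_path, ratio
        if best_path is not None:
            matches[old_path] = best_path
            # Each new file can be claimed by at most one old file.
            del unmatched_new[best_path]
    return matches


old = {"a.txt": "hello world\nfoo\nbar\n", "keep.txt": "x"}
new = {"b.txt": "hello world\nfoo\nbaz\n", "keep.txt": "x",
       "c.txt": "totally different"}
renames = match_by_similarity(old, new)
```

Every unmatched old file is compared against every unmatched new file, so the cost grows with the product of the two counts. The resulting mapping could then be used to rewrite the ids in one of the trees before merging.
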
>> The tuples we use for versionedfiles are already repository
>> implementation details, aren't they?
> They are now, but that's a relatively recent change.
Before now, they were part of the model?
>> Mind you, there's also the per-file graph, which I don't think you've
>> really discussed here.
> I think the per-file graph should just be considered a sort of sparse
> version of the revision graph.
I'm not sure the per-file graph would survive the elimination of
file-ids. File-ids represent the idea that we know at commit time which
files in a tree are comparable to which other files in another tree. I
think that if we can't encode that comparability at commit time, we
can't have per-file *anything* encoded in a repository. And
establishing that comparability later could be very expensive.
Aaron