dirstate2 comments

Thu May 20 08:20:55 BST 2010

On Thu, May 20, 2010 at 2:23 PM, John Arbash Meinel
<john at arbash-meinel.com> wrote:

Thanks for the eyeballs John.

> Robert recently wanted to look at his dirstate2 branch, and wanted to
> get some feedback on it. I've looked at it, but I won't say I've been
> fully thorough.
>
> 1) Currently incomplete. Just reading what the docs mention, versus what
> the code does.  However, the docs seem to outline something that would work.

Yes, its minimal at the moment. I wanted to get early discussion
before committing deeply to an implementation. I've merged all of
trunk in just now, and will spend tomorrow poking at this - it would
be nice to work better for emacs users, given their patience with
savannah and sftp.

> 2) Initial thoughts is that it seems overly complicated. *Especially* on
> platforms with atomic rename. Which also brings up the question of how
> to make sure it is adequately tested on platforms where you can rename
> open files. Maybe you've already solved this, or maybe it's why this
> isn't fully complete yet.

On this point specifically, my theory was that it wouldn't matter if
you can do atomic renames - its correctness doesn't depend on renames
over open files failing, its simply designed to not depend on open
files renames succeeding.

> 3) Depending on renaming files in/out of the way to get correct behavior
> seems like it could be extra problematic on NFS. Then again, things may
> not be worse than where we already are.

We often fail completely on NFS at the moment with the dependency on
OSLocks; even when we work we're likely to be running into cache
coherency issues.

> 4) I was a bit confused by the naming scheme ($MD5.check is the pointer
> to the last file, while $MD5 is the new content, and $MD5.current is the
> pointer to $MD5.) I don't have anything immediate, but giving an
> extension to the new content, and possibly name the pointers as '.ptr'
> or something, might make it a bit clearer what is what.

Sure, we can do that.

> 5) Adding a new directory under '.bzr/checkout' to hold this stuff seems
> superfluous, especially since it seems to have yet-another format file.
> Maybe I misunderstood, but it would be nice to just use .bzr/checkout.

This scheme has the potential to leave cruft behind and need garbage
collecting; thats simpler if we can whitelist a small, constant set of
files and then grab everything else. That was my main motivation for
putting it in a subdir. The format marker was so that I could change
things without worry about accidentally using an old test dir.

> 6) If we are revisiting how dirstate works, what about splitting out the
> stat file again. It seems that file is what gives us the most difficulty
> (since it can be updated in a read lock). If we made it more clearly
> separate, we might avoid some of the problems entirely. IIRC the main
> desire to not have it was performance, but I wonder the real cost would
> be if we structured it appropriately. Having to repeat all the paths
> again would be bad, but maybe we can do better. IIRC we also do
> something indexed on stat rather than by path, maybe that would be
> useful here...

I don't want to bite off too much; splittout of stat is, of course,
possible - but its orthogonal to getting rid of the need for oslocks
entirely, which is what windows needs for reliable operation.

-Rob