[RFC] Removing hash prefix in storage vastly improves performance
Robert Collins
robertc at robertcollins.net
Fri Aug 18 00:51:49 BST 2006
On Thu, 2006-08-17 at 17:47 -0500, John Arbash Meinel wrote:
> The attached patch adds a very simple new knit repository format. The
> only change is that it doesn't use the hash prefixes when figuring out
> where a given file id is stored, it stores everything in one directory.
>
> Originally, I was suspicious that this would help, but I was seeing that
> 25+% of the time spent committing a new kernel was just creating the
> knit index header. So I gave it a shot. And what I found was that:
>
> time bzr.dev/bzr -q commit -m "first"
> 8:36.95
>
> time no-knit-hash-prefix/bzr -q commit -m "first"
> 4:12.68
>
> Yep, that's right. 2x faster. Just as some comparison points:
>
...
> This would be pretty easy to phase into an 0.10 release, since it really
> is a small change. Because it is a repository format change, we may want
> to just have it available, but not the default for the next release.
I don't think it's been analyzed enough, to be honest.
What are the performance changes on kernel-sized trees for
- BSD (perhaps)
- MacOSX (definitely)
- Windows (definitely, NTFS for sure, FAT32 if feeling generous)
- Linux ext3 with hashed directories turned off.
42K files is a lot to put in one directory. I'm really not happy with
the idea of doing that without some careful testing of the situation.
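
A rough way to get a first number on each of those platforms is to time
bare file creation in both layouts. This is only a sketch: it measures
creating empty files, not a real commit, and the ".knit" names and the
adler32-based two-hex-digit bucket are stand-ins chosen for illustration,
not necessarily what bzr does.

```python
import os
import tempfile
import time
import zlib


def create_files(root, n, hashed):
    """Create n empty files under root and return the elapsed seconds.

    With hashed=True each file goes into one of 256 subdirectories chosen
    from a checksum of its name (an illustrative bucket function).
    """
    start = time.perf_counter()
    for i in range(n):
        name = "file-id-%d.knit" % i  # hypothetical file name
        if hashed:
            prefix = "%02x" % (zlib.adler32(name.encode("ascii")) & 0xFF)
            os.makedirs(os.path.join(root, prefix), exist_ok=True)
            path = os.path.join(root, prefix, name)
        else:
            path = os.path.join(root, name)
        with open(path, "wb"):
            pass
    return time.perf_counter() - start


def compare(n=42000):
    """Time both layouts in fresh temporary directories."""
    results = {}
    for label, hashed in (("flat", False), ("hashed", True)):
        with tempfile.TemporaryDirectory() as root:
            results[label] = create_files(root, n, hashed)
    return results


if __name__ == "__main__":
    for label, seconds in compare().items():
        print("%s: %.2fs" % (label, seconds))
```

The temporary directory lands wherever the platform puts it, so for a
meaningful comparison it would need to be pointed at the filesystem under
test (ext3 without dir_index, NTFS, FAT32, HFS+ and so on).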
Now, from our discussions with the hg folk, I recall that the seek
problem is that for the files inside one working-tree dir you end up
accessing/creating files in many repository dirs.
So I'd suggest that the right change is one that ensures that all the
repository files for the contents of a directory are in a single
directory in the repository, without creating directories that are so
big.
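
To make the three layouts concrete, here is a minimal sketch of the path
mappings being compared. All of the function names, the ".knit" suffix and
the adler32 bucket are assumptions for illustration only; in particular,
by_parent_path is just one possible reading of the suggestion above, and it
leaves open what happens when a file is later moved to another directory.

```python
import zlib


def _bucket(key):
    """Map a string to one of 256 two-hex-digit buckets (illustrative)."""
    return "%02x" % (zlib.adler32(key.encode("utf-8")) & 0xFF)


def hashed_path(file_id):
    """Bucket by the file id itself: siblings in the tree scatter."""
    return "%s/%s.knit" % (_bucket(file_id), file_id)


def flat_path(file_id):
    """Everything in one repository directory."""
    return "%s.knit" % file_id


def by_parent_path(file_id, parent_dir_id):
    """Bucket by the containing directory's id: siblings stay together."""
    return "%s/%s.knit" % (_bucket(parent_dir_id), file_id)
```

With the last mapping, two files that share a parent directory always land
in the same repository directory, while unrelated directories are still
spread over up to 256 buckets.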
I'm against this going into 0.10, speaking as a normal dev rather than
as release manager. As release manager, if it's in before Monday and it
does not cause regressions, fine.
Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.