[MERGE] sha_file_by_name using raw os files; -Dhashcache

John Arbash Meinel john at arbash-meinel.com
Fri Oct 5 18:10:07 BST 2007


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> While profiling towards https://bugs.edge.launchpad.net/bzr/+bug/146176
> it seemed that we were double-buffering files while hashing them.  This
> seems about 10% faster but it's somewhat unstable to measure.  If someone
> else would like to confirm or deny it that would be useful.

I did some testing on Manganese. With the Mozilla tree (and forcing every
cached value to miss), I get:

time bzr.dev status
10 loops, best of 3: 8.03 sec per loop

time bzr.patched status
10 loops, best of 3: 6.08 sec per loop

time bzr.mmap status [1]
~6.5s

time bzr.subprocess status
9min 26s. Obviously not the way to go, and it seemed to give the wrong values
anyway. :)
I measure about 2ms to spawn sha1sum. Times 49k files, that is 98sec of spawn
overhead alone, so I think it will always be slower.
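
As a quick check of that estimate, here is a small sketch in modern Python. The constant and function names are only illustrative; the 2ms and 49k figures are the ones quoted above.

```python
# Lower bound on the cost of hashing by spawning one sha1sum per file.
# Only process start-up is counted; the hashing itself is ignored.
SPAWN_SECONDS = 0.002   # ~2ms to spawn sha1sum, as measured above
FILE_COUNT = 49_000     # ~49k files in the Mozilla tree


def spawn_overhead_seconds(per_spawn=SPAWN_SECONDS, files=FILE_COUNT):
    """Return the total seconds spent just starting processes."""
    return per_spawn * files


print(round(spawn_overhead_seconds()))  # 98
```

Even if the hashing were free, 98sec is already more than ten times the 8.03sec that bzr.dev takes for the whole status run.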




[1]: This is the code I was using for mmap:
import mmap
import os
import sha  # Python 2 only; hashlib.sha1 is the modern equivalent

def sha_file_by_name(fname):
    """Calculate the SHA1 of a file by mmapping its full contents"""
    fd = os.open(fname, os.O_RDONLY)
    try:
        # The documentation says you can use 0 to set it to the full size of the
        # file, but in testing this does not work
        size = os.fstat(fd).st_size
        if size == 0:
            # An empty file cannot be mmapped, so hash the empty string
            return sha.new().hexdigest()
        mem = mmap.mmap(fd, size, access=mmap.ACCESS_READ)
        try:
            digest = sha.new(mem).hexdigest()
        finally:
            mem.close()
    finally:
        os.close(fd)
    return digest

I'm guessing that creating a couple of extra objects (the fstat result and the
mmap object) is why this is slower than just doing the read directly.
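
For comparison, reading directly through a raw file descriptor looks roughly like the sketch below. This is not the code from Martin's patch; it is written for modern Python with hashlib, and the name sha_file_by_name_raw and the chunk_size default are assumptions made for illustration.

```python
import hashlib
import os


def sha_file_by_name_raw(fname, chunk_size=1 << 16):
    """Return the hex SHA-1 of a file, reading it via a raw OS descriptor.

    Hypothetical helper: it bypasses Python's buffered file object, so the
    data is not buffered a second time on its way to the hash.
    """
    # O_BINARY only exists on Windows; fall back to 0 elsewhere.
    flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
    fd = os.open(fname, flags)
    try:
        digest = hashlib.sha1()
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            digest.update(chunk)
        return digest.hexdigest()
    finally:
        os.close(fd)
```

Unlike the mmap version, this needs no fstat call and no special case for empty files, since the loop simply ends on the first read.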

So

BB:approve

It seems to be genuinely better for me. (Without sha1 it is about 2s, so the
hashing itself goes from about 6s to 4s, or 50% faster.)
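
Spelling that arithmetic out as a sketch (the names are illustrative; the 2s baseline is my rough figure from above, not a precise measurement):

```python
# Separate the fixed cost of "status" from the time spent hashing.
BASELINE_NO_SHA = 2.0   # approx. status time with sha1 skipped
BZR_DEV = 8.03          # seconds per loop, unpatched
BZR_PATCHED = 6.08      # seconds per loop, patched


def hashing_speedup(before, after, fixed=BASELINE_NO_SHA):
    """Return how many times faster the hashing portion became."""
    return (before - fixed) / (after - fixed)


def overall_speedup(before, after):
    """Return how many times faster the whole command became."""
    return before / after


print(round(hashing_speedup(BZR_DEV, BZR_PATCHED), 2))
print(round(overall_speedup(BZR_DEV, BZR_PATCHED), 2))
```

The hashing ratio comes out just under 1.5, which is the "50% faster" figure; for the whole command the ratio is closer to 1.3.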

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.7 (Darwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHBm/uJdeBCYSNAAMRAuPjAJsHtXw3kpQXxtuSow8Z8VHJOAFZLwCgjoZ7
Iw0MeuiSXqIV6Z7V0Sg+mxQ=
=t+Mf
-----END PGP SIGNATURE-----


