[Bug 737234] Re: too much data transferred making a new stacked branch
Jelmer Vernooij
737234 at bugs.launchpad.net
Wed Jun 8 09:15:15 UTC 2011
** Also affects: bzr (Ubuntu)
Importance: Undecided
Status: New
** Changed in: bzr (Ubuntu)
Status: New => Fix Released
** Also affects: bzr (Ubuntu Natty)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to bzr in Ubuntu.
https://bugs.launchpad.net/bugs/737234
Title:
too much data transferred making a new stacked branch
Status in Bazaar Version Control System:
Fix Released
Status in Bazaar 2.3 series:
Fix Released
Status in “bzr” package in Ubuntu:
Fix Released
Status in “bzr” source package in Natty:
In Progress
Bug description:
In thread "Linaro bzr feedback" John writes:
Note, I just did 'bzr branch lp:gcc-linaro', and it transferred about
500MB, about 457MB on disk. (Not bad considering lp:emacs transferred
400-500MB and was only 200MB on disk.)
I then ran 'bzr serve' and 'bzr branch --stacked bzr://localhost:...'.
What was scary was:
8141442kB 24128kB/s / Finding Revisions
...
> Grepping the .bzr.log file in question, I do, indeed see about 8.1GB of
> data transferred before we read the first .tix.
> If my grep fu is strong, then we only read 30MB of .cix data. Which
> leaves us with 8GB of .pack content, or actual CHK page content.
This is a change which drops the 8GB down to 150MB:
=== modified file 'bzrlib/inventory.py'
--- bzrlib/inventory.py	2010-09-14 13:12:20 +0000
+++ bzrlib/inventory.py	2011-03-17 15:38:40 +0000
@@ -736,6 +736,13 @@
             specific_file_ids = set(specific_file_ids)
         # TODO? Perhaps this should return the from_dir so that the root is
         # yielded? or maybe an option?
+        if from_dir is None and specific_file_ids is None:
+            # They are iterating from the root, assume they are iterating
+            # everything and preload all file_ids into the
+            # _fileid_to_entry_cache. This doesn't build things into .children
+            # for each directory, but that will happen later.
+            for _ in self.iter_just_entries():
+                continue
         if from_dir is None:
             if self.root is None:
                 return
Basically, iter_entries_by_dir goes in a specific order which doesn't
match the order in the repository. 'iter_just_entries' loads everything
in repository order, and puts it into the
CHKInventory._fileid_to_entry_cache, and then the rest of the requests are
fed from there.
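The pattern can be sketched outside of bzrlib. This is a toy model, not
bzrlib code: `Store`, `Inventory`, and all names below are invented for
illustration. Reads are cheap only in storage order, so one sequential
sweep (analogous to iter_just_entries) fills a cache that then serves
out-of-order lookups (analogous to iter_entries_by_dir's traversal order):

```python
class Store:
    """Toy backing store; sequential sweeps are cheap, random reads are not."""

    def __init__(self, entries):
        # entries laid out in "repository order"
        self._entries = dict(entries)
        self.random_reads = 0

    def read(self, key):
        # each out-of-order read is counted as a costly round trip
        self.random_reads += 1
        return self._entries[key]

    def iter_in_storage_order(self):
        # one cheap sequential sweep over everything
        return iter(self._entries.items())


class Inventory:
    def __init__(self, store):
        self._store = store
        self._cache = {}  # plays the role of _fileid_to_entry_cache

    def preload(self):
        # single pass in storage order fills the cache up front
        for key, value in self._store.iter_in_storage_order():
            self._cache[key] = value

    def get(self, key):
        # served from the cache when preloaded; falls back to a random read
        if key not in self._cache:
            self._cache[key] = self._store.read(key)
        return self._cache[key]


store = Store([("a", 1), ("b", 2), ("c", 3)])
inv = Inventory(store)
inv.preload()
# requests arrive in traversal order, not storage order
values = [inv.get(k) for k in ("c", "a", "b")]
print(values, store.random_reads)  # every lookup is a cache hit
```

Without the preload() call, the same three lookups would each hit the
store out of order; with it, the store sees one sequential pass and zero
random reads, which is the shape of the 8GB-to-150MB win above.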
We don't usually notice this effect, because of the
chk_map._thread_caches.page_cache and the GCCHKRepository block cache.
Once the inventory is large enough to not be in the bytes cache, we have
to load it from the repository again.
I just checked, and this also has a large effect for local
repositories.
'time list(rev_tree.inventory.iter_entries_by_dir())'
drops from 4m30s down to 13s with the patch.
So we certainly should think about other ramifications, but short term
it looks quite good.