Rev 2497: Add performance analysis docs in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Sun May 27 20:38:08 BST 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2497
revision-id: pqm at pqm.ubuntu.com-20070527193804-znv53vjd04u6es6x
parent: pqm at pqm.ubuntu.com-20070526160035-utugnd3he5zvo60s
parent: aaron.bentley at utoronto.ca-20070527184754-p0w83p0v54ejks8s
committer: Canonical.com Patch Queue Manager<pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Sun 2007-05-27 20:38:04 +0100
message:
  Add performance analysis docs
added:
  doc/developers/bundle-creation.rst bundlecreation.rst-20070527173558-rqaqxn1al7vzgcto-1
  doc/developers/initial-push-pull.rst initialpushpull.rst-20070527184539-wodba32mi5dehhct-1
  doc/developers/merge-scaling.rst mergescaling.rst-20070527173558-rqaqxn1al7vzgcto-2
  doc/developers/performance.dot performance.dot-20070527173558-rqaqxn1al7vzgcto-3
modified:
  doc/developers/performance-roadmap.txt performanceroadmap.t-20070507174912-mwv3xv517cs4sisd-2
    ------------------------------------------------------------
    revno: 2495.2.2
    merged: aaron.bentley at utoronto.ca-20070527184754-p0w83p0v54ejks8s
    parent: aaron.bentley at utoronto.ca-20070527174550-3su5c5701pi9vv1y
    committer: Aaron Bentley <aaron.bentley at utoronto.ca>
    branch nick: bzr.docs
    timestamp: Sun 2007-05-27 14:47:54 -0400
    message:
      Add initial push/pull analysis
    ------------------------------------------------------------
    revno: 2495.2.1
    merged: aaron.bentley at utoronto.ca-20070527174550-3su5c5701pi9vv1y
    parent: pqm at pqm.ubuntu.com-20070525050023-ip6kst9coq8a32z5
    committer: Aaron Bentley <aaron.bentley at utoronto.ca>
    branch nick: bzr.docs
    timestamp: Sun 2007-05-27 13:45:50 -0400
    message:
      Add bundle creation and merge scaling analysis
=== added file 'doc/developers/bundle-creation.rst'
--- a/doc/developers/bundle-creation.rst	1970-01-01 00:00:00 +0000
+++ b/doc/developers/bundle-creation.rst	2007-05-27 17:45:50 +0000
@@ -0,0 +1,37 @@
+Bundle Creation
+===============
+1. Find common ancestor [O(a)] **O(b)**
+2. Emit bundle [O(a)] **O(b) O(h)**
+
+  Per revision
+
+  1. emit metadata O(1)
+  2. emit changes for files
+
+    1. find changed files [O(c)] **O(f)**
+    2. emit file metadata O(d)
+    3. emit diff [O(e * e) * O(f) + O(h)] **O(i)**
+    4. base64 encode O(g)
+
+3. **emit overal diff (or maybe do interdiff) O(e * e) * O(f)**
+
+:a: nodes in revision graph
+:b: number of descendants of common ancestor
+:c: number of files in the tree
+:d: length of metadata
+:e: number of lines
+:f: number of modified files
+:g: length of diff
+:h: nodes in knit graph of modified files
+:i: length of stored diff
+
+Needs
+=====
+- Improved common ancestor algorithm
+- Access to partial revision graph proportional to relevant revisions
+- Access to changed files proportional to number of change files and
+  intervening revisions
+- Use knit deltas without recomputing
+- Access to knit deltas in O(1) time
+- Access to snapshots in O(1) amortized time
+- All snapshots must have knit deltas

=== added file 'doc/developers/initial-push-pull.rst'
--- a/doc/developers/initial-push-pull.rst	1970-01-01 00:00:00 +0000
+++ b/doc/developers/initial-push-pull.rst	2007-05-27 18:47:54 +0000
@@ -0,0 +1,69 @@
+Initial push / pull
+###################
+
+Optimal case
+------------
+(a motivating example of ultimate performance)
+Assume there is a file with exactly the right data in compressed form.  This
+may be a tarred branch, a bundle, or a blob format.  Performance in this case
+scales with the size of the file.
+
+Disk case
+---------
+Assume current repo format.  Attempt to achieve parity with ``cp -r``.  Read
+each file only 1 time.
+
+- read knit graph for revisions
+- write filtered copy of revision knit O(d+a)
+- write filtered copy of knit index O(d)
+- Open knit index for inventory
+- Write a filtered copy of inventory knit and simultaneously not all referenced
+  file-ids O(b+d)
+- Write filtered copy of inventory knit index O(d)
+- For each referenced file-id:
+
+  - Open knit index for each file knit O(e)
+  - If acceptable threshold of irrelevant data hard-link O(f)
+  - Otherwise write filtered copy of text knit and simultaneously write
+    the fulltext to tree transform O(h)
+
+- Write format markers O(1)
+
+:a: size of aggregate revision metadata
+:b: size of inventory changes for all revisions
+:c: size of text changes for all files and all revisions (e * g)
+:d: number of relevant revisions
+:e: number of relevant versioned files
+:f: size of the particular versioned file knit index
+:g: size of the filtered versioned file knit
+:h: size of the versioned file fulltext
+:i: size of the largest file fulltext
+
+Smart Network Case
+------------------
+
+Phase 1
+=======
+Push: ask if there is a repository, and if not, what formats are okay
+Pull: Nothing
+
+Phase 2
+=======
+Push: send initial push command, streaming data in acceptable format, following
+disk case strategy
+Pull: receive initial pull command, specifying format
+
+Pull client complexity: O(a), memory cost O(1)
+Push client complexity: procesing and memory cost same as disk case
+
+Dumb Network Case
+-----------------
+Pull: same as disk case, but request all file knit indices at once and request
+al file knits at once.
+Push: same as disk case, but write all files at once.
+
+Wants
+-----
+- Read partial graph
+- Read multiple segments of multiple files on http and sftp
+- Write multiple files over SFTP

=== added file 'doc/developers/merge-scaling.rst'
--- a/doc/developers/merge-scaling.rst	1970-01-01 00:00:00 +0000
+++ b/doc/developers/merge-scaling.rst	2007-05-27 17:45:50 +0000
@@ -0,0 +1,31 @@
+Scaling analysys of Merge
+=========================
+
+1. Fetch revisions O(a)
+2. Common Ancestor [O(b)] **O(h)**
+3. Calculate tree merge O(c) [+ O(b) + O(d)] **+ O(i)**
+
+ - text merge O(e * e * f) + O(b)
+
+4. Find filesystem conflicts O(c)
+5. Resolve filesystem conflicts O(g)
+6. Apply changes O(c) + O(log(d))
+7. Set pending merges O(1)
+8. Print conflicts O(g)
+9. Print changes O(c)
+
+:a: revisions missing from repo:
+:b: nodes in the revision graph:
+:c: files that differ between base and other:
+:d: number of files in the tree
+:e: number of lines in the text
+:f: number number of files requiring text merge
+:g: number of conflicts (g <= c)
+:h: humber of uncommon ancestors
+:i: number of revisions between base and other
+
+Needs
+=====
+- Access to revision graph proportional to number of revisions read
+- Access to changed file metadata proportional to number of changes and number of intervening revisions.
+- O(1) access to fulltexts

=== added file 'doc/developers/performance.dot'
--- a/doc/developers/performance.dot	1970-01-01 00:00:00 +0000
+++ b/doc/developers/performance.dot	2007-05-27 18:47:54 +0000
@@ -0,0 +1,32 @@
+/* ESTIMATES ARE VERY ROUGH APPROXIMATIONS */
+digraph performance {
+  gdfo_api -> gdfo_cache;
+  gdfo_api -> gdfo_usage;
+  gdfo_api[label="GDFO API\n1 day"];
+  gdfo_cache[label="GDFO Cache\n1 week"];
+  gdfo_usage[label="GDFO Usage\n3 days"];
+  data_collation[label="Data co-location API\n1 month"];
+  repository_stacking[label="Repository stacking API\n2 months"];
+  bundle_container[label="Bundle container format\n2 weeks"]
+  xdelta[label="Xdelta sanity/learning\n2 weeks"];
+  xdelta_imp[label="Xdelta implementation\n1 week"];
+  xdelta -> xdelta_imp;
+  q_splitting[label="Question radix directory splitting\n2 weeks"];
+  i_splitting[label="Implement inventory splitting\n6-8 weeks"]
+  q_splitting -> i_splitting;
+  get_weave[label="deprecate get_weave\n1 week"];
+  per_file_graph -> get_weave;
+  per_file_graph[label="Access for per-file graph data\n1 days"];
+  repo_apis[label="For each use case, build targeted repo agnostic APIs\n10-40 days"];
+  rev_validators[label="Revision validators (use in GPG sigs etc) may poly\n3 days"];
+  anno_cache[label="Annotations become a cache:\n logically separate data\n2 weeks"]
+  anno_regen[label="Annotation regeneration\n"];
+  anno_kinds[label="Different styles of annotation"];
+  anno_regen -> anno_kinds;
+  anno_cache -> anno_regen;
+  memory_copies[label="Stop requiring full memory copies of files"];
+  wt_disk_order[label="Working Tree disk ordering\n6-8 weeks"];
+  repo_disk_order[label="Repository disk ordering\n1 month"];
+  graph_api[label="Network-efficient revision-graph API\n3 week"];
+  iter_merge[label="iter_changes based merge\n2 days"];
+}

=== modified file 'doc/developers/performance-roadmap.txt'
--- a/doc/developers/performance-roadmap.txt	2007-05-09 15:36:06 +0000
+++ b/doc/developers/performance-roadmap.txt	2007-05-27 18:47:54 +0000
@@ -12,4 +12,10 @@
 
 .. include:: performance-use-case-analysis.txt
 
+.. include:: initial-push-pull.rst
+
 .. include:: incremental-push-pull.txt
+
+.. include:: merge-scaling.rst
+
+.. include:: bundle-creation.rst




More information about the bazaar-commits mailing list