Rev 2505: (robertc) More London sprint documentation in file:///home/pqm/archives/thelove/bzr/%2Btrunk/

Canonical.com Patch Queue Manager pqm at pqm.ubuntu.com
Mon Jun 4 07:29:28 BST 2007


At file:///home/pqm/archives/thelove/bzr/%2Btrunk/

------------------------------------------------------------
revno: 2505
revision-id: pqm at pqm.ubuntu.com-20070604062925-0a2e6fvr3qpfngzo
parent: pqm at pqm.ubuntu.com-20070602184854-kwqaduxs0b19r76n
parent: robertc at robertcollins.net-20070604040720-c5ti0k49w0ye8zcl
committer: Canonical.com Patch Queue Manager<pqm at pqm.ubuntu.com>
branch nick: +trunk
timestamp: Mon 2007-06-04 07:29:25 +0100
message:
  (robertc) More London sprint documentation
added:
  doc/developers/add.txt         add.txt-20070515094933-xhgz3xjc7o0edok0-2
  doc/developers/annotate.txt    annotate.txt-20070515142136-rq51c4kqhwrjsh8k-1
  doc/developers/gc.txt          gc.txt-20070515102609-90x5kzjokrurfbke-1
  doc/developers/revert.txt      revert.txt-20070515111013-grc9hgp21zxqbwbl-1
modified:
  doc/developers/incremental-push-pull.txt incrementalpushpull.-20070508045640-zneiu1yzbci574c6-1
  doc/developers/merge-scaling.rst mergescaling.rst-20070527173558-rqaqxn1al7vzgcto-2
  doc/developers/performance-roadmap.txt performanceroadmap.t-20070507174912-mwv3xv517cs4sisd-2
    ------------------------------------------------------------
    revno: 2485.4.8
    merged: robertc at robertcollins.net-20070604040720-c5ti0k49w0ye8zcl
    parent: robertc at robertcollins.net-20070604033948-erld0bugxcbo62gf
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: roadmap
    timestamp: Mon 2007-06-04 14:07:20 +1000
    message:
      Note that multiparent deltas affect merge to, from the mailing list review discussion.
    ------------------------------------------------------------
    revno: 2485.4.7
    merged: robertc at robertcollins.net-20070604033948-erld0bugxcbo62gf
    parent: robertc at robertcollins.net-20070604005154-yvx2q8jnwiprw6du
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: roadmap
    timestamp: Mon 2007-06-04 13:39:48 +1000
    message:
      Review feedback.
    ------------------------------------------------------------
    revno: 2485.4.6
    merged: robertc at robertcollins.net-20070604005154-yvx2q8jnwiprw6du
    parent: robertc at robertcollins.net-20070518181212-acujb9ai3r3ds8s3
    parent: pqm at pqm.ubuntu.com-20070602184854-kwqaduxs0b19r76n
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: roadmap
    timestamp: Mon 2007-06-04 10:51:54 +1000
    message:
      Merge bzr.dev
    ------------------------------------------------------------
    revno: 2485.4.5
    merged: robertc at robertcollins.net-20070518181212-acujb9ai3r3ds8s3
    parent: robertc at robertcollins.net-20070515143142-1u0xo65hyhewgkyw
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: roadmap
    timestamp: Sat 2007-05-19 04:12:12 +1000
    message:
      Incremental push-pull notes.
    ------------------------------------------------------------
    revno: 2485.4.4
    merged: robertc at robertcollins.net-20070515143142-1u0xo65hyhewgkyw
    parent: robertc at robertcollins.net-20070515112623-rb3uq4tern5lsn6l
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: roadmap
    timestamp: Wed 2007-05-16 00:31:42 +1000
    message:
      Add annotate roadmap.
    ------------------------------------------------------------
    revno: 2485.4.3
    merged: robertc at robertcollins.net-20070515112623-rb3uq4tern5lsn6l
    parent: robertc at robertcollins.net-20070515102628-j2e0lgz5k42ckyzh
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: roadmap
    timestamp: Tue 2007-05-15 21:26:23 +1000
    message:
      Add revert analysis.
    ------------------------------------------------------------
    revno: 2485.4.2
    merged: robertc at robertcollins.net-20070515102628-j2e0lgz5k42ckyzh
    parent: robertc at robertcollins.net-20070515094946-9nrohtlybz2i5jt9
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: roadmap
    timestamp: Tue 2007-05-15 20:26:28 +1000
    message:
      Add gc analysis
    ------------------------------------------------------------
    revno: 2485.4.1
    merged: robertc at robertcollins.net-20070515094946-9nrohtlybz2i5jt9
    parent: pqm at pqm.ubuntu.com-20070510055501-w262sk5hl33vmd19
    committer: Robert Collins <robertc at robertcollins.net>
    branch nick: roadmap
    timestamp: Tue 2007-05-15 19:49:46 +1000
    message:
      add analysis
=== added file 'doc/developers/add.txt'
--- a/doc/developers/add.txt	1970-01-01 00:00:00 +0000
+++ b/doc/developers/add.txt	2007-06-04 03:39:48 +0000
@@ -0,0 +1,34 @@
+Add
+---
+
+Add is used to recursively version some paths supplied by the user. Paths that
+match ignore rules are not versioned, and paths that become versioned are
+versioned in the nearest containing bzr tree. Currently we only do this within
+a single tree, but perhaps with nested trees this should change.
+
+Least work we can hope to perform
+=================================
+
+* Read a subset of the full versioned paths data for the tree matching the scope of the paths the user supplied.
+* Seek once to each directory within the scope and readdir its contents.
+* Probe if each directory is a child tree to avoid adding data for paths within a child tree.
+* Calculate the ignored status for paths not previously known to be ignored.
+* Write data proportional to the newly versioned file count to record their versioning.
+* Assign a fileid for each path (so that merge --uncommitted can work immediately).
+
+Optionally:
+
+* Print the ignore rule for each ignored path in the scope.
+* Print the path of each added file.
+* Print the total count of ignored files within the scopes.
+* Record the result of calculating ignored status for ignored files.
+  (proportional to the number we actually calculate).
+
+Per file algorithm
+==================
+
+#. If the path is versioned, and it is a directory, push onto the recurse stack.
+#. If the path is supplied by the user or is not ignored, version it, and if a
+   directory, push onto the recurse stack. Versioning the path may require
+   versioning the path's parents.
+#. Output or otherwise record the ignored rule as per the user interface selected.

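The per-file algorithm in add.txt above can be sketched as follows. This is only an illustration under stated assumptions, not bzr's implementation: `add_paths`, `matching_rule`, the set-based `versioned` store and the basename-only glob rules are all hypothetical, and a child tree is assumed to be recognisable by a `.bzr` control directory.

```python
import fnmatch
import os


def matching_rule(rel_path, ignore_rules):
    """Return the first glob rule matching the path's basename, or None."""
    name = os.path.basename(rel_path)
    for rule in ignore_rules:
        if fnmatch.fnmatch(name, rule):
            return rule
    return None


def add_paths(tree_root, user_paths, versioned, ignore_rules):
    """Version user_paths recursively; return (added, ignored) for reporting.

    versioned is a set of tree-relative paths and is updated in place.
    """
    added, ignored = [], {}
    supplied = set(user_paths)
    stack = list(user_paths)
    while stack:
        rel = stack.pop()
        abs_path = os.path.join(tree_root, rel)
        is_dir = os.path.isdir(abs_path)
        # Probe for a child tree so nothing inside it is added (assumption:
        # a child tree is any directory holding its own .bzr directory).
        if is_dir and os.path.isdir(os.path.join(abs_path, ".bzr")):
            continue
        if rel not in versioned:
            # Paths named by the user are versioned even if a rule matches.
            rule = None if rel in supplied else matching_rule(rel, ignore_rules)
            if rule is not None:
                ignored[rel] = rule  # record the rule for the UI to print
                continue
            # Versioning a path may require versioning its parents first.
            missing = []
            parent = os.path.dirname(rel)
            while parent and parent not in versioned:
                missing.append(parent)
                parent = os.path.dirname(parent)
            for path in missing[::-1] + [rel]:
                versioned.add(path)
                added.append(path)
        if is_dir:
            # One readdir per directory within the scope.
            for name in sorted(os.listdir(abs_path)):
                stack.append(os.path.join(rel, name))
    return added, ignored
```

The work done is one directory listing per directory in scope plus one ignore calculation per path that is not already versioned, which matches the lower bound listed in the document.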
=== added file 'doc/developers/annotate.txt'
--- a/doc/developers/annotate.txt	1970-01-01 00:00:00 +0000
+++ b/doc/developers/annotate.txt	2007-05-15 14:31:42 +0000
@@ -0,0 +1,28 @@
+Annotate
+--------
+
+Broadly tries to ascribe parts of the tree state to individual commits.
+
+There appear to be three basic ways of generating annotations:
+
+If the annotation works by asking the storage layer for successive full texts
+then the scaling of this will be proportional to the time to diff throughout
+the history of the thing being annotated.
+
+If the annotation works by asking the storage layer for successive deltas
+within the history of the thing being annotated we believe we can make it scale
+broadly proportional to the depth of the tree of revisions of the annotated
+object.
+
+If the annotation works by combining cached annotations such that creating a
+full text recreates annotations for it then it will scale with the cost of
+obtaining that text.
+
+Generally we want our current annotations but it would be nice to be able to do
+whitespace annotations and potentially other diff based annotations.
+
+Some things to think about:
+
+ * Perhaps multiparent deltas would allow us to not store the cached
+   annotations in each delta without losing performance or accuracy.
+

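The first approach in annotate.txt above (successive full texts) might look like the sketch below. It assumes a linear history supplied oldest first and uses the standard library's `difflib` rather than any bzr storage API; the `annotate` function and its input shape are hypothetical.

```python
import difflib


def annotate(history):
    """Annotate the newest text in a linear history.

    history is a list of (revision_id, lines) pairs, oldest first.
    Returns a list of (revision_id, line) pairs for the last text.
    """
    annotated = []
    prev_lines = []
    for rev_id, lines in history:
        # Start by blaming every line on this revision ...
        new_annotated = [(rev_id, line) for line in lines]
        matcher = difflib.SequenceMatcher(None, prev_lines, lines, autojunk=False)
        # ... then carry forward the annotation of each unchanged line.
        for a, b, size in matcher.get_matching_blocks():
            new_annotated[b:b + size] = annotated[a:a + size]
        annotated, prev_lines = new_annotated, lines
    return annotated
```

One diff is computed per revision in the history, which is why this approach scales with the time to diff throughout the history of the file.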
=== added file 'doc/developers/gc.txt'
--- a/doc/developers/gc.txt	1970-01-01 00:00:00 +0000
+++ b/doc/developers/gc.txt	2007-05-15 14:31:42 +0000
@@ -0,0 +1,26 @@
+Garbage Collection
+------------------
+
+Garbage collection is used to remove data from a repository that is no longer referenced.
+
+Generally this involves locking the repository and scanning all its branches
+then generating a new repository with less data.
+
+Least work we can hope to perform
+=================================
+
+* Read all branches to get initial references - tips + tags.
+* Read through the revision graph to find unreferenced revisions. A cheap HEADS
+  list might help here by allowing comparison of the initial references to the
+  HEADS - any unreferenced head is garbage.
+* Walk out via inventory deltas to get the full set of texts and signatures to preserve.
+* Copy to a new repository
+* Bait and switch back to the original
+* Remove the old repository.
+
+A possibility to reduce this would be to have a set of grouped 'known garbage
+free' data - 'ancient history' which can be preserved in total should its HEADS
+be fully referenced - and where the HEADS list is deliberately cheap (e.g. at the
+top of some index).
+
+possibly - null data in place without saving size.

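The reference-scanning steps in gc.txt above can be illustrated with a small sketch. It assumes the revision graph is available as a plain dict mapping each revision id to its parent ids; `heads` and `find_garbage` are hypothetical names, not bzr functions.

```python
def heads(parents):
    """Return the revisions that are not the parent of any other revision."""
    used_as_parent = {p for ps in parents.values() for p in ps}
    return set(parents) - used_as_parent


def find_garbage(parents, references):
    """Return revisions unreachable from the branch tips and tags.

    parents maps revision id -> list of parent ids; references is the
    initial set of tips and tags read from the branches.
    """
    reachable = set()
    pending = list(references)
    while pending:
        rev = pending.pop()
        if rev in reachable or rev not in parents:
            continue
        reachable.add(rev)
        pending.extend(parents[rev])
    return set(parents) - reachable
```

Comparing `heads(parents)` against the references gives the cheap early answer described in the document: any head that is not referenced is garbage, and if every head is referenced the full graph walk can be skipped.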
=== added file 'doc/developers/revert.txt'
--- a/doc/developers/revert.txt	1970-01-01 00:00:00 +0000
+++ b/doc/developers/revert.txt	2007-06-04 03:39:48 +0000
@@ -0,0 +1,26 @@
+Revert
+------
+
+Change the user's selected paths to be the same as those in a given revision,
+making backups of any paths whose last contents were not set by bzr itself.
+
+Least work we can hope to perform
+=================================
+
+We should be able to do work proportional to the scope the user is reverting
+and the amount of changes between the working tree and the revision being
+reverted to.
+
+This depends on being able to compare unchanged subtrees without recursing so that the mapping of paths to revert to ids to revert can be done efficiently. Specifically we should be able to avoid getting the transitive closure of directory contents when mapping back to paths from ids at the start of revert.
+
+One way this might work is to:
+for the selected scopes, for each element in the wt:
+
+ 1. get hash tree data for that scope.
+ 1. get 'new enough' hash data for the siblings of the scope: it can be out of date as long as it's not older than the last move or rename out of that sibling's scope.
+ 1. Use the hash tree data to tune the work done in finding matching paths/ids which are different in the two trees.
+
+For each thing that needs to change - group by target directory?
+ 
+ 1. Extract new content.
+ 1. Backup old content or replace-in-place (except Windows where we move and replace).

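The idea in revert.txt above of comparing unchanged subtrees without recursing can be sketched with a hash tree. This is an assumption-laden illustration: trees are modelled as nested dicts with bytes for file contents, and `build` and `changed_paths` are hypothetical helpers rather than anything in bzr.

```python
import hashlib


def build(node):
    """Return (sha1_hex, children); children is None for a file (bytes)."""
    if isinstance(node, bytes):
        return hashlib.sha1(node).hexdigest(), None
    children = {name: build(child) for name, child in node.items()}
    digest = hashlib.sha1()
    for name in sorted(children):
        entry = name.encode("utf-8") + b"\0" + children[name][0].encode("ascii")
        digest.update(entry + b"\n")
    return digest.hexdigest(), children


def changed_paths(old, new, prefix=""):
    """List paths that differ, descending only into subtrees whose hashes differ."""
    if old is not None and new is not None and old[0] == new[0]:
        return []  # identical subtree: no need to look inside it
    old_children = old[1] if old is not None else None
    new_children = new[1] if new is not None else None
    if old_children is None or new_children is None:
        return [prefix]  # a file changed, appeared, disappeared or changed kind
    result = []
    for name in sorted(set(old_children) | set(new_children)):
        path = prefix + "/" + name if prefix else name
        result.extend(
            changed_paths(old_children.get(name), new_children.get(name), path)
        )
    return result
```

In this sketch the hashes are computed up front by `build`; the document's point is that if such hashes are already stored, the comparison cost is proportional to the changes rather than to the size of the tree.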
=== modified file 'doc/developers/incremental-push-pull.txt'
--- a/doc/developers/incremental-push-pull.txt	2007-05-09 15:36:06 +0000
+++ b/doc/developers/incremental-push-pull.txt	2007-06-04 03:39:48 +0000
@@ -240,3 +240,40 @@
 transmission method to reasonably closely match the desired write ordering
 locally. This suggests that once we decide on the best local storage means we
 should design the api.
+
+
+take N commits from A to B, if B is local then merge changes into the tree.
+copy enough data to recreate snapshots
+avoid ending up with corrupt/bad data
+
+Notes from London
+=================
+
+ #. setup
+
+   look at graph of revisions for ~N commits to determine eligibility for
+   if preserve mainline is on, check LH only
+
+    identify objects to send that are not on the client repo
+      - revision - may be proportional to the graph
+      - inventory - proportional to work
+      - texts     - proportional to work
+      - signatures - ???
+
+ #. data transmission
+
+  * send data proportional to the new information
+  * validate the data:
+
+   #. validate the sha1 of the full text of each transmitted text.
+   #. validate the sha1:name mapping in each newly referenced inventory item.
+   #. validate the sha1 of the XML of each inventory against the revision.
+      *** this is proportional to tree size and must be fixed ***
+
+ #. write the data to the local repo.
+    The API should output the file texts needed by the merge as a by-product of the transmission
+
+ #. tree application
+
+Combine the output from the transmission step with additional 'new work data' for anything already in the local repository that is new in this tree.
+This step should write new files and stat existing files proportional to the count of the new work and the size of the full texts.

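The first validation step in the London notes above (checking the sha1 of the full text of each transmitted text) could be sketched as below. The `invalid_texts` function and the dict-based inputs are hypothetical and only stand in for whatever the transmission API actually delivers.

```python
import hashlib


def invalid_texts(texts, expected_sha1):
    """Return the sorted keys whose full text does not match its expected SHA-1.

    texts maps a text key to its full content as bytes; expected_sha1 maps
    the same keys to the hex digests recorded by the sender.
    """
    bad = []
    for key, text in texts.items():
        if hashlib.sha1(text).hexdigest() != expected_sha1.get(key):
            bad.append(key)
    return sorted(bad)
```

The cost of this check is proportional to the size of the transmitted texts, which is the property the notes ask for; it is the inventory validation, not this step, that the notes flag as proportional to tree size.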
=== modified file 'doc/developers/merge-scaling.rst'
--- a/doc/developers/merge-scaling.rst	2007-05-27 17:45:50 +0000
+++ b/doc/developers/merge-scaling.rst	2007-06-04 04:07:20 +0000
@@ -29,3 +29,7 @@
 - Access to revision graph proportional to number of revisions read
 - Access to changed file metadata proportional to number of changes and number of intervening revisions.
 - O(1) access to fulltexts
+
+Notes
+=====
+Multiparent deltas may offer some nice properties for performance of annotation based merging.

=== modified file 'doc/developers/performance-roadmap.txt'
--- a/doc/developers/performance-roadmap.txt	2007-05-27 18:47:54 +0000
+++ b/doc/developers/performance-roadmap.txt	2007-06-04 00:51:54 +0000
@@ -16,6 +16,14 @@
 
 .. include:: incremental-push-pull.txt
 
+.. include:: add.txt
+
+.. include:: gc.txt
+
+.. include:: revert.txt
+
+.. include:: annotate.txt
+
 .. include:: merge-scaling.rst
 
 .. include:: bundle-creation.rst



