Recommended backup procedure and preserving my data...
John Szakmeister
john at szakmeister.net
Fri Oct 16 09:24:36 BST 2009
On Thu, Oct 15, 2009 at 2:22 PM, John Arbash Meinel
<john at arbash-meinel.com> wrote:
[snip]
> There has been some work to make this possible. The current main problem
> is that bzr can represent "ghosts" (revisions whose identifier we know,
> but where we do not have the actual content for the revision).
For my own edification, where do those situations come up?
> The fast-export stream does not seem to have a way to talk about those
> sorts of objects. (The stream is based on git, where everything is
> addressed as the hash of the content, thus if you can hash it, you must
> have the actual content available.)
>
> Aside from that, I thought Ian had round-tripping working. Though you
> would need to get that from *him*, since I've never done any experiments
> myself.
>
> I think it is something that you need the fast-export stream plus a
> 'marks' file that indicates what identifier bzr uses for each object in
> the stream.
Thanks, I'll add it to my backlog of things to look at.
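(For reference, I'd guess the export side looks something like this,
assuming the bzr-fastimport plugin; the flag name mirrors git's, so
check 'bzr help fast-export' before trusting me:

  # Export a branch plus a marks file mapping stream marks to bzr revids.
  bzr fast-export --export-marks=bzr.marks trunk > trunk.fi

and a later re-import would feed trunk.fi plus the marks file back
through 'bzr fast-import'. Completely untested on my end.)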
[snip some details]
> Well, I have to take that back slightly. If an autopack/pack was
> occurring at the same time you were backing up, it may move content into
> a new pack file, and rename it into upload, and then update the
> pack-names file. It won't write the new pack-names file until the data
> has been fully updated, though. So one option would be to loop: you
> could grab pack-names, back up everything in .bzr/repository/packs and
> indices, and then check to see whether pack-names has been updated. If
> it has, go around again until you have successfully copied everything
> without pack-names changing.
>
> Note that this is what 'bzr branch' does, which is why I recommend
> staging everything to a 'warm backup' location first.
Yeah, I'd rather leave the heavy lifting to Bazaar as well. :-)
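(If I did roll my own, I imagine the copy-until-stable loop you
describe would look roughly like this; the paths are made up and I
haven't tested it:

  #!/bin/sh
  REPO=/srv/bzr/myrepo/.bzr/repository
  DEST=/backups/myrepo
  mkdir -p "$DEST"
  while true; do
      # Snapshot pack-names first, then the data it refers to.
      cp "$REPO/pack-names" "$DEST/pack-names"
      rsync -a "$REPO/packs" "$REPO/indices" "$DEST/"
      # If pack-names changed while we copied, an autopack ran; retry.
      cmp -s "$REPO/pack-names" "$DEST/pack-names" && break
  done

But letting 'bzr branch' populate a warm backup is simpler.)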
[snip]
>> It doesn't need to be absolutely minimal churn... I can cope with the
>> autopacking. We don't have much (in terms of size), but we have 50 or
>> more Subversion repositories at the moment, and the number seems to
>> grow every week. :-)
>
> As for Bazaar repos, you can have as many or as few as works for you.
> You can share multiple projects in one repo, or have one repo per
> project, or one repo per branch... The actual layout tends to be
> dictated by access control (balanced against disk storage).
Is there some limit on throughput? It seems like at some point that
would have to become a factor, since the actual revisions would all be
in the shared repo, correct?
I'd more than likely go with the shared-repo-per-project approach.
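(I.e., something like this per project, if I have the commands right;
--no-trees since it's a server-side repo:

  bzr init-repo --no-trees /srv/bzr/projectX
  bzr init /srv/bzr/projectX/trunk
  bzr init /srv/bzr/projectX/some-feature

so every branch under projectX shares one set of packs.)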
>> [snip]
>>> You can use 'bzr_access', you can use bzr+http + .htaccess files. You
>>> can use plain "bzr://" access and rely on firewall rules to restrict
>>> who can actually reach the server.
>>
>> Is that new? I don't remember seeing that in the guides. What's the
>> performance impact?
>
> I'm not sure what guide you would be looking for, but it has been around
> for quite some time. Basically, it just uses 'POST' as an RPC layer to
> send requests. The protocol is specifically designed to be 'stateless'
> so that we can tunnel over HTTP. So the specific impact should range
> from negligible to actually performing better than 'ssh', because you
> don't have the ssh handshake overhead.
I guess I still have the issue mentioned here, though:
http://doc.bazaar-vcs.org/latest/en/user-guide/http_smart_server.html#pushing-over-bzr-http
That is, I need to do something so that I can segregate write access
and read access.
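The workaround I can picture (purely a sketch, untested; the /code vs.
/code-rw split and the htpasswd path are my invention) is to expose the
same repositories at two URLs, read-only for everyone and read-write
behind HTTP auth:

  <Location /code-rw>
      AuthType Basic
      AuthName "bzr push"
      AuthUserFile /srv/bzr/htpasswd
      Require valid-user
  </Location>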
FWIW, I didn't think there was an HTTP smart server, because the
"Running a smart server" page in the User's Guide never mentions it:
http://doc.bazaar-vcs.org/latest/en/user-guide/server.html
> Though if you do "bzr+https://" then it should essentially be identical.
Right.
> Note that Loggerhead now supports:
>
> bzr serve --http
>
> Which provides a bzr smart server pre-configured. And many people like
> to run loggerhead and proxy it through Apache. I *think* that in doing
> so you can get ACLs at the Apache layer, and minimal setup overhead via
> loggerhead. Not to mention nice visuals when you manually browse to
> "http://host/my/branch".
I don't understand how that gets me ACLs. Aren't the incoming
requests still the same? If I could get ACLs using Apache as a proxy,
shouldn't I be able to do so without the proxy portion? Sorry for
being thick; I just don't get what enables ACLs in this case.
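(For context, what I assumed was something like the following, with
loggerhead listening locally and Apache authenticating before it
proxies; the port and paths are my guesses:

  <Location /bzr>
      AuthType Basic
      AuthName "bzr"
      AuthUserFile /srv/bzr/htpasswd
      Require valid-user
      ProxyPass http://127.0.0.1:8080/bzr
      ProxyPassReverse http://127.0.0.1:8080/bzr
  </Location>

Maybe the point is just that Apache, sitting in front, can apply
per-path auth like this before anything reaches loggerhead?)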
[snip]
>> Thanks for taking the time to answer my questions John! BTW, are any
>> of you guys going to be at PyCon? I'd love to meet you.
>>
>> Thanks again!
>>
>> -John
>>
>
> I'm not sure if we are sending anyone this year. I've gone to the last
> two, because they were in Chicago (about an hour from where I live).
> With it in Atlanta this year, going would mean traveling away from my
> family...
Ah. I swung by the room last year, but you had already left.
> I know in the past we've had quite a few Canonical people travel to
> PyCon, I just don't know if someone from the Bazaar group will be
> specifically going.
I'll keep an eye out. You're a good group of devs, and I think it's
nice to see faces once in a while. :-)
-John