Signing snapshots

Aaron Bentley aaron.bentley at utoronto.ca
Tue Jun 21 16:28:15 BST 2005


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Martin Pool wrote:
> On 21 Jun 2005, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
> 
>>Hi all,
>>
>>Part of the plan for signing in bzr was to sign the snapshot, not the
>>data generated from it (i.e. the revision store gzips or whatever).

> As we discussed a little while ago, we primarily plan to actually sign
> the revision, which includes by reference the inventory.  That doesn't
> make any difference to asuffield's points though.

Yes.  I think that needs to be done very carefully, though.  We want to
be able to upgrade the signatures without invalidating old signatures.

For example, if you sign a hash of
mbp at sourcefrog.net-20050309040815-13242001617e4a06, and the hash
algorithm is later broken, it should be possible to re-sign that
revision using a later hash, yet still be able to verify it using the old
hash.  And it would also be nice to be able to remove the old hash
without disturbing the new hash.

So I guess what I'm saying is, when generating hashes, you should not
pay any attention to hashes generated using a different algorithm.  If
you're generating an SHA-1, you should only look at SHA-1 hashes, not
MD5 or SHA-160.  If you're generating a SHA-160, you should only look at
SHA-160 hashes for the file/inventory/etc.
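As a concrete sketch of that rule (each algorithm's hashes kept independent, so a broken one can be retired without disturbing the others), in modern Python; the helper names here are hypothetical:

```python
import hashlib

# Hypothetical helpers: one recorded hash per algorithm, each kept
# independent so a broken algorithm can be retired without disturbing
# the hashes (and signatures) made with the others.
def compute_hashes(data, algorithms=('sha1', 'sha256')):
    return {alg: hashlib.new(alg, data).hexdigest() for alg in algorithms}

def verify(data, stored, algorithm):
    # Only consult a hash made with the same algorithm; ignore the rest.
    if algorithm not in stored:
        raise KeyError('no %s hash recorded' % algorithm)
    return hashlib.new(algorithm, data).hexdigest() == stored[algorithm]

hashes = compute_hashes(b'revision text')
del hashes['sha1']        # retire the broken algorithm...
ok = verify(b'revision text', hashes, 'sha256')   # ...the newer one still checks
```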

In light of this, I don't know what to make of the recently-added
"revision_sha1" attribute for parent revisions.  I thought the notion
was that we would sign the entire revision history.  This means that
creating a SHA-160 signature for a revision requires adding SHA-160s to
every ancestor revision.  I think this makes merge horizons impossible.

Also, I think signing snapshots makes sense because not every snapshot
is a revision.  (Or is it?)  Requiring people to commit in order to
produce changesets seems onerous.

>>00:30 < asuffield> abentley: I would expect to find DoS attacks against
>>		   the inventory process and ways to slip files past it
>>		   which never appear in the inventory, and that's
>>		   without even thinking about it
> 
> 
> I think that is less plausible with bzr than with arch; files which
> aren't in the inventory simply don't exist from bzr's point of view,
> and won't be considered for merging.

Hmm.  True.  The files may not even be stored in a temporary directory,
e.g. for ChangesetTrees, or once merge is better integrated.

>>00:31 < asuffield> I would also expect to find implementation bugs that
>>		   were exploitable, probably suitable for remote
>>		   arbitrary code execution
> 
> 
> This is certainly a good point; the verification should be done as
> early as possible in the pipe, so that untrusted data gets to touch
> the least code.
> 
> From this perspective the tla approach of writing the hash of the
> files then signing the hashes is rather nice.

Yes, this is what asuffield was pushing as the only sane option.  His
case was that signing anything more abstract would always lead to holes.
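That scheme, hash every file and sign only the list of hashes, can be sketched in a few lines of Python (the manifest layout and helper name here are made up, not tla's actual format):

```python
import hashlib
import os
import tempfile

def write_manifest(paths, manifest_path):
    # tla-style: one hash line per file; only this manifest would be
    # signed afterwards, e.g. `gpg --detach-sign MANIFEST`.
    with open(manifest_path, 'w') as out:
        for path in sorted(paths):
            with open(path, 'rb') as f:
                digest = hashlib.sha1(f.read()).hexdigest()
            out.write('%s  %s\n' % (digest, os.path.basename(path)))

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, 'hello.txt')
with open(src, 'wb') as f:
    f.write(b'hello\n')
manifest = os.path.join(tmp, 'MANIFEST')
write_manifest([src], manifest)
with open(manifest) as f:
    lines = f.read().splitlines()
```

Verification then only needs to re-hash the untrusted files and compare, so the untrusted bytes touch nothing but the SHA-1 calculator.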

> All we need to do with
> untrusted data is calculate its hash, and we can be reasonably sure
> that there won't be vulnerabilities in the SHA-1 calculator.  There
> might be some in the code that parses the checksum file or the gpg
> signature.  On the other hand this approach flakes out of the more
> important problem of evaluating whether the code is signed by a
> meaningful key.

Sorry, didn't parse that.

> One approach is to just put a GPG signature next to every revision
> file, and verify that before reading the revision.  In that case the
> only exploitable code is GPG itself.
> 
>   gpg --detach-sign .bzr/revision-store/thingthing
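That verify-before-read flow might look like the following (the `.sig` naming convention and paths are assumptions; `subprocess.run` is the modern spelling):

```python
import subprocess

def gpg_verify_command(revision_path):
    # Build the verification command; the .sig path convention is assumed.
    return ['gpg', '--verify', revision_path + '.sig', revision_path]

def verify_then_read(revision_path):
    """Verify the detached signature *before* parsing the revision, so
    the only code that touches unauthenticated bytes is gpg itself."""
    result = subprocess.run(gpg_verify_command(revision_path))
    if result.returncode != 0:
        raise ValueError('bad signature on %s' % revision_path)
    with open(revision_path, 'rb') as f:
        return f.read()

cmd = gpg_verify_command('.bzr/revision-store/rev-id')
```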

I wonder whether there's a useful difference between trusted and
authoritative?  E.g., I will trust John Meinel's signature to prove that
data is not malicious, but I will only trust your signature to prove
that the revision produced is actually
mbp at sourcefrog.net-20050620052204-c4253c3feb664088.
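That distinction could be modelled as two key sets rather than one; the fingerprints below are placeholders, and a real policy would use actual key ids:

```python
# Placeholder fingerprints, not real keys.
TRUSTED = {'JOHN_KEY_FPR'}          # enough to believe data is not malicious
AUTHORITATIVE = {'MARTIN_KEY_FPR'}  # enough to believe the revision id itself

def policy(signer_fingerprint):
    if signer_fingerprint in AUTHORITATIVE:
        return 'authoritative'
    if signer_fingerprint in TRUSTED:
        return 'trusted'
    return 'untrusted'
```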

> Perhaps the most interesting attack method is to mail someone a
> malicious changeset, because this avoids the need to convince the
> targeted user to access a malicious server.

Sure.  The risk is limited to compromising the implementation of
ChangesetTree, though.

> Processing untrusted data is always a risk.  I propose a defence in
> several lines:
...
>  - Authenticate data as soon as possible in processing it; make this
>    give a reasonable level of security by default.

I'd suggest that we retain that authentication data as well, so that we
can determine later that data is signed by a compromised key.

> Regardless of what signing method we use, it's possible that people
> will create malicious changesets signed by trusted keys.  Then it just
> comes down to whether the program has any vulnerabilities throughout.
> We can aim for that but historically it's rarely achieved.

Yeah, though the impossibility of certain kinds of overflows in Python
does work to our advantage.  But bug-free code is not an attainable ideal.

Often systems that need to handle untrusted data will have a way to drop
privileges and/or use a sandbox.  Paranoia might lead to a StreamTree
class that communicated with a chrooted bzr over a pipe.  Then it's just
a few short steps to a smart server...
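A paranoid StreamTree along those lines might start from something like this (POSIX-only sketch; the chroot step needs root, so the demo at the bottom skips it):

```python
import os

def spawn_sandboxed(jail_dir=None, uid=None, gid=None):
    """Fork a helper process, optionally chrooted and de-privileged,
    that talks to us over a pair of pipes (a stand-in for a chrooted
    bzr handling untrusted changeset data)."""
    parent_r, child_w = os.pipe()   # child -> parent
    child_r, parent_w = os.pipe()   # parent -> child
    pid = os.fork()
    if pid == 0:                    # child
        os.close(parent_r)
        os.close(parent_w)
        if jail_dir is not None:
            os.chroot(jail_dir)     # requires root privileges
            os.chdir('/')
        if gid is not None:
            os.setgid(gid)          # drop group before user
        if uid is not None:
            os.setuid(uid)
        # Toy protocol: echo back whatever we were sent, prefixed.
        request = os.read(child_r, 4096)
        os.write(child_w, b'ok:' + request)
        os._exit(0)
    os.close(child_r)
    os.close(child_w)
    return pid, parent_r, parent_w

pid, r, w = spawn_sandboxed()       # unprivileged demo: no chroot
os.write(w, b'apply-changeset')
reply = os.read(r, 4096)
os.waitpid(pid, 0)
```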

Aaron
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCuDIP0F+nu1YWqI0RAi6RAJwNvOo6ZZzjnTnpmKlBYYaAaEhu7gCfYIVx
OahyQfdXYggQH/x0C22Ij/M=
=ZmnI
-----END PGP SIGNATURE-----

More information about the bazaar mailing list