Signing snapshots

Tue Jun 21 14:48:53 BST 2005

On 21 Jun 2005, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
> Hi all,
> 
> Part of the plan for signing in bzr was to sign the snapshot, not the
> data generated from it (i.e. the revision store gzips or whatever).
> 
> In #arch, Andrew Suffield has listed a couple of reasons why he thinks
> this is a terrible idea.
> 
> 00:29 < abentley> asuffield: let's say as a straw-man, we took an
>                   inventory of the tree, with SHA-1 sums, sorted that
>                   inventory in a rigorously defined way, and signed it.
>                   What kind of holes would you expect to find?

As we discussed a little while ago, we primarily plan to actually sign
the revision, which includes by reference the inventory.  That doesn't
make any difference to asuffield's points though.
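
For concreteness, the straw-man could look roughly like the Python
sketch below.  The function names and the "<sha1>  <path>" line format
are made up for illustration and are not bzr's actual inventory format.
It also walks every file under the root, whereas a real inventory would
list only versioned files.

```python
import hashlib
import os


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 of a file, reading it in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def canonical_inventory(root, skip_dirs=(".bzr",)):
    """Build a byte string listing every file under root with its SHA-1.

    Hypothetical format: one "<sha1>  <relative/path>" line per file,
    sorted by the UTF-8 bytes of the path so that the same tree always
    yields the same bytes to sign.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune control directories in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, sha1_of_file(full)))
    entries.sort(key=lambda e: e[0].encode("utf-8"))
    lines = [f"{digest}  {rel}\n" for rel, digest in entries]
    return "".join(lines).encode("utf-8")
```

The returned bytes are what would be handed to the signing tool.  A
real format would also have to pin down awkward cases such as
filenames containing newlines, which this sketch ignores.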

> 00:30 < asuffield> abentley: I would expect to find DoS attacks against
> 		   the inventory process and ways to slip files past it
> 		   which never appear in the inventory, and that's
> 		   without even thinking about it

I think that is less plausible with bzr than with arch; files which
aren't in the inventory simply don't exist from bzr's point of view,
and won't be considered for merging.  I suppose someone could try
tricks with those files existing as ignored or unknown, but that just
means the tool must never assume those are covered by the signature,
of course.

> 00:31 < asuffield> I would also expect to find implementation bugs that
> 		   were exploitable, probably suitable for remote
> 		   arbitrary code execution

This is certainly a good point; the verification should be done as
early as possible in the pipe, so that untrusted data gets to touch
the least code.

From this perspective the tla approach of writing the hash of the
files then signing the hashes is rather nice.  All we need to do with
untrusted data is calculate its hash, and we can be reasonably sure
that there won't be vulnerabilities in the SHA-1 calculator.  There
might be some in the code that parses the checksum file or the gpg
signature.  On the other hand this approach flakes out of the more
important problem of evaluating whether the code is signed by a
meaningful key.
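
A minimal sketch of that hash-then-compare step, assuming the checksum
file has already had its signature checked and uses a hypothetical
"<sha1>  <name>" line format (not tla's real one):

```python
import hashlib


def parse_checksums(text):
    """Parse '<sha1-hex>  <name>' lines from an already-verified file."""
    table = {}
    for line in text.splitlines():
        digest, sep, name = line.partition("  ")
        if not sep or len(digest) != 40:
            raise ValueError("malformed checksum line")
        table[name] = digest.lower()
    return table


def accept_if_hash_matches(name, data, checksums):
    """Return data only if its SHA-1 matches the signed checksum.

    Nothing but the hash function touches the untrusted bytes before
    this check passes.
    """
    expected = checksums.get(name)
    if expected is None:
        raise ValueError(f"{name!r} is not covered by the signed checksums")
    if hashlib.sha1(data).hexdigest() != expected:
        raise ValueError(f"hash mismatch for {name!r}")
    return data
```

Note that parse_checksums is itself the kind of parsing code that
could hide a bug, which is why it should only ever see a checksum file
whose signature has already been checked.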

One approach is to just put a GPG signature next to every revision
file, and verify that before reading the revision.  In that case the
only exploitable code is GPG itself.

  gpg --detach-sign .bzr/revision-store/thingthing
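
On the reading side, a rough Python sketch of "verify before reading"
might be the following.  The function name and the injectable `run`
parameter are made up for illustration; the only real interface assumed
is `gpg --verify SIGFILE -`, which reads the signed data from standard
input.

```python
import subprocess


def read_verified(path, sig_path, run=subprocess.run):
    """Return the file's bytes only if gpg accepts the detached signature.

    The bytes are read once and the same bytes are piped to gpg, so the
    file cannot be swapped between the check and the use.
    """
    with open(path, "rb") as f:
        data = f.read()
    result = run(
        ["gpg", "--verify", sig_path, "-"],
        input=data,
        capture_output=True,
    )
    if result.returncode != 0:
        raise ValueError(f"signature check failed for {path}")
    # Only now may the caller hand the data to any parser.
    return data
```

A zero exit status only says that some key in the keyring made a valid
signature; deciding whether that key is a meaningful one is a separate
step that this sketch does not attempt.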

Perhaps the most interesting attack method is to mail someone a
malicious changeset, because this avoids the need to convince the
targeted user to access a malicious server.

Processing untrusted data is always a risk.  I propose a defence in
several lines:

 - Don't process data except under the user's direction; that is to
   say malicious data should only get into the program by e.g. the
   user typing the URL of a malicious server.

   (As I understand it monotone allows you to have untrusted data
   inside your local database, which while designed to be safe does
   feel a bit unhygienic to me.)

 - Authenticate data as soon as possible in processing it; make this
   give a reasonable level of security by default.

   The aim is to either confine attacks to the very front of the
   program, or limit them to those signed by a trusted key.

 - Try to carefully handle data that does get further in.

Regardless of what signing method we use, it's possible that people
will create malicious changesets signed by trusted keys.  Then it just
comes down to whether the program has any vulnerabilities throughout.
We can aim for it to have none, but historically that's rarely achieved.

> He also pointed out that there have been exploits against gzip in the
> past, and that, in his estimate, neither tar nor gzip can be considered
> secure.  Good thing we don't use tar, I guess :-)

asuffield will probably disagree, but I don't feel obliged to design
something safe in the presence of holes in gzip.

-- 
Martin