Signing snapshots
Martin Pool
mbp at sourcefrog.net
Wed Jun 22 03:14:41 BST 2005
On 21 Jun 2005, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
> Martin Pool wrote:
> > On 21 Jun 2005, Aaron Bentley <aaron.bentley at utoronto.ca> wrote:
> >
> >>Hi all,
> >>
> >>Part of the plan for signing in bzr was to sign the snapshot, not the
> >>data generated from it (i.e. the revision store gzips or whatever).
>
> > As we discussed a little while ago, we primarily plan to actually sign
> > the revision, which includes by reference the inventory. That doesn't
> > make any difference to asuffield's points though.
>
> Yes. I think that needs to be done very carefully, though. We want to
> be able to upgrade the signatures without invalidating old signatures.
>
> For example, if you sign a hash of
> mbp@sourcefrog.net-20050309040815-13242001617e4a06, and the hash
> algorithm is later broken, it should be possible to re-sign that
> revision using a later hash, yet still be able verify it using the old
> hash. And it would also be nice to be able to remove the old hash
> without disturbing the new hash.
There are a few possibilities. For one thing, we can just have
multiple signatures. For example if my signing key expires or is
revoked, I might want to go through and add signatures using a
different key.
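Roughly, I'm imagining something like the following Python sketch;
the layout of one detached-signature file per key next to each
revision is just an assumption for illustration, not a format we've
committed to:

    import glob
    import subprocess

    def sign_revision(revision_path, key_id, seq):
        # Write one more detached signature; existing ones are untouched.
        sig_path = '%s.sig-%d' % (revision_path, seq)
        subprocess.check_call(['gpg', '--local-user', key_id,
                               '--output', sig_path,
                               '--detach-sign', revision_path])

    def verify_any(revision_path):
        # Accept the revision if any stored signature still verifies.
        for sig_path in glob.glob(revision_path + '.sig-*'):
            if subprocess.call(['gpg', '--verify',
                                sig_path, revision_path]) == 0:
                return True
        return False

Re-signing with a new key is then just a matter of writing one more
.sig file.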
If we decide we want to use a stronger hash algorithm, then we'll
probably want not just to add a new signature at the top level, but
also to regenerate the inventories and revision records so that they
use the stronger hash. Since that changes the text of the revision,
the original signature will no longer be valid. One approach is to
let that happen and just re-sign the revisions.
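Purely to make the point that a signature binds to exact bytes, a toy
example (the xml shape here is invented, not bzr's real format):

    import hashlib

    old_text = (b'<revision><file '
                b'md5="9e107d9d372bb6826bd81d3542a419d6"/></revision>')
    # A stronger-hash upgrade rewrites the text itself:
    new_text = old_text.replace(
        b'md5="9e107d9d372bb6826bd81d3542a419d6"', b'sha256="..."')
    # Signatures are over a digest of the exact bytes, so the old one
    # can no longer match:
    assert (hashlib.sha1(old_text).digest()
            != hashlib.sha1(new_text).digest())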
Another is to keep the old form of the object with the old signature,
and make a new form of the object with a new signature. That means
that mbp@sourcefrog-29381938123 will refer to more than one object,
each with a different text and signature but supposedly equivalent
meaning. I'm not completely sure that would be worth the possible
confusion, but it is an option.
It might be plausible to, say, keep the old signatures for
verification, move that archive aside, and make a new archive with
upgraded signatures.
> So I guess what I'm saying is, when generating hashes, you should not
> pay any attention to hashes generated using a different algorithm. If
> you're generating an SHA-1, you should only look at SHA-1 hashes, not
> MD5 or SHA-160. If you're generating a SHA-160, you should only look at
> SHA-160 hashes for the file/inventory/etc.
At the moment, when we generate a hash, we simply make a hash of the
text; the code that generates a hash of the inventory doesn't know or
care what kind of hash the inventory uses to identify the contained
files.
I guess we could do what you suggest by pre-processing the inventory
file to strip out all hashes except those produced by the algorithm
we're currently computing.
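As a rough sketch of that, assuming a hypothetical inventory format
in which an entry may carry one attribute per hash algorithm (today's
inventories only carry a single sha1):

    import hashlib
    import re

    def hash_inventory(inventory_text, algorithm='sha1'):
        # Strip every hash-valued attribute except the one belonging
        # to the algorithm being computed, so that hashes from other
        # algorithms cannot influence this one.
        other = re.compile(r'\s+(?!%s=)\w+="[0-9a-f]{32,64}"'
                           % algorithm)
        stripped = other.sub('', inventory_text)
        return hashlib.new(algorithm,
                           stripped.encode('utf-8')).hexdigest()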
> In light of this, I don't know what to make of the recently-added
> "revision_sha1" attribute for parent revisionss. I thought the notion
> was that we would sign the entire revision history. This means that
> creating a sha-160 signature for a revision requires adding sha-160s to
> every ancestor revision. I think this makes merge horizons impossible.
I don't see why that follows; we could have a sequence of revisions
where at some point we switch from using SHA-1 to SHA-160.
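For illustration only (made-up structure, and with sha256 standing in
for the stronger algorithm, since Python's hashlib has no "sha160"):

    import hashlib

    def parent_link(parent_text, algorithm):
        # Each revision records its parent's hash under whichever
        # algorithm is current at commit time; old history is
        # untouched.
        return {'parent_%s' % algorithm:
                hashlib.new(algorithm, parent_text).hexdigest()}

    rev1 = b'<revision id="rev-1"/>'
    old_link = parent_link(rev1, 'sha1')    # before the switch-over
    new_link = parent_link(rev1, 'sha256')  # after the switch-over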
> Also, I think signing snapshots makes sense because not every snapshot
> is a revision. (Or is it?) Requiring people to commit in order to
> produce changesets seems onerous.
I'm not quite sure what you mean. I think normally I would ask people
to commit before e.g. submitting a changeset by mail, because
otherwise we don't have any good identifier of what was submitted.
> > From this perspective the tla approach of writing the hash of the
> > files then signing the hashes is rather nice.
>
> Yes, this is what asuffield was pushing as the only sane option. His
> case was that signing anything more abstract would always lead to
> holes.
Remember that gpg internally hashes the input data before computing a
signature; the signature is really a signature of a hash. So arch is
actually storing a signature of a hash of a file containing hashes.
If we just make a detached signature of the revision XML, we avoid
that extra step and store simply the signature of the hash of that
file.
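Concretely that would be just (paths illustrative):

    gpg --output rev.xml.sig --detach-sign .bzr/revision-store/rev.xml
    gpg --verify rev.xml.sig .bzr/revision-store/rev.xml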
> > All we need to do with
> > untrusted data is calculate its hash, and we can be reasonably sure
> > that there won't be vulnerabilities in the SHA-1 calculator. There
> > might be some in the code that parses the checksum file or the gpg
> > signature. On the other hand this approach flakes out of the more
> > important problem of evaluating whether the code is signed by a
> > meaningful key.
>
> Sorry, didn't parse that.
What I'm trying to say is that the exposure is the same either way:
gpg verifies by hashing a file and reading a signature. It does not
seem possible to reduce that further.
I think it's reasonable to declare gzip vulnerabilities not our problem.
> > One approach is to just put a GPG signature next to every revision
> > file, and verify that before reading the revision. In that case the
> > only exploitable code is GPG itself.
> >
> > gpg --detach-sign .bzr/revision-store/thingthing
>
> I wonder whether there's a useful difference between trusted and
> authoritative? E.g., I will trust John Meinel's signature to prove that
> data is not malicious, but I will only trust your signature to prove
> that the revision produced is actually
> mbp@sourcefrog.net-20050620052204-c4253c3feb664088.
Right, so we could say that only data signed by a trusted key gets
considered at all, and then in a second round we check whether it's
authoritative.
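In Python terms, a hedged sketch of that two-round check; the key
lists and the id-prefix convention are hypothetical, not anything bzr
has today:

    TRUSTED_KEYS = set(['john@example.com', 'mbp@sourcefrog.net'])
    AUTHORITATIVE_PREFIXES = {
        'mbp@sourcefrog.net': 'mbp@sourcefrog.net-'}

    def classify(revision_id, signer):
        # Round one: data not signed by a trusted key is ignored
        # outright.
        if signer not in TRUSTED_KEYS:
            return 'rejected'
        # Round two: only certain trusted keys may vouch that this
        # really is the revision its id claims to be.
        prefix = AUTHORITATIVE_PREFIXES.get(signer)
        if prefix and revision_id.startswith(prefix):
            return 'authoritative'
        return 'trusted'

So John's signature gets a revision considered at all, but only mine
would make an mbp@sourcefrog.net-... id authoritative.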
> Often systems that need to handle untrusted data will have a way to drop
> privileges and/or use a sandbox. Paranoia might lead to a StreamTree
> class that communicated with a chrooted bzr over a pipe. Then it's just
> a few short steps to a smart server...
You could do something along the lines of Colin Walters's guarded
image loader.
--
Martin