Signing snapshots

John A Meinel john at arbash-meinel.com
Wed Jun 22 20:18:31 BST 2005


On Jun 22, 2005, at 12:41 PM, Aaron Bentley wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
...

>> One approach is to let that happen and
>> just re-sign the revisions.
>>
>> Another is to keep the old form of the object with the old signature,
>> and make a new form of the object with a new signature.  That means
>> that mbp at sourcefrog-29381938123 will refer to multiple objects that have
>> different texts and signatures, but supposedly equivalent meaning.
>> I'm not completely sure that would be worth the possible confusion, but
>> it is an option.
>
> I think the best thing is to not treat hashes as part of the inventory
> identity, but as supplemental verification data.  That way, you're not
> really creating a different object when you add new hashes.
>
>
>> At the moment, when we generate a hash, we simply make a hash of the
>> text; the code that generates a hash of the inventory doesn't know or
>> care what kind of hash the inventory uses to identify the contained
>> files.
>
> I assume this means that two valid inventories with the same meaning
> could differ textually, and have different hashes.  That seems really
> unfortunate to me.  It would be nice to be able to say "If the inventory
> sha-1 hash is not X, it is not a true copy of Y."
>
>> I guess we could do what you say by pre-processing the inventory file
>> to strip out all the hashes but those relevant to the one we're
>> computing.
>
> No, the way I'd do it is by not signing the inventory file-- sign the
> inventory data instead.  As a straw man, you'd sort it by Unicode
> codepoint, then write out a space-delimited inventory summary with id,
> name, parent, type and contents-hash (if applicable) fields for each
> entry.  The format doesn't need to be parseable, just unique for each tree.
>
Actually, you are missing an important point. What algorithm is used to
generate "contents-hash", if not a hash function? That means that if you
upgrade your hash algorithm, suddenly all of those "contents-hash" entries
change, and you need a new signature.
This really isn't any different from signing the <inventory> XML. The
only trick is that you would want to be careful that the <inventory>
tree is always sorted in a specific way.
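To make the point concrete, here is a minimal Python sketch of the straw-man summary quoted above, assuming hypothetical (file_id, name, parent_id, kind, text) tuples. Sorting makes the summary independent of entry order, but the contents-hash field is computed with a chosen algorithm, so the summary (and any signature over it) changes when the algorithm does.

```python
import hashlib


def inventory_summary(entries, hash_name="sha1"):
    """Build one space-delimited line per entry, sorted by code point.

    Each entry is a hypothetical (file_id, name, parent_id, kind, text)
    tuple. Names containing spaces would need escaping to keep the
    output unique; that is ignored in this sketch.
    """
    lines = []
    # Python compares str by Unicode code point, so sorted() gives a
    # deterministic order regardless of how the entries were supplied.
    for file_id, name, parent_id, kind, text in sorted(entries):
        fields = [file_id, name, parent_id, kind]
        if kind == "file":
            # The contents-hash depends on the chosen hash algorithm.
            fields.append(hashlib.new(hash_name, text).hexdigest())
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


entries = [
    ("hello-id", "hello.c", "root-id", "file", b"int main(void) { return 0; }\n"),
    ("root-id", ".", "-", "directory", b""),
]
```

Feeding the same entries in a different order yields the same summary, while `inventory_summary(entries, "sha256")` differs from the SHA-1 version, which is exactly why a signature over it would have to be regenerated after a hash upgrade.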
The only thing I am aware of that currently exposes the storage
mechanism is that the inventory stores a text_id="", which in my mind is a
no-no, since it causes the <inventory> tree to reveal how it is stored.
Revfiles don't use a text_id, though you could arguably generate a
text_id, since it is necessary for plain file storage, and just not
make use of it in revfile storage.
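A minimal sketch of keeping storage details out of the signed text, assuming a flat <inventory> whose child elements each carry a file_id attribute; apart from text_id, the element and attribute names are hypothetical. It drops text_id, sorts entries and attributes, and emits one line per entry, so plain-file and revfile storage describing the same tree would produce the same text.

```python
import xml.etree.ElementTree as ET

# Attributes assumed to describe *how* a text is stored rather than *what*
# the tree contains. text_id is the one named above; the set is hypothetical.
STORAGE_ATTRIBUTES = {"text_id"}


def canonical_inventory(xml_text):
    """Return a canonical text for a flat <inventory> document."""
    root = ET.fromstring(xml_text)
    lines = []
    # Sort the child entries by file_id so their stored order is irrelevant.
    for entry in sorted(root, key=lambda e: e.get("file_id", "")):
        # Drop storage-specific attributes and write the rest in sorted order.
        attrs = sorted(
            (k, v) for k, v in entry.attrib.items() if k not in STORAGE_ATTRIBUTES
        )
        lines.append(entry.tag + "".join(f" {k}={v!r}" for k, v in attrs))
    return "\n".join(lines) + "\n"
```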

Also, keep in mind that you want to sign the name of the committer,
the timestamp, and whatever other metadata you might have for a
specific snapshot. Oh, and if we start tracking file metadata (possibly
inside the inventory file, possibly somewhere else), you want to sign
that too. If someone is tracking the permissions of /etc, they don't
want an attacker to be able to break in and change them so that
/etc/httpd/ssl/server.key is suddenly world-readable.
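A sketch of a signed payload that covers metadata as well as contents, assuming a hypothetical snapshot_digest() helper and a JSON encoding chosen only for illustration (it is not bzr's format). Flipping the key file's mode from 0o600 to 0o644 changes the digest, so a signature over it would no longer verify.

```python
import hashlib
import json


def snapshot_digest(committer, timestamp, files):
    """Digest a snapshot whose signed text includes metadata.

    `files` maps path -> (mode, content_sha1). The sorted-key JSON is a
    stand-in serialization; the point is that every field, including the
    permission bits, ends up in the text that gets signed.
    """
    payload = {
        "committer": committer,
        "timestamp": timestamp,
        "files": {
            path: {"mode": oct(mode), "sha1": sha1}
            for path, (mode, sha1) in files.items()
        },
    }
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(text.encode("utf-8")).hexdigest()
```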

I understand your desire to be able to generate the signed text through
some other process and still have it satisfy the signature. But
realize that as soon as you change hash algorithms, the old signature
is invalidated, since we sign a hash instead of the actual file
contents.
Yes, we could have multiple signatures, such that the "SHA-1" signature
used SHA-1 hashes and the "SHA-256" signature used SHA-256 hashes.
The two wouldn't conflict, because they don't see each other.

But also be aware that there is a real benefit to signing the exact
contents of a file: it is trivial to verify using external tools.
If you don't sign the contents of a file, you need code that exactly
reproduces the method that was used to generate the signed text. That
means that if you upgrade that method, you have to keep the old method
around to validate old signatures, and nothing other than that code
can validate those signatures.
Also, it is certainly possible for someone to hack the bzr code so
that it starts to, say, ignore certain files when generating and checking
signatures. Because the text that is signed is never actually visible,
and is not used anywhere else, it is very difficult to detect that this
is happening. (Think of a company where everyone uses a "recommended"
bzr, or all use the same bzr on the same machine.)

I feel like the only two valid things to sign are:
zcat revision.gz | gpg --clearsign
gpg --detach-sign revision.gz
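The two commands differ in what they cover: the first signs the uncompressed revision text, the second the exact compressed bytes. A small Python sketch (an illustration, not part of bzr; the revision text is made up) shows why that matters: the gzip header records a modification time, so recompressing identical text can yield different bytes, which would break a signature over the .gz but not one over the zcat output.

```python
import gzip
import hashlib

# Hypothetical revision text, used only for illustration.
revision_text = b'<revision committer="example"/>\n'

# Compress the same text twice with different header mtimes.
gz_a = gzip.compress(revision_text, mtime=0)
gz_b = gzip.compress(revision_text, mtime=1)

# The compressed bytes differ, because the gzip header stores the mtime...
print(hashlib.sha1(gz_a).hexdigest() == hashlib.sha1(gz_b).hexdigest())  # False
# ...but what zcat would print is byte-for-byte identical.
print(gzip.decompress(gz_a) == gzip.decompress(gz_b))  # True
```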

You could optionally also sign inventory[.gz], and even go around and
sign all of the text-store files. In the current storage mechanism,
stored files are never modified, so everything could be signed.
In the future, when we use revfiles, signing becomes harder, though
if you track signatures by offset in the file, you could have a
revfile-800.sig, which would be the signature of the revfile through byte
800. This causes problems if we ever want to compact revfiles (say, to
uncommit the nuclear launch codes). But if you only uncommit the last
entries, you can just keep peeling the file back and removing
signatures until you have removed enough.
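The revfile-800.sig idea can be sketched as an append-only byte store that keeps one signature per recorded offset. SignedRevfile is a hypothetical class, and HMAC stands in for a GPG signature purely to keep the example self-contained. Truncating back to an earlier offset leaves the surviving signatures valid, while rewriting earlier bytes (as compaction would) breaks them.

```python
import hashlib
import hmac

# Stand-in for a GPG key: HMAC keeps the sketch self-contained, but a real
# setup would use an OpenPGP signature over the same byte prefix.
KEY = b"hypothetical-signing-key"


def sign_prefix(data, offset):
    """'Sign' the first `offset` bytes of the revfile."""
    return hmac.new(KEY, data[:offset], hashlib.sha256).hexdigest()


class SignedRevfile:
    """Append-only bytes with one signature per recorded offset."""

    def __init__(self):
        self.data = b""
        self.sigs = {}  # offset -> signature of data[:offset]

    def append(self, chunk):
        self.data += chunk
        self.sigs[len(self.data)] = sign_prefix(self.data, len(self.data))

    def verify(self):
        # Every recorded signature must still match its byte prefix.
        return all(
            offset <= len(self.data)
            and hmac.compare_digest(sig, sign_prefix(self.data, offset))
            for offset, sig in self.sigs.items()
        )

    def truncate(self, offset):
        """Peel the file back to `offset`, dropping signatures past it."""
        self.data = self.data[:offset]
        self.sigs = {o: s for o, s in self.sigs.items() if o <= offset}


rf = SignedRevfile()
rf.append(b"x" * 800)         # first entry; its signature covers 800 bytes
rf.append(b"launch codes\n")  # second entry
rf.truncate(800)              # uncommit the last entry
print(rf.verify(), sorted(rf.sigs))  # True [800]
```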

John
=:->