Signing snapshots
John A Meinel
john at arbash-meinel.com
Wed Jun 22 20:18:31 BST 2005
On Jun 22, 2005, at 12:41 PM, Aaron Bentley wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
...
>> One approach is to let that happen and
>> just re-sign the revisions.
>>
>> Another is to keep the old form of the object with the old signature,
>> and make a new form of the object with a new signature. That means
>> that mbp at sourcefrog-29381938123 will refer to more than one object,
>> with different texts and signatures but supposedly equivalent meaning.
>> I'm not completely sure that would be worth the possible confusion, but
>> it is an option.
>
> I think the best thing is to not treat hashes as part of the inventory
> identity, but as supplemental verification data. That way, you're not
> really creating a different object when you add new hashes.
>
>
>> At the moment, when we generate a hash, we simply make a hash of the
>> text; the code that generates a hash of the inventory doesn't know or
>> care what kind of hash the inventory uses to identify the contained
>> files.
>
> I assume this means that two valid inventories with the same meaning
> could differ textually, and so have different hashes. That seems really
> unfortunate to me. It would be nice to be able to say "If the inventory
> SHA-1 hash is not X, it is not a true copy of Y."
>
>> I guess we could do what you say by pre-processing the inventory file
>> to strip out all the hashes but those relevant to the one we're
>> computing.
>
> No, the way I'd do it is by not signing the inventory file; sign the
> inventory data instead. As a straw man, you'd sort it by Unicode
> codepoint, then write out a space-delimited inventory summary with id,
> name, parent, type and contents-hash (if applicable) fields for each
> entry. The format doesn't need to be parseable, just unique for each
> tree.
>
Actually, you are missing an important point. What algorithm is used to
generate "contents-hash", if not a hash function? That means that if
you upgrade your hash algorithm, all of those "contents-hash" entries
suddenly change, and you need a new signature.

This really isn't any different from signing the <inventory> XML. The
only trick is that you would need to make sure the <inventory> tree is
always sorted in a specific way.

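A minimal Python sketch of the straw-man summary makes both points concrete. The entry tuple layout and the `inventory_summary` name are hypothetical. Sorting by file id makes the text independent of entry order, but the contents-hash column ties the signed text to one hash function.

```python
import hashlib
from typing import List, Optional, Tuple

# Hypothetical entry layout: (file_id, name, parent_id, kind, contents).
# contents is None for directories, which have no contents-hash.
Entry = Tuple[str, str, str, str, Optional[bytes]]


def inventory_summary(entries: List[Entry], algorithm: str = "sha1") -> str:
    """Space-delimited summary, one line per entry, sorted by file id.

    Python compares str by Unicode code point, so sorting on file_id gives
    an order that does not depend on how the entries were stored.
    """
    lines = []
    for file_id, name, parent_id, kind, contents in sorted(entries, key=lambda e: e[0]):
        fields = [file_id, name, parent_id, kind]
        if contents is not None:
            # The contents-hash depends on the chosen hash function.
            fields.append(hashlib.new(algorithm, contents).hexdigest())
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


entries = [
    ("src-id", "src", "root-id", "directory", None),
    ("hello-id", "hello.py", "src-id", "file", b"print('hi')\n"),
]
sha1_summary = inventory_summary(entries, "sha1")
sha256_summary = inventory_summary(entries, "sha256")
# Same tree, but a different text to sign once the hash function changes.
print(sha1_summary == sha256_summary)  # False
```

Reordering the input does not change the SHA-1 summary, but switching to SHA-256 does, so a new signature is needed.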
The only thing I am aware of that currently exposes the storage
mechanism is that the inventory stores a text_id="", which in my mind is
a no-no, since it makes the <inventory> tree reveal how the texts are
stored. Revfiles don't use a text_id, though you could arguably still
generate a text_id, since it is necessary for plain file storage, and
simply not use it with revfile storage.

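One way to keep a storage detail like text_id out of the signed text is to canonicalize the inventory before signing. This is a minimal sketch, assuming a simplified inventory XML made of empty `<entry>` elements. The element layout, `STORAGE_ATTRIBUTES`, and `canonical_inventory` are hypothetical and are not bzr's actual format or API.

```python
import xml.etree.ElementTree as ET

# Assumption: attributes that describe how a text is stored rather than
# what the tree contains. Only text_id comes from the discussion above.
STORAGE_ATTRIBUTES = {"text_id"}


def canonical_inventory(xml_text: str) -> bytes:
    """Return storage-independent, canonically ordered inventory XML.

    Assumes a flat <inventory> of empty entry elements; root attributes,
    child elements and text content are not preserved.
    """
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root:
        attrs = {k: v for k, v in entry.attrib.items() if k not in STORAGE_ATTRIBUTES}
        # ElementTree keeps attribute insertion order (Python 3.8+),
        # so sort attributes explicitly to make the output canonical.
        entries.append(ET.Element(entry.tag, dict(sorted(attrs.items()))))
    # Sort entries by file_id so equivalent inventories serialize identically.
    entries.sort(key=lambda e: e.get("file_id", ""))
    canonical = ET.Element(root.tag)
    canonical.extend(entries)
    return ET.tostring(canonical, encoding="unicode").encode("utf-8")


# Hypothetical inventories for the same tree under two storage schemes.
plain_store = ('<inventory>'
               '<entry file_id="b-id" name="b.txt" kind="file" text_id="b-123" text_sha1="abc"/>'
               '<entry file_id="a-id" name="a.txt" kind="file" text_id="a-456" text_sha1="def"/>'
               '</inventory>')
revfile_store = ('<inventory>'
                 '<entry kind="file" name="a.txt" file_id="a-id" text_sha1="def"/>'
                 '<entry file_id="b-id" name="b.txt" kind="file" text_sha1="abc"/>'
                 '</inventory>')
print(canonical_inventory(plain_store) == canonical_inventory(revfile_store))  # True
```

With this approach the signed bytes are no longer the file on disk, so verification depends on this exact canonicalization code.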
Also, keep in mind that you want to sign the name of the committer, the
timestamp, and whatever other meta information you have for a specific
snapshot. And if we start tracking file metadata (possibly inside the
inventory file, possibly somewhere else), you want to sign that too. If
someone is tracking the permissions of /etc, they don't want an attacker
to be able to break in and change them so that
/etc/httpd/ssl/server.key is suddenly world-readable.

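This is a minimal sketch of a signing payload that binds snapshot metadata and file modes into the signed bytes. The field names, the JSON encoding, and `snapshot_payload` are assumptions for illustration, not bzr's revision format.

```python
import hashlib
import json
from typing import Dict


def snapshot_payload(committer: str, timestamp: float, inventory_sha1: str,
                     modes: Dict[str, int]) -> bytes:
    """Serialize the metadata a snapshot signature should cover.

    Hypothetical fields: committer, timestamp, a digest of the inventory,
    and per-path permission bits.
    """
    record = {
        "committer": committer,
        "timestamp": timestamp,
        "inventory_sha1": inventory_sha1,
        "modes": {path: oct(mode) for path, mode in modes.items()},
    }
    # sort_keys plus fixed separators make the byte output reproducible.
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


key = "/etc/httpd/ssl/server.key"
signed = snapshot_payload("committer@example.com", 1119467911.0, "0" * 40, {key: 0o600})
hacked = snapshot_payload("committer@example.com", 1119467911.0, "0" * 40, {key: 0o644})
# A world-readable key yields different bytes, so the old signature fails.
print(hashlib.sha1(signed).hexdigest() == hashlib.sha1(hacked).hexdigest())  # False
```

Changing server.key from 0600 to 0644 changes the payload, so a signature made over the original bytes no longer verifies.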
I understand your desire to be able to generate the signed text through
some other process, and have it still satisfy the signature. But
realize that as soon as you change hash algorithms, the old signature
is invalidated, since the signed text contains a hash instead of the
actual file contents.

Yes, we could have multiple signatures, such that the "SHA-1" signature
uses SHA-1 hashes and the "SHA-256" signature uses SHA-256 hashes. The
two wouldn't conflict, because neither sees the other.

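This is a minimal sketch of keeping one signature per hash algorithm. Because GPG cannot run here, an HMAC with a hypothetical shared key stands in for the signature. That is a MAC, not a public-key signature. The helper names are also hypothetical.

```python
import hashlib
import hmac
from typing import Dict

# Stand-in for gpg: an HMAC with a hypothetical shared key. This is NOT a
# public-key signature; it only lets the sketch run without a keyring.
SIGNING_KEY = b"hypothetical-key"


def _sign(algorithm: str, text: bytes) -> str:
    # Label the digest with its algorithm so the two kinds never collide.
    digest = hashlib.new(algorithm, text).digest()
    return hmac.new(SIGNING_KEY, algorithm.encode("ascii") + b":" + digest,
                    hashlib.sha256).hexdigest()


def add_signature(sigs: Dict[str, str], text: bytes, algorithm: str) -> None:
    """Add (or replace) the signature for one hash algorithm only."""
    sigs[algorithm] = _sign(algorithm, text)


def verify(sigs: Dict[str, str], text: bytes, algorithm: str) -> bool:
    """Check one algorithm's signature; the others are never consulted."""
    stored = sigs.get(algorithm)
    return stored is not None and hmac.compare_digest(stored, _sign(algorithm, text))


inventory = b"<inventory/>"
sigs: Dict[str, str] = {}
add_signature(sigs, inventory, "sha1")
old_sha1 = sigs["sha1"]
add_signature(sigs, inventory, "sha256")
print(sigs["sha1"] == old_sha1, verify(sigs, inventory, "sha1"))  # True True
```

Adding the SHA-256 signature leaves the SHA-1 one untouched, and each verifies on its own.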
But also be aware that there is a real benefit to signing the exact
contents of a file: it is trivial to verify using external tools.
If you don't sign the contents of a file, you need code that exactly
reproduces the method used to generate the text that was signed. That
means that if you upgrade that method, you have to keep the old method
around to validate old signatures, and nothing other than that code can
validate them.

Also, it is certainly possible for someone to hack the bzr code so that
it starts to, say, ignore certain files when generating and checking
signatures. Because the thing that is signed is never actually visible,
and not used anywhere else, it is very difficult to detect that this is
happening. (Think of a company where everyone uses a "recommended" bzr,
or all use the same bzr on the same machine.)

I feel the only two valid things to sign are:

    zcat revision.gz | gpg
    gpg --detach-sign revision.gz

You could optionally also sign inventory[.gz], and even go through and
sign all of the text-store files. In the current storage mechanism,
stored files never change once written, so everything could be signed.

In the future, when we use revfiles, signing becomes harder. However,
if you track signatures by offset in the file, you could have a
revfile-800.sig, which would be the signature of the revfile up to byte
800. This causes problems if we ever want to compact revfiles (say, to
uncommit the nuclear launch codes). But if you only uncommit the last
entries, you can just keep peeling the file back and removing
signatures until you have removed enough stuff.

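The offset scheme can be sketched as follows. A SHA-1 digest of the byte prefix stands in for a detached GPG signature. The `revfile-<offset>.sig` naming follows the revfile-800.sig example, and the helper names are hypothetical. Appending only extends the file, so older prefix signatures stay valid. Peeling back drops exactly the signatures that covered the removed bytes.

```python
import hashlib
from typing import Dict, List

# Hypothetical layout: sigs maps "<name>-<offset>.sig" to a stand-in
# "signature" (a SHA-1 of the first <offset> bytes). In practice this
# would be a detached GPG signature over the same byte prefix.


def sig_name(name: str, offset: int) -> str:
    return f"{name}-{offset}.sig"


def offset_of(name: str, sig: str) -> int:
    return int(sig[len(name) + 1 : -len(".sig")])


def sign_prefix(sigs: Dict[str, str], name: str, data: bytes) -> None:
    """Sign everything appended so far, keyed by the current length."""
    sigs[sig_name(name, len(data))] = hashlib.sha1(data).hexdigest()


def peel_back(sigs: Dict[str, str], name: str, data: bytes, new_length: int) -> bytes:
    """Truncate the revfile and drop signatures covering removed bytes."""
    for sig in list(sigs):
        if offset_of(name, sig) > new_length:
            del sigs[sig]
    return data[:new_length]


def verify_all(sigs: Dict[str, str], name: str, data: bytes) -> List[int]:
    """Return the offsets whose signatures still match the current file."""
    good = []
    for sig, value in sigs.items():
        offset = offset_of(name, sig)
        if offset <= len(data) and hashlib.sha1(data[:offset]).hexdigest() == value:
            good.append(offset)
    return sorted(good)


sigs: Dict[str, str] = {}
data = b"a" * 500                      # first commits
sign_prefix(sigs, "revfile", data)
data += b"b" * 300                     # later commits appended
sign_prefix(sigs, "revfile", data)
print(verify_all(sigs, "revfile", data))  # [500, 800]
data = peel_back(sigs, "revfile", data, 500)  # uncommit the last entries
print(sorted(sigs), verify_all(sigs, "revfile", data))  # ['revfile-500.sig'] [500]
```

Compacting the middle of the file would invalidate every later prefix signature, which is the problem described above.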
John
=:->