Revfile vs Atomicity & Dumbfs

Tue May 10 05:31:49 BST 2005

Martin Pool wrote:
> On  9 May 2005, John A Meinel <john at arbash-meinel.com> wrote:
>
>>Martin Pool wrote:
>>
...
> You can never just append unless you want the same data to appear in
> all trees.  It's OK to have versions appear in the files of one tree
> that aren't used there.  The only file that needs to be not hard
> linked is the revision-history.

I suppose. It would seem that you would have to worry more about
concurrency then. Because running two bzr in the same directory is
pretty unlikely, but running one in 2 directories seems very likely. Do
you currently lock the file before you append to it?

>
>
>>It seems that with a write-ahead-log you could get away without the
>>later vacuum, because you would effectively vacuum as you go. Yes you
>>could vacuum later, but that would require lots of locking to make sure
>>the index gets rewritten inbetween the time you are rewriting the revlog
>>file. If you always truncate bogus data, then you don't ever have the
>>locking issue.
>
>
> You can also truncate out bogus data with revfiles by just checking
> that the last index entry is valid and matches the end of the data
> file.
>
>
>>How is it detected? Do you have a checksum? Or is it just if it points
>>to something bogus?
>
>
> There is a SHA-1 in each index line.
>
>
>>I believe I understand your method of atomicity to be that appends don't
>>matter until you write the final file, because they will just be
>>ignored. I think you could be more proactive with a WAL and allow
>>automatic corrections.
>
>
> Maybe you could sketch out how WAL would work in more detail?
>

 From the name "write-ahead-log", you basically just write down what you
are going to do before you do it. You can write at 2 different times,
incrementally, or all at once. But you fsync the WAL before doing any
work, so that you know it is committed to disk. You don't have to fsync
the files until the transaction is done.

So say we have a commit which affects 10 files. You could do it 2 ways:

------------------------------------------------------------
walfilename = '.bzr/commit-wal'
wal = open(walfilename, 'wb')
flock(wal)

# Alternatively, you might be able to use
# os.open(walfilename, O_CREAT | O_EXCL | O_APPEND)
# or something like that, you just need a lock on the
# file.

wal.write('commit revno: %d\n' % revno)
files = []
for id, f in zip(file_ids, files_to_modify):
	fp = open(f, 'ab')
	flock(fp)
	files.append(fp)
	st = fstat(fp)
	# Basically, you just need some way of stating what additions
	# you are adding. There are several ways of doing it
	wal.write('append rev %s to %s orig size: %d\n' \
		% (id, f, st.st_size))

wal.fsync()

# update files
# update indexes
# update revision history

# Done with the transaction

# You can declare "finished" in 2 ways, if you want to
# always append the WAL, then add DONE, and fsync
# This allows for the fact that the OS hasn't written
# out all files to disk yet.
# Alternatively, fsync all of the files, and then
# delete the wal.
if append:
	wal.write('done\n')
	wal.fsync()
elif remove:
	for fp in files:
		fp.fsync()
		fp.close()
	wal.close()
	os.remove(walfilename)
-------------------------------------------------------

The other way would be for incremental WAL, which would be:

for id, f in zip(file_ids, files_to_modify):
	fp = open(f, 'ab')
	flock(fp)
	files.append(fp)
	st = fstat(fp)
	# Basically, you just need some way of stating what additions
	# you are adding. There are several ways of doing it
	wal.write('append rev %s to %s orig size: %d\n' \
		% (id, f, st.st_size))
	wal.fsync()

	# Update revtext
	# Update index

wal.write('update rev history %s\n')
wal.fsync()
# Update revision history

# Commit the WAL, (remove or add DONE)
----------------------------------------------------------

Then if another .bzr process starts, it can try:

if os.path.exists(walfilename):
	try:
		fp = open(walfilename, 'rb')
		# I think to test the lock, you need to
		flock(fp)
		# Lock is not active, so this is a leftover
		# WAL, clean-up time
		for line in fp:
			# Clean out failed transactions.
	except:
		pass # Lock still exists

# Do whatever it was that you were going to do
------------------------------------------------------------

One interesting thing is that WAL could be used to make the entire
working tree transaction safe, since you have a record of what you did,
so that you know what to undo. So that if any action is canceled by the
user (such as a merge), the next run of bzr (or bzr fix) will back out
all of the changes.

I don't know how much fsync on the WAL would cost in terms of
performance, but in general that is how all databases/filesystems
maintain integrity. It's the same thing as the journal for
ext3/reiserfs/xfs/etc.

The biggest point is just to define the notation for entries in the wal
(we could always just use a Pickle), and try to make sure there is a way
to undo operations.

For instance, in a merge, you have the original file, a intermediate
product as you merge, and then the final. So I would do:

# start merge

wal.write('merging %s %s with %s' % (merge_id, rev_a_info, rev_b_info))
wal.fsync()
for f in files:
	wal.write('merging %s' % f)
	wal.fsync()
	mergename = f + '.merged'
	do_merge(f, mergename, otherbase)
	# If we die here, we just delete the f.merged files

# Merge has finished
for f in files:
	mergename = f + '.merged'
	origname = f + '.orig'
	os.rename(f, origname)
	os.rename(mergename, f)
	# If we die here, we just rename f.orig -> f
	# and remove any f.merged files that still are around

wal.write('commit merge %s' % merge_id)
wal.fsync()

# From now on, if we die, future bzr invocations
# just remove f.orig

for f in files:
	origname = f + '.orig'
	os.remove(origname)

wal.close()
os.remove(walfilename)

---------------------------------------

I probably would recommend a binary format for the WAL, because you want
to have to fsync as little as possible. However, for a first run, it is
probably easier to do it with text.

Anyway, that's my thoughts about it. It basically lets you do a small
fsync to pretend like you did an fsync on everything else, and lets you
create a form of transactionality. Probably this would not work over a
dumbfs network protocol, but a smart server could support it.

John
=:->
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 251 bytes
Desc: OpenPGP digital signature
Url : https://lists.ubuntu.com/archives/bazaar/attachments/20050509/bf65c4cc/attachment.pgp