Slower performance with ext4
markir at paradise.net.nz
Tue Nov 3 08:37:36 UTC 2009
Quoting Chan Chung Hang Christopher <christopher.chan at bradbury.edu.hk>:
> Maybe things have changed for XFS now but for ext3, disk = journal.
>
> http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/fs/ext3/fsync.c#L71
>
> When data=journal, data and metadata for file are written to the journal
> and then fsync returns. End of story.
>
> When data=ordered, when metadata is written via sync_inode(), fsync
> returns and you hope nothing happens within the next half second if you
> want data consistency too.
>
> Hence the reason why a ext3 filesystem on software raid but mounted
> data=journal and with an external journal on a bbu nvram card will blow
> away other filesystems in performance and data consistency.
>
> Comments for your pleasure:
>
> (fs/ext3/fsync.c, lines 53-70:
> http://tomoyo.sourceforge.jp/cgi-bin/lxr/source/fs/ext3/fsync.c#L53)
>
> /*
>  * data=writeback:
>  *  The caller's filemap_fdatawrite()/wait will sync the data.
>  *  sync_inode() will sync the metadata
>  *
>  * data=ordered:
>  *  The caller's filemap_fdatawrite() will write the data and
>  *  sync_inode() will write the inode if it is dirty. Then the caller's
>  *  filemap_fdatawait() will wait on the pages.
>  *
>  * data=journal:
>  *  filemap_fdatawrite won't do anything (the buffers are clean).
>  *  ext3_force_commit will write the file data into the journal and
>  *  will wait on that.
>  *  filemap_fdatawait() will encounter a ton of newly-dirtied pages
>  *  (they were dirtied by commit). But that's OK - the blocks are
>  *  safe in-journal, which is all fsync() needs to ensure.
>  */
>
>
Good idea to post the source :-).
However it does not seem to actually support your statement.
When the fs is mounted data=journal then yes - the logic goes as you suggest.
Clearly, as the data+metadata is in the journal, then this is all we need to
sync (it's a nice optimization).
In the other cases (no journal, data=ordered, data=writeback), the metadata is
synced to the journal, and the data buffers are synced to their respective
inodes - that is what the comments appear to say as well.
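
From the application side the call looks the same whichever data= mode is in
use; only what the filesystem does underneath differs. A minimal sketch of that
user-space side, in Python for brevity (write_durably is a made-up helper name,
not something from the thread or from ext3):

```python
import os


def write_durably(path, payload):
    """Write payload (bytes) to path, then ask the kernel to sync it.

    Hypothetical helper for illustration. os.fsync() maps to fsync(2);
    what that does on ext3 depends on the data= mount mode.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        offset = 0
        # os.write() may write fewer bytes than requested, so loop.
        while offset < len(payload):
            offset += os.write(fd, payload[offset:])
        # Returns once the filesystem reports the file as synced.
        os.fsync(fd)
    finally:
        os.close(fd)
```

A successful return here only means the filesystem believes the sync is done.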
So it seems that disk = journal *only* if you are journalling the *data*! (Not
that staggering an observation, but as you mentioned, it does explain why
data=journal sometimes performs better than the other ext3 journal options.)
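
To make that concrete, here is the quoted fsync.c comment restated as a small
lookup table. This is only my paraphrase of the comment, written in Python;
FSYNC_STEPS and data_goes_via_journal are names invented for this sketch and
are not kernel interfaces.

```python
# Paraphrase of the fs/ext3/fsync.c comment, one entry per data= mode.
# "data_in_journal" records whether fsync relies on the journal for file data.
FSYNC_STEPS = {
    "writeback": {
        "data": "caller's filemap_fdatawrite()/wait syncs the data",
        "metadata": "sync_inode() syncs the metadata",
        "data_in_journal": False,
    },
    "ordered": {
        "data": "caller's filemap_fdatawrite() writes, filemap_fdatawait() waits",
        "metadata": "sync_inode() writes the inode if it is dirty",
        "data_in_journal": False,
    },
    "journal": {
        "data": "ext3_force_commit writes the file data into the journal",
        "metadata": "committed to the journal along with the data",
        "data_in_journal": True,
    },
}


def data_goes_via_journal(mode):
    """Return True if, per the comment, fsync leaves file data in the journal."""
    return FSYNC_STEPS[mode]["data_in_journal"]


# Only data=journal satisfies "disk = journal" for file data.
journal_only = [m for m in FSYNC_STEPS if data_goes_via_journal(m)]
```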
Also there is still the issue of whether your data (or metadata) actually hits
the disk platter (whether via the journal or the file itself), and this concerns
the business of disk write caches and barrier support - since for journal or
file you gotta signal the backing device to flush. If it tells you fibs, or your
barrier support is buggy - then you can still get data loss, no matter what fs
options are enabled.
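
If you want to see which of these options a mount claims to be using, something
like the following rough sketch would pull data= and barrier= out of a line in
/proc/mounts format. Assumptions: ext3_sync_options is a hypothetical helper,
the sample device and mount point are invented, and whether a given kernel
lists these options in /proc/mounts at all varies.

```python
def ext3_sync_options(mounts_line):
    """Extract the data= and barrier= options from one /proc/mounts-style line.

    Hypothetical helper. The expected layout is:
    device mount_point fs_type options dump pass
    """
    fields = mounts_line.split()
    if len(fields) < 4:
        raise ValueError("expected at least 4 whitespace-separated fields")
    fs_type, options = fields[2], fields[3]

    parsed = {}
    for item in options.split(","):
        key, _, value = item.partition("=")
        parsed[key] = value

    # None means the option was not listed, not that it is switched off.
    return {
        "fs_type": fs_type,
        "data": parsed.get("data"),
        "barrier": parsed.get("barrier"),
    }


# Invented sample line, not taken from a real system.
sample = "/dev/md0 /srv ext3 rw,data=journal,barrier=1 0 0"
info = ext3_sync_options(sample)
```

Even with barrier=1 listed, this only shows what the filesystem was asked to
do; it cannot tell you whether the drive honours the flush.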
regards
Mark
More information about the ubuntu-users
mailing list