[Bug 1480923] Re: lvm thin corruption after lvresize
mai ling
1480923 at bugs.launchpad.net
Wed Mar 23 14:20:21 UTC 2022
I have just been bitten by this. I have deployed dozens of boxes with the
same cloned disk image, so I expect more of them will hit it sooner or later.
Does anyone know if there is a Red Hat Bugzilla issue for it?
RHEL clone (OL8.4), kernel 5.4.17-2102.202.5.el8uek.x86_64
[root@localhost ~]# journalctl --since '2022-02-09 10:47:54' --until '2022-02-09 10:47:56' --no-pager
-- Logs begin at Fri 2021-04-09 13:02:56 EEST, end at Wed 2022-03-23 16:02:07 EET. --
Feb 09 10:47:54 localhost.localdomain systemd[1]: Starting Cleanup of Temporary Directories...
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: btree spine: node_check failed: blocknr 10012793332687714485 != wanted 94
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: block manager: btree_node validator check failed for block 94
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: btree spine: node_check failed: blocknr 10012793332687714485 != wanted 94
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: block manager: btree_node validator check failed for block 94
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: btree spine: node_check failed: blocknr 10012793332687714485 != wanted 94
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: block manager: btree_node validator check failed for block 94
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Feb 09 10:47:54 localhost.localdomain kernel: EXT4-fs error (device dm-10): __ext4_get_inode_loc:4713: inode #652801: block 2621472: comm systemd-tmpfile: unable to read itable block
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: btree spine: node_check failed: blocknr 10012793332687714485 != wanted 94
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: block manager: btree_node validator check failed for block 94
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Feb 09 10:47:55 localhost.localdomain kernel: Buffer I/O error on dev dm-10, logical block 0, lost sync page write
Feb 09 10:47:55 localhost.localdomain kernel: EXT4-fs (dm-10): I/O error while writing superblock
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: btree spine: node_check failed: blocknr 10012793332687714485 != wanted 94
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: block manager: btree_node validator check failed for block 94
Feb 09 10:47:55 localhost.localdomain kernel: EXT4-fs warning (device dm-10): htree_dirblock_to_tree:997: inode #130564: lblock 0: comm systemd-tmpfile: error -5 reading directory block
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: btree spine: node_check failed: blocknr 10012793332687714485 != wanted 94
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: block manager: btree_node validator check failed for block 94
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: btree spine: node_check failed: blocknr 10012793332687714485 != wanted 94
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: block manager: btree_node validator check failed for block 94
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: btree spine: node_check failed: blocknr 10012793332687714485 != wanted 94
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: block manager: btree_node validator check failed for block 94
Feb 09 10:47:55 localhost.localdomain kernel: EXT4-fs error (device dm-10): __ext4_get_inode_loc:4713: inode #261121: block 1048608: comm systemd-tmpfile: unable to read itable block
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: btree spine: node_check failed: blocknr 10012793332687714485 != wanted 94
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: block manager: btree_node validator check failed for block 94
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Feb 09 10:47:55 localhost.localdomain kernel: Buffer I/O error on dev dm-10, logical block 0, lost sync page write
Feb 09 10:47:55 localhost.localdomain kernel: EXT4-fs (dm-10): I/O error while writing superblock
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: btree spine: node_check failed: blocknr 10012793332687714485 != wanted 94
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: block manager: btree_node validator check failed for block 94
Feb 09 10:47:55 localhost.localdomain kernel: EXT4-fs error (device dm-10): __ext4_get_inode_loc:4713: inode #522241: block 2097184: comm systemd-tmpfile: unable to read itable block
Feb 09 10:47:55 localhost.localdomain kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Feb 09 10:47:55 localhost.localdomain kernel: Buffer I/O error on dev dm-10, logical block 0, lost sync page write
Feb 09 10:47:55 localhost.localdomain kernel: EXT4-fs (dm-10): I/O error while writing superblock
Feb 09 10:47:55 localhost.localdomain kernel: EXT4-fs warning (device dm-10): htree_dirblock_to_tree:997: inode #130563: lblock 0: comm systemd-tmpfile: error -5 reading directory block
Feb 09 10:47:54 localhost.localdomain systemd-tmpfiles[122395]: stat(/tmp/.Test-unix) failed: Input/output error
Feb 09 10:47:54 localhost.localdomain systemd-tmpfiles[122395]: stat(/tmp/.XIM-unix) failed: Input/output error
Feb 09 10:47:54 localhost.localdomain systemd-tmpfiles[122395]: stat(/tmp/.font-unix) failed: Input/output error
Feb 09 10:47:55 localhost.localdomain systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Feb 09 10:47:55 localhost.localdomain systemd[1]: Started Cleanup of Temporary Directories.
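The kernel messages above repeat the same few device-mapper errors many times. A minimal Python sketch for tallying them, assuming the journal text is available as a string (the SAMPLE excerpt and the summarize helper are illustrative names, not part of any LVM tooling):

```python
import re
from collections import Counter

# A few lines copied verbatim from the journal excerpt above.
SAMPLE = """\
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: btree spine: node_check failed: blocknr 10012793332687714485 != wanted 94
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: block manager: btree_node validator check failed for block 94
Feb 09 10:47:54 localhost.localdomain kernel: device-mapper: thin: process_cell: dm_thin_find_block() failed: error = -15
Feb 09 10:47:55 localhost.localdomain kernel: EXT4-fs (dm-10): I/O error while writing superblock
"""

# "found != wanted" block numbers reported by the btree node check.
NODE_CHECK = re.compile(r"node_check failed: blocknr (\d+) != wanted (\d+)")
# Kernel subsystem that emitted the message, e.g. "device-mapper: thin".
SUBSYSTEM = re.compile(r"kernel: (device-mapper: [a-z ]+|EXT4-fs)")


def summarize(text):
    """Return (message count per subsystem, set of (found, wanted) pairs)."""
    counts = Counter()
    mismatches = set()
    for line in text.splitlines():
        subsystem = SUBSYSTEM.search(line)
        if subsystem:
            counts[subsystem.group(1)] += 1
        check = NODE_CHECK.search(line)
        if check:
            mismatches.add((int(check.group(1)), int(check.group(2))))
    return counts, mismatches


counts, mismatches = summarize(SAMPLE)
for name, n in counts.most_common():
    print(f"{n:3d}  {name}")
for found, wanted in sorted(mismatches):
    print(f"wanted block {wanted}, header says {found} (0x{found:x})")
```

Every node_check line in the excerpt reports the same pair: block 94 was wanted, but the block read back claims to be number 10012793332687714485. That is far beyond the 3072 blocks the metadata device holds according to the later "growing the metadata device from 3072 to 4096 blocks" message.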
disk layout:
[root@localhost ~]# parted /dev/sda p
Model: ATA HP SSD S700 120G (scsi)
Disk /dev/sda: 120GB
Sector size (logical/physical): 512B/4096B
Partition Table: msdos
Disk Flags:
Number  Start   End     Size    Type     File system  Flags
 1      1049kB  1075MB  1074MB  primary  ext4         boot
 2      1075MB  120GB   119GB   primary               lvm
[root@localhost ~]# lvs -a -o +devices
  LV              VG  Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert Devices
  home            plj Vwi-aotz--  16.00g thin        3.97
  [lvol1_pmspare] plj ewi-------  12.00m                                                     /dev/sda2(2668)
  root            plj Vwi-aotz--  16.00g thin        37.30
  swap            plj Vwi-aotz--   4.00g thin        0.39
  thin            plj twi-aotz-- <10.40g             79.16  45.61                            thin_tdata(0)
  thin_meta0      plj -wi-a-----  12.00m                                                     /dev/sda2(1025)
  thin_meta0      plj -wi-a-----  12.00m                                                     /dev/sda2(2004)
  [thin_tdata]    plj Twi-ao---- <10.40g                                                     /dev/sda2(1)
  [thin_tdata]    plj Twi-ao---- <10.40g                                                     /dev/sda2(1028)
  [thin_tdata]    plj Twi-ao---- <10.40g                                                     /dev/sda2(2006)
  [thin_tmeta]    plj ewi-ao----  12.00m                                                     /dev/sda2(0)
  [thin_tmeta]    plj ewi-ao----  12.00m                                                     /dev/sda2(1027)
  [thin_tmeta]    plj ewi-ao----  12.00m                                                     /dev/sda2(2005)
  tmp             plj Vwi-a-tz--  16.00g thin        0.00
  varlog          plj Vwi-aotz--  16.00g thin        10.07
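The lvs figures above can be cross-checked with a little arithmetic. The following Python sketch is only an illustration: it takes the sizes and Data% values from the listing, treats "<10.40g" as 10.40 GiB, and assumes the thin volumes share no blocks (none has an Origin). The variable names are made up for this example.

```python
# (name, virtual size in GiB, Data% of that virtual size), from the lvs output above
THIN_VOLUMES = [
    ("home", 16.0, 3.97),
    ("root", 16.0, 37.30),
    ("swap", 4.0, 0.39),
    ("tmp", 16.0, 0.00),
    ("varlog", 16.0, 10.07),
]
POOL_SIZE_GIB = 10.40      # lvs prints "<10.40g", i.e. slightly under this value
POOL_DATA_PERCENT = 79.16  # Data% of the "thin" pool


def used_gib(size_gib, data_percent):
    """Convert a Data% figure into GiB actually allocated."""
    return size_gib * data_percent / 100.0


virtual_total = sum(size for _, size, _ in THIN_VOLUMES)
volumes_used = sum(used_gib(size, pct) for _, size, pct in THIN_VOLUMES)
pool_used = used_gib(POOL_SIZE_GIB, POOL_DATA_PERCENT)

print(f"virtual size provisioned: {virtual_total:.2f} GiB")
print(f"overcommit ratio:         {virtual_total / POOL_SIZE_GIB:.1f}x")
print(f"used, summed over LVs:    {volumes_used:.2f} GiB")
print(f"used, from pool Data%:    {pool_used:.2f} GiB")
```

Both ways of computing the allocated space agree to within rounding (about 8.23 GiB), so the pool accounting itself looks consistent. The tmp volume contributes nothing to it, since it is reported as 0.00% allocated.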
[root@localhost ~]# lsblk -f
NAME FSTYPE LABEL UUID MOUNTPOINT
sda
├─sda1 ext4 cb170528-4d10-48ac-959f-cb24feca2baa /boot
└─sda2 LVM2_member sZVdNK-9FYl-rwMJ-7uj0-687t-wYA2-1ojz1K
├─plj-thin_tmeta
│ └─plj-thin-tpool
│ ├─plj-root crypto_LUKS 8746df75-bb21-45bb-8266-9aff93e756fe
│ │ └─root ext4 root e6bab103-3c7c-4577-a9e3-7c319c3d2d8d /
│ ├─plj-swap crypto_LUKS fd8f2dc2-5d80-4fe9-a7be-d497a152a552
│ │ └─swap swap swap 7e750545-b4d9-47e8-8d66-6bbfcdf8578a [SWAP]
│ ├─plj-home crypto_LUKS b47f7980-c029-4e10-9a6e-40b195ebcb9a
│ │ └─home ext4 home ecffaf62-4ac3-4688-bee1-654d6498b2f0 /home
│ ├─plj-tmp
│ ├─plj-thin
│ └─plj-varlog crypto_LUKS varlog bc13b95d-fbea-4be3-a0ef-7bf1130bbd3f
│ └─varlog ext4 varlog fa50103a-9aaf-4117-a250-1047b2a9afb8 /var/log
├─plj-thin_tdata
│ └─plj-thin-tpool
│ ├─plj-root crypto_LUKS 8746df75-bb21-45bb-8266-9aff93e756fe
│ │ └─root ext4 root e6bab103-3c7c-4577-a9e3-7c319c3d2d8d /
│ ├─plj-swap crypto_LUKS fd8f2dc2-5d80-4fe9-a7be-d497a152a552
│ │ └─swap swap swap 7e750545-b4d9-47e8-8d66-6bbfcdf8578a [SWAP]
│ ├─plj-home crypto_LUKS b47f7980-c029-4e10-9a6e-40b195ebcb9a
│ │ └─home ext4 home ecffaf62-4ac3-4688-bee1-654d6498b2f0 /home
│ ├─plj-tmp
│ ├─plj-thin
│ └─plj-varlog crypto_LUKS varlog bc13b95d-fbea-4be3-a0ef-7bf1130bbd3f
│ └─varlog ext4 varlog fa50103a-9aaf-4117-a250-1047b2a9afb8 /var/log
└─plj-thin_meta0
sr0
[root@localhost ~]# grep _autoextend_ /etc/lvm/lvm.conf|grep -v \#
snapshot_autoextend_threshold = 100
snapshot_autoextend_percent = 20
thin_pool_autoextend_threshold = 95
thin_pool_autoextend_percent = 10
vdo_pool_autoextend_threshold = 100
[root@localhost ~]# journalctl | grep plj-thin | grep WARNING
Jan 29 05:50:06 localhost.localdomain lvm[1630]: WARNING: Thin pool plj-thin-tpool data is now 80.00% full.
Feb 08 10:32:55 localhost.localdomain lvm[1639]: WARNING: Thin pool plj-thin-tpool data is now 80.49% full.
Deliberately filling the pool does correctly trigger the thin pool
autoextend:
[root@localhost ~]# LD_PRELOAD=/usr/lib64/nosync/nosync.so rsync -aqxPHAX /usr /home/
Mar 23 16:16:46 localhost.localdomain lvm[1422]: WARNING: Thin pool plj-thin-tpool data is now 83.80% full.
Mar 23 16:16:56 localhost.localdomain lvm[1422]: WARNING: Thin pool plj-thin-tpool data is now 87.32% full.
Mar 23 16:17:06 localhost.localdomain lvm[1422]: WARNING: Thin pool plj-thin-tpool data is now 91.74% full.
Mar 23 16:17:12 localhost.localdomain kernel: device-mapper: thin: 252:2: reached low water mark for data device: sending event.
Mar 23 16:17:12 localhost.localdomain lvm[1422]: Size of logical volume plj/thin_tdata changed from <10.40 GiB (2662 extents) to 11.44 GiB (2929 extents).
Mar 23 16:17:12 localhost.localdomain kernel: device-mapper: thin: 252:2: growing the data device from 170368 to 187456 blocks
Mar 23 16:17:12 localhost.localdomain lvm[1422]: Logical volume plj/thin_tdata successfully resized.
Mar 23 16:17:16 localhost.localdomain lvm[1422]: WARNING: Thin pool plj-thin-tpool data is now 88.06% full.
Mar 23 16:17:26 localhost.localdomain lvm[1422]: WARNING: Thin pool plj-thin-tpool data is now 92.06% full.
Mar 23 16:17:38 localhost.localdomain kernel: device-mapper: thin: 252:2: reached low water mark for data device: sending event.
Mar 23 16:17:38 localhost.localdomain lvm[1422]: Rounding size to boundary between physical extents: 16.00 MiB.
Mar 23 16:17:38 localhost.localdomain lvm[1422]: Size of logical volume plj/thin_tmeta changed from 12.00 MiB (3 extents) to 16.00 MiB (4 extents).
Mar 23 16:17:38 localhost.localdomain kernel: device-mapper: thin: 252:2: switching pool to out-of-data-space (queue IO) mode
Mar 23 16:17:38 localhost.localdomain kernel: device-mapper: thin: 252:2: switching pool to write mode
Mar 23 16:17:38 localhost.localdomain kernel: device-mapper: thin: 252:2: growing the metadata device from 3072 to 4096 blocks
Mar 23 16:17:38 localhost.localdomain kernel: device-mapper: thin: 252:2: reached low water mark for data device: sending event.
Mar 23 16:17:38 localhost.localdomain kernel: device-mapper: thin: 252:2: switching pool to out-of-data-space (queue IO) mode
Mar 23 16:17:39 localhost.localdomain lvm[1422]: Size of logical volume plj/thin_tdata changed from 11.44 GiB (2929 extents) to <12.59 GiB (3222 extents).
Mar 23 16:17:39 localhost.localdomain kernel: device-mapper: thin: 252:2: switching pool to write mode
Mar 23 16:17:39 localhost.localdomain kernel: device-mapper: thin: 252:2: growing the data device from 187456 to 206208 blocks
Mar 23 16:17:39 localhost.localdomain lvm[1422]: Logical volume plj/thin_tdata successfully resized.
Mar 23 16:17:46 localhost.localdomain lvm[1422]: WARNING: Thin pool plj-thin-tpool data is now 88.03% full.
Mar 23 16:18:06 localhost.localdomain lvm[1422]: WARNING: Thin pool plj-thin-tpool data is now 90.61% full.
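The sizes in this log are consistent with thin_pool_autoextend_percent = 10 being applied to the extent count and rounded up to a whole extent. The Python sketch below reproduces the numbers. The round-up rule is inferred from the logged values, not taken from the LVM source, and grow_extents is a made-up helper name.

```python
AUTOEXTEND_PERCENT = 10  # thin_pool_autoextend_percent from lvm.conf above


def grow_extents(extents, percent):
    """Grow by `percent`, rounding up to a whole extent (assumed rounding rule)."""
    return (extents * (100 + percent) + 99) // 100


# Data device: 2662 extents before the first resize in the log.
first = grow_extents(2662, AUTOEXTEND_PERCENT)
second = grow_extents(first, AUTOEXTEND_PERCENT)
print("tdata extents:", 2662, "->", first, "->", second)

# The kernel counts pool blocks rather than extents:
# 170368 blocks corresponded to 2662 extents.
DATA_BLOCKS_PER_EXTENT = 170368 // 2662
print("pool blocks per extent:", DATA_BLOCKS_PER_EXTENT)
print("tdata blocks:", first * DATA_BLOCKS_PER_EXTENT,
      "->", second * DATA_BLOCKS_PER_EXTENT)

# Metadata device: 3 extents (12 MiB, 3072 blocks) before the resize.
meta = grow_extents(3, AUTOEXTEND_PERCENT)
META_BLOCKS_PER_EXTENT = 3072 // 3
print("tmeta extents:", 3, "->", meta,
      "| blocks:", meta * META_BLOCKS_PER_EXTENT)
```

With 12 MiB spread over 3 extents, an extent is 4 MiB, which makes a data block 64 KiB and a metadata block 4 KiB. The computed values (2929 and 3222 extents, 187456 and 206208 data blocks, 4096 metadata blocks) match the lvm and kernel lines above.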
The affected (lost) volume now reads back as all zeroes:
[root@localhost ~]# cmp -b /dev/plj/tmp /dev/zero
cmp: EOF on /dev/plj/tmp after byte 17179869184, in line 1
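cmp reaching EOF on /dev/plj/tmp after byte 17179869184 (exactly 16 GiB) without reporting a difference means every byte of the volume matched /dev/zero. For scripting the same check across many boxes, a rough Python equivalent might look like this. The function name and chunk size are arbitrary choices for this sketch, and it is exercised here on a temporary file rather than a real device.

```python
import os
import tempfile


def first_nonzero_offset(path, chunk_size=1 << 20):
    """Return the offset of the first non-zero byte in `path`, or None if all zero."""
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return None  # reached EOF without seeing a non-zero byte
            stripped = chunk.lstrip(b"\x00")
            if stripped:
                return offset + len(chunk) - len(stripped)
            offset += len(chunk)


# Small self-check on a temporary file: 4096 zero bytes, then one non-zero byte.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 4096 + b"\x01")
try:
    print(first_nonzero_offset(tmp.name))
finally:
    os.remove(tmp.name)
```

Run against a block device, this needs read permission on the device node and reads the whole volume, just as cmp does.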
--
https://bugs.launchpad.net/bugs/1480923
Title:
lvm thin corruption after lvresize
Status in lvm2 package in Ubuntu:
New
Bug description:
lvm2 version 2.02.98-6ubuntu2
After doing an lvresize of an LVM thin pool, all of its thin volumes
were corrupted and I lost all of them. I then tried to dump/repair the
tmeta and ended up with empty thin volumes (no more filesystem signature on them).
To sum up
The thin_pool was 2T and I tried to increase it to 3T...
As far as I know, none of the partitions were full, but I increased the main
thin pool as it was close to the sum of all sub thin volumes.
I assume that using LVM thin is still not stable on 14.04 LTS, right?
I guess that lvm2 2.02.98 does not properly handle the metadata resize
of a thin pool, right? (Maybe add a warning somewhere in the docs?)
Maybe related to
http://comments.gmane.org/gmane.linux.kernel.device-mapper.devel/19190
https://www.redhat.com/archives/lvm-devel/2013-June/msg00371.html
I managed to recover some files from the raw thin_pool (tdata/tpool) with scalpel,
but that is it.
Do you know of any other tools to recover LVM thin volumes or the
partitions/data on them?
Errors
attempt to access beyond end of device
dm-6: rw=0, want=7753528, limit=262144
attempt to access beyond end of device
dm-6: rw=0, want=7753528, limit=262144
attempt to access beyond end of device
dm-6: rw=0, want=7753528, limit=262144
attempt to access beyond end of device
dm-6: rw=0, want=7753528, limit=262144
/dev/mainvg/thin_rsnapshot: read failed after 0 of 4096 at 2199023190016: Input/output error
/dev/mainvg/thin_rsnapshot: read failed after 0 of 4096 at 2199023247360: Input/output error
/dev/mainvg/thin_rsnapshot: read failed after 0 of 4096 at 0: Input/output error
/dev/mainvg/thin_rsnapshot: read failed after 0 of 4096 at 4096: Input/output error
/dev/mainvg/thin_archive: read failed after 0 of 4096 at 805306302464: Input/output error
/dev/mainvg/thin_archive: read failed after 0 of 4096 at 805306359808: Input/output error
/dev/mainvg/thin_archive: read failed after 0 of 4096 at 0: Input/output error
/dev/mainvg/thin_archive: read failed after 0 of 4096 at 4096: Input/output error
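To put the "attempt to access beyond end of device" lines into familiar units, the want and limit values can be converted to sizes. This Python sketch assumes, as is usual for this kernel message, that both are counted in 512-byte sectors. The helper name is made up for the example.

```python
SECTOR_BYTES = 512  # assumed unit of want= and limit= in the kernel message


def sectors_to_mib(sectors):
    """Convert a count of 512-byte sectors to MiB."""
    return sectors * SECTOR_BYTES / 2**20


limit = 262144   # size of dm-6 as reported by the kernel
want = 7753528   # end of the range that the failed read asked for

print(f"limit = {sectors_to_mib(limit):.1f} MiB")
print(f"want  = {sectors_to_mib(want):.1f} MiB")
print(f"want is {want / limit:.1f} times the device size")
```

Under that assumption dm-6 is only 128 MiB, while the reads were aimed roughly 3.7 GiB into it. The report does not say which logical volume dm-6 maps to.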
lvs
  LV             VG     Attr      LSize   Pool      Origin Data%  Move Log Copy%  Convert
  thin_archive   mainvg Vwi-aotz- 500.00g thin_pool        94.65
  thin_rsnapshot mainvg Vwi-aotz-   1.50t thin_pool        94.01
  thin_pool      mainvg twi-a-tz-   3.00t                  71.65
lvresize -L 2T /dev/mapper/mainvg-thin_rsnapshot
/dev/mainvg/thin_rsnapshot: read failed after 0 of 4096 at 1649267376128: Input/output error
/dev/mainvg/thin_rsnapshot: read failed after 0 of 4096 at 1649267433472: Input/output error
/dev/mainvg/thin_archive: read failed after 0 of 4096 at 536870846464: Input/output error
/dev/mainvg/thin_archive: read failed after 0 of 4096 at 536870903808: Input/output error
Extending logical volume thin_rsnapshot to 2.00 TiB
Logical volume thin_rsnapshot successfully resized