[Bug 1817713] Re: Grub2 does not detect MD raid (level 1) 1.0 superblocks on 4k block devices
Bug Watch Updater
1817713 at bugs.launchpad.net
Mon Sep 2 18:58:01 UTC 2019
Launchpad has imported 18 comments from the remote bug at
https://bugzilla.redhat.com/show_bug.cgi?id=1443144.
If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.
------------------------------------------------------------------------
On 2017-04-18T15:08:07+00:00 kwalker wrote:
Description of problem:
Grub2 does not currently detect MD RAID (level 1) metadata 1.0 superblocks when they are written to 4k block devices. The issue is only visible on disks that have a 4K native sector size.
See below example:
Disk /dev/vdb: 10.7 GB, 10737418240 bytes, 2621440 sectors
Units = sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x6c5f13de
Device Boot Start End Blocks Id System
/dev/vdb1 256 128255 512000 83 Linux
/dev/vdb2 128256 2621439 9972736 83 Linux
Disk /dev/vdc: 10.7 GB, 10737418240 bytes, 2621440 sectors
Units = sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk label type: dos
Disk identifier: 0x00000000
Device Boot Start End Blocks Id System
/dev/vdc1 256 128255 512000 83 Linux
/dev/vdc2 128256 2621439 9972736 83 Linux
Two (degraded) raid devices are created, using 1.0 and 1.2 metadata revisions.
# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 vdc1[1]
511680 blocks super 1.2 [2/1] [_U]
md0 : active raid1 vdb1[1]
511936 blocks super 1.0 [2/1] [_U]
unused devices: <none>
For the 1.2 superblock format, the following is visible:
# grub2-probe --device /dev/md1 --target fs_uuid
f2ebfe82-55ab-45a8-b391-39c77f3c489e
Whereas the 1.0 format, with the superblock at the end of the device,
returns:
# grub2-probe --device /dev/md0 --target fs_uuid
grub2-probe: error: disk `mduuid/4589a761dde10c78a204bcfd705df061' not found.
Version-Release number of selected component (if applicable):
grub2-2.02-0.44.el7.x86_64
How reproducible:
Easily, with a 4k-native storage device for boot
Steps to Reproduce:
1. Install a system
2. Migrate to a MD raid (level 1) metadata 1.0 configuration using the
process outlined in the article below:
How do I create /boot on mdadm RAID 1 after installation in RHEL 7? - Red Hat Customer Portal
https://access.redhat.com/solutions/1360363
3. Issue a "grub2-probe --device /dev/md<device> --target fs_uuid"
against the device which /boot is installed to
Actual results:
# grub2-probe --device /dev/md0 --target fs_uuid
grub2-probe: error: disk `mduuid/4589a761dde10c78a204bcfd705df061' not found.
Expected results:
# grub2-probe --device /dev/md1 --target fs_uuid
f2ebfe82-55ab-45a8-b391-39c77f3c489e
Additional info:
Note: the issue was originally noticed at installation time, since the default superblock format for PPC64 (PReP boot) systems is 1.0 according to the python-blivet library used by the anaconda installer; code snippet below:
def preCommitFixup(self, *args, **kwargs):
""" Determine create parameters for this set """
mountpoints = kwargs.pop("mountpoints")
log_method_call(self, self.name, mountpoints)
if "/boot" in mountpoints:
bootmountpoint = "/boot"
else:
bootmountpoint = "/"
# If we are used to boot from we cannot use 1.1 metadata
if getattr(self.format, "mountpoint", None) == bootmountpoint or \
getattr(self.format, "mountpoint", None) == "/boot/efi" or \
self.format.type == "prepboot":
self.metadataVersion = "1.0"
In short, if we install to a PPC64 system on 4k block disks and MD raid,
the problem observed is present.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/0
------------------------------------------------------------------------
On 2017-04-19T18:00:18+00:00 bugproxy wrote:
------- Comment From ruddk at us.ibm.com 2017-04-19 13:58 EDT-------
(In reply to comment #4)
...
> Additional info:
> Note, the issue was originally noted at installation-time as the default
> superblock format for PPC64 (PReP boot) systems is 1.0 according to the
> python-blivet library, used by the anaconda installer, code snippet below:
>
> def preCommitFixup(self, *args, **kwargs):
> """ Determine create parameters for this set """
> mountpoints = kwargs.pop("mountpoints")
> log_method_call(self, self.name, mountpoints)
>
> if "/boot" in mountpoints:
> bootmountpoint = "/boot"
> else:
> bootmountpoint = "/"
>
> # If we are used to boot from we cannot use 1.1 metadata
> if getattr(self.format, "mountpoint", None) == bootmountpoint or \
> getattr(self.format, "mountpoint", None) == "/boot/efi" or \
> self.format.type == "prepboot":
> self.metadataVersion = "1.0"
This is probably the key observation. It doesn't really make sense to
have this restriction in place for disks with a PReP partition. The
above appears to have already been backed out via the following commit:
commit 8bce84025e0f0af9b2538a2611e5d52257a82881
Author: David Lehman <dlehman at redhat.com>
Date: Wed May 27 16:07:05 2015 -0500
Use the default md metadata version for everything except /boot/efi.
Now that we've moved to grub2 this is no longer necessary for /boot.
As far as I know we have never actually allowed PReP on md, so that's
not needed either. Apparently UEFI firmware/bootloader still needs it.
Related: rhbz#1061711
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/1
------------------------------------------------------------------------
On 2017-04-23T08:19:23+00:00 hannsj_uhl wrote:
.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/2
------------------------------------------------------------------------
On 2017-04-23T08:25:16+00:00 bugproxy wrote:
------- Comment From cjt at us.ibm.com 2017-04-21 12:48 EDT-------
Here's where I have stopped grub2-install in gdb:
(gdb) run -vvv /dev/sda1
...
grub-core/osdep/hostdisk.c:415: opening the device `/dev/sda2' in open_device()
(gdb) bt
#0 grub_util_fd_seek (fd=0x8, off=0x3dcf8000) at grub-core/osdep/unix/hostdisk.c:105
#1 0x000000001013f3ac in grub_util_fd_open_device (disk=0x101e88e0, sector=0x3dcf8, flags=0x101000, max=0x3fffffffe018)
at grub-core/osdep/linux/hostdisk.c:450
#2 0x000000001013c56c in grub_util_biosdisk_read (disk=0x101e88e0, sector=0x404f8, size=0x8, buf=0x101ee130 "\370\016\347\267\377?")
at grub-core/kern/emu/hostdisk.c:289
#3 0x0000000010133ccc in grub_disk_read_small_real (disk=0x101e88e0, sector=0x2027c0, offset=0x6000, size=0x100, buf=0x3fffffffe308)
at grub-core/kern/disk.c:344
#4 0x0000000010133fac in grub_disk_read_small (disk=0x101e88e0, sector=0x2027c0, offset=0x6000, size=0x100, buf=0x3fffffffe308)
at grub-core/kern/disk.c:401
#5 0x00000000101341a8 in grub_disk_read (disk=0x101e88e0, sector=0x2027f0, offset=0x0, size=0x100, buf=0x3fffffffe308)
at grub-core/kern/disk.c:440
#6 0x000000001004371c in grub_mdraid_detect (disk=0x101e88e0, id=0x3fffffffe4c8, start_sector=0x3fffffffe4c0)
at grub-core/disk/mdraid1x_linux.c:149
#7 0x0000000010155eb0 in scan_disk_partition_iter (disk=0x101e88e0, p=0x3fffffffe548, data=0x101e8860) at grub-core/disk/diskfilter.c:161
#8 0x0000000010147000 in part_iterate (dsk=0x101e88e0, partition=0x3fffffffe660, data=0x3fffffffe900) at grub-core/kern/partition.c:196
#9 0x000000001015a2b8 in grub_partition_msdos_iterate (disk=0x101e88e0, hook=0x10146f24 <part_iterate>, hook_data=0x3fffffffe900)
at grub-core/partmap/msdos.c:196
#10 0x000000001014718c in grub_partition_iterate (disk=0x101e88e0, hook=0x10155ccc <scan_disk_partition_iter>, hook_data=0x101e8860)
at grub-core/kern/partition.c:233
#11 0x00000000101560c0 in scan_disk (name=0x101e8860 "hd0", accept_diskfilter=0x1) at grub-core/disk/diskfilter.c:204
#12 0x00000000101591ec in grub_diskfilter_get_pv_from_disk (disk=0x101e8810, vg_out=0x3fffffffea30) at grub-core/disk/diskfilter.c:1173
#13 0x0000000010154f9c in grub_util_get_ldm (disk=0x101e8810, start=0x2800) at grub-core/disk/ldm.c:876
#14 0x0000000010135bb0 in grub_util_biosdisk_get_grub_dev (os_dev=0x101e5fd0 "/dev/sda2") at util/getroot.c:437
#15 0x000000001013531c in grub_util_pull_device (os_dev=0x101e5fd0 "/dev/sda2") at util/getroot.c:111
#16 0x000000001013a6a0 in grub_util_pull_device_os (os_dev=0x101e7520 "/dev/md0", ab=GRUB_DEV_ABSTRACTION_RAID)
at grub-core/osdep/linux/getroot.c:1064
#17 0x0000000010135300 in grub_util_pull_device (os_dev=0x101e7520 "/dev/md0") at util/getroot.c:108
#18 0x0000000010006688 in main (argc=0x3, argv=0x3ffffffff528) at util/grub-install.c:1233
(gdb) frame 6
#6 0x000000001004371c in grub_mdraid_detect (disk=0x101e88e0, id=0x3fffffffe4c8, start_sector=0x3fffffffe4c0)
at grub-core/disk/mdraid1x_linux.c:149
149 if (grub_disk_read (disk, sector, 0, sizeof (struct grub_raid_super_1x),
(gdb) print minor_version
$34 = 0x0
(gdb) print *((*disk)->partition)
$35 = {number = 0x1, start = 0x2800, len = 0x200000, offset = 0x0, index = 0x1, parent = 0x0, partmap = 0x101ba250 <grub_msdos_partition_map>,
msdostype = 0xfd}
(gdb) print sector
$36 = 0x1ffff0
(gdb) frame 0
#0 grub_util_fd_seek (fd=0x8, off=0x3dcf8000) at grub-core/osdep/unix/hostdisk.c:105
105 if (lseek (fd, offset, SEEK_SET) != offset)
(gdb) print offset
$37 = 0x3dcf8000
There seems to be at least one problem with the sector/offset computations.
According to mdadm -E /dev/sda2:
Super Offset : 2097136 sectors
(that's 512b sectors because my partition is only 1GB)
Therefore we find our md superblock at 0x3FFFE.
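For reference, that location can be reproduced with a short calculation. The sketch below paraphrases the minor_version placement loop in grub-core/disk/mdraid1x_linux.c (the exact upstream code may differ between versions) and plugs in the partition size from the parted output below; it is only an illustration of where the metadata 1.0/1.1/1.2 superblocks are expected, expressed in grub's 512-byte sectors.
/* Minimal sketch of the md 1.x superblock placement rules as grub applies
   them, in grub's 512-byte sectors.  Paraphrased, not the actual grub source. */
#include <stdio.h>
#include <stdint.h>
static uint64_t
md1x_super_sector (int minor_version, uint64_t size_512)
{
  switch (minor_version)
    {
    case 0: return (size_512 - 8 * 2) & ~(uint64_t) (4 * 2 - 1); /* 1.0: near the end */
    case 1: return 0;                                            /* 1.1: at the start */
    case 2: return 4 * 2;                                        /* 1.2: 4 KiB in */
    }
  return 0;
}
int
main (void)
{
  /* /dev/sda2 is 262144 native 4096-byte sectors (see the parted output below),
     i.e. 0x200000 sectors of 512 bytes. */
  uint64_t part_size_512 = 262144ULL * (4096 / 512);
  uint64_t s = md1x_super_sector (0, part_size_512);
  printf ("metadata 1.0 superblock: 512b sector 0x%llx = byte 0x%llx = 4k sector 0x%llx\n",
          (unsigned long long) s,
          (unsigned long long) (s * 512),
          (unsigned long long) (s * 512 / 4096));
  /* Prints 0x1ffff0 / 0x3fffe000 / 0x3fffe, matching mdadm's
     "Super Offset : 2097136 sectors" and the sector printed in frame 6 above. */
  return 0;
}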
I believe one problem is in grub_util_fd_open_device(). It is passed
sector=0x404f8, which is in 4096-byte sectors. grub_util_fd_open_device()
then subtracts part_start, which comes from
disk->partition->start=0x2800 via grub_partition_get_start(). But that
0x2800 is in 512-byte sectors.
Here is the partition table:
Model: AIX VDASD (scsi)
Disk /dev/sda: 5242880s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 256s 1279s 1024s primary boot, prep
2 1280s 263423s 262144s primary raid
3 263424s 4451327s 4187904s primary raid
Partition 2 starts at 1280s (based on 4096b sectors) which is 0x500s.
(gdb) frame 6
#6 0x000000001004371c in grub_mdraid_detect (disk=0x101e88e0, id=0x3fffffffe4c8, start_sector=0x3fffffffe4c0)
at grub-core/disk/mdraid1x_linux.c:149
149 if (grub_disk_read (disk, sector, 0, sizeof (struct grub_raid_super_1x),
(gdb) print ((*disk)->partition)->start
$42 = 0x2800
So grub_util_fd_open_device() mixed the native 4096-byte sectors with the
grub sector size of 512 bytes: sector=0x404f8 was passed in, but the mixing
of sector sizes caused the offset sent to grub_util_fd_seek() to be
0x3dcf8000 instead of 0x3FFF8000.
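To make the unit mix-up concrete, here is a minimal sketch (illustration only, not the grub code and not a proposed patch) using the values from the gdb session above: the subtraction mixes a native 4096-byte sector index with a 512-byte partition start, and converting the partition start first yields the expected offset.
/* Illustration only: the subtraction mixes sector units.  Values are the
   ones from the backtrace above. */
#include <stdio.h>
#include <stdint.h>
int
main (void)
{
  uint64_t sector_4k  = 0x404f8;  /* native 4096-byte sectors (frames 1-2) */
  uint64_t part_start = 0x2800;   /* disk->partition->start, 512-byte sectors */
  /* What happens today: mixed units. */
  uint64_t bad_offset  = (sector_4k - part_start) * 4096;                /* 0x3dcf8000 */
  /* What should happen: convert the partition start to 4k sectors first. */
  uint64_t good_offset = (sector_4k - part_start / (4096 / 512)) * 4096; /* 0x3fff8000 */
  printf ("bad  lseek offset: 0x%llx\n", (unsigned long long) bad_offset);
  printf ("good lseek offset: 0x%llx\n", (unsigned long long) good_offset);
  return 0;
}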
------- Comment From ruddk at us.ibm.com 2017-04-21 12:57 EDT-------
Just to clarify for others in case the "All" arch specification in this bug is missed: this isn't really PPC64*-specific. The problem can easily be reproduced in an x86_64 KVM guest environment by simply presenting one of the virtual disks as a 4096-byte block device. For example:
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none'/>
<source dev='/dev/vg_root/spare'/>
<blockio logical_block_size='4096' physical_block_size='4096'/>
<target dev='vdb' bus='virtio'/>
<address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
</disk>
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/3
------------------------------------------------------------------------
On 2017-06-08T18:00:22+00:00 bugproxy wrote:
------- Comment From willianr at br.ibm.com 2017-06-08 13:55 EDT-------
I just ran a fresh installation enabling RAID on a 4k block disk and I could not reproduce the problem stated in the additional notes ("the issue was originally noted at installation-time"). Here is the information right after the first boot:
[root at rhel-grub ~]# uname -a
Linux rhel-grub 3.10.0-514.el7.ppc64le #1 SMP Wed Oct 19 11:27:06 EDT 2016 ppc64le ppc64le ppc64le GNU/Linux
[root at rhel-grub ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)
[root at rhel-grub ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md126 9.0G 1.2G 7.9G 13% /
devtmpfs 8.0G 0 8.0G 0% /dev
tmpfs 8.0G 0 8.0G 0% /dev/shm
tmpfs 8.0G 14M 8.0G 1% /run
tmpfs 8.0G 0 8.0G 0% /sys/fs/cgroup
/dev/md127 1018M 145M 874M 15% /boot
tmpfs 1.6G 0 1.6G 0% /run/user/0
[root at rhel-grub ~]# cat /proc/mdstat
md126 : active raid1 sdb1[1] sda2[0]
9423872 blocks super 1.2 [2/2] [UU]
bitmap: 1/1 pages [64KB], 65536KB chunk
md127 : active raid1 sdb2[1] sda3[0]
1048512 blocks super 1.0 [2/2] [UU]
bitmap: 0/1 pages [0KB], 65536KB chunk
[root at rhel-grub ~]# grub2-probe --device /dev/md126 --target fs_uuid
5de99add-1cf2-41f0-ba54-c08067e404d4
[root at rhel-grub ~]# grub2-probe --device /dev/md127 --target fs_uuid
d48f8f83-717b-405e-9e7b-02ba37de959a
[root at rhel-grub ~]# parted /dev/sda u s p
Model: QEMU QEMU HARDDISK (scsi)
Disk /dev/sda: 2621440s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 256s 1279s 1024s primary boot, prep
2 1280s 2359295s 2358016s primary raid
3 2359296s 2621439s 262144s primary raid
[root at rhel-grub ~]# parted /dev/sdb u s p
Model: QEMU QEMU HARDDISK (scsi)
Disk /dev/sdb: 2621440s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 256s 2358271s 2358016s primary raid
2 2358272s 2620415s 262144s primary raid
I will do another installation without raid and then migrate it to raid
to check if the problem happens.
So, for now, can someone confirm this problem happens during install
time?
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/4
------------------------------------------------------------------------
On 2017-06-12T21:40:19+00:00 bugproxy wrote:
------- Comment From willianr at br.ibm.com 2017-06-12 17:33 EDT-------
As expected, migrating /boot to RAID 1 using metadata 1.0 fails when it is the first partition after the PReP partition:
[root at rhel-grub2-1 ~]# mdadm -D /dev/md0
/dev/md0:
Version : 1.0
Creation Time : Mon Jun 12 17:22:07 2017
Raid Level : raid1
Array Size : 1048512 (1023.94 MiB 1073.68 MB)
Used Dev Size : 1048512 (1023.94 MiB 1073.68 MB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Mon Jun 12 17:26:45 2017
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Name : rhel-grub2-1:0 (local to host rhel-grub2-1)
UUID : 537bfbf4:0b89fb58:f50f14c3:ba5f2bf3
Events : 33
Number Major Minor RaidDevice State
2 253 2 0 active sync /dev/vda2
1 253 17 1 active sync /dev/vdb1
[root at rhel-grub2-1 ~]# parted /dev/vda u s p
Model: Virtio Block Device (virtblk)
Disk /dev/vda: 2621440s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 256s 1279s 1024s primary boot, prep
2 1280s 263423s 262144s primary raid
3 263424s 2360575s 2097152s primary
[root at rhel-grub2-1 ~]# parted /dev/vdb u s p
Model: Virtio Block Device (virtblk)
Disk /dev/vdb: 2621440s
Sector size (logical/physical): 4096B/4096B
Partition Table: msdos
Disk Flags:
Number Start End Size Type File system Flags
1 256s 262399s 262144s primary raid
[root at rhel-grub2-1 ~]# grub2-probe --device /dev/md0 --target fs_uuid
grub2-probe: error: disk `mduuid/537bfbf40b89fb58f50f14c3ba5f2bf3' not found.
[root at rhel-grub2-1 ~]# grub2-install /dev/vdb
Installing for powerpc-ieee1275 platform.
grub2-install: error: disk `mduuid/537bfbf40b89fb58f50f14c3ba5f2bf3' not found.
Now, the interesting thing is that I was not able to migrate /boot to RAID 1
using metadata 1.0 when /boot is not the first partition after the PReP
partition (just like the installer did in comment #15). When I tried the
same layout as the installer, grub was not able to find the /boot partition
after the root partition.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/5
------------------------------------------------------------------------
On 2017-09-26T12:10:21+00:00 bugproxy wrote:
------- Comment From victora at br.ibm.com 2017-09-26 08:09 EDT-------
Hi,
I still haven't had time to work on this. I will try to work on this bz this week.
I will let you know when I have updates.
Thanks
Victor
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/6
------------------------------------------------------------------------
On 2017-10-27T11:32:30+00:00 ccoates wrote:
Is there any progress on this?
We've just been hit by the same issue on an E850 during install, getting
to the point where we can't install a system using software RAID 1.
As it stands, we're having to install without RAID to get a system up
and running...
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/7
------------------------------------------------------------------------
On 2017-10-27T15:01:01+00:00 bugproxy wrote:
------- Comment From ruddk at us.ibm.com 2017-10-27 10:51 EDT-------
(In reply to comment #23)
> Is there any progress on this?
>
> We've just been hit by the same issue on an E850 during install, getting to
> the point where we can't install a system using software RAID 1.
>
> As it stands, we're having to install without RAID to get a system up and
> running...
The install-side issue should already have been addressed for RHEL 7.4
via RH Bug 1184945. The easy workaround is to not use version 1.0
metadata for the RAID config.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/8
------------------------------------------------------------------------
On 2017-10-27T16:14:52+00:00 ccoates wrote:
(In reply to IBM Bug Proxy from comment #9)
> ------- Comment From ruddk at us.ibm.com 2017-10-27 10:51 EDT-------
> (In reply to comment #23)
> > Is there any progress on this?
> >
> > We've just been hit by the same issue on an E850 during install, getting to
> > the point where we can't install a system using software RAID 1.
> >
> > As it stands, we're having to install without RAID to get a system up and
> > running...
>
> The install-side issue should already have been addressed for RHEL 7.4 via
> RH Bug 1184945. The easy workaround is to not use version 1.0 metadata for
> the RAID config.
Unfortunately you can't specify a metadata type via kickstart for md
devices - so that's still a show-stopper for using RHEL 7.3 on an E850.
As a workaround to allow RAID during install, I've had to specify /boot
as a btrfs partition, which worked perfectly fine.
Still - this isn't exactly an ideal solution for anyone using an E850
with RHEL 7.3... The customer I'm building out for isn't prepared to use
RHEL 7.4 yet.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/9
------------------------------------------------------------------------
On 2017-10-27T17:30:22+00:00 bugproxy wrote:
------- Comment From desnesn at br.ibm.com 2017-10-27 13:23 EDT-------
Hello Kevin,
The engineer who was in charge of this bug is leaving IBM.
I am working on this bug as we speak (I started this week), and I think I
am on to something. I will post my results by the end of the day.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/10
------------------------------------------------------------------------
On 2017-10-27T21:00:27+00:00 bugproxy wrote:
------- Comment From desnesn at br.ibm.com 2017-10-27 16:59 EDT-------
For now, I tried to look inside grub-probe to see if I could find any clues.
Through gdb, I noticed that when using a 4k block size with --metadata=1.0
on the MD RAID disks, the `dev` variable at util/grub-probe.c +376 comes
back unallocated (NULL), so grub_util_error() is thrown.
============================
util/grub-probe.c +376
============================
376 dev = grub_device_open (drives_names[0]);
377 if (! dev)
378 grub_util_error ("%s", grub_errmsg);
============================
Now, comparing grub_device_open() code on grub-core/kern/device.c, and
using both --metadata=1.0 and --metadata=0.90:
============================
grub-core/kern/device.c +47
============================
47 dev = grub_malloc (sizeof (*dev));
48 if (! dev)
49 goto fail;
50
51 dev->net = NULL;
52 /* Try to open a disk. */
53 dev->disk = grub_disk_open (name);
54 if (dev->disk)
55 return dev;
56 if (grub_net_open && grub_errno == GRUB_ERR_UNKNOWN_DEVICE)
57 {
58 grub_errno = GRUB_ERR_NONE;
59 dev->net = grub_net_open (name);
60 }
61
62 if (dev->net)
63 return dev;
64
65 fail:
66 grub_free (dev);
============================
CURIOSITY: The addresses that came out of the grub_malloc() on line 47
seem a bit odd with 1.0.
=====
RAID using --metadata=1.0: FAILS on grub2-probe
=====
Breakpoint 4, grub_device_open (name=0x10185290 "mduuid/ceebb143b7f740ba41794f2e88b1e1de") at grub-core/kern/device.c:48
48 if (! dev)
(gdb) print *dev
$3 = {
disk = 0x0,
net = 0x3fffb7ed07b8 <main_arena+104>
}
(gdb) print *dev->net
$4 = {
server = 0x3fffb7ed07a8 <main_arena+88> "\240(\034\020",
name = 0x3fffb7ed07a8 <main_arena+88> "\240(\034\020",
protocol = 0x3fffb7ed07b8 <main_arena+104>,
packs = {
first = 0x3fffb7ed07b8 <main_arena+104>,
last = 0x3fffb7ed07c8 <main_arena+120>,
count = 70367534974920
},
offset = 70367534974936,
fs = 0x3fffb7ed07d8 <main_arena+136>,
eof = -1209202712,
stall = 16383
}
=====
=====
RAID using --metadata=0.90: SUCCESS on grub2-probe
=====
Breakpoint 2, grub_device_open (name=0x10185830 "mduuid/1940b3311771bbb17b777c24c48ad94b") at grub-core/kern/device.c:48
48 if (! dev)
(gdb) print *dev
$1 = {
disk = 0x0,
net = 0x10185120
}
(gdb) print *dev->net
$3 = {
server = 0x61 <Address 0x61 out of bounds>,
name = 0x21 <Address 0x21 out of bounds>,
protocol = 0x3fffb7ed07b8 <main_arena+104>,
packs = {
first = 0x3fffb7ed07b8 <main_arena+104>,
last = 0x20,
count = 32
},
offset = 7742648064551382888,
fs = 0x64762f7665642f2f,
eof = 98,
stall = 0
}
=====
Anyhow, this was only an allocation, and on line 51 of grub-
core/kern/device.c dev->net is set to NULL.
Using --metadata=1.0, `dev` is allocated and execution moves into
grub_disk_open() on line 53. That function returns NULL here, so the ifs on
lines 54, 56 and 62 are skipped and we eventually fall through to the fail
label on line 65.
=====
RAID using --metadata=1.0: FAILS on grub2-probe
=====
Breakpoint 2, grub_device_open (name=0x10185290 "mduuid/7266eba408736585cf9c00e3a2342fdc") at grub-core/kern/device.c:54
54 if (dev->disk)
(gdb) print *dev
$3 = {
disk = 0x0,
net = 0x0
}
=====
Whereas using --metadata=0.90, the struct is not zeroed out, and
grub_device_open() returns `dev` on line 54.
=====
RAID using --metadata=0.90: SUCCESS on grub2-probe
=====
Breakpoint 1, grub_device_open (name=0x10185830 "mduuid/a29da500c684c0d47b777c24c48ad94b") at grub-core/kern/device.c:54
54 if (dev->disk)
(gdb) print *dev
$1 = {
disk = 0x101834a0,
net = 0x0
}
=====
Riddle me this: why?
More grub to come ... will look into grub_disk_open() and on mdadm next.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/11
------------------------------------------------------------------------
On 2017-10-31T13:50:35+00:00 bugproxy wrote:
------- Comment From desnesn at br.ibm.com 2017-10-31 09:49 EDT-------
Going deeper in the rabbit hole from IBM Comment 27 / RH Comment 12:
============================
grub-core/kern/disk.c +187
============================
187 grub_disk_t
188 grub_disk_open (const char *name)
189 {
...
224 for (dev = grub_disk_dev_list; dev; dev = dev->next)
225 {
226 if ((dev->open) (raw, disk) == GRUB_ERR_NONE)
227 break;
228 else if (grub_errno == GRUB_ERR_UNKNOWN_DEVICE)
229 grub_errno = GRUB_ERR_NONE;
230 else
231 goto fail;
232 }
233
234 if (! dev)
235 {
236 grub_error (GRUB_ERR_UNKNOWN_DEVICE, N_("disk `%s' not found"),
237 name);
238 goto fail;
239 }
============================
Using --metadata=1.0, `dev` is NULL after the for loop on line 224, whereas
with 0.90 it points to a device driver. Moreover, the grub_error() message
on line 236 is the one being printed by grub2-probe.
=====
FAILURE grub2-probe - RAID using --metadata=1.0
=====
Breakpoint 1, grub_disk_open (name=0x10185290 "mduuid/0ef5c3920edae097657894d84aef753d") at grub-core/kern/disk.c:234
234 if (! dev)
(gdb) print dev
$1 = (grub_disk_dev_t) 0x0
(gdb) print *dev
Cannot access memory at address 0x0
(gdb) s
236 grub_error (GRUB_ERR_UNKNOWN_DEVICE, N_("disk `%s' not found"),
=====
=====
SUCCESS grub2-probe - RAID using --metadata=0.90
=====
Breakpoint 1, grub_disk_open (name=0x10185830 "mduuid/ebae38d5105eed037b777c24c48ad94b") at grub-core/kern/disk.c:234
234 if (! dev)
(gdb) print dev
$1 = (grub_disk_dev_t) 0x10165e80 <grub_diskfilter_dev>
(gdb) print *dev
$2 = {
name = 0x10146d50 "diskfilter",
id = GRUB_DISK_DEVICE_DISKFILTER_ID,
iterate = 0x101107cc <grub_diskfilter_iterate>,
open = 0x10110fd8 <grub_diskfilter_open>,
close = 0x10111120 <grub_diskfilter_close>,
read = 0x1011220c <grub_diskfilter_read>,
write = 0x1011227c <grub_diskfilter_write>,
memberlist = 0x10110950 <grub_diskfilter_memberlist>,
raidname = 0x10110df4 <grub_diskfilter_getname>,
next = 0x10165fb8 <grub_procfs_dev>
}
(gdb) s
240 if (disk->log_sector_size > GRUB_DISK_CACHE_BITS + GRUB_DISK_SECTOR_BITS
=====
Since `dev` iterates over grub's registered disk device drivers and each
driver is a struct of function pointers, every dev supplies its own
functions. In our case we are dealing with grub_disk_dev_t, and through gdb
we can see that the dev->open() call on line 226 is actually
grub_diskfilter_open() in:
============================
grub-core/disk/diskfilter.c +419
============================
419 static grub_err_t
420 grub_diskfilter_open (const char *name, grub_disk_t disk)
421 {
422 struct grub_diskfilter_lv *lv;
423
424 if (!is_valid_diskfilter_name (name))
425 return grub_error (GRUB_ERR_UNKNOWN_DEVICE, "unknown DISKFILTER device %s",
426 name);
427
428 lv = find_lv (name);
429
430 if (! lv)
431 {
432 scan_devices (name);
433 if (grub_errno)
434 {
435 grub_print_error ();
436 grub_errno = GRUB_ERR_NONE;
437 }
438 lv = find_lv (name);
439 }
440
441 if (!lv)
442 return grub_error (GRUB_ERR_UNKNOWN_DEVICE, "unknown DISKFILTER device %s",
443 name);
444
445 disk->id = lv->number;
446 disk->data = lv;
447
448 disk->total_sectors = lv->size;
449 disk->max_agglomerate = GRUB_DISK_MAX_MAX_AGGLOMERATE;
450 return 0;
============================
The is_valid_diskfilter_name() check on line 424 passes for both metadata
versions, 0.90 and 1.0.
However, if we break at line 441, there is a strange thing to note: using
--metadata=1.0 all of my disk devices pass through the breakpoint, whereas
using 0.90 only the raid device passes through the breakpoint on line 441.
=====
FAILURE grub2-probe - RAID using --metadata=1.0
=====
Breakpoint 1, grub_diskfilter_open (name=0x10185b90 "lvm/rhel-root", disk=0x101827d0) at grub-core/disk/diskfilter.c:441
441 if (!lv)
(gdb) c
Continuing.
Breakpoint 1, grub_diskfilter_open (name=0x101859f0 "lvm/rhel-home", disk=0x101827d0) at grub-core/disk/diskfilter.c:441
441 if (!lv)
(gdb) c
Continuing.
Breakpoint 1, grub_diskfilter_open (name=0x101857e0 "lvm/rhel-swap", disk=0x101827d0) at grub-core/disk/diskfilter.c:441
441 if (!lv)
(gdb) c
Continuing.
...
Breakpoint 2, grub_diskfilter_open (name=0x10185290 "mduuid/0ef5c3920edae097657894d84aef753d", disk=0x10182780) at grub-core/disk/diskfilter.c:441
441 if (!lv)
(gdb) print lv
$2 = (struct grub_diskfilter_lv *) 0x0
(gdb) print *lv
Cannot access memory at address 0x0
(gdb) s
442 return grub_error (GRUB_ERR_UNKNOWN_DEVICE, "unknown DISKFILTER device %s",
=====
=====
SUCCESS grub2-probe - RAID using --metadata=0.90
=====
Breakpoint 1, grub_diskfilter_open (name=0x10185830 "mduuid/c5e0adca3d6a76ef7b777c24c48ad94b", disk=0x101834a0) at grub-core/disk/diskfilter.c:441
441 if (!lv)
(gdb) print lv
$1 = (struct grub_diskfilter_lv *) 0x10183690
(gdb) print *lv
$2 = {
fullname = 0x10183580 "md/md1",
idname = 0x10183700 "mduuid/c5e0adca3d6a76ef7b777c24c48ad94b",
name = 0x10183580 "md/md1",
number = 0,
segment_count = 1,
segment_alloc = 0,
size = 20969344,
became_readable_at = 1,
scanned = 0,
visible = 1,
segments = 0x10183730,
vg = 0x10183530,
next = 0x0,
internal_id = 0x0
}
(gdb) s
445 disk->id = lv->number;
=====
So `lv` at line 441 is coming out NULL. Let's back up a bit and break at
line 430:
=====
FAILURE grub2-probe - RAID using --metadata=1.0
=====
Breakpoint 2, grub_diskfilter_open (name=0x10185290 "mduuid/2872dd311d2585e4690defc1d9ba07a7", disk=0x10182780) at grub-core/disk/diskfilter.c:430
430 if (! lv)
(gdb) print lv
$2 = (struct grub_diskfilter_lv *) 0x0
(gdb) print *lv
Cannot access memory at address 0x0
(gdb) c
Continuing.
...
Breakpoint 2, grub_diskfilter_open (name=0x10185b90 "lvm/rhel-root", disk=0x101827d0) at grub-core/disk/diskfilter.c:430
430 if (! lv)
(gdb) c
Continuing.
Breakpoint 2, grub_diskfilter_open (name=0x101859f0 "lvm/rhel-home", disk=0x101827d0) at grub-core/disk/diskfilter.c:430
430 if (! lv)
(gdb) c
Continuing.
Breakpoint 2, grub_diskfilter_open (name=0x101857e0 "lvm/rhel-swap", disk=0x101827d0) at grub-core/disk/diskfilter.c:430
430 if (! lv)
(gdb) c
Continuing.
/usr/sbin/grub2-probe: error: disk `mduuid/2872dd311d2585e4690defc1d9ba07a7' not found.
[Inferior 1 (process 19832) exited with code 01]
=====
=====
SUCCESS grub2-probe - RAID using --metadata=0.90
=====
Breakpoint 1, grub_diskfilter_open (name=0x10185830 "mduuid/a2f06ca0ad6cedbc7b777c24c48ad94b", disk=0x101834a0) at grub-core/disk/diskfilter.c:430
430 if (! lv)
(gdb) print lv
$1 = (struct grub_diskfilter_lv *) 0x10183690
(gdb) print *lv
$2 = {
fullname = 0x10183580 "md/md1",
idname = 0x10183700 "mduuid/a2f06ca0ad6cedbc7b777c24c48ad94b",
name = 0x10183580 "md/md1",
number = 0,
segment_count = 1,
segment_alloc = 0,
size = 20969344,
became_readable_at = 1,
scanned = 0,
visible = 1,
segments = 0x10183730,
vg = 0x10183530,
next = 0x0,
internal_id = 0x0
}
=====
Thus, apparently the culprit is now hiding in find_lv().
============================
grub-core/disk/diskfilter.c +401
============================
401 static struct grub_diskfilter_lv *
402 find_lv (const char *name)
403 {
404 struct grub_diskfilter_vg *vg;
405 struct grub_diskfilter_lv *lv = NULL;
406
407 for (vg = array_list; vg; vg = vg->next)
408 {
409 if (vg->lvs)
410 for (lv = vg->lvs; lv; lv = lv->next)
411 if (((lv->fullname && grub_strcmp (lv->fullname, name) == 0)
412 || (lv->idname && grub_strcmp (lv->idname, name) == 0))
413 && is_lv_readable (lv, 0))
414 return lv;
415 }
416 return NULL;
417 }
============================
=====
FAILURE grub2-probe - RAID using --metadata=1.0
=====
Breakpoint 1, find_lv (name=0x10185290 "mduuid/e5cf979ce818f58cc57b618bd78b4b86") at grub-core/disk/diskfilter.c:407
407 for (vg = array_list; vg; vg = vg->next)
(gdb) print array_list
$1 = (struct grub_diskfilter_vg *) 0x0
(gdb) print *array_list
Cannot access memory at address 0x0
(gdb) s
416 return NULL;
=====
=====
SUCCESS grub2-probe - RAID using --metadata=0.90
=====
Breakpoint 5, find_lv (name=0x10185830 "mduuid/a2f06ca0ad6cedbc7b777c24c48ad94b") at grub-core/disk/diskfilter.c:407
407 for (vg = array_list; vg; vg = vg->next)
(gdb) print array_list
$5 = (struct grub_diskfilter_vg *) 0x10183530
(gdb) print *array_list
$6 = {
uuid = 0x10183300 "\242\360l\240\255l\355\274{w|$?\331Ke/en_US.!",
uuid_len = 16,
name = 0x10183580 "md/md1",
extent_size = 1,
pvs = 0x10186090,
lvs = 0x10183690,
next = 0x0,
driver = 0x10160228 <grub_mdraid_dev>
}
(gdb) s
409 if (vg->lvs)
=====
Therefore, at this point we can infer that RAID 1 with --metadata=1.0 on
4k blocksize disks leaves array_list empty, which causes everything else.
Riddle me this: why?
More gdb to come ... apparently array_list is populated by
grub_diskfilter_vg_register() at grub-core/disk/diskfilter.c +838.
Will look into that next, and eventually into mdadm.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/12
------------------------------------------------------------------------
On 2017-12-11T20:31:04+00:00 bugproxy wrote:
------- Comment From desnesn at br.ibm.com 2017-12-11 15:28 EDT-------
Finally had quality time for this bug again. Carrying on:
From breaking on grub_diskfilter_vg_register(), we can observe that the
disk is registered differently with 1.0 than with 0.90; that is because the
diskfilter being registered is the "rhel" LVM volume group instead of my
raid. From a backtrace, I also noticed that even the call stack is
different when grub_diskfilter_vg_register() is reached:
=====
--- bad 2017-12-07 13:44:39.654222238 -0200
+++ good 2017-12-07 13:43:52.563919187 -0200
@@ -1,36 +1,39 @@
-Breakpoint 1, grub_diskfilter_vg_register (vg=0x10185ea0) at grub-core/disk/diskfilter.c:849
+Breakpoint 1, grub_diskfilter_vg_register (vg=0x10183530) at grub-core/disk/diskfilter.c:849
849 for (lv = vg->lvs; lv; lv = lv->next)
(gdb) print *vg
$4 = {
- uuid = 0x10185ef0 "xZL9PN-dXgE-Vflt-rtI5-Y203-gQ6e-TBS0Mz",
- uuid_len = 38,
- name = 0x10185e80 "rhel",
- extent_size = 8192,
- pvs = 0x10185f20,
- lvs = 0x10185b00,
+ uuid = 0x10183300 "?\300\263\362\242\326]\232\334\316r\364\264\370\267e/en_US.!",
+ uuid_len = 16,
+ name = 0x10183580 "md/1",
+ extent_size = 1,
+ pvs = 0x101850c0,
+ lvs = 0x10183690,
next = 0x0,
driver = 0x0
}
(gdb) bt
-#0 grub_diskfilter_vg_register (vg=0x10185ea0) at grub-core/disk/diskfilter.c:849
-#1 0x0000000010009810 in grub_lvm_detect (disk=0x101827d0, id=0x3ffffffde4e8, start_sector=0x3ffffffde4e0) at grub-core/disk/lvm.c:744
-#2 0x00000000101102b0 in scan_disk_partition_iter (disk=0x101827d0, p=0x3ffffffde568, data=0x101812d0) at grub-core/disk/diskfilter.c:161
-#3 0x0000000010101400 in part_iterate (dsk=0x101827d0, partition=0x3ffffffde680, data=0x3ffffffde920) at grub-core/kern/partition.c:196
-#4 0x00000000101146b8 in grub_partition_msdos_iterate (disk=0x101827d0, hook=0x10101324 <part_iterate>, hook_data=0x3ffffffde920)
+#0 grub_diskfilter_vg_register (vg=0x10183530) at grub-core/disk/diskfilter.c:849
==> +#1 0x0000000010112dd0 in grub_diskfilter_make_raid (uuidlen=16,
+ uuid=0x10183300 "?\300\263\362\242\326]\232\334\316r\364\264\370\267e/en_US.!", nmemb=2, name=0x3ffffffde3e8 "rhel-7.3:1",
+ disk_size=20969216, stripe_size=0, layout=0, level=1) at grub-core/disk/diskfilter.c:1030
+#2 0x000000001000a414 in grub_mdraid_detect (disk=0x101834a0, id=0x3ffffffde588, start_sector=0x3ffffffde580)
+ at grub-core/disk/mdraid1x_linux.c:202
+#3 0x00000000101102b0 in scan_disk_partition_iter (disk=0x101834a0, p=0x3ffffffde608, data=0x10182800) at grub-core/disk/diskfilter.c:161
+#4 0x0000000010101400 in part_iterate (dsk=0x101834a0, partition=0x3ffffffde720, data=0x3ffffffde9c0) at grub-core/kern/partition.c:196
+#5 0x00000000101146b8 in grub_partition_msdos_iterate (disk=0x101834a0, hook=0x10101324 <part_iterate>, hook_data=0x3ffffffde9c0)
at grub-core/partmap/msdos.c:196
-#5 0x000000001010158c in grub_partition_iterate (disk=0x101827d0, hook=0x101100cc <scan_disk_partition_iter>, hook_data=0x101812d0)
+#6 0x000000001010158c in grub_partition_iterate (disk=0x101834a0, hook=0x101100cc <scan_disk_partition_iter>, hook_data=0x10182800)
at grub-core/kern/partition.c:233
-#6 0x00000000101104c0 in scan_disk (name=0x101812d0 "hd0", accept_diskfilter=0) at grub-core/disk/diskfilter.c:204
-#7 0x000000001011054c in scan_disk_hook (name=0x101812d0 "hd0", data=0x0) at grub-core/disk/diskfilter.c:213
-#8 0x00000000100f62e8 in grub_util_biosdisk_iterate (hook=0x1011051c <scan_disk_hook>, hook_data=0x0, pull=GRUB_DISK_PULL_NONE)
- at grub-core/kern/emu/hostdisk.c:119
-#9 0x0000000010110610 in scan_devices (arname=0x10185290 "mduuid/76da2b4aea5de03cfc91176e98fc2140") at grub-core/disk/diskfilter.c:231
-#10 0x0000000010111050 in grub_diskfilter_open (name=0x10185290 "mduuid/76da2b4aea5de03cfc91176e98fc2140", disk=0x10182780)
- at grub-core/disk/diskfilter.c:432
-#11 0x00000000100ee00c in grub_disk_open (name=0x10185290 "mduuid/76da2b4aea5de03cfc91176e98fc2140") at grub-core/kern/disk.c:226
-#12 0x00000000100ed0a0 in grub_device_open (name=0x10185290 "mduuid/76da2b4aea5de03cfc91176e98fc2140") at grub-core/kern/device.c:53
-#13 0x0000000010003ee0 in probe (path=0x0, device_names=0x10181050, delim=10 '\n') at util/grub-probe.c:376
-#14 0x00000000100056b4 in main (argc=5, argv=0x3ffffffff3d8) at util/grub-probe.c:882
+#7 0x00000000101104c0 in scan_disk (name=0x10182800 "hostdisk//dev/vda", accept_diskfilter=1) at grub-core/disk/diskfilter.c:204
+#8 0x00000000101135ec in grub_diskfilter_get_pv_from_disk (disk=0x101827b0, vg_out=0x3ffffffdeaf0) at grub-core/disk/diskfilter.c:1173
+#9 0x000000001010f39c in grub_util_get_ldm (disk=0x101827b0, start=2048) at grub-core/disk/ldm.c:876
+#10 0x00000000100f03f4 in grub_util_biosdisk_get_grub_dev (os_dev=0x1018a800 "/dev/vda1") at util/getroot.c:437
+#11 0x00000000100efb60 in grub_util_pull_device (os_dev=0x1018a800 "/dev/vda1") at util/getroot.c:111
+#12 0x00000000100f4ee4 in grub_util_pull_device_os (os_dev=0x10181350 "/dev/md1", ab=GRUB_DEV_ABSTRACTION_RAID)
+ at grub-core/osdep/linux/getroot.c:1064
+#13 0x00000000100efb44 in grub_util_pull_device (os_dev=0x10181350 "/dev/md1") at util/getroot.c:108
+#14 0x0000000010003b14 in probe (path=0x0, device_names=0x10181050, delim=10 '\n') at util/grub-probe.c:304
+#15 0x00000000100056b4 in main (argc=5, argv=0x3ffffffff3d8) at util/grub-probe.c:882
...
-/usr/sbin/grub2-probe: error: disk `mduuid/76da2b4aea5de03cfc91176e98fc2140' not found.
-[Inferior 1 (process 17894) exited with code 01]
+d0aeef94-f2ba-4831-8bbe-523e587adc46
+[Inferior 1 (process 17933) exited normally]
=====
Since I also noticed that grub_diskfilter_make_raid() (grub-
core/disk/mdraid_linux.c:256 for 0.90, and grub-
core/disk/mdraid1x_linux.c:202 for 1.0) was never called when using a 4k
block size, I thought it would be interesting to change my testing and
follow the good stack (512-byte block size) to see where the paths diverge.
Thus, I decided to compare a 4k raid disk (bad) with a 512-byte one (good),
using only metadata 1.0 from this point on, which led me to:
=====
--- bad 2017-11-29 23:27:37.127052665 -0200
+++ good 2017-11-29 23:24:12.039919190 -0200
...
Breakpoint 1, grub_mdraid_detect (disk=0x101834a0, id=0x3ffffffde9c8, start_sector=0x3ffffffde9c0) at grub-core/disk/mdraid1x_linux.c:153
153 if (sb.magic != grub_cpu_to_le32_compile_time (SB_MAGIC)
@@ -50,18 +50,21 @@ $18 = 0x8
(gdb) n
124 for (minor_version = 0; minor_version < 3; ++minor_version)
(gdb) print disk->name
-$19 = 0x101834f0 "hostdisk//dev/vdc"
+$19 = 0x101834f0 "hostdisk//dev/vda"
(gdb) c
Continuing.
...
(gdb) c
Continuing.
...
(gdb) c
Continuing.
...
Breakpoint 1, grub_mdraid_detect (disk=0x101834a0, id=0x3ffffffde588, start_sector=0x3ffffffde580) at grub-core/disk/mdraid1x_linux.c:153
153 if (sb.magic != grub_cpu_to_le32_compile_time (SB_MAGIC)
(gdb) p/x sb.magic
-$20 = 0x0
+$20 = 0xa92b4efc
(gdb) p/x sb.super_offset
-$21 = 0x0
+$21 = 0x13ff7f0
(gdb) p/x sector
$22 = 0x13ff7f0
(gdb) n
-155 continue;
+154 || grub_le_to_cpu64 (sb.super_offset) != sector)
+(gdb) n
+157 if (sb.major_version != grub_cpu_to_le32_compile_time (1))
...
and later, grub_diskfilter_make_raid() was only called for the 512-byte raid.
Moreover, the next time this breakpoint is reached it is for the other disk.
=====
Quick note:
=====
[root at rhel-7 grub-2.02~beta2]# grep -rnI "define SB_MAGIC" .
./grub-core/disk/mdraid1x_linux.c:32:#define SB_MAGIC 0xa92b4efc
./grub-core/disk/mdraid_linux.c:97:#define SB_MAGIC 0xa92b4efc
=====
Thus, the checks on lines 153 and 157 always reject the superblock since sb
is zeroed out, so grub_diskfilter_make_raid() is never called.
The sb variable should have been filled in by the read on the line before:
============================
grub-core/disk/mdraid1x_linux.c
============================
149 if (grub_disk_read (disk, sector, 0, sizeof (struct grub_raid_super_1x),
150 &sb))
151 return NULL;
============================
Going deeper down the rabbit hole and skipping a few steps (inside
grub_disk_read() at grub-core/kern/disk.c:413, which calls
grub_disk_read_small() at grub-core/kern/disk.c:396, which in turn calls
grub_disk_read_small_real() at grub-core/kern/disk.c:317):
============================
grub-core/kern/disk.c
============================
380 if ((disk->dev->read) (disk, transform_sector (disk, aligned_sector),
381 num, tmp_buf))
============================
We can observe that:
======
--- bad 2017-12-08 16:40:58.277936277 -0200
+++ good 2017-12-08 15:02:37.620925526 -0200
@@ -1,22 +1,22 @@
380 if ((disk->dev->read) (disk, transform_sector (disk, aligned_sector),
(gdb) p/x ((struct grub_raid_super_1x*) tmp_buf)->magic
-$11 = 0xb7ed0ec8
+$11 = 0xb7ed09a8
(gdb) p/x ((struct grub_raid_super_1x*) tmp_buf)->super_offset
$12 = 0x0
(gdb) p/x ((struct grub_raid_super_1x*) buf)->magic
$13 = 0x12
(gdb) p/x ((struct grub_raid_super_1x*) buf)->super_offset
$14 = 0x3ffffffde460
(gdb) n
-Detaching after fork from child process 19992.
+Detaching after fork from child process 19972.
389 grub_memcpy (buf, tmp_buf + offset, size);
(gdb) p/x ((struct grub_raid_super_1x*) tmp_buf)->magic
-$15 = 0x0
+$15 = 0xa92b4efc
(gdb) p/x ((struct grub_raid_super_1x*) tmp_buf)->super_offset
-$16 = 0x0
+$16 = 0x13ff7f0
(gdb) n
390 grub_free (tmp_buf);
(gdb) p/x ((struct grub_raid_super_1x*) buf)->magic
-$17 = 0x0
+$17 = 0xa92b4efc
(gdb) p/x ((struct grub_raid_super_1x*) buf)->super_offset
-$18 = 0x0
+$18 = 0x13ff7f0
======
Further, the sector passed to disk->dev->read() is first converted by transform_sector():
============================
grub-core/kern/disk_common.c
============================
42 static inline grub_disk_addr_t
43 transform_sector (grub_disk_t disk, grub_disk_addr_t sector)
44 {
45 return sector >> (disk->log_sector_size - GRUB_DISK_SECTOR_BITS);
46 }
============================
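As a quick worked example (assuming log_sector_size is 12 for a 4k-native disk and GRUB_DISK_SECTOR_BITS is 9), this shift is exactly what turned the 512-byte sector 0x2027c0 into the native sector 0x404f8 seen in the grub2-install backtrace earlier in this bug:
#include <assert.h>
int
main (void)
{
  /* grub 512-byte sector from grub_disk_read_small in the earlier backtrace */
  unsigned long long sector_512 = 0x2027c0;
  /* log_sector_size = 12 (4096-byte native sectors), GRUB_DISK_SECTOR_BITS = 9 */
  unsigned long long native = sector_512 >> (12 - 9);
  assert (native == 0x404f8); /* matches grub_util_biosdisk_read (sector=0x404f8) */
  return 0;
}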
And afterwards disk->dev->read() (a pointer to grub_util_biosdisk_read() at
grub-core/kern/emu/hostdisk.c:282) calls grub_util_fd_read():
============================
grub-core/kern/emu/hostdisk.c
============================
281 static grub_err_t
282 grub_util_biosdisk_read (grub_disk_t disk, grub_disk_addr_t sector,
283 grub_size_t size, char *buf)
284 {
...
305 if (grub_util_fd_read (fd, buf, max << disk->log_sector_size)
306 != (ssize_t) (max << disk->log_sector_size))
307 return grub_error (GRUB_ERR_READ_ERROR, N_("cannot read `%s': %s"),
308 map[disk->id].device, grub_util_fd_strerror ());
============================
Which results:
======
--- bad 2017-12-08 17:21:19.963486041 -0200
+++ good 2017-12-08 17:21:18.714475999 -0200
@@ -1,17 +1,17 @@
305 if (grub_util_fd_read (fd, buf, max << disk->log_sector_size)
(gdb) p/x ((struct grub_raid_super_1x*) buf)->magic
-$18 = 0xb7ed0ec8
+$18 = 0xb7ed09a8
(gdb) p/x ((struct grub_raid_super_1x*) buf)->super_offset
$19 = 0x0
(gdb) p max << disk->log_sector_size
-$20 = 4096
+$20 = 512
(gdb) p/x sector
-$23 = 0x27fffe
+$23 = 0x13ffff0
(gdb) n
306 != (ssize_t) (max << disk->log_sector_size))
(gdb)
305 if (grub_util_fd_read (fd, buf, max << disk->log_sector_size)
(gdb) p/x ((struct grub_raid_super_1x*) buf)->magic
-$25 = 0x0
+$25 = 0xa92b4efc
(gdb) p/x ((struct grub_raid_super_1x*) buf)->super_offset
-$26 = 0x0
+$26 = 0x13ff7f0
======
Actually, the whole *buf is zeroed out after this call. Going inside
grub_util_fd_read() now:
============================
grub-core/osdep/unix/hostdisk.c
============================
113 /* Read LEN bytes from FD in BUF. Return less than or equal to zero if an
114 error occurs, otherwise return LEN. */
115 ssize_t
116 grub_util_fd_read (grub_util_fd_t fd, char *buf, size_t len)
117 {
118 ssize_t size = 0;
119
120 while (len)
121 {
122 ssize_t ret = read (fd, buf, len);
...
138 }
139
140 return size;
============================
Which led to:
======
--- bad 2017-12-08 18:42:33.937517190 -0200
+++ good 2017-12-08 18:39:08.588844344 -0200
@@ -1,16 +1,17 @@
-grub_util_fd_read (fd=8, buf=0x10184fd0 "\310\016\355\267\377?", len=4096) at grub-core/osdep/unix/hostdisk.c:118
+grub_util_fd_read (fd=8, buf=0x10183530 "\250\t\355\267\377?", len=512) at grub-core/osdep/unix/hostdisk.c:118
118 ssize_t size = 0;
(gdb) n
120 while (len)
(gdb)
122 ssize_t ret = read (fd, buf, len);
(gdb) p/x ((struct grub_raid_super_1x*) buf)->magic
-$31 = 0xb7ed0ec8
+$31 = 0xb7ed09a8
(gdb) p/x ((struct grub_raid_super_1x*) buf)->super_offset
$32 = 0x0
(gdb) n
124 if (ret == 0)
(gdb) p/x ((struct grub_raid_super_1x*) buf)->magic
-$35 = 0x0
+$35 = 0xa92b4efc
(gdb) p/x ((struct grub_raid_super_1x*) buf)->super_offset
-$36 = 0x0
+$36 = 0x13ff7f0
(gdb) p ret
-$37 = 4096
+$37 = 512
======
Note that 4096 zero bytes were read, which is why no error is thrown at
grub-core/kern/emu/hostdisk.c:305.
Now the question is: are we reading from the wrong sector, or is this data
really zeroed out (which would imply that the bug is in mdadm)?
I believe we have enough data to start an upstream discussion, which I
plan to do soon. More to come ...
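One way to answer that question, as a minimal sketch rather than anything from the grub tree: read the bytes at the location mdadm reports and check for the md magic directly. The device path and the default offset below are only examples taken from this thread; substitute the member device and the "Super Offset" reported by mdadm -E. If the magic is there, grub is reading the wrong sector; if it really is zero, the problem would be elsewhere.
#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
int
main (int argc, char **argv)
{
  const char *dev = argc > 1 ? argv[1] : "/dev/vdc1";   /* example member device */
  uint64_t super_offset_512 = argc > 2 ? strtoull (argv[2], NULL, 0) : 0x13ff7f0;
                                                        /* mdadm -E "Super Offset" */
  uint32_t magic = 0;
  int fd = open (dev, O_RDONLY);
  if (fd < 0)
    { perror ("open"); return 1; }
  if (pread (fd, &magic, sizeof magic, super_offset_512 * 512) != sizeof magic)
    { perror ("pread"); close (fd); return 1; }
  close (fd);
  /* md superblocks store the magic little-endian; on a little-endian host a
     plain compare is enough for this quick check. */
  printf ("magic at offset 0x%llx: 0x%08x (%s)\n",
          (unsigned long long) (super_offset_512 * 512), magic,
          magic == 0xa92b4efc ? "md superblock found" : "not found / zero");
  return 0;
}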
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/13
------------------------------------------------------------------------
On 2018-01-16T19:10:44+00:00 bugproxy wrote:
------- Comment From desnesn at br.ibm.com 2018-01-16 14:09 EDT-------
Just for the record, I have also reproduced this bug on x86_64 with Ubuntu 17.10.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/14
------------------------------------------------------------------------
On 2018-06-06T19:01:20+00:00 pjones wrote:
This works for me using grub2-2.02-0.65.el7_4.2 on an EFI machine with a
4k disk:
[root at pjones3 tmp]# blockdev --getbsz /dev/sdb
4096
[root at pjones3 tmp]# blockdev --getbsz /dev/sdb2
4096
[root at pjones3 tmp]# blockdev --getbsz /dev/md0
4096
[root at pjones3 tmp]# ./usr/sbin/grub2-probe --target fs_uuid -d /dev/sdb2
c1b85a71-972d-4b69-84cc-e6a05326a4c8
[root at pjones3 tmp]# ./usr/sbin/grub2-probe --target fs_uuid -d /dev/md0
c1b85a71-972d-4b69-84cc-e6a05326a4c8
Note the detection in mdraid1x_linux.c still isn't right, because it
computes the expected location of the raid superblock from the size of
/dev/sdb rather than /dev/sdb2, but grub2-probe and booting the machine
with /boot on this raid are both successful.
I see this was reported with grub2-2.02-0.44.el7; does the newer package
work for you?
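To illustrate why that matters for metadata 1.0 (whose superblock location is derived from the device size), here is a minimal sketch reusing the same placement rule as the earlier sketch in this thread; the two sizes below are hypothetical, chosen only to show that feeding in the whole-disk size points at a different sector than feeding in the member-partition size.
/* Illustration only; sizes are hypothetical and in 512-byte sectors. */
#include <stdio.h>
#include <stdint.h>
static uint64_t
md10_super_sector (uint64_t size_512)
{
  /* Same metadata 1.0 placement rule as the sketch earlier in the thread. */
  return (size_512 - 8 * 2) & ~(uint64_t) (4 * 2 - 1);
}
int
main (void)
{
  uint64_t disk_size_512 = 41943040;  /* hypothetical 20 GiB whole disk */
  uint64_t part_size_512 = 2097152;   /* hypothetical 1 GiB member partition */
  printf ("from whole-disk size: sector 0x%llx\n",
          (unsigned long long) md10_super_sector (disk_size_512));
  printf ("from partition size:  sector 0x%llx\n",
          (unsigned long long) md10_super_sector (part_size_512));
  /* The two locations differ, illustrating why deriving the location from the
     whole-disk size is not the same as deriving it from the member partition. */
  return 0;
}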
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/15
------------------------------------------------------------------------
On 2018-06-13T19:30:21+00:00 bugproxy wrote:
------- Comment From diegodo at br.ibm.com 2018-06-13 15:24 EDT-------
Hi, I'm still getting the result:
./grub-probe: error: disk `mduuid/b184ce73be4a91ec1b586dcce8ee7f9b' not
found.
One thing that I noticed is that we have some sector lengths hardcoded to
512 bytes. It seems grub runs into problems when trying to find the magic
number for 1.0 metadata.
I dumped the variables returned by the disk read when mdraid1x_linux.c
tries to find the magic number, and it is reading from the wrong position.
When I changed the hardcoded sector lengths to 4k instead,
mdraid1x_linux.c was able to find the magic number, although it still
wasn't able to successfully find the disk.
I'm still investigating this problem and hope to find something in a
couple of days.
Thank you
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/16
------------------------------------------------------------------------
On 2019-06-13T09:12:28+00:00 hannsj_uhl wrote:
ok ... with no news on this bugzilla for exactly one year,
I am closing this Red Hat bugzilla now;
please reopen if required, then using the current RHEL 7.7 ...
... thanks for your support ...
Reply at:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/comments/18
** Changed in: grub2 (CentOS)
Status: Unknown => Expired
** Changed in: grub2 (CentOS)
Importance: Unknown => High
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to grub2 in Ubuntu.
https://bugs.launchpad.net/bugs/1817713
Title:
Grub2 does not detect MD raid (level 1) 1.0 superblocks on 4k block
devices
Status in grub2 package in Ubuntu:
New
Status in grub2 package in CentOS:
Expired
Bug description:
grub-install will fail if the /boot partition is located on an MD RAID
level 1 device backed by 4k-sector devices, NVMe drives in my
case.
Steps to Reproduce:
1°) Create a raid1 with 1.0 superblock with two 4k sectorsize devices
2°) Create a partition for /boot and format it FAT32
3°) Mount /boot
4°) grub-install complains about not being able to find the mduuid device
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/grub2/+bug/1817713/+subscriptions