2010-04-07 21:56:36 |
TJ |
description |
Binary package hint: grub2
Many people are reporting that GRUB2 fails to boot, mostly on Karmic and more recently on Lucid. A forum thread describes the workaround in use, which is to install LILO instead:
http://ubuntuforums.org/showthread.php?t=1374209
I had Jaunty running fine on an Acer Travelmate C100 and decided to test Lucid. I booted using PXE over the network from a Xubuntu Live i386 CD image, ran the installer, and rebooted.
As soon as BIOS hands over to GRUB2 the screen shows:
GRUB
Geom Error
and that's it - nothing else.
GRUB1 had worked fine with the exact same partition layout on the disk:
1 ntfs 13GB Windows
2 extended
5 ext3 26GB Linux
6 swap ~1GB
In March 2009 I was diagnosing a problem with a USB key failing to boot in a similar way. The USB key used the syslinux project boot loader, so I wrote a diagnostic master boot record (MBR) that reports succinctly what the BIOS tells the boot code about which device it is booting from. Holding down the Shift or Ctrl key at boot changes its behaviour. The MBR code is only 435 bytes long.
I installed mbr-diag.bin into the MBR of the C100. It reveals that the BIOS is passing some very weird values to the boot code regardless of what the BIOS's Startup Configuration, Boot Order settings are.
Explanation of usage and output codes of mbr-diag.bin:
If a Shift key is held down at boot, CHS addressing mode is forced.
If the Ctrl key is held down, drive number 0x80 is forced.
L | C                LBA or CHS addressing mode
D  drive number      BIOS-reported drive number
C  cylinders         Geometry of drive according to BIOS
H  heads
S  sectors
P  partition         Active partition number (first partition flagged active). '?' if no active partition.
O  offset            Absolute sector offset of active partition. '????????' if no active partition.
M  magic             Magic bytes of active partition boot sector (sector <offset> as read by BIOS).
                     '????' if no active partition. Value is reset to 0xDEAD before the sector is read
                     to avoid inheriting the MBR magic on error.
E  error             Error code returned by the BIOS 'read sector' call (int 0x13, function 0x02 or 0x42).
                     '??' if no active partition.
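For anyone collecting these status lines from several machines, the legend above can be turned into a small decoder. This is only a sketch in Python, not part of mbr-diag.bin itself: the function names and the dictionary keys are my own choices, and it assumes every field after the mode letter is a single letter followed by hexadecimal digits or '?' placeholders.

```python
# Hypothetical helper for decoding an mbr-diag status line; the names here
# are illustrative and do not come from the mbr-diag source.
FIELD_NAMES = {
    "D": "drive",
    "C": "cylinders",
    "H": "heads",
    "S": "sectors",
    "P": "partition",
    "O": "offset",
    "M": "magic",
    "E": "error",
}


def parse_hex(text):
    """Return the integer value of a hex field, or None for '?' placeholders."""
    if text and set(text) == {"?"}:
        return None
    return int(text, 16)


def parse_mbr_diag(line):
    """Split a status line into a dict keyed by the legend's field names."""
    tokens = line.split()
    if not tokens or tokens[0] not in ("L", "C"):
        raise ValueError("line must start with L (LBA) or C (CHS)")
    result = {"mode": "LBA" if tokens[0] == "L" else "CHS"}
    for token in tokens[1:]:
        key, value = token[0], token[1:]
        if key not in FIELD_NAMES:
            raise ValueError("unknown field: " + token)
        result[FIELD_NAMES[key]] = parse_hex(value)
    return result


if __name__ == "__main__":
    decoded = parse_mbr_diag("C D5F C000 H01 S01 P1 O0000003F MDEAD E01")
    for name, value in decoded.items():
        print(name, value)
```

The first token is treated as the addressing mode, so the later "C" token is unambiguously the cylinder count.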
It shows:
C D5F C000 H01 S01 P1 O0000003F MDEAD E01
That decodes as: CHS addressing mode, drive 0x5F (95), 0 cylinders, 1 head, 1 sector, active partition #1 at an offset of 63 sectors (0x3F), and magic bytes not read because the BIOS reported error 0x01.
I then tried holding the Ctrl key down to force hard disk 0x80 to be used:
L D80 C3FE HFF S3F P1 O0000003F M0000 E0
I'd have expected to see something close to this, which is an example of a 'good' set of BIOS boot parameters:
L D80 C3D9 HFF S3F P1 O00000020 MAA55 E00
However, I'd moved the Windows partition to the end of the disk to avoid any problems with the BIOS not being able to address beyond cylinder 1024. The new layout is:
1 30 0x83 ext4 (250MB ext4 /boot)
31 124 0x82 swap (750MB swap)
125 3208 0x83 ext4 (26GB Linux /)
3209 4864 0x07 ntfs (13GB Windows)
So P1 points to the Linux /boot partition, which has no volume boot sector and really *does* contain 0x0000 in the magic-bytes slot. M0000 with no error is therefore a correct read, not a failure.
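To illustrate why a sector without boot code shows M0000 while a real boot sector shows MAA55, here is a minimal Python sketch. It relies on the standard layout where the last two bytes of a 512-byte boot sector are 0x55 then 0xAA; reading them as one little-endian 16-bit word is my assumption about how mbr-diag formats the value, chosen because it matches the AA55 shown in the 'good' example. The function name is hypothetical.

```python
import struct

SECTOR_SIZE = 512
SIGNATURE_OFFSET = 510  # the signature occupies the last two bytes of the sector


def boot_signature(sector):
    """Return the last two bytes of a sector as a little-endian 16-bit word."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected exactly one 512-byte sector")
    (value,) = struct.unpack_from("<H", sector, SIGNATURE_OFFSET)
    return value


# An all-zero sector stands in for the start of a partition with no boot code.
blank_sector = bytes(SECTOR_SIZE)
# A sector ending in 0x55 0xAA stands in for a volume boot sector.
boot_sector = bytes(SIGNATURE_OFFSET) + b"\x55\xAA"

print(f"M{boot_signature(blank_sector):04X}")  # prints M0000
print(f"M{boot_signature(boot_sector):04X}")   # prints MAA55
```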
To confirm that forcing drive 0x80 made the BIOS read the correct device, I changed the active partition to #4 (Windows), which does have a volume boot sector ending in the magic bytes 0x55 0xAA (shown by mbr-diag as AA55). When the PC was rebooted it showed (without Ctrl pressed):
C D5F C000 H01 S01 P4 O03126288 MDEAD E01
Well, progress! Partition #4 is now seen as the active one, but reads still fail, as the untouched 0xDEAD magic and error 01 show.
I tried again, this time holding down the Ctrl key:
L D80 C3FE HFF S3F P4 O03126288 MAA55 E00
Success! The magic bytes show the BIOS was able to read the volume boot sector from partition #4, and the initial "L" shows it was in LBA mode and so was able to address beyond the 1024-cylinder limit.
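As a cross-check on the reported offset, the arithmetic can be sketched in Python. Two assumptions are mine: that the first two columns of the partition layout are 1-based start and end cylinders, and that HFF S3F means 255 heads and 63 sectors per track. Under those assumptions the start of partition #4 works out to exactly the O03126288 value shown, and it lies well past what 1024 cylinders can reach in CHS mode.

```python
HEADS = 0xFF              # from HFF in the status line
SECTORS_PER_TRACK = 0x3F  # from S3F in the status line
CHS_CYLINDER_LIMIT = 1024


def cylinder_start_lba(cylinder, heads=HEADS, sectors=SECTORS_PER_TRACK):
    """Absolute sector number at which a 1-based cylinder begins."""
    return (cylinder - 1) * heads * sectors


# Partition #4 (Windows) starts at cylinder 3209 in the layout table.
windows_start = cylinder_start_lba(3209)
# First sector that lies beyond 1024 cylinders with this geometry.
chs_limit = CHS_CYLINDER_LIMIT * HEADS * SECTORS_PER_TRACK

print(f"O{windows_start:08X}")     # prints O03126288
print(chs_limit)                   # prints 16450560
print(windows_start >= chs_limit)  # prints True
```

With 512-byte sectors that limit is roughly 8.4 GB, so a partition starting about 26 GB into the disk can only be read with LBA addressing.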
My next step will be to create a patch for the GRUB2 boot sector, similar to the one I contributed to the syslinux project, that lets holding the Ctrl key at boot force disk 0x80 and LBA mode.
|
|