Buggy BIOS hard disk workaround missing; causes: "Geom Error"
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
grub | Unknown | Unknown | |
grub2 (Ubuntu) | Fix Released | High | Unassigned |
Lucid | Fix Released | High | Unassigned |
Bug Description
Binary package hint: grub2
Many people are reporting that GRUB2 fails to boot, mostly on Karmic and more recently on Lucid. A forum thread describes the workaround people are using, which is to install LILO:
http://
I had Jaunty running fine on an Acer Travelmate C100 and decided to test Lucid. I booted using PXE over the network from a Xubuntu Live i386 CD image, ran the installer, and rebooted.
As soon as BIOS hands over to GRUB2 the screen shows:
GRUB
Geom Error
and that's it - nothing else.
GRUB1 had worked fine with the exact same partition layout on the disk:
1 ntfs 13GB Windows
2 extended
5 ext3 26GB Linux
6 swap ~1GB
In March 2009 I was diagnosing a problem with a USB key failing to boot in a similar way. The USB key used the syslinux project boot loader, so I wrote a diagnostic master boot record (MBR) that reports succinctly what the BIOS tells the boot code about which device it is booting from. Holding down the Shift or Ctrl key at boot changes its behaviour. The MBR code is only 435 bytes long.
I installed mbr-diag.bin into the MBR of the C100. It reveals that the BIOS is passing some very weird values to the boot code regardless of what the BIOS's Startup Configuration, Boot Order settings are.
Explanation of usage and output codes of mbr-diag.bin:
If a shift key is held down at boot, CHS addressing mode is forced
If Ctrl key is held down, drive number 0x80 is forced
L | C            LBA or CHS addressing mode
D  drive number  BIOS-reported drive number
C  cylinders     Geometry of drive according to BIOS
H  heads
S  sectors
P  partition     active partition number (first partition flagged active); '?' if no active partition
O  offset        absolute sector offset of active partition; '????????' if no active partition
M  magic         magic bytes of active partition boot sector (sector <offset> as read by BIOS)
E  error         error code returned by BIOS 'read sector' interrupt (int 0x13, function 0x02 or 0x42)
It shows:
C D5F C000 H01 S01 P1 O0000003F MDEAD E01
This means: CHS addressing mode, drive 0x5F (decimal 95), 0 cylinders, 1 head, 1 sector, active partition #1, partition #1 at an offset of 63 sectors, and magic bytes not read because the BIOS reported error 1.
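The field-by-field reading above can be sketched as a small shell function. This is only an illustration: `decode_mbr_diag` is a hypothetical helper name, not part of mbr-diag.bin, and it assumes the fields arrive as separate words in the order shown in the legend.

```shell
# Hypothetical helper: label each field of one mbr-diag output line.
# The body runs in a subshell "( ... )" so its variables stay private.
decode_mbr_diag() (
    # The first word is the addressing mode, not a lettered field.
    case "$1" in
        L) echo "mode=LBA" ;;
        C) echo "mode=CHS" ;;
        *) echo "mode=unknown" ;;
    esac
    shift
    for field in "$@"; do
        value=${field#?}          # everything after the first character
        key=${field%"$value"}     # the first character
        case "$key" in
            D) echo "drive=0x$value" ;;
            C) echo "cylinders=0x$value" ;;
            H) echo "heads=0x$value" ;;
            S) echo "sectors=0x$value" ;;
            P) echo "active_partition=$value" ;;
            O)  # Convert the hex offset to decimal unless it is the '?' placeholder.
                case "$value" in
                    *[!0-9A-Fa-f]*|"") echo "offset_sectors=none" ;;
                    *) echo "offset_sectors=$((0x$value))" ;;
                esac ;;
            M) echo "magic=0x$value" ;;
            E) echo "bios_error=0x$value" ;;
        esac
    done
)

# The failing line from this report, passed as separate words.
decode_mbr_diag C D5F C000 H01 S01 P1 O0000003F MDEAD E01
```

For the line above this reports `mode=CHS`, `drive=0x5F` and `offset_sectors=63`, matching the manual reading.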
I then tried holding the Ctrl key down to force hard disk 0x80 to be used:
L D80 C3FE HFF S3F P1 O0000003F M0000 E0
I'd have expected to see something close to this, which is an example of a 'good' set of BIOS boot parameters:
L D80 C3D9 HFF S3F P1 O00000020 MAA55 E00
However, I'd moved the Windows partition to the end of the disk to avoid any problems with the BIOS not being able to address beyond cylinder 1024. The new layout is:
1 30 0x83 ext4 (250MB ext4 /boot)
31 124 0x82 swap (750MB swap)
125 3208 0x83 ext4 (26GB Linux /)
3209 4864 0x07 ntfs (13GB Windows)
So P1 points to the Linux /boot partition which doesn't have a volume boot sector and *does* contain 0x0000 in the magic bytes slots.
To try and confirm that forcing drive 0x80 was causing BIOS to read the correct device I changed the active partition to #4 (Windows) that does have a volume boot sector with the magic bytes 0x55AA. When the PC was rebooted it showed (without Ctrl pressed):
C D5F C000 H01 S01 P4 O03126288 MDEAD E01
Well, progress! Partition #4 is now seen as the active one, but reads still fail, as the magic bytes and error code show.
I tried again, this time pressing Ctrl key:
L D80 C3FE HFF S3F P4 O03126288 MAA55 E00
Success! The magic bytes show the BIOS was able to read the volume boot sector from partition #4, and the initial "L" shows it was in LBA mode so was able to address beyond the 1024 cylinder limit.
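To see why LBA mode matters here, the offset reported for partition #4 can be compared with the highest sector reachable through CHS addressing. The sketch below assumes that the HFF and S3F fields mean 255 heads and 63 sectors per track, and that the 1024-cylinder limit applies to the CHS read call; the variable names are illustrative only.

```shell
# Assumed geometry, taken from the HFF and S3F fields of the good readings.
heads=255
sectors=63

# Start of partition #4, from the O field (hex sector offset).
offset=$((0x03126288))

# CHS reads can address cylinders 0..1023 only.
chs_limit=$((1024 * heads * sectors))
start_cylinder=$((offset / (heads * sectors)))

echo "partition 4 starts at sector $offset (cylinder $start_cylinder, counting from 0)"
echo "sectors reachable with CHS addressing: $chs_limit"
if [ "$offset" -ge "$chs_limit" ]; then
    echo "beyond CHS reach, so an LBA read is required"
fi
```

Under these assumptions the offset works out to exactly cylinder 3208 counting from zero, which agrees with the start of 3209 in the partition layout if that listing counts from one.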
My next step will be to create a patch for the GRUB2 boot sector similar to the one I contributed to the syslinux project that allows the use of the Ctrl key pressed at boot to force disk 0x80 and LBA mode.
Changed in grub2 (Ubuntu):
status: New → Confirmed
status: Confirmed → In Progress
importance: Undecided → High
assignee: nobody → TJ (intuitivenipple)
summary:
- Acer Travelmate C100 fails to boot: "Geom Error"
+ Buggy BIOS hard disk workaround missing; causes: "Geom Error"
description: updated
Changed in grub2 (Ubuntu):
assignee: TJ (tj) → nobody
Changed in grub2 (Ubuntu Lucid):
assignee: TJ (tj) → nobody
This is a problem in grub-setup.
grub2 ships a 'default' boot sector, /boot/grub/boot.img, built from boot/i386/pc/boot.S.
grub-setup is supposed to modify that code, overwriting one two-byte instruction with no-operation instructions (nops, opcode 0x90) if it knows it is installing onto the first hard disk of the target:
/* If DEST_DRIVE is a hard disk, enable the workaround, which is
   for buggy BIOSes which don't pass boot drive correctly. Instead,
   they pass 0x00 or 0x01 even when booted from 0x80. */
if (dest_dev->disk->id & 0x80)
  /* Replace the jmp (2 bytes) with double nop's. */
  *boot_drive_check = 0x9090;
The result of this is that the two bytes at offset 0x66 (decimal 102) in the sector written to the hard disk should be nops to replace the jmp instruction at 0x66:
00000065 FA cli
00000066 EB07 jmp short 0x6f
00000068 F6C280 test dl,0x80
0000006B 7502 jnz 0x6f
0000006D B280 mov dl,0x80
I manually wrote the nops to the boot sector and grub2 started correctly. I'll now figure out why grub-setup is not doing this over-write itself.
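The effect of the two nops can be modelled in a few lines of shell. This is a sketch of the logic in the disassembly above, not GRUB code: `effective_drive` is a hypothetical name, its first argument stands for the DL register as passed by the BIOS, and its second says whether the jmp has been replaced by nops.

```shell
# effective_drive DL PATCHED
# Prints the drive number the boot code ends up using.
effective_drive() (
    dl=$(($1))
    # Unpatched: "jmp short 0x6f" skips the check, so DL is used as passed.
    # Patched: execution falls through to "test dl,0x80 / jnz / mov dl,0x80".
    if [ "$2" = yes ] && [ $((dl & 0x80)) -eq 0 ]; then
        dl=$((0x80))    # mov dl,0x80
    fi
    printf '0x%02X\n' "$dl"
)

effective_drive 0x5F no     # the bogus value from this BIOS is used as-is
effective_drive 0x5F yes    # bit 7 is clear, so drive 0x80 is forced
effective_drive 0x81 yes    # bit 7 is set, so the BIOS value is kept
```

With the nops in place, any drive number without bit 7 set, including the 0x5F seen on the C100, is replaced by 0x80.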
As a temporary workaround you can patch the boot sector manually:
1. Boot from a LiveCD image from CD or network (via PXE).
2. Open a terminal (there are two ways)
a. press Ctrl+Alt+F1 *twice* to get to virtual console #1
b. Applications > Accessories > Terminal
3. Create a file containing the nops:
echo -e -n "\0220\0220" >/tmp/nop.bin
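4. Write the two nop bytes at byte offset 102 (0x66) of the boot sector (replace /dev/sda if necessary with the boot device name on *your* system). dd counts seek in units of bs, so bs=1 is needed for seek=102 to mean byte 102:
sudo dd if=/tmp/nop.bin of=/dev/sda bs=1 count=2 seek=102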
5. Restart and test.
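Before restarting, it may be worth confirming which bytes are actually at offset 102. The sketch below defines a hypothetical helper, `drive_check_bytes`, and exercises it on a scratch 512-byte file rather than a real disk. The scratch file stands in for a boot sector and only contains the two bytes of interest.

```shell
# Print the two bytes at byte offset 102 (0x66) of a device or image as hex.
drive_check_bytes() {
    dd if="$1" bs=1 skip=102 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n'
    echo
}

# Demonstration on a scratch file, not a real boot sector.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null

# Place the original "jmp short" (EB 07) at offset 102; octal 353 007.
printf '\353\007' | dd of="$img" bs=1 seek=102 conv=notrunc 2>/dev/null
echo "unpatched: $(drive_check_bytes "$img")"

# Overwrite it with two nops (0x90 is octal 220).
printf '\220\220' | dd of="$img" bs=1 seek=102 conv=notrunc 2>/dev/null
echo "patched:   $(drive_check_bytes "$img")"

rm -f "$img"
```

On a real system the same helper could be run as root against the boot device, for example with /dev/sda as the argument; a patched boot sector should report 9090 and an unpatched one eb07, assuming the boot.img layout shown in the disassembly.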