[Dell PowerEdge R810] System fails to install, "debootstrap program exited with an error (return value 1)" and hard disk "changes" its device

Bug #777441 reported by Daniel Manrique
This bug affects 2 people
Affects: Linux
Status: Confirmed
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

System: PowerEdge R810
Ubuntu version: Natty (11.04) amd64 server (image dated 20110426).

I'm doing a network install on this system using Ubuntu Server 11.04. The debian-installer starts and reads the preseed file, starts working, and eventually the screen goes red and a message appears:

Base system installation error
The debootstrap program exited with an error (return value 1).

Check /var/log/syslog or see virtual console 4 for the details.

The interesting part of the log at the end is like this:

May 4 20:22:25 debconf: --> PROGRESS INFO base-installer/debootstrap/info/extracting
May 4 20:22:25 debconf: <-- 0 OK
May 4 20:22:28 kernel: [ 78.646855] mpt2sas0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
(... repeat about 100 times ...)
May 4 20:22:29 kernel: [ 79.274759] mpt2sas0: log_info(0x31120500): originator(PL), code(0x12), sub_code(0x0500)
May 4 20:22:31 kernel: [ 81.271637] sd 2:0:0:0: [sda] Unhandled error code
May 4 20:22:31 kernel: [ 81.271640] sd 2:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
May 4 20:22:31 kernel: [ 81.271645] sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 07 84 0d 90 00 00 08 00
May 4 20:22:31 kernel: [ 81.271655] end_request: I/O error, dev sda, sector 126094736
(... repeat about 100 times ...)
May 4 20:22:31 kernel: [ 81.271665] sd 2:0:0:0: [sda] Unhandled error code
May 4 20:22:31 kernel: [ 81.271667] sd 2:0:0:0: [sda] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
May 4 20:22:31 kernel: [ 81.271670] sd 2:0:0:0: [sda] CDB: Write(10): 2a 00 07 84 0d 88 00 00 08 00
May 4 20:22:31 kernel: [ 81.275098] end_request: I/O error, dev sda, sector 126097008
May 4 20:22:31 kernel: [ 81.276161] Aborting journal on device sda1-8.
May 4 20:22:31 kernel: [ 81.276184] JBD2: I/O error detected when updating journal superblock for sda1-8.
May 4 20:22:31 kernel: [ 81.276226] EXT4-fs error (device sda1) in ext4_init_inode_table:1325: Journal has aborted
May 4 20:22:31 kernel: [ 81.284184] EXT4-fs (sda1): I/O error while writing superblock
May 4 20:22:31 kernel: [ 81.284188] EXT4-fs (sda1): Remounting filesystem read-only
May 4 20:22:31 kernel: [ 81.284193] EXT4-fs error (device sda1): ext4_journal_start_sb:260: Detected aborted journal
May 4 20:22:31 kernel: [ 81.284200] EXT4-fs (sda1): ext4_da_writepages: jbd2_start: 8192 pages, ino 4194309; err -30
May 4 20:22:31 kernel: [ 81.284221] EXT4-fs error (device sda1): ext4_journal_start_sb:260: Detected aborted journal
May 4 20:22:31 debconf: --> SUBST base-installer/debootstrap/error-exitcode EXITCODE 1
May 4 20:22:31 debconf: Adding [EXITCODE] -> [1]
May 4 20:22:31 debconf: <-- 0
May 4 20:22:31 debconf: --> INPUT critical base-installer/debootstrap/error-exitcode
May 4 20:22:31 debconf: <-- 0 question will be asked
May 4 20:22:31 debconf: --> GO
May 4 20:22:31 kernel: [ 81.364439] mpt2sas0: removing handle(0x000a), sas_addr(0x50014ee3555cd58e)
May 4 20:22:34 kernel: [ 84.589864] scsi 2:0:1:0: Direct-Access WD WD1460BKFG-18P2V D1E4 PQ: 0 ANSI: 6
May 4 20:22:34 kernel: [ 84.589875] scsi 2:0:1:0: SSP: handle(0x000a), sas_addr(0x50014ee3555cd58e), phy(7), device_name(0x0000000000000000)
May 4 20:22:34 kernel: [ 84.589881] scsi 2:0:1:0: SSP: enclosure_logical_id(0x5782bcb0197ba300), slot(0)
May 4 20:22:34 kernel: [ 84.589888] scsi 2:0:1:0: qdepth(254), tagged(1), simple(0), ordered(0), scsi_level(7), cmd_que(1)
May 4 20:22:34 kernel: [ 84.590782] sd 2:0:1:0: Attached scsi generic sg1 type 0
May 4 20:22:34 kernel: [ 84.592030] sd 2:0:1:0: [sdb] 286749480 512-byte logical blocks: (146 GB/136 GiB)
May 4 20:22:34 kernel: [ 84.594559] sd 2:0:1:0: [sdb] Write Protect is off
May 4 20:22:34 kernel: [ 84.594566] sd 2:0:1:0: [sdb] Mode Sense: 9f 00 10 08
May 4 20:22:34 kernel: [ 84.595757] sd 2:0:1:0: [sdb] Write cache: disabled, read cache: enabled, supports DPO and FUA
May 4 20:22:34 kernel: [ 84.656609] sdb: sdb1 sdb2 < sdb5 >
May 4 20:22:34 kernel: [ 84.669010] sd 2:0:1:0: [sdb] Attached SCSI disk

It looks like the hard disk disappears from under the installer's feet, and then reappears, in /dev/sdb instead of /dev/sda. So the installer obviously returns an error and stops.
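For anyone comparing logs, the `log_info` words that mpt2sas prints just before the I/O errors can be split into their fields. This is a minimal sketch: the bit layout (4-bit bus type, 4-bit originator, 8-bit code, 16-bit sub-code) and the originator names other than PL are assumptions based on how the driver prints the value, and `decode_log_info` is a hypothetical helper, not part of any tool. It says nothing about what code 0x12 or the sub-codes mean.

```python
from typing import NamedTuple

# Assumed originator numbering; the logs in this report only confirm 1 = PL.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


class LogInfo(NamedTuple):
    bus_type: int
    originator: str
    code: int
    sub_code: int


def decode_log_info(value: int) -> LogInfo:
    """Split a 32-bit mpt2sas log_info word into its (assumed) fields."""
    return LogInfo(
        bus_type=(value >> 28) & 0xF,       # top nibble
        originator=ORIGINATORS.get((value >> 24) & 0xF, "unknown"),
        code=(value >> 16) & 0xFF,          # printed as code(0x..)
        sub_code=value & 0xFFFF,            # printed as sub_code(0x....)
    )


# The two values that appear in the installer syslog above.
for raw in (0x3112011A, 0x31120500):
    info = decode_log_info(raw)
    print(
        f"log_info(0x{raw:08x}): originator({info.originator}), "
        f"code(0x{info.code:02x}), sub_code(0x{info.sub_code:04x})"
    )
```

Under these assumptions the decoded fields agree with the originator, code and sub_code values the kernel already prints next to each word.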

I tried mounting /dev/sdb1 on /target and continuing the installation, but after a short while the same thing happens, and the disk "moves" to /dev/sdc.
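One way to confirm that the device showing up under a new name is the same physical disk is to follow its SAS address through the log. The sketch below is only an illustration: the three sample lines are copied from the syslog excerpt above, while the regular expressions and the `track_disks` helper are hypothetical and tuned to exactly this message format.

```python
import re

# Sample lines taken from the syslog excerpt in this report.
LOG = """\
May 4 20:22:31 kernel: [ 81.364439] mpt2sas0: removing handle(0x000a), sas_addr(0x50014ee3555cd58e)
May 4 20:22:34 kernel: [ 84.589875] scsi 2:0:1:0: SSP: handle(0x000a), sas_addr(0x50014ee3555cd58e), phy(7), device_name(0x0000000000000000)
May 4 20:22:34 kernel: [ 84.592030] sd 2:0:1:0: [sdb] 286749480 512-byte logical blocks: (146 GB/136 GiB)
"""

REMOVED = re.compile(r"removing handle\(0x[0-9a-f]+\), sas_addr\((0x[0-9a-f]+)\)")
ADDED = re.compile(
    r"scsi (\d+:\d+:\d+:\d+): SSP: handle\(0x[0-9a-f]+\), sas_addr\((0x[0-9a-f]+)\)"
)
NAMED = re.compile(r"sd (\d+:\d+:\d+:\d+): \[(sd[a-z]+)\]")


def track_disks(log: str) -> list[tuple[str, str, str]]:
    """Return (event, sas_addr, detail) tuples in the order they appear."""
    events = []
    addr_by_hctl = {}   # SCSI host:channel:target:lun -> SAS address
    seen_names = set()  # report each (hctl, device name) pair only once
    for line in log.splitlines():
        if m := REMOVED.search(line):
            events.append(("removed", m.group(1), ""))
        elif m := ADDED.search(line):
            addr_by_hctl[m.group(1)] = m.group(2)
            events.append(("attached", m.group(2), m.group(1)))
        elif m := NAMED.search(line):
            hctl, name = m.groups()
            if hctl in addr_by_hctl and (hctl, name) not in seen_names:
                seen_names.add((hctl, name))
                events.append(("named", addr_by_hctl[hctl], name))
    return events


for event in track_disks(LOG):
    print(event)
```

Run against a full syslog, the same SAS address appearing in a "removed" event and then in a later "attached"/"named" event would indicate one disk being dropped and re-enumerated rather than two different disks.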

This problem did/does not happen with Lucid or Maverick, so it looks like a regression for this system.

I'm attaching partman, syslog and lspci.

Brendan Donegan (brendan-donegan) wrote :

Just wanted to note that I saw this too earlier, but didn't get a chance to file a bug before I went away.

Changed in hw-labs:
status: New → Confirmed
importance: Undecided → High
Daniel Manrique (roadmr)
description: updated
Daniel Manrique (roadmr)
affects: hw-labs → blitzortung-tracker
affects: blitzortung-tracker → debian-installer
Daniel Manrique (roadmr)
tags: added: blocks-hwcert regression
Colin Watson (cjwatson) wrote :

Looks like a kernel bug, not an installer bug. The installer is just unpacking files here - it's not doing anything special that might reasonably trigger this sort of behaviour.

Daniel Manrique (roadmr) wrote :

OK, so I was able to trigger this behavior without involving the installer (Colin, as usual, you're right!).

1- Started a network installation on this PowerEdge R810 server (I'm unable to install from physical media as the server is in a datacenter, but I imagine the behavior would be the same).
2- Once debian-installer failed (red screen), I switched to a virtual console.
3- I created a new mountpoint and mounted /dev/sdb1 there (formerly /dev/sda1):
    mkdir /charget; mount /dev/sdb1 /charget
4- I started writing an arbitrary file to the directory:
   dd if=/dev/zero of=/charget/hugefile

Eventually a kernel oops message scrolled by and /dev/sdb1 disappeared; the whole disk had moved to /dev/sdc.

I'm attaching a new syslog with these events:

a) Boot and installation procedure up to debian-installer failure due to hard disk disappearing (around time mark 93.3249112).
b) Mounting disk under /charget (step 3 above, time mark 194.497300)
c) While writing the file, the system goes bananas (time mark 223.784648)
d) kernel bug trace (time mark 226.922763)

As per Colin's assessment, I'm also moving this bug to the linux kernel.

affects: debian-installer → devmapper
affects: devmapper → linux
Alex Efros (powerman-asdf) wrote :

Looks like I have the same issue.

I'm trying to install Hardened Gentoo on a new Dell PowerEdge R610.

Gentoo LiveDVD 11.0 uses the 2.6.37-gentoo-r1 kernel, and its mpt2sas driver seems to work fine: I never saw any errors and was able to unpack stage3, recompile the whole system, compile and install a lot of additional packages, and build several kernels.

My installed system uses the 2.6.37-hardened-r7 kernel, and there this issue (sda changing to sdb with a lot of I/O errors) happens shortly after boot, so I was never able even to log in using agetty or ssh.

Alex Efros (powerman-asdf) wrote :

Oops, no. I've just got the same error on the LiveDVD:

[ 691.420011] EXT3-fs: barriers not enabled
[ 691.433251] kjournald starting. Commit interval 5 seconds
[ 691.442276] EXT3-fs (sda4): using internal journal
[ 691.442281] EXT3-fs (sda4): mounted filesystem with writeback data mode
[ 695.216486] EXT3-fs: barriers not enabled
[ 695.221870] kjournald starting. Commit interval 5 seconds
[ 695.226653] EXT3-fs (sda2): using internal journal
[ 695.226657] EXT3-fs (sda2): recovery complete
[ 695.226661] EXT3-fs (sda2): mounted filesystem with writeback data mode
[ 779.731537] mpt2sas0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
[ 779.731542] mpt2sas0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
[ 779.731557] mpt2sas0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
[ 779.731569] mpt2sas0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)

[ 779.732926] mpt2sas0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
[ 779.732928] mpt2sas0: log_info(0x3112011a): originator(PL), code(0x12), sub_code(0x011a)
[ 779.739049] mpt2sas0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
[ 779.740051] mpt2sas0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)

[ 780.354138] mpt2sas0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
[ 780.357887] mpt2sas0: log_info(0x31120100): originator(PL), code(0x12), sub_code(0x0100)
0: [sda] Unhandled error code
[ 781.850891] sd 2:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
[ 781.850893] sd 2:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 08 c8 6e cd 00 00 08 00
[ 781.850897] end_request: I/O error, dev sda, sector 147353293
[ 781.850899] sd 2:0:0:0: [sda] Unhandled error code
[ 781.850900] sd 2:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
[ 781.850902] sd 2:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 08 c8 6e d5 00 00 08 00
[ 781.850906] end_request: I/O error, dev sda, sector 147353301
[ 781.850908] sd 2:0:0:0: [sda] Unhandled error code
[ 781.850909] sd 2:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
[ 781.850911] sd 2:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 08 c8 6e dd 00 00 08 00
[ 781.850915] end_request: I/O error, dev sda, sector 147353309

[ 781.851865] sd 2:0:0:0: [sda] Unhandled error code
[ 781.851866] sd 2:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
[ 781.851867] sd 2:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 08 c8 6d 5d 00 00 08 00
[ 781.851871] end_request: I/O error, dev sda, sector 147352925
[ 781.851873] sd 2:0:0:0: [sda] Unhandled error code
[ 781.851874] sd 2:0:0:0: [sda] Result: hostbyte=0x01 driverbyte=0x00
[ 781.851876] sd 2:0:0:0: [sda] CDB: cdb[0]=0x2a: 2a 00 08 c8 6d 65 00 00 08 00
[ 781.851880] end_request: I/O error, dev sda, sector 147352933
[ 781.851905] Aborting journal on device sda4.
[ 781.851911] JBD: I/O error detected when updating journal superblock for sda4.
[ 781.863543] mpt2sas0: removing handle(0x000a), sas_addr(0x50014ee30008d44a)
[ 785.342047] scsi 2:0:1:0: Direct-Access WD WD1460BKFG-18P2V D1E4 PQ: 0 ANSI: 6
[ 785.342056] scsi ...
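This excerpt prints the host byte and CDB numerically, whereas the Ubuntu syslog in the bug description prints symbolic names, so it may help to decode both and compare. The sketch below assumes the standard 10-byte Write(10) layout (opcode 0x2a, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and the Linux host byte numbering in which 0x01 is DID_NO_CONNECT; `decode_write10` is a hypothetical helper.

```python
# Host byte values as numbered in the Linux SCSI headers (only two listed here).
HOST_BYTES = {0x00: "DID_OK", 0x01: "DID_NO_CONNECT"}


def decode_write10(cdb_hex: str) -> tuple[int, int]:
    """Return (starting LBA, block count) for a Write(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is skipped
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte Write(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks


# One CDB from the Gentoo log above and one from the Ubuntu installer syslog.
for cdb in ("2a 00 08 c8 6e cd 00 00 08 00", "2a 00 07 84 0d 90 00 00 08 00"):
    lba, blocks = decode_write10(cdb)
    print(f"Write(10): LBA {lba}, {blocks} blocks")

print(f"hostbyte=0x01 is {HOST_BYTES[0x01]}")
```

The decoded LBAs match the sector numbers in the corresponding `end_request: I/O error` lines (147353293 and 126094736), and hostbyte 0x01 corresponds to the DID_NO_CONNECT result shown in the bug description, which is consistent with both machines hitting the same failure.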


Alex Efros (powerman-asdf) wrote :

I've tried several different driver versions:

* Gentoo
2.6.32-hardened-r42: mpt2sas 02.100.03.00
2.6.34-hardened-r6: mpt2sas 04.100.01.00
2.6.36-hardened-r6: mpt2sas 06.100.00.00
2.6.37-hardened-r7: mpt2sas 06.100.00.00
2.6.38-hardened-r4: mpt2sas 07.100.00.00
* RHEL?
2.6.33.9-rt31.64.el5rt: mpt2sas 03.100.03.00
* RHEL6
2.6.32-71.29: mpt2sas 05.100.00.02
* Dell website
unknown: mpt2sas 07.00.01.00

The most stable is the driver in the .34 kernel, but it also has this issue, which is triggered at random times for no known reason.

The RHEL and Dell drivers failed to load in my Gentoo kernel with the error "kernel tried to execute NX-protected page - exploit attempt? (uid: 0, task: swapper, pid: 1)".

I'm going to ask Dell to replace this H200 card either with another H200 (this one may just be broken) or with some other card that doesn't use the mpt2sas driver at all (too much chaos in driver versions and incompatibilities).

Alex Efros (powerman-asdf) wrote :

We've replaced the PERC H200 with a PERC 6/i, which uses a different driver (megaraid_sas), and it looks like the issue is solved now. So I still don't know whether it was a buggy driver, hardware, or firmware.
