nvme controller is down will reset (regression in zesty on XPS laptop)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-signed (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
I've just upgraded a Dell XPS 15" (9550, early 2016 model) with a Samsung NVMe drive to Zesty. The machine was stable under Kubuntu 16.10 with the same drive. Since the upgrade I've seen 3 hard lockups (the machine loses its root fs) with the following message printed:
nvme controller is down will reset
There are also messages printed to the virtual console reporting failures to write to the underlying disk from the home-directory encfs.
Linux tass 4.10.0-19-generic #21-Ubuntu SMP Thu Apr 6 17:04:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 17.04 (Kubuntu)
dmesg about nvme:
[ 1.748864] nvme nvme0: pci function 0000:04:00.0
[ 1.864553] nvme0n1: p1 p2 p3 p4 p5 p6
[ 2.961181] EXT4-fs (nvme0n1p6): mounted filesystem with ordered data mode. Opts: (null)
[ 4.172546] EXT4-fs (nvme0n1p6): re-mounted. Opts: errors=remount-ro
nvme-cli shows 57 entries in the error log, all apparently "invalid field" or "invalid namespace". I'm not sure whether that count is since boot or since the machine was built.
smartctl shows:
smartctl 6.6 2016-05-31 r4324 [x86_64-
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontoo
=== START OF INFORMATION SECTION ===
Model Number: PM951 NVMe SAMSUNG 512GB
Serial Number: S29PNXAH142328
Firmware Version: BXV77D0Q
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Controller ID: 1
Number of Namespaces: 1
Namespace 1 Size/Capacity: 512,110,190,592 [512 GB]
Namespace 1 Utilization: 365,503,283,200 [365 GB]
Namespace 1 Formatted LBA Size: 512
Local Time is: Thu Apr 13 23:21:32 2017 EDT
Firmware Updates (0x06): 3 Slots
Optional Admin Commands (0x0017): Security Format Frmw_DL *Other*
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Maximum Data Transfer Size: 32 Pages
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.00W - - 0 0 0 0 5 5
1 + 4.20W - - 1 1 1 1 30 30
2 + 3.10W - - 2 2 2 2 100 100
3 - 0.0700W - - 3 3 3 3 500 5000
4 - 0.0050W - - 4 4 4 4 2000 22000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 35 Celsius
Available Spare: 100%
Available Spare Threshold: 50%
Percentage Used: 0%
Data Units Read: 2,724,346 [1.39 TB]
Data Units Written: 6,568,756 [3.36 TB]
Host Read Commands: 52,921,997
Host Write Commands: 157,530,880
Controller Busy Time: 1,349
Power Cycles: 831
Power On Hours: 5,358
Unsafe Shutdowns: 46
Media and Data Integrity Errors: 0
Error Information Log Entries: 57
Error Information (NVMe Log 0x01, max 64 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 57 0 0x0004 0x4016 0x000 0 1 -
1 56 0 0x0004 0x4016 0x000 0 1 -
2 55 0 0x0004 0x4016 0x000 0 1 -
3 54 0 0x0004 0x4016 0x000 0 1 -
4 53 0 0x0004 0x4016 0x000 0 1 -
5 52 0 0x0004 0x4016 0x000 0 1 -
6 51 0 0x0004 0x4016 0x000 0 1 -
7 50 0 0x0004 0x4016 0x000 0 1 -
8 49 0 0x001f 0x4004 0x000 0 0 -
9 48 0 0x001e 0x4004 0x000 0 0 -
10 47 0 0x001f 0x4004 0x000 0 0 -
11 46 0 0x001e 0x4004 0x000 0 0 -
12 45 0 0x001f 0x4004 0x000 0 0 -
13 44 0 0x001e 0x4004 0x000 0 0 -
14 43 0 0x0000 0x4016 0x000 0 1 -
15 42 0 0x0004 0x4016 0x000 0 1 -
... (41 entries not shown)
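For anyone reading the Status column of the error log above, here is a rough Python sketch of how those values can be decoded. It assumes that smartctl 6.6 prints the raw 16-bit Status Field from the NVMe Error Information log entry, with the Phase Tag in bit 0, the Status Code in bits 8:1, the Status Code Type in bits 11:9, More in bit 14 and Do Not Retry in bit 15. That is an assumption about this smartctl version, not something confirmed in this report, although it does agree with the "invalid field" / "invalid namespace" reading from nvme-cli. The helper names `decode_status` and `describe` are made up for illustration, and the lookup table only lists a few generic status codes.

```python
# Sketch: decode the "Status" values from the smartctl error log above.
# Assumption: the value is the raw 16-bit NVMe Status Field with the
# Phase Tag in bit 0 (not confirmed for this smartctl version).

# A few Generic Command Status values (Status Code Type 0); not a full table.
GENERIC_STATUS = {
    0x00: "Successful Completion",
    0x01: "Invalid Command Opcode",
    0x02: "Invalid Field in Command",
    0x0B: "Invalid Namespace or Format",
}


def decode_status(raw):
    """Split a raw 16-bit status value into its bit fields."""
    return {
        "phase_tag": raw & 0x1,
        "status_code": (raw >> 1) & 0xFF,
        "status_code_type": (raw >> 9) & 0x7,
        "more": bool((raw >> 14) & 0x1),
        "do_not_retry": bool((raw >> 15) & 0x1),
    }


def describe(raw):
    """Return a short human-readable summary of a raw status value."""
    fields = decode_status(raw)
    if fields["status_code_type"] == 0:
        name = GENERIC_STATUS.get(fields["status_code"], "unknown generic status")
    else:
        name = "non-generic status type"
    return "SCT={} SC=0x{:02X} ({})".format(
        fields["status_code_type"], fields["status_code"], name
    )


# The two distinct Status values that appear in the log above.
for raw in (0x4016, 0x4004):
    print("0x{:04X}: {}".format(raw, describe(raw)))
```

Under that assumption, 0x4016 carries generic status code 0x0B (Invalid Namespace or Format) and 0x4004 carries 0x02 (Invalid Field in Command), both with the More bit set and Do Not Retry clear.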
Status changed to 'Confirmed' because the bug affects multiple users.