arm64: Bus error on startx

Bug #1271649 reported by Tom Gall
This bug affects 2 people
Affects                 Status   Importance   Assigned to   Milestone
Linaro Linux            New      Undecided    Unassigned
Linaro Linux Baseline   New      Undecided    Unassigned

Bug Description

In GWG-land we are working on getting the armv8 framebuffer working again on the model.

We've noticed that DMA needs to be turned on for clc in the device tree.

Trying to startx currently causes the following:

[ 1540.493404] Unhandled fault: alignment fault (0x92000061) at 0x0000007fae2027ec
[ 1541.147941] xfce4-panel[1475]: unhandled level 2 translation fault (11) at 0x000000e4, esr 0x92000006
[ 1541.147953] pgd = ffffffc078840000
[ 1541.147983] [000000e4] *pgd=00000000f8ae3003, *pmd=0000000000000000
[ 1541.147987]
[ 1541.148019] CPU: 4 PID: 1475 Comm: xfce4-panel Not tainted 3.13.0+ #1
[ 1541.148047] task: ffffffc079042140 ti: ffffffc078a5c000 task.ti: ffffffc078a5c000
[ 1541.148063] PC is at 0x7fb2c57d8c
[ 1541.148077] LR is at 0x4142a4
[ 1541.148101] pc : [<0000007fb2c57d8c>] lr : [<00000000004142a4>] pstate: 80000000
[ 1541.148112] sp : 0000007fc5b7ab80
[ 1541.148144] x29: 0000007fc5b7ab80 x28: 0000000000000000
[ 1541.148175] x27: 0000000000000000 x26: 0000000000000000
[ 1541.148208] x25: 0000000038021c70 x24: 0000000000454758
[ 1541.148240] x23: 0000000038049400 x22: 0000000000431000
[ 1541.148273] x21: 000000003803b9c0 x20: 0000000038021f50
[ 1541.148306] x19: 0000000000000000 x18: 0000007fc5b7a920
[ 1541.148339] x17: 0000000000453b10 x16: 0000007fb2c57d8c
[ 1541.148373] x15: 0000007fb2bf9598 x14: 003b6bc138000000
[ 1541.148404] x13: 0000000000000002 x12: 0000000000000004
[ 1541.148437] x11: 0000000000000030 x10: 0101010101010101
[ 1541.148469] x9 : 0000007fc5b7a520 x8 : 0000000000000039
[ 1541.148499] x7 : 0000000000000001 x6 : 0000000000000001
[ 1541.148531] x5 : 00000000000003a0 x4 : 0000007fb2bf7cf0
[ 1541.148563] x3 : 000000003804e670 x2 : 0000007fb2bf75b0
[ 1541.148593] x1 : 0000000000000000 x0 : 0000000000000000

Tom Gall (tom-gall) wrote :

Some more data, captured with gdb attached to X. Note that the faulting address 0x7f8bb85bdc lies within the /dev/fb0 mapping (7f8bb6c000-7f8bcec000).

[14252.047301] Unhandled fault: alignment fault (0x92000061) at 0x0000007f8bb85bdc
(gdb) ^CQuit
(gdb) shell cat /proc/2607/maps
00400000-00590000 r-xp 00000000 fe:02 10942 /usr/bin/Xorg
00590000-0059e000 rwxp 00190000 fe:02 10942 /usr/bin/Xorg
0059e000-005ac000 rwxp 00000000 00:00 0
0f451000-0f5e0000 rwxp 00000000 00:00 0 [heap]
7f8aeef000-7f8aef3000 r-xp 00000000 fe:02 9250 /usr/lib/libmtdev.so.1.0.0
7f8aef3000-7f8af02000 ---p 00004000 fe:02 9250 /usr/lib/libmtdev.so.1.0.0
7f8af02000-7f8af03000 rwxp 00003000 fe:02 9250 /usr/lib/libmtdev.so.1.0.0
7f8af03000-7f8af0e000 r-xp 00000000 fe:02 9588 /usr/lib/xorg/modules/input/evdev_drv.so
7f8af0e000-7f8af1e000 ---p 0000b000 fe:02 9588 /usr/lib/xorg/modules/input/evdev_drv.so
7f8af1e000-7f8af1f000 rwxp 0000b000 fe:02 9588 /usr/lib/xorg/modules/input/evdev_drv.so
7f8af1f000-7f8bb6c000 rwxp 00000000 00:00 0
7f8bb6c000-7f8bcec000 rwxs 00000000 00:05 1145 /dev/fb0
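
The range check can also be done mechanically. Below is a minimal sketch in C; the helper names parse_maps_range and addr_in_maps_line are made up for illustration and are not part of any tool used in this report. It only reads the leading "start-end" field of a maps line.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse the "start-end" prefix of one
 * /proc/<pid>/maps line. Returns true and fills *start and *end
 * when both hex numbers were read. */
bool parse_maps_range(const char *line, uint64_t *start, uint64_t *end)
{
    unsigned long long s, e;

    if (sscanf(line, "%llx-%llx", &s, &e) != 2)
        return false;
    *start = s;
    *end = e;
    return true;
}

/* Hypothetical helper: true if addr lies in the half-open range
 * [start, end) described by the maps line. */
bool addr_in_maps_line(const char *line, uint64_t addr)
{
    uint64_t start, end;

    return parse_maps_range(line, &start, &end) &&
           addr >= start && addr < end;
}
```

Fed the /dev/fb0 line above and the address 0x7f8bb85bdc, addr_in_maps_line returns true. The mapping is also 0x180000 bytes long, which is exactly 1024 * 768 * 2, i.e. one 1024x768 frame at 16bpp.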

Tom Gall (tom-gall) wrote :

(The data below comes from multiple runs. /dev/fb0 is not mmapped at the same address on every run, so the addresses vary between excerpts. The important point is that the mmapped address range is valid and that the destination of every memcpy into it falls inside that range.)

I've written a small program that opens /dev/fb0 and writes a test pattern to the whole 1024x768 display at 16bpp. This works.
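
For reference, a test program of this kind could look roughly like the sketch below. This is not the program actually used here. The function names, the RGB565 colour values and the hard-coded 1024x768 geometry are assumptions for illustration; a real tool would query the geometry with the FBIOGET_VSCREENINFO ioctl instead.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

enum { FB_WIDTH = 1024, FB_HEIGHT = 768 };  /* geometry from the report */

/* Fill a 16bpp buffer with eight vertical bars, using one aligned
 * 16-bit store per pixel. The colour values assume RGB565. */
void fill_test_pattern(volatile uint16_t *fb, size_t width, size_t height)
{
    static const uint16_t bars[8] = {
        0xFFFF, 0xFFE0, 0x07FF, 0x07E0, 0xF81F, 0xF800, 0x001F, 0x0000
    };

    for (size_t y = 0; y < height; y++)
        for (size_t x = 0; x < width; x++)
            fb[y * width + x] = bars[x * 8 / width];
}

/* Map /dev/fb0 and draw the pattern. Returns 0 on success, -1 on error. */
int draw_on_fb0(void)
{
    const size_t len = (size_t)FB_WIDTH * FB_HEIGHT * sizeof(uint16_t);
    int fd = open("/dev/fb0", O_RDWR);

    if (fd < 0)
        return -1;

    void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    fill_test_pattern(map, FB_WIDTH, FB_HEIGHT);
    munmap(map, len);
    return 0;
}
```

Note that every store in fill_test_pattern is a naturally aligned 16-bit access, so a program written this way would not exercise the access pattern that a library memcpy may use.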

I've instrumented X and loaded a debug build.

The failure in gdb looks like this:

#0 memcpy () at ../ports/sysdeps/aarch64/memcpy.S:75
#1 0x0000007f7f9c2018 in shadowUpdatePacked (pScreen=0x3f4269b0,
    pBuf=0x3f428020) at shpacked.c:103
#2 0x0000007f7f9c18d4 in shadowRedisplay (pScreen=0x3f4269b0) at shadow.c:62
#3 0x000000000043a964 in BlockHandler (pTimeout=pTimeout@entry=0x7fdcbf8020,
    pReadmask=0x5ab9f8 <LastSelectMask>) at dixutils.c:394
#4 0x000000000055760c in WaitForSomething (
    pClientsReady=pClientsReady@entry=0x3f579c80) at WaitFor.c:210
#5 0x0000000000436724 in Dispatch () at dispatch.c:361
#6 0x00000000004266dc in main (argc=2, argv=0x7fdcbf82b8,
    envp=<optimized out>) at main.c:298
(gdb) frame 1
#1 0x0000007f7f9c2018 in shadowUpdatePacked (pScreen=0x3f4269b0,
    pBuf=0x3f428020) at shpacked.c:103
103 memcpy(win, sha, i * sizeof(FbBits));
(gdb)
#1 0x0000007f7f9c2018 in shadowUpdatePacked (pScreen=0x3f4269b0,
    pBuf=0x3f428020) at shpacked.c:103
103 memcpy(win, sha, i * sizeof(FbBits));
(gdb) p win
$1 = (FbBits *) 0x7f7f8ffbfc
(gdb) p sha
$2 = (FbBits *) 0x7f7ecfec0c
(gdb) p size
No symbol "size" in current context.
(gdb) p i
$3 = <optimized out>

It's important to note that the kernel prints this when the error occurs:

[ 1410.063428] Unhandled fault: alignment fault (0x92000061) at 0x0000007f9ccd1bfc

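The 32-bit value in parentheses is, I'm pretty sure, the ESR. So
EC = 0b100100, i.e. a data abort taken from a lower exception level (userspace)
IL = 1, i.e. a 32-bit instruction
ISS = 0x61, which breaks down to
   WnR = 1, the faulting access was a write
   DFSC = 0b100001, alignment fault
   all remaining bits zero (ISV = 0, so no instruction syndrome is recorded)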
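
The field extraction can be sketched in C as below. The struct and function names are made up for illustration; the bit positions are the ones the ARMv8 architecture defines for a data abort syndrome (EC in bits [31:26], IL in bit 25, WnR in bit 6, DFSC in bits [5:0]).

```c
#include <stdint.h>

/* Hypothetical container for the ESR fields of interest. */
struct esr_fields {
    unsigned ec;    /* bits [31:26], exception class */
    unsigned il;    /* bit 25, 1 = 32-bit instruction */
    unsigned wnr;   /* ISS bit 6, 1 = the faulting access was a write */
    unsigned dfsc;  /* ISS bits [5:0], data fault status code */
};

/* Split a data abort ESR value into its fields. Only meaningful when
 * EC says the exception is a data abort (0x24 or 0x25). */
struct esr_fields decode_data_abort_esr(uint32_t esr)
{
    struct esr_fields f;

    f.ec   = (esr >> 26) & 0x3f;
    f.il   = (esr >> 25) & 0x1;
    f.wnr  = (esr >> 6) & 0x1;
    f.dfsc = esr & 0x3f;
    return f;
}
```

For 0x92000061 this gives EC 0x24, IL 1, WnR 1 and DFSC 0x21 (alignment fault). For the 0x92000006 seen with xfce4-panel it gives the same EC with WnR 0 and DFSC 0x06, which matches the "level 2 translation fault" the kernel reported.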

Now about that memcpy: this code in X, specifically shadowUpdatePacked, is called a fair amount. By the time of the failure it has already done a bunch of writing into the mmapped range for /dev/fb0.

The failing memcpy is called with the following:

memcpy 7f9ccd1bfc win 7f9c0d0c0c sha 00000018 size 6 i 4 FbBits

yet moments earlier there was a good long loop of copies that succeeded. In this output:

win and sha are the destination and source addresses.
size is the number of bytes copied, in hex.
i (also in hex) and sizeof(FbBits) are dumped because size is calculated from those two. (I was curious what FbBits was set to.) Anyway, the sizes and addresses are all multiples of 4.

Here's a snippet from the output:

memcpy 7f9cd8e800 win 7f9c18d810 sha 00000800 size 200 i 4 FbBits
memcpy 7f9cd8f000 win 7f9c18e010 sha 00000800 size 200 i 4 FbBits
memcpy 7f9cd8f800 win 7f9c18e810 sha 00000800 size 200 i 4 FbBits
memcpy 7f9cd90000 win 7f9c18f010 sha 00000800 size 200 i 4 FbBits
memcpy 7f9cd90800 win 7f9c18f810 sha 00000800 size 200 i 4 FbBits
memcpy 7f9cd91000 win 7f9c190010 sha 00000800 size 200 i 4 FbBits
memcpy 7f9cd91800 win 7f9c190810 sha 00000800 size 200 i 4 FbBits
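
One difference between the failing call and the loop above is visible from the addresses alone. A small C sketch to check it (the helper name is made up for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* True if addr is a multiple of align. align must be a power of two. */
bool is_aligned(uint64_t addr, uint64_t align)
{
    return (addr & (align - 1)) == 0;
}
```

The failing call's win (0x7f9ccd1bfc) and sha (0x7f9c0d0c0c) pass the 4-byte check but fail the 8-byte one, while every address in the successful loop is 8-byte aligned. Whether the library memcpy really issues accesses wider than 4 bytes for a 0x18-byte copy has not been verified here, so treat that as a hypothesis rather than a confirmed cause.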

Tom Gall (tom-gall) wrote :

I seem to have this fixed and have been able to run both plain X and xfce4. Replacing the memcpy with a loop that copies the memory "by hand" seems to do the trick. It's obviously a step back in speed, but on the other hand there are no more SIGBUSes from alignment rules being broken.
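
A hand-rolled copy of this kind could look like the sketch below. This is not the actual change made to X; the function name is made up, and FbBits is assumed to be a 32-bit type because the trace earlier in this report shows it as 4 bytes.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t FbBits;  /* assumption: 4 bytes, as seen in the trace */

/* Copy n words from src to dst with one naturally aligned 32-bit load
 * and store per word. The volatile qualifiers stop the compiler from
 * merging the accesses into wider ones or turning the loop back into
 * a memcpy call. */
void copy_fbbits(volatile FbBits *dst, const volatile FbBits *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

In shadowUpdatePacked the call site would then pass the word count directly, for example copy_fbbits(win, sha, i), instead of i * sizeof(FbBits) bytes to memcpy.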

I managed to fix the long mis-rendering bug as well.

Sumit Semwal (sumit-semwal) wrote : Re: [Bug 1271649] Re: arm64: Bus error on startx

 ~ sent from my mobile; kindly excuse typos and brevity
On Feb 5, 2014 3:45 AM, "Tom Gall" <email address hidden> wrote:

> I seem to have this fixed and have been able to run both normal boring X
> as well as xfce4. Taking out the memcpy seems to do the trick and doing
> the copying of memory in a loop "by hand". It's obviously a step back in
> speed but OTOH no more SIGBUS' due to alignment rules being broken.
>
> I managed to fix the long mis-rendering bug as well.

Sumit Semwal (sumit-semwal) wrote :

~ sent from my mobile; kindly excuse typos and brevity
On Feb 8, 2014 4:29 PM, <email address hidden> wrote:

> ~ sent from my mobile; kindly excuse typos and brevity
> On Feb 5, 2014 3:45 AM, "Tom Gall" <email address hidden> wrote:
>
>> I seem to have this fixed and have been able to run both normal boring X
>> as well as xfce4. Taking out the memcpy seems to do the trick and doing
>> the copying of memory in a loop "by hand". It's obviously a step back in
>> speed but OTOH no more SIGBUS' due to alignment rules being broken.
>>
>> I managed to fix the long mis-rendering bug as well.
