Comment 49 for bug 1598312

Revision history for this message
In , swelef (swelef-linux-kernel-bugs) wrote :

Using WARN_ON(.) I have determined that during initialization
  - ioapic entries are read from
    [<ffffffff810524e8>] __ioapic_read_entry+0x88/0xa0
    [<ffffffff81052532>] ioapic_read_entry+0x32/0x60
    [<ffffffff81d67f71>] enable_IO_APIC+0x63/0x10b
    [<ffffffff81d67001>] apic_bsp_setup+0x89/0xae
    [<ffffffff81d64cc4>] native_smp_prepare_cpus+0x2cf/0x30d
    [<ffffffff81d511a6>] kernel_init_freeable+0xa5/0x1ef
      kernel_init_freeable() init/main.c:995
      {inlined} do_pre_smp_initcalls() init/main.c:888
          do_one_initcall(*fn);
  - mp_irqdomain_activate() is called for apic=0, pin=0 (timer irq) from
    [<ffffffff81054aec>] mp_irqdomain_activate+0x9c/0xb0
    [<ffffffff810da501>] irq_domain_activate_irq+0x41/0x50
    [<ffffffff81d6887c>] setup_IO_APIC+0x687/0x830
    [<ffffffff81052c6d>] ? clear_IO_APIC+0x4d/0x70
    [<ffffffff81d6701a>] apic_bsp_setup+0xa2/0xae
    [<ffffffff81d64cc4>] native_smp_prepare_cpus+0x2cf/0x30d
    [<ffffffff81d511a6>] kernel_init_freeable+0xa5/0x1ef
      kernel_init_freeable() init/main.c:995
      {inlined} do_pre_smp_initcalls() init/main.c:888
          do_one_initcall(*fn);
  - ioapic_set_affinity() is called from
    [<ffffffff8105284c>] ioapic_set_affinity+0xdc/0xf0
    [<ffffffff81d68afd>] setup_ioapic_dest+0xd8/0xf6
    [<ffffffff81d64e39>] native_smp_cpus_done+0x10a/0x117
    [<ffffffff81d7633c>] smp_init+0x69/0x88
    [<ffffffff81d511dc>] kernel_init_freeable+0xdb/0x1ef
      kernel_init_freeable() init/main.c:999
          sched_init_smp();
  - the next mp_irqdomain_activate() is for apic=0, pin=21 and called from
    [<ffffffff81054aec>] mp_irqdomain_activate+0x9c/0xb0
    [<ffffffff810da501>] irq_domain_activate_irq+0x41/0x50
    [<ffffffff810d6f4c>] irq_startup+0x2c/0x80
    [<ffffffff810d5801>] __setup_irq+0x511/0x5a0
    [<ffffffff811dbbd2>] ? kmem_cache_alloc_trace+0x1e2/0x220
    [<ffffffff810d59cd>] ? request_threaded_irq+0xad/0x1b0
    [<ffffffff814409e4>] ? acpi_osi_handler+0xa9/0xa9
    [<ffffffff814409e4>] ? acpi_osi_handler+0xa9/0xa9
    [<ffffffff810d5a17>] request_threaded_irq+0xf7/0x1b0
    [<ffffffff814575be>] ? acpi_ev_sci_dispatch+0x65/0x65
    [<ffffffff81d99f47>] ? acpi_sleep_proc_init+0x2a/0x2a
    [<ffffffff81440ef5>] acpi_os_install_interrupt_handler+0x92/0xc6
    [<ffffffff8145762f>] acpi_ev_install_sci_handler+0x23/0x25
    [<ffffffff81454f72>] acpi_ev_install_xrupt_handlers+0x1c/0x6e
    [<ffffffff81d9b4ba>] acpi_enable_subsystem+0x8b/0x90
    [<ffffffff81d99fbc>] acpi_init+0x75/0x267
    [<ffffffff81d99f47>] ? acpi_sleep_proc_init+0x2a/0x2a
    [<ffffffff81002144>] do_one_initcall+0xd4/0x210
    [<ffffffff81098400>] ? parse_args+0x190/0x480
    [<ffffffff810bbe28>] ? __wake_up+0x48/0x60
    [<ffffffff81d51263>] kernel_init_freeable+0x162/0x1ef
      kernel_init_freeable() init/main.c:1001
      {inlined} do_basic_setup() init/main.c:880
      {inlined} do_initcalls() init/main.c:861
      {inlined} do_initcall_level(.) init/main.c:853
          do_one_initcall(*fn);

The timer irq (pin=0) seems to be special (legacy irq?) and its mp_irqdomain_activate() is called early through setup_IO_APIC() and check_timer(). However, the rest of the mp_irqdomain_activate() calls come later through acpi_init().

What does stand out is that ioapic_set_affinity() for all other irqs is called before acpi_init() and therefore it's likely that something is not yet initialized, so actually activating these irqs at that point is premature. And the combination of commits
    0be275e3a5607b23f5132121bca22a10ee23aa99
    4857c91f0d195f05908fff296ba1ec5fca87066c
actually causes activation of those irqs through __ioapic_write_entry() with eu.entry.mask=0 as already shown by my previous logs.

What I still do not understand is why do those "cached" entries have mask=0. The commit message

    commit 0be275e3a5607b23f5132121bca22a10ee23aa99
    Author: Jiang Liu <email address hidden>
    Date: Tue Apr 14 10:29:57 2015 +0800

        x86/irq: Use cached IOAPIC entry instead of reading from hardware

        Use cached IOAPIC entry instead of reading data from IOAPIC hardware
        registers to improve performance.
        [...]

seems to imply that it's simply a performance optimization and it should not affect what we write to the IOAPIC entry. That's in direct conflict with the observable fact that without this commit the RMW operation would preserve mask=1 but the "cached" entry we write with this commit has mask=0.

I would recommend reverting 0be275e3a5607b23f5132121bca22a10ee23aa99 until this discrepancy is explained and rectified. Unfortunately, that revert has a conflict but reverting

    commit 1f934641294ca2e09016c689862378fbb15da4d4
    Author: Thomas Gleixner <email address hidden>
    Date: Tue Apr 14 10:29:58 2015 +0800

        x86/irq: Remove sis apic bug workaround

        The SiS apic bug workaround is now obsolete as we cache the register
        values for performance reasons.
        [...]

first would solve that.