commit 4beda543e40efb4a8521e0fa4ea92461e01437c8
Author: Alexandre Frade
Date: Thu Jul 8 22:50:13 2021 +0000

Linux 5.13.1-rt1-xanmod1

Signed-off-by: Alexandre Frade

commit e9d81aaaa078e4addf5cc855c30b877fafb43ff7
Merge: 3d13cb77d7e0 7e175e6b5997
Author: Alexandre Frade
Date: Thu Jul 8 20:01:09 2021 +0000

Merge tag 'v5.13-rt1' into 5.13

v5.13-rt1 ResurrexiT!

commit 7e175e6b59975c8901ad370f7818937f68de45c1
Author: Thomas Gleixner
Date: Fri Jul 8 20:25:16 2011 +0200

Add localversion for -RT release

Signed-off-by: Thomas Gleixner

commit 6f3f622fe87aee711020341478979d247ea605f6
Author: Sebastian Andrzej Siewior
Date: Fri Oct 11 13:14:41 2019 +0200

POWERPC: Allow to enable RT

Allow RT to be selected.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 646e1ac9bc99e20cf7a5374b6d877d2796373945
Author: Sebastian Andrzej Siewior
Date: Fri Jan 8 19:48:21 2021 +0100

powerpc: Avoid recursive header includes

- The include of bug.h leads to an include of printk.h, which gets back to spinlock.h and then complains about a missing xchg(). Remove bug.h and add bits.h, which is needed for BITS_PER_BYTE.
- Avoid the "please don't include this file directly" error from rwlock-rt. Allow an include from/with rtmutex.h.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit a1a11e9c8b5c222baeae02e027da0f1ae6669397
Author: Sebastian Andrzej Siewior
Date: Tue Mar 26 18:31:29 2019 +0100

powerpc/stackprotector: work around stack-guard init from atomic

This is invoked from the secondary CPU in atomic context. On x86 we use the TSC instead; on Power we XOR it against mftb(), so let's use the stack address as the initial value.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit f35ef21b2f18a5367912423e4e17546db005c405
Author: Bogdan Purcareata
Date: Fri Apr 24 15:53:13 2015 +0000

powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT

While converting the openpic emulation code to use a raw_spinlock_t enables guests to run on RT, there's still a performance issue. For interrupts sent in directed delivery mode with a multiple-CPU mask, the emulated openpic will loop through all of the VCPUs, and for each VCPU it calls IRQ_check, which loops through all the pending interrupts for that VCPU. This is done while holding the raw_lock, meaning that for all this time interrupts and preemption are disabled on the host Linux. A malicious user app can max out both of these numbers and cause a DoS.

This temporary fix is sent for two reasons. First, so that users who want to use the in-kernel MPIC emulation are aware of the potential latencies, thus making sure that the hardware MPIC and their usage scenario do not involve interrupts sent in directed delivery mode and that the number of possible pending interrupts is kept small. Secondly, this should incentivize the development of a proper openpic emulation that would be better suited for RT.

Acked-by: Scott Wood
Signed-off-by: Bogdan Purcareata
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit d91a27cdd893b229345d8f27075baea4c866f365
Author: Sebastian Andrzej Siewior
Date: Tue Mar 26 18:31:54 2019 +0100

powerpc/pseries/iommu: Use a locallock instead local_irq_save()

The locallock protects the per-CPU variable tce_page. The function attempts to allocate memory while tce_page is protected (by disabling interrupts). Use local_irq_save() instead of local_irq_disable().

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
commit 9d22762227ecb4ff0ba2f789d1129047fdef1c14
Author: Sebastian Andrzej Siewior
Date: Fri Jul 26 11:30:49 2019 +0200

powerpc: traps: Use PREEMPT_RT

Add PREEMPT_RT to the backtrace if enabled.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit dd5ac564bb5e626d0f783f1862656f01f0b151a5
Author: Sebastian Andrzej Siewior
Date: Fri Oct 11 13:14:35 2019 +0200

ARM64: Allow to enable RT

Allow RT to be selected.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit fb140d3f82d429d64ab0a7ef14751e3979ed3910
Author: Sebastian Andrzej Siewior
Date: Fri Oct 11 13:14:29 2019 +0200

ARM: Allow to enable RT

Allow RT to be selected.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 2b7f161a51bc94e090dfb6d50e6873d4378f20e1
Author: Sebastian Andrzej Siewior
Date: Wed Jul 25 14:02:38 2018 +0200

arm64: fpsimd: Delay freeing memory in fpsimd_flush_thread()

fpsimd_flush_thread() invokes kfree() via sve_free() within a preempt-disabled section, which does not work on -RT. Delay freeing of memory until preemption is enabled again.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit b7f7c9df58ce626e8f2883a329d1ab4d6b3514b4
Author: Josh Cartwright
Date: Thu Feb 11 11:54:01 2016 -0600

KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable()

kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating the vgic and timer states to prevent the calling task from migrating to another CPU. It does so to prevent the task from writing to the incorrect per-CPU GIC distributor registers. On -rt kernels, it's possible to maintain the same guarantee with the use of migrate_{disable,enable}(), with the added benefit that the migrate-disabled region is preemptible. Update kvm_arch_vcpu_ioctl_run() to do so.

Cc: Christoffer Dall
Reported-by: Manish Jaggi
Signed-off-by: Josh Cartwright
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit ae21940445d60c64c9e1f4efa2d1fa74ec913d2a
Author: Yadi.hu
Date: Wed Dec 10 10:32:09 2014 +0800

ARM: enable irq in translation/section permission fault handlers

Probably happens on all ARM, with CONFIG_PREEMPT_RT and CONFIG_DEBUG_ATOMIC_SLEEP.

This simple program:

    int main() { *((char*)0xc0001000) = 0; };

[ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658
[ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a
[ 512.743217] INFO: lockdep is turned off.
[ 512.743360] irq event stamp: 0
[ 512.743482] hardirqs last enabled at (0): [< (null)>] (null)
[ 512.743714] hardirqs last disabled at (0): [] copy_process+0x3b0/0x11c0
[ 512.744013] softirqs last enabled at (0): [] copy_process+0x3b0/0x11c0
[ 512.744303] softirqs last disabled at (0): [< (null)>] (null)
[ 512.744631] [] (unwind_backtrace+0x0/0x104)
[ 512.745001] [] (dump_stack+0x20/0x24)
[ 512.745355] [] (__might_sleep+0x1dc/0x1e0)
[ 512.745717] [] (rt_spin_lock+0x34/0x6c)
[ 512.746073] [] (do_force_sig_info+0x34/0xf0)
[ 512.746457] [] (force_sig_info+0x18/0x1c)
[ 512.746829] [] (__do_user_fault+0x9c/0xd8)
[ 512.747185] [] (do_bad_area+0x7c/0x94)
[ 512.747536] [] (do_sect_fault+0x40/0x48)
[ 512.747898] [] (do_DataAbort+0x40/0xa0)
[ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8)

0xc0000000 belongs to the kernel address space; a user task cannot be allowed to access it.

For the above condition, the correct result is that the test case should receive a "segmentation fault" and exit, not produce the splat above. The root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in prefetch/data abort handlers"): it deletes the irq enable block in the data abort assembly code and moves it into the page/breakpoint/alignment fault handlers instead, but it does not enable irqs in the translation/section permission fault handlers. ARM disables irqs when it enters exception/interrupt mode; if the kernel doesn't enable them, they stay disabled during the translation/section permission fault.

We see the above splat because do_force_sig_info is still called with IRQs off, and that code eventually does a:

    spin_lock_irqsave(&t->sighand->siglock, flags);

As this is architecture-independent code, and we've not seen any other arch need the siglock converted to a raw lock, we can conclude that we should enable irqs for the ARM translation/section permission exception.

Signed-off-by: Yadi.hu
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
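As background for the migrate_disable() conversions above (e.g. the KVM vcpu run path), the general shape of such a change is shown below; a minimal sketch, not the actual KVM code:

    #include <linux/preempt.h>

    /* Sketch: keep the task on this CPU while it touches per-CPU hardware
     * state, but stay preemptible on PREEMPT_RT.
     */
    static void update_percpu_hw_state(void)
    {
            migrate_disable();      /* was: preempt_disable() */
            /* ... update this CPU's GIC/timer state; may sleep on RT locks ... */
            migrate_enable();       /* was: preempt_enable() */
    }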
commit 184bec33a00faa87b948d1b137e8937e9699afe2
Author: Anders Roxell
Date: Thu May 14 17:52:17 2015 +0200

arch/arm64: Add lazy preempt support

arm64 is missing support for PREEMPT_RT. The main feature which is lacking is support for lazy preemption. The arch-specific entry code, thread information structure definitions, and associated data tables have to be extended to provide this support. Then the Kconfig file has to be extended to indicate that the support is available, and also to indicate that support for full RT preemption is now available.

Signed-off-by: Anders Roxell
Signed-off-by: Thomas Gleixner

commit 2f9ba402f708cf7195f29a477901cacbe11089c0
Author: Thomas Gleixner
Date: Thu Nov 1 10:14:11 2012 +0100

powerpc: Add support for lazy preemption

Implement the powerpc pieces for lazy preempt.

Signed-off-by: Thomas Gleixner

commit 95e00217927cdd22e872c6f22b21a8bf1cd02709
Author: Thomas Gleixner
Date: Wed Oct 31 12:04:11 2012 +0100

arm: Add support for lazy preemption

Implement the arm pieces for lazy preempt.

Signed-off-by: Thomas Gleixner

commit f2f9e496208c584356e84e720a3dfd99970ee5e9
Author: Thomas Gleixner
Date: Thu Nov 1 11:03:47 2012 +0100

x86: Support for lazy preemption

Implement the x86 pieces for lazy preempt.

Signed-off-by: Thomas Gleixner

commit 5edd1691b69a1dff3109b39075eb05ed41534005
Author: Sebastian Andrzej Siewior
Date: Tue Jun 30 11:45:14 2020 +0200

x86/entry: Use should_resched() in idtentry_exit_cond_resched()

The TIF_NEED_RESCHED bit is inlined on x86 into the preemption counter. By using should_resched(0) instead of need_resched(), the same check can be performed using only the preempt_count() variable which was already read before. Use should_resched(0) instead of need_resched().

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 2d1c3636a727647cffb1185ccac6897e551a4071
Author: Thomas Gleixner
Date: Fri Oct 26 18:50:54 2012 +0100

sched: Add support for lazy preemption

It has become an obsession to mitigate the determinism vs. throughput loss of RT. Looking at the mainline semantics of preemption points gives a hint why RT sucks throughput-wise for ordinary SCHED_OTHER tasks. One major issue is the wakeup of tasks which right away preempt the waking task while the waking task holds a lock on which the woken task will block right after having preempted the wakee. In mainline this is prevented due to the implicit preemption disable of spin/rw_lock held regions. On RT this is not possible due to the fully preemptible nature of sleeping spinlocks.

Though for a SCHED_OTHER task preempting another SCHED_OTHER task this is really not a correctness issue. RT folks are concerned about SCHED_FIFO/RR task preemption and not about the purely fairness-driven SCHED_OTHER preemption latencies.

So I introduced a lazy preemption mechanism which only applies to SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside of the existing preempt_count, each task now sports a preempt_lazy_count which is manipulated on lock acquisition and release. This is slightly incorrect, as for laziness reasons I coupled this to migrate_disable/enable, so some other mechanisms get the same treatment (e.g. get_cpu_light).

Now on the scheduler side, instead of setting NEED_RESCHED this sets NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and therefore allows the waking task to exit the lock-held region before the woken task preempts it. That also works better for cross-CPU wakeups, as the other side can stay in the adaptive spinning loop.

For RT class preemption there is no change. This simply sets NEED_RESCHED and forgoes the lazy preemption counter.

Initial tests do not expose any observable latency increase, but history shows that I've been proven wrong before :)

The lazy preemption mode is on by default, but with CONFIG_SCHED_DEBUG enabled it can be disabled via:

    # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features

and reenabled via

    # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features

The test results so far are very machine- and workload-dependent, but there is a clear trend that it enhances the non-RT workload performance.

Signed-off-by: Thomas Gleixner
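To make the lazy-preemption scheme above concrete, here is a schematic sketch of the wakeup-side decision; TIF_NEED_RESCHED_LAZY comes from the RT patch, and the SCHED_OTHER test is simplified to the policy field (illustrative only, not the scheduler code):

    #include <linux/sched.h>

    static void request_preemption(struct task_struct *curr,
                                   struct task_struct *waking)
    {
            if (curr->policy == SCHED_NORMAL && waking->policy == SCHED_NORMAL)
                    /* Let the waker leave its lock-held region first. */
                    set_tsk_thread_flag(curr, TIF_NEED_RESCHED_LAZY);
            else
                    /* RT class wakeups keep the immediate semantics. */
                    set_tsk_thread_flag(curr, TIF_NEED_RESCHED);
    }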
commit 4d507329b3b9318f8fddbc1508ce38d324b3325a
Author: Sebastian Andrzej Siewior
Date: Thu Nov 7 17:49:20 2019 +0100

x86: Enable RT also on 32bit

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 9e6ddccc4933cc2770eafd4974bfa32c32031c9e
Author: Sebastian Andrzej Siewior
Date: Wed Aug 7 18:15:38 2019 +0200

x86: Allow to enable RT

Allow RT to be selected.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 60e58fe04abe7b6cfca0edcb14949141f29f97dc
Author: Thomas Gleixner
Date: Sun Nov 6 12:26:18 2011 +0100

x86: kvm Require const tsc for RT

A non-constant TSC is a nightmare on bare metal already, but with virtualization it becomes a complete disaster because the workarounds are horrible latency-wise. That's also a prerequisite for running RT in a guest on top of an RT host.

Signed-off-by: Thomas Gleixner

commit 0cf7fc66884a8a410ff971fbe3c279feb24d7698
Author: Oleg Nesterov
Date: Tue Jul 14 14:26:34 2015 +0200

signal/x86: Delay calling signals in atomic

On x86_64 we must disable preemption before we enable interrupts for stack faults, int3 and debugging, because the current task is using a per-CPU debug stack defined by the IST. If we schedule out, another task can come in and use the same stack and cause the stack to be corrupted and crash the kernel on return.

When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and one of these is the spin lock used in signal handling. Some of the debug code (int3) causes do_trap() to send a signal. This function takes a spin lock that has been converted to a mutex and has the possibility to sleep. If this happens, the above issues with the corrupted stack are possible.

Instead of calling the signal right away, for PREEMPT_RT and x86_64, the signal information is stored in the task's task_struct and TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume code will send the signal when preemption is enabled.

[ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ]

Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt
Signed-off-by: Thomas Gleixner
[bigeasy: also needed on 32bit as per Yang Shi]
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
commit a60399bc57eb8413ec018041b5231eebcd8c6898
Author: Clark Williams
Date: Sat Jul 30 21:55:53 2011 -0500

sysfs: Add /sys/kernel/realtime entry

Add a /sys/kernel entry to indicate that the kernel is a realtime kernel.

Clark says that he needs this for udev rules: udev needs to evaluate whether it's a PREEMPT_RT kernel a few thousand times, and parsing uname output is too slow or so.

Are there better solutions? Should it exist and return 0 on !-rt?

Signed-off-by: Clark Williams
Signed-off-by: Peter Zijlstra
Signed-off-by: Thomas Gleixner

commit 2e2c38e82e2ed71f1c38c1bad9341be63396fa73
Author: Haris Okanovic
Date: Tue Aug 15 15:13:08 2017 -0500

tpm_tis: fix stall after iowrite*()s

ioread8() operations to TPM MMIO addresses can stall the CPU when immediately following a sequence of iowrite*()'s to the same region. For example, cyclictest measures ~400us latency spikes when a non-RT usermode application communicates with an SPI-based TPM chip (Intel Atom E3940 system, PREEMPT_RT kernel). The spikes are caused by a stalling ioread8() operation following a sequence of 30+ iowrite8()s to the same address. I believe this happens because the write sequence is buffered (in the CPU or somewhere along the bus), and gets flushed on the first LOAD instruction (ioread*()) that follows.

The enclosed change appears to fix this issue: read the TPM chip's access register (status code) after every iowrite*() operation to amortize the cost of flushing data to the chip across multiple instructions.

Signed-off-by: Haris Okanovic
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit cf646084521dc80cf95247db9bfd1299e802e498
Author: Thomas Gleixner
Date: Tue Jan 8 21:36:51 2013 +0100

tty/serial/pl011: Make the locking work on RT

The lock is a sleeping lock and local_irq_save() is not the optimisation we are looking for. Redo it to make it work on -RT and non-RT.

Signed-off-by: Thomas Gleixner

commit b42b6e6889e21b06cee7a633c0f1b562e5d56eef
Author: Thomas Gleixner
Date: Thu Jul 28 13:32:57 2011 +0200

tty/serial/omap: Make the locking RT aware

The lock is a sleeping lock and local_irq_save() is not the optimisation we are looking for. Redo it to make it work on -RT and non-RT.

Signed-off-by: Thomas Gleixner

commit f5faab2a86dcb04973c3736f8692db8a05dd8798
Author: Sebastian Andrzej Siewior
Date: Tue Jul 7 12:25:11 2020 +0200

drm/i915/gt: Only disable interrupts for the timeline lock on !force-threaded

According to commit d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe") the interrupts are disabled because the code may be called from an interrupt handler and from preemptible context. With `force_irqthreads' set the timeline mutex is never observed in IRQ context, so it is not needed to disable interrupts. Only disable interrupts if not in `force_irqthreads' mode.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
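The i915 timeline-lock change above boils down to the following pattern; a sketch which assumes the pre-5.15 `force_irqthreads' boolean from <linux/interrupt.h> and is not the actual i915 code:

    #include <linux/interrupt.h>
    #include <linux/spinlock.h>

    static void timeline_lock_sketch(spinlock_t *lock)
    {
            unsigned long flags = 0;

            /* Only disable interrupts when the lock can really be taken from
             * hard-IRQ context, i.e. when interrupts are not force-threaded.
             */
            if (!force_irqthreads)
                    local_irq_save(flags);
            spin_lock(lock);
            /* ... critical section ... */
            spin_unlock(lock);
            if (!force_irqthreads)
                    local_irq_restore(flags);
    }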
commit 872271cf52c484d4f7ccc7fbf44f8861f402b18a
Author: Sebastian Andrzej Siewior
Date: Wed Dec 19 10:47:02 2018 +0100

drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE

The order of the header files is important. If this header file is included after tracepoint.h was included then the NOTRACE here becomes a nop. Currently this happens for two .c files which use the tracepoints behind DRM_I915_LOW_LEVEL_TRACEPOINTS.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 4cad3232d7f0c7e6a3116da68a1c44111b5c25f3
Author: Sebastian Andrzej Siewior
Date: Thu Dec 6 09:52:20 2018 +0100

drm/i915: disable tracing on -RT

Luca Abeni reported this:
| BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003
| CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10
| Call Trace:
| rt_spin_lock+0x3f/0x50
| gen6_read32+0x45/0x1d0 [i915]
| g4x_get_vblank_counter+0x36/0x40 [i915]
| trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915]

The tracing events like trace_i915_pipe_update_start(), among others, use functions which acquire spin locks. A few trace points use intel_get_crtc_scanline(), others use ->get_vblank_counter(), which also might acquire a sleeping lock. Based on this I don't see any other way than to disable trace points on RT.

Cc: stable-rt@vger.kernel.org
Reported-by: Luca Abeni
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit a2a18b7ffa33dc77da5a7b3a2cab414059a035e1
Author: Mike Galbraith
Date: Sat Feb 27 09:01:42 2016 +0100

drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates

Commit 8d7849db3eab7 ("drm/i915: Make sprite updates atomic") started disabling interrupts across atomic updates. This breaks on PREEMPT_RT because within this section the code attempts to acquire spinlock_t locks which are sleeping locks on PREEMPT_RT. According to the comment the interrupts are disabled to avoid random delays and are not required for protection or synchronisation. Don't disable interrupts on PREEMPT_RT during atomic updates.

[bigeasy: drop local locks, commit message]

Signed-off-by: Mike Galbraith
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 13e2d414eb6f1ecf942ed343c093842bc024b1fc
Author: Mike Galbraith
Date: Sat Feb 27 08:09:11 2016 +0100

drm,radeon,i915: Use preempt_disable/enable_rt() where recommended

DRM folks identified the spots, so use them.

Signed-off-by: Mike Galbraith
Signed-off-by: Thomas Gleixner
Cc: Sebastian Andrzej Siewior
Cc: linux-rt-users
Signed-off-by: Thomas Gleixner

commit 3ca5dbf1d1029d2379bf1cf2fd7cc1fb711e60b3
Author: Thomas Gleixner
Date: Tue Aug 21 20:38:50 2012 +0200

random: Make it work on rt

Delegate the random insertion to the forced threaded interrupt handler. Store the return IP of the hard interrupt handler in the irq descriptor and feed it into the random generator as a source of entropy.

Signed-off-by: Thomas Gleixner

commit 4d07cdf96442fca3858211cef9d4420832f899e3
Author: Thomas Gleixner
Date: Thu Dec 16 14:25:18 2010 +0100

x86: stackprotector: Avoid random pool on rt

CPU bringup calls into the random pool to initialize the stack canary. During boot that works nicely even on RT, as the might-sleep checks are disabled. During CPU hotplug the might-sleep checks trigger. Making the locks in random raw is a major PITA, so avoiding the call on RT is the only sensible solution. This is basically the same randomness which we get during boot where the random pool has no entropy and we rely on the TSC randomness.

Reported-by: Carsten Emde
Signed-off-by: Thomas Gleixner
commit 625193bb2705b64d53004ecb4fa3ffec68ddb525
Author: Thomas Gleixner
Date: Tue Jul 14 14:26:34 2015 +0200

panic: skip get_random_bytes for RT_FULL in init_oops_id

Disable on -RT. If this is invoked from irq context we will have problems acquiring the sleeping lock.

Signed-off-by: Thomas Gleixner

commit 435b065c36a9a9d2ee46e61b679f6024dc044ea0
Author: Sebastian Andrzej Siewior
Date: Thu Jul 26 18:52:00 2018 +0200

crypto: cryptd - add a lock instead preempt_disable/local_bh_disable

cryptd has a per-CPU lock which is protected with local_bh_disable() and preempt_disable(). Add an explicit spin_lock to make the locking context more obvious and visible to lockdep. Since it is a per-CPU lock, there should be no lock contention on the actual spinlock. There is a small race window where we could be migrated to another CPU after the cpu_queue has been obtained. This is not a problem because the actual resource is protected by the spinlock.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 63297909b77d84facb0c3341275dfb7ab9b74bd7
Author: Sebastian Andrzej Siewior
Date: Thu Nov 30 13:40:10 2017 +0100

crypto: limit more FPU-enabled sections

Those crypto drivers use SSE/AVX/… for their crypto work, and in order to do so in the kernel they need to enable the "FPU" in kernel mode, which disables preemption. There are two problems with the way they are used:
- the while loop which processes X bytes may create latency spikes and should be avoided or limited.
- the cipher-walk-next part may allocate/free memory and may use kmap_atomic().

The whole kernel_fpu_begin()/end() processing isn't probably that cheap. It most likely makes sense to process as much of those as possible in one go. The new *_fpu_sched_rt() schedules only if an RT task is pending. Probably we should measure the performance of those ciphers in pure SW mode and with this optimisation to see if it makes sense to keep them for RT. This kernel_fpu_resched() makes the code more preemptible, which might hurt performance.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 5190ea9f444791dfa6c60e1cc4fd1d561d79002d
Author: Thomas Gleixner
Date: Sat Nov 12 14:00:48 2011 +0100

scsi/fcoe: Make RT aware.

Do not disable preemption while taking sleeping locks. All users look safe for migrate_disable() only.

Signed-off-by: Thomas Gleixner

commit 8b6bd58088fde59e48b75a090c1baa928dc73faf
Author: Thomas Gleixner
Date: Tue Apr 6 16:51:31 2010 +0200

md: raid5: Make raid5_percpu handling RT aware

__raid_run_ops() disables preemption with get_cpu() around the access to the raid5_percpu variables. That causes scheduling-while-atomic spews on RT. Serialize the access to the percpu data with a lock and keep the code preemptible.

Reported-by: Udo van den Heuvel
Signed-off-by: Thomas Gleixner
Tested-by: Udo van den Heuvel

commit 35da1a4242a1a4559b803e7ad54117c55b42c8a7
Author: Mike Galbraith
Date: Thu Mar 31 04:08:28 2016 +0200

drivers/block/zram: Replace bit spinlocks with rtmutex for -rt

They're nondeterministic, and lead to ___might_sleep() splats in -rt. OTOH, they're a lot less wasteful than an rtmutex per page.

Signed-off-by: Mike Galbraith
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
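The "crypto: limit more FPU-enabled sections" change above bounds how long the preemption-disabling kernel-FPU region lasts; a minimal x86 sketch, where the 4 KiB chunk size is an arbitrary assumption rather than the value used in the patch:

    #include <asm/fpu/api.h>
    #include <linux/kernel.h>
    #include <linux/types.h>

    #define FPU_CHUNK 4096

    static void process_with_fpu(const u8 *buf, size_t len)
    {
            while (len) {
                    size_t n = min_t(size_t, len, FPU_CHUNK);

                    kernel_fpu_begin();     /* disables preemption */
                    /* ... process n bytes with SSE/AVX ... */
                    kernel_fpu_end();       /* preemption point between chunks */
                    buf += n;
                    len -= n;
            }
    }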
commit a377dcde718af1e665957dcb8129337af35161a0
Author: Sebastian Andrzej Siewior
Date: Tue Jul 14 14:26:34 2015 +0200

block/mq: do not invoke preempt_disable()

preempt_disable() and get_cpu() don't play well together with the sleeping locks it tries to allocate later. It seems to be enough to replace it with get_cpu_light() and migrate_disable().

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 8596053f3a473347c05ccc32d236326f9e8300e4
Author: Priyanka Jain
Date: Thu May 17 09:35:11 2012 +0530

net: Remove preemption disabling in netif_rx()

1) enqueue_to_backlog() (called from netif_rx) should be bound to a particular CPU. This can be achieved by disabling migration; no need to disable preemption.

2) Fixes crash "BUG: scheduling while atomic: ksoftirqd" in case of RT. If preemption is disabled, enqueue_to_backlog() is called in atomic context. And if the backlog exceeds its count, kfree_skb() is called. But in RT, kfree_skb() might get scheduled out, so it expects non-atomic context.

- Replace preempt_enable(), preempt_disable() with migrate_enable(), migrate_disable() respectively.
- Replace get_cpu(), put_cpu() with get_cpu_light(), put_cpu_light() respectively.

Signed-off-by: Priyanka Jain
Signed-off-by: Thomas Gleixner
Acked-by: Rajan Srivastava
Cc:
Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com
Signed-off-by: Thomas Gleixner
[bigeasy: Remove assumption about migrate_disable() from the description.]
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 7bcfc0a2920ce68c056698007685ba74bd6578ab
Author: Sebastian Andrzej Siewior
Date: Wed Mar 30 13:36:29 2016 +0200

net: dev: always take qdisc's busylock in __dev_xmit_skb()

The root-lock is dropped before dev_hard_start_xmit() is invoked and after setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away by a task with a higher priority, then the task with the higher priority won't be able to submit packets to the NIC directly; instead they will be enqueued into the Qdisc. The NIC will remain idle until the task(s) with higher priority leave the CPU and the task with lower priority gets back and finishes the job.

If we always take the busylock we ensure that the RT task can boost the low-prio task and submit the packet.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit b91f0be757d7d1e86f7ff94a79a5098e0df0d0c9
Author: Sebastian Andrzej Siewior
Date: Wed Sep 16 16:15:39 2020 +0200

net: Dequeue in dev_cpu_dead() without the lock

Upstream uses skb_dequeue() to acquire the lock of `input_pkt_queue'. The reason is to synchronize against a remote CPU which still thinks that the CPU is online and enqueues packets to this CPU. There are no guarantees that the packet is enqueued before the callback is run; it is just hoped. RT however complains about an uninitialized lock because it uses another lock for `input_pkt_queue' due to the IRQ-off nature of the context. Use the unlocked dequeue version for `input_pkt_queue'.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 08f72cc549122afe69a55b1367dffcc6e7d33016
Author: Thomas Gleixner
Date: Tue Jul 12 15:38:34 2011 +0200

net: Use skbufhead with raw lock

Use the rps lock as a raw lock so we can keep the irq-off regions. It looks low latency. However we can't kfree() from this context, therefore we defer this to the softirq and use the tofree_queue list for it (similar to process_queue).

Signed-off-by: Thomas Gleixner
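get_cpu_light()/put_cpu_light(), used by the netif_rx() and block/mq changes above and provided by the "kernel/sched: add {put|get}_cpu_light()" commit later in this log, keep the caller on one CPU without disabling preemption; an illustrative sketch assuming the RT-patch definitions:

    #include <linux/smp.h>

    static void enqueue_to_local_backlog_sketch(void)
    {
            int cpu = get_cpu_light();      /* migrate_disable() based on RT */

            /* ... add work to the per-CPU queue of 'cpu'; this section may
             *     sleep on RT spinlocks, unlike a get_cpu() section ...
             */
            (void)cpu;
            put_cpu_light();
    }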
commit 34d9adb7cc9441a2dc21bd4862aa54eebb2fb29d
Author: Mike Galbraith
Date: Wed Feb 18 16:05:28 2015 +0100

sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light()

|BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915
|in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd
|Preemption disabled at:[] svc_xprt_received+0x4b/0xc0 [sunrpc]
|CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9
|Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014
| ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002
| 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008
| ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000
|Call Trace:
| [] dump_stack+0x4f/0x9e
| [] __might_sleep+0xe6/0x150
| [] rt_spin_lock+0x24/0x50
| [] svc_xprt_do_enqueue+0x80/0x230 [sunrpc]
| [] svc_xprt_received+0x4b/0xc0 [sunrpc]
| [] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc]
| [] svc_addsock+0x143/0x200 [sunrpc]
| [] write_ports+0x28c/0x340 [nfsd]
| [] nfsctl_transaction_write+0x4c/0x80 [nfsd]
| [] vfs_write+0xb3/0x1d0
| [] SyS_write+0x49/0xb0
| [] system_call_fastpath+0x16/0x1b

Signed-off-by: Mike Galbraith
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit cd5a07cc5b4df4ca9a0dd206eca1dcff622933d5
Author: Sebastian Andrzej Siewior
Date: Fri Jun 16 19:03:16 2017 +0200

net/core: use local_bh_disable() in netif_rx_ni()

In 2004 netif_rx_ni() gained a preempt_disable() section around netif_rx() and its do_softirq() + testing for it. The do_softirq() part is required because netif_rx() raises the softirq but does not invoke it. The preempt_disable() is required to remain on the same CPU which added the skb to the per-CPU list. All this can be avoided by putting this into a local_bh_disable()ed section. The local_bh_enable() part will invoke do_softirq() if required.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 256ed200bb58572900c16418f20442ce61f7f771
Author: Sebastian Andrzej Siewior
Date: Tue Sep 8 16:57:11 2020 +0200

net: Properly annotate the try-lock for the seqlock

In patch ("net/Qdisc: use a seqlock instead seqcount") the seqcount has been replaced with a seqlock to allow the reader to boost the preempted writer. The try_write_seqlock() acquired the lock with a try-lock but the seqcount annotation was "lock". Opencode write_seqcount_t_begin() and use the try-lock annotation for lockdep.

Reported-by: Mike Galbraith
Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 13f621d9f778b2eb23d4c2104aff46245cf7ed69
Author: Sebastian Andrzej Siewior
Date: Wed Sep 14 17:36:35 2016 +0200

net/Qdisc: use a seqlock instead seqcount

The seqcount disables preemption on -RT while it is held, which we can't remove. Also we don't want the reader to spin for ages if the writer is scheduled out. The seqlock on the other hand will serialize / sleep on the lock while the writer is active.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 72d6f4f680bfea38795cc4afd165148cdd6461eb
Author: Scott Wood
Date: Wed Sep 11 17:57:29 2019 +0100

rcutorture: Avoid problematic critical section nesting on RT

rcutorture was generating some nesting scenarios that are not reasonable. Constrain the state selection to avoid them.

Example #1:
1. preempt_disable()
2. local_bh_disable()
3. preempt_enable()
4. local_bh_enable()

On PREEMPT_RT, BH disabling takes a local lock only when called in non-atomic context. Thus, atomic context must be retained until after BH is re-enabled. Likewise, if BH is initially disabled in non-atomic context, it cannot be re-enabled in atomic context.

Example #2:
1. rcu_read_lock()
2. local_irq_disable()
3. rcu_read_unlock()
4. local_irq_enable()

If the thread is preempted between steps 1 and 2, rcu_read_unlock_special.b.blocked will be set, but it won't be acted on in step 3 because IRQs are disabled. Thus, reporting of the quiescent state will be delayed beyond the local_irq_enable().

For now, these scenarios will continue to be tested on non-PREEMPT_RT kernels, until debug checks are added to ensure that they are not happening elsewhere.

Signed-off-by: Scott Wood
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
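The rcutorture constraint above amounts to "leave critical sections in the reverse order you entered them" on PREEMPT_RT; a small sketch of the allowed versus problematic nesting (real APIs, illustrative function):

    #include <linux/bottom_half.h>
    #include <linux/preempt.h>

    static void nesting_example(void)
    {
            /* Fine on PREEMPT_RT: properly nested. */
            preempt_disable();
            local_bh_disable();
            /* ... */
            local_bh_enable();
            preempt_enable();

            /* Problematic on PREEMPT_RT (Example #1 above):
             *   preempt_disable();
             *   local_bh_disable();
             *   preempt_enable();   <- leaves atomic context with BH still off
             *   local_bh_enable();
             */
    }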
commit 8abf1b2d2e8f61dc527e290e3cb83b0cf9fafef0
Author: Sebastian Andrzej Siewior
Date: Wed Mar 10 15:09:02 2021 +0100

rcu: Delay RCU-selftests

Delay RCU-selftests until ksoftirqd is up and running.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 0b9358cbbc0d2e9bd3e1a4788136ea5ddd7b05ee
Author: Thomas Gleixner
Date: Wed Mar 7 21:00:34 2012 +0100

fs: namespace: Use cpu_chill() in trylock loops

Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress.

Signed-off-by: Thomas Gleixner

commit 3e649cdf24ddfabb01964ec59d1c2455f8273df1
Author: Thomas Gleixner
Date: Wed Mar 7 20:51:03 2012 +0100

rt: Introduce cpu_chill()

Retry loops on RT might loop forever when the modifying side was preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill() defaults to cpu_relax() for non-RT. On RT it puts the looping task to sleep for a tick so the preempted task can make progress.

Steven Rostedt changed it to use a hrtimer instead of msleep():
|
|Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken
|up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is
|called from softirq context, it may block the ksoftirqd() from running, in
|which case, it may never wake up the msleep() causing the deadlock.

+ bigeasy later changed to schedule_hrtimeout()
|If a task calls cpu_chill() and gets woken up by a regular or spurious
|wakeup and has a signal pending, then it exits the sleep loop in
|do_nanosleep() and sets up the restart block. If restart->nanosleep.type is
|not TI_NONE then this results in accessing a stale user pointer from a
|previously interrupted syscall and a copy to user based on the stale
|pointer or a BUG() when 'type' is not supported in nanosleep_copyout().

+ bigeasy: add PF_NOFREEZE:
| [....] Waiting for /dev to be fully populated...
| =====================================
| [ BUG: udevd/229 still has locks held! ]
| 3.12.11-rt17 #23 Not tainted
| -------------------------------------
| 1 lock held by udevd/229:
| #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98
|
| stack backtrace:
| CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23
| (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14)
| (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc)
| (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160)
| (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110)
| (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38)
| (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec)
| (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c)
| (dput+0x74/0x15c) from (lookup_real+0x4c/0x50)
| (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44)
| (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98)
| (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc)
| (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60)
| (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c)
| (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c)
| (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94)
| (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30)
| (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48)

Signed-off-by: Thomas Gleixner
Signed-off-by: Steven Rostedt
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
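cpu_chill(), introduced above, is dropped into retry loops in place of cpu_relax(); a minimal usage sketch, assuming the RT patch's declaration in <linux/delay.h> (the trylock loop is illustrative, not taken from fs/namespace.c):

    #include <linux/delay.h>
    #include <linux/spinlock.h>

    static void retry_with_chill(spinlock_t *lock)
    {
            while (!spin_trylock(lock)) {
                    /* On RT: sleep for a tick so a preempted lock holder can
                     * run; on !RT this falls back to cpu_relax().
                     */
                    cpu_chill();
            }
            /* ... critical section ... */
            spin_unlock(lock);
    }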
commit 6577676e1025d244dd51be507fa461f6aa5964d5
Author: Sebastian Andrzej Siewior
Date: Fri Oct 20 11:29:53 2017 +0200

fs/dcache: disable preemption on i_dir_seq's write side

i_dir_seq is an opencoded seqcounter. Based on the code it looks like we could have two writers in parallel despite the fact that the d_lock is held. The problem is that during the write process on RT the preemption is still enabled, and if this process is interrupted by a reader with RT priority then we lock up. To avoid that lockup I am disabling the preemption during the update. The rename of i_dir_seq is there to ensure that new write sides are caught in the future.

Cc: stable-rt@vger.kernel.org
Reported-by: Oleg.Karfich@wago.com
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit b6b25f302612ff245d0b5aaf450d7bd147f499ed
Author: Sebastian Andrzej Siewior
Date: Wed Sep 14 14:35:49 2016 +0200

fs/dcache: use swait_queue instead of waitqueue

__d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock() which disables preemption. As a workaround convert it to swait.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit bbd667d9ef68d0705b8553d9cbbca113940ea602
Author: Sebastian Andrzej Siewior
Date: Thu Aug 29 18:21:04 2013 +0200

ptrace: fix ptrace vs tasklist_lock race

As explained by Alexander Fyodorov:
|read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel,
|and it can remove __TASK_TRACED from task->state (by moving it to
|task->saved_state). If parent does wait() on child followed by a sys_ptrace
|call, the following race can happen:
|
|- child sets __TASK_TRACED in ptrace_stop()
|- parent does wait() which eventually calls wait_task_stopped() and returns
|  child's pid
|- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves
|  __TASK_TRACED flag to saved_state
|- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive()

The patch is based on his initial patch where an additional check is added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is taken in case the caller is interrupted between looking into ->state and ->saved_state.

[ Fix for ptrace_unfreeze_traced() by Oleg Nesterov ]

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
commit c8d68449743241e79e6bf2dbfa08cedde307acd4
Author: Thomas Gleixner
Date: Wed Sep 21 19:57:12 2011 +0200

signal: Revert ptrace preempt magic

Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more than a bandaid around the ptrace design trainwreck. It's not a correctness issue, it's merely a cosmetic bandaid.

Signed-off-by: Thomas Gleixner

commit c6d6f9cc7f5da22dbcd755d426dfba8105f9b1c3
Author: Thomas Gleixner
Date: Fri Jul 3 08:44:34 2009 -0500

mm/scatterlist: Do not disable irqs on RT

For -RT it is enough to keep pagefaults disabled (which is currently handled by kmap_atomic()).

Signed-off-by: Thomas Gleixner

commit d69a7f2680b5776e27bb2ef16c7070596deabd52
Author: Thomas Gleixner
Date: Tue Jul 12 11:39:36 2011 +0200

mm/vmalloc: Another preempt disable region which sucks

Avoid the preempt disable version of get_cpu_var(). The inner lock should provide enough serialisation.

Signed-off-by: Thomas Gleixner

commit 4e9fa61b3153bdf7bddba5e44ec0f243a0be37a9
Author: Mike Galbraith
Date: Tue Mar 22 11:16:09 2016 +0100

mm/zsmalloc: copy with get_cpu_var() and locking

get_cpu_var() disables preemption and triggers a might_sleep() splat later. This is replaced with get_locked_var(). The bit spinlocks are replaced with a proper mutex, which requires a slightly larger struct to allocate.

Signed-off-by: Mike Galbraith
Signed-off-by: Thomas Gleixner
[bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then fixed the size magic]
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 782bf4999ae66d23ce543f91399313d618908b0d
Author: Sebastian Andrzej Siewior
Date: Wed Jan 28 17:14:16 2015 +0100

mm/memcontrol: Replace local_irq_disable with local locks

There are a few local_irq_disable() which then take sleeping locks. This patch converts them to local locks.

[bigeasy: Move unlock after memcg_check_events() in mem_cgroup_swapout(), pointed out by Matt Fleming]

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 25de25aa3b89afacf840dfece4f7b423fae2b8c3
Author: Yang Shi
Date: Wed Oct 30 11:48:33 2013 -0700

mm/memcontrol: Don't call schedule_work_on in preemption disabled context

The following trace is triggered when running ltp oom test cases:

BUG: sleeping function called from invalid context at kernel/rtmutex.c:659
in_atomic(): 1, irqs_disabled(): 0, pid: 17188, name: oom03
Preemption disabled at:[] mem_cgroup_reclaim+0x90/0xe0
CPU: 2 PID: 17188 Comm: oom03 Not tainted 3.10.10-rt3 #2
Hardware name: Intel Corporation Calpella platform/MATXM-CORE-411-B, BIOS 4.6.3 08/18/2010
ffff88007684d730 ffff880070df9b58 ffffffff8169918d ffff880070df9b70
ffffffff8106db31 ffff88007688b4a0 ffff880070df9b88 ffffffff8169d9c0
ffff88007688b4a0 ffff880070df9bc8 ffffffff81059da1 0000000170df9bb0
Call Trace:
[] dump_stack+0x19/0x1b
[] __might_sleep+0xf1/0x170
[] rt_spin_lock+0x20/0x50
[] queue_work_on+0x61/0x100
[] drain_all_stock+0xe1/0x1c0
[] mem_cgroup_reclaim+0x90/0xe0
[] __mem_cgroup_try_charge+0x41a/0xc40
[] ? release_pages+0x1b1/0x1f0
[] ? sched_exec+0x40/0xb0
[] mem_cgroup_charge_common+0x37/0x70
[] mem_cgroup_newpage_charge+0x26/0x30
[] handle_pte_fault+0x618/0x840
[] ? unpin_current_cpu+0x16/0x70
[] ? migrate_enable+0xd4/0x200
[] handle_mm_fault+0x145/0x1e0
[] __do_page_fault+0x1a1/0x4c0
[] ? preempt_schedule_irq+0x4b/0x70
[] ? retint_kernel+0x37/0x40
[] do_page_fault+0xe/0x10
[] page_fault+0x22/0x30

So, to prevent schedule_work_on from being called in preempt disabled context, replace the pair of get/put_cpu() with get/put_cpu_light().

Signed-off-by: Yang Shi
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
commit 6e041ac24c6256fe97ddbc81e7cec054d6c366da
Author: Sebastian Andrzej Siewior
Date: Thu May 20 16:00:41 2021 +0200

mm: memcontrol: Replace disable-IRQ locking with a local_lock

Access to the per-CPU variable memcg_stock is synchronized by disabling interrupts. Convert it to a local_lock which allows RT kernels to substitute it with a real per-CPU lock. On non-RT kernels this maps to local_irq_save() as before, but it also provides lockdep coverage of the critical region. No functional change.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 858043073a33d436f321bd46ad63898bb334cc3a
Author: Sebastian Andrzej Siewior
Date: Thu May 20 12:33:07 2021 +0200

mm: memcontrol: Add an argument to refill_stock() to indicate locking

The access to the per-CPU variable memcg_stock is protected by disabling interrupts. refill_stock() may change the ->caching member and updates the ->nr_pages member. refill_obj_stock() is also accessing memcg_stock (modifies ->nr_pages) and disables interrupts as part of the locking. Since refill_obj_stock() may invoke refill_stock() (via drain_obj_stock() -> obj_cgroup_uncharge_pages()) the "disable interrupts" lock is acquired recursively. Add an argument to refill_stock() to indicate whether it is required to disable interrupts as part of the locking for exclusive memcg_stock access.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit d0964c3449bab426321ffadc7cfd9cfa854a8fce
Author: Sebastian Andrzej Siewior
Date: Mon Aug 17 12:28:10 2020 +0200

u64_stats: Disable preemption on 32bit-UP/SMP with RT during updates

On RT the seqcount_t is required even on UP because the softirq can be preempted. The IRQ handler is threaded so it is also preemptible. Disable preemption on 32bit-RT during value updates. There is no need to disable interrupts on RT because the handler is run threaded, therefore disabling preemption is enough to guarantee that the update is not interrupted.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 62888474f4c7def1b4d7eef2beb101fd3db1c406
Author: Sebastian Andrzej Siewior
Date: Wed Oct 28 18:15:32 2020 +0100

mm/memcontrol: Disable preemption in __mod_memcg_lruvec_state()

The callers expect disabled preemption/interrupts while invoking __mod_memcg_lruvec_state(). This works in mainline because a lock of some kind is acquired. Use preempt_disable_rt() where per-CPU variables are accessed and a stable pointer is expected. This is also done in __mod_zone_page_state() for the same reason.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit fd1745bb1e6a2c236589e8fafd4a460f5b49844a
Author: Ingo Molnar
Date: Fri Jul 3 08:30:13 2009 -0500

mm/vmstat: Protect per cpu variables with preempt disable on RT

Disable preemption on -RT for the vmstat code. On vanilla the code runs in IRQ-off regions while on -RT it does not. "preempt_disable" ensures that the same resources are not updated in parallel due to preemption.

Signed-off-by: Ingo Molnar
Signed-off-by: Thomas Gleixner
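The memcg_stock conversion above follows the standard local_lock pattern; a self-contained sketch with a made-up per-CPU structure standing in for memcg_stock:

    #include <linux/local_lock.h>
    #include <linux/percpu.h>

    struct stock_pcp {
            local_lock_t lock;
            unsigned int nr_pages;
    };

    static DEFINE_PER_CPU(struct stock_pcp, example_stock) = {
            .lock = INIT_LOCAL_LOCK(lock),
    };

    static void refill_stock_sketch(unsigned int nr_pages)
    {
            unsigned long flags;

            /* local_irq_save() on !RT, a per-CPU sleeping lock on PREEMPT_RT,
             * with lockdep coverage in both cases.
             */
            local_lock_irqsave(&example_stock.lock, flags);
            this_cpu_add(example_stock.nr_pages, nr_pages);
            local_unlock_irqrestore(&example_stock.lock, flags);
    }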
commit 07be330e9aa8a8bfad04d6e34c85a0157d2caa5e
Author: Sebastian Andrzej Siewior
Date: Tue Mar 2 18:58:04 2021 +0100

mm: slub: Don't enable partial CPU caches on PREEMPT_RT by default

SLUB's partial CPU caches lead to higher latencies in a hackbench benchmark. Don't enable partial CPU caches by default on PREEMPT_RT.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 1473115e3d4e4f673c726c6eb66d5bdff318d779
Author: Sebastian Andrzej Siewior
Date: Thu Jul 2 14:27:23 2020 +0200

mm: page_alloc: Use migrate_disable() in drain_local_pages_wq()

drain_local_pages_wq() disables preemption to avoid CPU migration during CPU hotplug and can't use cpus_read_lock(). Using migrate_disable() works here, too. The scheduler won't take the CPU offline until the task has left the migrate-disable section. Use migrate_disable() in drain_local_pages_wq().

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 7b51cfe9736bbabeb5bd46f35a6cd1f95771ee10
Author: Sebastian Andrzej Siewior
Date: Fri Jul 2 15:34:24 2021 +0200

mm, slub: Duct tape lockdep_assert_held(local_lock_t) on RT

The local_lock_t needs to be changed to make lockdep_assert_held() magically work.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit ef1919f3b424d7bca40012a9449534e960383663
Author: Sebastian Andrzej Siewior
Date: Tue Jun 23 15:32:51 2015 +0200

irqwork: push most work into softirq context

Initially we deferred all irqwork into softirq because we didn't want the latency spikes if perf or another user was busy and delayed the RT task. The NOHZ trigger (nohz_full_kick_work) was the first user that did not work as expected if it did not run in the original irqwork context, so we had to bring it back somehow for it. push_irq_work_func is the second one that requires this.

This patch adds the IRQ_WORK_HARD_IRQ flag which makes sure the callback runs in raw-irq context. Everything else is deferred into softirq context. Without -RT we have the original behavior.

This patch incorporates tglx's original work, reworked a little, bringing back arch_irq_work_raise() if possible, and a few fixes from Steven Rostedt and Mike Galbraith.

[bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a hard and soft variant]

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 61f56f34c5313ab2d384daad1ed5a28e5ccdd228
Author: Thomas Gleixner
Date: Mon Jul 18 13:59:17 2011 +0200

softirq: Disable softirq stacks for RT

Disable extra stacks for softirqs. We want to preempt softirqs and having them on special IRQ stacks does not make this easier.

Signed-off-by: Thomas Gleixner

commit d98b56489a5c57e693bf38e2f40bd214c824edaf
Author: Thomas Gleixner
Date: Sun Nov 13 17:17:09 2011 +0100

softirq: Check preemption after reenabling interrupts

raise_softirq_irqoff() disables interrupts and wakes the softirq daemon, but after reenabling interrupts there is no preemption check, so the execution of the softirq thread might be delayed arbitrarily. In principle we could add that check to local_irq_enable/restore, but that's overkill as the raise_softirq_irqoff() sections are the only ones which show this behaviour.

Reported-by: Carsten Emde
Signed-off-by: Thomas Gleixner
commit 6d59282185364970687e479dd7278fc0fea1294f
Author: Mike Galbraith
Date: Sun Jan 8 09:32:25 2017 +0100

cpuset: Convert callback_lock to raw_spinlock_t

The two commits below add up to a cpuset might_sleep() splat for RT:

8447a0fee974 cpuset: convert callback_mutex to a spinlock
344736f29b35 cpuset: simplify cpuset_node_allowed API

BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995
in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset
CPU: 135 PID: 11718 Comm: cset Tainted: G E 4.10.0-rt1-rt #4
Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014
Call Trace:
? dump_stack+0x5c/0x81
? ___might_sleep+0xf4/0x170
? rt_spin_lock+0x1c/0x50
? __cpuset_node_allowed+0x66/0xc0
? ___slab_alloc+0x390/0x570
? anon_vma_fork+0x8f/0x140
? copy_page_range+0x6cf/0xb00
? anon_vma_fork+0x8f/0x140
? __slab_alloc.isra.74+0x5a/0x81
? anon_vma_fork+0x8f/0x140
? kmem_cache_alloc+0x1b5/0x1f0
? anon_vma_fork+0x8f/0x140
? copy_process.part.35+0x1670/0x1ee0
? _do_fork+0xdd/0x3f0
? _do_fork+0xdd/0x3f0
? do_syscall_64+0x61/0x170
? entry_SYSCALL64_slow_path+0x25/0x25

The latter ensured that a NUMA box WILL take callback_lock in atomic context by removing the allocator and reclaim path __GFP_HARDWALL usage which prevented such contexts from taking callback_mutex.

One option would be to reinstate __GFP_HARDWALL protections for RT; however, as the 8447a0fee974 changelog states:

The callback_mutex is only used to synchronize reads/updates of cpusets' flags and cpu/node masks. These operations should always proceed fast so there's no reason why we can't use a spinlock instead of the mutex.

Cc: stable-rt@vger.kernel.org
Signed-off-by: Mike Galbraith
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit bf37891999a380f988e1dd8058b3da1c2653e1f5
Author: Thomas Gleixner
Date: Tue Sep 13 16:42:35 2011 +0200

sched: Disable TTWU_QUEUE on RT

The queued remote wakeup mechanism can introduce rather large latencies if the number of migrated tasks is high. Disable it for RT.

Signed-off-by: Thomas Gleixner

commit e1ca6ff2eb5a110d79057b8c2321a7dfc5f27356
Author: Thomas Gleixner
Date: Tue Jun 7 09:19:06 2011 +0200

sched: Do not account rcu_preempt_depth on RT in might_sleep()

RT changes the rcu_preempt_depth semantics, so we cannot check for it in might_sleep().

Signed-off-by: Thomas Gleixner

commit 62c341d62dba37f2f0e2cc43f738582c871e84fa
Author: Sebastian Andrzej Siewior
Date: Mon Nov 21 19:31:08 2016 +0100

kernel/sched: move stack + kprobe clean up to __put_task_struct()

There is no need to free the stack before the task struct (except for reasons mentioned in commit 68f24b08ee89 ("sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK")). This also comes in handy on -RT because we can't free memory in a preempt-disabled region. vfree_atomic() delays the memory cleanup to a worker. Since we move everything to the RCU callback, we can also free it immediately.

Cc: stable-rt@vger.kernel.org #for kprobe_flush_task()
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 7da87732639e954c37b4885e4dfdf6facf1f13c9
Author: Thomas Gleixner
Date: Mon Jun 6 12:20:33 2011 +0200

sched: Move mmdrop to RCU on RT

Takes sleeping locks and calls into the memory allocator, so nothing we want to do in task switch and other atomic contexts.

Signed-off-by: Thomas Gleixner
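The cpuset change above is the usual spinlock_t to raw_spinlock_t conversion for a lock that must stay non-sleeping on RT; a minimal sketch (lock and data are illustrative, not the cpuset code):

    #include <linux/spinlock.h>

    static DEFINE_RAW_SPINLOCK(example_callback_lock);
    static unsigned long example_flags_word;

    static void update_flags(unsigned long set)
    {
            unsigned long flags;

            /* raw_spinlock_t stays a spinning lock on PREEMPT_RT, so this is
             * legal in atomic context; keep the critical section short.
             */
            raw_spin_lock_irqsave(&example_callback_lock, flags);
            example_flags_word |= set;
            raw_spin_unlock_irqrestore(&example_callback_lock, flags);
    }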
commit 93fcb6de922b66624a3c79445f39274f781449cb
Author: Thomas Gleixner
Date: Mon Jun 6 12:12:51 2011 +0200

sched: Limit the number of task migrations per batch

Put an upper limit on the number of tasks which are migrated per batch to avoid large latencies.

Signed-off-by: Thomas Gleixner

commit efad5fed9c9bdb626789a766faac249882d6cf52
Author: Sebastian Andrzej Siewior
Date: Sat May 27 19:02:06 2017 +0200

kernel/sched: add {put|get}_cpu_light()

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 468c014d5e3733525eff26af855a6553736eaccb
Author: Thomas Gleixner
Date: Fri Jul 24 12:38:56 2009 +0200

preempt: Provide preempt_*_(no)rt variants

RT needs a few preempt_disable/enable points which are not necessary otherwise. Implement variants to avoid #ifdeffery.

Signed-off-by: Thomas Gleixner

commit 32f3c134a2774d83da33407cba1ae3cef57533ee
Author: Sebastian Andrzej Siewior
Date: Tue Oct 17 16:36:18 2017 +0200

lockdep: disable self-test

The self-test wasn't always 100% accurate for RT. We disabled a few tests which failed because they had a different semantic for RT. Some still reported false positives. Now the selftest locks up the system during boot and it needs to be investigated…

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit 63bf6d3149bbb052d6c49ccc65419815f5e79596
Author: Josh Cartwright
Date: Wed Jan 28 13:08:45 2015 -0600

lockdep: selftest: fix warnings due to missing PREEMPT_RT conditionals

"lockdep: Selftest: Only do hardirq context test for raw spinlock" disabled the execution of certain tests with PREEMPT_RT, but did not prevent the tests from still being defined. This leads to warnings like:

./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:574:1: warning: 'irqsafe1_hard_rlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_12' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:577:1: warning: 'irqsafe1_hard_wlock_21' defined but not used [-Wunused-function]
./linux/lib/locking-selftest.c:580:1: warning: 'irqsafe1_soft_spin_12' defined but not used [-Wunused-function]
...

Fixed by wrapping the test definitions in #ifndef CONFIG_PREEMPT_RT conditionals.

Signed-off-by: Josh Cartwright
Signed-off-by: Xander Huff
Signed-off-by: Thomas Gleixner
Acked-by: Gratian Crisan
Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit d73448cc523ceaa187a8b8794ce511fdbf3a3836
Author: Yong Zhang
Date: Mon Apr 16 15:01:56 2012 +0800

lockdep: selftest: Only do hardirq context test for raw spinlock

On -rt there is no softirq context any more and rwlock is sleepable, so disable the softirq context test and the rwlock+irq test.

Signed-off-by: Yong Zhang
Signed-off-by: Thomas Gleixner
Cc: Yong Zhang
Link: http://lkml.kernel.org/r/1334559716-18447-3-git-send-email-yong.zhang0@gmail.com
Signed-off-by: Thomas Gleixner

commit a89aefc8729891931c25b9c47966254bc4c854f3
Author: Thomas Gleixner
Date: Sun Jul 17 18:51:23 2011 +0200

lockdep: Make it RT aware

Teach lockdep that we don't really do softirqs on -RT.

Signed-off-by: Thomas Gleixner

commit 5419cc6547b3f7a1d22f354e7ce1d4b936b0f556
Author: Sebastian Andrzej Siewior
Date: Fri Aug 4 17:40:42 2017 +0200

locking: don't check for __LINUX_SPINLOCK_TYPES_H on -RT archs

Upstream uses arch_spinlock_t within spinlock_t and requests that the spinlock_types.h header file is included first. On -RT we have the rt_mutex with its raw_lock wait_lock which needs the architecture's spinlock_types.h header file for its definition. However we need rt_mutex first because it is used to build the spinlock_t, so that check does not work for us. Therefore I am dropping that check.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner
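The preempt_*_(no)rt variants referenced above (and used by e.g. the __mod_memcg_lruvec_state() change) avoid #ifdeffery at the call sites; a schematic of how such helpers are typically defined in the RT patch (the definitions shown here are an assumption, not copied from the tree):

    #include <linux/preempt.h>

    #ifdef CONFIG_PREEMPT_RT
    # define preempt_disable_rt()           preempt_disable()
    # define preempt_enable_rt()            preempt_enable()
    # define preempt_disable_nort()         barrier()
    # define preempt_enable_nort()          barrier()
    #else
    # define preempt_disable_rt()           barrier()
    # define preempt_enable_rt()            barrier()
    # define preempt_disable_nort()         preempt_disable()
    # define preempt_enable_nort()          preempt_enable()
    #endif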
commit d1184a4a2389a330df33161a3a28f4c974f876e6
Author: Sebastian Andrzej Siewior
Date: Thu May 20 18:09:38 2021 +0200

locking/RT: Add might sleeping annotation.

Signed-off-by: Sebastian Andrzej Siewior
Signed-off-by: Thomas Gleixner

commit df4f4e03b7ca6f74a0acb53024b317c5e86ac9ae
Author: Thomas Gleixner
Date: Tue Apr 13 23:34:56 2021 +0200

locking/local_lock: Add RT support

On PREEMPT_RT enabled kernels local_lock has a real spinlock inside. Provide the necessary macros to substitute the non-RT variants.

Signed-off-by: Thomas Gleixner

commit 0fdc3cb86fec46049c4f54beb85bc82034c7c285
Author: Thomas Gleixner
Date: Tue Apr 13 23:26:09 2021 +0200

locking/local_lock: Prepare for RT support

PREEMPT_RT enabled kernels will add a real lock to local_lock and have to replace the preemption/interrupt disable/enable pairs by migrate_disable/enable pairs. To avoid duplicating the inline helpers for RT, provide defines which map the relevant invocations to the non-RT variants. No functional change.

Signed-off-by: Thomas Gleixner

commit ba5d7ea8cf4ce7d0e48760cf286d8ed7261783b3
Author: Steven Rostedt
Date: Tue Jul 6 16:36:57 2021 +0200

locking/rtmutex: Add adaptive spinwait mechanism

Going to sleep when a spinlock or rwlock is contended can be quite inefficient when the contention time is short and the lock owner is running on a different CPU. The MCS mechanism is not applicable to rtmutex based locks, so provide a simple adaptive spinwait mechanism for the RT specific spin/rwlock implementations.

[ tglx: Provide a contemporary changelog ]

Originally-by: Gregory Haskins
Signed-off-by: Steven Rostedt
Signed-off-by: Thomas Gleixner

commit 8e39f8f9c367fd009ddd27e2376539c0712f1fca
Author: Gregory Haskins
Date: Tue Jul 6 16:36:57 2021 +0200

locking/rtmutex: Implement equal priority lock stealing

The current logic only allows lock stealing to occur if the current task is of higher priority than the pending owner. Significant throughput improvements can be gained by allowing lock stealing to include tasks of equal priority when the contended lock is a spin_lock or a rw_lock and the tasks are not RT scheduling tasks. The assumption was that the system will make faster progress by allowing the task already on the CPU to take the lock rather than waiting for the system to wake up a different task.

This does add a degree of unfairness, but in reality no negative side effects have been observed in the many years that this has been used in the RT kernel.

[ tglx: Refactored and rewritten several times by Steve Rostedt, Sebastian Siewior and myself ]

Signed-off-by: Gregory Haskins
Signed-off-by: Thomas Gleixner

commit aa8c4cd5600d09576939f6093512260825adb046
Author: Thomas Gleixner
Date: Tue Jul 6 16:36:57 2021 +0200

preempt: Adjust PREEMPT_LOCK_OFFSET for RT

On PREEMPT_RT regular spinlocks and rwlocks are substituted with rtmutex based constructs. spin/rwlock held regions are preemptible on PREEMPT_RT, so PREEMPT_LOCK_OFFSET has to be 0 to make the various cond_resched_*lock() functions work correctly.

Signed-off-by: Thomas Gleixner
Signed-off-by: Thomas Gleixner commit eaaa5e87721d080f69479f80657c2256fb6d82ee Author: Thomas Gleixner Date: Tue Jul 6 16:36:57 2021 +0200 rtmutex: Prevent lockdep false positive with PI futexes On PREEMPT_RT the futex hashbucket spinlock becomes 'sleeping' and rtmutex based. That causes a lockdep false positive because some of the futex functions invoke spin_unlock(&hb->lock) with the wait_lock of the rtmutex associated to the pi_futex held. spin_unlock() in turn takes wait_lock of the rtmutex on which the spinlock is based which makes lockdep notice a lock recursion. Give the futex/rtmutex wait_lock a seperate key. Signed-off-by: Thomas Gleixner commit a796a11915bbe53f55c748a53f116f01424b73df Author: Thomas Gleixner Date: Tue Jul 6 16:36:57 2021 +0200 futex: Prevent requeue_pi() lock nesting issue on RT The requeue_pi() operation on RT kernels creates a problem versus the task::pi_blocked_on state when a waiter is woken early (signal, timeout) and that early wake up interleaves with the requeue_pi() operation. When the requeue manages to block the waiter on the rtmutex which is associated to the second futex, then a concurrent early wakeup of that waiter faces the problem that it has to acquire the hash bucket spinlock, which is not an issue on non-RT kernels, but on RT kernels spinlocks are substituted by 'sleeping' spinlocks based on rtmutex. If the hash bucket lock is contended then blocking on that spinlock would result in a impossible situation: blocking on two locks at the same time (the hash bucket lock and the rtmutex representing the PI futex). It was considered to make the hash bucket locks raw_spinlocks, but especially requeue operations with a large amount of waiters can introduce significant latencies, so that's not an option for RT. The RT tree carried a solution which (ab)used task::pi_blocked_on to store the information about an ongoing requeue and an early wakeup which worked, but required to add checks for these special states all over the place. The distangling of an early wakeup of a waiter for a requeue_pi() operation is already looking at quite some different states and the task::pi_blocked_on magic just expanded that to a hard to understand 'state machine'. This can be avoided by keeping track of the waiter/requeue state in the futex_q object itself. Add a requeue_state field to struct futex_q with the following possible states: Q_REQUEUE_PI_NONE Q_REQUEUE_PI_IGNORE Q_REQUEUE_PI_IN_PROGRESS Q_REQUEUE_PI_WAIT Q_REQUEUE_PI_DONE Q_REQUEUE_PI_LOCKED The waiter starts with state = NONE and the following state transitions are valid: On the waiter side: Q_REQUEUE_PI_NONE -> Q_REQUEUE_PI_IGNORE Q_REQUEUE_PI_IN_PROGRESS -> Q_REQUEUE_PI_WAIT On the requeue side: Q_REQUEUE_PI_NONE -> Q_REQUEUE_PI_INPROGRESS Q_REQUEUE_PI_IN_PROGRESS -> Q_REQUEUE_PI_DONE/LOCKED Q_REQUEUE_PI_IN_PROGRESS -> Q_REQUEUE_PI_NONE (requeue failed) Q_REQUEUE_PI_WAIT -> Q_REQUEUE_PI_DONE/LOCKED Q_REQUEUE_PI_WAIT -> Q_REQUEUE_PI_IGNORE (requeue failed) The requeue side ignores a waiter with state Q_REQUEUE_PI_IGNORE as this signals that the waiter is already on the way out. It also means that the waiter is still on the 'wait' futex, i.e. uaddr1. The waiter side signals early wakeup to the requeue side either through setting state to Q_REQUEUE_PI_IGNORE or to Q_REQUEUE_PI_WAIT depending on the current state. In case of Q_REQUEUE_PI_IGNORE it can immediately proceed to take the hash bucket lock of uaddr1. 
If it set state to WAIT, which means the wakeup is interleaving with a requeue in progress it has to wait for the requeue side to change the state. Either to DONE/LOCKED or to IGNORE. DONE/LOCKED means the waiter q is now on the uaddr2 futex and either blocked (DONE) or has acquired it (LOCKED). IGNORE is set by the requeue side when the requeue attempt failed via deadlock detection and therefore the waiter's futex_q is still on the uaddr1 futex. While this is not strictly required on !RT making this unconditional has the benefit of common code and it also allows the waiter to avoid taking the hash bucket lock on the way out in certain cases, which reduces contention. Add the required helpers required for the state transitions, invoke them at the right places and restructure the futex_wait_requeue_pi() code to handle the return from wait (early or not) based on the state machine values. On !RT enabled kernels the waiter spin waits for the state going from Q_REQUEUE_PI_WAIT to some other state, on RT enabled kernels this is handled by rcuwait_wait_event() and the corresponding wake up on the requeue side. Signed-off-by: Thomas Gleixner commit caf90d1280afb870b6e3d4798d828dabc90a4b29 Author: Thomas Gleixner Date: Tue Jul 6 16:36:56 2021 +0200 futex: Clarify comment in futex_requeue() The comment about the restriction of the number of waiters to wake for the REQUEUE_PI case is confusing at best. Rewrite it. Signed-off-by: Thomas Gleixner commit e67ddc796fc167bc36b776c3c163ccf1bc37994f Author: Thomas Gleixner Date: Tue Jul 6 16:36:56 2021 +0200 futex: Restructure futex_requeue() No point in taking two more 'requeue_pi' conditionals just to get to the requeue. Same for the requeue_pi case just the other way round. No functional change. Signed-off-by: Thomas Gleixner commit f3ffb1c6a2e8b84dd36557f3a2a30d9a2c82bd15 Author: Thomas Gleixner Date: Tue Jul 6 16:36:55 2021 +0200 futex: Correct the number of requeued waiters for PI The accounting is wrong when either the PI sanity check or the requeue PI operation fails. Adjust it in the failure path. Will be simplified in the next step. Signed-off-by: Thomas Gleixner commit d8411abc3a26fa8cd1fd0baac3447dc27c95eb81 Author: Thomas Gleixner Date: Tue Jul 6 16:36:54 2021 +0200 futex: Cleanup stale comments The futex key reference mechanism is long gone. Cleanup the stale comments which still mention it. Signed-off-by: Thomas Gleixner commit 1e1c70cfd49d70fd1b69622c6a797c539fd451c9 Author: Thomas Gleixner Date: Tue Jul 6 16:36:54 2021 +0200 futex: Validate waiter correctly in futex_proxy_trylock_atomic() The loop in futex_requeue() has a sanity check for the waiter which is missing in futex_proxy_trylock_atomic(). In theory the key2 check is sufficient, but futexes are cursed so add it for completness and paranoia sake. Signed-off-by: Thomas Gleixner commit fc6e6a83c2ab412e7058f6afd9867f9fbd874081 Author: Sebastian Andrzej Siewior Date: Thu Jul 1 17:50:20 2021 +0200 lib/test_lockup: Adapt to changed variables. The inner parts of certain locks (mutex, rwlocks) changed due to a rework for RT and non RT code. Most users remain unaffected, but those who fiddle around in the inner parts need to be updated. Match the struct names to the newer layout. 
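Returning to the requeue_pi state machine described a few commits above, the states and the legal transitions can be captured in a compact sketch. The enum tag is made up for illustration; the state names and transitions are taken from the changelog:

    /* Per-waiter requeue state as listed in the changelog above. */
    enum q_requeue_pi_state {
            Q_REQUEUE_PI_NONE,
            Q_REQUEUE_PI_IGNORE,
            Q_REQUEUE_PI_IN_PROGRESS,
            Q_REQUEUE_PI_WAIT,
            Q_REQUEUE_PI_DONE,
            Q_REQUEUE_PI_LOCKED,
    };

    /*
     * Waiter side:   NONE        -> IGNORE         (early wakeup, no requeue yet)
     *                IN_PROGRESS -> WAIT           (early wakeup races with requeue)
     * Requeue side:  NONE        -> IN_PROGRESS
     *                IN_PROGRESS -> DONE / LOCKED
     *                IN_PROGRESS -> NONE           (requeue failed)
     *                WAIT        -> DONE / LOCKED
     *                WAIT        -> IGNORE         (requeue failed)
     */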
Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit d64c4ab5d3e8d87647abc972f73b9f1dd7c9d495 Author: Thomas Gleixner Date: Tue Jul 6 16:36:52 2021 +0200 locking/rtmutex: Add mutex variant for RT Add the necessary defines, helpers and API functions for replacing mutex on a PREEMPT_RT enabled kernel with a rtmutex based variant. If PREEMPT_RT is enabled then the regular 'struct mutex' is renamed to 'struct __mutex', which is still typedeffed as '_mutex_t' to allow the standalone compilation and utilization of ww_mutex. No functional change when CONFIG_PREEMPT_RT=n Signed-off-by: Thomas Gleixner commit 8cbe9cb30f66789698965a0d27263fb7e76dee06 Author: Thomas Gleixner Date: Tue Jul 6 16:36:51 2021 +0200 locking/mutex: Exclude non-ww_mutex API for RT In order to build ww_mutex standalone on RT and to replace mutex with a RT specific rtmutex based variant, guard the non-ww_mutex API so it is only built when CONFIG_PREEMPT_RT is disabled. No functional change. Signed-off-by: Thomas Gleixner commit 9e1721dcf72bb6a1ad2de92e52216a3e280224a6 Author: Thomas Gleixner Date: Tue Jul 6 16:36:51 2021 +0200 locking/mutex: Rearrange items in mutex.h Move the lockdep map initializer to a different place so it can be shared with the upcoming RT variant of struct mutex. No functional change. Signed-off-by: Thomas Gleixner commit 61df59103205964c4d86134a153e51a4a931cfb9 Author: Thomas Gleixner Date: Tue Jul 6 16:36:51 2021 +0200 locking/mutex: Replace struct mutex in core code PREEMPT_RT replaces 'struct mutex' with a rtmutex based variant so all mutex operations are included into the priority inheritance scheme, but wants to utilize the ww_mutex specific part of the regular mutex implementation as is. As the regular mutex and ww_mutex implementation are tightly coupled (ww_mutex has a 'struct mutex' inside) and share a lot of code (ww_mutex is mostly an extension) a simple replacement of 'struct mutex' does not work. 'struct mutex' has a typedef '_mutex_t' associated. Replace all 'struct mutex' references in the mutex code code with '_mutex_t' which allows to have a RT specific 'struct mutex' in the final step. No functional change. Signed-off-by: Thomas Gleixner commit 9e38af086bbd692fcc07bd24b66b27ef6d775f07 Author: Thomas Gleixner Date: Tue Jul 6 16:36:51 2021 +0200 locking/ww_mutex: Switch to _mutex_t PREEMPT_RT replaces 'struct mutex' with a rtmutex based variant so all mutex operations are included into the priority inheritance scheme, but wants to utilize the ww_mutex specific part of the regular mutex implementation as is. As the regular mutex and ww_mutex implementation are tightly coupled (ww_mutex has a 'struct mutex' inside) and share a lot of code (ww_mutex is mostly an extension) a simple replacement of 'struct mutex' does not work. 'struct mutex' has a typedef '_mutex_t' associated. Replace all 'struct mutex' references in ww_mutex with '_mutex_t' which allows to have a RT specific 'struct mutex' in the final step. No functional change. Signed-off-by: Thomas Gleixner commit 0d00cb96f75047a2c69c59a49c65d86509122673 Author: Thomas Gleixner Date: Tue Jul 6 16:36:51 2021 +0200 locking/mutex: Rename the ww_mutex relevant functions In order to build ww_mutex standalone for PREEMPT_RT and to allow replacing the regular mutex with an RT specific rtmutex based variant, rename a few ww_mutex relevant functions, so the final RT build does not have namespace collisions. No functional change. 
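The _mutex_t indirection that the mutex commits above keep referring to can be pictured roughly as follows. The field contents are placeholders and the exact layout is not reproduced here; only the type plumbing matters:

    #ifdef CONFIG_PREEMPT_RT
    struct __mutex { unsigned long placeholder; }; /* regular mutex guts, kept for ww_mutex */
    typedef struct __mutex _mutex_t;
    struct mutex   { unsigned long placeholder; }; /* rtmutex based substitution            */
    #else
    struct mutex   { unsigned long placeholder; }; /* regular mutex, as before              */
    typedef struct mutex _mutex_t;
    #endif

    struct ww_mutex {
            _mutex_t base;  /* ww_mutex always embeds the regular implementation */
    };

This is why the preceding commits systematically replace 'struct mutex' with '_mutex_t' inside the mutex core and ww_mutex: once the indirection exists, the RT build can swap the definition behind 'struct mutex' without touching ww_mutex.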
Signed-off-by: Thomas Gleixner commit c0652713c31434fb54868cac4788398791489025 Author: Thomas Gleixner Date: Tue Jul 6 16:36:50 2021 +0200 locking/mutex: Introduce _mutex_t PREEMPT_RT replaces 'struct mutex' with a rtmutex based variant so all mutex operations are included into the priority inheritance scheme. But a complete replacement of the mutex implementation would require to reimplement ww_mutex on top of the rtmutex based variant. That has been tried, but the outcome is dubious if not outright wrong in some cases: 1) ww_mutex by it's semantics can never provide any realtime properties 2) The waiter ordering of ww_mutex depends on the associated context stamp, which is not possible with priority based ordering on a rtmutex based implementation So a rtmutex based ww_mutex would be semanticaly different and incomplete. Aside of that the ww_mutex specific helpers cannot be shared between the regular mutex and the RT variant, so they are likely to diverge further and grow different properties and bugs. The alternative solution is to make it possible to compile the ww_mutex specific part of the regular mutex implementation as is on RT and have a rtmutex based 'struct mutex' variant. As the regular mutex and ww_mutex implementation are tightly coupled (ww_mutex has a 'struct mutex' inside) and share a lot of code (ww_mutex is mostly an extension) a simple replacement of 'struct mutex' does not work. To solve this attach a typedef to 'struct mutex': _mutex_t This new type is then used to replace 'struct mutex' in 'struct ww_mutex', in a few helper functions and in the actual regular mutex code. None of the actual usage sites of mutexes are affected. That allows in the final step to have a RT specific 'struct mutex' and the regular _mutex_t type. Signed-off-by: Thomas Gleixner commit b0bb8d3b97f78809f0a1453105ad5494a39ffeff Author: Thomas Gleixner Date: Tue Jul 6 16:36:50 2021 +0200 locking/mutex: Make mutex::wait_lock raw PREEMPT_RT wants to utilize the existing ww_mutex implementation instead of trying to mangle ww_mutex functionality into the rtmutex based mutex implementation. The mutex internal wait_lock is a regular spinlock which would be converted to a sleeping spinlock on RT, but that's not really required because the wait_lock held times are short and limited. Convert it to a raw_spinlock like the wait_lock of rtmutex. Signed-off-by: Thomas Gleixner commit 52f869aee24db1e6415a4c6b8422f88b650008c1 Author: Thomas Gleixner Date: Tue Jul 6 16:36:50 2021 +0200 locking/ww_mutex: Move ww_mutex declarations into ww_mutex.h Move the ww_mutex declarations in the ww_mutex specific header where they belong. Preperatory change to allow compiling ww_mutex standalone. Signed-off-by: Thomas Gleixner commit 5e5f46d3edac9840b14f906dbe6b097cc5d8ce90 Author: Thomas Gleixner Date: Tue Jul 6 16:36:50 2021 +0200 locking/mutex: Move waiter to core header Move the mutex waiter declaration from the global to the core local header. There is no reason to expose it outside of the core code. Signed-off-by: Thomas Gleixner commit 72a31aa31eaeef8f1252d555856778269b9fc4e0 Author: Thomas Gleixner Date: Tue Jul 6 16:36:50 2021 +0200 locking/mutex: Consolidate core headers Having two header files which contain just the non-debug and debug variants is mostly waste of disc space and has no real value. Stick the debug variants into the common mutex.h file as counterpart to the stubs for the non-debug case. 
That allows to add helpers and defines to the common header for the upcoming handling of mutexes and ww_mutexes on PREEMPT_RT. Signed-off-by: Thomas Gleixner commit 387b12793ce0faedbfc4563aaf557f664c95f1a9 Author: Thomas Gleixner Date: Tue Jul 6 16:36:49 2021 +0200 locking/rwlock: Provide RT variant Similar to rw_semaphores on RT the rwlock substitution is not writer fair because it's not feasible to have a writer inherit it's priority to multiple readers. Readers blocked on a writer follow the normal rules of priority inheritance. Like RT spinlocks RT rwlocks are state preserving accross the slow lock operations (contended case). Signed-off-by: Thomas Gleixner commit 3779e68a306bca13763583d465132b6fba9c56fe Author: Thomas Gleixner Date: Tue Jul 6 16:36:49 2021 +0200 locking/spinlock: Provide RT variant Provide the actual locking functions which make use of the general and spinlock specific rtmutex code. Signed-off-by: Thomas Gleixner commit 052c362a00b3b3f3187dcc93c221c2e023d92dde Author: Thomas Gleixner Date: Tue Jul 6 16:36:49 2021 +0200 locking/rtmutex: Provide the spin/rwlock core lock function A simplified version of the rtmutex slowlock function which neither handles signals nor timeouts and is careful about preserving the state of the blocked task accross the lock operation. Signed-off-by: Thomas Gleixner commit a78d15145be2367460595d306aeda6609f23777b Author: Thomas Gleixner Date: Tue Jul 6 16:36:49 2021 +0200 locking/spinlock: Provide RT variant header Provide the necessary wrappers around the actual rtmutex based spinlock implementation. Signed-off-by: Thomas Gleixner commit 1d0de8e9dd1017dd85087f05719cfc50f87e80ef Author: Thomas Gleixner Date: Tue Jul 6 16:36:49 2021 +0200 locking/spinlock: Provide RT specific spinlock type RT replaces spinlocks with a simple wrapper around a rtmutex which turns spinlocks on RT into 'sleeping' spinlocks. The actual implementation of the spinlock API differs from a regular rtmutex as it does neither handle timeouts nor signals and it is state preserving accross the lock operation. Signed-off-by: Thomas Gleixner commit 1eabaec31e25dd11f2b915841755733c010bf151 Author: Sebastian Andrzej Siewior Date: Tue Jul 6 16:36:48 2021 +0200 locking/rtmutex: Include only rbtree types rtmutex.h needs the definition of struct rb_root_cached. rbtree.h includes kernel.h which includes spinlock.h. That works nicely for non-RT enabled kernels, but on RT enabled kernels spinlocks are based on rtmutexes which creates another circular header dependency as spinlocks.h will require rtmutex.h. Include rbtree_types.h instead. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 3f1005975f9d1e02c35e5c720d91f284fc5a6b1f Author: Sebastian Andrzej Siewior Date: Tue Jul 6 16:36:48 2021 +0200 rbtree: Split out the rbtree type definitions rtmutex.h needs the definition of struct rb_root_cached. rbtree.h includes kernel.h which includes spinlock.h. That works nicely for non-RT enabled kernels, but on RT enabled kernels spinlocks are based on rtmutexes which creates another circular header dependency as spinlocks.h will require rtmutex.h. Split out the type definitions and move them into their own header file so the rtmutex header can include just those. 
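The split described in the rbtree commit above roughly yields a types-only header along these lines (field names follow the rbtree types; the alignment attribute and comments are omitted):

    /* rbtree_types.h sketch: bare type definitions, no kernel.h/spinlock.h. */
    struct rb_node {
            unsigned long   __rb_parent_color;
            struct rb_node  *rb_right;
            struct rb_node  *rb_left;
    };

    struct rb_root {
            struct rb_node *rb_node;
    };

    struct rb_root_cached {
            struct rb_root  rb_root;
            struct rb_node  *rb_leftmost;
    };

With only these definitions in play, rtmutex.h can declare its waiter tree without pulling in the spinlock headers that would otherwise recurse on RT.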
Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 98c59a95a1469a0f66f4e4f2ded3f45b97705816 Author: Sebastian Andrzej Siewior Date: Tue Jul 6 16:36:48 2021 +0200 locking/lockdep: Reduce includes in debug_locks.h The inclusion of printk.h leads to a circular dependency if spinlock_t is based on rtmutexes on RT enabled kernels. Include only atomic.h (xchg()) and cache.h (__read_mostly) which is all what debug_locks.h requires. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit e0594a0d6b0fa58257aaccc593e49f908105287a Author: Sebastian Andrzej Siewior Date: Tue Jul 6 16:36:48 2021 +0200 locking/rtmutex: Prevent future include recursion hell rtmutex only needs raw_spinlock_t, but it includes spinlock_types.h which is not a problem on an non RT enabled kernel. RT kernels substitute regular spinlocks with 'sleeping' spinlocks which are based on rtmutexes and therefore must be able to include rtmutex.h. Include spinlock_types_raw.h instead. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit d1739ad7ce7c4d5120e3b58f06834fa848d3caa5 Author: Thomas Gleixner Date: Tue Jul 6 16:36:48 2021 +0200 locking/spinlock: Split the lock types header Move raw_spinlock into its own file. Prepare for RT 'sleeping spinlocks' to avoid header recursion as RT locks require rtmutex.h which in turn requires the raw spinlock types. No functional change. Signed-off-by: Thomas Gleixner commit a891c54d9b25fbaafb3f20fae54ad20e3229c150 Author: Thomas Gleixner Date: Tue Jul 6 16:36:47 2021 +0200 locking/rtmutex: Guard regular sleeping locks specific functions Guard the regular sleeping lock specific functionality which is used for rtmutex on non-RT enabled kernels and for mutex, rtmutex and semaphores on RT enabled kernels so the code can be reused for the RT specific implementation of spinlocks and rwlocks in a different compilation unit. No functional change. Signed-off-by: Thomas Gleixner commit a7844d974babf05694b7e13eb6507f63cad74bb1 Author: Thomas Gleixner Date: Tue Jul 6 16:36:47 2021 +0200 locking/rtmutex: Prepare RT rt_mutex_wake_q for RT locks Add a rtlock_task pointer to rt_mutex_wake_q which allows to handle the RT specific wakeup for spin/rwlock waiters. The pointer is just consuming 4/8 bytes on stack so it is provided unconditionaly to avoid #ifdeffery all over the place. No functional change for non-RT enabled kernels. Signed-off-by: Thomas Gleixner commit 908d2944c4eb56c1c7cadb98e1492c6462567811 Author: Thomas Gleixner Date: Tue Jul 6 16:36:47 2021 +0200 locking/rtmutex: Use rt_mutex_wake_q_head Prepare for the required state aware handling of waiter wakeups via wake_q and switch the rtmutex code over to the rtmutex specific wrapper. No functional change. Signed-off-by: Thomas Gleixner commit a0477bfbc5bb986f348c4b9ac7bf3f0893da0596 Author: Thomas Gleixner Date: Tue Jul 6 16:36:47 2021 +0200 locking/rtmutex: Provide rt_mutex_wake_q and helpers To handle the difference of wakeups for regular sleeping locks (mutex, rtmutex, rw_semaphore) and the wakeups for 'sleeping' spin/rwlocks on PREEMPT_RT enabled kernels correctly, it is required to provide a wake_q construct which allows to keep them seperate. Provide a wrapper around wake_q and the required helpers, which will be extended with the state handling later. No functional change. 
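A rough shape of the wake_q wrapper introduced in the two commits above; the wake_q_head fields are repeated only to keep the sketch self-contained, and the wrapper's exact name in the tree may differ from the changelog wording:

    struct wake_q_node { struct wake_q_node *next; };

    struct wake_q_head {
            struct wake_q_node  *first;
            struct wake_q_node **lastp;
    };

    struct task_struct;                      /* opaque here */

    struct rt_mutex_wake_q_head {
            struct wake_q_head  head;        /* regular sleeping lock waiters */
            struct task_struct *rtlock_task; /* RT spin/rwlock waiter, if any */
    };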
Signed-off-by: Thomas Gleixner commit 5328155b18de876d8749d7632bab41cca0919737 Author: Thomas Gleixner Date: Tue Jul 6 16:36:47 2021 +0200 locking/rtmutex: Add wake_state to rt_mutex_waiter Regular sleeping locks like mutexes, rtmutexes and rw_semaphores are always entering and leaving a blocking section with task state == TASK_RUNNING. On a non-RT kernel spinlocks and rwlocks never affect the task state, but on RT kernels these locks are converted to rtmutex based 'sleeping' locks. So in case of contention the task goes to block which requires to carefully preserve the task state and restore it after acquiring the lock taking regular wakeups for the task into account which happened while the task was blocked. This state preserving is achieved by having a seperate task state for blocking on a RT spin/rwlock and a saved_state field in task_struct along with careful handling of these wakeup scenarios in try_to_wake_up(). To avoid conditionals in the rtmutex code, store the wake state which has to be used for waking a lock waiter in rt_mutex_waiter which allows to handle the regular and RT spin/rwlocks by handing it to wake_up_state(). Signed-off-by: Thomas Gleixner commit 1a83a3b0bb6b4f584b3171ec9b6cf2747562132a Author: Thomas Gleixner Date: Tue Jul 6 16:36:47 2021 +0200 locking/rwsem: Add rtmutex based R/W semaphore implementation The RT specific R/W semaphore implementation used to restrict the number of readers to one because a writer cannot block on multiple readers and inherit its priority or budget. The single reader restricting was painful in various ways: - Performance bottleneck for multi-threaded applications in the page fault path (mmap sem) - Progress blocker for drivers which are carefully crafted to avoid the potential reader/writer deadlock in mainline. The analysis of the writer code paths shows, that properly written RT tasks should not take them. Syscalls like mmap(), file access which take mmap sem write locked have unbound latencies which are completely unrelated to mmap sem. Other R/W sem users like graphics drivers are not suitable for RT tasks either. So there is little risk to hurt RT tasks when the RT rwsem implementation is done in the following way: - Allow concurrent readers - Make writers block until the last reader left the critical section. This blocking is not subject to priority/budget inheritance. - Readers blocked on a writer inherit their priority/budget in the normal way. There is a drawback with this scheme. R/W semaphores become writer unfair though the applications which have triggered writer starvation (mostly on mmap_sem) in the past are not really the typical workloads running on a RT system. So while it's unlikely to hit writer starvation, it's possible. If there are unexpected workloads on RT systems triggering it, the problem has to be revisited. Signed-off-by: Thomas Gleixner commit 28b867762482d80cf05b7fd978dcb1b05836b197 Author: Thomas Gleixner Date: Tue Jul 6 16:36:46 2021 +0200 locking: Add base code for RT rw_semaphore and rwlock On PREEMPT_RT rw_semaphores and rwlocks are substituted with a rtmutex and a reader count. The implementation is writer unfair as it is not feasible to do priority inheritance on multiple readers, but experience has shown that realtime workloads are not the typical workloads which are sensitive to writer starvation. The inner workings of rw_semaphores and rwlocks on RT are almost indentical except for the task state and signal handling. 
rw_semaphores are not state preserving over a contention, they are expect to enter and leave with state == TASK_RUNNING. rwlocks have a mechanism to preserve the state of the task at entry and restore it after unblocking taking potential non-lock related wakeups into account. rw_semaphores can also be subject to signal handling interrupting a blocked state, while rwlocks ignore signals. To avoid code duplication, provide a shared implementation which takes the small difference vs. state and signals into account. The code is included into the relevant rw_semaphore/rwlock base code and compiled for each use case seperately. Signed-off-by: Thomas Gleixner commit e03cbdcf154eddbc014705b6e2ebacd755d6ab39 Author: Thomas Gleixner Date: Tue Jul 6 16:36:46 2021 +0200 locking/rtmutex: Provide lockdep less variants of rtmutex interfaces The existing rtmutex_() functions are used by code which uses rtmutex directly. These interfaces contain rtmutex specific lockdep operations. The inner code can be reused for lock implementations which build on top of rtmutexes, i.e. the lock substitutions for RT enabled kernels. But as these are different lock types they have their own lockdep operations. Calling the existing rtmutex interfaces for those would cause double lockdep checks and longer lock chains for no value. Provide rt_mutex_lock_state(), __rt_mutex_trylock() and __rt_mutex_unlock() which are not doing any lockdep operations on the rtmutex itself. The caller has to do them on the lock type which embeds the rtmutex. Signed-off-by: Thomas Gleixner commit fb5c624fe3c514db85b9f1dcbddfb4509644b068 Author: Thomas Gleixner Date: Tue Jul 6 16:36:46 2021 +0200 locking/rtmutex: Provide rt_mutex_slowlock_locked() Split the inner workings of rt_mutex_slowlock() out into a seperate function which can be reused by the upcoming RT lock substitutions, e.g. for rw_semaphores. Signed-off-by: Thomas Gleixner commit d6de1c1adf2da4f84dec14f1bc6e9fe1bb314d5b Author: Thomas Gleixner Date: Tue Jul 6 16:36:46 2021 +0200 rtmutex: Split API and implementation Prepare for reusing the inner functions of rtmutex for RT lock substitutions. Signed-off-by: Thomas Gleixner commit 9abf291893bdd655e4ad4d2802e03d91dbbaa182 Author: Sebastian Andrzej Siewior Date: Mon Apr 26 09:40:07 2021 +0200 rtmutex: Convert macros to inlines Inlines are typesafe... Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 6d3f059f50e4bb1739feaf9cbc3d1adde0ce330e Author: Thomas Gleixner Date: Tue Jul 6 16:36:45 2021 +0200 sched/wake_q: Provide WAKE_Q_HEAD_INITIALIZER The RT specific spin/rwlock implementation requires special handling of the to be woken waiters. Provide a WAKE_Q_HEAD_INITIALIZER which can be used by the rtmutex code to implement a RT aware wake_q derivative. Signed-off-by: Thomas Gleixner commit e5cc3ca9494a713e99978ea40839d73b2882c314 Author: Thomas Gleixner Date: Tue Jul 6 16:36:45 2021 +0200 sched: Provide schedule point for RT locks RT enabled kernels substitute spin/rwlocks with 'sleeping' variants based on rtmutex. Blocking on such a lock is similar to preemption versus: - I/O scheduling and worker handling because these functions might block on another substituted lock or come from a lock contention within these functions. - RCU considers this like a preemption because the task might be in a read side critical section. 
Add a seperate scheduling point for this and hand a new scheduling mode argument to __schedule() which allows along with seperate mode masks to handle this gracefully from within the scheduler without proliferating that to other subsystems like RCU. Signed-off-by: Thomas Gleixner commit bf9603274469cf3c06e46646351791c0bf68fea4 Author: Thomas Gleixner Date: Tue Jul 6 16:36:45 2021 +0200 sched: Rework the __schedule() preempt argument PREEMPT_RT needs to hand a special state into __schedule() when a task blocks on a 'sleeping' spin/rwlock. This is required to handle rcu_note_context_switch() correctly without having special casing in the RCU code. From an RCU point of view the blocking on the sleeping spinlock is equivalent to preemption because the task might be in a read side critical section. schedule_debug() also has a check which would trigger with the !preempt case, but that could be handled differently. To avoid adding another argument and extra checks which cannot be optimized out by the compiler the following solution has been chosen: - Replace the boolean 'preempt' argument with an unsigned integer 'sched_mode' argument and define constants to hand in: (0 == No preemption, 1 = preemption). - Add two masks to apply on that mode one for the debug/rcu invocations and one for the actual scheduling decision. For a non RT kernel these masks are UINT_MAX, i.e. all bits are set which allows the compiler to optimze the AND operation out because it is not masking out anything. IOW, it's not different from the boolean. RT enabled kernels will define these masks seperately. Signed-off-by: Thomas Gleixner commit 8b3163b2445598d211c2c035d43293f0ec3f3245 Author: Thomas Gleixner Date: Tue Jul 6 16:36:44 2021 +0200 sched: Prepare for RT sleeping spin/rwlocks Waiting for spinlocks and rwlocks on non RT enabled kernels is task::state preserving. Any wakeup which matches the state is valid. RT enabled kernels substitutes them with 'sleeping' spinlocks. This creates an issue vs. task::state. In order to block on the lock the task has to overwrite task::state and a consecutive wakeup issued by the unlocker sets the state back to TASK_RUNNING. As a consequence the task loses the state which was set before the lock acquire and also any regular wakeup targeted at the task while it is blocked on the lock. To handle this gracefully add a 'saved_state' member to task_struct which is used in the following way: 1) When a task blocks on a 'sleeping' spinlock, the current state is saved in task::saved_state before it is set to TASK_RTLOCK_WAIT. 2) When the task unblocks and after acquiring the lock, it restores the saved state. 3) When a regular wakeup happens for a task while it is blocked then the state change of that wakeup is redirected to operate on task::saved_state. This is also required when the task state is running because the task might have been woken up from the lock wait and has not yet restored the saved state. To make it complete provide the necessary helpers to save and restore the saved state along with the necessary documentation how the RT lock blocking is supposed to work. For non-RT kernels there is no functional change. Signed-off-by: Thomas Gleixner commit e5376079ce19f17db369e9dda7a1c9c49dc85469 Author: Thomas Gleixner Date: Tue Jul 6 16:36:43 2021 +0200 sched: Introduce TASK_RTLOCK_WAIT RT kernels have an extra quirk for try_to_wake_up() to handle task state preservation accross blocking on a 'sleeping' spin/rwlock. 
For this to function correctly and under all circumstances try_to_wake_up() must be able to identify whether the wakeup is lock related or not and whether the task is waiting for a lock or not. The original approach was to use a special wake_flag argument for try_to_wake_up() and just use TASK_UNINTERRUPTIBLE for the tasks wait state and the try_to_wake_up() state argument. This works in principle, but due to the fact that try_to_wake_up() cannot determine whether the task is waiting for a RT lock wakeup or for a regular wakeup it's suboptimal. RT kernels save the original task state when blocking on a RT lock and restore it when the lock has been acquired. Any non lock related wakeup is checked against the saved state and if it matches the saved state is set to running so that the wakeup is not lost when the state is restored. While the necessary logic for the wake_flag based solution is trivial the downside is that any regular wakeup with TASK_UNINTERRUPTIBLE in the state argument set will wake the task despite the fact that it is still blocked on the lock. That's not a fatal problem as the lock wait has do deal with spurious wakeups anyway, but it introduces unneccesary latencies. Introduce the TASK_RTLOCK_WAIT state bit which will be set when a task blocks on a RT lock. The lock wakeup will use wake_up_state(TASK_RTLOCK_WAIT) so both the waiting state and the wakeup state are distinguishable, which avoids spurious wakeups and allows better analysis. Signed-off-by: Thomas Gleixner commit 7b569a8fe497f0487b6fdc81f7ab8342c93faed9 Author: Thomas Gleixner Date: Tue Jul 6 16:36:43 2021 +0200 sched: Split out the wakeup state check RT kernels have a slightly more complicated handling of wakeups due to 'sleeping' spin/rwlocks. If a task is blocked on such a lock then the original state of the task is preserved over the blocking and any regular (non lock related) wakeup has to be targeted at the saved state to ensure that these wakeups are not lost. Once the task acquired the lock it restores the task state from the saved state. To avoid cluttering try_to_wake_up() with that logic, split the wake up state check out into an inline helper and use it at both places where task::state is checked against the state argument of try_to_wake_up(). No functional change. Signed-off-by: Thomas Gleixner commit bee357cbbb92002a51e170be5fd4a38b78ad08cd Author: Thomas Gleixner Date: Sun Jul 17 21:41:35 2011 +0200 debugobjects: Make RT aware Avoid filling the pool / allocating memory with irqs off(). Signed-off-by: Thomas Gleixner commit e1a2ed90bf8524c10521585661247dba44919c36 Author: Thomas Gleixner Date: Sun Jul 17 21:56:42 2011 +0200 trace: Add migrate-disabled counter to tracing output Signed-off-by: Thomas Gleixner commit f507f34db3d7dece4994287ee33d14fe4e0ea08d Author: Grygorii Strashko Date: Tue Jul 21 19:43:56 2015 +0300 pid.h: include atomic.h This patch fixes build error: CC kernel/pid_namespace.o In file included from kernel/pid_namespace.c:11:0: include/linux/pid.h: In function 'get_pid': include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration] atomic_inc(&pid->count); ^ which happens when CONFIG_PROVE_LOCKING=n CONFIG_DEBUG_SPINLOCK=n CONFIG_DEBUG_MUTEXES=n CONFIG_DEBUG_LOCK_ALLOC=n CONFIG_PID_NS=y Vanilla gets this via spinlock.h. 
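The saved_state handling from the 'Prepare for RT sleeping spin/rwlocks' and TASK_RTLOCK_WAIT commits above can be modelled with a small standalone sketch. The struct, the helper names and the state values are invented for illustration; only the numbered steps mirror the changelog:

    #define TASK_RUNNING            0x0000
    #define TASK_INTERRUPTIBLE      0x0001
    #define TASK_RTLOCK_WAIT        0x1000  /* assumed to be a distinct state bit */

    struct task_model {
            unsigned int state;
            unsigned int saved_state;
    };

    /* 1) Blocking on a 'sleeping' spinlock preserves the current state. */
    static void rtlock_wait_begin(struct task_model *t)
    {
            t->saved_state = t->state;
            t->state = TASK_RTLOCK_WAIT;
    }

    /* 3) A regular (non lock related) wakeup arriving while blocked is
     *    redirected to saved_state so it is not lost by the restore. */
    static void regular_wakeup(struct task_model *t, unsigned int wake_state)
    {
            if (t->state == TASK_RTLOCK_WAIT) {
                    if (t->saved_state & wake_state)
                            t->saved_state = TASK_RUNNING;
            } else if (t->state & wake_state) {
                    t->state = TASK_RUNNING;
            }
    }

    /* 2) After the lock is acquired the saved state is restored. */
    static void rtlock_wait_end(struct task_model *t)
    {
            t->state = t->saved_state;
            t->saved_state = TASK_RUNNING;
    }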
Signed-off-by: Grygorii Strashko Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 930fe8dd94675d20a7c6619bebf0f83ade04852d Author: Sebastian Andrzej Siewior Date: Mon Oct 28 12:19:57 2013 +0100 wait.h: include atomic.h | CC init/main.o |In file included from include/linux/mmzone.h:9:0, | from include/linux/gfp.h:4, | from include/linux/kmod.h:22, | from include/linux/module.h:13, | from init/main.c:15: |include/linux/wait.h: In function ‘wait_on_atomic_t’: |include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration] | if (atomic_read(val) == 0) | ^ This pops up on ARM. Non-RT gets its atomic.h include from spinlock.h Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 276abf48d000079da14ea183b11d9db7be82d8ce Author: Sebastian Andrzej Siewior Date: Thu Jul 26 15:06:10 2018 +0200 efi: Allow efi=runtime In case the command line option "efi=noruntime" is default at built-time, the user could overwrite its state by `efi=runtime' and allow it again. Acked-by: Ard Biesheuvel Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit cdfc123edc5e774f29e2e1502eebe4b929b88be0 Author: Sebastian Andrzej Siewior Date: Thu Jul 26 15:03:16 2018 +0200 efi: Disable runtime services on RT Based on meassurements the EFI functions get_variable / get_next_variable take up to 2us which looks okay. The functions get_time, set_time take around 10ms. Those 10ms are too much. Even one ms would be too much. Ard mentioned that SetVariable might even trigger larger latencies if the firware will erase flash blocks on NOR. The time-functions are used by efi-rtc and can be triggered during runtimed (either via explicit read/write or ntp sync). The variable write could be used by pstore. These functions can be disabled without much of a loss. The poweroff / reboot hooks may be provided by PSCI. Disable EFI's runtime wrappers. This was observed on "EFI v2.60 by SoftIron Overdrive 1000". Acked-by: Ard Biesheuvel Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 6107960d6569a17a622230eb01092f3b7e68eb2f Author: Sebastian Andrzej Siewior Date: Sat May 27 19:02:06 2017 +0200 net/core: disable NET_RX_BUSY_POLL on RT napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire sleeping locks with disabled preemption so we would have to work around this and add explicit locking for synchronisation against ksoftirqd. Without explicit synchronisation a low priority process would "own" the NAPI state (by setting NAPIF_STATE_SCHED) and could be scheduled out (no preempt_disable() and BH is preemptible on RT). In case a network packages arrives then the interrupt handler would set NAPIF_STATE_MISSED and the system would wait until the task owning the NAPI would be scheduled in again. Should a task with RT priority busy poll then it would consume the CPU instead allowing tasks with lower priority to run. The NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for poll/read are set to zero) so disable NET_RX_BUSY_POLL on RT to avoid wrong locking context on RT. Should this feature be considered useful on RT systems then it could be enabled again with proper locking and synchronisation. 
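The net effect of the two EFI commits above (runtime services default to off on RT, "efi=runtime" turns them back on) can be shown with a few lines of standalone C. The variable and parser here are stand-ins, not the in-tree option handling:

    #include <stdbool.h>
    #include <string.h>

    #ifdef CONFIG_PREEMPT_RT
    static bool efi_runtime_off = true;   /* get_time/set_time latencies are too large */
    #else
    static bool efi_runtime_off = false;  /* unchanged default on !RT */
    #endif

    static void parse_efi_option(const char *opt)
    {
            if (strcmp(opt, "noruntime") == 0)
                    efi_runtime_off = true;
            else if (strcmp(opt, "runtime") == 0)   /* user override allowed again */
                    efi_runtime_off = false;
    }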
Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit fdfbb25ebdd3422ec3ee5dcc80dee9431b80700e Author: Thomas Gleixner Date: Mon Jul 18 17:03:52 2011 +0200 sched: Disable CONFIG_RT_GROUP_SCHED on RT Carsten reported problems when running: taskset 01 chrt -f 1 sleep 1 from within rc.local on a F15 machine. The task stays running and never gets on the run queue because some of the run queues have rt_throttled=1 which does not go away. Works nice from a ssh login shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well. Signed-off-by: Thomas Gleixner commit 8ec5e355c7e2d39d1f915c2b5b690d95ca0ef471 Author: Ingo Molnar Date: Fri Jul 3 08:44:03 2009 -0500 mm: Allow only SLUB on RT Memory allocation disables interrupts as part of the allocation and freeing process. For -RT it is important that this section remain short and don't depend on the size of the request or an internal state of the memory allocator. At the beginning the SLAB memory allocator was adopted for RT's needs and it required substantial changes. Later, with the addition of the SLUB memory allocator we adopted this one as well and the changes were smaller. More important, due to the design of the SLUB allocator it performs better and its worst case latency was smaller. In the end only SLUB remained supported. Disable SLAB and SLOB on -RT. Only SLUB is adopted to -RT needs. Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 853484ed910413f59a6f4260593ba088ebf41a15 Author: Thomas Gleixner Date: Sun Jul 24 12:11:43 2011 +0200 kconfig: Disable config options which are not RT compatible Disable stuff which is known to have issues on RT Signed-off-by: Thomas Gleixner commit b0873a0cd9c408b8a302010378ad8dce7e7f6679 Author: Sebastian Andrzej Siewior Date: Thu Jan 23 14:45:59 2014 +0100 leds: trigger: disable CPU trigger on -RT as it triggers: |CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141 |[] (unwind_backtrace+0x0/0xf8) from [] (show_stack+0x1c/0x20) |[] (show_stack+0x1c/0x20) from [] (dump_stack+0x20/0x2c) |[] (dump_stack+0x20/0x2c) from [] (__might_sleep+0x13c/0x170) |[] (__might_sleep+0x13c/0x170) from [] (__rt_spin_lock+0x28/0x38) |[] (__rt_spin_lock+0x28/0x38) from [] (rt_read_lock+0x68/0x7c) |[] (rt_read_lock+0x68/0x7c) from [] (led_trigger_event+0x2c/0x5c) |[] (led_trigger_event+0x2c/0x5c) from [] (ledtrig_cpu+0x54/0x5c) |[] (ledtrig_cpu+0x54/0x5c) from [] (arch_cpu_idle_exit+0x18/0x1c) |[] (arch_cpu_idle_exit+0x18/0x1c) from [] (cpu_startup_entry+0xa8/0x234) |[] (cpu_startup_entry+0xa8/0x234) from [] (rest_init+0xb8/0xe0) |[] (rest_init+0xb8/0xe0) from [] (start_kernel+0x2c4/0x380) Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 34809848f72d84c77c39ad68190ed1a11f573316 Author: Thomas Gleixner Date: Wed Jul 8 17:14:48 2015 +0200 jump-label: disable if stop_machine() is used Some architectures are using stop_machine() while switching the opcode which leads to latency spikes. 
The architectures which use stop_machine() atm: - ARM stop machine - s390 stop machine The architecures which use other sorcery: - MIPS - X86 - powerpc - sparc - arm64 Signed-off-by: Thomas Gleixner [bigeasy: only ARM for now] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit f9bffbde69457558991cbfc1b9f795d948a0a658 Author: Ingo Molnar Date: Fri Jul 3 08:29:57 2009 -0500 genirq: Disable irqpoll on -rt Creates long latencies for no value Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner commit bae73e9b444c5380dcebdebf5723581d0dae5b84 Author: Josh Cartwright Date: Thu Feb 11 11:54:00 2016 -0600 genirq: update irq_set_irqchip_state documentation On -rt kernels, the use of migrate_disable()/migrate_enable() is sufficient to guarantee a task isn't moved to another CPU. Update the irq_set_irqchip_state() documentation to reflect this. Signed-off-by: Josh Cartwright Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit abe17fca7cc1e2916140d51d42e1825d9a3cd053 Author: Sebastian Andrzej Siewior Date: Mon Feb 15 18:44:12 2021 +0100 smp: Wake ksoftirqd on PREEMPT_RT instead do_softirq(). The softirq implementation on PREEMPT_RT does not provide do_softirq(). The other user of do_softirq() is replaced with a local_bh_disable() + enable() around the possible raise-softirq invocation. This can not be done here because migration_cpu_stop() is invoked with disabled preemption. Wake the softirq thread on PREEMPT_RT if there are any pending softirqs. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit d7a13455f7f22dbe293e6d99eb8e8df4fb86a89f Author: Sebastian Andrzej Siewior Date: Thu Jul 1 17:43:16 2021 +0200 samples/kfifo: Rename read_lock/write_lock The variables names read_lock and write_lock can clash with functions used for read/writer locks. Rename read_lock to read_access and write_lock to write_access to avoid a name collision. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 554e55b4eee80c084864d7e799aa5ba505916e7f Author: Sebastian Andrzej Siewior Date: Mon Oct 12 17:33:54 2020 +0200 tcp: Remove superfluous BH-disable around listening_hash Commit 9652dc2eb9e40 ("tcp: relax listening_hash operations") removed the need to disable bottom half while acquiring listening_hash.lock. There are still two callers left which disable bottom half before the lock is acquired. Drop local_bh_disable() around __inet_hash() which acquires listening_hash->lock, invoke inet_ehash_nolisten() with disabled BH. inet_unhash() conditionally acquires listening_hash->lock. Reported-by: Mike Galbraith Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/linux-rt-users/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de/ Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net commit 2ba152ee3e8d7c312f73ad397b6bc134f62cc766 Author: Thomas Gleixner Date: Tue Sep 8 07:32:20 2020 +0200 net: Move lockdep where it belongs Signed-off-by: Thomas Gleixner commit ba75f58059e3c47602253bd9937ed39bfc35e5eb Author: Sebastian Andrzej Siewior Date: Fri Aug 14 18:53:34 2020 +0200 shmem: Use raw_spinlock_t for ->stat_lock Each CPU has SHMEM_INO_BATCH inodes available in `->ino_batch' which is per-CPU. Access here is serialized by disabling preemption. If the pool is empty, it gets reloaded from `->next_ino'. Access here is serialized by ->stat_lock which is a spinlock_t and can not be acquired with disabled preemption. 
One way around it would be to make a per-CPU ino_batch struct containing the inode number and a local_lock_t. Another solution is to promote ->stat_lock to a raw_spinlock_t. The critical sections are short. The mpol_put() should be moved outside of the critical section to avoid invoking the destructor with disabled preemption. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 109e285c64c2309882df143208263d195170d263 Author: Sebastian Andrzej Siewior Date: Mon Feb 11 10:40:46 2019 +0100 mm: workingset: replace IRQ-off check with a lockdep assert. Commit 68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes") introduced an IRQ-off check to ensure that a lock is held which also disables interrupts. This does not work the same way on -RT because none of the locks that are held disable interrupts. Replace this check with a lockdep assert which ensures that the lock is held. Cc: Peter Zijlstra Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 64d8a21476dee47df55dd3aafc0fd0c0a5d2b98e Author: Sebastian Andrzej Siewior Date: Tue Jul 3 18:19:48 2018 +0200 cgroup: use irqsave in cgroup_rstat_flush_locked() All callers of cgroup_rstat_flush_locked() acquire cgroup_rstat_lock either with spin_lock_irq() or spin_lock_irqsave(). cgroup_rstat_flush_locked() itself acquires cgroup_rstat_cpu_lock which is a raw_spin_lock. This lock is also acquired in cgroup_rstat_updated() in IRQ context and therefore requires the _irqsave() locking suffix in cgroup_rstat_flush_locked(). Since there is no difference between spinlock_t and raw_spinlock_t on !RT, lockdep does not complain here. On RT lockdep complains because the interrupts were not disabled here and a deadlock is possible. Acquire the raw_spinlock_t with disabled interrupts. 
Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 8ba34ad86e2432ce7de361d73bb7a54f9c879558 Author: Valentin Schneider Date: Sun Nov 22 20:19:04 2020 +0000 notifier: Make atomic_notifiers use raw_spinlock Booting a recent PREEMPT_RT kernel (v5.10-rc3-rt7-rebase) on my arm64 Juno leads to the idle task blocking on an RT sleeping spinlock down some notifier path: [ 1.809101] BUG: scheduling while atomic: swapper/5/0/0x00000002 [ 1.809116] Modules linked in: [ 1.809123] Preemption disabled at: [ 1.809125] secondary_start_kernel (arch/arm64/kernel/smp.c:227) [ 1.809146] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.10.0-rc3-rt7 #168 [ 1.809153] Hardware name: ARM Juno development board (r0) (DT) [ 1.809158] Call trace: [ 1.809160] dump_backtrace (arch/arm64/kernel/stacktrace.c:100 (discriminator 1)) [ 1.809170] show_stack (arch/arm64/kernel/stacktrace.c:198) [ 1.809178] dump_stack (lib/dump_stack.c:122) [ 1.809188] __schedule_bug (kernel/sched/core.c:4886) [ 1.809197] __schedule (./arch/arm64/include/asm/preempt.h:18 kernel/sched/core.c:4913 kernel/sched/core.c:5040) [ 1.809204] preempt_schedule_lock (kernel/sched/core.c:5365 (discriminator 1)) [ 1.809210] rt_spin_lock_slowlock_locked (kernel/locking/rtmutex.c:1072) [ 1.809217] rt_spin_lock_slowlock (kernel/locking/rtmutex.c:1110) [ 1.809224] rt_spin_lock (./include/linux/rcupdate.h:647 kernel/locking/rtmutex.c:1139) [ 1.809231] atomic_notifier_call_chain_robust (kernel/notifier.c:71 kernel/notifier.c:118 kernel/notifier.c:186) [ 1.809240] cpu_pm_enter (kernel/cpu_pm.c:39 kernel/cpu_pm.c:93) [ 1.809249] psci_enter_idle_state (drivers/cpuidle/cpuidle-psci.c:52 drivers/cpuidle/cpuidle-psci.c:129) [ 1.809258] cpuidle_enter_state (drivers/cpuidle/cpuidle.c:238) [ 1.809267] cpuidle_enter (drivers/cpuidle/cpuidle.c:353) [ 1.809275] do_idle (kernel/sched/idle.c:132 kernel/sched/idle.c:213 kernel/sched/idle.c:273) [ 1.809282] cpu_startup_entry (kernel/sched/idle.c:368 (discriminator 1)) [ 1.809288] secondary_start_kernel (arch/arm64/kernel/smp.c:273) Two points worth noting: 1) That this is conceptually the same issue as pointed out in: 313c8c16ee62 ("PM / CPU: replace raw_notifier with atomic_notifier") 2) Only the _robust() variant of atomic_notifier callchains suffer from this AFAICT only the cpu_pm_notifier_chain really needs to be changed, but singling it out would mean introducing a new (truly) non-blocking API. At the same time, callers that are fine with any blocking within the call chain should use blocking notifiers, so patching up all atomic_notifier's doesn't seem *too* crazy to me. Fixes: 70d932985757 ("notifier: Fix broken error handling pattern") Signed-off-by: Valentin Schneider Signed-off-by: Thomas Gleixner Reviewed-by: Daniel Bristot de Oliveira Link: https://lkml.kernel.org/r/20201122201904.30940-1-valentin.schneider@arm.com Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 10bc7873d8ce4ce530bea31f00160d6bcb2f7b04 Author: Thomas Gleixner Date: Mon Nov 9 23:32:39 2020 +0100 genirq: Move prio assignment into the newly created thread With enabled threaded interrupts the nouveau driver reported the following: | Chain exists of: | &mm->mmap_lock#2 --> &device->mutex --> &cpuset_rwsem | | Possible unsafe locking scenario: | | CPU0 CPU1 | ---- ---- | lock(&cpuset_rwsem); | lock(&device->mutex); | lock(&cpuset_rwsem); | lock(&mm->mmap_lock#2); The device->mutex is nvkm_device::mutex. Unblocking the lockchain at `cpuset_rwsem' is probably the easiest thing to do. 
Move the priority assignment to the start of the newly created thread. Fixes: 710da3c8ea7df ("sched/core: Prevent race condition between cpuset and __sched_setscheduler()") Reported-by: Mike Galbraith Signed-off-by: Thomas Gleixner [bigeasy: Patch description] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/a23a826af7c108ea5651e73b8fbae5e653f16e86.camel@gmx.de commit 35811571bf3a761645f138e5459c2e69b97fc8b6 Author: Sebastian Andrzej Siewior Date: Mon Nov 9 21:30:41 2020 +0100 kthread: Move prio/affinite change into the newly created thread With enabled threaded interrupts the nouveau driver reported the following: | Chain exists of: | &mm->mmap_lock#2 --> &device->mutex --> &cpuset_rwsem | | Possible unsafe locking scenario: | | CPU0 CPU1 | ---- ---- | lock(&cpuset_rwsem); | lock(&device->mutex); | lock(&cpuset_rwsem); | lock(&mm->mmap_lock#2); The device->mutex is nvkm_device::mutex. Unblocking the lockchain at `cpuset_rwsem' is probably the easiest thing to do. Move the priority reset to the start of the newly created thread. Fixes: 710da3c8ea7df ("sched/core: Prevent race condition between cpuset and __sched_setscheduler()") Reported-by: Mike Galbraith Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/a23a826af7c108ea5651e73b8fbae5e653f16e86.camel@gmx.de commit 0585cfb8fbe84f633c0395a1d108500c4618c501 Author: Sebastian Andrzej Siewior Date: Fri Jul 2 15:33:20 2021 +0200 mm, slub: Correct ordering in slab_unlock() Fold into mm, slub: optionally save/restore irqs in slab_[un]lock()/ Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 340e7c4136c3712d5c931a57e91100fb73305452 Author: Vlastimil Babka Date: Sat May 22 01:59:38 2021 +0200 mm, slub: convert kmem_cpu_slab protection to local_lock Embed local_lock into struct kmem_cpu_slab and use the irq-safe versions of local_lock instead of plain local_irq_save/restore. On !PREEMPT_RT that's equivalent, with better lockdep visibility. On PREEMPT_RT that means better preemption. However, the cost on PREEMPT_RT is the loss of lockless fast paths which only work with cpu freelist. Those are designed to detect and recover from being preempted by other conflicting operations (both fast or slow path), but the slow path operations assume they cannot be preempted by a fast path operation, which is guaranteed naturally with disabled irqs. With local locks on PREEMPT_RT, the fast paths now also need to take the local lock to avoid races. In the allocation fastpath slab_alloc_node() we can just defer to the slowpath __slab_alloc() which also works with cpu freelist, but under the local lock. In the free fastpath do_slab_free() we have to add a new local lock protected version of freeing to the cpu freelist, as the existing slowpath only works with the page freelist. Also update the comment about locking scheme in SLUB to reflect changes done by this series. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 2180da7ea70a0fa7c6cc9fd5350805f87bd2d5a9 Author: Vlastimil Babka Date: Fri May 21 14:03:23 2021 +0200 mm, slub: use migrate_disable() on PREEMPT_RT We currently use preempt_disable() (directly or via get_cpu_ptr()) to stabilize the pointer to kmem_cache_cpu. On PREEMPT_RT this would be incompatible with the list_lock spinlock. We can use migrate_disable() instead, but that increases overhead on !PREEMPT_RT as it's an unconditional function call even though it's ultimately a migrate_disable() there. 
In order to get the best available mechanism on both PREEMPT_RT and !PREEMPT_RT, introduce private slub_get_cpu_ptr() and slub_put_cpu_ptr() wrappers and use them. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 98ac7c83f7611f324cfff622371adb51e6b5ebbe Author: Vlastimil Babka Date: Fri Jun 4 12:03:23 2021 +0200 mm, slub: make slab_lock() disable irqs with PREEMPT_RT We need to disable irqs around slab_lock() (a bit spinlock) to make it irq-safe. The calls to slab_lock() are nested under spin_lock_irqsave() which doesn't disable irqs on PREEMPT_RT, so add explicit disabling with PREEMPT_RT. We also distinguish cmpxchg_double_slab() where we do the disabling explicitly and __cmpxchg_double_slab() for contexts with already disabled irqs. However these context are also typically spin_lock_irqsave() thus insufficient on PREEMPT_RT. Thus, change __cmpxchg_double_slab() to be same as cmpxchg_double_slab() on PREEMPT_RT. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit dde8c73f2bd04af94cef72c96424d776537170af Author: Vlastimil Babka Date: Fri Jun 4 12:55:55 2021 +0200 mm, slub: optionally save/restore irqs in slab_[un]lock()/ For PREEMPT_RT we will need to disable irqs for this bit spinlock. As a preparation, add a flags parameter, and an internal version that takes additional bool parameter to control irq saving/restoring (the flags parameter is compile-time unused if the bool is a constant false). Convert ___cmpxchg_double_slab(), which also comes with the same bool parameter, to use the internal version. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit de1f2497acfb0635558cb8cf610592adc7e8c83c Author: Sebastian Andrzej Siewior Date: Thu Jul 16 18:47:50 2020 +0200 mm: slub: Make object_map_lock a raw_spinlock_t The variable object_map is protected by object_map_lock. The lock is always acquired in debug code and within already atomic context Make object_map_lock a raw_spinlock_t. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 12a3a78defce0f8ba270c624479d130eb571775b Author: Sebastian Andrzej Siewior Date: Fri Feb 26 17:11:55 2021 +0100 mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context flush_all() flushes a specific SLAB cache on each CPU (where the cache is present). The deactivate_slab()/__free_slab() invocation happens within IPI handler and is problematic for PREEMPT_RT. The flush operation is not a frequent operation or a hot path. The per-CPU flush operation can be moved to within a workqueue. [vbabka@suse.cz: adapt to new SLUB changes] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 6e256a70bacc64981c554f3462f0f1fb22f7dd70 Author: Vlastimil Babka Date: Thu Jun 3 19:17:42 2021 +0200 mm, slab: make flush_slab() possible to call with irqs enabled Currently flush_slab() is always called with disabled IRQs if it's needed, but the following patches will change that, so add a parameter to control IRQ disabling within the function, which only protects the kmem_cache_cpu manipulation and not the call to deactivate_slab() which doesn't need it. 
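The slub_get_cpu_ptr()/slub_put_cpu_ptr() wrappers mentioned in the migrate_disable() commit above look roughly like this; treat the bodies as a sketch of the pattern rather than a verbatim quote of mm/slub.c:

    #ifdef CONFIG_PREEMPT_RT
    #define slub_get_cpu_ptr(var)           \
    ({                                      \
            migrate_disable();              \
            this_cpu_ptr(var);              \
    })
    #define slub_put_cpu_ptr(var)           \
    do {                                    \
            (void)(var);                    \
            migrate_enable();               \
    } while (0)
    #else
    #define slub_get_cpu_ptr(var)   get_cpu_ptr(var)
    #define slub_put_cpu_ptr(var)   put_cpu_ptr(var)
    #endif

On !RT this keeps the cheap preempt_disable() based get_cpu_ptr() path, while RT pays for migrate_disable() only where the list_lock spinlock rules require it.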
Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit f66b34cdc3a0b4c01b59d8597bc0cbe36adb2196 Author: Vlastimil Babka Date: Fri May 21 01:48:56 2021 +0200 mm, slub: don't disable irqs in slub_cpu_dead() slub_cpu_dead() cleans up for an offlined cpu from another cpu and calls only functions that are now irq safe, so we don't need to disable irqs anymore. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 15742266f9d30e85fc14035d9491ac1803427546 Author: Vlastimil Babka Date: Fri May 21 01:16:54 2021 +0200 mm, slub: only disable irq with spin_lock in __unfreeze_partials() __unfreeze_partials() no longer needs to have irqs disabled, except for making the spin_lock operations irq-safe, so convert the spin_locks operations and remove the separate irq handling. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 02194b557292dbfd34cd029d93f926f2de7abb6c Author: Vlastimil Babka Date: Thu May 20 16:39:51 2021 +0200 mm, slub: detach percpu partial list in unfreeze_partials() using this_cpu_cmpxchg() Instead of relying on disabled irqs for atomicity when detaching the percpu partial list, we can use this_cpu_cmpxchg() and detach without irqs disabled. However, unfreeze_partials() can be also called from another cpu on behalf of a cpu that is being offlined, so we need to restructure the code accordingly: - __unfreeze_partials() is the bulk of unfreeze_partials() that processes the detached percpu partial list - unfreeze_partials() uses this_cpu_cmpxchg() to detach list from current cpu - unfreeze_partials_cpu() is to be called for the offlined cpu so it needs no protection, and is called from __flush_cpu_slab() - flush_cpu_slab() is for the local cpu thus it needs to call unfreeze_partials(). So it can't simply call __flush_cpu_slab(smp_processor_id()) anymore and we have to open-code it Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 62047a84e9eb8f87b95635a312f17cd1e3b86454 Author: Vlastimil Babka Date: Thu May 20 14:18:12 2021 +0200 mm, slub: detach whole partial list at once in unfreeze_partials() Instead of iterating through the live percpu partial list, detach it from the kmem_cache_cpu at once. This is simpler and will allow further optimization. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 85fd98f06f02b01196a056d84209d475c10f3c7f Author: Vlastimil Babka Date: Thu May 20 14:01:57 2021 +0200 mm, slub: discard slabs in unfreeze_partials() without irqs disabled No need for disabled irqs when discarding slabs, so restore them before discarding. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit bfcb75f81573c966b314d7dc56f612313dbe22c6 Author: Vlastimil Babka Date: Thu May 20 14:00:03 2021 +0200 mm, slub: move irq control into unfreeze_partials() unfreeze_partials() can be optimized so that it doesn't need irqs disabled for the whole time. As the first step, move irq control into the function and remove it from the put_cpu_partial() caller. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit e6acdc5f5fc6ef0b23e83eb0f73064d854a7e164 Author: Vlastimil Babka Date: Wed May 12 14:04:43 2021 +0200 mm, slub: call deactivate_slab() without disabling irqs The function is now safe to be called with irqs enabled, so move the calls outside of irq disabled sections. 
When called from ___slab_alloc() -> flush_slab() we have irqs disabled, so to re-enable them before deactivate_slab() we need to open-code flush_slab() in ___slab_alloc() and re-enable irqs after modifying the kmem_cache_cpu fields. But that means an IRQ handler might meanwhile have assigned a new page to kmem_cache_cpu.page, so we have to retry the whole check. The remaining callers of flush_slab() are the IPI handler, which has disabled irqs anyway, and slub_cpu_dead(), which will be dealt with in the following patch. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit fc54ebb41d5d25040b3492426224b52aaa43194d Author: Vlastimil Babka Date: Wed May 12 13:59:58 2021 +0200 mm, slub: make locking in deactivate_slab() irq-safe deactivate_slab() now no longer touches the kmem_cache_cpu structure, so it will be possible to call it with irqs enabled. Just convert the spin_lock calls to their irq saving/restoring variants to make it irq-safe. Note we now have to use cmpxchg_double_slab() for irq-safe slab_lock(), because in some situations we don't take the list_lock, which would disable irqs. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 378a8597511dc865bdd387e2045673eecf850e32 Author: Vlastimil Babka Date: Wed May 12 13:53:34 2021 +0200 mm, slub: move reset of c->page and freelist out of deactivate_slab() deactivate_slab() removes the cpu slab by merging the cpu freelist with the slab's freelist and putting the slab on the proper node's list. It also sets the respective kmem_cache_cpu pointers to NULL. By extracting the kmem_cache_cpu operations from the function, we can make it not dependent on disabled irqs. Also if we return a single free pointer from ___slab_alloc, we no longer have to assign kmem_cache_cpu.page before deactivation or care if somebody preempted us and assigned a different page to our kmem_cache_cpu in the process. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit a1bedf14b4ef5bd9fb633ee89841cfb7b8f2c432 Author: Vlastimil Babka Date: Tue May 11 17:45:26 2021 +0200 mm, slub: stop disabling irqs around get_partial() The function get_partial() does not need to have irqs disabled as a whole. It's sufficient to convert spin_lock operations to their irq saving/restoring versions. As a result, it's now possible to reach the page allocator from the slab allocator without disabling and re-enabling interrupts on the way. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit e7fa6bb0ab58c69fb6cf31846fdf0129caf7e7ca Author: Vlastimil Babka Date: Tue May 11 16:56:09 2021 +0200 mm, slub: check new pages with restored irqs Building on top of the previous patch, re-enable irqs before checking new pages. alloc_debug_processing() is now called with enabled irqs so we need to remove VM_BUG_ON(!irqs_disabled()); in check_slab() - there doesn't seem to be a need for it anyway. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 843f16905f6683346e6c962c4ed39f6fbdf313e8 Author: Vlastimil Babka Date: Tue May 11 16:37:51 2021 +0200 mm, slub: validate slab from partial list or page allocator before making it cpu slab When we obtain a new slab page from the node partial list or the page allocator, we assign it to kmem_cache_cpu, perform some checks, and if they fail, we undo the assignment. In order to allow doing the checks without irqs disabled, restructure the code so that the checks are done first, and the kmem_cache_cpu.page assignment only after they pass.
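The reordering described in that last change could be sketched as follows (helper names as used elsewhere in slub; the real control flow in ___slab_alloc() is more involved):

    /* checks run with irqs enabled, before the page is published as cpu slab */
    if (kmem_cache_debug(s) &&
        !alloc_debug_processing(s, page, freelist, addr))
            goto new_slab;                  /* checks failed: discard and retry */

    if (unlikely(!pfmemalloc_match(page, gfpflags))) {
            /* return the object but do not keep the page as the cpu slab */
            deactivate_slab(s, page, get_freepointer(s, freelist));
            return freelist;
    }

    /* all checks passed: only now install the page, with irqs disabled */
    local_irq_save(flags);
    c->page = page;
    c->freelist = get_freepointer(s, freelist);
    local_irq_restore(flags);
    return freelist;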
Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 033708e4faa880c775131def8c6242815bd23e4d Author: Vlastimil Babka Date: Mon May 10 16:30:01 2021 +0200 mm, slub: restore irqs around calling new_slab() allocate_slab() currently re-enables irqs before calling into the page allocator. It depends on gfpflags_allow_blocking() to determine if it's safe to do so. Now we can instead simply restore irqs before calling it through new_slab(). The other caller early_kmem_cache_node_alloc() is unaffected by this. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit aa890ddaca1d27ceba0db33e31c829c006cff07e Author: Vlastimil Babka Date: Mon May 10 13:56:17 2021 +0200 mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() Continue reducing the irq disabled scope. Check for per-cpu partial slabs first with irqs enabled and then recheck with irqs disabled before grabbing the slab page. Mostly preparatory for the following patches. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 8c1d368c71c0287d81afbe7a8b1baf28d8f72b1b Author: Vlastimil Babka Date: Sat May 8 02:28:02 2021 +0200 mm, slub: do initial checks in ___slab_alloc() with irqs enabled As another step of shortening irq disabled sections in ___slab_alloc(), delay disabling irqs until we pass the initial checks if there is a cached percpu slab and it's suitable for our allocation. Now we have to recheck c->page after actually disabling irqs as an allocation in an irq handler might have replaced it. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Acked-by: Mel Gorman commit 12c69bab1ece4076e397f2fc740078da0c3b1238 Author: Vlastimil Babka Date: Fri May 7 19:32:31 2021 +0200 mm, slub: move disabling/enabling irqs to ___slab_alloc() Currently __slab_alloc() disables irqs around the whole ___slab_alloc(). This includes cases where this is not needed, such as when the allocation ends up in the page allocator and has to awkwardly enable irqs back based on gfp flags. Also the whole kmem_cache_alloc_bulk() is executed with irqs disabled even when it hits the __slab_alloc() slow path, and long periods with disabled interrupts are undesirable. As a first step towards reducing irq disabled periods, move irq handling into ___slab_alloc(). Callers will instead prevent the s->cpu_slab percpu pointer from becoming invalid via get_cpu_ptr(), thus preempt_disable(). This does not protect against modification by an irq handler, which is still prevented by disabled irqs for most of ___slab_alloc(). As a small immediate benefit, slab_out_of_memory() from ___slab_alloc() is now called with irqs enabled. kmem_cache_alloc_bulk() disables irqs for its fastpath and then re-enables them before calling ___slab_alloc(), which then disables them at its discretion. The whole kmem_cache_alloc_bulk() operation also disables preemption. When ___slab_alloc() calls new_slab() to allocate a new page, re-enable preemption, because new_slab() will re-enable interrupts in contexts that allow blocking (this will be improved by later patches). The patch itself will thus increase overhead a bit due to disabled preemption (on configs where it matters) and increased disabling/enabling irqs in kmem_cache_alloc_bulk(), but that will be gradually improved in the following patches. Note in __slab_alloc() we need to change the #ifdef CONFIG_PREEMPT guard to CONFIG_PREEMPT_COUNT to make sure preempt disable/enable is properly paired in all configurations.
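Taken together with the slub_get_cpu_ptr()/slub_put_cpu_ptr() wrappers introduced further up, the resulting __slab_alloc() wrapper could end up looking roughly like this sketch (the RT variant of the wrappers is assumed to use migrate_disable()):

    #ifdef CONFIG_PREEMPT_RT
    /* keep the task on this cpu but stay preemptible */
    #define slub_get_cpu_ptr(var)   ({ migrate_disable(); this_cpu_ptr(var); })
    #define slub_put_cpu_ptr(var)   do { (void)(var); migrate_enable(); } while (0)
    #else
    #define slub_get_cpu_ptr(var)   get_cpu_ptr(var)
    #define slub_put_cpu_ptr(var)   put_cpu_ptr(var)
    #endif

    static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
                              unsigned long addr, struct kmem_cache_cpu *c)
    {
            void *p;

    #ifdef CONFIG_PREEMPT_COUNT
            /*
             * We may have been preempted and rescheduled on a different cpu
             * before disabling preemption; re-read the per-cpu pointer.
             */
            c = slub_get_cpu_ptr(s->cpu_slab);
    #endif
            p = ___slab_alloc(s, gfpflags, node, addr, c);
    #ifdef CONFIG_PREEMPT_COUNT
            slub_put_cpu_ptr(s->cpu_slab);
    #endif
            return p;
    }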
On configs without involuntary preemption and debugging, the re-read of the kmem_cache_cpu pointer is still compiled out as it was before. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 78ed20c1e4aab33e8ec0f7422442765d2804a278 Author: Vlastimil Babka Date: Tue May 18 02:01:39 2021 +0200 mm, slub: simplify kmem_cache_cpu and tid setup In slab_alloc_node() and do_slab_free() fastpaths we need to guarantee that our kmem_cache_cpu pointer is from the same cpu as the tid value. Currently that's done by reading the tid first using this_cpu_read(), then the kmem_cache_cpu pointer and verifying we read the same tid using the pointer and plain READ_ONCE(). This can be simplified to just fetching the kmem_cache_cpu pointer and then reading the tid using that pointer. That guarantees they are from the same cpu. We don't need to read the tid using this_cpu_read() because the value will be validated by this_cpu_cmpxchg_double(), making sure we are on the correct cpu and that the freelist wasn't changed by anyone preempting us since reading the tid. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Acked-by: Mel Gorman commit ae3b1f17e84ed59a61b35f12d0fb6d8650a629c1 Author: Vlastimil Babka Date: Tue May 11 18:25:09 2021 +0200 mm, slub: restructure new page checks in ___slab_alloc() When we allocate a slab object from a newly acquired page (from node's partial list or page allocator), we usually also retain the page as a new percpu slab. There are two exceptions - when pfmemalloc status of the page doesn't match our gfp flags, or when the cache has debugging enabled. The current code for these decisions is not easy to follow, so restructure it and add comments. The new structure will also help with the following changes. No functional change. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Acked-by: Mel Gorman commit d071b8eed93b9a884b268668f1eb75c06804b6a8 Author: Vlastimil Babka Date: Tue May 11 14:05:22 2021 +0200 mm, slub: return slab page from get_partial() and set c->page afterwards The function get_partial() finds a suitable page on a partial list, acquires and returns its freelist and assigns the page pointer to kmem_cache_cpu. In a later patch we will need more control over the kmem_cache_cpu.page assignment, so instead of passing a kmem_cache_cpu pointer, pass a pointer to a page pointer that get_partial() can fill, so that the caller can assign the kmem_cache_cpu.page pointer. No functional change as all of this still happens with disabled IRQs. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 1b92ed69ba9349f97defd6f62788fa8949bcfa2f Author: Vlastimil Babka Date: Tue May 11 13:01:34 2021 +0200 mm, slub: dissolve new_slab_objects() into ___slab_alloc() The later patches will need more fine-grained control over individual actions in ___slab_alloc(), the only caller of new_slab_objects(), so dissolve it there. This is a preparatory step with no functional change. The only minor change is moving WARN_ON_ONCE() for using a constructor together with __GFP_ZERO to new_slab(), which makes it somewhat less frequent, but still able to catch a development change introducing a systematic misuse.
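The kmem_cache_cpu/tid simplification described above boils down to something like the following fragment (a sketch of the fastpath setup only):

    /* slab_alloc_node() fastpath, simplified */
    struct kmem_cache_cpu *c;
    unsigned long tid;
    void *object;

    c = raw_cpu_ptr(s->cpu_slab);
    tid = READ_ONCE(c->tid);        /* tid and c now come from the same cpu */

    /*
     * If we migrate to another cpu between these reads and the final
     * this_cpu_cmpxchg_double() on (freelist, tid), that cmpxchg simply
     * fails and the fastpath is retried.
     */
    object = c->freelist;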
Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Acked-by: Christoph Lameter Acked-by: Mel Gorman commit 3c7b04f4ae2506ce43d7a6a0add926578616a583 Author: Vlastimil Babka Date: Tue May 11 12:45:48 2021 +0200 mm, slub: extract get_partial() from new_slab_objects() The later patches will need more fine-grained control over individual actions in ___slab_alloc(), the only caller of new_slab_objects(), so this is a first preparatory step with no functional change. This adds a goto label that appears unnecessary at this point, but will be useful for later changes. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Acked-by: Christoph Lameter commit 49dde93b06d1b735f768dbe9c8a1209adfed6964 Author: Vlastimil Babka Date: Fri Jun 4 12:16:14 2021 +0200 mm, slub: unify cmpxchg_double_slab() and __cmpxchg_double_slab() These functions differ only in irq disabling in the slow path. We can create a common function with an extra bool parameter to control the irq disabling. As the functions are inline and the parameter compile-time constant, there will be no runtime overhead due to this change. Also change the DEBUG_VM based irqs disable assert to the more standard lockdep_assert based one. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 26d89006c6617445dfabec3ea1ae441f3a1e09d6 Author: Vlastimil Babka Date: Tue Jun 8 01:19:03 2021 +0200 mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() Commit d6e0b7fa1186 ("slub: make dead caches discard free slabs immediately") introduced cpu partial flushing for kmemcg caches, based on setting the target cpu_partial to 0 and adding a flushing check in put_cpu_partial(). This code that sets cpu_partial to 0 was later moved by c9fc586403e7 ("slab: introduce __kmemcg_cache_deactivate()") and ultimately removed by 9855609bde03 ("mm: memcg/slab: use a single set of kmem_caches for all accounted allocations"). However, the check and flush in put_cpu_partial() were never removed, although they are effectively dead code, so this patch removes them. Note that d6e0b7fa1186 also added preempt_disable()/enable() to unfreeze_partials(), which could thus also be considered unnecessary. But further patches will rely on it, so keep it. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 8f9b6e2f13011d74a46290b0ebc3326e72066441 Author: Vlastimil Babka Date: Fri May 21 01:25:06 2021 +0200 mm, slub: don't disable irq for debug_check_no_locks_freed() In slab_free_hook() we disable irqs around the debug_check_no_locks_freed() call, which is unnecessary, as irqs are already being disabled inside the call. This seems to be a leftover from the past, when there were more calls inside the irq disabled sections. Remove the irq disable/enable operations. Mel noted: > Looks like it was needed for kmemcheck which went away back in 4.15 Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Acked-by: Mel Gorman commit bec329bbe49dbbc27fd4eccfdf837f5fe81a3ece Author: Vlastimil Babka Date: Sun May 23 01:37:07 2021 +0200 mm, slub: allocate private object map for validate_slab_cache() validate_slab_cache() is called either to handle a sysfs write, or from a self-test context. In both situations it's straightforward to preallocate a private object bitmap instead of grabbing the shared static one meant for critical sections, so let's do that.
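A sketch of such a preallocation in validate_slab_cache() (bitmap_alloc() takes the number of bits, here one per object in the largest slab order; the obj_map parameter of validate_slab_node() is assumed to be added by this change):

    static long validate_slab_cache(struct kmem_cache *s)
    {
            int node;
            unsigned long count = 0;
            struct kmem_cache_node *n;
            unsigned long *obj_map;

            /* private map: no need for the shared object_map / object_map_lock */
            obj_map = bitmap_alloc(oo_objects(s->oo), GFP_KERNEL);
            if (!obj_map)
                    return -ENOMEM;

            flush_all(s);
            for_each_kmem_cache_node(s, node, n)
                    count += validate_slab_node(s, n, obj_map);

            bitmap_free(obj_map);
            return count;
    }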
Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Acked-by: Christoph Lameter Acked-by: Mel Gorman commit cea6298794478c2574ffc948de060c5a835f6563 Author: Vlastimil Babka Date: Sun May 23 01:28:37 2021 +0200 mm, slub: allocate private object map for sysfs listings Slub has a static spinlock protected bitmap for marking which objects are on freelist when it wants to list them, for situations where dynamically allocating such map can lead to recursion or locking issues, and on-stack bitmap would be too large. The handlers of sysfs files alloc_calls and free_calls also currently use this shared bitmap, but their syscall context makes it straightforward to allocate a private map before entering locked sections, so switch these processing paths to use a private bitmap. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner Acked-by: Christoph Lameter Acked-by: Mel Gorman commit 0492d5407c2e9a3d6b2fc85e19b91f6089185b89 Author: Vlastimil Babka Date: Fri May 28 14:32:10 2021 +0200 mm, slub: don't call flush_all() from list_locations() list_locations() can only be called on caches with SLAB_STORE_USER flag and as with all slub debugging flags, such caches avoid cpu or percpu partial slabs altogether, so there's nothing to flush. Signed-off-by: Vlastimil Babka Signed-off-by: Thomas Gleixner commit 72f1ab0179ac5cc017511f9e0322fc30df403d24 Author: Mel Gorman Date: Fri May 14 15:46:22 2021 +0100 mm/page_alloc: Split per cpu page lists and zone stats -fix mm/ is not W=1 clean for allnoconfig but the patch "mm/page_alloc: Split per cpu page lists and zone stats" makes it worse with the following warning mm/vmstat.c: In function ‘zoneinfo_show_print’: mm/vmstat.c:1698:28: warning: variable ‘pzstats’ set but not used [-Wunused-but-set-variable] struct per_cpu_zonestat *pzstats; ^~~~~~~ This is a fix to the mmotm patch mm-page_alloc-split-per-cpu-page-lists-and-zone-stats.patch. Signed-off-by: Mel Gorman Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 7d4d69cd8788af7f2dfea018a3407c88c2416a82 Author: Mel Gorman Date: Wed May 12 10:54:58 2021 +0100 mm/page_alloc: Update PGFREE outside the zone lock in __free_pages_ok VM events do not need explicit protection by disabling IRQs so update the counter with IRQs enabled in __free_pages_ok. Signed-off-by: Mel Gorman Signed-off-by: Thomas Gleixner Acked-by: Vlastimil Babka Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 8ec908f71c8d16d0bf9cde514c762c83e5ff3f41 Author: Mel Gorman Date: Wed May 12 10:54:57 2021 +0100 mm/page_alloc: Avoid conflating IRQs disabled with zone->lock Historically when freeing pages, free_one_page() assumed that callers had IRQs disabled and the zone->lock could be acquired with spin_lock(). This confuses the scope of what local_lock_irq is protecting and what zone->lock is protecting in free_unref_page_list in particular. This patch uses spin_lock_irqsave() for the zone->lock in free_one_page() instead of relying on callers to have disabled IRQs. free_unref_page_commit() is changed to only deal with PCP pages protected by the local lock. free_unref_page_list() then first frees isolated pages to the buddy lists with free_one_page() and frees the rest of the pages to the PCP via free_unref_page_commit(). The end result is that free_one_page() is no longer depending on side-effects of local_lock to be correct. Note that this may incur a performance penalty while memory hot-remove is running but that is not a common operation. 
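Sketched, free_one_page() then takes the zone lock itself instead of assuming the caller disabled IRQs (a rough outline of the described change):

    static void free_one_page(struct zone *zone, struct page *page,
                              unsigned long pfn, unsigned int order,
                              int migratetype, fpi_t fpi_flags)
    {
            unsigned long flags;

            /* no longer relies on the caller having disabled IRQs */
            spin_lock_irqsave(&zone->lock, flags);
            if (unlikely(has_isolate_pageblock(zone) ||
                         is_migrate_isolate(migratetype)))
                    migratetype = get_pfnblock_migratetype(page, pfn);
            __free_one_page(page, pfn, zone, order, migratetype, fpi_flags);
            spin_unlock_irqrestore(&zone->lock, flags);
    }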
[lkp@intel.com: Ensure CMA pages get added to correct pcp list] Signed-off-by: Mel Gorman Signed-off-by: Thomas Gleixner Acked-by: Vlastimil Babka Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit b6ff3966dcbfa74c9633e1a8c470a244414986f2 Author: Mel Gorman Date: Wed May 12 10:54:56 2021 +0100 mm/page_alloc: Explicitly acquire the zone lock in __free_pages_ok __free_pages_ok() disables IRQs before calling a common helper free_one_page() that acquires the zone lock. This is not safe according to Documentation/locking/locktypes.rst and in this context, IRQ disabling is not protecting a per_cpu_pages structure either or a local_lock would be used. This patch explicitly acquires the lock with spin_lock_irqsave instead of relying on a helper. This removes the last instance of local_irq_save() in page_alloc.c. Signed-off-by: Mel Gorman Signed-off-by: Thomas Gleixner Acked-by: Vlastimil Babka Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 16e165bea08e613e5385ce3b5c84ce01c0c809ea Author: Mel Gorman Date: Wed May 12 10:54:55 2021 +0100 mm/page_alloc: Reduce duration that IRQs are disabled for VM counters IRQs are left disabled for the zone and node VM event counters. This is unnecessary as the affected counters are allowed to race with preemption and IRQs. This patch reduces the scope of IRQs being disabled via local_[lock|unlock]_irq on !PREEMPT_RT kernels. One __mod_zone_freepage_state call is still made with IRQs disabled. While this could be moved out, it's not free on all architectures as some require IRQs to be disabled for mod_zone_page_state on !PREEMPT_RT kernels. Signed-off-by: Mel Gorman Signed-off-by: Thomas Gleixner Acked-by: Vlastimil Babka Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit c7285ff94096c60cf03f31f6057b4af38dda92cf Author: Mel Gorman Date: Wed May 12 10:54:54 2021 +0100 mm/page_alloc: Batch the accounting updates in the bulk allocator Now that the zone_statistics are simple counters that do not require special protection, the bulk allocator accounting updates can be batch updated without adding too much complexity with protected RMW updates or using xchg. Signed-off-by: Mel Gorman Signed-off-by: Thomas Gleixner Acked-by: Vlastimil Babka Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 069f3cf439ee2eeddd18ec134201c111de95f31e Author: Mel Gorman Date: Wed May 12 10:54:53 2021 +0100 mm/vmstat: Inline NUMA event counter updates __count_numa_event is small enough to be treated similarly to __count_vm_event so inline it. Signed-off-by: Mel Gorman Signed-off-by: Thomas Gleixner Acked-by: Vlastimil Babka Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 39642efb7daa3a5f640bbb852bb981a709d66917 Author: Mel Gorman Date: Wed May 12 10:54:52 2021 +0100 mm/vmstat: Convert NUMA statistics to basic NUMA counters NUMA statistics are maintained on the zone level for hits, misses, foreign etc., but nothing relies on them being perfectly accurate for functional correctness. The counters are used by userspace to get a general overview of a workload's NUMA behaviour but the page allocator incurs a high cost to maintain perfect accuracy similar to what is required for a vmstat counter like NR_FREE_PAGES. There is even a sysctl vm.numa_stat to allow userspace to turn off the collection of NUMA statistics like NUMA_HIT. This patch converts NUMA_HIT and friends to be NUMA events with similar accuracy to VM events.
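With the relaxed accuracy, a NUMA event update can become a plain per-cpu increment, roughly like this sketch (field names follow the split per-cpu zone statistics described further down in this log):

    static inline void __count_numa_event(struct zone *zone,
                                          enum numa_stat_item item)
    {
            struct per_cpu_zonestat __percpu *pzstats = zone->per_cpu_zonestats;

            /* no irq disabling, no atomics: occasional lost updates are acceptable */
            raw_cpu_inc(pzstats->vm_numa_event[item]);
    }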
There is a possibility that slight errors will be introduced but the overall trend as seen by userspace will be similar. The counters are no longer updated from vmstat_refresh context as it is unnecessary overhead for counters that may never be read by userspace. Note that counters could be maintained at the node level to save space but it would have a user-visible impact due to /proc/zoneinfo. [lkp@intel.com: Fix misplaced closing brace for !CONFIG_NUMA] Signed-off-by: Mel Gorman Signed-off-by: Thomas Gleixner Acked-by: Vlastimil Babka Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 7e057409dc6848aa4ca5be8d922fba948742ad7b Author: Mel Gorman Date: Wed May 12 10:54:51 2021 +0100 mm/page_alloc: Convert per-cpu list protection to local_lock There is a lack of clarity of what exactly local_irq_save/local_irq_restore protects in page_alloc.c . It conflates the protection of per-cpu page allocation structures with per-cpu vmstat deltas. This patch protects the PCP structure using local_lock which for most configurations is identical to IRQ enabling/disabling. The scope of the lock is still wider than it should be but this is decreased later. It is possible for the local_lock to be embedded safely within struct per_cpu_pages but it adds complexity to free_unref_page_list. [lkp@intel.com: Make pagesets static] Signed-off-by: Mel Gorman Signed-off-by: Thomas Gleixner Acked-by: Vlastimil Babka Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit b40b27ff9be924922be1a2ae6ed885e9412d22d7 Author: Mel Gorman Date: Wed May 12 10:54:50 2021 +0100 mm/page_alloc: Split per cpu page lists and zone stats The per-cpu page allocator lists and the per-cpu vmstat deltas are stored in the same struct per_cpu_pages even though vmstats have no direct impact on the per-cpu page lists. This is inconsistent because the vmstats for a node are stored on a dedicated structure. The bigger issue is that the per_cpu_pages structure is not cache-aligned and stat updates either cache conflict with adjacent per-cpu lists incurring a runtime cost or padding is required incurring a memory cost. This patch splits the per-cpu pagelists and the vmstat deltas into separate structures. It's mostly a mechanical conversion but some variable renaming is done to clearly distinguish the per-cpu pages structure (pcp) from the vmstats (pzstats). Superficially, this appears to increase the size of the per_cpu_pages structure but the movement of expire fills a structure hole so there is no impact overall. [lkp@intel.com: Check struct per_cpu_zonestat has a non-zero size] [vbabka@suse.cz: Init zone->per_cpu_zonestats properly] Signed-off-by: Mel Gorman Signed-off-by: Thomas Gleixner Acked-by: Vlastimil Babka Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit d36c3ebdd16546d7467e82f780ae02b0e605af84 Author: Thomas Gleixner Date: Sun Dec 6 22:40:07 2020 +0100 timers: Move clearing of base::timer_running under base::lock syzbot reported KCSAN data races vs. timer_base::timer_running being set to NULL without holding base::lock in expire_timers(). This looks innocent and most reads are clearly not problematic but for a non-RT kernel it's completely irrelevant whether the store happens before or after taking the lock. For an RT kernel moving the store under the lock requires an extra unlock/lock pair in the case that there is a waiter for the timer. 
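In expire_timers(), the change amounts to roughly the following ordering (a sketch; on PREEMPT_RT an additional unlock/lock pair is needed at this point to wake a possible del_timer_sync() waiter):

    raw_spin_unlock_irq(&base->lock);
    call_timer_fn(timer, fn, baseclk);
    raw_spin_lock_irq(&base->lock);
    /* clear the running marker while holding base->lock, not after dropping it */
    base->running_timer = NULL;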
But that's not the end of the world and definitely not worth the trouble of adding boatloads of comments and annotations to the code. Famous last words... Reported-by: syzbot+aa7c2385d46c5eba0b89@syzkaller.appspotmail.com Reported-by: syzbot+abea4558531bae1ba9fe@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/87lfea7gw8.fsf@nanos.tec.linutronix.de Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Cc: stable-rt@vger.kernel.org commit 1d1164a4e1927868029b10ce5e854ca133d9766a Author: Sebastian Andrzej Siewior Date: Fri Oct 30 13:59:06 2020 +0100 highmem: Don't disable preemption on RT in kmap_atomic() Disabling preemption makes it impossible to acquire sleeping locks within a kmap_atomic() section. For PREEMPT_RT it is sufficient to disable migration. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 63cf1e4b564a46ec7bee5571cff518c70355dbdf Author: John Ogness Date: Mon Nov 30 01:42:10 2020 +0106 printk: add pr_flush() Provide a function to allow waiting for console printers to catch up to the latest logged message. Use pr_flush() to give console printers a chance to finish in critical situations if no atomic console is available. For now pr_flush() is only used in the most common error paths: panic(), print_oops_end_marker(), report_bug(), kmsg_dump(). Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit b41f91f573cd9c671efe8698efcaac2b99af0ea0 Author: John Ogness Date: Mon Nov 30 01:42:09 2020 +0106 printk: add console handover If earlyprintk is used, a boot console will print directly to the console immediately. The boot console will unregister itself as soon as a non-boot console registers. However, the non-boot console does not begin printing until its kthread has started. Since this happens much later, there is a long pause in the console output. If the ringbuffer is small, messages could even be dropped during the pause. Add a new CON_HANDOVER console flag to be used internally by printk in order to track which non-boot console took over from a boot console. If handover consoles have implemented write_atomic(), they are allowed to print directly to the console until their kthread can take over. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 4a181ae05c92b849ee813966829073ea189f8749 Author: John Ogness Date: Mon Nov 30 01:42:08 2020 +0106 printk: remove deferred printing Since printing occurs either atomically or from the printing kthread, there is no need for any deferring or for tracking possible recursion paths. Remove all printk context tracking. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit c4049cfc8a0327c22d56855882d8e2cffd6d20fa Author: John Ogness Date: Mon Nov 30 01:42:07 2020 +0106 printk: move console printing to kthreads Create a kthread for each console to perform console printing. Now all console printing is fully asynchronous except for the boot console and when the kernel enters sync mode (and there are atomic consoles available). The console_lock() and console_unlock() functions now only do what their names say... locking and unlocking of the console.
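A per-console printing thread in the spirit of the above might be structured like this sketch (printk ringbuffer read helpers as in kernel/printk/; the thread member and the emit_one_record() helper are illustrative placeholders, not the actual names):

    static int console_printer_thread(void *data)
    {
            struct console *con = data;
            u64 seq = 0;                    /* next record for this console */

            while (!kthread_should_stop()) {
                    /* sleep until the ringbuffer has a record we have not printed */
                    wait_event_interruptible(log_wait,
                                             prb_read_valid(prb, seq, NULL) ||
                                             kthread_should_stop());

                    while (prb_read_valid(prb, seq, NULL)) {
                            emit_one_record(con, seq);  /* format + con->write() */
                            seq++;
                    }
            }
            return 0;
    }

    /* spawned when the console is registered, e.g.: */
    con->thread = kthread_run(console_printer_thread, con, "pr/%s%d",
                              con->name, con->index);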
Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 4b788a578cc2be37f1761f07407a9d920ecb0671 Author: John Ogness Date: Mon Nov 30 01:42:06 2020 +0106 printk: introduce kernel sync mode When the kernel performs an OOPS, enter into "sync mode": - only atomic consoles (write_atomic() callback) will print - printing occurs within vprintk_store() instead of console_unlock() CONSOLE_LOG_MAX is moved to printk.h to support the per-console buffer used in sync mode. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 7995ace9ab04969b9d5577e5fd74d77765c7d917 Author: John Ogness Date: Mon Nov 30 01:42:05 2020 +0106 printk: use seqcount_latch for console_seq In preparation for atomic printing, change @console_seq to use seqcount_latch so that it can be read without requiring @console_sem. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 19aa624068300cc91cb3c4e342def5e0ab0e40d4 Author: John Ogness Date: Mon Nov 30 01:42:04 2020 +0106 printk: combine boot_delay_msec() into printk_delay() boot_delay_msec() is always called immediately before printk_delay() so just combine the two. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 109255d49bc349bf222a231045ca6464e9dfe248 Author: John Ogness Date: Mon Nov 30 01:42:03 2020 +0106 printk: relocate printk_delay() and vprintk_default() Move printk_delay() and vprintk_default() "as is" further up so that they can be used by new functions in an upcoming commit. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit b94b12794bb4dc84d23705ea20d1435ce72db5da Author: John Ogness Date: Mon Nov 30 01:42:02 2020 +0106 serial: 8250: implement write_atomic Implement a non-sleeping NMI-safe write_atomic() console function in order to support emergency console printing. Since interrupts need to be disabled during transmit, all usage of the IER register is wrapped with access functions that use the console_atomic_lock() function to synchronize register access while tracking the state of the interrupts. This is necessary because write_atomic() can be called from an NMI context that has preempted write_atomic(). Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 8d4fe695dbb0dfdf8b0699a8ef7c1c3402fdd823 Author: John Ogness Date: Fri Mar 19 14:57:31 2021 +0100 kdb: only use atomic consoles for output mirroring Currently kdb uses the @oops_in_progress hack to mirror kdb output to all active consoles from NMI context. Ignoring locks is unsafe. Now that an NMI-safe atomic interface is available for consoles, use that interface to mirror kdb output. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 735eda8e5ceb2b6716285c005e876bff2df979cc Author: John Ogness Date: Mon Nov 30 01:42:01 2020 +0106 console: add write_atomic interface Add a write_atomic() callback to the console. This is an optional function for console drivers. The function must be atomic (including NMI-safe) for writing to the console. Console drivers must still implement the write() callback. The write_atomic() callback will only be used in special situations, such as when the kernel panics. Creating an NMI-safe write_atomic() that must synchronize with write() requires a careful implementation of the console driver.
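For the 8250 changes above, the IER wrapping could look roughly like this sketch (the real patch also tracks the saved IER value; that bookkeeping is omitted here):

    static inline void serial8250_clear_IER(struct uart_8250_port *up)
    {
            unsigned int flags;

            /* serialize against write_atomic() from any context, including NMI */
            console_atomic_lock(&flags);
            serial_out(up, UART_IER, 0);    /* mask UART interrupts while transmitting */
            console_atomic_unlock(flags);
    }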
To aid with the implementation, a set of console_atomic_*() functions are provided: void console_atomic_lock(unsigned int *flags); void console_atomic_unlock(unsigned int flags); These functions synchronize using a processor-reentrant spinlock (called a cpulock). kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active). This will conflict with the printk cpulock. Therefore, a CPU must ensure that it is not holding the printk cpulock when calling kgdb_cpu_enter(). If it is, it must allow its printk context to complete first. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 8c1c98157afc93269bd28db78663e40fdf5ed138 Author: John Ogness Date: Thu Feb 18 17:37:41 2021 +0100 printk: convert @syslog_lock to spin_lock @syslog_lock was a raw_spin_lock to simplify the transition of removing @logbuf_lock and the safe buffers. With that transition complete, @syslog_lock can become a spin_lock. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 114233f8744cb4911d536f9f118f66433f97b2fe Author: John Ogness Date: Mon Nov 30 01:42:00 2020 +0106 printk: remove safe buffers With @logbuf_lock removed, the high level printk functions for storing messages are lockless. Messages can be stored from any context, so there is no need for the NMI and safe buffers anymore. Remove the NMI and safe buffers. Although the safe buffers are removed, the NMI and safe context tracking is still in place. In these contexts, store the message immediately but still use irq_work to defer the console printing. Since printk recursion tracking is in place, safe context tracking for most of printk is not needed. Remove it. Only safe context tracking relating to the console lock is left in place. This is because the console lock is needed for the actual printing. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 2baa48303c4513ec77b8429035a4b0d5c7701408 Author: John Ogness Date: Fri Dec 11 00:55:25 2020 +0106 printk: track/limit recursion Track printk() recursion and limit it to 3 levels per-CPU and per-context. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner
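The recursion limit can be pictured with this simplified sketch (the real code keeps separate per-CPU counters for task, irq and NMI context; only a single per-CPU counter is shown here):

    #define PRINTK_MAX_RECURSION 3

    static DEFINE_PER_CPU(char, printk_count);

    static bool printk_enter_irqsave(unsigned long *flags)
    {
            char *count;

            local_irq_save(*flags);
            count = this_cpu_ptr(&printk_count);
            if (*count > PRINTK_MAX_RECURSION) {
                    /* too deep: drop the message instead of recursing further */
                    local_irq_restore(*flags);
                    return false;
            }
            (*count)++;
            return true;
    }

    static void printk_exit_irqrestore(unsigned long flags)
    {
            (*this_cpu_ptr(&printk_count))--;
            local_irq_restore(flags);
    }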