commit 27ae60f80e50b495fb1b5d292c2f09bd74fd65c7 Author: Alexandre Frade Date: Wed Dec 15 17:08:52 2021 +0000 Linux 5.15.8-rt23-xanmod1 Signed-off-by: Alexandre Frade commit 544480a40f4e2547cc83197292bf3f309b3cb5fe Author: Alexandre Frade Date: Wed Dec 15 14:22:01 2021 +0000 PREEMPT_RT: char/lrng: Rebase to kernel v5.15-rt Signed-off-by: Alexandre Frade commit 62ecc3f53b7affc1630a0133bdcb4a5b7efb5b3f Author: Yu Zhao Date: Wed Nov 10 21:15:01 2021 -0700 PREEMPT_RT: mm: x86, arm64: add arch_has_hw_pte_young() Some architectures automatically set the accessed bit in PTEs, e.g., x86 and arm64 v8.2. On architectures that do not have this capability, clearing the accessed bit in a PTE triggers a page fault following the TLB miss of this PTE. Being aware of this capability can help make better decisions, i.e., whether to limit the size of each batch of PTEs and the burst of batches when clearing the accessed bit. Signed-off-by: Yu Zhao Tested-by: Konstantin Kharlamov Rebased-by: Alexandre Frade Signed-off-by: Alexandre Frade commit 4f4b15b34f5f706056c16c8ed465dafd7d14d1a5 Merge: bcde0eb91225 098a6d8590a9 Author: Alexandre Frade Date: Wed Dec 15 14:09:05 2021 +0000 Merge tag 'v5.15.7-rt23' into 5.15-rt Linux 5.15.7-rt23 commit bcde0eb9122589050f594c8f31f98b30b5872ff2 Author: Alexandre Frade Date: Wed Dec 15 14:08:23 2021 +0000 Revert "mm: x86, arm64: add arch_has_hw_pte_young()" This reverts commit b480620429adefe10ea12cca0455cd78201f132a. commit d704d8831b8fe89c333c14569fed732a41e02ba2 Author: Alexandre Frade Date: Tue Dec 14 17:30:14 2021 +0000 Linux 5.15.8-xanmod1 Signed-off-by: Alexandre Frade commit 098a6d8590a95d4fc44c3a06e3e40023e2cdd4ca Author: Clark Williams Date: Fri Dec 10 15:41:09 2021 -0600 Linux 5.15.7-rt23 Signed-off-by: Clark Williams commit 18358ca1962dcb97ef41c49bda7325fd583991cd Merge: d7ca44fecd09 4e8c680af6d5 Author: Clark Williams Date: Fri Dec 10 14:59:57 2021 -0600 Merge tag 'v5.15.7' into v5.15-rt This is the 5.15.7 stable release commit d7ca44fecd09d3dab13b7d67657aab62e0c5fd52 Merge: 5d8b36c7ec2b a2547651bc89 Author: Clark Williams Date: Fri Dec 10 14:58:52 2021 -0600 Merge tag 'v5.15.6' into v5.15-rt This is the 5.15.6 stable release commit 5d8b36c7ec2b348b4977a7287f58ae461f57c392 Author: Sebastian Andrzej Siewior Date: Mon Nov 29 10:34:07 2021 +0100 v5.15.5-rt22 Signed-off-by: Sebastian Andrzej Siewior commit 836bce175cd19ef79b236b4eb19fdf9ef06ffe97 Merge: 2534ec53a47a f00712e27083 Author: Sebastian Andrzej Siewior Date: Mon Nov 29 10:33:54 2021 +0100 Merge tag 'v5.15.5' into linux-5.15.y-rt This is the 5.15.5 stable release commit 2534ec53a47a4b1ee5985534c24652b5c62abe62 Author: Sebastian Andrzej Siewior Date: Fri Nov 19 09:05:28 2021 +0100 v5.15.3-rt21 Signed-off-by: Sebastian Andrzej Siewior commit 1f718bd22b90ab6baa22028386bc32bea2e4e034 Merge: 0a3c9296d2f8 3b17187f5ca1 Author: Sebastian Andrzej Siewior Date: Fri Nov 19 09:05:16 2021 +0100 Merge tag 'v5.15.3' into linux-5.15.y-rt This is the 5.15.3 stable release commit 0a3c9296d2f8d1c2511ccd3d020297f3b4c9f618 Author: Sebastian Andrzej Siewior Date: Thu Nov 18 16:00:54 2021 +0100 v5.15.2-rt20 Signed-off-by: Sebastian Andrzej Siewior commit 24b9dabed35df008f243f7c0dbdcc8b85c922528 Author: Arnd Bergmann Date: Tue Oct 26 12:07:11 2021 +0200 net: sched: gred: dynamically allocate tc_gred_qopt_offload The tc_gred_qopt_offload structure has grown too big to be on the stack for 32-bit architectures after recent changes. 
net/sched/sch_gred.c:903:13: error: stack frame size (1180) exceeds limit (1024) in 'gred_destroy' [-Werror,-Wframe-larger-than] net/sched/sch_gred.c:310:13: error: stack frame size (1212) exceeds limit (1024) in 'gred_offload' [-Werror,-Wframe-larger-than] Use dynamic allocation per qdisc to avoid this. Fixes: 50dc9a8572aa ("net: sched: Merge Qdisc::bstats and Qdisc::cpu_bstats data types") Fixes: 67c9e6270f30 ("net: sched: Protect Qdisc::bstats with u64_stats") Suggested-by: Jakub Kicinski Signed-off-by: Arnd Bergmann Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20211026100711.nalhttf6mbe6sudx@linutronix.de Signed-off-by: Jakub Kicinski commit 6cf712a67c999fb266381813508153770533785f Author: Sebastian Andrzej Siewior Date: Fri Nov 12 18:53:03 2021 +0100 v5.15.2-rt19 Signed-off-by: Sebastian Andrzej Siewior commit 891add4322531a3f44f328682ce7c7ee65071feb Author: Sebastian Andrzej Siewior Date: Fri Nov 12 18:52:11 2021 +0100 mm/vmalloc: Remove unused `cpu' variable. The `cpu' variable is assigned but not used. Remove it. Reported-by: kernel test robot Signed-off-by: Sebastian Andrzej Siewior commit a8e529408ac3fe6a792aec0e4480503c2f96b409 Author: Sebastian Andrzej Siewior Date: Fri Nov 12 18:51:48 2021 +0100 v5.15.2-rt18 Signed-off-by: Sebastian Andrzej Siewior commit db6e5ffb3142598a2cb6e51818c01c45a4ccd078 Merge: 446caa996eaa 7cc36c3e14ae Author: Sebastian Andrzej Siewior Date: Fri Nov 12 18:51:33 2021 +0100 Merge tag 'v5.15.2' into linux-5.15.y-rt This is the 5.15.2 stable release Signed-off-by: Sebastian Andrzej Siewior commit 446caa996eaa06721ba441645f5c789ee480bca7 Author: Sebastian Andrzej Siewior Date: Tue Nov 2 11:55:36 2021 +0100 v5.15-rt17 Signed-off-by: Sebastian Andrzej Siewior commit 3aa5519557c307d7ea5dd0b2f962909101373069 Author: Sebastian Andrzej Siewior Date: Tue Nov 2 11:54:34 2021 +0100 preempt: Remove preempt_disable_rt(). Remove preempt_disable_rt() and its counterpart because there are no more users of it. Signed-off-by: Sebastian Andrzej Siewior commit 18e08ba517e91246157c587e04ffc36b01ff84c2 Author: Sebastian Andrzej Siewior Date: Tue Nov 2 11:53:28 2021 +0100 fs/dcache: disable preemption on i_dir_seq's write side. This is an update of the original patch. It reverts the __i_dir_seq rename and inlines preempt_enable_rt(). Signed-off-by: Sebastian Andrzej Siewior commit 514342eb43a760575d6d9a366506a41ab7ec4888 Author: Sebastian Andrzej Siewior Date: Tue Nov 2 11:52:05 2021 +0100 fscache: Use only one fscache_object_cong_wait. This is an update of the original patch, removing put_cpu_var() which was overseen in the initial patch. Signed-off-by: Sebastian Andrzej Siewior commit 895496e11e7d37af1e200546e2ce826ac56b363e Author: Sebastian Andrzej Siewior Date: Tue Nov 2 09:38:54 2021 +0100 v5.15-rt16 Signed-off-by: Sebastian Andrzej Siewior commit 7811d3873b046e0e73798ae52360751b147fbb55 Merge: b83089a173c2 8bb7eca972ad Author: Sebastian Andrzej Siewior Date: Tue Nov 2 09:38:32 2021 +0100 Merge tag 'v5.15' into linux-5.15.y-rt Linux 5.15 Signed-off-by: Sebastian Andrzej Siewior commit b83089a173c2f6415f9e3a93d341ecce30b515df Author: Sebastian Andrzej Siewior Date: Fri Oct 29 10:13:43 2021 +0200 v5.15-rc7-rt15 Signed-off-by: Sebastian Andrzej Siewior commit 24e584a65417e5cae4900e5ffa6710a8006730bf Author: Sebastian Andrzej Siewior Date: Fri Oct 29 10:12:58 2021 +0200 drm/i915: Update the i915 patches This is all-in-one update of the i915 patches to the latest version posted upstream. 
Signed-off-by: Sebastian Andrzej Siewior commit aae93144898af113331668f53f80cb83f5a07360 Author: Sebastian Andrzej Siewior Date: Fri Oct 29 10:07:11 2021 +0200 mm: Disable NUMA_BALANCING_DEFAULT_ENABLED and TRANSPARENT_HUGEPAGE on PREEMPT_RT TRANSPARENT_HUGEPAGE: There are potential non-deterministic delays to an RT thread if a critical memory region is not THP-aligned and a non-RT buffer is located in the same hugepage-aligned region. It's also possible for an unrelated thread to migrate pages belonging to an RT task, incurring unexpected page faults due to memory defragmentation even if khugepaged is disabled. Regular HUGEPAGEs are not affected by this and can be used. NUMA_BALANCING: There is a non-deterministic delay to mark PTEs PROT_NONE to gather NUMA fault samples, increased page faults of regions even if mlocked and non-deterministic delays when migrating pages. [Mel Gorman worded 99% of the commit description]. Link: https://lore.kernel.org/all/20200304091159.GN3818@techsingularity.net/ Link: https://lore.kernel.org/all/20211026165100.ahz5bkx44lrrw5pt@linutronix.de/ Cc: stable-rt@vger.kernel.org Cc: Mel Gorman Signed-off-by: Sebastian Andrzej Siewior Acked-by: Mel Gorman Link: https://lore.kernel.org/r/20211028143327.hfbxjze7palrpfgp@linutronix.de commit b056c482486a9c8f50914640d3a1a5a5c6521ace Author: Sebastian Andrzej Siewior Date: Fri Oct 29 10:03:56 2021 +0200 fs/namespace: Boost the mount_lock.lock owner instead of spinning on PREEMPT_RT. The MNT_WRITE_HOLD flag is used to hold back any new writers while the mount point is about to be made read-only. __mnt_want_write() then loops with disabled preemption until this flag disappears. Callers of mnt_hold_writers() (which sets the flag) hold the spinlock_t of mount_lock (seqlock_t) which disables preemption on !PREEMPT_RT and ensures the task is not scheduled away, so the spinning side does not spin for a long time. On PREEMPT_RT the spinlock_t does not disable preemption and so it is possible that the task setting MNT_WRITE_HOLD is preempted by a task with higher priority which then spins infinitely waiting for MNT_WRITE_HOLD to get removed. Acquire mount_lock::lock, which is held by the setter of MNT_WRITE_HOLD. This will PI-boost the owner and wait until the lock is dropped, which means that MNT_WRITE_HOLD is cleared again. Remove unused cpu_chill(). Link: https://lkml.kernel.org/r/20211025152218.opvcqfku2lhqvp4o@linutronix.de Signed-off-by: Sebastian Andrzej Siewior commit 74920695ab51a6d180dcd6554193cc8427758360 Author: Sebastian Andrzej Siewior Date: Thu Oct 28 17:30:50 2021 +0200 fscache: Use only one fscache_object_cong_wait. In the commit mentioned below, fscache was converted from slow-work to workqueue. slow_work_enqueue() and slow_work_sleep_till_thread_needed() did not use a per-CPU workqueue. They chose from two global waitqueues depending on the SLOW_WORK_VERY_SLOW bit, which was not set, so it was always one waitqueue. I can't find out how it is ensured that a waiter on a certain CPU is woken up by the other side. My guess is that the timeout in schedule_timeout() ensures that it does not wait forever (or a random wake up does). fscache_object_sleep_till_congested() must be invoked from preemptible context in order for schedule() to work. In this case this_cpu_ptr() should complain with CONFIG_DEBUG_PREEMPT enabled unless the thread is bound to one CPU. wake_up() wakes only one waiter and I'm not sure if it is guaranteed that only one waiter exists. Replace the per-CPU waitqueue with one global waitqueue.
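A minimal sketch of the resulting pattern, not the literal patch: one global wait_queue_head_t replaces the per-CPU instances, so wake_up() reaches a waiter regardless of the CPU it slept on. The congestion check and the HZ timeout below are illustrative placeholders, not the fscache code.

    #include <linux/wait.h>

    /* sketch only: a single global waitqueue instead of per-CPU ones */
    static DECLARE_WAIT_QUEUE_HEAD(fscache_object_cong_wait);

    static bool object_congested(void);	/* illustrative condition */

    /* waiter side, must run in preemptible context */
    static void wait_till_uncongested(void)
    {
    	wait_event_timeout(fscache_object_cong_wait, !object_congested(), HZ);
    }

    /* waker side */
    static void signal_uncongested(void)
    {
    	wake_up(&fscache_object_cong_wait);
    }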
Fixes: 8b8edefa2fffb ("fscache: convert object to use workqueue instead of slow-work") Reported-by: Gregor Beck Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior commit aeb874c78142d0b639eeaf974531df45716814a6 Author: Sebastian Andrzej Siewior Date: Tue Oct 26 08:29:41 2021 +0200 v5.15-rc7-rt14 Signed-off-by: Sebastian Andrzej Siewior commit 27a9c5671125406100a5acf27216b5cdf2bd7546 Merge: aff35ce24d1a 3906fe9bb7f1 Author: Sebastian Andrzej Siewior Date: Tue Oct 26 08:29:16 2021 +0200 Merge tag 'v5.15-rc7' into linux-5.15.y-rt Linux 5.15-rc7 Signed-off-by: Sebastian Andrzej Siewior commit aff35ce24d1a095c612c55c406c0904fe21b687d Author: Sebastian Andrzej Siewior Date: Thu Oct 21 15:17:38 2021 +0200 v5.15-rc6-rt13 Signed-off-by: Sebastian Andrzej Siewior commit 3b86aee1ce637b47df92beb2ff0c1b1c5f987ab4 Author: Sebastian Andrzej Siewior Date: Thu Oct 21 15:14:40 2021 +0200 net: Update the seqcount_t removal from Qdisc. This is an all-in-one patch updating the series based on what has been merged upstream plus the individual fixes: net/sched: sch_ets: properly init all active DRR list handles net: sched: Allow statistics reads from softirq. net: sched: fix logic error in qdisc_run_begin() net: sched: remove one pair of atomic operations net: stats: Read the statistics in ___gnet_stats_copy_basic() instead of adding. Signed-off-by: Sebastian Andrzej Siewior commit d3ab918c240db09455948ed2b3c2212ae2890e07 Author: Sebastian Andrzej Siewior Date: Mon Oct 18 10:27:48 2021 +0200 v5.15-rc6-rt12 Signed-off-by: Sebastian Andrzej Siewior commit 3ed7763956cc7267741d9f118e9615e4b3bf83db Author: Sebastian Andrzej Siewior Date: Mon Oct 18 10:16:02 2021 +0200 net: Update the seqcount_t removal from Qdisc. This is an all-in-one patch udpating the seqcount_t removal from Qdisc to the version recently posted on the list. Try to simplify the gnet_stats and remove qdisc->running sequence counter. https://lore.kernel.org/all/20211016084910.4029084-1-bigeasy@linutronix.de/ Signed-off-by: Sebastian Andrzej Siewior commit 3cd6a94b497228200613259fb0b917b80509190d Author: Sebastian Andrzej Siewior Date: Mon Oct 18 10:09:19 2021 +0200 workqueue: Remove printk_deferred_*(). In the recent -rc update the printk_deferred_*() functions were added which are not needed due to the printk rework. Remove printk_deferred_*(). Signed-off-by: Sebastian Andrzej Siewior commit 0ce951980c4573f75b4a723a804b8675784a8bd2 Author: Sebastian Andrzej Siewior Date: Mon Oct 18 08:24:55 2021 +0200 v5.15-rc6-rt11 Signed-off-by: Sebastian Andrzej Siewior commit 82e273e814ad244311e6bab407b1bb8a29596024 Merge: cbc8a1e30105 519d81956ee2 Author: Sebastian Andrzej Siewior Date: Mon Oct 18 08:24:12 2021 +0200 Merge tag 'v5.15-rc6' into linux-5.15.y-rt Linux 5.15-rc6 commit cbc8a1e30105464afc65c43efd59a838496c7abe Author: Sebastian Andrzej Siewior Date: Fri Oct 15 19:15:23 2021 +0200 v5.15-rc5-rt10 Signed-off-by: Sebastian Andrzej Siewior commit 0c34700de5e7624bfb2b800b91f2ee597a5cf389 Author: He Zhe Date: Tue Oct 12 16:44:21 2021 +0800 arm64: signal: Use ARCH_RT_DELAYS_SIGNAL_SEND. The software breakpoint is handled via do_debug_exception() which disables preemption. On PREEMPT_RT spinlock_t become sleeping locks and must not be acquired with disabled preemption. Use ARCH_RT_DELAYS_SIGNAL_SEND so the signal (from send_user_sigtrap()) is sent delayed in return to userland. 
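The mechanism behind ARCH_RT_DELAYS_SIGNAL_SEND is described in the x86 commit further down ("signal/x86: Delay calling signals in atomic"): the siginfo is parked in the task struct and delivered on return to user space. A hedged sketch of how the arm64 path could look; current->forced_info is the RT-patch-specific field named in that later description, and the exact arm64 hunk may differ.

    #include <linux/sched.h>
    #include <linux/sched/signal.h>
    #include <linux/signal.h>

    static void send_user_sigtrap_sketch(int si_code)
    {
    	struct kernel_siginfo info;

    	clear_siginfo(&info);
    	info.si_signo = SIGTRAP;
    	info.si_code  = si_code;

    	if (IS_ENABLED(CONFIG_PREEMPT_RT) && in_atomic()) {
    		/* defer: delivered via TIF_NOTIFY_RESUME on return to userland */
    		current->forced_info = info;
    		set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
    		return;
    	}
    	force_sig_info(&info);
    }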
Cc: stable-rt@vger.kernel.org Signed-off-by: He Zhe Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20211012084421.35136-1-zhe.he@windriver.com commit 48d6f712f1bc0cc8bbf08c84434a9e3f50da73f1 Author: Sebastian Andrzej Siewior Date: Mon Oct 11 08:35:28 2021 +0200 v5.15-rc5-rt9 Signed-off-by: Sebastian Andrzej Siewior commit 03597e5074d008e9d12f7ac4ade4988f0a7a434c Merge: 340cc70444e7 64570fbc14f8 Author: Sebastian Andrzej Siewior Date: Mon Oct 11 08:34:43 2021 +0200 Merge tag 'v5.15-rc5' into linux-5.15.y-rt Linux 5.15-rc5 Signed-off-by: Sebastian Andrzej Siewior commit 340cc70444e7ffe89e8ea1eb0a32fb6de44c13df Author: Sebastian Andrzej Siewior Date: Fri Oct 8 23:16:09 2021 +0200 v5.15-rc4-rt8 Signed-off-by: Sebastian Andrzej Siewior commit 90b389a757742f20fd982f2196ae605605a2075f Author: Sebastian Andrzej Siewior Date: Fri Oct 8 23:13:18 2021 +0200 net: Update the Qdisc-seqcount series. This is an all-in-one update of the qdisc series which decouples the seqcount_t "try lock" usage from statistics updates and the qdisc's running state. The series is still Work-In-Progress. Signed-off-by: Sebastian Andrzej Siewior commit 563ed9f51ed55801e4c0d5618d1a5a9c1fde7d35 Author: Sebastian Andrzej Siewior Date: Fri Oct 8 23:06:32 2021 +0200 drm/i915: Update the i915 patches. This is an all-in-one update of i915 patches. The series has been posted at https://lore.kernel.org/all/20211005150046.1000285-1-bigeasy@linutronix.de/ Signed-off-by: Sebastian Andrzej Siewior commit f5a50fee22c2fef5f21b0ae8ee72f4e95300a95d Author: Sebastian Andrzej Siewior Date: Fri Oct 8 21:58:15 2021 +0200 irq_work: Update to the latest version. This is an all-in-one update of the irq_work series with the version that has been posted upstream at https://lore.kernel.org/all/20211006111852.1514359-1-bigeasy@linutronix.de/ Signed-off-by: Sebastian Andrzej Siewior commit 56ce0f6137adadbf4a2ced2d23eefb1a28812ecf Author: Sebastian Andrzej Siewior Date: Mon Oct 4 09:47:21 2021 +0200 v5.15-rc4-rt7 Signed-off-by: Sebastian Andrzej Siewior commit 2485e6837d73dcb21706565e2899023dfed6fdd3 Merge: f06442978b05 9e1ff307c779 Author: Sebastian Andrzej Siewior Date: Mon Oct 4 09:46:46 2021 +0200 Merge tag 'v5.15-rc4' into linux-5.15.y-rt Linux 5.15-rc4 commit f06442978b0554d8e5734a5aa9b087544014ad3e Author: Sebastian Andrzej Siewior Date: Thu Sep 30 15:03:55 2021 +0200 v5.15-rc3-rt6 Signed-off-by: Sebastian Andrzej Siewior commit 34e1320d9ada25a4c60b38c6f1259176170d2688 Author: Sebastian Andrzej Siewior Date: Thu Sep 30 15:02:45 2021 +0200 sched: Sync the sched patches Sync the scheduler related patches with the series posted at https://lkml.kernel.org/r/20210928122339.502270600@linutronix.de Signed-off-by: Sebastian Andrzej Siewior commit 905d204298c73e84633d7fc84c2e726fcd5d5b1e Author: Sebastian Andrzej Siewior Date: Thu Sep 30 14:58:41 2021 +0200 irq_work: Sync the patches Sync the irq_work patches with what has been posted in https://lore.kernel.org/all/20210927211919.310855-1-bigeasy@linutronix.de/ Signed-off-by: Sebastian Andrzej Siewior commit 7bbddda7e1f27b5c324f7b5e6a6344375f05d077 Author: Sebastian Andrzej Siewior Date: Thu Sep 30 14:56:49 2021 +0200 smp: Wake ksoftirqd on PREEMPT_RT instead of do_softirq(). Sync with what has been posted in https://lore.kernel.org/all/20210927073814.x5h6osr4dgiu44sc@linutronix.de/ Signed-off-by: Sebastian Andrzej Siewior commit a4dbd1f80141e0f6e515b684a364945964ad2470 Author: Sebastian Andrzej Siewior Date: Thu Sep 30 14:56:23 2021 +0200 mm/scatterlist: Sync with the proposed patch.
Signed-off-by: Sebastian Andrzej Siewior commit de323151f2ec7c75992c7ab2e34ad6248b42ae22 Author: Sebastian Andrzej Siewior Date: Thu Sep 30 14:51:41 2021 +0200 zsmalloc: Sync patch Sync with the post in https://lore.kernel.org/all/20210928084419.mkfu62barwrsvflq@linutronix.de/ Signed-off-by: Sebastian Andrzej Siewior commit 6faa4db6334865e82695bd5183746e56fd36d60b Author: Sebastian Andrzej Siewior Date: Thu Apr 2 21:16:30 2020 +0200 irq_poll: Use raise_softirq_irqoff() in cpu_dead notifier __raise_softirq_irqoff() adds a bit to the pending softirq mask and this is it. The softirq won't be handled in a deterministic way but randomly when an interrupt fires and handles the softirq in its irq_exit() routine or if something randomly checks and handles pending softirqs in the call chain before the CPU goes idle. Add a local_bh_disable/enable() around the IRQ-off section which will handle pending softirqs. Signed-off-by: Sebastian Andrzej Siewior Link: https://lkml.kernel.org/r/20210930103754.2128949-1-bigeasy@linutronix.de commit ab2154ce404eefbbeb41889ffa906cb6e9f6b664 Author: Sebastian Andrzej Siewior Date: Thu Sep 30 14:44:25 2021 +0200 Revert "softirq: Check preemption after reenabling interrupts" for irq_poll irq_poll_cpu_dead() defers its softirq processing until a random point. This is true for RT & !RT. The checks in irq_poll_softirq() are not needed because it is invoked from within the softirq handler. The check in irq_poll_complete() is not needed because there is no task wake here. The check in irq_poll_sched() should not be needed. Should, as in all handlers should invoke it from their interrupt service routine. I'm not sure about lpfc and mpt3sas due to the long call chain, but if it is invoked from a preemptible context then it is delayed to a random point in time with and without PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior commit ce55c8d03f930dde4f521c4b1ab9312f037f3f7b Author: Sebastian Andrzej Siewior Date: Mon Sep 27 11:59:17 2021 +0200 irq: Export force_irqthreads_key Temporarily add the EXPORT_SYMBOL_GPL for force_irqthreads_key until it is settled whether it is needed or not. Signed-off-by: Sebastian Andrzej Siewior commit a23f617fdef44a5b87601a70c5c5297e36ea7d20 Author: Thomas Gleixner Date: Sun Sep 26 17:10:45 2021 +0200 net: bridge: mcast: Associate the seqcount with its protecting lock. The sequence count bridge_mcast_querier::seq is protected by net_bridge::multicast_lock but seqcount_init() does not associate the seqcount with the lock. This leads to a warning on PREEMPT_RT because preemption is still enabled. Let seqcount_init() associate the seqcount with the lock that protects the write section. Remove lockdep_assert_held_once() because lockdep already checks whether the associated lock is held.
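A minimal sketch of the association described above, using the generic seqcount_LOCKNAME API; the struct and function names are illustrative, not the bridge code verbatim. Once the seqcount carries a reference to its spinlock_t, lockdep can verify the lock is held on the write side and PREEMPT_RT knows the write side is preemptible.

    #include <linux/seqlock.h>
    #include <linux/spinlock.h>

    struct querier_state {
    	spinlock_t		lock;	/* write-side protection */
    	seqcount_spinlock_t	seq;	/* instead of a plain seqcount_t */
    };

    static void querier_state_init(struct querier_state *q)
    {
    	spin_lock_init(&q->lock);
    	seqcount_spinlock_init(&q->seq, &q->lock);
    }

    static void querier_update(struct querier_state *q)
    {
    	spin_lock(&q->lock);
    	write_seqcount_begin(&q->seq);	/* lockdep checks q->lock here */
    	/* ... update the protected fields ... */
    	write_seqcount_end(&q->seq);
    	spin_unlock(&q->lock);
    }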
Fixes: 67b746f94ff39 ("net: bridge: mcast: make sure querier port/address updates are consistent") Reported-by: Mike Galbraith Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior Tested-by: Mike Galbraith https://lkml.kernel.org/r/20210928141049.593833-1-bigeasy@linutronix.de commit 095936956b7722d66006ee5bb8085637f8799d08 Author: Sebastian Andrzej Siewior Date: Thu Sep 30 14:37:10 2021 +0200 v5.15-rc3-rt5 Signed-off-by: Sebastian Andrzej Siewior commit 02f236ada0c74b474c57d50f637a33d3840b04f9 Merge: 903767b28c70 5816b3e6577e Author: Sebastian Andrzej Siewior Date: Thu Sep 30 14:36:27 2021 +0200 Merge tag 'v5.15-rc3' into linux-5.15.y-rt Linux 5.15-rc3 commit 903767b28c703abb1a614edd8d5c6d0558a0e0f8 Author: Sebastian Andrzej Siewior Date: Fri Sep 24 19:04:21 2021 +0200 v5.15-rc2-rt4 Signed-off-by: Sebastian Andrzej Siewior commit 48ae0faa44fb27c486257b5679f8634b7223f1a7 Author: Sebastian Andrzej Siewior Date: Fri Sep 24 19:04:03 2021 +0200 sched: Redo delayed mm_struct & task struct deallocation. This is an all-in-one commit containing the rewrite of the delayed mm_struct and stack deallocation. Rework by Thomas Gleixner. Signed-off-by: Sebastian Andrzej Siewior commit 797da75cd31a15e2c823022613ca0a09713ad4b7 Author: Sebastian Andrzej Siewior Date: Fri Sep 24 19:02:56 2021 +0200 sched: Additional might_sleep() improvements. This is an all-in-one patch including the following series: [patch 0/8] sched: Clean up might_sleep() and make it RT aware [patch 1/8] sched: Clean up the might_sleep() underscore zoo [patch 2/8] sched: Make cond_resched_*lock() variants consistent vs. might_sleep() [patch 3/8] sched: Remove preempt_offset argument from __might_sleep() [patch 4/8] sched: Cleanup might_sleep() printks [patch 5/8] sched: Make might_sleep() output less confusing [patch 6/8] sched: Make RCU nest depth distinct in __might_resched() [patch 7/8] sched: Make cond_resched_lock() variants RT aware [patch 8/8] locking/rt: Take RCU nesting into account for __might_resched() as posted by Thomas Gleixner. Signed-off-by: Sebastian Andrzej Siewior commit 0870473527f803ce0c4794eddd7c9b1e7638ac4b Author: Sebastian Andrzej Siewior Date: Fri Sep 24 18:58:21 2021 +0200 Remove a few atomic.h includes. It appears that ARM compiles without these two additional includes. Remove the atomic.h includes which were added earlier. Signed-off-by: Sebastian Andrzej Siewior commit da9f1e9ee4aa0795984fbad3997d2ef8dd8bf06a Author: Sebastian Andrzej Siewior Date: Fri Sep 24 18:55:06 2021 +0200 smp: Wake ksoftirqd from idle when it is not running. ksoftirqd should be woken if it is not idle, not the other way around. Signed-off-by: Sebastian Andrzej Siewior commit 4dcd0e163c8a1fb20655b0878d0cd3e0ffe8794e Author: Sebastian Andrzej Siewior Date: Fri Sep 24 18:49:54 2021 +0200 smack: Correct indentation level. The smk_ipv6_port_check() invocation here has one tab too many. Signed-off-by: Sebastian Andrzej Siewior commit ccc6d0d428e7dc05002f67c1a3f74a94dc673628 Author: Sebastian Andrzej Siewior Date: Wed Sep 22 20:07:39 2021 +0200 v5.15-rc2-rt3 Signed-off-by: Sebastian Andrzej Siewior commit 1a45b3551ef852193c3d338888132c4925d0690d Author: Sebastian Andrzej Siewior Date: Wed Sep 22 19:34:40 2021 +0200 preempt: Move preempt_enable_no_resched() to the RT block preempt_enable_no_resched() should point to preempt_enable() on PREEMPT_RT so nobody is playing any preempt tricks and enables preemption without checking for the need-resched flag. This was misplaced in v3.14.0-rt1 and remained unnoticed until now.
Point preempt_enable_no_resched() and preempt_enable() on RT. Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior commit 1b91b8af0858e3c74b475ef00aa758dd18c1e326 Author: Sebastian Andrzej Siewior Date: Wed Sep 22 19:30:53 2021 +0200 Revert preempt: Provide preempt_*_nort variants Remove the preempt_*_nort() functions, there are no users anymore. Signed-off-by: Sebastian Andrzej Siewior commit 9d05eb85ce4c100031b6055dd2011b716a0d076d Author: Thomas Gleixner Date: Wed Sep 22 12:28:19 2021 +0200 locking/rt: Take RCU nesting into account for might_sleep() The RT patches contained a cheap hack to ignore the RCU nesting depth in might_sleep() checks, which was a pragmatic but incorrect workaround. The general rule that rcu_read_lock() held sections cannot voluntary sleep does apply even on RT kernels. Though the substitution of spin/rw locks on RT enabled kernels has to be exempt from that rule. On !RT a spin_lock() can obviously nest inside a rcu read side critical section as the lock acquisition is not going to block, but on RT this is not longer the case due to the 'sleeping' spin lock substitution. Instead of generally ignoring the RCU nesting depth in might_sleep() checks, pass the rcu_preempt_depth() as offset argument to might_sleep() from spin/read/write_lock() which makes the check work correctly even in RCU read side critical sections. The actual blocking on such a substituted lock within a RCU read side critical section is already handled correctly in __schedule() by treating it as a "preemption" of the RCU read side critical section. Signed-off-by: Thomas Gleixner commit c56c6f2bd4f0991b41b3ad2ebe81b661f6801913 Author: Thomas Gleixner Date: Wed Sep 22 19:25:35 2021 +0200 sched: Make cond_resched_lock() RT aware [ This is an all-in-one commit reverting the commit sched: Do not account rcu_preempt_depth on RT in might_sleep() and introducing this commit. ] The might_sleep() checks in the cond_resched_lock() variants use PREEMPT_LOCK_OFFSET for preempt count offset checking. On PREEMPT_RT enabled kernels spin/rw_lock held sections stay preemptible which means PREEMPT_LOCK_OFFSET is 0, but that still triggers the might_sleep() check because that takes RCU read side nesting into account. On RT enabled kernels spin/read/write_lock() issue rcu_read_lock() to resemble the !RT semantics, which means in cond_resched_lock() the might sleep check will see preempt_count() == 0 and rcu_preempt_depth() == 1. Introduce PREEMPT_LOCK_SCHED_OFFSET for those might sleep checks and map them to PREEMPT_LOCK_OFFSET on !RT and to 1 (accounting for rcu_preempt_depth()) on RT enabled kernels. Signed-off-by: Thomas Gleixner commit 279b703274a5af8d7b374972bcc143e9ac713dcc Author: Thomas Gleixner Date: Wed Sep 22 19:20:55 2021 +0200 rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT [ This is an all-in-one commit reverting the commit Revert "rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT" and introducing this commit. ] Valentin reported warnings about suspicious RCU usage on RT kernels. Those happen when offloading of RCU callbacks is enabled: WARNING: suspicious RCU usage 5.13.0-rt1 #20 Not tainted ----------------------------- kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state! 
rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58) rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777) rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876) The reason is that rcu_rdp_is_offloaded() is invoked without one of the required protections on RT enabled kernels because local_bh_disable() does not disable preemption on RT. Valentin proposed to add a local lock to the code in question, but that's suboptimal in several aspects: 1) local locks add extra code to !RT kernels for no value. 2) All possible callsites have to audited and amended when affected possible at an outer function level due to lock nesting issues. 3) As the local lock has to be taken at the outer functions it's required to release and reacquire them in the inner code sections which might voluntary schedule, e.g. rcu_do_batch(). Both callsites of rcu_rdp_is_offloaded() which trigger this check invoke rcu_rdp_is_offloaded() in the variable declaration section right at the top of the functions. But the actual usage of the result is either within a section which provides the required protections or after such a section. So the obvious solution is to move the invocation into the code sections which provide the proper protections, which solves the problem for RT and does not have any impact on !RT kernels. Reported-by: Valentin Schneider Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior commit 6b61ded94d0ae0618a44e6829ec69b5498e5d26d Author: Ingo Molnar Date: Fri Jul 3 08:29:57 2009 -0500 genirq: Disable irqfixup/poll on PREEMPT_RT. The support for misrouted IRQs is used on old / legacy systems and is not feasible on PREEMPT_RT. Polling for interrupts reduces the overall system performance. Additionally the interrupt latency depends on the polling frequency and delays are not desired for real time workloads. Disable IRQ polling on PREEMPT_RT and let the user know that it is not enabled. The compiler will optimize the real fixup/poll code out. [ bigeasy: Update changelog and switch to IS_ENABLED() ] Signed-off-by: Ingo Molnar Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20210917223841.c6j6jcaffojrnot3@linutronix.de Signed-off-by: Sebastian Andrzej Siewior commit 7cc45601d980aa70eefb6b68ab73e349026e9674 Author: Sebastian Andrzej Siewior Date: Wed Sep 22 18:28:04 2021 +0200 Revert "sched: Disable CONFIG_RT_GROUP_SCHED on RT" The original issue does not reproduce anymore. Allow RT_GROUP_SCHED to be selected. Signed-off-by: Sebastian Andrzej Siewior commit 6e3039ef8fdfdd397c5c7bebfb52f1b99b00fa04 Author: Sebastian Andrzej Siewior Date: Wed Sep 22 18:21:08 2021 +0200 Revert "cpuset: Convert callback_lock to raw_spinlock_t" Since the mm/slub rework the code is not invoked with disabled interrupts so there is no need to convert them raw_spinlock_t. Revert back to spinlock_t. Signed-off-by: Sebastian Andrzej Siewior commit dbef0d64af1ebd3a6d6bf020784a0ca7e9fe6296 Author: Sebastian Andrzej Siewior Date: Wed Sep 22 18:18:03 2021 +0200 Revert "crypto: limit more FPU-enabled sections" There are no more users of kernel_fpu_resched(), remove it. 
Signed-off-by: Sebastian Andrzej Siewior commit 639e3e17f5189eef325512a179c1a40ce92dc8eb Author: Sebastian Andrzej Siewior Date: Wed Sep 22 16:27:51 2021 +0200 v5.15-rc2-rt2 Signed-off-by: Sebastian Andrzej Siewior commit 311be12ad0a58a4ab0f507075deabc5cc7c8a7c1 Merge: 05f564af7ff4 e4e737bb5c17 Author: Sebastian Andrzej Siewior Date: Wed Sep 22 16:26:56 2021 +0200 Merge tag 'v5.15-rc2' into linux-5.15.y-rt Linux 5.15-rc2 Signed-off-by: Sebastian Andrzej Siewior commit 05f564af7ff4cd9cc21645df2f0be6b3e6681ee6 Author: Thomas Gleixner Date: Fri Jul 8 20:25:16 2011 +0200 Add localversion for -RT release Signed-off-by: Thomas Gleixner commit 527f5d6c2befba2028a3a5006451644f4a600d3b Author: Sebastian Andrzej Siewior Date: Fri Oct 11 13:14:41 2019 +0200 POWERPC: Allow to enable RT Allow to select RT. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 95aacd7877ed768fc87f2d461256e6d8f699837d Author: Sebastian Andrzej Siewior Date: Tue Mar 26 18:31:29 2019 +0100 powerpc/stackprotector: work around stack-guard init from atomic This is invoked from the secondary CPU in atomic context. On x86 we use tsc instead. On Power we XOR it against mftb() so lets use stack address as the initial value. Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit aed2b869f16ca87cd0935a1571a951f7c1465eb4 Author: Bogdan Purcareata Date: Fri Apr 24 15:53:13 2015 +0000 powerpc/kvm: Disable in-kernel MPIC emulation for PREEMPT_RT While converting the openpic emulation code to use a raw_spinlock_t enables guests to run on RT, there's still a performance issue. For interrupts sent in directed delivery mode with a multiple CPU mask, the emulated openpic will loop through all of the VCPUs, and for each VCPUs, it call IRQ_check, which will loop through all the pending interrupts for that VCPU. This is done while holding the raw_lock, meaning that in all this time the interrupts and preemption are disabled on the host Linux. A malicious user app can max both these number and cause a DoS. This temporary fix is sent for two reasons. First is so that users who want to use the in-kernel MPIC emulation are aware of the potential latencies, thus making sure that the hardware MPIC and their usage scenario does not involve interrupts sent in directed delivery mode, and the number of possible pending interrupts is kept small. Secondly, this should incentivize the development of a proper openpic emulation that would be better suited for RT. Acked-by: Scott Wood Signed-off-by: Bogdan Purcareata Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 46cc89ccaa4c2335e365888fe1f5a3f1e068f4f8 Author: Sebastian Andrzej Siewior Date: Tue Mar 26 18:31:54 2019 +0100 powerpc/pseries/iommu: Use a locallock instead local_irq_save() The locallock protects the per-CPU variable tce_page. The function attempts to allocate memory while tce_page is protected (by disabling interrupts). Use local_irq_save() instead of local_irq_disable(). Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit c1da300b3ef9f85f7db2a24aca5f85776398c986 Author: Sebastian Andrzej Siewior Date: Fri Jul 26 11:30:49 2019 +0200 powerpc: traps: Use PREEMPT_RT Add PREEMPT_RT to the backtrace if enabled. 
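The powerpc traps change above only adds a marker to the oops/backtrace output. A hedged, illustrative sketch of the pattern; the helper name and the print site are hypothetical, not the actual hunk.

    #include <linux/kconfig.h>
    #include <linux/printk.h>

    /* hypothetical helper: tag the oops banner with the preemption model */
    static const char *preempt_model_tag(void)
    {
    	if (IS_ENABLED(CONFIG_PREEMPT_RT))
    		return " PREEMPT_RT";
    	if (IS_ENABLED(CONFIG_PREEMPT))
    		return " PREEMPT";
    	return "";
    }

    static void print_backtrace_tag(void)
    {
    	pr_cont("%s", preempt_model_tag());
    }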
Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 0ac10d77ff6a6009a5eea8c703a95a9d82b450a8 Author: Sebastian Andrzej Siewior Date: Fri Oct 11 13:14:35 2019 +0200 ARM64: Allow to enable RT Allow to select RT. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 96191b3ae82dcbb7c02f77d59961b049542110b3 Author: Sebastian Andrzej Siewior Date: Fri Oct 11 13:14:29 2019 +0200 ARM: Allow to enable RT Allow to select RT. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit dd2c511036ea50df6e9a565bca70e0472c7b3738 Author: Sebastian Andrzej Siewior Date: Thu Jul 29 10:36:30 2021 +0200 arm64/sve: Make kernel FPU protection RT friendly Non RT kernels need to protect FPU against preemption and bottom half processing. This is achieved by disabling bottom halves via local_bh_disable() which implictly disables preemption. On RT kernels this protection mechanism is not sufficient because local_bh_disable() does not disable preemption. It serializes bottom half related processing via a CPU local lock. As bottom halves are running always in thread context on RT kernels disabling preemption is the proper choice as it implicitly prevents bottom half processing. Signed-off-by: Sebastian Andrzej Siewior commit 13541cb292e61d4917e581a30bd4b24a233d9882 Author: Sebastian Andrzej Siewior Date: Thu Jul 29 12:52:14 2021 +0200 arm64/sve: Delay freeing memory in fpsimd_flush_thread() fpsimd_flush_thread() invokes kfree() via sve_free() within a preempt disabled section which is not working on -RT. Delay freeing of memory until preemption is enabled again. Signed-off-by: Sebastian Andrzej Siewior commit de8fd76a85e81c7ebd0c0edd77b885fc1d68e9f9 Author: Josh Cartwright Date: Thu Feb 11 11:54:01 2016 -0600 KVM: arm/arm64: downgrade preempt_disable()d region to migrate_disable() kvm_arch_vcpu_ioctl_run() disables the use of preemption when updating the vgic and timer states to prevent the calling task from migrating to another CPU. It does so to prevent the task from writing to the incorrect per-CPU GIC distributor registers. On -rt kernels, it's possible to maintain the same guarantee with the use of migrate_{disable,enable}(), with the added benefit that the migrate-disabled region is preemptible. Update kvm_arch_vcpu_ioctl_run() to do so. Cc: Christoffer Dall Reported-by: Manish Jaggi Signed-off-by: Josh Cartwright Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 08cbd3d856cbea53a564f5a1d5ee54a852a9b54d Author: Yadi.hu Date: Wed Dec 10 10:32:09 2014 +0800 ARM: enable irq in translation/section permission fault handlers Probably happens on all ARM, with CONFIG_PREEMPT_RT CONFIG_DEBUG_ATOMIC_SLEEP This simple program.... int main() { *((char*)0xc0001000) = 0; }; [ 512.742724] BUG: sleeping function called from invalid context at kernel/rtmutex.c:658 [ 512.743000] in_atomic(): 0, irqs_disabled(): 128, pid: 994, name: a [ 512.743217] INFO: lockdep is turned off. 
[ 512.743360] irq event stamp: 0 [ 512.743482] hardirqs last enabled at (0): [< (null)>] (null) [ 512.743714] hardirqs last disabled at (0): [] copy_process+0x3b0/0x11c0 [ 512.744013] softirqs last enabled at (0): [] copy_process+0x3b0/0x11c0 [ 512.744303] softirqs last disabled at (0): [< (null)>] (null) [ 512.744631] [] (unwind_backtrace+0x0/0x104) [ 512.745001] [] (dump_stack+0x20/0x24) [ 512.745355] [] (__might_sleep+0x1dc/0x1e0) [ 512.745717] [] (rt_spin_lock+0x34/0x6c) [ 512.746073] [] (do_force_sig_info+0x34/0xf0) [ 512.746457] [] (force_sig_info+0x18/0x1c) [ 512.746829] [] (__do_user_fault+0x9c/0xd8) [ 512.747185] [] (do_bad_area+0x7c/0x94) [ 512.747536] [] (do_sect_fault+0x40/0x48) [ 512.747898] [] (do_DataAbort+0x40/0xa0) [ 512.748181] Exception stack(0xecaa1fb0 to 0xecaa1ff8) Oxc0000000 belongs to kernel address space, user task can not be allowed to access it. For above condition, correct result is that test case should receive a “segment fault” and exits but not stacks. the root cause is commit 02fe2845d6a8 ("avoid enabling interrupts in prefetch/data abort handlers"),it deletes irq enable block in Data abort assemble code and move them into page/breakpiont/alignment fault handlers instead. But author does not enable irq in translation/section permission fault handlers. ARM disables irq when it enters exception/ interrupt mode, if kernel doesn't enable irq, it would be still disabled during translation/section permission fault. We see the above splat because do_force_sig_info is still called with IRQs off, and that code eventually does a: spin_lock_irqsave(&t->sighand->siglock, flags); As this is architecture independent code, and we've not seen any other need for other arch to have the siglock converted to raw lock, we can conclude that we should enable irq for ARM translation/section permission exception. Signed-off-by: Yadi.hu Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 1faa2fb44ea1d62c55d95e05061ac5c6d667da68 Author: Anders Roxell Date: Thu May 14 17:52:17 2015 +0200 arch/arm64: Add lazy preempt support arm64 is missing support for PREEMPT_RT. The main feature which is lacking is support for lazy preemption. The arch-specific entry code, thread information structure definitions, and associated data tables have to be extended to provide this support. Then the Kconfig file has to be extended to indicate the support is available, and also to indicate that support for full RT preemption is now available. Signed-off-by: Anders Roxell Signed-off-by: Thomas Gleixner commit 1bc48220642d9741a495769ea09685769823c029 Author: Thomas Gleixner Date: Thu Nov 1 10:14:11 2012 +0100 powerpc: Add support for lazy preemption Implement the powerpc pieces for lazy preempt. Signed-off-by: Thomas Gleixner commit 5f15849f54218c63ea6602a2a0c0906050ff565d Author: Thomas Gleixner Date: Wed Oct 31 12:04:11 2012 +0100 arm: Add support for lazy preemption Implement the arm pieces for lazy preempt. Signed-off-by: Thomas Gleixner commit f5374c0220e129fcad87961b20c799d0d7ad36a5 Author: Thomas Gleixner Date: Tue Jul 13 07:52:52 2021 +0200 entry: Fix the preempt lazy fallout Common code needs common defines.... Fixes: f2f9e496208c ("x86: Support for lazy preemption") Reported-by: kernel test robot Signed-off-by: Thomas Gleixner commit 6b6c42d25074558693da6badfd4a133b690cd408 Author: Thomas Gleixner Date: Thu Nov 1 11:03:47 2012 +0100 x86: Support for lazy preemption Implement the x86 pieces for lazy preempt. 
Signed-off-by: Thomas Gleixner commit e2cb7670f0da3a6680fe6f499c7a93cbdca07a37 Author: Sebastian Andrzej Siewior Date: Tue Jun 30 11:45:14 2020 +0200 x86/entry: Use should_resched() in idtentry_exit_cond_resched() The TIF_NEED_RESCHED bit is inlined on x86 into the preemption counter. By using should_resched(0) instead of need_resched() the same check can be performed, using the same variable (`preempt_count()`) as before. Use should_resched(0) instead of need_resched(). Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 240c44c173f943d1102a957c03a59621e39dd394 Author: Thomas Gleixner Date: Fri Oct 26 18:50:54 2012 +0100 sched: Add support for lazy preemption It has become an obsession to mitigate the determinism vs. throughput loss of RT. Looking at the mainline semantics of preemption points gives a hint why RT sucks throughput-wise for ordinary SCHED_OTHER tasks. One major issue is the wakeup of tasks which right away preempt the waking task while the waking task holds a lock on which the woken task will block right after having preempted the wakee. In mainline this is prevented due to the implicit preemption disable of spin/rw_lock held regions. On RT this is not possible due to the fully preemptible nature of sleeping spinlocks. Though for a SCHED_OTHER task preempting another SCHED_OTHER task this is really not a correctness issue. RT folks are concerned about SCHED_FIFO/RR task preemption and not about the purely fairness driven SCHED_OTHER preemption latencies. So I introduced a lazy preemption mechanism which only applies to SCHED_OTHER tasks preempting another SCHED_OTHER task. Aside from the existing preempt_count, each task now sports a preempt_lazy_count which is manipulated on lock acquisition and release. This is slightly incorrect as for laziness reasons I coupled this to migrate_disable/enable so some other mechanisms get the same treatment (e.g. get_cpu_light). Now on the scheduler side, instead of setting NEED_RESCHED this sets NEED_RESCHED_LAZY in case of a SCHED_OTHER/SCHED_OTHER preemption and therefore allows the waking task to exit the lock-held region before the woken task preempts it. That also works better for cross-CPU wakeups as the other side can stay in the adaptive spinning loop. For RT class preemption there is no change. This simply sets NEED_RESCHED and forgoes the lazy preemption counter. Initial tests do not expose any observable latency increase, but history shows that I've been proven wrong before :) The lazy preemption mode is on by default, but with CONFIG_SCHED_DEBUG enabled it can be disabled via: # echo NO_PREEMPT_LAZY >/sys/kernel/debug/sched_features and reenabled via # echo PREEMPT_LAZY >/sys/kernel/debug/sched_features The test results so far are very machine and workload dependent, but there is a clear trend that it enhances the non-RT workload performance. Signed-off-by: Thomas Gleixner commit 06dcd8d096d0da9f94276092351dbd8f577729c6 Author: Sebastian Andrzej Siewior Date: Thu Nov 7 17:49:20 2019 +0100 x86: Enable RT also on 32bit Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit e957c868f3d824647448acfc330c2eb2593e1b91 Author: Sebastian Andrzej Siewior Date: Wed Aug 7 18:15:38 2019 +0200 x86: Allow to enable RT Allow to select RT.
Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 9b2b75754bebace772f56ccecae6744d306dcf68 Author: Thomas Gleixner Date: Sun Nov 6 12:26:18 2011 +0100 x86: kvm Require const tsc for RT Non constant TSC is a nightmare on bare metal already, but with virtualization it becomes a complete disaster because the workarounds are horrible latency wise. That's also a preliminary for running RT in a guest on top of a RT host. Signed-off-by: Thomas Gleixner commit 2dbbc3a07009b2ab5613a7702a71dfc97e14d091 Author: Oleg Nesterov Date: Tue Jul 14 14:26:34 2015 +0200 signal/x86: Delay calling signals in atomic On x86_64 we must disable preemption before we enable interrupts for stack faults, int3 and debugging, because the current task is using a per CPU debug stack defined by the IST. If we schedule out, another task can come in and use the same stack and cause the stack to be corrupted and crash the kernel on return. When CONFIG_PREEMPT_RT is enabled, spin_locks become mutexes, and one of these is the spin lock used in signal handling. Some of the debug code (int3) causes do_trap() to send a signal. This function calls a spin lock that has been converted to a mutex and has the possibility to sleep. If this happens, the above issues with the corrupted stack is possible. Instead of calling the signal right away, for PREEMPT_RT and x86_64, the signal information is stored on the stacks task_struct and TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume code will send the signal when preemption is enabled. [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ] Signed-off-by: Oleg Nesterov Signed-off-by: Steven Rostedt Signed-off-by: Thomas Gleixner [bigeasy: also needed on 32bit as per Yang Shi ] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 9157fbad198d6034fe2772401bb21312a7e914ac Author: Clark Williams Date: Sat Jul 30 21:55:53 2011 -0500 sysfs: Add /sys/kernel/realtime entry Add a /sys/kernel entry to indicate that the kernel is a realtime kernel. Clark says that he needs this for udev rules, udev needs to evaluate if its a PREEMPT_RT kernel a few thousand times and parsing uname output is too slow or so. Are there better solutions? Should it exist and return 0 on !-rt? Signed-off-by: Clark Williams Signed-off-by: Peter Zijlstra Signed-off-by: Thomas Gleixner commit 4496c886dce7d36a706aa65f95e315206774c480 Author: Haris Okanovic Date: Tue Aug 15 15:13:08 2017 -0500 tpm_tis: fix stall after iowrite*()s ioread8() operations to TPM MMIO addresses can stall the cpu when immediately following a sequence of iowrite*()'s to the same region. For example, cyclitest measures ~400us latency spikes when a non-RT usermode application communicates with an SPI-based TPM chip (Intel Atom E3940 system, PREEMPT_RT kernel). The spikes are caused by a stalling ioread8() operation following a sequence of 30+ iowrite8()s to the same address. I believe this happens because the write sequence is buffered (in cpu or somewhere along the bus), and gets flushed on the first LOAD instruction (ioread*()) that follows. The enclosed change appears to fix this issue: read the TPM chip's access register (status code) after every iowrite*() operation to amortize the cost of flushing data to chip across multiple instructions. 
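A short sketch of the tpm_tis workaround described above: issue a harmless MMIO read after each posted write so the write buffer is flushed immediately rather than stalling a later ioread*(). Reading the access register follows the commit text; TPM_ACCESS(0) as the offset macro is an assumption, and the actual driver hunk may differ in detail.

    #include <linux/io.h>

    /* flush previously posted writes by reading the chip's access register */
    static inline void tpm_tis_flush(void __iomem *iobase)
    {
    	ioread8(iobase + TPM_ACCESS(0));
    }

    /* write helper that amortizes the flush over every iowrite8() */
    static inline void tpm_tis_iowrite8(u8 b, void __iomem *iobase, u32 addr)
    {
    	iowrite8(b, iobase + addr);
    	tpm_tis_flush(iobase);
    }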
Signed-off-by: Haris Okanovic Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit b0e29f3fb4c8ea7d6feff8bb577b9d70605cbab8 Author: Thomas Gleixner Date: Tue Jan 8 21:36:51 2013 +0100 tty/serial/pl011: Make the locking work on RT The lock is a sleeping lock and local_irq_save() is not the optimisation we are looking for. Redo it to make it work on -RT and non-RT. Signed-off-by: Thomas Gleixner commit d51d0ac45d55cf65f4cb4b5ac394e60aa14bb586 Author: Thomas Gleixner Date: Thu Jul 28 13:32:57 2011 +0200 tty/serial/omap: Make the locking RT aware The lock is a sleeping lock and local_irq_save() is not the optimisation we are looking for. Redo it to make it work on -RT and non-RT. Signed-off-by: Thomas Gleixner commit 51f37225610c50f9ba3ec61b86cb628ec4f43f9c Author: Sebastian Andrzej Siewior Date: Wed Sep 8 19:03:41 2021 +0200 drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock() execlists_dequeue() is invoked from a function which uses local_irq_disable() to disable interrupts, so the spin_lock() behaves like spin_lock_irq(). This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not the same as spin_lock_irq(). execlists_dequeue_irq() and execlists_dequeue() each have only one caller. If intel_engine_cs::active::lock is acquired and released with the _irq suffix then it behaves almost as if execlists_dequeue() were invoked with disabled interrupts. The difference is the last part of the function which is then invoked with enabled interrupts. I can't tell if this makes a difference. From looking at it, it might work to move the last unlock to the end of the function as I didn't find anything that would acquire the lock again. Reported-by: Clark Williams Signed-off-by: Sebastian Andrzej Siewior commit e0faa56772fa1fd3a3e646934523605836a76415 Author: Sebastian Andrzej Siewior Date: Wed Sep 8 17:18:00 2021 +0200 drm/i915/gt: Queue and wait for the irq_work item. Disabling interrupts and invoking the irq_work function directly breaks on PREEMPT_RT. PREEMPT_RT does not invoke all irq_work from hardirq context because some of the users have spinlock_t locking in the callback function. These locks are then turned into sleeping locks which can not be acquired with disabled interrupts. Using irq_work_queue() has the benefit that the irq_work will be invoked in the regular context. In general there is "no" delay between enqueuing the callback and its invocation because the interrupt is raised right away on architectures which support it (which includes x86). Use irq_work_queue() + irq_work_sync() instead of invoking the callback directly. Reported-by: Clark Williams Signed-off-by: Sebastian Andrzej Siewior commit e1628c2aabb7f0a48dcbe3ec8ac97ba26b7d5fa2 Author: Sebastian Andrzej Siewior Date: Tue Jul 7 12:25:11 2020 +0200 drm/i915/gt: Only disable interrupts for the timeline lock on !force-threaded According to commit d67739268cf0e ("drm/i915/gt: Mark up the nested engine-pm timeline lock as irqsafe") the interrupts are disabled because the code may be called from an interrupt handler and from preemptible context. With `force_irqthreads' set the timeline mutex is never observed in IRQ context, so it is not needed to disable interrupts. Only disable interrupts if not in `force_irqthreads' mode.
Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 530de1ea2899690febd168f8420be8974d282534 Author: Sebastian Andrzej Siewior Date: Wed Dec 19 10:47:02 2018 +0100 drm/i915: skip DRM_I915_LOW_LEVEL_TRACEPOINTS with NOTRACE The order of the header files is important. If this header file is included after tracepoint.h was included then the NOTRACE here becomes a nop. Currently this happens for two .c files which use the tracepoints behind DRM_I915_LOW_LEVEL_TRACEPOINTS. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 3bf3cccf84148527e14c2ba60faa8b194e790012 Author: Sebastian Andrzej Siewior Date: Thu Dec 6 09:52:20 2018 +0100 drm/i915: disable tracing on -RT Luca Abeni reported this: | BUG: scheduling while atomic: kworker/u8:2/15203/0x00000003 | CPU: 1 PID: 15203 Comm: kworker/u8:2 Not tainted 4.19.1-rt3 #10 | Call Trace: | rt_spin_lock+0x3f/0x50 | gen6_read32+0x45/0x1d0 [i915] | g4x_get_vblank_counter+0x36/0x40 [i915] | trace_event_raw_event_i915_pipe_update_start+0x7d/0xf0 [i915] The tracing events, trace_i915_pipe_update_start() among others, use functions which acquire spin locks. A few trace points use intel_get_crtc_scanline(), others use ->get_vblank_counter() which also might acquire a sleeping lock. Based on this I don't see any other way than to disable the trace points on RT. Cc: stable-rt@vger.kernel.org Reported-by: Luca Abeni Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit ab87c83725670657ad06fab962370145c62be088 Author: Mike Galbraith Date: Sat Feb 27 09:01:42 2016 +0100 drm/i915: Don't disable interrupts on PREEMPT_RT during atomic updates Commit 8d7849db3eab7 ("drm/i915: Make sprite updates atomic") started disabling interrupts across atomic updates. This breaks on PREEMPT_RT because within this section the code attempts to acquire spinlock_t locks which are sleeping locks on PREEMPT_RT. According to the comment the interrupts are disabled to avoid random delays and are not required for protection or synchronisation. Don't disable interrupts on PREEMPT_RT during atomic updates. [bigeasy: drop local locks, commit message] Signed-off-by: Mike Galbraith Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 0c53e5bdd843774639f1f3b0f01a8ee642e07e6f Author: Mike Galbraith Date: Sat Feb 27 08:09:11 2016 +0100 drm,radeon,i915: Use preempt_disable/enable_rt() where recommended DRM folks identified the spots, so use them. Signed-off-by: Mike Galbraith Signed-off-by: Thomas Gleixner Cc: Sebastian Andrzej Siewior Cc: linux-rt-users Signed-off-by: Thomas Gleixner commit 53590458d7b7ffe367b9a9e82a4ab16c48848be4 Author: Thomas Gleixner Date: Tue Aug 21 20:38:50 2012 +0200 random: Make it work on rt Delegate the random insertion to the forced threaded interrupt handler. Store the return IP of the hard interrupt handler in the irq descriptor and feed it into the random generator as a source of entropy. Signed-off-by: Thomas Gleixner commit 6a57de4ea155af5d4c28a8f7d319961b57a75e03 Author: Thomas Gleixner Date: Thu Dec 16 14:25:18 2010 +0100 x86: stackprotector: Avoid random pool on rt CPU bringup calls into the random pool to initialize the stack canary. During boot that works nicely even on RT as the might sleep checks are disabled. During CPU hotplug the might sleep checks trigger. Making the locks in random raw is a major PITA, so avoiding the call on RT is the only sensible solution.
This is basically the same randomness which we get during boot where the random pool has no entropy and we rely on the TSC randomness. Reported-by: Carsten Emde Signed-off-by: Thomas Gleixner commit c3e10bfedc46b861ef683f4ab522def517f98103 Author: Thomas Gleixner Date: Tue Jul 14 14:26:34 2015 +0200 panic: skip get_random_bytes for RT_FULL in init_oops_id Disable on -RT. If this is invoked from irq-context we will have problems acquiring the sleeping lock. Signed-off-by: Thomas Gleixner commit 97c64a85703a661ee35149fc66f28f96dd5a8b3d Author: Sebastian Andrzej Siewior Date: Thu Jul 29 10:38:03 2021 +0200 crypto: testmgr - Only disable migration in crypto_disable_simd_for_test() crypto_disable_simd_for_test() disables preemption in order to receive a stable per-CPU variable which it needs to modify in order to alter crypto_simd_usable() results. This can also be achieved by migrate_disable() which forbids CPU migrations but allows the task to be preempted. The latter is important for PREEMPT_RT since operations like skcipher_walk_first() may allocate memory, which must not happen with disabled preemption on PREEMPT_RT. Use migrate_disable() in crypto_disable_simd_for_test() to achieve a stable per-CPU pointer. Signed-off-by: Sebastian Andrzej Siewior commit 7b4d5ab9df85783196c3ab39cd661b05b82369c4 Author: Sebastian Andrzej Siewior Date: Thu Jul 26 18:52:00 2018 +0200 crypto: cryptd - add a lock instead preempt_disable/local_bh_disable cryptd has a per-CPU lock which is protected with local_bh_disable() and preempt_disable(). Add an explicit spin_lock to make the locking context more obvious and visible to lockdep. Since it is a per-CPU lock, there should be no lock contention on the actual spinlock. There is a small race-window where we could be migrated to another CPU after the cpu_queue has been obtained. This is not a problem because the actual resource is protected by the spinlock. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 15d0b47d1914678ce339d13f5e86a5b0616f493c Author: Sebastian Andrzej Siewior Date: Thu Nov 30 13:40:10 2017 +0100 crypto: limit more FPU-enabled sections Those crypto drivers use SSE/AVX/… for their crypto work and in order to do so in the kernel they need to enable the "FPU" in kernel mode, which disables preemption. There are two problems with the way they are used: - the while loop which processes X bytes may create latency spikes and should be avoided or limited. - the cipher-walk-next part may allocate/free memory and may use kmap_atomic(). The whole kernel_fpu_begin()/end() processing probably isn't that cheap. It most likely makes sense to process as much of those as possible in one go. The new *_fpu_sched_rt() schedules only if an RT task is pending. Probably we should measure the performance of those ciphers in pure SW mode and with these optimisations to see if it makes sense to keep them for RT. This kernel_fpu_resched() makes the code more preemptible, which might hurt performance. Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit c3c0b6dee70390cbf98d5bb46642457c28002d1a Author: Thomas Gleixner Date: Sat Nov 12 14:00:48 2011 +0100 scsi/fcoe: Make RT aware. Do not disable preemption while taking sleeping locks. All users look safe for migrate_disable() only.
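A minimal sketch of the fcoe-style conversion, under the assumption that the per-CPU data only needs the task to stay on its CPU: migrate_disable() provides that while leaving the section preemptible on PREEMPT_RT, unlike get_cpu(). The stats structure and function names are illustrative, not the driver code.

    #include <linux/percpu.h>
    #include <linux/preempt.h>
    #include <linux/types.h>

    struct fcoe_err_stats_sketch {
    	u64 error_frames;
    };
    static DEFINE_PER_CPU(struct fcoe_err_stats_sketch, fcoe_err_stats_sketch);

    static void fcoe_count_error_sketch(void)
    {
    	struct fcoe_err_stats_sketch *stats;

    	migrate_disable();			/* was: get_cpu() */
    	stats = this_cpu_ptr(&fcoe_err_stats_sketch);
    	stats->error_frames++;
    	migrate_enable();			/* was: put_cpu() */
    }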
Signed-off-by: Thomas Gleixner commit 760c68f816c8c058388aa8ab53cd3942118d37e9 Author: Thomas Gleixner Date: Tue Apr 6 16:51:31 2010 +0200 md: raid5: Make raid5_percpu handling RT aware __raid_run_ops() disables preemption with get_cpu() around the access to the raid5_percpu variables. That causes scheduling while atomic spews on RT. Serialize the access to the percpu data with a lock and keep the code preemptible. Reported-by: Udo van den Heuvel Signed-off-by: Thomas Gleixner Tested-by: Udo van den Heuvel commit c761c4c8f66ed39550f35ee93e86ae79f8e19d7b Author: Mike Galbraith Date: Thu Mar 31 04:08:28 2016 +0200 drivers/block/zram: Replace bit spinlocks with rtmutex for -rt They're nondeterministic, and lead to ___might_sleep() splats in -rt. OTOH, they're a lot less wasteful than an rtmutex per page. Signed-off-by: Mike Galbraith Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 22524fa5a8da13289482d6fc51580c3bc48f152c Author: Sebastian Andrzej Siewior Date: Tue Jul 14 14:26:34 2015 +0200 block/mq: do not invoke preempt_disable() preempt_disable() and get_cpu() don't play well together with the sleeping locks it tries to allocate later. It seems to be enough to replace it with get_cpu_light() and migrate_disable(). Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit fe7f6c7c796378539f0632923e7a398c9d1ac9cc Author: Priyanka Jain Date: Thu May 17 09:35:11 2012 +0530 net: Remove preemption disabling in netif_rx() 1)enqueue_to_backlog() (called from netif_rx) should be bind to a particluar CPU. This can be achieved by disabling migration. No need to disable preemption 2)Fixes crash "BUG: scheduling while atomic: ksoftirqd" in case of RT. If preemption is disabled, enqueue_to_backog() is called in atomic context. And if backlog exceeds its count, kfree_skb() is called. But in RT, kfree_skb() might gets scheduled out, so it expects non atomic context. -Replace preempt_enable(), preempt_disable() with migrate_enable(), migrate_disable() respectively -Replace get_cpu(), put_cpu() with get_cpu_light(), put_cpu_light() respectively Signed-off-by: Priyanka Jain Signed-off-by: Thomas Gleixner Acked-by: Rajan Srivastava Cc: Link: http://lkml.kernel.org/r/1337227511-2271-1-git-send-email-Priyanka.Jain@freescale.com Signed-off-by: Thomas Gleixner [bigeasy: Remove assumption about migrate_disable() from the description.] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 4df9e6944ccf7ab982b0164c21924992050b25aa Author: Sebastian Andrzej Siewior Date: Wed Mar 30 13:36:29 2016 +0200 net: dev: always take qdisc's busylock in __dev_xmit_skb() The root-lock is dropped before dev_hard_start_xmit() is invoked and after setting the __QDISC___STATE_RUNNING bit. If this task is now pushed away by a task with a higher priority then the task with the higher priority won't be able to submit packets to the NIC directly instead they will be enqueued into the Qdisc. The NIC will remain idle until the task(s) with higher priority leave the CPU and the task with lower priority gets back and finishes the job. If we take always the busylock we ensure that the RT task can boost the low-prio task and submit the packet. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit d1b8c09e832c89b21f3923d0d7aa95cbbea0f357 Author: Sebastian Andrzej Siewior Date: Wed Sep 16 16:15:39 2020 +0200 net: Dequeue in dev_cpu_dead() without the lock Upstream uses skb_dequeue() to acquire lock of `input_pkt_queue'. 
The reason is to synchronize against a remote CPU which still thinks that the CPU is online and enqueues packets to this CPU. There are no guarantees that the packet is enqueued before the callback is run; it is just hoped. RT however complains about an uninitialized lock because it uses another lock for `input_pkt_queue' due to the IRQ-off nature of the context. Use the unlocked dequeue version for `input_pkt_queue'. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 526612cab54c1bb73c64a5fce78b38fd6f5e9b15 Author: Thomas Gleixner Date: Tue Jul 12 15:38:34 2011 +0200 net: Use skbufhead with raw lock Use the rps lock as rawlock so we can keep irq-off regions. It looks low latency. However we can't kfree() from this context therefore we defer this to the softirq and use the tofree_queue list for it (similar to process_queue). Signed-off-by: Thomas Gleixner commit b04eaab51744b038926d322c0a346d03db8954d9 Author: Mike Galbraith Date: Wed Feb 18 16:05:28 2015 +0100 sunrpc: Make svc_xprt_do_enqueue() use get_cpu_light() |BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:915 |in_atomic(): 1, irqs_disabled(): 0, pid: 3194, name: rpc.nfsd |Preemption disabled at:[] svc_xprt_received+0x4b/0xc0 [sunrpc] |CPU: 6 PID: 3194 Comm: rpc.nfsd Not tainted 3.18.7-rt1 #9 |Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.404 11/06/2014 | ffff880409630000 ffff8800d9a33c78 ffffffff815bdeb5 0000000000000002 | 0000000000000000 ffff8800d9a33c98 ffffffff81073c86 ffff880408dd6008 | ffff880408dd6000 ffff8800d9a33cb8 ffffffff815c3d84 ffff88040b3ac000 |Call Trace: | [] dump_stack+0x4f/0x9e | [] __might_sleep+0xe6/0x150 | [] rt_spin_lock+0x24/0x50 | [] svc_xprt_do_enqueue+0x80/0x230 [sunrpc] | [] svc_xprt_received+0x4b/0xc0 [sunrpc] | [] svc_add_new_perm_xprt+0x6d/0x80 [sunrpc] | [] svc_addsock+0x143/0x200 [sunrpc] | [] write_ports+0x28c/0x340 [nfsd] | [] nfsctl_transaction_write+0x4c/0x80 [nfsd] | [] vfs_write+0xb3/0x1d0 | [] SyS_write+0x49/0xb0 | [] system_call_fastpath+0x16/0x1b Signed-off-by: Mike Galbraith Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 9c7e3496af769312f8ccc18dd5ef26ce4bf470b7 Author: Sebastian Andrzej Siewior Date: Fri Jun 16 19:03:16 2017 +0200 net/core: use local_bh_disable() in netif_rx_ni() In 2004 netif_rx_ni() gained a preempt_disable() section around netif_rx() and its do_softirq() + testing for it. The do_softirq() part is required because netif_rx() raises the softirq but does not invoke it. The preempt_disable() is required to remain on the same CPU which added the skb to the per-CPU list. All this can be avoided by putting this into a local_bh_disable()ed section. The local_bh_enable() part will invoke do_softirq() if required. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit d9560a72d2cfdc29fa0e27d4d7d0f053651d4003 Author: Sebastian Andrzej Siewior Date: Tue Sep 8 16:57:11 2020 +0200 net: Properly annotate the try-lock for the seqlock In patch ("net/Qdisc: use a seqlock instead seqcount") the seqcount has been replaced with a seqlock to allow the reader to boost the preempted writer. The try_write_seqlock() acquired the lock with a try-lock but the seqcount annotation was "lock". Opencode write_seqcount_t_begin() and use the try-lock annotation for lockdep.
Reported-by: Mike Galbraith Cc: stable-rt@vger.kernel.org Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 6336bff5d2417cdde27923d0221cd6b696fb4e6a Author: Sebastian Andrzej Siewior Date: Wed Sep 14 17:36:35 2016 +0200 net/Qdisc: use a seqlock instead seqcount The seqcount disables preemption on -RT while it is held, which we can't remove. Also we don't want the reader to spin for ages if the writer is scheduled out. The seqlock on the other hand will serialize / sleep on the lock while the writer is active. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit dfeaa86ff99fad4a9e364f9a63803ff1cc213421 Author: Scott Wood Date: Thu Aug 19 21:24:24 2021 +0200 rcutorture: Avoid problematic critical section nesting on PREEMPT_RT rcutorture is generating some nesting scenarios that are not compatible with PREEMPT_RT. For example: preempt_disable(); rcu_read_lock_bh(); preempt_enable(); rcu_read_unlock_bh(); The problem here is that on PREEMPT_RT the bottom halves have to be disabled and enabled in preemptible context. Reorder locking: start with BH locking and then continue with disabling preemption or interrupts. In the unlocking do the reverse: first enable interrupts and preemption, and enable BH at the very end. Ensure that on PREEMPT_RT BH locking remains unchanged if in non-preemptible context. Link: https://lkml.kernel.org/r/20190911165729.11178-6-swood@redhat.com Link: https://lkml.kernel.org/r/20210819182035.GF4126399@paulmck-ThinkPad-P17-Gen-1 Signed-off-by: Scott Wood [bigeasy: Drop ATOM_BH, make it only about changing BH in atomic context. Allow enabling RCU in IRQ-off section. Reword commit message.] Signed-off-by: Sebastian Andrzej Siewior Link: https://lkml.kernel.org/r/20210820074236.2zli4nje7bof62rh@linutronix.de commit ce5b77820ef3c19f6eed7136f7ff03bd9d5bc5d0 Author: Sebastian Andrzej Siewior Date: Wed Mar 10 15:09:02 2021 +0100 rcu: Delay RCU-selftests Delay RCU-selftests until ksoftirqd is up and running. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 519707c04fd8c9e6c184f276b6719ad311bf3cc9 Author: Thomas Gleixner Date: Wed Mar 7 21:00:34 2012 +0100 fs: namespace: Use cpu_chill() in trylock loops Retry loops on RT might loop forever when the modifying side was preempted. Use cpu_chill() instead of cpu_relax() to let the system make progress. Signed-off-by: Thomas Gleixner commit 14e2cece3cd72e4796768efb67312cdc812e718c Author: Thomas Gleixner Date: Wed Mar 7 20:51:03 2012 +0100 rt: Introduce cpu_chill() Retry loops on RT might loop forever when the modifying side was preempted. Add cpu_chill() to replace cpu_relax(). cpu_chill() defaults to cpu_relax() for non RT. On RT it puts the looping task to sleep for a tick so the preempted task can make progress. Steven Rostedt changed it to use an hrtimer instead of msleep(): | |Ulrich Obergfell pointed out that cpu_chill() calls msleep() which is woken |up by the ksoftirqd running the TIMER softirq. But as the cpu_chill() is |called from softirq context, it may block the ksoftirqd() from running, in |which case, it may never wake up the msleep() causing the deadlock. + bigeasy later changed to schedule_hrtimeout() |If a task calls cpu_chill() and gets woken up by a regular or spurious |wakeup and has a signal pending, then it exits the sleep loop in |do_nanosleep() and sets up the restart block.
If restart->nanosleep.type is |not TI_NONE then this results in accessing a stale user pointer from a |previously interrupted syscall and a copy to user based on the stale |pointer or a BUG() when 'type' is not supported in nanosleep_copyout(). + bigeasy: add PF_NOFREEZE: | [....] Waiting for /dev to be fully populated... | ===================================== | [ BUG: udevd/229 still has locks held! ] | 3.12.11-rt17 #23 Not tainted | ------------------------------------- | 1 lock held by udevd/229: | #0: (&type->i_mutex_dir_key#2){+.+.+.}, at: lookup_slow+0x28/0x98 | | stack backtrace: | CPU: 0 PID: 229 Comm: udevd Not tainted 3.12.11-rt17 #23 | (unwind_backtrace+0x0/0xf8) from (show_stack+0x10/0x14) | (show_stack+0x10/0x14) from (dump_stack+0x74/0xbc) | (dump_stack+0x74/0xbc) from (do_nanosleep+0x120/0x160) | (do_nanosleep+0x120/0x160) from (hrtimer_nanosleep+0x90/0x110) | (hrtimer_nanosleep+0x90/0x110) from (cpu_chill+0x30/0x38) | (cpu_chill+0x30/0x38) from (dentry_kill+0x158/0x1ec) | (dentry_kill+0x158/0x1ec) from (dput+0x74/0x15c) | (dput+0x74/0x15c) from (lookup_real+0x4c/0x50) | (lookup_real+0x4c/0x50) from (__lookup_hash+0x34/0x44) | (__lookup_hash+0x34/0x44) from (lookup_slow+0x38/0x98) | (lookup_slow+0x38/0x98) from (path_lookupat+0x208/0x7fc) | (path_lookupat+0x208/0x7fc) from (filename_lookup+0x20/0x60) | (filename_lookup+0x20/0x60) from (user_path_at_empty+0x50/0x7c) | (user_path_at_empty+0x50/0x7c) from (user_path_at+0x14/0x1c) | (user_path_at+0x14/0x1c) from (vfs_fstatat+0x48/0x94) | (vfs_fstatat+0x48/0x94) from (SyS_stat64+0x14/0x30) | (SyS_stat64+0x14/0x30) from (ret_fast_syscall+0x0/0x48) Signed-off-by: Thomas Gleixner Signed-off-by: Steven Rostedt Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 05bc8016ce483ab5acd64262b246d9760634da47 Author: Sebastian Andrzej Siewior Date: Fri Oct 20 11:29:53 2017 +0200 fs/dcache: disable preemption on i_dir_seq's write side i_dir_seq is an opencoded seqcounter. Based on the code it looks like we could have two writers in parallel despite the fact that the d_lock is held. The problem is that during the write process on RT the preemption is still enabled and if this process is interrupted by a reader with RT priority then we lock up. To avoid that lock up I am disabling the preemption during the update. The rename of i_dir_seq is here to ensure to catch new write sides in future. Cc: stable-rt@vger.kernel.org Reported-by: Oleg.Karfich@wago.com Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit f1ca438e3d0295d59ec8215bda86b8385ded0fb6 Author: Sebastian Andrzej Siewior Date: Wed Sep 14 14:35:49 2016 +0200 fs/dcache: use swait_queue instead of waitqueue __d_lookup_done() invokes wake_up_all() while holding a hlist_bl_lock() which disables preemption. As a workaround convert it to swait. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 489b92d5b72ab0c3725e1600ff4c5e103779d958 Author: Sebastian Andrzej Siewior Date: Thu Aug 29 18:21:04 2013 +0200 ptrace: fix ptrace vs tasklist_lock race As explained by Alexander Fyodorov : |read_lock(&tasklist_lock) in ptrace_stop() is converted to mutex on RT kernel, |and it can remove __TASK_TRACED from task->state (by moving it to |task->saved_state). 
If parent does wait() on child followed by a sys_ptrace |call, the following race can happen: | |- child sets __TASK_TRACED in ptrace_stop() |- parent does wait() which eventually calls wait_task_stopped() and returns | child's pid |- child blocks on read_lock(&tasklist_lock) in ptrace_stop() and moves | __TASK_TRACED flag to saved_state |- parent calls sys_ptrace, which calls ptrace_check_attach() and wait_task_inactive() The patch is based on his initial patch where an additional check is added in case the __TASK_TRACED moved to ->saved_state. The pi_lock is taken in case the caller is interrupted between looking into ->state and ->saved_state. [ Fix for ptrace_unfreeze_traced() by Oleg Nesterov ] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 6a6f78bb3b7f79b260b9182ea2575d53be29e890 Author: Thomas Gleixner Date: Wed Sep 21 19:57:12 2011 +0200 signal: Revert ptrace preempt magic Upstream commit '53da1d9456fe7f8 fix ptrace slowness' is nothing more than a bandaid around the ptrace design trainwreck. It's not a correctness issue, it's merily a cosmetic bandaid. Signed-off-by: Thomas Gleixner commit 2fbdd537cfbfbb105091042d02af2b20e2ba1d26 Author: Thomas Gleixner Date: Sun Jul 25 21:35:46 2021 +0200 mm/memcontrol: Disable on PREEMPT_RT 559271146efc ("mm/memcg: optimize user context object stock access") is a classic example of optimizing for the cpu local BKL serialization without a clear protection scope. Disable MEMCG on RT for now. Signed-off-by: Thomas Gleixner commit d246b2247f236c7057a9b9a238fcfca1e8422562 Author: Thomas Gleixner Date: Fri Jul 3 08:44:34 2009 -0500 mm/scatterlist: Do not disable irqs on RT For -RT it is enough to keep pagefault disabled (which is currently handled by kmap_atomic()). Signed-off-by: Thomas Gleixner commit e9cde32ac89ef784055cab8a9bdba7f3531ee753 Author: Thomas Gleixner Date: Tue Jul 12 11:39:36 2011 +0200 mm/vmalloc: Another preempt disable region which sucks Avoid the preempt disable version of get_cpu_var(). The inner-lock should provide enough serialisation. Signed-off-by: Thomas Gleixner commit e2f0c792a3194efd1f072e3f906ab42993f1ce15 Author: Mike Galbraith Date: Tue Mar 22 11:16:09 2016 +0100 mm/zsmalloc: copy with get_cpu_var() and locking get_cpu_var() disables preemption and triggers a might_sleep() splat later. This is replaced with get_locked_var(). This bitspinlocks are replaced with a proper mutex which requires a slightly larger struct to allocate. Signed-off-by: Mike Galbraith Signed-off-by: Thomas Gleixner [bigeasy: replace the bitspin_lock() with a mutex, get_locked_var(). Mike then fixed the size magic, Mike made handle lock spinlock_t] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit abf25965d386b982366c80934989304d2347e161 Author: Sebastian Andrzej Siewior Date: Mon Aug 17 12:28:10 2020 +0200 u64_stats: Disable preemption on 32bit-UP/SMP with RT during updates On RT the seqcount_t is required even on UP because the softirq can be preempted. The IRQ handler is threaded so it is also preemptible. Disable preemption on 32bit-RT during value updates. There is no need to disable interrupts on RT because the handler is run threaded. Therefore disabling preemption is enough to guarantee that the update is not interruped. 
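As an illustration of the interface the u64_stats entry above changes, here is a typical writer/reader pair (structure and field names are invented for the example; on 32bit-RT the update side now only disables preemption, as described above, while 64bit builds compile the sync object away entirely):

    #include <linux/u64_stats_sync.h>

    struct example_stats {
            u64 bytes;
            struct u64_stats_sync syncp;    /* run u64_stats_init() on it during setup */
    };

    static void example_stats_add(struct example_stats *s, unsigned int len)
    {
            u64_stats_update_begin(&s->syncp);      /* 32bit: open the seqcount; 32bit-RT: also disables preemption */
            s->bytes += len;
            u64_stats_update_end(&s->syncp);
    }

    static u64 example_stats_read(struct example_stats *s)
    {
            unsigned int start;
            u64 bytes;

            do {
                    start = u64_stats_fetch_begin(&s->syncp);
                    bytes = s->bytes;
            } while (u64_stats_fetch_retry(&s->syncp, start));

            return bytes;
    }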
Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 7b7f3888e286e61e98b3d5d8feacbccfb7c33268 Author: Sebastian Andrzej Siewior Date: Thu Jul 2 14:27:23 2020 +0200 mm: page_alloc: Use migrate_disable() in drain_local_pages_wq() drain_local_pages_wq() disables preemption to avoid CPU migration during CPU hotplug and can't use cpus_read_lock(). Using migrate_disable() works here, too. The scheduler won't take the CPU offline until the task left the migrate-disable section. Use migrate_disable() in drain_local_pages_wq(). Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 36960b5c04e2dc3e15517cb1c8907866b0c0699e Author: Sebastian Andrzej Siewior Date: Wed Sep 8 13:26:36 2021 +0200 irq_work: Also rcuwait for !IRQ_WORK_HARD_IRQ on PREEMPT_RT On PREEMPT_RT most items are processed as LAZY via softirq context. Avoid spin-waiting for them because irq_work_sync() could have higher priority and not allow the irq-work to be completed. Wait additionally for !IRQ_WORK_HARD_IRQ irq_work items on PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior commit 2b8a43ec6fb6bf025015fb198a2f9d10a10e929f Author: Sebastian Andrzej Siewior Date: Wed Sep 8 13:23:20 2021 +0200 irq_work: Allow irq_work_sync() to sleep if irq_work() has no IRQ support. irq_work() instantly triggers an interrupt if supported by the architecture. Otherwise the work will be processed on the next timer tick. In the worst case irq_work_sync() could spin up to a jiffy. irq_work_sync() is usually used in tear-down context which is fully preemptible. Based on review irq_work_sync() is invoked from preemptible context and there is one waiter at a time. This qualifies it to use rcuwait for synchronisation. Let irq_work_sync() synchronize with rcuwait if the architecture processes irqwork via the timer tick. Signed-off-by: Sebastian Andrzej Siewior commit 6339e8bc2f88eb24ac345ff3bfb79cec4a089dfa Author: Sebastian Andrzej Siewior Date: Tue Jun 23 15:32:51 2015 +0200 irqwork: push most work into softirq context Initially we deferred all irqwork into softirq because we didn't want the latency spikes if perf or another user was busy and delayed the RT task. The NOHZ trigger (nohz_full_kick_work) was the first user that did not work as expected if it did not run in the original irqwork context so we had to bring it back somehow for it. push_irq_work_func is the second one that requires this. This patch adds the IRQ_WORK_HARD_IRQ which makes sure the callback runs in raw-irq context. Everything else is deferred into softirq context. Without -RT we have the original behavior. This patch incorporates tglx's original work, reworked a little to bring back the arch_irq_work_raise() if possible, and a few fixes from Steven Rostedt and Mike Galbraith. [bigeasy: melt tglx's irq_work_tick_soft() which splits irq_work_tick() into a hard and soft variant] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 1e04bde0ef7d38d0f5983d27e86f064076c44943 Author: Thomas Gleixner Date: Mon Jul 18 13:59:17 2011 +0200 softirq: Disable softirq stacks for RT Disable extra stacks for softirqs. We want to preempt softirqs and having them on a special IRQ-stack does not make this easier.
Signed-off-by: Thomas Gleixner commit b9369c0865aadf7c6a13d1bb5afd2fbde94b6c93 Author: Thomas Gleixner Date: Sun Nov 13 17:17:09 2011 +0100 softirq: Check preemption after reenabling interrupts raise_softirq_irqoff() disables interrupts and wakes the softirq daemon, but after reenabling interrupts there is no preemption check, so the execution of the softirq thread might be delayed arbitrarily. In principle we could add that check to local_irq_enable/restore, but that's overkill as the rasie_softirq_irqoff() sections are the only ones which show this behaviour. Reported-by: Carsten Emde Signed-off-by: Thomas Gleixner commit 5d15c5e6f8d62e2442fdf263299b1acc80564db8 Author: Mike Galbraith Date: Sun Jan 8 09:32:25 2017 +0100 cpuset: Convert callback_lock to raw_spinlock_t The two commits below add up to a cpuset might_sleep() splat for RT: 8447a0fee974 cpuset: convert callback_mutex to a spinlock 344736f29b35 cpuset: simplify cpuset_node_allowed API BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:995 in_atomic(): 0, irqs_disabled(): 1, pid: 11718, name: cset CPU: 135 PID: 11718 Comm: cset Tainted: G E 4.10.0-rt1-rt #4 Hardware name: Intel Corporation BRICKLAND/BRICKLAND, BIOS BRHSXSD1.86B.0056.R01.1409242327 09/24/2014 Call Trace: ? dump_stack+0x5c/0x81 ? ___might_sleep+0xf4/0x170 ? rt_spin_lock+0x1c/0x50 ? __cpuset_node_allowed+0x66/0xc0 ? ___slab_alloc+0x390/0x570 ? anon_vma_fork+0x8f/0x140 ? copy_page_range+0x6cf/0xb00 ? anon_vma_fork+0x8f/0x140 ? __slab_alloc.isra.74+0x5a/0x81 ? anon_vma_fork+0x8f/0x140 ? kmem_cache_alloc+0x1b5/0x1f0 ? anon_vma_fork+0x8f/0x140 ? copy_process.part.35+0x1670/0x1ee0 ? _do_fork+0xdd/0x3f0 ? _do_fork+0xdd/0x3f0 ? do_syscall_64+0x61/0x170 ? entry_SYSCALL64_slow_path+0x25/0x25 The later ensured that a NUMA box WILL take callback_lock in atomic context by removing the allocator and reclaim path __GFP_HARDWALL usage which prevented such contexts from taking callback_mutex. One option would be to reinstate __GFP_HARDWALL protections for RT, however, as the 8447a0fee974 changelog states: The callback_mutex is only used to synchronize reads/updates of cpusets' flags and cpu/node masks. These operations should always proceed fast so there's no reason why we can't use a spinlock instead of the mutex. Cc: stable-rt@vger.kernel.org Signed-off-by: Mike Galbraith Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 9e93864c2cfb63d0525c020ffc70fcbbb6b55fe9 Author: Thomas Gleixner Date: Tue Sep 13 16:42:35 2011 +0200 sched: Disable TTWU_QUEUE on RT The queued remote wakeup mechanism can introduce rather large latencies if the number of migrated tasks is high. Disable it for RT. Signed-off-by: Thomas Gleixner commit 16efe185af4bf785d743b2eb440e86e9289270ca Author: Thomas Gleixner Date: Tue Jun 7 09:19:06 2011 +0200 sched: Do not account rcu_preempt_depth on RT in might_sleep() RT changes the rcu_preempt_depth semantics, so we cannot check for it in might_sleep(). Signed-off-by: Thomas Gleixner commit 6fff45c6418d643fe84b3d5f7cec9143ad88bc3b Author: Sebastian Andrzej Siewior Date: Mon Nov 21 19:31:08 2016 +0100 kernel/sched: move stack + kprobe clean up to __put_task_struct() There is no need to free the stack before the task struct (except for reasons mentioned in commit 68f24b08ee89 ("sched/core: Free the stack early if CONFIG_THREAD_INFO_IN_TASK")). This also comes handy on -RT because we can't free memory in preempt disabled region. vfree_atomic() delays the memory cleanup to a worker. 
Since we move everything to the RCU callback, we can also free it immediately. Cc: stable-rt@vger.kernel.org #for kprobe_flush_task() Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 87abb33bb6e30967dd4a17ded49cc97403257ea4 Author: Thomas Gleixner Date: Mon Jun 6 12:20:33 2011 +0200 sched: Move mmdrop to RCU on RT Takes sleeping locks and calls into the memory allocator, so nothing we want to do in task switch and oder atomic contexts. Signed-off-by: Thomas Gleixner commit ccbf9a21311a5797ce34e250066f7601466d6093 Author: Thomas Gleixner Date: Mon Jun 6 12:12:51 2011 +0200 sched: Limit the number of task migrations per batch Put an upper limit on the number of tasks which are migrated per batch to avoid large latencies. Signed-off-by: Thomas Gleixner commit 56e8a08a143c655c7f9bf7099fa4a391cd599a64 Author: Sebastian Andrzej Siewior Date: Sat May 27 19:02:06 2017 +0200 kernel/sched: add {put|get}_cpu_light() Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 3bc2ff03f8cfd70fa440d02b9072b70c222fb265 Author: Thomas Gleixner Date: Fri Jul 24 12:38:56 2009 +0200 preempt: Provide preempt_*_(no)rt variants RT needs a few preempt_disable/enable points which are not necessary otherwise. Implement variants to avoid #ifdeffery. Signed-off-by: Thomas Gleixner commit 7e17e4f96819c33a5906cc7f19bab5b5c1fa8111 Author: Sebastian Andrzej Siewior Date: Tue Aug 17 09:48:31 2021 +0200 locking: Allow to include asm/spinlock_types.h from linux/spinlock_types_raw.h The printk header file includes ratelimit_types.h for its __ratelimit() based usage. It requires it for the static initializer used in printk_ratelimited(). It uses a raw_spinlock_t and includes the spinlock_types.h. It makes no difference on non PREEMPT-RT builds but PREEMPT-RT replaces the inner part of some locks and therefore includes rtmutex.h and atomic.h which leads to recursive includes where defines are missing. By including only the raw_spinlock_t defines it avoids the atomic.h related includes at this stage. 
An example on powerpc: | CALL scripts/atomic/check-atomics.sh |In file included from include/linux/bug.h:5, | from include/linux/page-flags.h:10, | from kernel/bounds.c:10: |arch/powerpc/include/asm/page_32.h: In function ‘clear_page’: |arch/powerpc/include/asm/bug.h:87:4: error: implicit declaration of function ‘__WARN’ [-Werror=implicit-function-declaration] | 87 | __WARN(); \ | | ^~~~~~ |arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ON’ | 48 | WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1)); | | ^~~~~~~ |arch/powerpc/include/asm/bug.h:58:17: error: invalid application of ‘sizeof’ to incomplete type ‘struct bug_entry’ | 58 | "i" (sizeof(struct bug_entry)), \ | | ^~~~~~ |arch/powerpc/include/asm/bug.h:89:3: note: in expansion of macro ‘BUG_ENTRY’ | 89 | BUG_ENTRY(PPC_TLNEI " %4, 0", \ | | ^~~~~~~~~ |arch/powerpc/include/asm/page_32.h:48:2: note: in expansion of macro ‘WARN_ON’ | 48 | WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1)); | | ^~~~~~~ |In file included from arch/powerpc/include/asm/ptrace.h:298, | from arch/powerpc/include/asm/hw_irq.h:12, | from arch/powerpc/include/asm/irqflags.h:12, | from include/linux/irqflags.h:16, | from include/asm-generic/cmpxchg-local.h:6, | from arch/powerpc/include/asm/cmpxchg.h:526, | from arch/powerpc/include/asm/atomic.h:11, | from include/linux/atomic.h:7, | from include/linux/rwbase_rt.h:6, | from include/linux/rwlock_types.h:55, | from include/linux/spinlock_types.h:74, | from include/linux/ratelimit_types.h:7, | from include/linux/printk.h:10, | from include/asm-generic/bug.h:22, | from arch/powerpc/include/asm/bug.h:109, | from include/linux/bug.h:5, | from include/linux/page-flags.h:10, | from kernel/bounds.c:10: |include/linux/thread_info.h: In function ‘copy_overflow’: |include/linux/thread_info.h:210:2: error: implicit declaration of function ‘WARN’ [-Werror=implicit-function-declaration] | 210 | WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count); | | ^~~~ The WARN / BUG include pulls in printk.h and then ptrace.h expects WARN (from bug.h) which is not yet complete. Even hw_irq.h has WARN_ON() statements. On POWERPC64 there are missing atomic64 defines while building 32bit VDSO: | VDSO32C arch/powerpc/kernel/vdso32/vgettimeofday.o |In file included from include/linux/atomic.h:80, | from include/linux/rwbase_rt.h:6, | from include/linux/rwlock_types.h:55, | from include/linux/spinlock_types.h:74, | from include/linux/ratelimit_types.h:7, | from include/linux/printk.h:10, | from include/linux/kernel.h:19, | from arch/powerpc/include/asm/page.h:11, | from arch/powerpc/include/asm/vdso/gettimeofday.h:5, | from include/vdso/datapage.h:137, | from lib/vdso/gettimeofday.c:5, | from : |include/linux/atomic-arch-fallback.h: In function ‘arch_atomic64_inc’: |include/linux/atomic-arch-fallback.h:1447:2: error: implicit declaration of function ‘arch_atomic64_add’; did you mean ‘arch_atomic_add’? [-Werror=impl |icit-function-declaration] | 1447 | arch_atomic64_add(1, v); | | ^~~~~~~~~~~~~~~~~ | | arch_atomic_add The generic fallback is not included, atomics itself are not used. If kernel.h does not include printk.h then it comes later from the bug.h include. Signed-off-by: Sebastian Andrzej Siewior commit 27313e8ae0cbd715b8e61b3cbbfc6599774bb751 Author: Sebastian Andrzej Siewior Date: Thu Aug 12 18:13:39 2021 +0200 lockdep/selftests: Adapt ww-tests for PREEMPT_RT The ww-mutex selftest operates directly on ww_mutex::base and assumes its type is struct mutex. 
This isn't true on PREEMPT_RT which turns the mutex into a rtmutex. Add a ww_mutex_base_ abstraction which maps to the relevant mutex_ or rt_mutex_ function. Change the CONFIG_DEBUG_MUTEXES ifdef to DEBUG_WW_MUTEXES. The latter is true for the MUTEX and RTMUTEX implementation of WW-MUTEX. The assignment is required in order to pass the tests. Signed-off-by: Sebastian Andrzej Siewior commit f8fd88690baec51139f6034f3bd9dfa4c4783c39 Author: Sebastian Andrzej Siewior Date: Thu Aug 12 16:02:29 2021 +0200 lockdep/selftests: Skip the softirq related tests on PREEMPT_RT The softirq context on PREEMPT_RT is different compared to !PREEMPT_RT. As such lockdep_softirq_enter() is a nop and the all the "softirq safe" tests fail on PREEMPT_RT because there is no difference. Skip the softirq context tests on PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior commit e306efe3115cf3293dbc68953af82e8d2da9ad61 Author: Sebastian Andrzej Siewior Date: Thu Aug 12 14:25:38 2021 +0200 lockdep/selftests: Unbalanced migrate_disable() & rcu_read_lock() The tests with unbalanced lock() + unlock() operation leave a modified preemption counter behind which is then reset to its original value after the test. The spin_lock() function on PREEMPT_RT does not include a preempt_disable() statement but migrate_disable() and read_rcu_lock(). As a consequence both counter never get back to their original value and system explodes later after the selftest. In the double-unlock case on PREEMPT_RT, the migrate_disable() and RCU code will trigger which should be avoided. These counter should not be decremented below their initial value. Save both counters and bring them back to their original value after the test. In the double-unlock case, increment both counter in advance to they become balanced after the double unlock. Signed-off-by: Sebastian Andrzej Siewior commit 28ad1345ff6843e9a13e43c871b206db643dadea Author: Sebastian Andrzej Siewior Date: Thu Aug 12 16:16:54 2021 +0200 lockdep/selftests: Add rtmutex to the last column The last column contains the results for the rtmutex tests. Add it. Signed-off-by: Sebastian Andrzej Siewior commit 842a333640b899e66f52efe4ba32824fa079e1df Author: Thomas Gleixner Date: Sun Jul 17 18:51:23 2011 +0200 lockdep: Make it RT aware There is not really a softirq context on PREEMPT_RT. Softirqs on PREEMPT_RT are always invoked within the context of a threaded interrupt handler or within ksoftirqd. The "in-softirq" context is preemptible and is protected by a per-CPU lock to ensure mutual exclusion. There is no difference on PREEMPT_RT between spin_lock_irq() and spin_lock() because the former does not disable interrupts. Therefore if lock is used in_softirq() and locked once with spin_lock_irq() then lockdep will report this with "inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage". Teach lockdep that we don't really do softirqs on -RT. Signed-off-by: Thomas Gleixner commit cdcac5fbfbb4ffd83b455870016166752ea8179b Author: Sebastian Andrzej Siewior Date: Fri Aug 13 13:49:49 2021 +0200 rtmutex: Add rt_mutex_lock_nest_lock() and rt_mutex_lock_killable(). The locking selftest for ww-mutex expects to operate directly on the base-mutex which becomes a rtmutex on PREEMPT_RT. Add rt_mutex_lock_nest_lock(), follows mutex_lock_nest_lock() for rtmutex. Add rt_mutex_lock_killable(), follows mutex_lock_killable() for rtmutex. 
Signed-off-by: Sebastian Andrzej Siewior commit 41fb1afa1f525e58f59333b5d574fe5881af00c2 Author: Sebastian Andrzej Siewior Date: Fri Aug 13 12:40:49 2021 +0200 rtmutex: Add a special case for ww-mutex handling. The lockdep selftest for ww-mutex assumes in a few cases the ww_ctx->contending_lock assignment via __ww_mutex_check_kill() which does not happen if the rtmutex detects the deadlock early. The testcase passes if the deadlock handling here is removed. This means that it will work if multiple threads/tasks are involved and not just a single one. Signed-off-by: Sebastian Andrzej Siewior commit 291578e56607f429c06248f71ff16cd0b85f7c85 Author: Sebastian Andrzej Siewior Date: Thu Aug 12 14:40:05 2021 +0200 sched: Trigger warning if ->migration_disabled counter underflows. If migrate_enable() is used more often than its counterpart then it remains undetected and rq::nr_pinned will underflow, too. Add a warning if migrate_enable() is attempted without a matching migrate_disable(). Signed-off-by: Sebastian Andrzej Siewior commit 8e6a53cbc5f92e758e133c829ae95a17b08e9f2a Author: Sebastian Andrzej Siewior Date: Fri Aug 13 18:26:10 2021 +0200 lockdep/selftests: Avoid using local_lock_{acquire|release}(). The local_lock related functions local_lock_acquire() and local_lock_release() are part of the internal implementation and should be avoided. Define the lock as DEFINE_PER_CPU so the normal local_lock() function can be used. Signed-off-by: Sebastian Andrzej Siewior commit b3b8baa4871b56411d817785e8145d4462ac8c5c Author: Sebastian Andrzej Siewior Date: Tue Sep 7 12:11:47 2021 +0200 locking: Remove rt_rwlock_is_contended() rt_rwlock_is_contended() has no users. It makes no sense to use it as rwlock_is_contended() because it is a sleeping lock on RT and preemption is possible. It always reports != 0 if used by a writer and even if there is a waiter then the lock might not be handed over if the current owner has the highest priority. Remove rt_rwlock_is_contended(). Reported-by: kernel test robot Signed-off-by: Sebastian Andrzej Siewior commit 61d127d5540ba884277c8699a19a7ca086c14bfb Author: Grygorii Strashko Date: Tue Jul 21 19:43:56 2015 +0300 pid.h: include atomic.h This patch fixes build error: CC kernel/pid_namespace.o In file included from kernel/pid_namespace.c:11:0: include/linux/pid.h: In function 'get_pid': include/linux/pid.h:78:3: error: implicit declaration of function 'atomic_inc' [-Werror=implicit-function-declaration] atomic_inc(&pid->count); ^ which happens when CONFIG_PROVE_LOCKING=n CONFIG_DEBUG_SPINLOCK=n CONFIG_DEBUG_MUTEXES=n CONFIG_DEBUG_LOCK_ALLOC=n CONFIG_PID_NS=y Vanilla gets this via spinlock.h. Signed-off-by: Grygorii Strashko Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit bf79739373ee9bb1dcbe74e85572f21a6d4817cc Author: Sebastian Andrzej Siewior Date: Mon Oct 28 12:19:57 2013 +0100 wait.h: include atomic.h | CC init/main.o |In file included from include/linux/mmzone.h:9:0, | from include/linux/gfp.h:4, | from include/linux/kmod.h:22, | from include/linux/module.h:13, | from init/main.c:15: |include/linux/wait.h: In function ‘wait_on_atomic_t’: |include/linux/wait.h:982:2: error: implicit declaration of function ‘atomic_read’ [-Werror=implicit-function-declaration] | if (atomic_read(val) == 0) | ^ This pops up on ARM.
Non-RT gets its atomic.h include from spinlock.h Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit e5431e4a12e896b008f45f7ceffcd36a38deb150 Author: Sebastian Andrzej Siewior Date: Thu Jul 26 15:06:10 2018 +0200 efi: Allow efi=runtime In case the command line option "efi=noruntime" is the default at build time, the user could override it with `efi=runtime' and allow it again. Acked-by: Ard Biesheuvel Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit af552fc375dbb4d093bf720a781bc23ffd8d3b2b Author: Sebastian Andrzej Siewior Date: Thu Jul 26 15:03:16 2018 +0200 efi: Disable runtime services on RT Based on measurements the EFI functions get_variable / get_next_variable take up to 2us which looks okay. The functions get_time, set_time take around 10ms. Those 10ms are too much. Even one ms would be too much. Ard mentioned that SetVariable might even trigger larger latencies if the firmware will erase flash blocks on NOR. The time functions are used by efi-rtc and can be triggered during runtime (either via explicit read/write or ntp sync). The variable write could be used by pstore. These functions can be disabled without much of a loss. The poweroff / reboot hooks may be provided by PSCI. Disable EFI's runtime wrappers. This was observed on "EFI v2.60 by SoftIron Overdrive 1000". Acked-by: Ard Biesheuvel Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit cfc26283567ce2997d30a72e1f999974e5d16c5e Author: Sebastian Andrzej Siewior Date: Sat May 27 19:02:06 2017 +0200 net/core: disable NET_RX_BUSY_POLL on RT napi_busy_loop() disables preemption and performs a NAPI poll. We can't acquire sleeping locks with disabled preemption so we would have to work around this and add explicit locking for synchronisation against ksoftirqd. Without explicit synchronisation a low priority process would "own" the NAPI state (by setting NAPIF_STATE_SCHED) and could be scheduled out (no preempt_disable() and BH is preemptible on RT). In case a network packet arrives then the interrupt handler would set NAPIF_STATE_MISSED and the system would wait until the task owning the NAPI would be scheduled in again. Should a task with RT priority busy poll then it would consume the CPU instead of allowing tasks with lower priority to run. NET_RX_BUSY_POLL is disabled by default (the system wide sysctls for poll/read are set to zero) so disable NET_RX_BUSY_POLL on RT to avoid wrong locking context on RT. Should this feature be considered useful on RT systems then it could be enabled again with proper locking and synchronisation. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit a07f998bb4664b0803048ec8b7144b62771a35a0 Author: Thomas Gleixner Date: Mon Jul 18 17:03:52 2011 +0200 sched: Disable CONFIG_RT_GROUP_SCHED on RT Carsten reported problems when running: taskset 01 chrt -f 1 sleep 1 from within rc.local on a F15 machine. The task stays running and never gets on the run queue because some of the run queues have rt_throttled=1 which does not go away. Works nicely from an ssh login shell. Disabling CONFIG_RT_GROUP_SCHED solves that as well. Signed-off-by: Thomas Gleixner commit b8068ff63743fde48c9071bbe3f0c1739ac9a78d Author: Ingo Molnar Date: Fri Jul 3 08:44:03 2009 -0500 mm: Allow only SLUB on RT Memory allocation disables interrupts as part of the allocation and freeing process.
For -RT it is important that this section remains short and doesn't depend on the size of the request or an internal state of the memory allocator. At the beginning the SLAB memory allocator was adopted for RT's needs and it required substantial changes. Later, with the addition of the SLUB memory allocator we adopted this one as well and the changes were smaller. More importantly, due to the design of the SLUB allocator it performs better and its worst-case latency was smaller. In the end only SLUB remained supported. Disable SLAB and SLOB on -RT. Only SLUB is adapted to -RT's needs. Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 3ffdb11accf70ab891ba3c6626c4a46730e777bc Author: Thomas Gleixner Date: Sun Jul 24 12:11:43 2011 +0200 kconfig: Disable config options which are not RT compatible Disable stuff which is known to have issues on RT Signed-off-by: Thomas Gleixner commit ff390ab72e450716d1a157f3aa0d432e39a93679 Author: Sebastian Andrzej Siewior Date: Thu Jan 23 14:45:59 2014 +0100 leds: trigger: disable CPU trigger on -RT as it triggers: |CPU: 0 PID: 0 Comm: swapper Not tainted 3.12.8-rt10 #141 |[] (unwind_backtrace+0x0/0xf8) from [] (show_stack+0x1c/0x20) |[] (show_stack+0x1c/0x20) from [] (dump_stack+0x20/0x2c) |[] (dump_stack+0x20/0x2c) from [] (__might_sleep+0x13c/0x170) |[] (__might_sleep+0x13c/0x170) from [] (__rt_spin_lock+0x28/0x38) |[] (__rt_spin_lock+0x28/0x38) from [] (rt_read_lock+0x68/0x7c) |[] (rt_read_lock+0x68/0x7c) from [] (led_trigger_event+0x2c/0x5c) |[] (led_trigger_event+0x2c/0x5c) from [] (ledtrig_cpu+0x54/0x5c) |[] (ledtrig_cpu+0x54/0x5c) from [] (arch_cpu_idle_exit+0x18/0x1c) |[] (arch_cpu_idle_exit+0x18/0x1c) from [] (cpu_startup_entry+0xa8/0x234) |[] (cpu_startup_entry+0xa8/0x234) from [] (rest_init+0xb8/0xe0) |[] (rest_init+0xb8/0xe0) from [] (start_kernel+0x2c4/0x380) Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 895df7962fa1c2f186eb05df3908a46162746624 Author: Thomas Gleixner Date: Wed Jul 8 17:14:48 2015 +0200 jump-label: disable if stop_machine() is used Some architectures are using stop_machine() while switching the opcode which leads to latency spikes. The architectures which use stop_machine() atm: - ARM stop machine - s390 stop machine The architectures which use other sorcery: - MIPS - X86 - powerpc - sparc - arm64 Signed-off-by: Thomas Gleixner [bigeasy: only ARM for now] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit c68dc552afc5a11da24451440c62258610d89b58 Author: Ingo Molnar Date: Fri Jul 3 08:29:57 2009 -0500 genirq: Disable irqpoll on -rt Creates long latencies for no value Signed-off-by: Ingo Molnar Signed-off-by: Thomas Gleixner commit 58ba3ac7ad12a7c848b12b0a2e165afb05a58cb7 Author: Sebastian Andrzej Siewior Date: Tue Aug 31 20:48:02 2021 +0200 mm: Fully initialize invalidate_lock, amend lock class later The function __init_rwsem() is not part of the official API, it is just a helper function used by init_rwsem(). Changing the lock's class and name should be done by using lockdep_set_class_and_name() after the lock has been fully initialized. The overhead of the additional class struct and setting it twice is negligible and it works across all locks. Fully initialize the lock with init_rwsem() and then set the custom class and name for the lock.
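A short sketch of the initialization pattern this entry describes, assuming the rwsem in question is the mapping->invalidate_lock named in the Fixes tag below (the key and the reported name are illustrative):

    #include <linux/fs.h>
    #include <linux/lockdep.h>
    #include <linux/rwsem.h>

    static struct lock_class_key example_invalidate_lock_key;  /* illustrative key */

    static void example_mapping_init(struct address_space *mapping)
    {
            /* Use the official API for the complete initialization ... */
            init_rwsem(&mapping->invalidate_lock);
            /* ... then amend only the lockdep class and the reported name. */
            lockdep_set_class_and_name(&mapping->invalidate_lock,
                                       &example_invalidate_lock_key,
                                       "mapping.invalidate_lock");
    }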
Fixes: 730633f0b7f95 ("mm: Protect operations adding pages to page cache with invalidate_lock") Link: https://lkml.kernel.org/r/20210901084403.g4fezi23cixemlhh@linutronix.de Signed-off-by: Sebastian Andrzej Siewior commit ffcc385a6484b502ab09cdcbe4900d5b40778edc Author: Josh Cartwright Date: Thu Feb 11 11:54:00 2016 -0600 genirq: update irq_set_irqchip_state documentation On -rt kernels, the use of migrate_disable()/migrate_enable() is sufficient to guarantee a task isn't moved to another CPU. Update the irq_set_irqchip_state() documentation to reflect this. Signed-off-by: Josh Cartwright Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 57216a621b7f6e3155e701cc79ad96566c451fa6 Author: Sebastian Andrzej Siewior Date: Mon Feb 15 18:44:12 2021 +0100 smp: Wake ksoftirqd on PREEMPT_RT instead do_softirq(). The softirq implementation on PREEMPT_RT does not provide do_softirq(). The other user of do_softirq() is replaced with a local_bh_disable() + enable() around the possible raise-softirq invocation. This can not be done here because migration_cpu_stop() is invoked with disabled preemption. Wake the softirq thread on PREEMPT_RT if there are any pending softirqs. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 7f6426d5f8ccd6bf29af15c0440d12ea12ebfb92 Author: Sebastian Andrzej Siewior Date: Thu Jul 1 17:43:16 2021 +0200 samples/kfifo: Rename read_lock/write_lock The variables names read_lock and write_lock can clash with functions used for read/writer locks. Rename read_lock to read_access and write_lock to write_access to avoid a name collision. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20210806152551.qio7c3ho6pexezup@linutronix.de commit b65f4165fa0e5f1400c59e2e04986a96e452b5df Author: Sebastian Andrzej Siewior Date: Mon Oct 12 17:33:54 2020 +0200 tcp: Remove superfluous BH-disable around listening_hash Commit 9652dc2eb9e40 ("tcp: relax listening_hash operations") removed the need to disable bottom half while acquiring listening_hash.lock. There are still two callers left which disable bottom half before the lock is acquired. Drop local_bh_disable() around __inet_hash() which acquires listening_hash->lock, invoke inet_ehash_nolisten() with disabled BH. inet_unhash() conditionally acquires listening_hash->lock. Reported-by: Mike Galbraith Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/linux-rt-users/12d6f9879a97cd56c09fb53dee343cbb14f7f1f7.camel@gmx.de/ Link: https://lkml.kernel.org/r/X9CheYjuXWc75Spa@hirez.programming.kicks-ass.net commit 9358758d2ae53ef7dd43f39eaa30ec23e7d0e880 Author: Thomas Gleixner Date: Tue Sep 8 07:32:20 2020 +0200 net: Move lockdep where it belongs Signed-off-by: Thomas Gleixner commit 9317f6cdd3cf6ab56e034f704a95352382277933 Author: Sebastian Andrzej Siewior Date: Mon Feb 11 10:40:46 2019 +0100 mm: workingset: replace IRQ-off check with a lockdep assert. Commit 68d48e6a2df57 ("mm: workingset: add vmstat counter for shadow nodes") introduced an IRQ-off check to ensure that a lock is held which also disabled interrupts. This does not work the same way on -RT because none of the locks, that are held, disable interrupts. Replace this check with a lockdep assert which ensures that the lock is held. 
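Roughly the shape of the change the workingset entry above describes, sketched here under the assumption that the lock being asserted is the i_pages xa_lock used by the shadow-node code (not the literal diff from the patch):

    #include <linux/fs.h>
    #include <linux/lockdep.h>
    #include <linux/xarray.h>

    static void example_update_node(struct address_space *mapping)
    {
            /*
             * Old check, only valid while the lock in question also
             * disabled interrupts:
             *      VM_WARN_ON_ONCE(!irqs_disabled());
             *
             * New check: assert on the lock itself.  This also holds on
             * RT, where the i_pages lock is taken without disabling
             * interrupts.
             */
            lockdep_assert_held(&mapping->i_pages.xa_lock);

            /* ... manipulate the shadow nodes under the lock ... */
    }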
Cc: Peter Zijlstra Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20190211113829.sqf6bdi4c4cdd3rp@linutronix.de/ commit fcdcdbff92b2eaba9bda3100823b15d0933e0d96 Author: Sebastian Andrzej Siewior Date: Tue Jul 3 18:19:48 2018 +0200 cgroup: use irqsave in cgroup_rstat_flush_locked() All callers of cgroup_rstat_flush_locked() acquire cgroup_rstat_lock either with spin_lock_irq() or spin_lock_irqsave(). cgroup_rstat_flush_locked() itself acquires cgroup_rstat_cpu_lock which is a raw_spin_lock. This lock is also acquired in cgroup_rstat_updated() in IRQ context and therefore requires the _irqsave() locking suffix in cgroup_rstat_flush_locked(). Since there is no difference between spinlock_t and raw_spinlock_t on !RT lockdep does not complain here. On RT lockdep complains because the interrupts were not disabled here and a deadlock is possible. Acquire the raw_spinlock_t with disabled interrupts. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Link: https://www.spinics.net/lists/cgroups/msg23051.html commit 54647177c25b739b319951577669fa361440d93a Author: Thomas Gleixner Date: Mon Nov 9 23:32:39 2020 +0100 genirq: Move prio assignment into the newly created thread With threaded interrupts enabled the nouveau driver reported the following: | Chain exists of: | &mm->mmap_lock#2 --> &device->mutex --> &cpuset_rwsem | | Possible unsafe locking scenario: | | CPU0 CPU1 | ---- ---- | lock(&cpuset_rwsem); | lock(&device->mutex); | lock(&cpuset_rwsem); | lock(&mm->mmap_lock#2); The device->mutex is nvkm_device::mutex. Unblocking the lockchain at `cpuset_rwsem' is probably the easiest thing to do. Move the priority assignment to the start of the newly created thread. Fixes: 710da3c8ea7df ("sched/core: Prevent race condition between cpuset and __sched_setscheduler()") Reported-by: Mike Galbraith Signed-off-by: Thomas Gleixner [bigeasy: Patch description] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/a23a826af7c108ea5651e73b8fbae5e653f16e86.camel@gmx.de commit fb2a621e1eab9d8bd52d34a930b05b3e8e532cf1 Author: Sebastian Andrzej Siewior Date: Mon Nov 9 21:30:41 2020 +0100 kthread: Move prio/affinity change into the newly created thread With threaded interrupts enabled the nouveau driver reported the following: | Chain exists of: | &mm->mmap_lock#2 --> &device->mutex --> &cpuset_rwsem | | Possible unsafe locking scenario: | | CPU0 CPU1 | ---- ---- | lock(&cpuset_rwsem); | lock(&device->mutex); | lock(&cpuset_rwsem); | lock(&mm->mmap_lock#2); The device->mutex is nvkm_device::mutex. Unblocking the lockchain at `cpuset_rwsem' is probably the easiest thing to do. Move the priority reset to the start of the newly created thread. Fixes: 710da3c8ea7df ("sched/core: Prevent race condition between cpuset and __sched_setscheduler()") Reported-by: Mike Galbraith Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Link: https://lkml.kernel.org/r/a23a826af7c108ea5651e73b8fbae5e653f16e86.camel@gmx.de commit dfda52cac905d4df6c7d013a738c52ec2d81131e Author: Sebastian Andrzej Siewior Date: Mon Aug 30 19:26:27 2021 +0200 kcov: Replace local_irq_save() with a local_lock_t. The kcov code mixes local_irq_save() and spin_lock() in kcov_remote_{start|end}(). This creates a warning on PREEMPT_RT because local_irq_save() disables interrupts and spinlock_t is turned into a sleeping lock which cannot be acquired in a section with disabled interrupts.
The kcov_remote_lock is used to synchronize the access to the hash-list kcov_remote_map. The local_irq_save() block protects access to the per-CPU data kcov_percpu_data. There is no compelling reason to change the lock type to raw_spin_lock_t to make it work with local_irq_save(). Changing it would require moving the memory allocation (in kcov_remote_add()) and deallocation outside of the locked section. Adding an unlimited amount of entries to the hashlist will increase the IRQ-off time during lookup. It could be argued that this is debug code and the latency does not matter. There is however no need to do so and it would allow this facility to be used in an RT enabled build. Using a local_lock_t instead of local_irq_save() has the benefit of adding a protection scope within the source which makes it obvious what is protected. On a !PREEMPT_RT && !LOCKDEP build the local_lock_irqsave() maps directly to local_irq_save() so there is no overhead at runtime. Replace the local_irq_save() section with a local_lock_t. Reported-by: Clark Williams Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20210830172627.267989-6-bigeasy@linutronix.de commit 1a1a80568c47b763f81ed6a296bd6b8ecf021a6b Author: Sebastian Andrzej Siewior Date: Mon Aug 30 19:26:26 2021 +0200 kcov: Avoid enable+disable interrupts if !in_task(). kcov_remote_start() may need to allocate memory in the in_task() case (otherwise per-CPU memory has been pre-allocated) and therefore requires enabled interrupts. The interrupts are enabled before checking if the allocation is required so if no allocation is required then the interrupts are needlessly enabled and disabled again. Enable interrupts only if memory allocation is performed. Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20210830172627.267989-5-bigeasy@linutronix.de commit f69f4f1a89bf1e383b8f94f2debd90ede72e0b8e Author: Sebastian Andrzej Siewior Date: Mon Aug 30 19:26:25 2021 +0200 kcov: Allocate per-CPU memory on the relevant node. During boot kcov allocates per-CPU memory which is used later if remote/softirq processing is enabled. Allocate the per-CPU memory on the CPU-local node to avoid cross-node memory access. Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20210830172627.267989-4-bigeasy@linutronix.de commit 8c97d6d03c9d6aca1bbca0a38415adf07560aeb1 Author: Sebastian Andrzej Siewior Date: Mon Aug 30 19:26:24 2021 +0200 Documentation/kcov: Define `ip' in the example. The example code uses the variable `ip' but never declares it. Declare `ip' as a 64bit variable which is the same type as the array from which it loads its value. Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20210830172627.267989-3-bigeasy@linutronix.de commit 34b7258d2e9ee9690cb5b8b77236dc468eb72daa Author: Sebastian Andrzej Siewior Date: Mon Aug 30 19:26:23 2021 +0200 Documentation/kcov: Include types.h in the example. The first example code has includes at the top; the following two examples share that part. The last example (remote coverage collection) requires the linux/types.h header file due to its __aligned_u64 usage. Add the linux/types.h include to the topmost example and a comment that the header files from above are required, as is done in the second example.
Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20210830172627.267989-2-bigeasy@linutronix.de commit beee436049f6b403c8450962bea8d12065903974 Author: Sebastian Andrzej Siewior Date: Thu Sep 9 10:15:30 2021 +0200 virt: acrn: Remove unused acrn_irqfds_mutex. acrn_irqfds_mutex is not used, never was. Remove acrn_irqfds_mutex. Fixes: aa3b483ff1d71 ("virt: acrn: Introduce irqfd") Cc: Fei Li Signed-off-by: Sebastian Andrzej Siewior commit 8ec63772330a54ff8c891738b0509a043c7cdd3c Author: Sebastian Andrzej Siewior Date: Thu Sep 9 12:18:29 2021 +0200 smack: Guard smack_ipv6_lock definition within a SMACK_IPV6_PORT_LABELING block The mutex smack_ipv6_lock is only used with the SMACK_IPV6_PORT_LABELING block but its definition is outside of the block. This leads to a defined-but-not-used warning on PREEMPT_RT. Moving smack_ipv6_lock down to the block where it is used raises the question of why smk_ipv6_port_list is read if nothing is added to it. Turns out, only smk_ipv6_port_check() is using it outside of an ifdef SMACK_IPV6_PORT_LABELING block. However two of three callers invoke smk_ipv6_port_check() from an ifdef block and only one uses the __is_defined() macro which requires the function and smk_ipv6_port_list to be around. Put the lock and list inside an ifdef SMACK_IPV6_PORT_LABELING block to avoid the warning regarding the unused mutex. Extend the ifdef-block to also cover smk_ipv6_port_check(). Make smack_socket_connect() use ifdef instead of __is_defined() to avoid complaints about a missing function. Cc: Casey Schaufler Cc: James Morris Cc: "Serge E. Hallyn" Signed-off-by: Sebastian Andrzej Siewior commit a802f9a2cbc77b4e25f86a748aea752345981caf Author: Sebastian Andrzej Siewior Date: Thu Sep 9 10:15:30 2021 +0200 ASoC: mediatek: mt8195: Remove unused irqs_lock. irqs_lock is not used, never was. Remove irqs_lock. Fixes: 283b612429a27 ("ASoC: mediatek: implement mediatek common structure") Cc: Liam Girdwood Cc: Mark Brown Cc: Jaroslav Kysela Cc: Takashi Iwai Cc: Matthias Brugger Signed-off-by: Sebastian Andrzej Siewior commit b0123d5e345e0aaeabdc446174b81cc2f6d279fd Author: Sebastian Andrzej Siewior Date: Fri Sep 3 10:40:01 2021 +0200 lockdep: Let lock_is_held_type() detect recursive read as read lock_is_held_type(, 1) detects acquired read locks. It only recognizes locks acquired with lock_acquire_shared(). Read locks acquired with lock_acquire_shared_recursive() are not recognized because a `2' is stored as the read value. Rework the check to additionally recognise a lock's read values one and two as a read-held lock. Fixes: e918188611f07 ("locking: More accurate annotations for read_lock()") Signed-off-by: Sebastian Andrzej Siewior Acked-by: Waiman Long Acked-by: Boqun Feng Link: https://lkml.kernel.org/r/20210910135312.4axzdxt74rgct2ur@linutronix.de commit 39609ed79d420e0b966e16a1d695733c2d3b9a7f Author: Sebastian Andrzej Siewior Date: Tue Aug 24 22:47:37 2021 +0200 sched: Switch wait_task_inactive to HRTIMER_MODE_REL_HARD With PREEMPT_RT enabled all hrtimer callbacks will be invoked in softirq mode unless they are explicitly marked as HRTIMER_MODE_HARD. During boot kthread_bind() is used for the creation of per-CPU threads and then hangs in wait_task_inactive() if the ksoftirqd is not yet up and running. The hang disappeared since commit 26c7295be0c5e ("kthread: Do not preempt current task if it is going to call schedule()") but enabling function trace on boot reliably leads to the freeze-on-boot behaviour again.
The timer in wait_task_inactive() cannot be used directly by a user interface to abuse it and create a mass wakeup of several tasks at the same time, which would lead to long sections with disabled interrupts. Therefore it is safe to make the timer HRTIMER_MODE_REL_HARD. Switch the timer to HRTIMER_MODE_REL_HARD. Cc: stable-rt@vger.kernel.org Link: https://lkml.kernel.org/r/20210826170408.vm7rlj7odslshwch@linutronix.de Signed-off-by: Sebastian Andrzej Siewior commit cfe937f97bcac13003358c1d598a92e9161ab61f Author: Chao Qin Date: Mon Jul 19 10:26:50 2021 +0800 printk: Enhance the condition check of msleep in pr_flush() There is an msleep() in pr_flush(). If WARN() is called in the early boot stage, such as in an early_initcall, pr_flush() will run into msleep() while the process scheduler is not ready yet. And then the system will sleep forever. Before the system_state is SYSTEM_RUNNING, make sure we DO NOT sleep in pr_flush(). Fixes: c0b395bd0fe3 ("printk: add pr_flush()") Signed-off-by: Chao Qin Signed-off-by: Lili Li Signed-off-by: Thomas Gleixner Reviewed-by: John Ogness Signed-off-by: John Ogness Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/lkml/20210719022649.3444072-1-chao.qin@intel.com commit bfb1d59baaa122bc08ffb76274e0748665f0e821 Author: John Ogness Date: Mon Nov 30 01:42:10 2020 +0106 printk: add pr_flush() Provide a function to allow waiting for console printers to catch up to the latest logged message. Use pr_flush() to give console printers a chance to finish in critical situations if no atomic console is available. For now pr_flush() is only used in the most common error paths: panic(), print_oops_end_marker(), report_bug(), kmsg_dump(). Signed-off-by: John Ogness Signed-off-by: Thomas Gleixner commit cf0148cba4a562c881e9146408aee66b93a4db67 Author: John Ogness Date: Mon Nov 30 01:42:09 2020 +0106 printk: add console handover If earlyprintk is used, a boot console will print directly to the console immediately. The boot console will unregister itself as soon as a non-boot console registers. However, the non-boot console does not begin printing until its kthread has started. Since this happens much later, there is a long pause in the console output. If the ringbuffer is small, messages could even be dropped during the pause. Add a new CON_HANDOVER console flag to be used internally by printk in order to track which non-boot console took over from a boot console. If handover consoles have implemented write_atomic(), they are allowed to print directly to the console until their kthread can take over. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 217819769325ddcc25dc4596c5a784833cd8e4e6 Author: John Ogness Date: Mon Nov 30 01:42:08 2020 +0106 printk: remove deferred printing Since printing occurs either atomically or from the printing kthread, there is no need for any deferring or tracking possible recursion paths. Remove all printk defer functions and context tracking. Signed-off-by: John Ogness Signed-off-by: Thomas Gleixner commit 5944a1794481063003aa51e57915de119006a122 Author: John Ogness Date: Mon Nov 30 01:42:07 2020 +0106 printk: move console printing to kthreads Create a kthread for each console to perform console printing. Now all console printing is fully asynchronous except for the boot console and when the kernel enters sync mode (and there are atomic consoles available). The console_lock() and console_unlock() functions now only do what their names say... locking and unlocking of the console.
Signed-off-by: John Ogness Signed-off-by: Thomas Gleixner commit 4eacfd5498ab77c1520dd97a828b4459d45c5fa4 Author: John Ogness Date: Mon Nov 30 01:42:06 2020 +0106 printk: introduce kernel sync mode When the kernel performs an OOPS, enter into "sync mode": - only atomic consoles (write_atomic() callback) will print - printing occurs within vprintk_store() instead of console_unlock() CONSOLE_LOG_MAX is moved to printk.h to support the per-console buffer used in sync mode. Signed-off-by: John Ogness Signed-off-by: Thomas Gleixner commit 6d904cc4885baddb8b5c07957c27ca162d32e092 Author: John Ogness Date: Mon Nov 30 01:42:05 2020 +0106 printk: use seqcount_latch for console_seq In preparation for atomic printing, change @console_seq to use seqcount_latch so that it can be read without requiring @console_sem. Signed-off-by: John Ogness Signed-off-by: Thomas Gleixner commit 9f407831efa41ed98685eb3188912c531d3b7216 Author: John Ogness Date: Mon Nov 30 01:42:04 2020 +0106 printk: call boot_delay_msec() in printk_delay() boot_delay_msec() is always called immediately before printk_delay() so just call it from within printk_delay(). Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 5317b1a96502d3c18ef9b4af1cd912bff8baff26 Author: John Ogness Date: Mon Nov 30 01:42:03 2020 +0106 printk: relocate printk_delay() Move printk_delay() "as is" further up so that they can be used by new functions in an upcoming commit. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit edee00174787bed8c9c33fc5960525985cefa144 Author: John Ogness Date: Mon Nov 30 01:42:02 2020 +0106 serial: 8250: implement write_atomic Implement a non-sleeping NMI-safe write_atomic() console function in order to support emergency console printing. Since interrupts need to be disabled during transmit, all usage of the IER register is wrapped with access functions that use the console_atomic_lock() function to synchronize register access while tracking the state of the interrupts. This is necessary because write_atomic() can be called from an NMI context that has preempted write_atomic(). Signed-off-by: John Ogness Signed-off-by: Thomas Gleixner commit 2b7d3a83c629f936bfc2ccc917f6937ad1df680e Author: John Ogness Date: Fri Mar 19 14:57:31 2021 +0100 kdb: only use atomic consoles for output mirroring Currently kdb uses the @oops_in_progress hack to mirror kdb output to all active consoles from NMI context. Ignoring locks is unsafe. Now that an NMI-safe atomic interfaces is available for consoles, use that interface to mirror kdb output. Signed-off-by: John Ogness Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner commit 27293f7ad5b78d1e96c5abb5f4e72108e6259b59 Author: John Ogness Date: Mon Nov 30 01:42:01 2020 +0106 console: add write_atomic interface Add a write_atomic() callback to the console. This is an optional function for console drivers. The function must be atomic (including NMI safe) for writing to the console. Console drivers must still implement the write() callback. The write_atomic() callback will only be used in special situations, such as when the kernel panics. Creating an NMI safe write_atomic() that must synchronize with write() requires a careful implementation of the console driver. 
To aid with the implementation, a set of console_atomic_*() functions are provided: void console_atomic_lock(unsigned long flags); void console_atomic_unlock(unsigned long flags); These functions synchronize using the printk cpulock and disable hardware interrupts. kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active) during cpu roundup. This will conflict with the printk cpulock. Therefore, a CPU must ensure that it is not holding the printk cpulock when calling kgdb_cpu_enter(). If it is, it must allow its printk context to complete first. A new helper function kgdb_roundup_delay() is introduced for kgdb to determine if it is holding the printk cpulock. If so, a flag is set so that when the printk cpulock is released, kgdb will be re-triggered for that CPU. Signed-off-by: John Ogness Signed-off-by: Thomas Gleixner commit 8ea42ce2224a0947339a9a18c69b6e38773e5940 Author: John Ogness Date: Thu Jul 15 09:34:45 2021 +0206 printk: rename printk cpulock API and always disable interrupts The printk cpulock functions use local_irq_disable(). This means that hardware interrupts are also disabled on PREEMPT_RT. To make this clear, rename the functions to use the raw_ prefix: raw_printk_cpu_lock_irqsave(flags); raw_printk_cpu_unlock_irqrestore(flags); Also, these functions were a NOP for !CONFIG_SMP. But for !CONFIG_SMP they still need to disable hardware interrupts. So modify them appropriately for this. Signed-off-by: John Ogness Signed-off-by: Thomas Gleixner commit 0aca773c3400a58b32cae611dbb289142c81cb9d Author: Valentin Schneider Date: Wed Aug 11 21:13:54 2021 +0100 arm64: mm: Make arch_faults_on_old_pte() check for migratability arch_faults_on_old_pte() relies on the calling context being non-preemptible. CONFIG_PREEMPT_RT turns the PTE lock into a sleepable spinlock, which doesn't disable preemption once acquired, triggering the warning in arch_faults_on_old_pte(). It does however disable migration, ensuring the task remains on the same CPU during the entirety of the critical section, making the read of cpu_has_hw_af() safe and stable. Make arch_faults_on_old_pte() check migratable() instead of preemptible(). Signed-off-by: Valentin Schneider Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20210811201354.1976839-5-valentin.schneider@arm.com commit e29353f5f0d089048dfcb2c7bce66e41bc8d2dc2 Author: Valentin Schneider Date: Wed Aug 11 21:13:53 2021 +0100 rcu/nocb: Protect NOCB state via local_lock() under PREEMPT_RT Warning ======= Running v5.13-rt1 on my arm64 Juno board triggers: [ 0.156302] ============================= [ 0.160416] WARNING: suspicious RCU usage [ 0.164529] 5.13.0-rt1 #20 Not tainted [ 0.168300] ----------------------------- [ 0.172409] kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state! 
[ 0.179920] [ 0.179920] other info that might help us debug this: [ 0.179920] [ 0.188037] [ 0.188037] rcu_scheduler_active = 1, debug_locks = 1 [ 0.194677] 3 locks held by rcuc/0/11: [ 0.198448] #0: ffff00097ef10cf8 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip (./include/linux/rcupdate.h:662 kernel/softirq.c:171) [ 0.208709] #1: ffff80001205e5f0 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock (kernel/locking/spinlock_rt.c:43 (discriminator 4)) [ 0.217134] #2: ffff80001205e5f0 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip (kernel/softirq.c:169) [ 0.226428] [ 0.226428] stack backtrace: [ 0.230889] CPU: 0 PID: 11 Comm: rcuc/0 Not tainted 5.13.0-rt1 #20 [ 0.237100] Hardware name: ARM Juno development board (r0) (DT) [ 0.243041] Call trace: [ 0.245497] dump_backtrace (arch/arm64/kernel/stacktrace.c:163) [ 0.249185] show_stack (arch/arm64/kernel/stacktrace.c:219) [ 0.252522] dump_stack (lib/dump_stack.c:122) [ 0.255947] lockdep_rcu_suspicious (kernel/locking/lockdep.c:6439) [ 0.260328] rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58) [ 0.264537] rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777) [ 0.267786] rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876) [ 0.271644] smpboot_thread_fn (kernel/smpboot.c:165 (discriminator 3)) [ 0.275767] kthread (kernel/kthread.c:321) [ 0.279013] ret_from_fork (arch/arm64/kernel/entry.S:1005) In this case, this is the RCU core kthread accessing the local CPU's rdp. Before that, rcu_cpu_kthread() invokes local_bh_disable(). Under !CONFIG_PREEMPT_RT (and rcutree.use_softirq=0), this ends up incrementing the preempt_count, which satisfies the "local non-preemptible read" of rcu_rdp_is_offloaded(). Under CONFIG_PREEMPT_RT however, this becomes local_lock(&softirq_ctrl.lock) which, under the same config, is migrate_disable() + rt_spin_lock(). As pointed out by Frederic, this is not sufficient to safely access an rdp's offload state, as the RCU core kthread can be preempted by a kworker executing rcu_nocb_rdp_offload() [1]. Introduce a local_lock to serialize an rdp's offload state while the rdp's associated core kthread is executing rcu_core(). rcu_core() preemptability considerations ======================================== As pointed out by Paul [2], keeping rcu_check_quiescent_state() preemptible (which is the case under CONFIG_PREEMPT_RT) requires some consideration. note_gp_changes() itself runs with irqs off, and enters __note_gp_changes() with rnp->lock held (raw_spinlock), thus is safe vs preemption. rdp->core_needs_qs *could* change after being read by the RCU core kthread if it then gets preempted. Consider, with CONFIG_RCU_STRICT_GRACE_PERIOD: rcuc/x runs rcu_check_quiescent_state() and observes rdp->core_needs_qs == true, then gets preempted by task_foo, whose rcu_read_unlock() runs rcu_preempt_deferred_qs_irqrestore() and then rcu_report_qs_rdp(), which sets rdp->core_needs_qs := false. This would let rcuc/x's rcu_check_quiescent_state() proceed further down to rcu_report_qs_rdp(), but task_foo's earlier rcu_report_qs_rdp() invocation would have cleared the rdp grpmask from the rnp mask, so rcuc/x's invocation would simply bail. Since rcu_report_qs_rdp() can be safely invoked, even if rdp->core_needs_qs changed, it appears safe to keep rcu_check_quiescent_state() preemptible.
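The fix is an instance of the generic local_lock() pattern. A simplified, hypothetical sketch of that pattern protecting some per-CPU state (the my_pcpu_state structure and helpers below are illustrative, not the actual rcu_data/rcu_core() change):

	#include <linux/local_lock.h>
	#include <linux/percpu.h>

	struct my_pcpu_state {
		local_lock_t	lock;
		int		offloaded;	/* stand-in for the state to serialize */
	};

	static DEFINE_PER_CPU(struct my_pcpu_state, my_state) = {
		.lock = INIT_LOCAL_LOCK(lock),
	};

	static void my_state_update(int val)
	{
		/*
		 * On !PREEMPT_RT local_lock() disables preemption; on
		 * PREEMPT_RT it takes a per-CPU sleeping lock, so readers and
		 * writers of this CPU's state stay serialized even though the
		 * section remains preemptible.
		 */
		local_lock(&my_state.lock);
		this_cpu_ptr(&my_state)->offloaded = val;
		local_unlock(&my_state.lock);
	}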
[1]: http://lore.kernel.org/r/20210727230814.GC283787@lothringen [2]: http://lore.kernel.org/r/20210729010445.GO4397@paulmck-ThinkPad-P17-Gen-1 Signed-off-by: Valentin Schneider Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20210811201354.1976839-4-valentin.schneider@arm.com commit 078166fd8b2f6e4657e95dc5012cd6ce4dbe99c0 Author: Valentin Schneider Date: Wed Aug 11 21:13:52 2021 +0100 sched: Introduce migratable() Some areas use preempt_disable() + preempt_enable() to safely access per-CPU data. The PREEMPT_RT folks have shown this can also be done by keeping preemption enabled and instead disabling migration (and acquiring a sleepable lock, if relevant). Introduce a helper which checks whether the current task can be migrated elsewhere, IOW whether it is not pinned to its local CPU in the current context. This can help determine whether per-CPU properties can be safely accessed. Note that CPU affinity is not checked here, as a preemptible task can have its affinity changed at any given time (including if it has PF_NO_SETAFFINITY, when hotplug gets involved). Signed-off-by: Valentin Schneider [bigeasy: Return false on UP, call it is_migratable().] Signed-off-by: Sebastian Andrzej Siewior Link: https://lore.kernel.org/r/20210811201354.1976839-3-valentin.schneider@arm.com
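A rough sketch of what such a helper might look like, going only by the description above (the body is an assumption, not necessarily the code that was merged; the -rt note renames it to is_migratable()):

	#include <linux/preempt.h>
	#include <linux/sched.h>

	/*
	 * True if the current task could be moved to another CPU right now:
	 * it is preemptible and has not disabled migration. On UP there is
	 * nowhere to migrate to, so the answer is always false.
	 */
	static inline bool migratable(void)
	{
	#ifdef CONFIG_SMP
		return preemptible() && !current->migration_disabled;
	#else
		return false;
	#endif
	}

With such a helper, the arm64 arch_faults_on_old_pte() change above roughly amounts to warning when migratable() is true instead of when preemptible() is true.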