commit 893b3e82b9a1618d7dd0f8e12c67e54f0759ed6e
Author: Alexandre Frade
Date:   Fri Jul 16 15:50:58 2021 +0000

    Linux 5.10.47-rt46-xanmod1

    Signed-off-by: Alexandre Frade

commit e947a6c0dfb563cb425eb198fc15ca240a293598
Merge: 753f4447e2cd dc5610d093cf
Author: Alexandre Frade
Date:   Fri Jul 16 15:49:49 2021 +0000

    Merge tag 'v5.10.47-rt46' into 5.10-rt

    Linux 5.10.47-rt46

commit dc5610d093cf6aa8eb3f79a2964c9b1064693f8d
Author: Steven Rostedt (VMware)
Date:   Fri Jul 16 09:31:55 2021 -0400

    Linux 5.10.47-rt46

commit c8b328dc735e7123a09ea9317dac6df631c4dc47
Author: Valentin Schneider
Date:   Tue Jun 8 00:37:36 2021 -0400

    sched: Don't defer CPU pick to migration_cpu_stop()

    commit 475ea6c60279e9f2ddf7e4cf2648cd8ae0608361 upstream.

    Will reported that the 'XXX __migrate_task() can fail' in
    migration_cpu_stop() can happen, and it *is* sort of a big deal.
    Looking at it some more, one will note there is a glaring hole in the
    deferred CPU selection:

      (w/ CONFIG_CPUSET=n, so that the affinity mask passed via taskset
      doesn't get AND'd with cpu_online_mask)

      $ taskset -pc 0-2 $PID
      # offline CPUs 3-4
      $ taskset -pc 3-5 $PID
        `\
          $PID may stay on 0-2 due to the cpumask_any_distribute() picking
          an offline CPU and __migrate_task() refusing to do anything due
          to cpu_is_allowed().

    set_cpus_allowed_ptr() goes to some length to pick a dest_cpu that
    matches the right constraints vs affinity and the online/active state
    of the CPUs. Reuse that instead of discarding it in the
    affine_move_task() case.

    Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
    Reported-by: Will Deacon
    Signed-off-by: Valentin Schneider
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20210526205751.842360-2-valentin.schneider@arm.com
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Steven Rostedt (VMware)

commit 8a7730b7a190d406b45b0d0633f915624b877e34
Author: Peter Zijlstra
Date:   Tue Jun 8 00:37:35 2021 -0400

    sched: Simplify set_affinity_pending refcounts

    commit 50caf9c14b1498c90cf808dbba2ca29bd32ccba4 upstream.

    Now that we have set_affinity_pending::stop_pending to indicate if a
    stopper is in progress, and we have the guarantee that if that stopper
    exists, it will (eventually) complete our @pending we can simplify the
    refcount scheme by no longer counting the stopper thread.

    Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
    Cc: stable@kernel.org
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Valentin Schneider
    Link: https://lkml.kernel.org/r/20210224131355.724130207@infradead.org
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Steven Rostedt (VMware)

commit 88a170e7f023e655749d86ad71e4dc629ad797a0
Author: Peter Zijlstra
Date:   Tue Jun 8 00:37:34 2021 -0400

    sched: Fix affine_move_task() self-concurrency

    commit 9e81889c7648d48dd5fe13f41cbc99f3c362484a upstream.

    Consider:

       sched_setaffinity(p, X);        sched_setaffinity(p, Y);

    Then the first will install p->migration_pending = &my_pending; and
    issue stop_one_cpu_nowait(pending); and the second one will read
    p->migration_pending and _also_ issue: stop_one_cpu_nowait(pending),
    the _SAME_ @pending.

    This causes stopper list corruption.

    Add set_affinity_pending::stop_pending, to indicate if a stopper is in
    progress.

    Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
    Cc: stable@kernel.org
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Valentin Schneider
    Link: https://lkml.kernel.org/r/20210224131355.649146419@infradead.org
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Steven Rostedt (VMware)

commit 4444820fe19c5fbc92fd4b337d8e3dac83c0acc4
Author: Peter Zijlstra
Date:   Tue Jun 8 00:37:33 2021 -0400

    sched: Optimize migration_cpu_stop()

    commit 3f1bc119cd7fc987c8ed25ffb717f99403bb308c upstream.

    When the purpose of migration_cpu_stop() is to migrate the task to
    'any' valid CPU, don't migrate the task when it's already running on a
    valid CPU.

    Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
    Cc: stable@kernel.org
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Valentin Schneider
    Link: https://lkml.kernel.org/r/20210224131355.569238629@infradead.org
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Steven Rostedt (VMware)

commit c1ae80bfd2b384f9b0cfb9d643692f048ba5d4d1
Author: Peter Zijlstra
Date:   Tue Jun 8 00:37:32 2021 -0400

    sched: Collate affine_move_task() stoppers

    commit 58b1a45086b5f80f2b2842aa7ed0da51a64a302b upstream.

    The SCA_MIGRATE_ENABLE and task_running() cases are almost identical,
    collapse them to avoid further duplication.

    Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
    Cc: stable@kernel.org
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Valentin Schneider
    Link: https://lkml.kernel.org/r/20210224131355.500108964@infradead.org
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Steven Rostedt (VMware)

commit 3f6cdb240508875511af7878766c0a7d26fb868f
Author: Peter Zijlstra
Date:   Tue Jun 8 00:37:31 2021 -0400

    sched: Simplify migration_cpu_stop()

    commit c20cf065d4a619d394d23290093b1002e27dff86 upstream.

    When affine_move_task() issues a migration_cpu_stop(), the purpose of
    that function is to complete that @pending, not any random other
    p->migration_pending that might have gotten installed since.

    This realization much simplifies migration_cpu_stop() and allows
    further necessary steps to fix all this as it provides the guarantee
    that @pending's stopper will complete @pending (and not some random
    other @pending).

    Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
    Cc: stable@kernel.org
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Valentin Schneider
    Link: https://lkml.kernel.org/r/20210224131355.430014682@infradead.org
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Steven Rostedt (VMware)

commit 6ecced9a3aba3f6575589a9553c86b8e62d6f71d
Author: Peter Zijlstra
Date:   Tue Jun 8 00:37:30 2021 -0400

    sched: Fix migration_cpu_stop() requeueing

    commit 8a6edb5257e2a84720fe78cb179eca58ba76126f upstream.

    When affine_move_task(p) is called on a running task @p, which is not
    otherwise already changing affinity, we'll first set
    p->migration_pending and then do:

         stop_one_cpu(cpu_of_rq(rq), migration_cpu_stop, &arg);

    This then gets us to migration_cpu_stop() running on the CPU that was
    previously running our victim task @p.

    If we find that our task is no longer on that runqueue (this can
    happen because of a concurrent migration due to load-balance etc.),
    then we'll end up at the:

        } else if (dest_cpu < 1 || pending) {

    branch. Which we'll take because we set pending earlier. Here we first
    check if the task @p has already satisfied the affinity constraints,
    if so we bail early [A]. Otherwise we'll reissue migration_cpu_stop()
    onto the CPU that is now hosting our task @p:

        stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
                            &pending->arg, &pending->stop_work);

    Except, we've never initialized pending->arg, which will be all 0s.

    This then results in running migration_cpu_stop() on the next CPU with
    arg->p == NULL, which gives the by now obvious result of fireworks.

    The cure is to change affine_move_task() to always use pending->arg,
    furthermore we can use the exact same pattern as the
    SCA_MIGRATE_ENABLE case, since we'll block on the pending->done
    completion anyway, no point in adding yet another completion in
    stop_one_cpu().

    This then gives a clear distinction between the two
    migration_cpu_stop() use cases:

      - sched_exec() / migrate_task_to() : arg->pending == NULL
      - affine_move_task() : arg->pending != NULL;

    And we can have it ignore p->migration_pending when !arg->pending. Any
    stop work from sched_exec() / migrate_task_to() is in addition to stop
    works from affine_move_task(), which will be sufficient to issue the
    completion.

    Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
    Cc: stable@kernel.org
    Signed-off-by: Peter Zijlstra (Intel)
    Signed-off-by: Ingo Molnar
    Reviewed-by: Valentin Schneider
    Link: https://lkml.kernel.org/r/20210224131355.357743989@infradead.org
    Signed-off-by: Paul Gortmaker
    Signed-off-by: Steven Rostedt (VMware)