commit 16459626dc1e749a531f42d66fbb075df0b0f145
Author: Alexandre Frade
Date:   Sun May 2 15:51:30 2021 +0000

    Linux 5.12.1-xanmod1

    Signed-off-by: Alexandre Frade

commit 905201b9790d6bd3267f9ebb1e0357df1a228a5c
Author: Yu Zhao
Date:   Tue Apr 13 00:56:33 2021 -0600

    mm: multigenerational lru: documentation

    Add Documentation/vm/multigen_lru.rst.

    Signed-off-by: Yu Zhao

commit 768059dd288e5cc63f76f20a0592416f166f37b3
Author: Yu Zhao
Date:   Tue Apr 13 00:56:32 2021 -0600

    mm: multigenerational lru: Kconfig

    Add configuration options for the multigenerational lru.

    Signed-off-by: Yu Zhao

commit 1edcb632136203a464cd09c9cbf26235995d4688
Author: Yu Zhao
Date:   Tue Apr 13 00:56:31 2021 -0600

    mm: multigenerational lru: user interface

    Add a sysfs file /sys/kernel/mm/lru_gen/enabled so users can enable
    and disable the multigenerational lru at runtime.

    Add a sysfs file /sys/kernel/mm/lru_gen/spread so users can spread
    pages out across multiple generations. More generations make the
    background aging more aggressive.

    Add a debugfs file /sys/kernel/debug/lru_gen so users can monitor the
    multigenerational lru and trigger the aging and the eviction. This
    file has the following output:
      memcg  memcg_id  memcg_path
        node  node_id
          min_gen  birth_time  anon_size  file_size
          ...
          max_gen  birth_time  anon_size  file_size

    Given a memcg and a node, "min_gen" is the oldest generation (number)
    and "max_gen" is the youngest. Birth time is in milliseconds. The
    sizes of anon and file types are in pages.

    This file takes the following input:
      + memcg_id node_id gen [swappiness]
      - memcg_id node_id gen [swappiness] [nr_to_reclaim]
    The first command line accounts referenced pages to generation
    "max_gen" and creates the next generation "max_gen"+1. In this case,
    "gen" should be equal to "max_gen". A swap file and a non-zero
    "swappiness" are required to scan anon type. If swapping is not
    desired, set vm.swappiness to 0. The second command line evicts
    generations less than or equal to "gen".
    In this case, "gen" should be less than "max_gen"-1 as "max_gen" and
    "max_gen"-1 are active generations and therefore protected from the
    eviction. Use "nr_to_reclaim" to limit the number of pages to be
    evicted. Multiple command lines are supported, as is concatenation
    with the delimiters "," and ";".

    Signed-off-by: Yu Zhao

commit 83eca56f63529b88453f20abd752e2a6e6b7c9f3
Author: Yu Zhao
Date:   Tue Apr 13 00:56:30 2021 -0600

    mm: multigenerational lru: page reclaim

    With the aging and the eviction in place, we can build the page
    reclaim in a straightforward manner:
      1) In order to reduce the latency, direct reclaim only invokes the
         aging when both of min_seq[2] reach max_seq-1; otherwise it
         invokes the eviction.
      2) In order to avoid the aging in the direct reclaim path, kswapd
         does the background aging more proactively. It invokes the
         aging when either of min_seq[2] reaches max_seq-1; otherwise it
         invokes the eviction.

    And we add another optimization: pages mapped around a referenced PTE
    may also have been referenced due to the spatial locality. In the
    reclaim path, if the rmap finds the PTE mapping a page under reclaim
    referenced, it calls a new function lru_gen_scan_around() to scan the
    vicinity of the PTE. And if this new function finds other referenced
    PTEs, it updates the generation number of the pages mapped by those
    PTEs.

    Signed-off-by: Yu Zhao

commit feb6a642c0de3c5354dfbe50003aa58ea9c31bc9
Author: Yu Zhao
Date:   Tue Apr 13 00:56:29 2021 -0600

    mm: multigenerational lru: eviction

    The eviction consumes old generations. Given an lruvec, the eviction
    scans the pages on the per-zone lists indexed by either of
    min_seq[2]. It first tries to select a type based on the values of
    min_seq[2]. When anon and file types are both available from the same
    generation, it selects the one that has a lower refault rate.

    During a scan, the eviction sorts pages according to their generation
    numbers, if the aging has found them referenced.
    It also moves pages from the tiers that have higher refault rates
    than tier 0 to the next generation. When it finds all the per-zone
    lists of a selected type are empty, the eviction increments
    min_seq[2] indexed by this selected type.

    Signed-off-by: Yu Zhao

commit 4988f007fb2bc2c515a796ab6ba4132754945a07
Author: Yu Zhao
Date:   Tue Apr 13 00:56:28 2021 -0600

    mm: multigenerational lru: aging

    The aging produces young generations. Given an lruvec, the aging
    walks the mm_struct list associated with this lruvec to scan page
    tables for referenced pages. Upon finding one, the aging updates the
    generation number of this page to max_seq. After each round of scan,
    the aging increments max_seq. The aging is due when both of
    min_seq[2] reach max_seq-1, assuming both anon and file types are
    reclaimable.

    The aging uses the following optimizations when scanning page tables:
      1) It will not scan page tables from processes that have been
         sleeping since the last scan.
      2) It will not scan PTE tables under non-leaf PMD entries that do
         not have the accessed bit set, when
         CONFIG_HAVE_ARCH_PARENT_PMD_YOUNG=y.
      3) It will not zigzag between the PGD table and the same PMD or PTE
         table spanning multiple VMAs. In other words, it finishes all
         the VMAs with the range of the same PMD or PTE table before it
         returns to the PGD table. This optimizes workloads that have
         large numbers of tiny VMAs, especially when
         CONFIG_PGTABLE_LEVELS=5.

    Signed-off-by: Yu Zhao

commit d3cbfd2912628f6128206719685abb59b92b9d14
Author: Yu Zhao
Date:   Tue Apr 13 00:56:27 2021 -0600

    mm: multigenerational lru: mm_struct list

    In order to scan page tables, we add an infrastructure to maintain
    either a system-wide mm_struct list or per-memcg mm_struct lists.
    Multiple threads can concurrently work on the same mm_struct list,
    and each of them will be given a different mm_struct.

    This infrastructure also tracks whether an mm_struct is being used on
    any CPUs or has been used since the last time a worker looked at it.
    In other words, workers will not be given an mm_struct that belongs
    to a process that has been sleeping.

    Signed-off-by: Yu Zhao

commit 76eacaafedfa718476d9f10f30d07878b8f7dfd7
Author: Yu Zhao
Date:   Tue Apr 13 00:56:26 2021 -0600

    mm: multigenerational lru: activation

    For pages accessed multiple times via file descriptors, instead of
    activating them upon the second access, we activate them based on the
    refault rates of their tiers. Pages accessed N times via file
    descriptors belong to tier order_base_2(N). Pages from tier 0, i.e.,
    those read ahead, accessed once via file descriptors and accessed
    only via page tables, are evicted regardless of the refault rate.
    Pages from other tiers will be moved to the next generation, i.e.,
    activated, if the refault rates of their tiers are higher than that
    of tier 0. Each generation contains at most MAX_NR_TIERS tiers, and
    they require additional MAX_NR_TIERS-2 bits in page->flags. This
    feedback model has a few advantages over the current feedforward
    model:
      1) It has a negligible overhead in the access path because
         activations are done in the reclaim path.
      2) It takes mapped pages into account and avoids overprotecting
         pages accessed multiple times via file descriptors.
      3) More tiers offer better protection to pages accessed more than
         twice when buffered-I/O-intensive workloads are under memory
         pressure.

    For pages mapped upon page faults, the accessed bit is set and they
    must be properly aged. We add them to the per-zone lists indexed by
    max_seq, i.e., the youngest generation. For pages not in page cache
    or swap cache, this can be done easily in the page fault path: we
    rename lru_cache_add_inactive_or_unevictable() to
    lru_cache_add_page_vma() and add a new parameter, which is set to
    true for pages mapped upon page faults. For pages in page cache or
    swap cache, we cannot differentiate the page fault path from the read
    ahead path at the time we call lru_cache_add() in
    add_to_page_cache_lru() and __read_swap_cache_async().
    So we add a new function lru_gen_activation(), which is essentially
    activate_page(), to move pages to the per-zone lists indexed by
    max_seq at a later time. Hopefully we would find those pages in
    lru_pvecs.lru_add and simply set PageActive() on them without having
    to actually move them.

    Finally, we need to be compatible with the existing notion of active
    and inactive. We cannot use PageActive() because it is not set on
    active pages unless they are isolated, in order to spare the aging
    the trouble of clearing it when an active generation becomes
    inactive. A new function page_is_active() compares the generation
    number of a page with max_seq and max_seq-1 (modulo MAX_NR_GENS),
    which are considered active and protected from the eviction. Other
    generations, which may or may not exist, are considered inactive.

    Signed-off-by: Yu Zhao

commit c647191127e433655ca12ab3ee17fe403fb7fa49
Author: Yu Zhao
Date:   Tue Apr 13 00:56:25 2021 -0600

    mm: multigenerational lru: groundwork

    For each lruvec, evictable pages are divided into multiple
    generations. The youngest generation number is stored in max_seq for
    both anon and file types as they are aged on an equal footing. The
    oldest generation numbers are stored in min_seq[2] separately for
    anon and file types as clean file pages can be evicted regardless of
    may_swap or may_writepage. Generation numbers are truncated into
    order_base_2(MAX_NR_GENS+1) bits in order to fit into page->flags.
    The sliding window technique is used to prevent truncated generation
    numbers from overlapping. Each truncated generation number is an
    index to
    lruvec->evictable.lists[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES].
    Evictable pages are added to the per-zone lists indexed by max_seq or
    min_seq[2] (modulo MAX_NR_GENS), depending on whether they are being
    faulted in.

    The workflow comprises two conceptually independent functions: the
    aging and the eviction. The aging produces young generations.
    Given an lruvec, the aging scans page tables for referenced pages of
    this lruvec. Upon finding one, the aging updates its generation
    number to max_seq. After each round of scan, the aging increments
    max_seq. The aging is due when both of min_seq[2] reach max_seq-1,
    assuming both anon and file types are reclaimable.

    The eviction consumes old generations. Given an lruvec, the eviction
    scans the pages on the per-zone lists indexed by either of
    min_seq[2]. It tries to select a type based on the values of
    min_seq[2] and swappiness. During a scan, the eviction sorts pages
    according to their generation numbers, if the aging has found them
    referenced. When it finds all the per-zone lists of a selected type
    are empty, the eviction increments min_seq[2] indexed by this
    selected type.

    Signed-off-by: Yu Zhao

commit 4b0b4be60a4e66e35eb422aae2c2f7f823977f32
Author: Yu Zhao
Date:   Tue Apr 13 00:56:24 2021 -0600

    mm/vmscan.c: refactor shrink_node()

    Heuristics that determine scan balance between anon and file LRUs are
    rather independent. Move them into a separate function to improve
    readability.

    Signed-off-by: Yu Zhao

commit e6439b405d3add6424dfb5851771b067855a30ae
Author: Yu Zhao
Date:   Tue Apr 13 00:56:23 2021 -0600

    mm, x86: support the access bit on non-leaf PMD entries

    Some architectures support the accessed bit on non-leaf PMD entries
    (parents) in addition to leaf PTE entries (children) where pages are
    mapped, e.g., x86_64 sets the accessed bit on a parent when using it
    as part of linear-address translation [1]. Page table walkers that
    are interested in the accessed bit on children can take advantage of
    this: they do not need to search the children when the accessed bit
    is not set on a parent, given that they have previously cleared the
    accessed bit on this parent.
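The walker optimization described above can be illustrated with a toy model. The following Python sketch is not kernel code: the `Pmd` and `Pte` classes, their `accessed` fields, and the `touch()` and `scan_referenced()` helpers are hypothetical names introduced only for illustration. It assumes the walker clears the accessed bit on a parent each time it consumes it, so that a clear bit on the next pass means no child was touched in between.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Pte:
    """Toy leaf entry (child): maps one page and records accesses."""
    page: int
    accessed: bool = False


@dataclass
class Pmd:
    """Toy non-leaf entry (parent) covering a table of PTEs."""
    ptes: List[Pte] = field(default_factory=list)
    accessed: bool = False


def touch(pmd: Pmd, index: int) -> None:
    """Model a memory access: the bit is set on the parent and the child."""
    pmd.accessed = True
    pmd.ptes[index].accessed = True


def scan_referenced(pmds: List[Pmd]) -> Tuple[List[int], int]:
    """Return (referenced pages, number of PTEs examined).

    Children are searched only when the parent's accessed bit is set.
    Both bits are cleared as they are consumed, which is the precondition
    for skipping untouched parents on the next scan.
    """
    referenced: List[int] = []
    examined = 0
    for pmd in pmds:
        if not pmd.accessed:
            continue  # no child was touched since the previous scan
        pmd.accessed = False
        for pte in pmd.ptes:
            examined += 1
            if pte.accessed:
                pte.accessed = False
                referenced.append(pte.page)
    return referenced, examined


# Two parents with four children each; only the second parent is touched.
tables = [Pmd([Pte(p) for p in range(0, 4)]),
          Pmd([Pte(p) for p in range(4, 8)])]
touch(tables[1], 2)
pages, examined = scan_referenced(tables)
print(pages, examined)
```

In this model the first parent's four children are never examined, which is the saving the commit message describes; a second scan with no intervening access examines nothing at all.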
    [1]: Intel 64 and IA-32 Architectures Software Developer's Manual
         Volume 3 (October 2019), section 4.8

    Signed-off-by: Yu Zhao

commit 4b2807724fb39814618f2da8d97707d7cec47dd3
Author: Yu Zhao
Date:   Tue Apr 13 00:56:22 2021 -0600

    mm/swap.c: export activate_page()

    activate_page() is needed to activate pages that are already on lru
    or queued in lru_pvecs.lru_add. The exported function is a merger
    between the existing activate_page() and __lru_cache_activate_page().

    Signed-off-by: Yu Zhao

commit da3fe2aff99561187b9b9c0bdc3d1e570226eab1
Author: Yu Zhao
Date:   Tue Apr 13 00:56:21 2021 -0600

    include/linux/cgroup.h: export cgroup_mutex

    cgroup_mutex is needed to synchronize with memcg creations.

    Signed-off-by: Yu Zhao

commit 63ddb341956a2453886a038132665ea8227ec8b2
Author: Yu Zhao
Date:   Tue Apr 13 00:56:20 2021 -0600

    include/linux/huge_mm.h: define is_huge_zero_pmd() if !CONFIG_TRANSPARENT_HUGEPAGE

    Currently is_huge_zero_pmd() only exists when
    CONFIG_TRANSPARENT_HUGEPAGE=y. This patch adds the function for
    !CONFIG_TRANSPARENT_HUGEPAGE.

    Signed-off-by: Yu Zhao

commit a2c442f33ac26022abc442d5e66561d0da613aca
Author: Yu Zhao
Date:   Tue Apr 13 00:56:19 2021 -0600

    include/linux/nodemask.h: define next_memory_node() if !CONFIG_NUMA

    Currently next_memory_node only exists when CONFIG_NUMA=y. This patch
    adds the macro for !CONFIG_NUMA.

    Signed-off-by: Yu Zhao

commit a63c4ff94e8cf05213565091e2ef57beab262bbf
Author: Yu Zhao
Date:   Tue Apr 13 00:56:18 2021 -0600

    include/linux/memcontrol.h: do not warn in page_memcg_rcu() if !CONFIG_MEMCG

    page_memcg_rcu() warns on !rcu_read_lock_held() regardless of
    CONFIG_MEMCG. The following code is legit, but it triggers the
    warning when !CONFIG_MEMCG, since lock_page_memcg() and
    unlock_page_memcg() are empty for this config.

      memcg = lock_page_memcg(page1)
        (rcu_read_lock() if CONFIG_MEMCG=y)

      do something to page1

      if (page_memcg_rcu(page2) == memcg)
        do something to page2 too as it cannot be migrated away from the
        memcg either.

      unlock_page_memcg(page1)
        (rcu_read_unlock() if CONFIG_MEMCG=y)

    Locking/unlocking rcu consistently for both configs is rigorous but
    it also forces unnecessary locking upon users who have no interest in
    CONFIG_MEMCG.

    This patch removes the assertion for !CONFIG_MEMCG, because
    page_memcg_rcu() has a few callers and there are no concerns
    regarding their correctness at the moment.

    Signed-off-by: Yu Zhao

commit 325bb637fc5665539e715de6a7dcc39f5ecce918
Author: Nick Terrell
Date:   Mon Apr 26 16:34:03 2021 -0700

    MAINTAINERS: Add maintainer entry for zstd

    Adds a maintainer entry for zstd listing myself as the maintainer for
    all zstd code, pointing to the upstream issues tracker for bugs, and
    listing my linux repo as the tree.

    Signed-off-by: Nick Terrell

commit 8a041ad7a493f91c564f4f448d92b7fd39acee72
Merge: 2e46a548aa56 9f29b08688ca
Author: Alexandre Frade
Date:   Sun May 2 14:30:44 2021 +0000

    Merge tag 'v5.12.1' into 5.12

    This is the 5.12.1 stable release

commit 9f29b08688ca35efcffe01b80f55fd2a4edf5796
Author: Greg Kroah-Hartman
Date:   Sun May 2 11:10:27 2021 +0200

    Linux 5.12.1

    Tested-by: Fox Chen
    Tested-by: Linux Kernel Functional Testing
    Tested-by: Jon Hunter
    Tested-by: Guenter Roeck
    Link: https://lore.kernel.org/r/20210430141910.899518186@linuxfoundation.org
    Signed-off-by: Greg Kroah-Hartman

commit 8413faeeb2bf489b3436ff3ea96da8a3f74c289a
Author: Tomas Winkler
Date:   Wed Apr 14 07:52:00 2021 +0300

    mei: me: add Alder Lake P device id.

    commit 0df74278faedf20f9696bf2755cf0ce34afa4c3a upstream.

    Add Alder Lake P device ID.

    Cc:
    Signed-off-by: Tomas Winkler
    Link: https://lore.kernel.org/r/20210414045200.3498241-1-tomas.winkler@intel.com
    Signed-off-by: Greg Kroah-Hartman

commit 2e4f97122f3a9df870dfe9671994136448890768
Author: Johannes Berg
Date:   Tue Apr 27 11:49:52 2021 +0200

    cfg80211: fix locking in netlink owner interface destruction

    commit ea6b2098dd02789f68770fd3d5a373732207be2f upstream.

    Harald Arnesen reported [1] a deadlock at reboot time, and after he
    captured a stack trace a picture developed of what's going on:

    The distribution he's using is using iwd (not wpa_supplicant) to
    manage wireless. iwd will usually use the "socket owner" option when
    it creates new interfaces, so that they're automatically destroyed
    when it quits (unexpectedly or otherwise). This is also done by
    wpa_supplicant, but it doesn't do it for the normal one, only for
    additional ones, which is different with iwd.

    Anyway, during shutdown, iwd quits while the netdev is still UP, i.e.
    IFF_UP is set. This causes the stack trace that Linus so nicely
    transcribed from the pictures:

    cfg80211_destroy_iface_wk() takes wiphy_lock
     -> cfg80211_destroy_ifaces()
      ->ieee80211_del_iface
       ->ieee80211_if_remove
        ->cfg80211_unregister_wdev
         ->unregister_netdevice_queue
          ->dev_close_many
           ->__dev_close_many
            ->raw_notifier_call_chain
             ->cfg80211_netdev_notifier_call
    and that last call tries to take wiphy_lock again.

    In commit a05829a7222e ("cfg80211: avoid holding the RTNL when
    calling the driver") I had taken into account the possibility of
    recursing from cfg80211 into cfg80211_netdev_notifier_call() via the
    network stack, but only for NETDEV_UNREGISTER, not for what happens
    here, NETDEV_GOING_DOWN and NETDEV_DOWN notifications.

    Additionally, while this still worked back in commit 78f22b6a3a92
    ("cfg80211: allow userspace to take ownership of interfaces"), it
    missed another corner case: unregistering a netdev will cause
    dev_close() to be called, and thus stop wireless operations (e.g.
    disconnecting), but there are some types of virtual interfaces in
    wifi that don't have a netdev - for that we need an additional call
    to cfg80211_leave().

    So, to fix this mess, change cfg80211_destroy_ifaces() to not require
    the wiphy_lock(), but instead make it acquire it, but only after it
    has actually closed all the netdevs on the list, and then call
    cfg80211_leave() as well before removing them from the driver, to fix
    the second issue. The locking change in this requires modifying the
    nl80211 call to not get the wiphy lock passed in, but acquire it by
    itself after flushing any potentially pending destruction requests.

    [1] https://lore.kernel.org/r/09464e67-f3de-ac09-28a3-e27b7914ee7d@skogtun.org

    Cc: stable@vger.kernel.org # 5.12
    Reported-by: Harald Arnesen
    Fixes: 776a39b8196d ("cfg80211: call cfg80211_destroy_ifaces() with wiphy lock held")
    Fixes: 78f22b6a3a92 ("cfg80211: allow userspace to take ownership of interfaces")
    Signed-off-by: Johannes Berg
    Tested-by: Harald Arnesen
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

commit f935c64a0c87d86730efd6e1e168555460234d04
Author: Jiri Kosina
Date:   Sat Apr 17 11:13:39 2021 +0200

    iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_gen2_enqueue_hcmd()

    commit e7020bb068d8be50a92f48e36b236a1a1ef9282e upstream.

    Analogously to what we did in 2800aadc18a6 ("iwlwifi: Fix
    softirq/hardirq disabling in iwl_pcie_enqueue_hcmd()"), we must apply
    the same fix to iwl_pcie_gen2_enqueue_hcmd(), as it's being called
    from exactly the same contexts.

    Reported-by: Heiner Kallweit
    Signed-off-by: Kalle Valo
    Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2104171112390.18270@cbobk.fhfr.pm
    Signed-off-by: Greg Kroah-Hartman

commit ac4ebcbd87195c8781778bcd0f5e1eccce5f4936
Author: Oliver Neukum
Date:   Wed Apr 21 09:45:13 2021 +0200

    USB: CDC-ACM: fix poison/unpoison imbalance

    commit a8b3b519618f30a87a304c4e120267ce6f8dc68a upstream.

    suspend() does its poisoning conditionally; resume() does it
    unconditionally. On a device with combined interfaces this will
    balance; on a device with two interfaces the counter will go negative
    and resubmission will fail.

    Both actions need to be done conditionally.

    Fixes: 6069e3e927c8f ("USB: cdc-acm: untangle a circular dependency between callback and softint")
    Signed-off-by: Oliver Neukum
    Cc: stable
    Link: https://lore.kernel.org/r/20210421074513.4327-1-oneukum@suse.com
    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Greg Kroah-Hartman

commit 41c44e1f3112d7265dae522c026399b2a42d19ef
Author: Johan Hovold
Date:   Mon Apr 26 10:11:49 2021 +0200

    net: hso: fix NULL-deref on disconnect regression

    commit 2ad5692db72874f02b9ad551d26345437ea4f7f3 upstream.

    Commit 8a12f8836145 ("net: hso: fix null-ptr-deref during tty device
    unregistration") fixed the racy minor allocation reported by syzbot,
    but introduced an unconditional NULL-pointer dereference on every
    disconnect instead.

    Specifically, the serial device table must no longer be accessed
    after the minor has been released by hso_serial_tty_unregister().

    Fixes: 8a12f8836145 ("net: hso: fix null-ptr-deref during tty device unregistration")
    Cc: stable@vger.kernel.org
    Cc: Anirudh Rayabharam
    Reported-by: Leonardo Antoniazzi
    Signed-off-by: Johan Hovold
    Reviewed-by: Anirudh Rayabharam
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman