commit 86e774370d4dc5f3fc7b1040228b47316e6c1a3c Author: Alexandre Frade Date: Sun Feb 26 22:19:40 2023 +0000 Linux 6.2.1-xanmod1 Signed-off-by: Alexandre Frade commit e4597e620040da202444ad1e66ac2c52a7ad5076 Author: Marcel Holtmann Date: Fri Dec 16 21:12:47 2022 +0100 bluetooth: Fix issue with Actions Semi ATS2851 based devices Their devices claim to support the erroneous data reporting, but don't actually support the required commands. So blacklist them and add a quirk. < HCI Command: Read Default Erroneous Data Reporting (0x03|0x005a) plen 0 > HCI Event: Command Status (0x0f) plen 4 Read Default Erroneous Data Reporting (0x03|0x005a) ncmd 1 Status: Unknown HCI Command (0x01) T: Bus=02 Lev=02 Prnt=08 Port=02 Cnt=01 Dev#= 10 Spd=12 MxCh= 0 D: Ver= 2.00 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=10d7 ProdID=b012 Rev=88.91 S: Manufacturer=Actions S: Product=general adapter S: SerialNumber=ACTIONS1234 C:* #Ifs= 2 Cfg#= 1 Atr=c0 MxPwr=100mA I:* If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=81(I) Atr=03(Int.) MxPS= 64 Ivl=1ms E: Ad=02(O) Atr=02(Bulk) MxPS= 64 Ivl=0ms E: Ad=82(I) Atr=02(Bulk) MxPS= 64 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=01(O) Atr=01(Isoc) MxPS= 0 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 0 Ivl=1ms I: If#= 1 Alt= 1 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=01(O) Atr=01(Isoc) MxPS= 9 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 9 Ivl=1ms I: If#= 1 Alt= 2 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=01(O) Atr=01(Isoc) MxPS= 17 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 17 Ivl=1ms I: If#= 1 Alt= 3 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=01(O) Atr=01(Isoc) MxPS= 25 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 25 Ivl=1ms I: If#= 1 Alt= 4 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=01(O) Atr=01(Isoc) MxPS= 33 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 33 Ivl=1ms I: If#= 1 Alt= 5 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb E: Ad=01(O) Atr=01(Isoc) MxPS= 49 Ivl=1ms E: Ad=83(I) Atr=01(Isoc) MxPS= 49 Ivl=1ms Signed-off-by: Marcel Holtmann Signed-off-by: Alexandre Frade commit c83cba9ac4cfed25a5d9c281b4238de4f43f8e3e Author: Vishal Rao Date: Mon Dec 19 15:34:03 2022 +0530 ALSA: hda: cs35l41: Fix sound on ASUS Zenbook UM5302TA Signed-off-by: Alexandre Frade commit fa9bdc9473452ddbdc2f3ec459df7924b77b6d59 Author: Alexandre Frade Date: Mon Mar 21 21:37:19 2022 +0000 i2c: busses: Add SMBus capability to work with OpenRGB driver control Signed-off-by: Alexandre Frade commit 1ee75e8d33a016be43ba9b1917ea499bc66697c4 Author: Mark Weiman Date: Sun Aug 12 11:36:21 2018 -0400 pci: Enable overrides for missing ACS capabilities This an updated version of Alex Williamson's patch from: https://lkml.org/lkml/2013/5/30/513 Original commit message follows: PCIe ACS (Access Control Services) is the PCIe 2.0+ feature that allows us to control whether transactions are allowed to be redirected in various subnodes of a PCIe topology. For instance, if two endpoints are below a root port or downsteam switch port, the downstream port may optionally redirect transactions between the devices, bypassing upstream devices. The same can happen internally on multifunction devices. The transaction may never be visible to the upstream devices. One upstream device that we particularly care about is the IOMMU. If a redirection occurs in the topology below the IOMMU, then the IOMMU cannot provide isolation between devices. This is why the PCIe spec encourages topologies to include ACS support. Without it, we have to assume peer-to-peer DMA within a hierarchy can bypass IOMMU isolation. Unfortunately, far too many topologies do not support ACS to make this a steadfast requirement. Even the latest chipsets from Intel are only sporadically supporting ACS. We have trouble getting interconnect vendors to include the PCIe spec required PCIe capability, let alone suggested features. Therefore, we need to add some flexibility. The pcie_acs_override= boot option lets users opt-in specific devices or sets of devices to assume ACS support. The "downstream" option assumes full ACS support on root ports and downstream switch ports. The "multifunction" option assumes the subset of ACS features available on multifunction endpoints and upstream switch ports are supported. The "id:nnnn:nnnn" option enables ACS support on devices matching the provided vendor and device IDs, allowing more strategic ACS overrides. These options may be combined in any order. A maximum of 16 id specific overrides are available. It's suggested to use the most limited set of options necessary to avoid completely disabling ACS across the topology. Note to hardware vendors, we have facilities to permanently quirk specific devices which enforce isolation but not provide an ACS capability. Please contact me to have your devices added and save your customers the hassle of this boot option. Rebased-by: Alexandre Frade Signed-off-by: Mark Weiman Signed-off-by: Alexandre Frade commit efad1f319e244637ed7e0e34408f95a5044cb51d Author: Serge Hallyn Date: Fri May 31 19:12:12 2013 +0100 sysctl: add sysctl to disallow unprivileged CLONE_NEWUSER by default add sysctl to disallow unprivileged CLONE_NEWUSER by default This is a short-term patch. Unprivileged use of CLONE_NEWUSER is certainly an intended feature of user namespaces. However for at least saucy we want to make sure that, if any security issues are found, we have a fail-safe. Signed-off-by: Serge Hallyn [bwh: Remove unneeded binary sysctl bits] [bwh: Keep this sysctl, but change the default to enabled] Signed-off-by: Alexandre Frade commit 7bf420a7460e91f76ea7d0a07afe43ee7dea6a9e Author: Zebediah Figura Date: Wed Sep 28 03:48:52 2022 +0000 winesync: Introduce the winesync driver and character device patchset Signed-off-by: Alexandre Frade commit a52e447efeec44f42ee4f4ca14e590d39c237f65 Author: Christian Brauner Date: Wed Jan 23 21:54:23 2019 +0100 SAUCE: binder: give binder_alloc its own debug mask file Currently both binder.c and binder_alloc.c both register the /sys/module/binder_linux/paramters/debug_mask file which leads to conflicts in sysfs. This commit gives binder_alloc.c its own /sys/module/binder_linux/paramters/alloc_debug_mask file. Signed-off-by: Christian Brauner Signed-off-by: Seth Forshee Signed-off-by: Alexandre Frade commit 21c4f4ace0906770fee48631865bfcbc884ed938 Author: Christian Brauner Date: Wed Jan 16 23:13:25 2019 +0100 SAUCE: binder: turn into module The Android binder driver needs to become a module for the sake of shipping Anbox. To do this we need to export the following functions since binder is currently still using them: - security_binder_set_context_mgr() - security_binder_transaction() - security_binder_transfer_binder() - security_binder_transfer_file() - can_nice() - __wake_up_pollfree() - __close_fd_get_file() - task_work_add() - map_kernel_range_noflush() - get_vm_area() - zap_page_range() - put_ipc_ns() - get_ipc_ns_exported() - show_init_ipc_ns() Rebased-by: Alexandre Frade Signed-off-by: Christian Brauner [ saf: fix additional reference to init_ipc_ns from 5.0-rc6 ] Signed-off-by: Seth Forshee Signed-off-by: Alexandre Frade commit 6b2b337e9841353ad8b990f2309dc8d0673d58e4 Author: Arjan van de Ven Date: Wed May 17 01:52:11 2017 +0000 init: wait for partition and retry scan As Clear Linux boots fast the device is not ready when the mounting code is reached, so a retry device scan will be performed every 0.5 sec for at least 40 sec and synchronize the async task. Signed-off-by: Miguel Bernal Marin Signed-off-by: Alexandre Frade commit bedd1447e580482f756be3850a562e3f80ec4be0 Author: Arjan van de Ven Date: Thu Jun 2 23:36:32 2016 -0500 drivers: initialize ata before graphics ATA init is the long pole in the boot process, and its asynchronous. move the graphics init after it so that ata and graphics initialize in parallel Signed-off-by: Alexandre Frade commit 8cafbad4542f6b2abf047c3de855bdf503af4b67 Author: Arjan van de Ven Date: Sun Feb 18 23:35:41 2018 +0000 locking: rwsem: spin faster tweak rwsem owner spinning a bit Signed-off-by: Alexandre Frade commit 12f23c63afb5876f2a93e01058fda64778077b52 Author: William Douglas Date: Wed Jun 20 17:23:21 2018 +0000 firmware: Enable stateless firmware loading Prefer the order of specific version before generic and /etc before /lib to enable the user to give specific overrides for generic firmware and distribution firmware. Signed-off-by: Alexandre Frade commit 7edc36fd5d673b4cc026ab676bcd3da829feb12a Author: Arjan van de Ven Date: Sun Sep 22 11:12:35 2019 -0300 intel_rapl: Silence rapl trace debug Signed-off-by: Alexandre Frade commit 0d425537ff25c5dbc6b6cdfe76760b6b128d425f Author: Ricardo Neri Date: Thu Aug 25 15:55:28 2022 -0700 sched/fair: Let lower-priority CPUs do active balancing When more than one SMT siblings of a physical core are busy, an idle CPU of lower priority can help. Indicate that the low priority CPU can do active balancing from the high- priority CPU only if they belong to separate cores. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Len Brown Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Len Brown Signed-off-by: Ricardo Neri Signed-off-by: Alexandre Frade commit 3ea0f17de78c5b250668ba2e037da66ff31c449b Author: Ricardo Neri Date: Thu Aug 25 15:55:26 2022 -0700 sched/fair: Simplify asym_packing logic for SMT sched groups When the destination CPU is an SMT sibling and idle, it can only help the busiest group if all of its other SMT siblings are also idle. Otherwise, there is not increase in throughput. It does not matter whether the busiest group has SMT siblings. Simply check if there are any tasks running on the local group before proceeding. Cc: Ben Segall Cc: Daniel Bristot de Oliveira Cc: Dietmar Eggemann Cc: Len Brown Cc: Mel Gorman Cc: Rafael J. Wysocki Cc: Srinivas Pandruvada Cc: Steven Rostedt Cc: Tim C. Chen Cc: Valentin Schneider Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Len Brown Signed-off-by: Ricardo Neri Signed-off-by: Alexandre Frade commit 056aaa60dff6130d128ee68d67a03b5bacd3df65 Author: Arjan van de Ven Date: Thu Dec 13 01:00:49 2018 +0000 sched/wait: Do accept() in LIFO order for cache efficiency Signed-off-by: Alexandre Frade commit 736cf857b8d1609d861fceb12b067d4f7e824116 Author: Arjan van de Ven Date: Sat Dec 8 18:21:32 2018 +0000 x86/vdso: Use lfence instead of rep and nop Signed-off-by: Alexandre Frade commit 9f948bfa2473f51ea1652ffa2bc18acc43f3e00d Author: Felix Fietkau Date: Sat Dec 5 15:07:03 2015 +0100 mac80211: ignore AP power level when tx power type is "fixed" In some cases a user might want to connect to a far away access point, which announces a low tx power limit. Using the AP's power limit can make the connection significantly more unstable or even impossible, and mac80211 currently provides no way to disable this behavior. To fix this, use the currently unused distinction between limited and fixed tx power to decide whether a remote AP's power limit should be accepted. Signed-off-by: Felix Fietkau Signed-off-by: Alexandre Frade commit 7b0fc0919f94bff98a304da465b560958a39cef7 Author: mfreemon@cloudflare.com Date: Tue Mar 1 17:06:02 2022 -0600 tcp: Add a sysctl to skip tcp collapse processing when the receive buffer is full For context and additional information about this patch, see the blog post at https://blog.cloudflare.com/optimizing-tcp-for-high-throughput-and-low-latency/ sysctl: net.ipv4.tcp_collapse_max_bytes If tcp_collapse_max_bytes is non-zero, attempt to collapse the queue to free up memory if the current amount of memory allocated is less than tcp_collapse_max_bytes. Otherwise, the packet is dropped without attempting to collapse the queue. If tcp_collapse_max_bytes is zero, this feature is disabled and the default Linux behavior is used. The default Linux behavior is to always perform the attempt to collapse the queue to free up memory. When the receive queue is small, we want to collapse the queue. There are two reasons for this: (a) the latency of performing the collapse will be small on a small queue, and (b) we want to avoid sending a congestion signal (via a packet drop) to the sender when the receive queue is small. The result is that we avoid latency spikes caused by the time it takes to perform the collapse logic when the receive queue is large and full, while preserving existing behavior and performance for all other cases. Signed-off-by: Alexandre Frade commit 12cb15ca7c2f3dca2ec0ed4495284d0059065044 Author: Alexandre Frade Date: Mon Feb 27 01:38:18 2023 +0000 netfilter: Add netfilter nf_tables fullcone support Signed-off-by: Syrone Wong Signed-off-by: Alexandre Frade commit 21b29cc27da29f7f8221531982e6b3eb3e7c88c6 Author: Konstantin Demin Date: Tue May 17 10:10:40 2022 +0300 net-tcp_bbr: v2: Use correct 64-bit division Signed-off-by: Konstantin Demin Signed-off-by: Alexandre Frade commit 8c3cf4346625a12c18c4bb477b6159bee9b8b351 Author: Adithya Abraham Philip Date: Fri Jun 11 21:56:10 2021 +0000 net-tcp_bbr: v2: Fix missing ECT markings on retransmits for BBRv2 Adds a new flag TCP_ECN_ECT_PERMANENT that is used by CCAs to indicate that retransmitted packets and pure ACKs must have the ECT bit set. This is a necessary fix for BBRv2, which when using ECN expects ECT to be set even on retransmitted packets and ACKs. Currently CCAs like BBRv2 which can use ECN but don't "need" it do not have a way to indicate that ECT should be set on retransmissions/ACKs. Signed-off-by: Adithya Abraham Philip Signed-off-by: Neal Cardwell Signed-off-by: Alexandre Frade commit a5311508d2a0553515a22920895e594476e05d77 Author: Neal Cardwell Date: Mon Dec 28 19:23:09 2020 -0500 net-tcp_bbr: v2: don't assume prior_cwnd was set entering CA_Loss Fix WARN_ON_ONCE() warnings that were firing and pointing to a bbr->prior_cwnd of 0 when exiting CA_Loss and transitioning to CA_Open. The issue was that tcp_simple_retransmit() calls: tcp_set_ca_state(sk, TCP_CA_Loss); without first calling icsk_ca_ops->ssthresh(sk) (because tcp_simple_retransmit() is dealing with losses due to MTU issues and not congestion). The lack of this callback means that BBR did not get a chance to set bbr->prior_cwnd, and thus upon exiting CA_Loss in such cases the WARN_ON_ONCE() would fire due to a zero bbr->prior_cwnd. This commit removes that warning, since a bbr->prior_cwnd of 0 is a valid situation in this state transition. For setting inflight_lo upon entering CA_Loss, to avoid setting an inflight_lo of 0 in this case, this commit switches to taking the max of cwnd and prior_cwnd. We plan to remove that line of code when we switch to cautious (PRR-style) recovery, so that awkwardness will go away. Change-Id: I575dce871c2f20e91e3e9449e1706f42a07b8118 Signed-off-by: Alexandre Frade commit c6138c24469d0182f4b2e9297d1884afcbcbfd4d Author: Neal Cardwell Date: Mon Aug 17 19:10:21 2020 -0400 net-tcp_bbr: v2: remove cycle_rand parameter that is unused in BBRv2 Change-Id: Iee1df7e41e42de199068d7c89131ed3d228327c0 Signed-off-by: Alexandre Frade commit 7596214c3bfde4a8a97f87be957d33c7c184ebb2 Author: Neal Cardwell Date: Mon Aug 17 19:08:41 2020 -0400 net-tcp_bbr: v2: remove field bw_rtts that is unused in BBRv2 Change-Id: I58e3346c707748a6f316f3ed060d2da84c32a79b Signed-off-by: Alexandre Frade commit 1d6ea05952ab5d866e3d431a78a1c97ac309882d Author: Neal Cardwell Date: Thu Nov 21 15:28:01 2019 -0500 net-tcp_bbr: v2: remove unnecessary rs.delivered_ce logic upon loss There is no reason to compute rs.delivered_ce upon loss. In fact, we specifically do not want to compute rs.delivered_ce upon loss. Two issues: (1) This would be the wrong thing to do, in behavior terms. With RACK's dynamic reordering window, losses can be marked long after the sequence hole appears in the ACK/SACK stream. We want to to catch the ECN mark rate rising too high as quickly as possible, which means we want to check for high ECN mark rates at ACK time (as BBRv2 currently does) and not loss marking time. (2) This is dead code. The ECN mark rate cannot be detected as too high because the check needs rs->delivered to be > 0 as well: if (rs->delivered_ce > 0 && rs->delivered > 0 && Since we are not setting rs->delivered upon loss, this check cannot succeed, so setting delivered_ce is pointless. This dead and wrong line was discovered by Randall Stewart at Netflix as he was reading the BBRv2 code. Change-Id: I37f83f418a259ec31d8f82de986db071b364b76a Signed-off-by: Alexandre Frade commit 83a1f03b4caf89531341bd491d460733631d9fff Author: Neal Cardwell Date: Tue Jun 11 12:54:22 2019 -0400 net-tcp_bbr: v2: BBRv2 ("bbr2") congestion control for Linux TCP BBR v2 is an enhacement to the BBR v1 algorithm. It's designed to aim for lower queues, lower loss, and better Reno/CUBIC coexistence than BBR v1. BBR v2 maintains the core of BBR v1: an explicit model of the network path that is two-dimensional, adapting to estimate the (a) maximum available bandwidth and (b) maximum safe volume of data a flow can keep in-flight in the network. It maintains the estimated BDP as a core guide for estimating an appropriate level of in-flight data. BBR v2 makes several key enhancements: o Its bandwidth-probing time scale is adapted, within bounds, to allow improved coexistence with Reno and CUBIC. The bandwidth-probing time scale is (a) extended dynamically based on estimated BDP to improve coexistence with Reno/CUBIC; (b) bounded by an interactive wall-clock time-scale to be more scalable and responsive than Reno and CUBIC. o Rather than being largely agnostic to loss and ECN marks, it explicitly uses loss and (DCTCP-style) ECN signals to maintain its model. o It aims for lower losses than v1 by adjusting its model to attempt to stay within loss rate and ECN mark rate bounds (loss_thresh and ecn_thresh, respectively). o It adapts to loss/ECN signals even when the application is running out of data ("application-limited"), in case the "application-limited" flow is also "network-limited" (the bw and/or inflight available to this flow is lower than previously estimated when the flow ran out of data). o It has a three-part model: the model explicit three tracks operating points, where an operating point is a tuple: (bandwidth, inflight). The three operating points are: o latest: the latest measurement from the current round trip o upper bound: robust, optimistic, long-term upper bound o lower bound: robust, conservative, short-term lower bound These are stored in the following state variables: o latest: bw_latest, inflight_latest o lo: bw_lo, inflight_lo o hi: bw_hi[2], inflight_hi To gain intuition about the meaning of the three operating points, it may help to consider the analogs in CUBIC, which has a somewhat analogous three-part model used by its probing state machine: BBR param CUBIC param ----------- ------------- latest ~ cwnd lo ~ ssthresh hi ~ last_max_cwnd The analogy is only a loose one, though, since the BBR operating points are calculated differently, and are 2-dimensional (bw,inflight) rather than CUBIC's one-dimensional notion of operating point (inflight). o It uses the three-part model to adapt the magnitude of its bandwidth to match the estimated space available in the buffer, rather than (as in BBR v1) assuming that it was always acceptable to place 0.25*BDP in the bottleneck buffer when probing (commodity datacenter switches commonly do not have that much buffer for WAN flows). When BBR v2 estimates it hit a buffer limit during probing, its bandwidth probing then starts gently in case little space is still available in the buffer, and the accelerates, slowly at first and then rapidly if it can grow inflight without seeing congestion signals. In such cases, probing is bounded by inflight_hi + inflight_probe, where inflight_probe grows as: [0, 1, 2, 4, 8, 16,...]. This allows BBR to keep losses low and bounded if a bottleneck remains congested, while rapidly/scalably utilizing free bandwidth when it becomes available. o It has a slightly revised state machine, to achieve the goals above. BBR_BW_PROBE_UP: pushes up inflight to probe for bw/vol BBR_BW_PROBE_DOWN: drain excess inflight from the queue BBR_BW_PROBE_CRUISE: use pipe, w/ headroom in queue/pipe BBR_BW_PROBE_REFILL: try refill the pipe again to 100%, leaving queue empty o The estimated BDP: BBR v2 continues to maintain an estimate of the path's two-way propagation delay, by tracking a windowed min_rtt, and coordinating (on an as-ndeeded basis) to try to expose the two-way propagation delay by draining the bottleneck queue. BBR v2 continues to use its min_rtt and (currently-applicable) bandwidth estimate to estimate the current bandwidth-delay product. The estimated BDP still provides one important guideline for bounding inflight data. However, because any min-filtered RTT and max-filtered bw inherently tend to both overestimate, the estimated BDP is often too high; in this case loss or ECN marks can ensue, in which case BBR v2 adjusts inflight_hi and inflight_lo to adapt its sending rate and inflight down to match the available capacity of the path. o Space: Note that ICSK_CA_PRIV_SIZE increased. This is because BBR v2 requires more space. Note that much of the space is due to support for per-socket parameterization and debugging in this release for research and debugging. With that state removed, the full "struct bbr" is 140 bytes, or 144 with padding. This is an increase of 40 bytes over the existing ca_priv space. o Code: BBR v2 reuses many pieces from BBR v1. But it omits the following significant pieces: o "packet conservation" (bbr_set_cwnd_to_recover_or_restore(), bbr_can_grow_inflight()) o long-term bandwidth estimator ("policer mode") The code layout tries to keep BBR v2 code near the bottom of the file, so that v1-applicable code in the top does not accidentally refer to v2 code. o Docs: See the following docs for more details and diagrams decsribing the BBR v2 algorithm: https://datatracker.ietf.org/meeting/104/materials/slides-104-iccrg-an-update-on-bbr-00 https://datatracker.ietf.org/meeting/102/materials/slides-102-iccrg-an-update-on-bbr-work-at-google-00 o Internal notes: For this upstream rebase, Neal started from: git show fed518041ac6:net/ipv4/tcp_bbr.c > net/ipv4/tcp_bbr.c then removed dev instrumentation (dynamic get/set for parameters) and code that was only used by BBRv1 Effort: net-tcp_bbr Origin-9xx-SHA1: 2c84098e60bed6d67dde23cd7538c51dee273102 Change-Id: I125cf26ba2a7a686f2fa5e87f4c2afceb65f7a05 Signed-off-by: Alexandre Frade commit 9bfe63adb167a423e925c05778ecfa0a80f81e2e Author: Neal Cardwell Date: Sat Nov 16 13:16:25 2019 -0500 net-tcp: add fast_ack_mode=1: skip rwin check in tcp_fast_ack_mode__tcp_ack_snd_check() Add logic for an experimental TCP connection behavior, enabled with tp->fast_ack_mode = 1, which disables checking the receive window before sending an ack in __tcp_ack_snd_check(). If this behavior is enabled, the data receiver sends an ACK if the amount of data is > RCV.MSS. Change-Id: Iaa0a0fd7108221f883137a79d5bfa724f1b096d4 Signed-off-by: Alexandre Frade commit 9cfa6a9a9de7da9eafcea0dfdfa346b40dca8b75 Author: Neal Cardwell Date: Fri Sep 27 17:10:26 2019 -0400 net-tcp: re-generalize TSO sizing in TCP CC module API Reorganize the API for CC modules so that the CC module once again gets complete control of the TSO sizing decision. This is how the API was set up around 2016 and the initial BBRv1 upstreaming. Later Eric Dumazet simplified it. But with wider testing it now seems that to avoid CPU regressions BBR needs to have a different TSO sizing function. This is necessary to handle cases where there are many flows bottlenecked on the sender host's NIC, in which case BBR's pacing rate is much lower than CUBIC/Reno/DCTCP's. Why does this happen? Because BBR's pacing rate adapts to the low bandwidth share each flow sees. By contrast, CUBIC/Reno/DCTCP see no loss or ECN, so they grow a very large cwnd, and thus large pacing rate and large TSO burst size. Change-Id: Ic8ccfdbe4010ee8d4bf6a6334c48a2fceb2171ea Signed-off-by: Alexandre Frade commit b0ed7e48f63072df6a2760a81e01d14eb23d24ae Author: Yousuk Seung Date: Wed May 23 17:55:54 2018 -0700 net-tcp: add new ca opts flag TCP_CONG_WANTS_CE_EVENTS Add a a new ca opts flag TCP_CONG_WANTS_CE_EVENTS that allows a congestion control module to receive CE events. Currently congestion control modules have to set the TCP_CONG_NEEDS_ECN bit in opts flag to receive CE events but this may incur changes in ECN behavior elsewhere. This patch adds a new bit TCP_CONG_WANTS_CE_EVENTS that allows congestion control modules to receive CE events independently of TCP_CONG_NEEDS_ECN. Effort: net-tcp Origin-9xx-SHA1: 9f7e14716cde760bc6c67ef8ef7e1ee48501d95b Change-Id: I2255506985242f376d910c6fd37daabaf4744f24 Signed-off-by: Alexandre Frade commit 0142591f08729f1d3ad9aba7410c4bb6761c8844 Author: Neal Cardwell Date: Tue May 7 22:37:19 2019 -0400 net-tcp_bbr: v2: set tx.in_flight for skbs in repair write queue Syzkaller was able to use TCP_REPAIR to reproduce the new warning added in tcp_fragment(): WARNING: CPU: 0 PID: 118174 at net/ipv4/tcp_output.c:1487 tcp_fragment+0xdcc/0x10a0 net/ipv4/tcp_output.c:1487() inconsistent: tx.in_flight: 0 old_factor: 53 The warning happens because skbs inserted into the tcp_rtx_queue during the repair process go through a sort of "fake send" process, and that process was seting pcount but not tx.in_flight, and thus the warnings (where old_factor is the old pcount). The fix of setting tx.in_flight in the TCP_REPAIR code path seems simple enough, and indeed makes the repro code from syzkaller stop producing warnings. Running through kokonut tests, and will send out for review when all tests pass. Effort: net-tcp_bbr Origin-9xx-SHA1: 330f825a08a6fe92cef74d799cc468864c479f63 Change-Id: I0bc4a790f040fd4239620e1eedd5dc64666c6f05 Signed-off-by: Alexandre Frade commit 5d178d368a172227a5baf1fe7af321640466d8e6 Author: Neal Cardwell Date: Wed May 1 20:16:25 2019 -0400 net-tcp_bbr: v2: adjust skb tx.in_flight upon split in tcp_fragment() When we fragment an skb that has already been sent, we need to update the tx.in_flight for the first skb in the resulting pair ("buff"). Because we were not updating the tx.in_flight, the tx.in_flight value was inconsistent with the pcount of the "buff" skb (tx.in_flight would be too high). That meant that if the "buff" skb was lost, then bbr2_inflight_hi_from_lost_skb() would calculate an inflight_hi value that is too high. This could result in longer queues and higher packet loss. Packetdrill testing verified that without this commit, when the second half of an skb is SACKed and then later the first half of that skb is marked lost, the calculated inflight_hi was incorrect. Effort: net-tcp_bbr Origin-9xx-SHA1: 385f1ddc610798fab2837f9f372857438b25f874 Change-Id: I617f8cab4e9be7a0b8e8d30b047bf8645393354d Signed-off-by: Alexandre Frade commit 416f40a6897827b50807842a6c6405849fc5a71d Author: Neal Cardwell Date: Wed May 1 20:16:33 2019 -0400 net-tcp_bbr: v2: adjust skb tx.in_flight upon merge in tcp_shifted_skb() When tcp_shifted_skb() updates state as adjacent SACKed skbs are coalesced, previously the tx.in_flight was not adjusted, so we could get contradictory state where the skb's recorded pcount was bigger than the tx.in_flight (the number of segments that were in_flight after sending the skb). Normally have a SACKed skb with contradictory pcount/tx.in_flight would not matter. However, with SACK reneging, the SACKed bit is removed, and an skb once again becomes eligible for retransmitting, fragmenting, SACKing, etc. Packetdrill testing verified the following sequence is possible in a kernel that does not have this commit: - skb N is SACKed - skb N+1 is SACKed and combined with skb N using tcp_shifted_skb() - tcp_shifted_skb() will increase the pcount of prev, but leave tx.in_flight as-is - so prev skb can have pcount > tx.in_flight - RTO, tcp_timeout_mark_lost(), detect reneg, remove "SACKed" bit, mark skb N as lost - find pcount of skb N is greater than its tx.in_flight I suspect this issue iw what caused the bbr2_inflight_hi_from_lost_skb(): WARN_ON_ONCE(inflight_prev < 0) to fire in production machines using bbr2. Tested: See last commit in series for sponge link. Effort: net-tcp_bbr Origin-9xx-SHA1: 1a3e997e613d2dcf32b947992882854ebe873715 Change-Id: I1b0b75c27519953430c7db51c6f358f104c7af55 Signed-off-by: Alexandre Frade commit adadd11d247f9811db196c32d73eee2b25f3bcaa Author: Neal Cardwell Date: Tue May 7 22:36:36 2019 -0400 net-tcp_bbr: v2: factor out tx.in_flight setting into tcp_set_tx_in_flight() Factor out the code to set an skb's tx.in_flight field into its own function, so that this code can be used for the TCP_REPAIR "fake send" code path that inserts skbs into the rtx queue without sending them. This is in preparation for the following patch, which fixes an issue with TCP_REPAIR and tx.in_flight. Tested: See last patch in series for sponge link. Effort: net-tcp_bbr Origin-9xx-SHA1: e880fc907d06ea7354333f60f712748ebce9497b Change-Id: I4fbd4a6e18a51ab06d50ab1c9ad820ce5bea89af Signed-off-by: Alexandre Frade commit f7a3b26daf5cdf09f8d656e54e6526f65190d412 Author: Neal Cardwell Date: Tue Aug 7 21:52:06 2018 -0400 net-tcp_bbr: v2: introduce ca_ops->skb_marked_lost() CC module callback API For connections experiencing reordering, RACK can mark packets lost long after we receive the SACKs/ACKs hinting that the packets were actually lost. This means that CC modules cannot easily learn the volume of inflight data at which packet loss happens by looking at the current inflight or even the packets in flight when the most recently SACKed packet was sent. To learn this, CC modules need to know how many packets were in flight at the time lost packets were sent. This new callback, combined with TCP_SKB_CB(skb)->tx.in_flight, allows them to learn this. This also provides a consistent callback that is invoked whether packets are marked lost upon ACK processing, using the RACK reordering timer, or at RTO time. Effort: net-tcp_bbr Origin-9xx-SHA1: afcbebe3374e4632ac6714d39e4dc8a8455956f4 Change-Id: I54826ab53df636be537e5d3c618a46145d12d51a Signed-off-by: Alexandre Frade commit def2c1054a8995b7ed4fb964403efc2f2418fdaf Author: Neal Cardwell Date: Mon Nov 19 13:48:36 2018 -0500 net-tcp_bbr: v2: export FLAG_ECE in rate_sample.is_ece For understanding the relationship between inflight and ECN signals, to try to find the highest inflight value that has acceptable levels ECN marking. Effort: net-tcp_bbr Origin-9xx-SHA1: 3eba998f2898541406c2666781182200934965a8 Change-Id: I3a964e04cee83e11649a54507043d2dfe769a3b3 Signed-off-by: Alexandre Frade commit 873c221e187cb52fe5d9aa277c2d424851c430cc Author: Neal Cardwell Date: Thu Oct 12 23:44:27 2017 -0400 net-tcp_bbr: v2: count packets lost over TCP rate sampling interval For understanding the relationship between inflight and packet loss signals, to try to find the highest inflight value that has acceptable levels of packet losses. Effort: net-tcp_bbr Origin-9xx-SHA1: 4527e26b2bd7756a88b5b9ef1ada3da33dd609ab Change-Id: I594c2500868d9c530770e7ddd68ffc87c57f4fd5 Signed-off-by: Alexandre Frade commit 548f2b1bc04dc84f0dcd5489da184d0e3f3dbcc9 Author: Neal Cardwell Date: Sat Aug 5 11:49:50 2017 -0400 net-tcp_bbr: v2: snapshot packets in flight at transmit time and pass in rate_sample For understanding the relationship between inflight and losses or ECN signals, to try to find the highest inflight value that has acceptable levels of loss/ECN marking. Effort: net-tcp_bbr Origin-9xx-SHA1: b3eb4f2d20efab4ca001f32c9294739036c493ea Change-Id: I7314047d0ff14dd261a04b1969a46dc658c8836a Signed-off-by: Alexandre Frade commit 56af2a6458570f608dd6260dd898bdfea94d1bb9 Author: Neal Cardwell Date: Sun Jun 24 21:55:59 2018 -0400 net-tcp_bbr: v2: shrink delivered_mstamp, first_tx_mstamp to u32 to free up 8 bytes Free up some space for tracking inflight and losses for each bw sample, in upcoming commits. These timestamps are in microseconds, and are now stored in 32 bits. So they can only hold time intervals up to roughly 2^12 = 4096 seconds. But Linux TCP RTT and RTO tracking has the same 32-bit microsecond implementation approach and resulting deployment limitations. So this is not introducing a new limit. And these should not be a limitation for the foreseeable future. Effort: net-tcp_bbr Origin-9xx-SHA1: 238a7e6b5d51625fef1ce7769826a7b21b02ae55 Change-Id: I3b779603797263b52a61ad57c565eb91fe42680c Signed-off-by: Alexandre Frade commit 1362a985bec89292eb8acc615b001ef34b947daa Author: Neal Cardwell Date: Tue Jun 11 12:26:55 2019 -0400 net-tcp_bbr: broaden app-limited rate sample detection This commit is a bug fix for the Linux TCP app-limited (application-limited) logic that is used for collecting rate (bandwidth) samples. Previously the app-limited logic only looked for "bubbles" of silence in between application writes, by checking at the start of each sendmsg. But "bubbles" of silence can also happen before retransmits: e.g. bubbles can happen between an application write and a retransmit, or between two retransmits. Retransmits are triggered by ACKs or timers. So this commit checks for bubbles of app-limited silence upon ACKs or timers. Why does this commit check for app-limited state at the start of ACKs and timer handling? Because at that point we know whether inflight was fully using the cwnd. During processing the ACK or timer event we often change the cwnd; after changing the cwnd we can't know whether inflight was fully using the old cwnd. Origin-9xx-SHA1: 3fe9b53291e018407780fb8c356adb5666722cbc Change-Id: I37221506f5166877c2b110753d39bb0757985e68 Signed-off-by: Alexandre Frade commit 90d14c41d822e9d1c50191cc32e00966fbf1e654 Author: André Almeida Date: Mon Oct 25 09:49:42 2021 -0300 futex: Add entry point for FUTEX_WAIT_MULTIPLE (opcode 31) Add an option to wait on multiple futexes using the old interface, that uses opcode 31 through futex() syscall. Do that by just translation the old interface to use the new code. This allows old and stable versions of Proton to still use fsync in new kernel releases. Signed-off-by: André Almeida Signed-off-by: Alexandre Frade commit 735d4e3e966512c2af43a283473096c104c80c85 Author: Alexandre Frade Date: Sat Feb 25 19:58:52 2023 +0000 XANMOD: Makefile: Move ARM and x86 instruction set selection to kernel-wide build Signed-off-by: Alexandre Frade commit f1676869cc999d93055ecd743280ba7b74bb447f Author: graysky Date: Thu Jan 5 14:29:37 2023 -0500 x86/kconfig: more uarches for kernel 5.17+ FEATURES This patch adds additional CPU options to the Linux kernel accessible under: Processor type and features ---> Processor family ---> With the release of gcc 11.1 and clang 12.0, several generic 64-bit levels are offered which are good for supported Intel or AMD CPUs: • x86-64-v2 • x86-64-v3 • x86-64-v4 Users of glibc 2.33 and above can see which level is supported by current hardware by running: /lib/ld-linux-x86-64.so.2 --help | grep supported Alternatively, compare the flags from /proc/cpuinfo to this list.[1] CPU-specific microarchitectures include: • AMD Improved K8-family • AMD K10-family • AMD Family 10h (Barcelona) • AMD Family 14h (Bobcat) • AMD Family 16h (Jaguar) • AMD Family 15h (Bulldozer) • AMD Family 15h (Piledriver) • AMD Family 15h (Steamroller) • AMD Family 15h (Excavator) • AMD Family 17h (Zen) • AMD Family 17h (Zen 2) • AMD Family 19h (Zen 3)† • AMD Family 19h (Zen 4)§ • Intel Silvermont low-power processors • Intel Goldmont low-power processors (Apollo Lake and Denverton) • Intel Goldmont Plus low-power processors (Gemini Lake) • Intel 1st Gen Core i3/i5/i7 (Nehalem) • Intel 1.5 Gen Core i3/i5/i7 (Westmere) • Intel 2nd Gen Core i3/i5/i7 (Sandybridge) • Intel 3rd Gen Core i3/i5/i7 (Ivybridge) • Intel 4th Gen Core i3/i5/i7 (Haswell) • Intel 5th Gen Core i3/i5/i7 (Broadwell) • Intel 6th Gen Core i3/i5/i7 (Skylake) • Intel 6th Gen Core i7/i9 (Skylake X) • Intel 8th Gen Core i3/i5/i7 (Cannon Lake) • Intel 10th Gen Core i7/i9 (Ice Lake) • Intel Xeon (Cascade Lake) • Intel Xeon (Cooper Lake)* • Intel 3rd Gen 10nm++ i3/i5/i7/i9-family (Tiger Lake)* • Intel 4th Gen 10nm++ Xeon (Sapphire Rapids)‡ • Intel 11th Gen i3/i5/i7/i9-family (Rocket Lake)‡ • Intel 12th Gen i3/i5/i7/i9-family (Alder Lake)‡ • Intel 13th Gen i3/i5/i7/i9-family (Raptor Lake)§ • Intel 14th Gen i3/i5/i7/i9-family (Meteor Lake)§ • Intel 5th Gen 10nm++ Xeon (Emerald Rapids)§ Notes: If not otherwise noted, gcc >=9.1 is required for support. *Requires gcc >=10.1 or clang >=10.0 †Required gcc >=10.3 or clang >=12.0 ‡Required gcc >=11.1 or clang >=12.0 §Required gcc >=13.0 or clang >=15.0.5 It also offers to compile passing the 'native' option which, "selects the CPU to generate code for at compilation time by determining the processor type of the compiling machine. Using -march=native enables all instruction subsets supported by the local machine and will produce code optimized for the local machine under the constraints of the selected instruction set."[2] Users of Intel CPUs should select the 'Intel-Native' option and users of AMD CPUs should select the 'AMD-Native' option. MINOR NOTES RELATING TO INTEL ATOM PROCESSORS This patch also changes -march=atom to -march=bonnell in accordance with the gcc v4.9 changes. Upstream is using the deprecated -match=atom flags when I believe it should use the newer -march=bonnell flag for atom processors.[3] It is not recommended to compile on Atom-CPUs with the 'native' option.[4] The recommendation is to use the 'atom' option instead. BENEFITS Small but real speed increases are measurable using a make endpoint comparing a generic kernel to one built with one of the respective microarchs. See the following experimental evidence supporting this statement: https://github.com/graysky2/kernel_gcc_patch REQUIREMENTS linux version 5.17+ gcc version >=9.0 or clang version >=9.0 ACKNOWLEDGMENTS This patch builds on the seminal work by Jeroen.[5] REFERENCES 1. https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9 2. https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-x86-Options 3. https://bugzilla.kernel.org/show_bug.cgi?id=77461 4. https://github.com/graysky2/kernel_gcc_patch/issues/15 5. http://www.linuxforge.net/docs/linux/linux-gcc.php Signed-off-by: Alexandre Frade commit fdc97418f2bf4d03fb661670d2db8ddebe442aed Author: Alexandre Frade Date: Mon Oct 3 21:09:27 2022 +0000 XANMOD: scripts/setlocalversion: Move localversion* files to the end Signed-off-by: Alexandre Frade commit 9c2f0c8694fa13c1ad47a14ef6de5ab4b9a89a39 Author: Alexandre Frade Date: Sun May 29 00:57:40 2022 +0000 XANMOD: scripts/setlocalversion: remove "+" tag for git repo short version Signed-off-by: Alexandre Frade commit bc2ed0915d0073411c7bd59aeca86bcf260788d1 Author: Alexandre Frade Date: Mon Aug 29 16:47:26 2022 +0000 XANMOD: Makefile: Disable GCC vectorization on trees Signed-off-by: Alexandre Frade commit b9ce9d9f8aa8f5ae23b009c17b119f1fdf67ed64 Author: Alexandre Frade Date: Thu Jun 25 16:40:43 2020 -0300 XANMOD: lib/kconfig.debug: disable default CONFIG_SYMBOLIC_ERRNAME and CONFIG_DEBUG_BUGVERBOSE Signed-off-by: Alexandre Frade Signed-off-by: Alexandre Frade commit 3595067f889533d31d28712577edc599d5eea85d Author: Alexandre Frade Date: Tue Mar 31 13:32:08 2020 -0300 XANMOD: cpufreq: tunes ondemand and conservative governor for performance Signed-off-by: Alexandre Frade Signed-off-by: Alexandre Frade commit cf3d258b7a05cd8edd46a749273fa68d58ca4e80 Author: Alexandre Frade Date: Wed Jun 15 17:07:29 2022 +0000 XANMOD: sched/autogroup: Add kernel parameter and config option to enable/disable autogroup feature by default Signed-off-by: Alexandre Frade commit 28e0fce796e5d250befc4ee4a8add1d579d457c4 Author: Alexandre Frade Date: Mon Jan 29 17:31:25 2018 +0000 XANMOD: mm/vmscan: vm_swappiness = 30 decreases the amount of swapping Signed-off-by: Alexandre Frade Signed-off-by: Alexandre Frade commit 7d5305ff1b5a759465d1e459610216742b4725d1 Author: Alexandre Frade Date: Mon Jan 29 16:59:22 2018 +0000 XANMOD: dcache: cache_pressure = 50 decreases the rate at which VFS caches are reclaimed Signed-off-by: Alexandre Frade Signed-off-by: Alexandre Frade commit f04d9d31e4b8fe3d0a695ff0eaeae12f39bd0607 Author: Alexandre Frade Date: Mon Jan 29 17:26:15 2018 +0000 XANMOD: kconfig: add 500Hz timer interrupt kernel config option Signed-off-by: Alexandre Frade Signed-off-by: Alexandre Frade commit 28bc875829de08ff95168cb1f4db0db9ff0838ed Author: Alexandre Frade Date: Mon Dec 14 16:24:26 2020 +0000 XANMOD: block: set rq_affinity to force full multithreading I/O requests Signed-off-by: Alexandre Frade commit 3e05b956ec4b037a84c7586dc1a9d12a1545072f Author: Alexandre Frade Date: Thu Jan 6 16:59:01 2022 +0000 XANMOD: block/mq-deadline: Disable front_merges by default Signed-off-by: Alexandre Frade commit b3eb15eae514ed4fa7d7338e7b8cba6dcc069163 Author: Alexandre Frade Date: Wed May 11 18:56:51 2022 +0000 XANMOD: block/mq-deadline: Increase write priority to improve responsiveness Signed-off-by: Alexandre Frade commit 2b64a0dcf6440d7d04a3bee233dc0e1f05a4d99c Author: Alexandre Frade Date: Wed Oct 5 03:46:34 2022 +0000 XANMOD: rcu: Change sched_setscheduler_nocheck() calls to SCHED_RR policy Signed-off-by: Alexandre Frade commit cf154da9f458c1598822c87fa4aec70a410292b3 Author: Alexandre Frade Date: Thu Sep 15 02:45:40 2022 +0000 XANMOD: sched/core: Add yield_type sysctl to reduce or disable sched_yield [1] https://github.com/hamadmarri/cacule-cpu-scheduler/issues/35 Signed-off-by: Alexandre Frade commit 9f2412981dd4936298b0de95d8dfb2f28d99430b Author: Alexandre Frade Date: Wed Feb 1 10:17:47 2023 +0000 XANMOD: fair: Remove all energy efficiency functions Signed-off-by: Alexandre Frade commit d00c6d10c5985aad5474e54c50199e1b1869d458 Author: Alexandre Frade Date: Mon Aug 29 17:02:28 2022 +0000 XANMOD: x86/build: Add more x86_64 optimizations Signed-off-by: Alexandre Frade commit 8c20eb7e6a27b2c493b0bbb435e75cae7135634f Author: Greg Kroah-Hartman Date: Sat Feb 25 11:13:30 2023 +0100 Linux 6.2.1 Link: https://lore.kernel.org/r/20230223130426.170746546@linuxfoundation.org Link: https://lore.kernel.org/r/20230223141539.893173089@linuxfoundation.org Tested-by: Luna Jernberg Tested-by: Justin M. Forbes Tested-by: Conor Dooley Tested-by: Florian Fainelli Tested-by: Shuah Khan Tested-by: Bagas Sanjaya Tested-by: Guenter Roeck Tested-by: Linux Kernel Functional Testing Tested-by: Ron Economos Tested-by: Slade Watkins Tested-by: Allen Pais Signed-off-by: Greg Kroah-Hartman commit 6113b6c51af0a54e5d06ca7d4e824a8936dc56d2 Author: Linus Torvalds Date: Wed Feb 22 09:52:32 2023 -0800 bpf: add missing header file include commit f3dd0c53370e70c0f9b7e931bbec12916f3bb8cc upstream. Commit 74e19ef0ff80 ("uaccess: Add speculation barrier to copy_from_user()") built fine on x86-64 and arm64, and that's the extent of my local build testing. It turns out those got the include incidentally through other header files ( in particular), but that was not true of other architectures, resulting in build errors kernel/bpf/core.c: In function ‘___bpf_prog_run’: kernel/bpf/core.c:1913:3: error: implicit declaration of function ‘barrier_nospec’ so just make sure to explicitly include the proper header file to make everybody see it. Fixes: 74e19ef0ff80 ("uaccess: Add speculation barrier to copy_from_user()") Reported-by: kernel test robot Reported-by: Viresh Kumar Reported-by: Huacai Chen Tested-by: Geert Uytterhoeven Tested-by: Dave Hansen Acked-by: Alexei Starovoitov Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman commit 4752330adea4fac414a18f2e8af1a13376e8d9e5 Author: Eric Biggers Date: Tue Feb 7 22:51:33 2023 -0800 randstruct: disable Clang 15 support commit 78f7a3fd6dc66cb788c21d7705977ed13c879351 upstream. The randstruct support released in Clang 15 is unsafe to use due to a bug that can cause miscompilations: "-frandomize-layout-seed inconsistently randomizes all-function-pointers structs" (https://github.com/llvm/llvm-project/issues/60349). It has been fixed on the Clang 16 release branch, so add a Clang version check. Fixes: 035f7f87b729 ("randstruct: Enable Clang support") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers Acked-by: Nick Desaulniers Reviewed-by: Nathan Chancellor Reviewed-by: Bill Wendling Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20230208065133.220589-1-ebiggers@kernel.org Signed-off-by: Greg Kroah-Hartman commit 94d8de83286fb1827340eba35b61c308f6b46ead Author: Kees Cook Date: Wed Jan 4 13:09:12 2023 -0800 ext4: Fix function prototype mismatch for ext4_feat_ktype commit 118901ad1f25d2334255b3d50512fa20591531cd upstream. With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG), indirect call targets are validated against the expected function pointer prototype to make sure the call target is valid to help mitigate ROP attacks. If they are not identical, there is a failure at run time, which manifests as either a kernel panic or thread getting killed. ext4_feat_ktype was setting the "release" handler to "kfree", which doesn't have a matching function prototype. Add a simple wrapper with the correct prototype. This was found as a result of Clang's new -Wcast-function-type-strict flag, which is more sensitive than the simpler -Wcast-function-type, which only checks for type width mismatches. Note that this code is only reached when ext4 is a loadable module and it is being unloaded: CFI failure at kobject_put+0xbb/0x1b0 (target: kfree+0x0/0x180; expected type: 0x7c4aa698) ... RIP: 0010:kobject_put+0xbb/0x1b0 ... Call Trace: ext4_exit_sysfs+0x14/0x60 [ext4] cleanup_module+0x67/0xedb [ext4] Fixes: b99fee58a20a ("ext4: create ext4_feat kobject dynamically") Cc: Theodore Ts'o Cc: Eric Biggers Cc: stable@vger.kernel.org Build-tested-by: Gustavo A. R. Silva Reviewed-by: Gustavo A. R. Silva Reviewed-by: Nathan Chancellor Link: https://lore.kernel.org/r/20230103234616.never.915-kees@kernel.org Signed-off-by: Kees Cook Reviewed-by: Eric Biggers Link: https://lore.kernel.org/r/20230104210908.gonna.388-kees@kernel.org Signed-off-by: Greg Kroah-Hartman commit c75a91216f2ef326d631622e3c93d0bf394da123 Author: Hans de Goede Date: Fri Feb 17 15:42:08 2023 +0100 platform/x86: nvidia-wmi-ec-backlight: Add force module parameter commit 0d9bdd8a550170306c2021b8d6766c5343b870c2 upstream. On some Lenovo Legion models, the backlight might be driven by either one of nvidia_wmi_ec_backlight or amdgpu_bl0 at different times. When the Nvidia WMI EC backlight interface reports the backlight is controlled by the EC, the current backlight handling only registers nvidia_wmi_ec_backlight (and registers no other backlight interfaces). This hides (never registers) the amdgpu_bl0 interface, where as prior to 6.1.4 users would have both nvidia_wmi_ec_backlight and amdgpu_bl0 and could work around things in userspace. Add a force module parameter which can be used with acpi_backlight=native to restore the old behavior as a workound (for now) by passing: "acpi_backlight=native nvidia-wmi-ec-backlight.force=1" Fixes: 8d0ca287fd8c ("platform/x86: nvidia-wmi-ec-backlight: Use acpi_video_get_backlight_type()") Link: https://bugzilla.kernel.org/show_bug.cgi?id=217026 Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede Reviewed-by: Daniel Dadap Link: https://lore.kernel.org/r/20230217144208.5721-1-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman commit bae1b851d0a187069f7ea6014b2fe6b3ee2ded50 Author: Shyam Sundar S K Date: Mon Feb 13 17:44:57 2023 +0530 platform/x86/amd/pmf: Add depends on CONFIG_POWER_SUPPLY commit 3004e8d2a0a98bbf4223ae146464fadbff68bf78 upstream. It is reported that amd_pmf driver is missing "depends on" for CONFIG_POWER_SUPPLY causing the following build error. ld: drivers/platform/x86/amd/pmf/core.o: in function `amd_pmf_remove': core.c:(.text+0x10): undefined reference to `power_supply_unreg_notifier' ld: drivers/platform/x86/amd/pmf/core.o: in function `amd_pmf_probe': core.c:(.text+0x38f): undefined reference to `power_supply_reg_notifier' make[1]: *** [scripts/Makefile.vmlinux:34: vmlinux] Error 1 make: *** [Makefile:1248: vmlinux] Error 2 Add this to the Kconfig file. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217028 Fixes: c5258d39fc4c ("platform/x86/amd/pmf: Add helper routine to update SPS thermals") Signed-off-by: Shyam Sundar S K Link: https://lore.kernel.org/r/20230213121457.1764463-1-Shyam-sundar.S-k@amd.com Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede Signed-off-by: Greg Kroah-Hartman commit d1437233fc8a43af3c2f9de2f6e949ba64abd8b3 Author: Paul Moore Date: Tue Feb 7 10:21:47 2023 -0500 audit: update the mailing list in MAINTAINERS commit 6c6cd913accd77008f74a1a9d57b816db3651daa upstream. We've moved the upstream Linux Kernel audit subsystem discussions to a new mailing list, this patch updates the MAINTAINERS info with the new list address. Marking this for stable inclusion to help speed uptake of the new list across all of the supported kernel releases. This is a doc only patch so the risk should be close to nil. Cc: stable@vger.kernel.org Signed-off-by: Paul Moore Signed-off-by: Greg Kroah-Hartman commit 6a4bd33f9fecf743014b80a00b29bf095b525eb7 Author: Lukas Wunner Date: Fri Jan 27 15:01:00 2023 +0100 wifi: mwifiex: Add missing compatible string for SD8787 commit 36dd7a4c6226133b0b7aa92b8e604e688d958d0c upstream. Commit e3fffc1f0b47 ("devicetree: document new marvell-8xxx and pwrseq-sd8787 options") documented a compatible string for SD8787 in the devicetree bindings, but neglected to add it to the mwifiex driver. Fixes: e3fffc1f0b47 ("devicetree: document new marvell-8xxx and pwrseq-sd8787 options") Signed-off-by: Lukas Wunner Cc: stable@vger.kernel.org # v4.11+ Cc: Matt Ranostay Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/320de5005ff3b8fd76be2d2b859fd021689c3681.1674827105.git.lukas@wunner.de Signed-off-by: Greg Kroah-Hartman commit 5dc297652dbc557eba7ca7d6a4c5f1940dffffb1 Author: Benjamin Tissoires Date: Thu Feb 16 11:22:58 2023 +0100 HID: mcp-2221: prevent UAF in delayed work commit 47e91fdfa511139f2549687edb0d8649b123227b upstream. If the device is plugged/unplugged without giving time for mcp_init_work() to complete, we might kick in the devm free code path and thus have unavailable struct mcp_2221 while in delayed work. Canceling the delayed_work item is enough to solve the issue, because cancel_delayed_work_sync will prevent the work item to requeue itself. Fixes: 960f9df7c620 ("HID: mcp2221: add ADC/DAC support via iio subsystem") CC: stable@vger.kernel.org Acked-by: Jiri Kosina Link: https://lore.kernel.org/r/20230215-wip-mcp2221-v2-1-109f71fd036e@redhat.com Signed-off-by: Benjamin Tissoires Signed-off-by: Greg Kroah-Hartman commit 7a7857b644e1cbe08e0b1160b1f3761b3bbd09e6 Author: Peter Zijlstra Date: Thu Jan 26 16:34:27 2023 +0100 x86/static_call: Add support for Jcc tail-calls commit 923510c88d2b7d947c4217835fd9ca6bd65cc56c upstream. Clang likes to create conditional tail calls like: 0000000000000350 : 350: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 351: R_X86_64_NONE __fentry__-0x4 355: 48 83 bf 20 01 00 00 00 cmpq $0x0,0x120(%rdi) 35d: 0f 85 00 00 00 00 jne 363 35f: R_X86_64_PLT32 __SCT__amd_pmu_branch_add-0x4 363: e9 00 00 00 00 jmp 368 364: R_X86_64_PLT32 __x86_return_thunk-0x4 Where 0x35d is a static call site that's turned into a conditional tail-call using the Jcc class of instructions. Teach the in-line static call text patching about this. Notably, since there is no conditional-ret, in that case patch the Jcc to point at an empty stub function that does the ret -- or the return thunk when needed. Reported-by: "Erhard F." Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Reviewed-by: Masami Hiramatsu (Google) Link: https://lore.kernel.org/r/Y9Kdg9QjHkr9G5b5@hirez.programming.kicks-ass.net Cc: Nathan Chancellor Signed-off-by: Greg Kroah-Hartman commit cb018fee513c42d4f449f31c1b6703d9c9386ae9 Author: Peter Zijlstra Date: Mon Jan 23 21:59:17 2023 +0100 x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions commit ac0ee0a9560c97fa5fe1409e450c2425d4ebd17a upstream. In order to re-write Jcc.d32 instructions text_poke_bp() needs to be taught about them. The biggest hurdle is that the whole machinery is currently made for 5 byte instructions and extending this would grow struct text_poke_loc which is currently a nice 16 bytes and used in an array. However, since text_poke_loc contains a full copy of the (s32) displacement, it is possible to map the Jcc.d32 2 byte opcodes to Jcc.d8 1 byte opcode for the int3 emulation. This then leaves the replacement bytes; fudge that by only storing the last 5 bytes and adding the rule that 'length == 6' instruction will be prefixed with a 0x0f byte. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Reviewed-by: Masami Hiramatsu (Google) Link: https://lore.kernel.org/r/20230123210607.115718513@infradead.org Cc: Nathan Chancellor Signed-off-by: Greg Kroah-Hartman commit a3ab0272c91bba96457eff0f8682d6b013cc110e Author: Peter Zijlstra Date: Mon Jan 23 21:59:16 2023 +0100 x86/alternatives: Introduce int3_emulate_jcc() commit db7adcfd1cec4e95155e37bc066fddab302c6340 upstream. Move the kprobe Jcc emulation into int3_emulate_jcc() so it can be used by more code -- specifically static_call() will need this. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Ingo Molnar Reviewed-by: Masami Hiramatsu (Google) Link: https://lore.kernel.org/r/20230123210607.057678245@infradead.org Cc: Nathan Chancellor Signed-off-by: Greg Kroah-Hartman commit 2c8ee21d78942cf48bc836612ad365fd6f06cfbb Author: Dave Hansen Date: Tue Feb 21 12:30:15 2023 -0800 uaccess: Add speculation barrier to copy_from_user() commit 74e19ef0ff8061ef55957c3abd71614ef0f42f47 upstream. The results of "access_ok()" can be mis-speculated. The result is that you can end speculatively: if (access_ok(from, size)) // Right here even for bad from/size combinations. On first glance, it would be ideal to just add a speculation barrier to "access_ok()" so that its results can never be mis-speculated. But there are lots of system calls just doing access_ok() via "copy_to_user()" and friends (example: fstat() and friends). Those are generally not problematic because they do not _consume_ data from userspace other than the pointer. They are also very quick and common system calls that should not be needlessly slowed down. "copy_from_user()" on the other hand uses a user-controller pointer and is frequently followed up with code that might affect caches. Take something like this: if (!copy_from_user(&kernelvar, uptr, size)) do_something_with(kernelvar); If userspace passes in an evil 'uptr' that *actually* points to a kernel addresses, and then do_something_with() has cache (or other) side-effects, it could allow userspace to infer kernel data values. Add a barrier to the common copy_from_user() code to prevent mis-speculated values which happen after the copy. Also add a stub for architectures that do not define barrier_nospec(). This makes the macro usable in generic code. Since the barrier is now usable in generic code, the x86 #ifdef in the BPF code can also go away. Reported-by: Jordy Zomer Suggested-by: Linus Torvalds Signed-off-by: Dave Hansen Reviewed-by: Thomas Gleixner Acked-by: Daniel Borkmann # BPF bits Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman