commit 2e46a548aa5680282206b1b5882c4e187e94d12a
Author: Alexandre Frade
Date:   Tue Apr 27 01:05:07 2021 +0000

    Linux 5.12.0-xanmod1

    Signed-off-by: Alexandre Frade

commit fb622f472fdfb05d19f85a9d0adab7d8d43b687d
Author: Alexandre Frade
Date:   Tue Apr 27 01:04:32 2021 +0000

    XANMOD: config: Initial Linux 5.12-xanmod kernel config

    Signed-off-by: Alexandre Frade

commit 4fc3da071d6107fbedf7fcc281f7145a4999d26d
Author: Alexandre Frade
Date:   Mon Apr 26 21:36:08 2021 +0000

    XANMOD: Revert "iio: adc: adi-axi-adc: add proper Kconfig dependencies"

    This reverts commit be24c65e9fa2486bb8ec98d9f592bdcf04bedd88.

    Signed-off-by: Alexandre Frade

commit f050643b90fe9e2993f877d5d7ae5cb63d054e28
Author: Ben Hutchings
Date:   Mon Sep 7 02:51:53 2020 +0100

    android: Export symbols needed by Android drivers

    We want to enable use of the Android ashmem and binder drivers to support Anbox, but they should not be built-in as that would waste resources and increase the security attack surface on systems that don't need them.

    Export the currently un-exported symbols they depend on.

commit 88722c892f48b434c4dc2318838984d962035147
Author: Ben Hutchings
Date:   Fri Jun 22 17:27:00 2018 +0100

    android: Enable building ashmem and binder as modules

    We want to enable use of the Android ashmem and binder drivers to support Anbox, but they should not be built-in as that would waste resources and increase the security attack surface on systems that don't need them.

    - Add a MODULE_LICENSE declaration to ashmem
    - Change the Makefiles to build each driver as an object with the "_linux" suffix (which is what Anbox expects)
    - Change config symbol types to tristate

commit 075b83ceec12663987a54522ae4c4bf1faa33aab
Author: Konstantin Komarov
Date:   Fri Apr 2 18:53:47 2021 +0300

    fs/ntfs3: Add MAINTAINERS

    This adds MAINTAINERS

    Signed-off-by: Konstantin Komarov

commit bd3747be0983e5e9e461f4ddc094e29cd07e33db
Author: Konstantin Komarov
Date:   Fri Apr 2 18:53:46 2021 +0300

    fs/ntfs3: Add NTFS3 in fs/Kconfig and fs/Makefile

    This adds NTFS3 in fs/Kconfig and fs/Makefile

    Signed-off-by: Konstantin Komarov

commit 763d84a53813c19f90837247bdc9032281450053
Author: Konstantin Komarov
Date:   Fri Apr 2 18:53:45 2021 +0300

    fs/ntfs3: Add Kconfig, Makefile and doc

    This adds Kconfig, Makefile and doc

    Signed-off-by: Konstantin Komarov

commit ef5be092fa563d75c4a4b512a40a3517eb035bde
Author: Konstantin Komarov
Date:   Fri Apr 2 18:53:44 2021 +0300

    fs/ntfs3: Add NTFS journal

    This adds the NTFS journal

    Signed-off-by: Konstantin Komarov

commit ee34f1be2f43cab86d32b6664cece488a730367e
Author: Konstantin Komarov
Date:   Fri Apr 2 18:53:43 2021 +0300

    fs/ntfs3: Add compression

    This patch adds the different types of NTFS-applicable compression:
    - lznt
    - lzx
    - xpress

    The latter two (lzx, xpress) implement the Windows Compact OS feature and were taken from the ntfs-3g system compression plugin authored by Eric Biggers (https://github.com/ebiggers/ntfs-3g-system-compression), which was ported to ntfs3 and adapted to the Linux kernel environment.

    Signed-off-by: Konstantin Komarov

commit 0690a75ab0ba619690c8cf406c9d2c91361e7680
Author: Konstantin Komarov
Date:   Fri Apr 2 18:53:42 2021 +0300

    fs/ntfs3: Add attrib operations

    This adds attrib operations

    Signed-off-by: Konstantin Komarov

commit 5ebf9f04aeab2919731ff27a89f76aaa82dc191f
Author: Konstantin Komarov
Date:   Fri Apr 2 18:53:41 2021 +0300

    fs/ntfs3: Add file operations and implementation

    This adds file operations and implementation

    Signed-off-by: Konstantin Komarov

commit cc974e59a0b419bf649a4a0c62800674cd507b33
Author: Konstantin Komarov
Date:   Fri Apr 2 18:53:40 2021 +0300

    fs/ntfs3: Add bitmap

    This adds bitmap

    Signed-off-by: Konstantin Komarov

commit 291d1a301e607167785ae3775e6dbbdf771176d3
Author: Konstantin Komarov
Date:   Fri Apr 2 18:53:39 2021 +0300

    fs/ntfs3: Add initialization of super block

    This adds initialization of super block

    Signed-off-by: Konstantin Komarov

commit fdf0f8c5eb38a7a059ccd35d4b9453e9f63de8bf
Author: Konstantin Komarov
Date:   Fri Apr 2 18:53:38 2021 +0300

    fs/ntfs3: Add headers and misc files

    This adds headers and misc files

    Signed-off-by: Konstantin Komarov

commit 36575510290abad0b1fddd8d07a33cdb65203ed8
Author: Arjan van de Ven
Date:   Sun Feb 18 23:35:41 2018 +0000

    locking: rwsem: spin faster

    tweak rwsem owner spinning a bit

    Signed-off-by: Alexandre Frade

commit 62f93532e0b7a05c790d78756cd56db8749e7a60
Author: William Douglas
Date:   Wed Jun 20 17:23:21 2018 +0000

    firmware: Enable stateless firmware loading

    Prefer the order of specific version before generic and /etc before /lib to enable the user to give specific overrides for generic firmware and distribution firmware.

commit 662bdc36c32fc1aecfafdf8aa58f915655108c24
Author: Arjan van de Ven
Date:   Sun Sep 22 11:12:35 2019 -0300

    intel_rapl: Silence rapl trace debug

commit d65dc49cc3f4d7b1394b4b9d17069a4343d807f8
Author: graysky
Date:   Mon Apr 12 07:09:27 2021 -0400

    x86/kconfig: more uarches for kernel 5.8+

    WARNING
    This patch works with all gcc versions 9.0+ and with kernel version 5.8+ and should NOT be applied when compiling on older versions of gcc due to key name changes of the march flags introduced with the version 4.9 release of gcc.[1]

    FEATURES
    This patch adds additional CPU options to the Linux kernel accessible under:
      Processor type and features --->
        Processor family --->

    With the release of gcc 11.0, several generic 64-bit levels are offered which are good for supported Intel or AMD CPUs:
    • x86-64-v2
    • x86-64-v3
    • x86-64-v4

    Users of glibc 2.33 and above can see which level is supported by current hardware by running:
      /lib/ld-linux-x86-64.so.2 --help | grep supported
    Alternatively, compare the flags from /proc/cpuinfo to this list.[2]

    CPU-specific microarchitectures include:
    • AMD Improved K8-family
    • AMD K10-family
    • AMD Family 10h (Barcelona)
    • AMD Family 14h (Bobcat)
    • AMD Family 16h (Jaguar)
    • AMD Family 15h (Bulldozer)
    • AMD Family 15h (Piledriver)
    • AMD Family 15h (Steamroller)
    • AMD Family 15h (Excavator)
    • AMD Family 17h (Zen)
    • AMD Family 17h (Zen 2)
    • AMD Family 19h (Zen 3)†
    • Intel Silvermont low-power processors
    • Intel Goldmont low-power processors (Apollo Lake and Denverton)
    • Intel Goldmont Plus low-power processors (Gemini Lake)
    • Intel 1st Gen Core i3/i5/i7 (Nehalem)
    • Intel 1.5 Gen Core i3/i5/i7 (Westmere)
    • Intel 2nd Gen Core i3/i5/i7 (Sandybridge)
    • Intel 3rd Gen Core i3/i5/i7 (Ivybridge)
    • Intel 4th Gen Core i3/i5/i7 (Haswell)
    • Intel 5th Gen Core i3/i5/i7 (Broadwell)
    • Intel 6th Gen Core i3/i5/i7 (Skylake)
    • Intel 6th Gen Core i7/i9 (Skylake X)
    • Intel 8th Gen Core i3/i5/i7 (Cannon Lake)
    • Intel 10th Gen Core i7/i9 (Ice Lake)
    • Intel Xeon (Cascade Lake)
    • Intel Xeon (Cooper Lake)*
    • Intel 3rd Gen 10nm++ i3/i5/i7/i9-family (Tiger Lake)*
    • Intel 3rd Gen 10nm++ Xeon (Sapphire Rapids)‡
    • Intel 11th Gen i3/i5/i7/i9-family (Rocket Lake)‡
    • Intel 12th Gen i3/i5/i7/i9-family (Alder Lake)‡

    Notes: If not otherwise noted, gcc >=9.1 is required for support.
    *Requires gcc >=10.1
    †Requires gcc >=10.3
    ‡Requires gcc >=11.0

    It also offers to compile passing the 'native' option which, "selects the CPU to generate code for at compilation time by determining the processor type of the compiling machine. Using -march=native enables all instruction subsets supported by the local machine and will produce code optimized for the local machine under the constraints of the selected instruction set."[3] Users of Intel CPUs should select the 'Intel-Native' option and users of AMD CPUs should select the 'AMD-Native' option.

    MINOR NOTES RELATING TO INTEL ATOM PROCESSORS
    This patch also changes -march=atom to -march=bonnell in accordance with the gcc v4.9 changes. Upstream is using the deprecated -march=atom flag when I believe it should use the newer -march=bonnell flag for Atom processors.[4] It is not recommended to compile on Atom CPUs with the 'native' option.[5] The recommendation is to use the 'atom' option instead.

    BENEFITS
    Small but real speed increases are measurable using a make endpoint comparing a generic kernel to one built with one of the respective microarchs. See the following experimental evidence supporting this statement: https://github.com/graysky2/kernel_gcc_patch

    REQUIREMENTS
    linux version >=5.8
    gcc version >=9.0

    ACKNOWLEDGMENTS
    This patch builds on the seminal work by Jeroen.[6]

    REFERENCES
    1. https://gcc.gnu.org/gcc-4.9/changes.html
    2. https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9
    3. https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-x86-Options
    4. https://bugzilla.kernel.org/show_bug.cgi?id=77461
    5. https://github.com/graysky2/kernel_gcc_patch/issues/15
    6. http://www.linuxforge.net/docs/linux/linux-gcc.php

    Signed-off-by: Alexandre Frade

commit 4a585b507df9ddd5b4faaf9827ef2a45f04f2c36
Author: Mark Weiman
Date:   Sun Aug 12 11:36:21 2018 -0400

    pci: Enable overrides for missing ACS capabilities

    This is an updated version of Alex Williamson's patch from: https://lkml.org/lkml/2013/5/30/513

    Original commit message follows:

    PCIe ACS (Access Control Services) is the PCIe 2.0+ feature that allows us to control whether transactions are allowed to be redirected in various subnodes of a PCIe topology. For instance, if two endpoints are below a root port or downstream switch port, the downstream port may optionally redirect transactions between the devices, bypassing upstream devices. The same can happen internally on multifunction devices. The transaction may never be visible to the upstream devices.

    One upstream device that we particularly care about is the IOMMU. If a redirection occurs in the topology below the IOMMU, then the IOMMU cannot provide isolation between devices. This is why the PCIe spec encourages topologies to include ACS support. Without it, we have to assume peer-to-peer DMA within a hierarchy can bypass IOMMU isolation.

    Unfortunately, far too many topologies do not support ACS to make this a steadfast requirement. Even the latest chipsets from Intel are only sporadically supporting ACS. We have trouble getting interconnect vendors to include the PCIe spec required PCIe capability, let alone suggested features.

    Therefore, we need to add some flexibility. The pcie_acs_override= boot option lets users opt-in specific devices or sets of devices to assume ACS support. The "downstream" option assumes full ACS support on root ports and downstream switch ports. The "multifunction" option assumes the subset of ACS features available on multifunction endpoints and upstream switch ports are supported. The "id:nnnn:nnnn" option enables ACS support on devices matching the provided vendor and device IDs, allowing more strategic ACS overrides. These options may be combined in any order. A maximum of 16 id-specific overrides are available. It's suggested to use the most limited set of options necessary to avoid completely disabling ACS across the topology.

    Note to hardware vendors: we have facilities to permanently quirk specific devices which enforce isolation but do not provide an ACS capability. Please contact me to have your devices added and save your customers the hassle of this boot option.

    Signed-off-by: Mark Weiman

commit 3deea6cb9b6fd57abcd2015e7afc23052175915a
Author: Nick Terrell
Date:   Fri Sep 11 16:37:08 2020 -0700

    lib: zstd: Upgrade to latest upstream zstd version 1.4.10

    Upgrade to the latest upstream zstd version 1.4.10.

    This patch is 100% generated from upstream zstd commit 4432dac93bea [0].

    This patch is very large because it is transitioning from the custom kernel zstd to using upstream directly. The new zstd follows upstream's file structure, which is different. Future update patches will be much smaller because they will only contain the changes from one upstream zstd release.

    As an aid for review I've created a commit [1] that shows the diff between upstream zstd as-is (which doesn't compile), and the zstd code imported in this patch. The version of zstd in this patch is generated from upstream with changes applied by automation to replace upstream's libc dependencies, remove unnecessary portability macros, and use the kernel's xxhash instead of bundling it.

    The benefits of this patch are as follows:
    1. Using upstream directly with an automated script to generate kernel code. This allows us to update the kernel every upstream release, so the kernel gets the latest bug fixes and performance improvements, and doesn't get 3 years out of date again. The automation and the translated code are tested every upstream commit to ensure it continues to work.
    2. Upgrades from a custom zstd based on 1.3.1 to 1.4.10, getting 3 years of performance improvements and bug fixes. On x86_64 I've measured 15% faster BtrFS and SquashFS decompression+read speeds, 35% faster kernel decompression, and 30% faster ZRAM decompression+read speeds.
    3. Zstd-1.4.10 supports negative compression levels, which allow zstd to match or subsume lzo's performance.
    4. Maintains the same kernel-specific wrapper API, so no callers have to be modified with zstd version updates.

    One concern that was brought up was stack usage. Upstream zstd had already removed most of its heavy stack usage functions, but I just removed the last functions that allocate arrays on the stack. I've measured the high water mark for both compression and decompression before and after this patch. Decompression is approximately neutral, using about 1.2KB of stack space. Compression levels up to 3 regressed from 1.4KB -> 1.6KB, and higher compression levels regressed from 1.5KB -> 2KB. We've added unit tests upstream to prevent further regression.

    I believe that this is a reasonable increase, and if it does end up causing problems, this commit can be cleanly reverted, because it only touches zstd.

    I chose the bulk update instead of replaying upstream commits because there have been ~3500 upstream commits since the 1.3.1 release, zstd wasn't ready to be used in the kernel as-is before a month ago, and not all upstream zstd commits build. The bulk update preserves bisectability because bugs can be bisected to the zstd version update. At that point the update can be reverted, and we can work with upstream to find and fix the bug.

    Note that upstream zstd release 1.4.10 doesn't exist yet. I have cut a staging branch at 4432dac93bea [0] and will apply any changes requested to the staging branch. Once we're ready to merge this update I will cut a zstd release at the commit we merge, so we have a known zstd release in the kernel.

    The implementation of the kernel API is contained in zstd_compress_module.c and zstd_decompress_module.c.

    [0] https://github.com/facebook/zstd/commit/4432dac93bea0ae7cb48c7f010caee7a103382d3
    [1] https://github.com/terrelln/linux/commit/292e2183aeab24b016c8f66ec6ded6006e4298f1

    Signed-off-by: Nick Terrell

commit 4ab13eb9a3d8aabab6e8d68e2a986c492ae05bab
Author: Nick Terrell
Date:   Mon Sep 14 12:54:12 2020 -0700

    lib: zstd: Add decompress_sources.h for decompress_unzstd

    Adds decompress_sources.h, which includes every .c file necessary for zstd decompression. This is used in decompress_unzstd.c so the internal structure of the library isn't exposed. This allows us to upgrade the zstd library version without modifying any callers. Instead we just need to update decompress_sources.h.

    Signed-off-by: Nick Terrell

commit fd3aebbbb0fd79303b61fbae967bcc32025163f1
Author: Nick Terrell
Date:   Fri Sep 11 16:49:00 2020 -0700

    lib: zstd: Add kernel-specific API

    This patch:
    - Moves `include/linux/zstd.h` -> `include/linux/zstd_lib.h`
    - Updates modified zstd headers to yearless copyright
    - Adds a new API in `include/linux/zstd.h` that is functionally equivalent to the in-use subset of the current API. Functions are renamed to avoid symbol collisions with zstd, to make it clear it is not the upstream zstd API, and to follow the kernel style guide.
    - Updates all callers to use the new API.

    There are no functional changes in this patch. Since there are no functional changes, I felt it was okay to update all the callers in a single patch. Once the API is approved, the callers are mechanically changed.

    This patch is preparing for the 3rd patch in this series, which updates zstd to version 1.4.10. Since the upstream zstd API is no longer exposed to callers, the update can happen transparently.

    Signed-off-by: Nick Terrell

commit cdbe3afbb13102efb8d682a77ab18b58fd66a11f
Author: Piotr Gorski
Date:   Fri Mar 19 19:04:05 2021 -0800

    init: add support for zstd compressed modules

    kmod 28 supports modules compressed in zstd format, so let's add this possibility to the kernel.

    [ pf: remove explicit compression level ]

    Signed-off-by: Piotr Gorski
    Signed-off-by: Oleksandr Natalenko

commit b1c0accbd858d7f927e54d8005a7bd5a0dca6578
Author: Neal Cardwell
Date:   Mon Dec 28 19:23:09 2020 -0500

    net-tcp_bbr: v2: don't assume prior_cwnd was set entering CA_Loss

    Fix WARN_ON_ONCE() warnings that were firing and pointing to a bbr->prior_cwnd of 0 when exiting CA_Loss and transitioning to CA_Open.

    The issue was that tcp_simple_retransmit() calls:

      tcp_set_ca_state(sk, TCP_CA_Loss);

    without first calling icsk_ca_ops->ssthresh(sk) (because tcp_simple_retransmit() is dealing with losses due to MTU issues and not congestion). The lack of this callback means that BBR did not get a chance to set bbr->prior_cwnd, and thus upon exiting CA_Loss in such cases the WARN_ON_ONCE() would fire due to a zero bbr->prior_cwnd.

    This commit removes that warning, since a bbr->prior_cwnd of 0 is a valid situation in this state transition.

    For setting inflight_lo upon entering CA_Loss, to avoid setting an inflight_lo of 0 in this case, this commit switches to taking the max of cwnd and prior_cwnd. We plan to remove that line of code when we switch to cautious (PRR-style) recovery, so that awkwardness will go away.

    Change-Id: I575dce871c2f20e91e3e9449e1706f42a07b8118

commit dbe394ffbe3c1f542ffd590945bcb5940bca8db4
Author: Neal Cardwell
Date:   Mon Aug 17 19:10:21 2020 -0400

    net-tcp_bbr: v2: remove cycle_rand parameter that is unused in BBRv2

    Change-Id: Iee1df7e41e42de199068d7c89131ed3d228327c0

commit 02314d3cc50d93777b66c2f5842b17d3558293c7
Author: Neal Cardwell
Date:   Mon Aug 17 19:08:41 2020 -0400

    net-tcp_bbr: v2: remove field bw_rtts that is unused in BBRv2

    Change-Id: I58e3346c707748a6f316f3ed060d2da84c32a79b

commit 5cc63356c1a5f5312f62dfd1d9cd1e63b18a19a5
Author: Neal Cardwell
Date:   Thu Nov 21 15:28:01 2019 -0500

    net-tcp_bbr: v2: remove unnecessary rs.delivered_ce logic upon loss

    There is no reason to compute rs.delivered_ce upon loss. In fact, we specifically do not want to compute rs.delivered_ce upon loss. Two issues:

    (1) This would be the wrong thing to do, in behavior terms. With RACK's dynamic reordering window, losses can be marked long after the sequence hole appears in the ACK/SACK stream. We want to catch the ECN mark rate rising too high as quickly as possible, which means we want to check for high ECN mark rates at ACK time (as BBRv2 currently does) and not at loss marking time.

    (2) This is dead code. The ECN mark rate cannot be detected as too high because the check needs rs->delivered to be > 0 as well:

      if (rs->delivered_ce > 0 && rs->delivered > 0 &&

    Since we are not setting rs->delivered upon loss, this check cannot succeed, so setting delivered_ce is pointless. This dead and wrong line was discovered by Randall Stewart at Netflix as he was reading the BBRv2 code.

    Change-Id: I37f83f418a259ec31d8f82de986db071b364b76a

commit 14abdf52be5e5dc39542017ba6c5eb5a79b1cfd6
Author: Neal Cardwell
Date:   Tue Jun 11 12:54:22 2019 -0400

    net-tcp_bbr: v2: BBRv2 ("bbr2") congestion control for Linux TCP

    BBR v2 is an enhancement to the BBR v1 algorithm. It's designed to aim for lower queues, lower loss, and better Reno/CUBIC coexistence than BBR v1.

    BBR v2 maintains the core of BBR v1: an explicit model of the network path that is two-dimensional, adapting to estimate the (a) maximum available bandwidth and (b) maximum safe volume of data a flow can keep in-flight in the network. It maintains the estimated BDP as a core guide for estimating an appropriate level of in-flight data.

    BBR v2 makes several key enhancements:

    o Its bandwidth-probing time scale is adapted, within bounds, to allow improved coexistence with Reno and CUBIC. The bandwidth-probing time scale is (a) extended dynamically based on estimated BDP to improve coexistence with Reno/CUBIC; (b) bounded by an interactive wall-clock time-scale to be more scalable and responsive than Reno and CUBIC.

    o Rather than being largely agnostic to loss and ECN marks, it explicitly uses loss and (DCTCP-style) ECN signals to maintain its model.

    o It aims for lower losses than v1 by adjusting its model to attempt to stay within loss rate and ECN mark rate bounds (loss_thresh and ecn_thresh, respectively).

    o It adapts to loss/ECN signals even when the application is running out of data ("application-limited"), in case the "application-limited" flow is also "network-limited" (the bw and/or inflight available to this flow is lower than previously estimated when the flow ran out of data).

    o It has a three-part model: the model explicitly tracks three operating points, where an operating point is a tuple: (bandwidth, inflight). The three operating points are:

      o latest:      the latest measurement from the current round trip
      o upper bound: robust, optimistic, long-term upper bound
      o lower bound: robust, conservative, short-term lower bound

    These are stored in the following state variables:

      o latest: bw_latest, inflight_latest
      o lo:     bw_lo, inflight_lo
      o hi:     bw_hi[2], inflight_hi

    To gain intuition about the meaning of the three operating points, it may help to consider the analogs in CUBIC, which has a somewhat analogous three-part model used by its probing state machine:

      BBR param    CUBIC param
      -----------  -------------
      latest    ~  cwnd
      lo        ~  ssthresh
      hi        ~  last_max_cwnd

    The analogy is only a loose one, though, since the BBR operating points are calculated differently, and are 2-dimensional (bw,inflight) rather than CUBIC's one-dimensional notion of operating point (inflight).

    o It uses the three-part model to adapt the magnitude of its bandwidth to match the estimated space available in the buffer, rather than (as in BBR v1) assuming that it was always acceptable to place 0.25*BDP in the bottleneck buffer when probing (commodity datacenter switches commonly do not have that much buffer for WAN flows). When BBR v2 estimates it hit a buffer limit during probing, its bandwidth probing then starts gently in case little space is still available in the buffer, and then accelerates, slowly at first and then rapidly if it can grow inflight without seeing congestion signals. In such cases, probing is bounded by inflight_hi + inflight_probe, where inflight_probe grows as: [0, 1, 2, 4, 8, 16,...]. This allows BBR to keep losses low and bounded if a bottleneck remains congested, while rapidly/scalably utilizing free bandwidth when it becomes available.

    o It has a slightly revised state machine, to achieve the goals above.

      BBR_BW_PROBE_UP:     pushes up inflight to probe for bw/vol
      BBR_BW_PROBE_DOWN:   drain excess inflight from the queue
      BBR_BW_PROBE_CRUISE: use pipe, w/ headroom in queue/pipe
      BBR_BW_PROBE_REFILL: try to refill the pipe again to 100%, leaving queue empty

    o The estimated BDP: BBR v2 continues to maintain an estimate of the path's two-way propagation delay, by tracking a windowed min_rtt, and coordinating (on an as-needed basis) to try to expose the two-way propagation delay by draining the bottleneck queue. BBR v2 continues to use its min_rtt and (currently-applicable) bandwidth estimate to estimate the current bandwidth-delay product. The estimated BDP still provides one important guideline for bounding inflight data.

    However, because any min-filtered RTT and max-filtered bw inherently tend to both overestimate, the estimated BDP is often too high; in this case loss or ECN marks can ensue, in which case BBR v2 adjusts inflight_hi and inflight_lo to adapt its sending rate and inflight down to match the available capacity of the path.

    o Space: Note that ICSK_CA_PRIV_SIZE increased. This is because BBR v2 requires more space. Note that much of the space is due to support for per-socket parameterization and debugging in this release, for research and debugging purposes. With that state removed, the full "struct bbr" is 140 bytes, or 144 with padding. This is an increase of 40 bytes over the existing ca_priv space.

    o Code: BBR v2 reuses many pieces from BBR v1. But it omits the following significant pieces:

      o "packet conservation" (bbr_set_cwnd_to_recover_or_restore(), bbr_can_grow_inflight())
      o long-term bandwidth estimator ("policer mode")

    The code layout tries to keep BBR v2 code near the bottom of the file, so that v1-applicable code in the top does not accidentally refer to v2 code.

    o Docs: See the following docs for more details and diagrams describing the BBR v2 algorithm:
      https://datatracker.ietf.org/meeting/104/materials/slides-104-iccrg-an-update-on-bbr-00
      https://datatracker.ietf.org/meeting/102/materials/slides-102-iccrg-an-update-on-bbr-work-at-google-00

    o Internal notes: For this upstream rebase, Neal started from:
      git show fed518041ac6:net/ipv4/tcp_bbr.c > net/ipv4/tcp_bbr.c
    then removed dev instrumentation (dynamic get/set for parameters) and code that was only used by BBRv1.

    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 2c84098e60bed6d67dde23cd7538c51dee273102
    Change-Id: I125cf26ba2a7a686f2fa5e87f4c2afceb65f7a05
    Signed-off-by: Alexandre Frade

commit b3bda0af32b582149f6f14b01fbdeb25c3122cf1
Author: Neal Cardwell
Date:   Sat Nov 16 13:16:25 2019 -0500

    net-tcp: add fast_ack_mode=1: skip rwin check in tcp_fast_ack_mode__tcp_ack_snd_check()

    Add logic for an experimental TCP connection behavior, enabled with tp->fast_ack_mode = 1, which disables checking the receive window before sending an ack in __tcp_ack_snd_check(). If this behavior is enabled, the data receiver sends an ACK if the amount of data is > RCV.MSS.

    Change-Id: Iaa0a0fd7108221f883137a79d5bfa724f1b096d4

commit 2578ee8f03fef91b3e161df1d3fcf8615e1bd523
Author: Neal Cardwell
Date:   Fri Sep 27 17:10:26 2019 -0400

    net-tcp: re-generalize TSO sizing in TCP CC module API

    Reorganize the API for CC modules so that the CC module once again gets complete control of the TSO sizing decision. This is how the API was set up around 2016 and the initial BBRv1 upstreaming. Later Eric Dumazet simplified it. But with wider testing it now seems that to avoid CPU regressions BBR needs to have a different TSO sizing function.

    This is necessary to handle cases where there are many flows bottlenecked on the sender host's NIC, in which case BBR's pacing rate is much lower than CUBIC/Reno/DCTCP's. Why does this happen? Because BBR's pacing rate adapts to the low bandwidth share each flow sees. By contrast, CUBIC/Reno/DCTCP see no loss or ECN, so they grow a very large cwnd, and thus large pacing rate and large TSO burst size.

    Change-Id: Ic8ccfdbe4010ee8d4bf6a6334c48a2fceb2171ea

commit 528113eac603edbab243a51e279f82239973bc03
Author: Yousuk Seung
Date:   Wed May 23 17:55:54 2018 -0700

    net-tcp: add new ca opts flag TCP_CONG_WANTS_CE_EVENTS

    Add a new ca opts flag TCP_CONG_WANTS_CE_EVENTS that allows a congestion control module to receive CE events.

    Currently congestion control modules have to set the TCP_CONG_NEEDS_ECN bit in the opts flag to receive CE events, but this may incur changes in ECN behavior elsewhere. This patch adds a new bit TCP_CONG_WANTS_CE_EVENTS that allows congestion control modules to receive CE events independently of TCP_CONG_NEEDS_ECN.

    Effort: net-tcp
    Origin-9xx-SHA1: 9f7e14716cde760bc6c67ef8ef7e1ee48501d95b
    Change-Id: I2255506985242f376d910c6fd37daabaf4744f24

commit 5ba707d2a09a68a1f3f43128c8a6905cdd81fafc
Author: Neal Cardwell
Date:   Tue May 7 22:37:19 2019 -0400

    net-tcp_bbr: v2: set tx.in_flight for skbs in repair write queue

    Syzkaller was able to use TCP_REPAIR to reproduce the new warning added in tcp_fragment():

      WARNING: CPU: 0 PID: 118174 at net/ipv4/tcp_output.c:1487 tcp_fragment+0xdcc/0x10a0 net/ipv4/tcp_output.c:1487() inconsistent: tx.in_flight: 0 old_factor: 53

    The warning happens because skbs inserted into the tcp_rtx_queue during the repair process go through a sort of "fake send" process, and that process was setting pcount but not tx.in_flight, and thus the warnings (where old_factor is the old pcount).

    The fix of setting tx.in_flight in the TCP_REPAIR code path seems simple enough, and indeed makes the repro code from syzkaller stop producing warnings. Running through kokonut tests, and will send out for review when all tests pass.

    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 330f825a08a6fe92cef74d799cc468864c479f63
    Change-Id: I0bc4a790f040fd4239620e1eedd5dc64666c6f05

commit 04cab5c7a03a1fde7f623e843eedfa9807a9aea5
Author: Neal Cardwell
Date:   Wed May 1 20:16:25 2019 -0400

    net-tcp_bbr: v2: adjust skb tx.in_flight upon split in tcp_fragment()

    When we fragment an skb that has already been sent, we need to update the tx.in_flight for the first skb in the resulting pair ("buff").

    Because we were not updating the tx.in_flight, the tx.in_flight value was inconsistent with the pcount of the "buff" skb (tx.in_flight would be too high). That meant that if the "buff" skb was lost, then bbr2_inflight_hi_from_lost_skb() would calculate an inflight_hi value that is too high. This could result in longer queues and higher packet loss.

    Packetdrill testing verified that without this commit, when the second half of an skb is SACKed and then later the first half of that skb is marked lost, the calculated inflight_hi was incorrect.

    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 385f1ddc610798fab2837f9f372857438b25f874
    Change-Id: I617f8cab4e9be7a0b8e8d30b047bf8645393354d

commit 30c1b6616f813ce98ef191be8757af15588f93a4
Author: Neal Cardwell
Date:   Wed May 1 20:16:33 2019 -0400

    net-tcp_bbr: v2: adjust skb tx.in_flight upon merge in tcp_shifted_skb()

    When tcp_shifted_skb() updates state as adjacent SACKed skbs are coalesced, previously the tx.in_flight was not adjusted, so we could get contradictory state where the skb's recorded pcount was bigger than the tx.in_flight (the number of segments that were in_flight after sending the skb).

    Normally, having a SACKed skb with contradictory pcount/tx.in_flight would not matter. However, with SACK reneging, the SACKed bit is removed, and an skb once again becomes eligible for retransmitting, fragmenting, SACKing, etc.

    Packetdrill testing verified the following sequence is possible in a kernel that does not have this commit:

    - skb N is SACKed
    - skb N+1 is SACKed and combined with skb N using tcp_shifted_skb()
    - tcp_shifted_skb() will increase the pcount of prev, but leave tx.in_flight as-is
    - so prev skb can have pcount > tx.in_flight
    - RTO, tcp_timeout_mark_lost(), detect reneg, remove "SACKed" bit, mark skb N as lost
    - find pcount of skb N is greater than its tx.in_flight

    I suspect this issue is what caused the bbr2_inflight_hi_from_lost_skb():
      WARN_ON_ONCE(inflight_prev < 0)
    to fire in production machines using bbr2.

    Tested: See last commit in series for sponge link.

    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 1a3e997e613d2dcf32b947992882854ebe873715
    Change-Id: I1b0b75c27519953430c7db51c6f358f104c7af55

commit fddae3c57df6b34d004f9e1de939c35e500888bb
Author: Neal Cardwell
Date:   Tue May 7 22:36:36 2019 -0400

    net-tcp_bbr: v2: factor out tx.in_flight setting into tcp_set_tx_in_flight()

    Factor out the code to set an skb's tx.in_flight field into its own function, so that this code can be used for the TCP_REPAIR "fake send" code path that inserts skbs into the rtx queue without sending them. This is in preparation for the following patch, which fixes an issue with TCP_REPAIR and tx.in_flight.

    Tested: See last patch in series for sponge link.

    Effort: net-tcp_bbr
    Origin-9xx-SHA1: e880fc907d06ea7354333f60f712748ebce9497b
    Change-Id: I4fbd4a6e18a51ab06d50ab1c9ad820ce5bea89af

commit 017bb349213c8f2dbf7986ee424aab6eb5b92d59
Author: Neal Cardwell
Date:   Tue Aug 7 21:52:06 2018 -0400

    net-tcp_bbr: v2: introduce ca_ops->skb_marked_lost() CC module callback API

    For connections experiencing reordering, RACK can mark packets lost long after we receive the SACKs/ACKs hinting that the packets were actually lost.

    This means that CC modules cannot easily learn the volume of inflight data at which packet loss happens by looking at the current inflight, or even the packets in flight when the most recently SACKed packet was sent. To learn this, CC modules need to know how many packets were in flight at the time lost packets were sent. This new callback, combined with TCP_SKB_CB(skb)->tx.in_flight, allows them to learn this.

    This also provides a consistent callback that is invoked whether packets are marked lost upon ACK processing, using the RACK reordering timer, or at RTO time.

    Effort: net-tcp_bbr
    Origin-9xx-SHA1: afcbebe3374e4632ac6714d39e4dc8a8455956f4
    Change-Id: I54826ab53df636be537e5d3c618a46145d12d51a

commit ac91660898eb56eefec8b1d1978355db2e077e4e
Author: Neal Cardwell
Date:   Mon Nov 19 13:48:36 2018 -0500

    net-tcp_bbr: v2: export FLAG_ECE in rate_sample.is_ece

    For understanding the relationship between inflight and ECN signals, to try to find the highest inflight value that has acceptable levels of ECN marking.

    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 3eba998f2898541406c2666781182200934965a8
    Change-Id: I3a964e04cee83e11649a54507043d2dfe769a3b3

commit e18459d2eac0dd76376ebd912bb6a69ad2276711
Author: Neal Cardwell
Date:   Thu Oct 12 23:44:27 2017 -0400

    net-tcp_bbr: v2: count packets lost over TCP rate sampling interval

    For understanding the relationship between inflight and packet loss signals, to try to find the highest inflight value that has acceptable levels of packet losses.
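
(A rough, editorial illustration of the kind of per-interval check this loss accounting enables. The struct and field names below are hypothetical stand-ins, not the actual rate-sample layout from this series, and the 2% threshold is only an example of a loss-rate bound like BBRv2's loss_thresh.)

    /* Illustration only: hypothetical fields, not the series' struct rate_sample. */
    struct interval_sample {
            unsigned int delivered;   /* packets (S)ACKed over the interval */
            unsigned int lost;        /* packets marked lost over the interval */
    };

    /* Return nonzero if the loss rate over this interval exceeds thresh_pct percent,
     * e.g. thresh_pct = 2 for a 2% bound. */
    static int loss_rate_too_high(const struct interval_sample *rs, unsigned int thresh_pct)
    {
            unsigned int total = rs->delivered + rs->lost;

            return total > 0 && rs->lost * 100 > total * thresh_pct;
    }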

    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 4527e26b2bd7756a88b5b9ef1ada3da33dd609ab
    Change-Id: I594c2500868d9c530770e7ddd68ffc87c57f4fd5

commit 71527a4da4853cd8b0092786152c789cde1e4195
Author: Neal Cardwell
Date:   Sat Aug 5 11:49:50 2017 -0400

    net-tcp_bbr: v2: snapshot packets in flight at transmit time and pass in rate_sample

    For understanding the relationship between inflight and losses or ECN signals, to try to find the highest inflight value that has acceptable levels of loss/ECN marking.

    Effort: net-tcp_bbr
    Origin-9xx-SHA1: b3eb4f2d20efab4ca001f32c9294739036c493ea
    Change-Id: I7314047d0ff14dd261a04b1969a46dc658c8836a

commit ef335de2e69e23d8af56a7b54ce6a2dbe057718a
Author: Neal Cardwell
Date:   Sun Jun 24 21:55:59 2018 -0400

    net-tcp_bbr: v2: shrink delivered_mstamp, first_tx_mstamp to u32 to free up 8 bytes

    Free up some space for tracking inflight and losses for each bw sample, in upcoming commits.

    These timestamps are in microseconds, and are now stored in 32 bits. So they can only hold time intervals up to roughly 2^12 = 4096 seconds. But Linux TCP RTT and RTO tracking has the same 32-bit microsecond implementation approach and resulting deployment limitations. So this is not introducing a new limit. And these should not be a limitation for the foreseeable future.

    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 238a7e6b5d51625fef1ce7769826a7b21b02ae55
    Change-Id: I3b779603797263b52a61ad57c565eb91fe42680c

commit ef83bb5ad3ba177b117970ede5684d84ab45552a
Author: Yuchung Cheng
Date:   Tue Mar 27 18:01:46 2018 -0700

    net-tcp_rate: account for CE marks in rate sample

    This patch counts the number of packets delivered that have the CE mark in the rate sample, using an approach similar to delivery accounting.

    Effort: net-tcp_rate
    Origin-9xx-SHA1: 710644db434c3da335a7c8b72207a671ccbb5cf8
    Change-Id: I0968fb33fe19b5c774e8c3afd2685558a6ec8710

commit bf480a14b514649e6f5f31a7d20c118a4d54d269
Author: Yuchung Cheng
Date:   Tue Mar 27 18:33:29 2018 -0700

    net-tcp_rate: consolidate inflight tracking approaches in TCP

    In order to track CE marks per rate sample (one round trip), we'll need to snap the starting tcp delivered_ce count in the packet meta header (tcp_skb_cb). But there's not enough space.

    The good news is that the "last_in_flight" field in the header, used by NV congestion control, is almost equivalent to "delivered". In fact "delivered" is better because it additionally accounts for out-of-order packets. Therefore we can remove it to make room for the CE tracking.

    This would make delayed ACK detection slightly less accurate, but the impact is negligible since it's not used for any critical control.

    Effort: net-tcp_rate
    Origin-9xx-SHA1: ddcd46ec85d5f1c4454258af0c54b3254c0d64a7
    Change-Id: I1a184aad6d101c981ac7f2f275aa9417ff856910

commit b7f13dd1e1360a80a0857ff970d80163b9849a2e
Author: Neal Cardwell
Date:   Tue Jun 11 12:26:55 2019 -0400

    net-tcp_bbr: broaden app-limited rate sample detection

    This commit is a bug fix for the Linux TCP app-limited (application-limited) logic that is used for collecting rate (bandwidth) samples.

    Previously the app-limited logic only looked for "bubbles" of silence in between application writes, by checking at the start of each sendmsg. But "bubbles" of silence can also happen before retransmits: e.g. bubbles can happen between an application write and a retransmit, or between two retransmits.

    Retransmits are triggered by ACKs or timers. So this commit checks for bubbles of app-limited silence upon ACKs or timers.

    Why does this commit check for app-limited state at the start of ACKs and timer handling?

    Because at that point we know whether inflight was fully using the cwnd. While processing the ACK or timer event we often change the cwnd; after changing the cwnd we can't know whether inflight was fully using the old cwnd.

    Origin-9xx-SHA1: 3fe9b53291e018407780fb8c356adb5666722cbc
    Change-Id: I37221506f5166877c2b110753d39bb0757985e68

commit fb9b1dba130defde38b089c1c8e44c0fb2ccce48
Author: André Almeida
Date:   Fri Feb 5 10:34:02 2021 -0300

    kernel: Enable waitpid() for futex2

    To make pthreads work as expected if they are using futex2, wake clear_child_tid with futex2 as well. This makes applications that use waitpid() (and clone(CLONE_CHILD_SETTID)) wake while waiting for the child to terminate. Given that apps should not mix futex() and futex2(), any correct app will trigger a harmless noop wakeup on the interface that it isn't using.

    Signed-off-by: André Almeida

commit 063f09291162b4036f343116e79d58d735c2b664
Author: André Almeida
Date:   Fri Feb 5 10:34:02 2021 -0300

    perf bench: Add futex2 benchmark tests

    Add support to the existing futex benchmarking code base to enable futex2 calls. `perf bench` tests can be used not only as a way to measure the performance of the implementation, but also as stress testing for the kernel infrastructure.

    Signed-off-by: André Almeida

commit 3728fd9da4fa785f95230b079801c15612f29b98
Author: André Almeida
Date:   Fri Feb 5 10:34:02 2021 -0300

    selftests: futex2: Add requeue test

    Add testing for futex_requeue(). The first test just requeues from one waiter to another one and wakes it. The second performs both wake and requeue, and we check return values to see if the operation woke/requeued the expected number of waiters.

    Signed-off-by: André Almeida

commit 364c13d345a992940785debc76a3c294c76ff32c
Author: André Almeida
Date:   Fri Feb 5 10:34:02 2021 -0300

    selftests: futex2: Add waitv test

    Create a new file to test the waitv mechanism. Test both private and shared futexes. Wake the last futex in the array, and check if the return value from futex_waitv() is the right index.

    Signed-off-by: André Almeida

commit ac6aedcc75843b6a1336390bf075fec515d4ff59
Author: André Almeida
Date:   Fri Feb 5 10:34:01 2021 -0300

    selftests: futex2: Add wouldblock test

    Adapt the existing futex wait wouldblock file to test the same mechanism for futex2.

    Signed-off-by: André Almeida

commit 5210402efbe49acb5bbdc5235400debf9cc54ef6
Author: André Almeida
Date:   Fri Feb 5 10:34:01 2021 -0300

    selftests: futex2: Add timeout test

    Adapt the existing futex wait timeout file to test the same mechanism for futex2. futex2 accepts only absolute 64-bit timers, but supports both monotonic and realtime clocks.

    Signed-off-by: André Almeida

commit 17273f73fc1d972be4405ac20829d6557424f06e
Author: André Almeida
Date:   Fri Feb 5 10:34:01 2021 -0300

    selftests: futex2: Add wake/wait test

    Add a simple file to test the wake/wait mechanism using the futex2 interface. Test three scenarios: using a common local int variable as a private futex, a shm futex as a shared futex, and a file-backed shared memory as a shared futex. This should test all branches of futex_get_key().

    Create helper files so more tests can evaluate futex2. While 32-bit ABIs from glibc aren't yet able to use 64-bit sized time variables, add a temporary workaround that implements the required types and calls the appropriate syscalls, since futex2 doesn't support 32-bit sized time.
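
(An editorial sketch of the userspace side these selftests exercise: minimal raw-syscall wrappers matching the futex_wait()/futex_wake() prototypes quoted in the "futex2: Implement wait and wake functions" commit further down. The syscall numbers and the FUTEX_32 value are placeholders, since futex2 in this form was never merged upstream; they would have to come from the patched kernel's headers.)

    /* Hypothetical wrappers; the __NR_* and FUTEX_32 values below are placeholders. */
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #define __NR_futex_wait (-1)   /* placeholder: arch-specific in the patch set */
    #define __NR_futex_wake (-1)   /* placeholder: arch-specific in the patch set */
    #define FUTEX_32        2      /* placeholder: mandatory size flag per the commit text */

    static long futex2_wait(void *uaddr, unsigned int val, unsigned int flags,
                            struct timespec *timo)
    {
            /* Sleeps until a futex2 wake at uaddr, unless *uaddr != val (-EAGAIN). */
            return syscall(__NR_futex_wait, uaddr, val, flags, timo);
    }

    static long futex2_wake(void *uaddr, unsigned long nr_wake, unsigned int flags)
    {
            /* Wakes up to nr_wake waiters at uaddr; returns the number woken. */
            return syscall(__NR_futex_wake, uaddr, nr_wake, flags);
    }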

    Signed-off-by: André Almeida

commit 3a436bacb510d4baf4f261afa2c296a498d68d7b
Author: André Almeida
Date:   Tue Feb 9 13:59:00 2021 -0300

    docs: locking: futex2: Add documentation

    Add a new documentation file specifying both the userspace API and internal implementation details of the futex2 syscalls.

    Signed-off-by: André Almeida

commit dfbd7ee0e0ab45ec50ab015a7fd8b291403c3816
Author: André Almeida
Date:   Thu Feb 11 10:47:23 2021 -0300

    futex2: Add compatibility entry point for x86_x32 ABI

    New syscalls should use the same entry point for x86_64 and x86_x32 paths. Add a wrapper for x32 calls to use parse functions that assume 32-bit pointers.

    Signed-off-by: André Almeida

commit 55a89f30afbb770a801707186cdac407f5e445eb
Author: André Almeida
Date:   Fri Feb 5 10:34:01 2021 -0300

    futex2: Implement requeue operation

    Implement the requeue interface similarly to the FUTEX_CMP_REQUEUE operation. This is the syscall implemented by this patch:

      futex_requeue(struct futex_requeue *uaddr1, struct futex_requeue *uaddr2,
                    unsigned int nr_wake, unsigned int nr_requeue,
                    unsigned int cmpval, unsigned int flags)

      struct futex_requeue {
              void *uaddr;
              unsigned int flags;
      };

    If (uaddr1->uaddr == cmpval), wake nr_wake waiters at uaddr1->uaddr and then remove nr_requeue waiters from uaddr1->uaddr and add them to the uaddr2->uaddr list. Each uaddr has its own set of flags, which must be defined in struct futex_requeue (such as size, shared, NUMA). The flags argument of the syscall is there just for the sake of extensibility, and right now it needs to be zero.

    Return the number of woken futexes + the number of requeued ones on success, error code otherwise.

    Signed-off-by: André Almeida

commit 4a1a89328b2a9d351d93146edcf7870d5f220bfc
Author: André Almeida
Date:   Fri Feb 5 10:34:00 2021 -0300

    futex2: Implement vectorized wait

    Add support to wait on multiple futexes. This is the interface implemented by this syscall:

      futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes,
                  unsigned int flags, struct timespec *timo)

      struct futex_waitv {
              void *uaddr;
              unsigned int val;
              unsigned int flags;
      };

    Given an array of struct futex_waitv, wait on each uaddr. The thread wakes if a futex_wake() is performed at any uaddr. The syscall returns immediately if any waiter has *uaddr != val. *timo is an optional timeout value for the operation. The flags argument of the syscall should be used solely for specifying the timeout as realtime, if needed. Flags for shared futexes, sizes, etc. should be set on the individual flags of each waiter.

    Returns the array index of one of the awakened futexes. There is no information about how many were awakened, or any particular attribute of it (whether it is the first awakened, whether it has the smallest index...).

    Signed-off-by: André Almeida
    Signed-off-by: Alexandre Frade

commit ed939f97e4aaef8c872edc9a014a8ac8290fd975
Author: André Almeida
Date:   Fri Feb 5 10:34:01 2021 -0300

    futex2: Add support for shared futexes

    Add support for shared futexes for cross-process resources. This design relies on the same approach used by the old futex code to create a unique id for file-backed shared memory, by using a counter at struct inode.

    There are two types of futexes: private and shared ones. Private futexes are meant to be used by threads that share the same memory space; they are easier to identify uniquely and thus can have some performance optimizations. The elements for identifying one are: the start address of the page where the address is, the address offset within the page and the current->mm pointer.

    Now, for uniquely identifying a shared futex:

    - If the page containing the user address is an anonymous page, we can just use the same data used for private futexes (the start address of the page, the address offset within the page and the current->mm pointer); that will be enough to uniquely identify such a futex. We also set one bit in the key to differentiate the case where a private futex is used on the same address (mixing shared and private calls is not allowed).

    - If the page is file-backed, current->mm may not be the same one for every user of this futex, so we need to use other data: the page->index, a UUID for the struct inode and the offset within the page.

    Note that members of futex_key don't have any particular meaning after they are part of the struct - they are just bytes to identify a futex. Given that, we don't need to use a particular name or type that matches the original data; we only need to care about the bitsize of each component and make both private and shared data fit in the same memory space.

    Signed-off-by: André Almeida

commit e2c4808dcb8076acad938d19ddc375552e80764b
Author: André Almeida
Date:   Fri Feb 5 10:34:00 2021 -0300

    futex2: Implement wait and wake functions

    Create a new set of futex syscalls known as futex2. This new interface is aimed at more maintainable code, while removing obsolete features and expanding it with new functionalities.

    Implements wait and wake semantics for futexes, along with the base infrastructure for future operations. The whole wait path is designed to be used by N waiters, thus making it easier to implement vectorized wait.

    * Syscalls implemented by this patch:

    - futex_wait(void *uaddr, unsigned int val, unsigned int flags,
                 struct timespec *timo)

      The user thread is put to sleep, waiting for a futex_wake() at uaddr, if the value at *uaddr is the same as val (otherwise, the syscall returns immediately with -EAGAIN). timo is an optional timeout value for the operation. Return 0 on success, error code otherwise.

    - futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags)

      Wake `nr_wake` threads waiting at uaddr. Return the number of woken threads on success, error code otherwise.

    ** The `flag` argument

    The flag is used to specify the size of the futex word (FUTEX_[8, 16, 32]). It's mandatory to define one, since there's no default size.

    By default, the timeout uses a monotonic clock, but it can be used as a realtime one by using the FUTEX_REALTIME_CLOCK flag.

    By default, futexes are of the private type; that means that the user address will be accessed by threads that share the same memory region. This allows for some internal optimizations, so they are faster. However, if the address needs to be shared with different processes (like using `mmap()` or `shm()`), they need to be defined as shared, and the flag FUTEX_SHARED_FLAG is used to set that.

    By default, the operation has no NUMA-awareness, meaning that the user can't choose the memory node where the kernel-side futex data will be stored. The user can choose the node where it wants to operate by setting the FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, or 32):

      struct futexX_numa {
              __uX value;
              __sX hint;
      };

    This structure should be passed at the `void *uaddr` of futex functions. The address of the structure will be used to be waited/woken on, and the `value` will be compared to `val` as usual. The `hint` member is used to define which node the futex will use.

    When waiting, the futex will be registered on a kernel-side table stored on that node; when waking, the futex will be searched for on that given table. That means that there's no redundancy between tables, and the wrong `hint` value will lead to undesired behavior. Userspace is responsible for dealing with node migration issues that may occur. `hint` can range from [0, MAX_NUMA_NODES], for specifying a node, or -1, to use the same node the current process is using.

    When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a global table on some node, defined at compilation time.

    ** The `timo` argument

    As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout options known to be buggy. Given that, `timo` should be a 64-bit timeout on all platforms, using an absolute timeout value.

    Signed-off-by: André Almeida

commit 4dcff2d5188229437d00eb9e6d6f8bb133a1e625
Author: Gabriel Krisman Bertazi
Date:   Sat Jan 30 11:57:02 2021 -0300

    futex: Implement mechanism to wait on any of several futexes

    This is a new futex operation, called FUTEX_WAIT_MULTIPLE, which allows a thread to wait on several futexes at the same time, and be awoken by any of them. In a sense, it implements one of the features that was supported by polling on the old FUTEX_FD interface.

    The use case lies in the Wine implementation of the Windows NT interface WaitForMultipleObjects. This Windows API function allows a thread to sleep waiting on the first of a set of event sources (mutexes, timers, signals, console input, etc) to signal. Considering this is a primitive synchronization operation for Windows applications, being able to quickly signal events on the producer side, and quickly go to sleep on the consumer side, is essential for good performance of those running over Wine.

    Wine developers have an implementation that uses eventfd, but it suffers from FD exhaustion (there are applications that go to the order of multi-million FDs), and higher CPU utilization than this new operation.

    The futex list is passed as an array of `struct futex_wait_block` (pointer, value, bitset) to the kernel, which will enqueue all of them and sleep if none was already triggered. It returns a hint of which futex caused the wake-up event to userspace, but the hint doesn't guarantee that it is the only futex triggered. Before calling the syscall again, userspace should traverse the list, trying to re-acquire any of the other futexes, to prevent an immediate -EWOULDBLOCK return code from the kernel.

    This was tested using three mechanisms:

    1) By reimplementing FUTEX_WAIT in terms of FUTEX_WAIT_MULTIPLE and running the unmodified tools/testing/selftests/futex and a full linux distro on top of this kernel.

    2) By an example code that exercises the FUTEX_WAIT_MULTIPLE path on a multi-threaded, event-handling setup.

    3) By running the Wine fsync implementation and executing multi-threaded applications, in particular modern games, on top of this implementation.

    Changes were tested for the following ABIs: x86_64, i386 and x32. Support for x32 applications is not implemented since it would take a major rework adding a new entry point and splitting the current futex 64 entry point in two, and we can't change the current x32 syscall number without breaking user space compatibility.

    Included Valve's Proton compatibility code.

    Adjusted for v5.9: Removed `put_futex_key` calls.
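
(A hedged, editorial sketch of how userspace might drive this operation, based only on the description above: struct futex_wait_block follows the (pointer, value, bitset) layout the commit names, but the opcode value and the exact argument mapping onto the existing futex() syscall are assumptions here, not taken from the patch.)

    /* Sketch only: FUTEX_WAIT_MULTIPLE_OP and the argument order are assumptions. */
    #include <stdint.h>
    #include <time.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #define FUTEX_WAIT_MULTIPLE_OP 31        /* placeholder opcode; differed across revisions */

    struct futex_wait_block {
            uint32_t *uaddr;                 /* pointer to the futex word */
            uint32_t  val;                   /* expected value */
            uint32_t  bitset;                /* wake bitset */
    };

    /* Sleep until any futex in blocks[] is woken; the return value is only a hint
     * of which one fired, so the caller should still rescan the list. */
    static long futex_wait_multiple(struct futex_wait_block *blocks,
                                    unsigned int count,
                                    const struct timespec *timeout)
    {
            return syscall(SYS_futex, blocks, FUTEX_WAIT_MULTIPLE_OP, count,
                           timeout, NULL, 0);
    }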

    Cc: Steven Rostedt
    Cc: Richard Yao
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Co-developed-by: Zebediah Figura
    Signed-off-by: Zebediah Figura
    Co-developed-by: Steven Noonan
    Signed-off-by: Steven Noonan
    Co-developed-by: Pierre-Loup A. Griffais
    Signed-off-by: Pierre-Loup A. Griffais
    Signed-off-by: Gabriel Krisman Bertazi
    [Added compatibility code]
    Co-developed-by: André Almeida
    Signed-off-by: André Almeida
    Signed-off-by: Alexandre Frade

commit 6cb1628344e53c1e4241ddf61c6477a64f43bc90
Author: Con Kolivas
Date:   Mon Dec 14 19:09:01 2020 +0000

    clockevents, hrtimer: Make hrtimer granularity and minimum hrtimeout configurable in sysctl. Set default granularity to 100us and min timeout to 500us

    Signed-off-by: Alexandre Frade

commit 3171d4c85e07207a489d126ae1817d08396468a1
Author: Con Kolivas
Date:   Mon Feb 20 13:32:58 2017 +1100

    time: Don't use hrtimer overlay when pm_freezing since some drivers still don't correctly use freezable timeouts.

    Signed-off-by: Alexandre Frade

commit 7e6da4f601f11604b6a0f4836dd3ae27a0b2e285
Author: Con Kolivas
Date:   Mon Feb 20 13:30:32 2017 +1100

    hrtimer: Replace all calls to schedule_timeout_uninterruptible of potentially under 50ms to use schedule_msec_hrtimeout_uninterruptible

    Signed-off-by: Alexandre Frade

commit 677095c12af4e6b5262981f25f699637bb6dd75d
Author: Con Kolivas
Date:   Mon Feb 20 13:30:07 2017 +1100

    hrtimer: Replace all calls to schedule_timeout_interruptible of potentially under 50ms to use schedule_msec_hrtimeout_interruptible.

    Signed-off-by: Alexandre Frade

commit 659b093c3e6f2e85fcfbeaa16f78ee7379f734e0
Author: Con Kolivas
Date:   Mon Feb 15 21:56:16 2021 +0000

    hrtimer: Replace all schedule_timeout(1) with schedule_min_hrtimeout()

    Signed-off-by: Alexandre Frade

commit e45b16b57f21526565ab015268ede50269bccfbe
Author: Con Kolivas
Date:   Fri Nov 4 09:25:54 2016 +1100

    timer: Convert msleep to use hrtimers when active.

    Signed-off-by: Alexandre Frade

commit 5f97a164368b5a37d7a47f38b02f87674b1f3908
Author: Con Kolivas
Date:   Sat Nov 5 09:27:36 2016 +1100

    time: Special case calls of schedule_timeout(1) to use the min hrtimeout of 1ms, working around low Hz resolutions.

    Signed-off-by: Alexandre Frade

commit 9e8b253b51b7c4de6fbe0d6990656f7014523e25
Author: Con Kolivas
Date:   Sat Aug 12 11:53:39 2017 +1000

    hrtimer: Create highres timeout variants of schedule_timeout functions.

    Signed-off-by: Alexandre Frade

commit def949464c1155b36fe47bb805878b3c436f2258
Author: Serge Hallyn
Date:   Fri May 31 19:12:12 2013 +0100

    sysctl: add sysctl to disallow unprivileged CLONE_NEWUSER by default

    add sysctl to disallow unprivileged CLONE_NEWUSER by default

    This is a short-term patch. Unprivileged use of CLONE_NEWUSER is certainly an intended feature of user namespaces. However, for at least saucy we want to make sure that, if any security issues are found, we have a fail-safe.
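
(A small editorial userspace check related to the commit above; the /proc path is how this patch family is usually exposed, as kernel.unprivileged_userns_clone, which is assumed here rather than quoted from the patch.)

    /* Read the (assumed) kernel.unprivileged_userns_clone sysctl, then try to
     * create a user namespace without privileges. */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            FILE *f = fopen("/proc/sys/kernel/unprivileged_userns_clone", "r");
            int enabled = -1;

            if (f) {
                    if (fscanf(f, "%d", &enabled) != 1)
                            enabled = -1;
                    fclose(f);
            }
            printf("unprivileged_userns_clone sysctl: %d\n", enabled);

            if (unshare(CLONE_NEWUSER) == 0)
                    printf("unprivileged user namespace created\n");
            else
                    perror("unshare(CLONE_NEWUSER)");
            return 0;
    }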

    Signed-off-by: Serge Hallyn
    [bwh: Remove unneeded binary sysctl bits]
    [bwh: Keep this sysctl, but change the default to enabled]

commit 304956bbffac6dc1e69255dc0b6f8e8d8d20d046
Author: Alexandre Frade
Date:   Thu Sep 3 20:36:13 2020 +0000

    XANMOD: init/Kconfig: Enable -O3 KBUILD_CFLAGS optimization for all architectures

    Signed-off-by: Alexandre Frade

commit 510d1f92759f493b1c67f434c27f120eec5e2cc2
Author: Alexandre Frade
Date:   Thu Jun 25 16:40:43 2020 -0300

    XANMOD: lib/kconfig.debug: disable default CONFIG_SYMBOLIC_ERRNAME and CONFIG_DEBUG_BUGVERBOSE

    Signed-off-by: Alexandre Frade

commit 3d7289c9b78900385a87feddff8b7078b1b8b22d
Author: Alexandre Frade
Date:   Mon Jan 29 17:41:29 2018 +0000

    XANMOD: scripts: disable the localversion "+" tag of a git repo

    Signed-off-by: Alexandre Frade

commit b8c6ca8120cb90b5d7950ac4ce74506a0b211f53
Author: Alexandre Frade
Date:   Tue Mar 31 13:32:08 2020 -0300

    XANMOD: cpufreq: tunes ondemand and conservative governor for performance

    Signed-off-by: Alexandre Frade

commit b32129893af46015289030a58ab161bcae30fcff
Author: Alexandre Frade
Date:   Mon Jan 29 17:31:25 2018 +0000

    XANMOD: mm/vmscan: vm_swappiness = 30 decreases the amount of swapping

    Signed-off-by: Alexandre Frade

commit 2fa57761d30722ab8a35fc9c552bfbb22d754408
Author: Alexandre Frade
Date:   Thu Aug 13 14:57:06 2020 +0000

    XANMOD: sched/autogroup: Add kernel parameter and config option to enable/disable autogroup feature by default

    Signed-off-by: Alexandre Frade

commit fbe47c7621434844942b8bbcf6dbc51f1a17ac54
Author: Alexandre Frade
Date:   Mon Jan 29 16:59:22 2018 +0000

    XANMOD: dcache: cache_pressure = 50 decreases the rate at which VFS caches are reclaimed

    Signed-off-by: Alexandre Frade

commit a23c9bb3f4c580099298933e7af6ad08c2c73ede
Author: Alexandre Frade
Date:   Sun Oct 13 03:10:39 2019 -0300

    XANMOD: kconfig: set PREEMPT and RCU_BOOST without delay by default

    Signed-off-by: Alexandre Frade

commit 101559dd026662f89f38397532079cea0bb2b667
Author: Alexandre Frade
Date:   Mon Jan 29 17:26:15 2018 +0000

    XANMOD: kconfig: add 500Hz timer interrupt kernel config option

    Signed-off-by: Alexandre Frade

commit 374104a9829d9c93dfec99f52abf2a3a223d8948
Author: Alexandre Frade
Date:   Mon Dec 14 16:24:26 2020 +0000

    XANMOD: block: set rq_affinity to force full multithreading I/O requests

    Signed-off-by: Alexandre Frade

commit 59c8dee296ddd20cc155b5b42fc5d3c678dba1eb
Author: Alexandre Frade
Date:   Mon Jun 1 18:23:51 2020 -0300

    XANMOD: block, bfq: change BLK_DEV_ZONED depends to IOSCHED_BFQ

    Signed-off-by: Alexandre Frade

commit fb1b8b1ed9619b6f598fdc292a4e8940826665e2
Author: Alexandre Frade
Date:   Mon Nov 25 15:13:06 2019 -0300

    XANMOD: elevator: set default scheduler to bfq for blk-mq

    Signed-off-by: Alexandre Frade

commit 9f4ad9e425a1d3b6a34617b8ea226d56a119a717
Author: Linus Torvalds
Date:   Sun Apr 25 13:49:08 2021 -0700

    Linux 5.12
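
(Closing editorial note: a trivial userspace check of two of the runtime defaults the XANMOD commits above adjust, vm.swappiness = 30 and vm.vfs_cache_pressure = 50; the /proc paths used are the standard sysctl locations.)

    /* Print the current values of vm.swappiness and vm.vfs_cache_pressure. */
    #include <stdio.h>

    static int read_sysctl(const char *path)
    {
            int val = -1;
            FILE *f = fopen(path, "r");

            if (f) {
                    if (fscanf(f, "%d", &val) != 1)
                            val = -1;
                    fclose(f);
            }
            return val;
    }

    int main(void)
    {
            printf("vm.swappiness         = %d\n", read_sysctl("/proc/sys/vm/swappiness"));
            printf("vm.vfs_cache_pressure = %d\n", read_sysctl("/proc/sys/vm/vfs_cache_pressure"));
            return 0;
    }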