commit 60fc05ca606199dc68d8a0340208c7d81fb0a2bb
Author: Alexandre Frade <kernel@xanmod.org>
Date:   Tue Jun 29 17:14:02 2021 +0000

    Linux 5.13.0-xanmod1
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit a543c785fb9b0667f652b0d5fff72d1d583a1f14
Author: André Almeida <andrealmeid@collabora.com>
Date:   Fri Feb 5 10:34:02 2021 -0300

    futex2: Add sysfs entry for syscall numbers
    
    In the course of futex2 development, it will be rebased on top of
    different kernel releases, and the syscall number can change in this
    process. Expose futex2 syscall number via sysfs so tools that are
    experimenting with futex2 (like Proton/Wine) can test it and set the
    syscall number at runtime, rather than setting it at compilation time.
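
    A minimal userspace sketch of the runtime lookup described above. The
    sysfs path is not given in this message, so the caller supplies it; the
    helper names parse_syscall_nr() and read_syscall_nr() are hypothetical,
    and the file is assumed to hold a single decimal number.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a decimal syscall number from text; return -1 if malformed. */
static long parse_syscall_nr(const char *text)
{
    char *end;
    long nr;

    errno = 0;
    nr = strtol(text, &end, 10);
    if (errno != 0 || end == text || nr < 0)
        return -1;
    /* Allow one trailing newline, as sysfs attributes usually end with one. */
    if (*end == '\n')
        end++;
    return *end == '\0' ? nr : -1;
}

/* Read the number from a caller-supplied path (the real path is not assumed here). */
static long read_syscall_nr(const char *path)
{
    char buf[32];
    FILE *f = fopen(path, "r");
    long nr = -1;

    if (!f)
        return -1;
    if (fgets(buf, sizeof(buf), f))
        nr = parse_syscall_nr(buf);
    fclose(f);
    return nr;
}
```

    A tool could pass a non-negative result as the first argument of
    syscall(2) and fall back to the old futex() interface on -1.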
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>

commit cda39e4a94bfcc37ef7c076c45a9c3cd06f8a011
Author: André Almeida <andrealmeid@collabora.com>
Date:   Fri Feb 5 10:34:02 2021 -0300

    kernel: Enable waitpid() for futex2
    
    To make pthreads work as expected when they use futex2, wake
    clear_child_tid with futex2 as well. This makes applications that use
    waitpid() (and clone(CLONE_CHILD_SETTID)) wake while waiting for the
    child to terminate. Given that apps should not mix futex() and futex2(),
    any correct app will trigger a harmless noop wakeup on the interface
    that it isn't using.
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>

commit ea28e4268af7767743dbc6e8d4e8dd2e8dd9bc43
Author: André Almeida <andrealmeid@collabora.com>
Date:   Fri Feb 5 10:34:02 2021 -0300

    perf bench: Add futex2 benchmark tests
    
    Add futex2 support to the existing futex benchmarking code base.
    `perf bench` tests can be used not only as a way to measure the
    performance of the implementation, but also as stress testing for the
    kernel infrastructure.
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>

commit b210ef8a11ecb759721f859bb2d666270ecc8cd2
Author: André Almeida <andrealmeid@collabora.com>
Date:   Fri Feb 5 10:34:02 2021 -0300

    selftests: futex2: Add requeue test
    
    Add testing for futex_requeue(). The first test just requeues one
    waiter from one futex to another and wakes it. The second performs both
    wake and requeue, and checks the return values to see if the operation
    woke/requeued the expected number of waiters.
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>

commit 89340f2e0019bd13093f549f301e2dfa7f42c69e
Author: André Almeida <andrealmeid@collabora.com>
Date:   Fri Feb 5 10:34:02 2021 -0300

    selftests: futex2: Add waitv test
    
    Create a new file to test the waitv mechanism. Test both private and
    shared futexes. Wake the last futex in the array, and check if the
    return value from futex_waitv() is the right index.
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>

commit ccd8ae7aeb9d31aacf076d4678bd9393be2038b8
Author: André Almeida <andrealmeid@collabora.com>
Date:   Fri Feb 5 10:34:01 2021 -0300

    selftests: futex2: Add wouldblock test
    
    Adapt existing futex wait wouldblock file to test the same mechanism for
    futex2.
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>

commit 5ba67bd8162499f6d95f090095d0ae856d5a5a41
Author: André Almeida <andrealmeid@collabora.com>
Date:   Fri Feb 5 10:34:01 2021 -0300

    selftests: futex2: Add timeout test
    
    Adapt existing futex wait timeout file to test the same mechanism for
    futex2. futex2 accepts only absolute 64bit timers, but supports both
    monotonic and realtime clocks.
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>

commit 96e0a1497d297d62188b3c8f23d5e865a3b359f0
Author: André Almeida <andrealmeid@collabora.com>
Date:   Fri Feb 5 10:34:01 2021 -0300

    selftests: futex2: Add wake/wait test
    
    Add a simple file to test wake/wait mechanism using futex2 interface.
    Test three scenarios: using a common local int variable as private
    futex, a shm futex as shared futex and a file-backed shared memory as a
    shared futex. This should test all branches of futex_get_key().
    
    Create helper files so more tests can evaluate futex2. While 32bit ABIs
    from glibc aren't yet able to use 64 bit sized time variables, add a
    temporary workaround that implements the required types and calls the
    appropriate syscalls, since futex2 doesn't support 32 bit sized time.
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>

commit 641c6815a5ddadf150102b68e3750f35157220ae
Author: André Almeida <andrealmeid@collabora.com>
Date:   Tue Feb 9 13:59:00 2021 -0300

    docs: locking: futex2: Add documentation
    
    Add a new documentation file specifying both userspace API and internal
    implementation details of futex2 syscalls.
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>

commit f156016764cb8c151e5c1ff6fbdcd236b8614ead
Author: André Almeida <andrealmeid@collabora.com>
Date:   Thu Feb 11 10:47:23 2021 -0300

    futex2: Add compatibility entry point for x86_x32 ABI
    
    New syscalls should use the same entry point for x86_64 and x86_x32
    paths. Add a wrapper for x32 calls to use parse functions that assume
    32bit pointers.
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>

commit 39b86f9fdcf0ae22037690185f39a4ac3b1fd118
Author: André Almeida <andrealmeid@collabora.com>
Date:   Fri Feb 5 10:34:01 2021 -0300

    futex2: Implement requeue operation
    
    Implement a requeue interface similar to the FUTEX_CMP_REQUEUE operation.
    This is the syscall implemented by this patch:
    
    futex_requeue(struct futex_requeue *uaddr1, struct futex_requeue *uaddr2,
                  unsigned int nr_wake, unsigned int nr_requeue,
                  unsigned int cmpval, unsigned int flags)
    
    struct futex_requeue {
            void *uaddr;
            unsigned int flags;
    };
    
    If (*uaddr1->uaddr == cmpval), wake nr_wake waiters at uaddr1->uaddr and
    then remove nr_requeue waiters from uaddr1->uaddr and add them to the
    uaddr2->uaddr list. Each uaddr has its own set of flags, which must be
    defined in struct futex_requeue (such as size, shared, NUMA). The flags
    argument of the syscall is there just for the sake of extensibility, and
    right now it needs to be zero.
    
    Return the number of the woken futexes + the number of requeued ones on
    success, error code otherwise.
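
    The return value described above can be modelled in plain userspace C.
    This is only a sketch of the counting, not kernel code: requeue_result()
    is a hypothetical name, and -EAGAIN for a failed comparison is an
    assumption, since this message only says "error code".

```c
#include <errno.h>

/*
 * Model of the futex_requeue() return value for `waiters` threads queued
 * at uaddr1->uaddr. Nothing is actually woken or moved here.
 */
static long requeue_result(unsigned int value_at_uaddr1, unsigned int cmpval,
                           unsigned int waiters, unsigned int nr_wake,
                           unsigned int nr_requeue)
{
    unsigned int woken, requeued;

    if (value_at_uaddr1 != cmpval)
        return -EAGAIN; /* assumed error code */

    /* Wake first, then requeue from whatever waiters remain. */
    woken = waiters < nr_wake ? waiters : nr_wake;
    waiters -= woken;
    requeued = waiters < nr_requeue ? waiters : nr_requeue;

    return (long)woken + (long)requeued;
}
```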
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>
    
    Rebased-by: Joshua Ashton <joshua@froggi.es>

commit 59c86f97e01ec6163a2527af393b2c3e28a6a798
Author: André Almeida <andrealmeid@collabora.com>
Date:   Fri Feb 5 10:34:00 2021 -0300

    futex2: Implement vectorized wait
    
    Add support to wait on multiple futexes. This is the interface
    implemented by this syscall:
    
    futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes,
                unsigned int flags, struct timespec *timo)
    
    struct futex_waitv {
            void *uaddr;
            unsigned int val;
            unsigned int flags;
    };
    
    Given an array of struct futex_waitv, wait on each uaddr. The thread
    wakes if a futex_wake() is performed at any uaddr. The syscall returns
    immediately if any waiter has *uaddr != val. *timo is an optional
    timeout value for the operation. The flags argument of the syscall
    should be used solely for specifying the timeout as realtime, if needed.
    Flags for shared futexes, sizes, etc. should be used on the individual
    flags of each waiter.
    
    Returns the array index of one of the awakened futexes. There's no
    information given about how many were awakened, or about any particular
    attribute of the returned one (whether it was the first awakened, whether
    it has the smallest index...).
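
    As an illustration of the "returns immediately if any waiter has
    *uaddr != val" rule, the check can be sketched in userspace. The struct
    is copied from this message; first_mismatch() is a hypothetical helper,
    and every futex word is assumed to be 32 bits wide.

```c
#include <stdint.h>

/* Layout copied from the commit message. */
struct futex_waitv {
    void *uaddr;
    unsigned int val;
    unsigned int flags;
};

/*
 * Return the index of the first waiter whose futex word differs from its
 * expected value, or -1 if every word matches and the caller would sleep.
 */
static int first_mismatch(const struct futex_waitv *waiters,
                          unsigned int nr_futexes)
{
    unsigned int i;

    for (i = 0; i < nr_futexes; i++) {
        const volatile uint32_t *word = waiters[i].uaddr;

        if (*word != waiters[i].val)
            return (int)i;
    }
    return -1;
}
```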
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>
    
    Rebased-by: Joshua Ashton <joshua@froggi.es>

commit b5384ddb56accd2d54d23854ad288bc65bc4ab82
Author: André Almeida <andrealmeid@collabora.com>
Date:   Fri Feb 5 10:34:01 2021 -0300

    futex2: Add support for shared futexes
    
    Add support for shared futexes for cross-process resources. This design
    relies on the same approach used in the old futex to create a unique id
    for file-backed shared memory, by using a counter at struct inode.
    
    There are two types of futexes: private and shared ones. Private futexes
    are meant to be used by threads that share the same memory space; they are
    easier to identify uniquely and thus allow some performance optimizations.
    The elements for identifying one are: the start address of the page where
    the address is, the address offset within the page and the current->mm
    pointer.
    
    Now, for uniquely identifying shared futex:
    
    - If the page containing the user address is an anonymous page, we can
      just use the same data used for private futexes (the start address of
      the page, the address offset within the page and the current->mm
      pointer) that will be enough for uniquely identifying such futex. We
      also set one bit at the key to differentiate if a private futex is
      used on the same address (mixing shared and private calls is not
      allowed).
    
    - If the page is file-backed, current->mm may not be the same one for
      every user of this futex, so we need to use other data: the
      page->index, a UUID for the struct inode and the offset within the
      page.
    
    Note that members of futex_key don't have any particular meaning after they
    are part of the struct - they are just bytes to identify a futex.  Given that,
    we don't need to use a particular name or type that matches the original data,
    we only need to care about the bitsize of each component and make both private
    and shared data fit in the same memory space.
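
    One way to picture "just bytes to identify a futex" is the sketch below.
    It is not the layout used by the patch: struct key_sketch, its field
    names, the 4096-byte page size and the position of the "shared" bit are
    all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096ULL      /* assumed page size */
#define SKETCH_SHARED    (1ULL << 63) /* assumed position of the "shared" bit */

/* Three opaque fields, wide enough for either kind of futex (no padding). */
struct key_sketch {
    uint64_t pointer; /* private/anonymous: current->mm; file-backed: inode id */
    uint64_t index;   /* private/anonymous: page start; file-backed: page->index */
    uint64_t offset;  /* offset within the page, plus the shared bit */
};

static struct key_sketch private_key(uint64_t mm, uint64_t addr)
{
    struct key_sketch k = {
        .pointer = mm,
        .index = addr & ~(SKETCH_PAGE_SIZE - 1),
        .offset = addr & (SKETCH_PAGE_SIZE - 1),
    };
    return k;
}

/* Same data as a private key, with one extra bit set. */
static struct key_sketch shared_anon_key(uint64_t mm, uint64_t addr)
{
    struct key_sketch k = private_key(mm, addr);

    k.offset |= SKETCH_SHARED;
    return k;
}

static struct key_sketch shared_file_key(uint64_t inode_id, uint64_t page_index,
                                         uint64_t offset_in_page)
{
    struct key_sketch k = {
        .pointer = inode_id,
        .index = page_index,
        .offset = offset_in_page | SKETCH_SHARED,
    };
    return k;
}

/* Keys are compared as raw bytes; field meaning no longer matters. */
static bool key_equal(const struct key_sketch *a, const struct key_sketch *b)
{
    return memcmp(a, b, sizeof(*a)) == 0;
}
```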
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>

commit 4f95b459e1a9ea5a90234cf2e25144588d2a40d9
Author: André Almeida <andrealmeid@collabora.com>
Date:   Fri Feb 5 10:34:00 2021 -0300

    futex2: Implement wait and wake functions
    
    Create a new set of futex syscalls known as futex2. This new interface
    aims to provide more maintainable code, while removing obsolete
    features and adding new functionality.
    
    Implement wait and wake semantics for futexes, along with the base
    infrastructure for future operations. The whole wait path is designed to
    be used by N waiters, which makes it easier to implement vectorized wait.
    
    * Syscalls implemented by this patch:
    
    - futex_wait(void *uaddr, unsigned int val, unsigned int flags,
                 struct timespec *timo)
    
       The user thread is put to sleep, waiting for a futex_wake() at uaddr,
       if the value at *uaddr is the same as val (otherwise, the syscall
       returns immediately with -EAGAIN). timo is an optional timeout value
       for the operation.
    
       Return 0 on success, error code otherwise.
    
     - futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags)
    
       Wake `nr_wake` threads waiting at uaddr.
    
       Return the number of woken threads on success, error code otherwise.
    
    ** The `flag` argument
    
     The flag is used to specify the size of the futex word
     (FUTEX_[8, 16, 32]). It's mandatory to define one, since there's no
     default size.
    
     By default, the timeout uses a monotonic clock, but a realtime clock can be
     selected with the FUTEX_REALTIME_CLOCK flag.
    
     By default, futexes are of the private type, meaning that the user address
     will be accessed only by threads that share the same memory region. This
     allows for some internal optimizations, so they are faster. However, if the
     address needs to be shared with different processes (like using `mmap()` or
     `shm()`), the futex needs to be defined as shared, and the flag
     FUTEX_SHARED_FLAG is used to set that.
    
     By default, the operation has no NUMA-awareness, meaning that the user can't
     choose the memory node where the kernel side futex data will be stored. The
     user can choose the node where it wants to operate by setting the
     FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, or
     32):
    
      struct futexX_numa {
              __uX value;
              __sX hint;
      };
    
     This structure should be passed at the `void *uaddr` of futex functions. The
     address of the structure will be used to be waited/woken on, and the
     `value` will be compared to `val` as usual. The `hint` member is used to
     define which node the futex will use. When waiting, the futex will be
     registered on a kernel-side table stored on that node; when waking, the futex
     will be searched for on that given table. That means that there's no redundancy
     between tables, and a wrong `hint` value will lead to undesired behavior.
     Userspace is responsible for dealing with node migration issues that may
     occur. `hint` can range from [0, MAX_NUMA_NODES], for specifying a node, or
     -1, to use the same node the current process is using.
    
     When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a
     global table on some node, defined at compilation time.
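
     For X = 32, the NUMA structure and the stated `hint` range can be
     sketched as follows. SKETCH_MAX_NUMA_NODES is a placeholder value and
     numa_hint_valid() is a hypothetical helper; the upper bound is kept
     inclusive only because that is how the range is written above.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_NUMA_NODES 64 /* placeholder; the real limit is a kernel build setting */

/* futexX_numa from the message above, with X = 32. */
struct futex32_numa {
    uint32_t value; /* compared against `val` */
    int32_t hint;   /* node number, or -1 for the current node */
};

/* Accept -1 or a node in [0, SKETCH_MAX_NUMA_NODES]. */
static bool numa_hint_valid(int32_t hint)
{
    return hint == -1 || (hint >= 0 && hint <= SKETCH_MAX_NUMA_NODES);
}
```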
    
    ** The `timo` argument
    
    As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout
    options known to be buggy. Given that, `timo` should be a 64bit timeout on
    all platforms, using an absolute timeout value.
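
    Since only absolute timeouts are accepted, a caller that thinks in
    relative terms has to add its delay to the current clock reading. A
    minimal sketch of that arithmetic, assuming a hypothetical 64-bit
    timespec stand-in (struct timespec64_sketch) and a non-negative delay;
    `now` would come from the clock selected by the flags:

```c
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000LL

/* Both fields are 64 bits wide, so the layout matches on 32-bit and 64-bit ABIs. */
struct timespec64_sketch {
    int64_t tv_sec;
    int64_t tv_nsec;
};

/* Turn "now + rel_ns" into a normalized absolute deadline (rel_ns >= 0). */
static struct timespec64_sketch deadline_after(struct timespec64_sketch now,
                                               int64_t rel_ns)
{
    struct timespec64_sketch d;
    int64_t nsec = now.tv_nsec + rel_ns % SKETCH_NSEC_PER_SEC;

    /* Carry whole seconds out of the nanosecond field. */
    d.tv_sec = now.tv_sec + rel_ns / SKETCH_NSEC_PER_SEC
             + nsec / SKETCH_NSEC_PER_SEC;
    d.tv_nsec = nsec % SKETCH_NSEC_PER_SEC;
    return d;
}
```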
    
    Signed-off-by: André Almeida <andrealmeid@collabora.com>
    
    Rebased-by: Joshua Ashton <joshua@froggi.es>

commit 74c1627d5a8ea7a11916de59ba154004b6f83bb9
Author: John Wood <john.wood@gmx.com>
Date:   Sat Jun 5 17:04:05 2021 +0200

    MAINTAINERS: Add a new entry for the Brute LSM
    
    In order to maintain the code for the Brute LSM add a new entry to the
    maintainers list.
    
    Signed-off-by: John Wood <john.wood@gmx.com>

commit 9f7c2580970bd6d996d19750a9053ed126a87f2a
Author: John Wood <john.wood@gmx.com>
Date:   Sat Jun 5 17:04:04 2021 +0200

    Documentation: Add documentation for the Brute LSM
    
    Add some info detailing what the Brute LSM is, its motivation, weak
    points of existing implementations, proposed solutions, notifications,
    enabling, disabling, configuration and self-tests.
    
    Signed-off-by: John Wood <john.wood@gmx.com>

commit 687990775a7d44bcb60cee58c2be5b456f90d243
Author: John Wood <john.wood@gmx.com>
Date:   Sat Jun 5 17:04:03 2021 +0200

    selftests/brute: Add tests for the Brute LSM
    
    Add tests to check the Brute LSM functionality and cover fork/exec brute
    force attacks crossing the following privilege boundaries:
    
    1.- setuid process
    2.- privilege changes
    3.- network to local
    
    Also, as a first step, check that fork/exec brute force attacks that do
    not cross any of the privilege boundaries listed above don't trigger the
    detection and mitigation stage.
    
    Moreover, test if the userspace notification, via "waitid" system call,
    is sent when an attack is mitigated (to inform that all the offending
    tasks involved in the attack have been killed by Brute LSM).
    
    Once a brute force attack is detected, the "test" executable is marked
    as "not allowed". To start a new test, use the "rmxattr" app to
    revert this state. This way, all the tests can be run using the same
    binary.
    
    Signed-off-by: John Wood <john.wood@gmx.com>

commit 1efb7f356df2b0d07ebdec50e533ad84db9a14d0
Author: John Wood <john.wood@gmx.com>
Date:   Sat Jun 5 17:04:02 2021 +0200

    security/brute: Notify to userspace "task killed"
    
    Add a new SIGCHLD si_code to notify to userspace, using the "waitid"
    system call, that a task has been killed by Brute LSM to mitigate a
    brute force attack.
    
    This is useful to supervisors in order to decide if a process that has
    been killed to avoid an attack needs to be respawned. This way, it is
    possible to avoid the scenario where a brute force attack can be
    continued due to the respawn of a process. Although the xattr of the
    executable is accessible from userspace, in complex daemons this file
    may not be visible directly by the supervisor as it may be run through
    some wrapper. So, the waitid notification is necessary.
    
    To achieve this, use the task_struct security blob to hold a flag that
    shows when a task has been killed by Brute LSM, and also, test this flag
    in the "wait_task_zombie" and "do_notify_parent" functions.
    
    Suggested-by: Andi Kleen <ak@linux.intel.com>
    Signed-off-by: John Wood <john.wood@gmx.com>

commit ae4be957f3e282604517d34727dcd8cd721aea18
Author: John Wood <john.wood@gmx.com>
Date:   Sat Jun 5 17:04:01 2021 +0200

    security/brute: Mitigate a brute force attack
    
    When a brute force attack is detected all the offending tasks involved
    in the attack must be killed. In other words, it is necessary to kill
    all the tasks that are executing the same file that is running during
    the brute force attack.
    
    Also, to prevent the executable involved in the attack from being
    respawned by a supervisor, and thus prevent a brute force attack from
    being started again, test the "not_allowed" flag and avoid the file
    execution based on this.
    
    Signed-off-by: John Wood <john.wood@gmx.com>

commit f60567c9dcdc772321e5279d96fa1b8c3a3e50f9
Author: John Wood <john.wood@gmx.com>
Date:   Sat Jun 5 17:04:00 2021 +0200

    security/brute: Detect a brute force attack
    
    For a correct management of a fork brute force attack it is necessary to
    track all the information related to the application crashes. To do so,
    use the extended attributes (xattr) of the executable files and define a
    statistical data structure to hold all the necessary information shared
    by all the fork hierarchy processes. This info is the number of crashes,
    the last crash timestamp and the crash period's moving average.
    
    The same can be achieved using a pointer to the fork hierarchy
    statistical data held by the task_struct structure. But this has an
    important drawback: a brute force attack that happens through the execve
    system call loses the faults info since these statistics are freed when
    the fork hierarchy disappears. That method makes it impossible to
    manage this attack type, which can be successfully handled using extended
    attributes.
    
    Also, to avoid false positives during the attack detection it is
    necessary to narrow the possible cases. So, only the following scenarios
    are taken into account:
    
    1.- Launching (fork()/exec()) a setuid/setgid process repeatedly until a
        desirable memory layout is obtained (e.g. Stack Clash).
    2.- Connecting to an exec()ing network daemon (e.g. xinetd) repeatedly
        until a desirable memory layout is obtained (e.g. what CTFs do for
        simple network service).
    3.- Launching processes without exec() (e.g. Android Zygote) and
        exposing state to attack a sibling.
    4.- Connecting to a fork()ing network daemon (e.g. apache) repeatedly
        until the previously shared memory layout of all the other children
        is exposed (e.g. kind of related to HeartBleed).
    
    In each case, a privilege boundary has been crossed:
    
    Case 1: setuid/setgid process
    Case 2: network to local
    Case 3: privilege changes
    Case 4: network to local
    
    To mark that a privilege boundary has been crossed it is only necessary
    to create new stats for the executable file via the extended attribute,
    and only if it has no previous statistical data. This is done using four
    LSM hooks that cover the three privilege boundaries:
    
    setuid/setgid process --> bprm_creds_from_file hook (based on secureexec
                              flag).
    network to local -------> socket_accept hook (taking into account only
                              external connections).
    privilege changes ------> task_fix_setuid and task_fix_setgid hooks.
    
    To detect a brute force attack it is necessary that the executable file
    statistics be updated in every fatal crash and the most important data
    to update is the application crash period. To do so, use the new
    "task_fatal_signal" LSM hook added in a previous step.
    
    The application crash period must be a value that is not prone to change
    due to spurious data and follows the real crash period. So, to compute
    it, the exponential moving average (EMA) is used.
    
    Based on the updated statistics two different attacks can be handled. A
    slow brute force attack that is detected if the maximum number of faults
    per fork hierarchy is reached and a fast brute force attack that is
    detected if the application crash period falls below a certain
    threshold.
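
    The two detection rules can be sketched as a single decision function.
    This is an illustration, not the LSM's code: brute_classify() and the
    struct are hypothetical, the limit names are borrowed from the Brute LSM
    sysctl attributes, and gating the fast check on min_faults is an
    assumption about how that attribute is used.

```c
#include <stdint.h>

enum brute_verdict { BRUTE_NONE, BRUTE_SLOW_ATTACK, BRUTE_FAST_ATTACK };

struct brute_limits {
    uint64_t max_faults;             /* slow attack: fault count limit */
    uint64_t min_faults;             /* faults needed before the period is trusted (assumed) */
    uint64_t crash_period_threshold; /* fast attack: same unit as period_ema */
};

static enum brute_verdict brute_classify(uint64_t faults, uint64_t period_ema,
                                         const struct brute_limits *lim)
{
    /* Slow attack: the maximum number of faults has been reached. */
    if (faults >= lim->max_faults)
        return BRUTE_SLOW_ATTACK;

    /* Fast attack: the averaged crash period fell below the threshold. */
    if (faults >= lim->min_faults && period_ema < lim->crash_period_threshold)
        return BRUTE_FAST_ATTACK;

    return BRUTE_NONE;
}
```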
    
    Moreover, only the signals delivered by the kernel are taken into
    account with the exception of the SIGABRT signal since the latter is
    used by glibc for stack canary, malloc, etc failures, which may indicate
    that a mitigation has been triggered.
    
    Signed-off-by: John Wood <john.wood@gmx.com>

commit 2d334e739905a49eddb703ba28c63a4b37540aa1
Author: John Wood <john.wood@gmx.com>
Date:   Sat Jun 5 17:03:59 2021 +0200

    security/brute: Define a LSM and add sysctl attributes
    
    Add a new Kconfig file to define a menu entry under "Security options"
    to enable the "Fork brute force attack detection and mitigation"
    feature.
    
    The detection of a brute force attack can be based on the number of
    faults per application and its crash rate.
    
    There are two types of brute force attacks that can be detected. The
    first one is a slow brute force attack that is detected if the maximum
    number of faults per fork hierarchy is reached. The second type is a
    fast brute force attack that is detected if the application crash period
    falls below a certain threshold.
    
    The application crash period must be a value that is not prone to change
    due to spurious data and follows the real crash period. So, to compute
    it, the exponential moving average (EMA) will be used.
    
    This kind of average defines a weight (between 0 and 1) for the new
    value to add and applies the remainder of the weight to the current
    average value. This way, some spurious data will not excessively modify
    the average and only if the new values are persistent, the moving
    average will tend towards them.
    
    Mathematically the application crash period's EMA can be expressed as
    follows:
    
    period_ema = period * weight + period_ema * (1 - weight)
    
    Moreover, it is important to note that a minimum number of faults is
    needed to guarantee a trend in the crash period when the EMA is used.
    
    So, based on all the previous information, define an LSM with five sysctl
    attributes that will be used to fine tune the attack detection.
    
    ema_weight_numerator
    ema_weight_denominator
    max_faults
    min_faults
    crash_period_threshold
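
    With the weight expressed as ema_weight_numerator / ema_weight_denominator,
    the EMA formula above can be evaluated in integer arithmetic. A minimal
    sketch, assuming 0 < num <= den and ignoring overflow of the products;
    update_period_ema() is a hypothetical name:

```c
#include <stdint.h>

/* period_ema = period * weight + period_ema * (1 - weight), weight = num / den */
static uint64_t update_period_ema(uint64_t period_ema, uint64_t period,
                                  uint64_t num, uint64_t den)
{
    /* Both terms share the denominator, so divide once at the end. */
    return (period * num + period_ema * (den - num)) / den;
}
```

    For example, with a weight of 7/10, a current average of 1000 and a new
    period of 2000, the result is (2000 * 7 + 1000 * 3) / 10 = 1700.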
    
    This patch is a preliminary step toward fine tuning the attack
    detection.
    
    Signed-off-by: John Wood <john.wood@gmx.com>

commit 6542176ea3074d932ff243f1e2c644ba9de76d24
Author: John Wood <john.wood@gmx.com>
Date:   Sat Jun 5 17:03:58 2021 +0200

    security: Add LSM hook at the point where a task gets a fatal signal
    
    Add a security hook that allows an LSM to be notified when a task gets a
    fatal signal. This patch is a preliminary step toward computing the
    task crash period in the "brute" LSM (Linux security module to detect
    and mitigate fork brute force attacks against vulnerable userspace
    processes).
    
    Signed-off-by: John Wood <john.wood@gmx.com>
    Reviewed-by: Kees Cook <keescook@chromium.org>

commit 30671056e8c33ee0e132bb7a2bf69a532d86f376
Author: Anthony Ruhier <aruhier@mailbox.org>
Date:   Fri May 14 22:02:11 2021 +0200

    fs/ntfs3: Fix unsupported flags by clang (#146)

commit 88e07a562001c27846d36111108a7f8a147700be
Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Fri Apr 2 18:53:47 2021 +0300

    fs/ntfs3: Add MAINTAINERS
    
    This adds MAINTAINERS
    
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

commit 7c9d02befc039313db68b1519f1d141261f6beeb
Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Fri Apr 2 18:53:46 2021 +0300

    fs/ntfs3: Add NTFS3 in fs/Kconfig and fs/Makefile
    
    This adds NTFS3 in fs/Kconfig and fs/Makefile
    
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

commit b89232db5a07a3e8b379e3ddf14299cc4d84f514
Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Fri Apr 2 18:53:45 2021 +0300

    fs/ntfs3: Add Kconfig, Makefile and doc
    
    This adds Kconfig, Makefile and doc
    
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

commit 266408faa802c2653c5391212486d86dc5881cf0
Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Fri Apr 2 18:53:44 2021 +0300

    fs/ntfs3: Add NTFS journal
    
    This adds NTFS journal
    
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

commit af78fb43d06f0759313f9d63dfafb456e8e8e95e
Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Fri Apr 2 18:53:43 2021 +0300

    fs/ntfs3: Add compression
    
    This patch adds different types of NTFS-applicable compressions:
    - lznt
    - lzx
    - xpress
    The latter two (lzx, xpress) implement the Windows Compact OS feature and
    were taken from the ntfs-3g system compression plugin authored by Eric Biggers
    (https://github.com/ebiggers/ntfs-3g-system-compression),
    then ported to ntfs3 and adapted to the Linux kernel environment.
    
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

commit e52b0644215be33c465d0aad404ee5ff1ad7053f
Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Fri Apr 2 18:53:42 2021 +0300

    fs/ntfs3: Add attrib operations
    
    This adds attrib operations
    
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

commit 2eca3e5e19d8033857903d42d95086b44b1f16aa
Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Fri Apr 2 18:53:41 2021 +0300

    fs/ntfs3: Add file operations and implementation
    
    This adds file operations and implementation
    
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

commit 885829d6df27fc704a6061ea49697cd41130b0de
Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Fri Apr 2 18:53:40 2021 +0300

    fs/ntfs3: Add bitmap
    
    This adds bitmap
    
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

commit 0737d42306861ebee6c3ae2001258d05adb15a67
Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Fri Apr 2 18:53:39 2021 +0300

    fs/ntfs3: Add initialization of super block
    
    This adds initialization of super block
    
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

commit 77a12f7fef92c974286636d6d5cced57f56bdbc3
Author: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Date:   Fri Apr 2 18:53:38 2021 +0300

    fs/ntfs3: Add headers and misc files
    
    This adds headers and misc files
    
    Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>

commit c3a3081d8fa6bdf83f910e4fb30d67c9f21d5b4b
Author: Alexey Avramov <hakavlad@inbox.lv>
Date:   Wed May 12 15:36:56 2021 +0000

    mm/vmscan: add sysctl knobs for protecting clean cache
    
    The patch provides sysctl knobs for protecting the specified amount of
    clean file pages (CFP) under memory pressure.
    
    The kernel does not have a mechanism for selectively protecting clean file
    pages. A certain amount of the CFP is required by the userspace for normal
    operation. First of all, you need a cache of shared libraries and
    executable files. If the volume of the CFP cache falls below a certain
    level, thrashing and even livelock occur.
    
    Protection of CFP may be used to prevent thrashing and reduce I/O under
    memory pressure. Hard protection of CFP may be used to avoid high latency
    and prevent livelock in near-OOM conditions. The patch provides sysctl
    knobs for protecting the specified amount of clean file cache under memory
    pressure.
    
    The vm.clean_low_kbytes sysctl knob provides *best-effort* protection of
    CFP. The CFP on the current node won't be reclaimed under memory pressure
    when their amount is below vm.clean_low_kbytes *unless* we threaten to OOM
    or have no free swap space or vm.swappiness=0. Setting it to a high value
    may result in an early eviction of anonymous pages into the swap space by
    attempting to hold the protected amount of clean file pages in memory. The
    default value is defined by CONFIG_CLEAN_LOW_KBYTES (suggested 150000 in
    Kconfig).
    
    The vm.clean_min_kbytes sysctl knob provides *hard* protection of CFP. The
    CFP on the current node won't be reclaimed under memory pressure when their
    amount is below vm.clean_min_kbytes. Setting it to a high value may result
    in an early out-of-memory condition due to the inability to reclaim the
    protected amount of CFP when other types of pages cannot be reclaimed. The
    default value is defined by CONFIG_CLEAN_MIN_KBYTES (suggested 0 in
    Kconfig).
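
    The two protection levels can be summarized as a small decision function.
    This is a userspace model of the rules as worded above, not the vmscan
    code; struct clean_state and clean_file_reclaimable() are hypothetical
    names.

```c
#include <stdbool.h>

struct clean_state {
    unsigned long clean_file_kb; /* clean file pages on the node, in kilobytes */
    bool near_oom;               /* "we threaten to OOM" */
    bool free_swap;              /* free swap space is available */
    int swappiness;              /* vm.swappiness */
};

/* Return true if clean file pages may be reclaimed under the described rules. */
static bool clean_file_reclaimable(const struct clean_state *s,
                                   unsigned long clean_low_kbytes,
                                   unsigned long clean_min_kbytes)
{
    /* Hard protection: no exceptions below vm.clean_min_kbytes. */
    if (s->clean_file_kb < clean_min_kbytes)
        return false;

    /* Best-effort protection: dropped in the three listed cases. */
    if (s->clean_file_kb < clean_low_kbytes)
        return s->near_oom || !s->free_swap || s->swappiness == 0;

    return true;
}
```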
    
    Added compatibility with Multigenerational LRU Framework patchset v2.
    
    Signed-off-by: Alexey Avramov <hakavlad@inbox.lv>
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 13254efc25a9429fa5541ce9e4c0878de72f398a
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:55 2021 -0600

    mm: multigenerational lru: documentation
    
    Add Documentation/vm/multigen_lru.rst.
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit 7585161cf89f0d90e808ce10667efdbd7762f76c
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:54 2021 -0600

    mm: multigenerational lru: Kconfig
    
    Add configuration options for the multigenerational lru.
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit 5f940f698e98d72a16d7c7ab9bb21f33b756c67e
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:53 2021 -0600

    mm: multigenerational lru: user interface
    
    Add a sysfs file /sys/kernel/mm/lru_gen/enabled to enable and disable
    the multigenerational lru at runtime.
    
    Add a sysfs file /sys/kernel/mm/lru_gen/spread to optionally spread
    pages out across more than three generations. More generations make
    the background aging more aggressive.
    
    Add a debugfs file /sys/kernel/debug/lru_gen to monitor the
    multigenerational lru and trigger the aging and the eviction. This
    file has the following output:
      memcg  memcg_id  memcg_path
        node  node_id
          min_gen  birth_time  anon_size  file_size
          ...
          max_gen  birth_time  anon_size  file_size
    
    Given a memcg and a node, "min_gen" is the oldest generation (number)
    and "max_gen" is the youngest. Birth time is in milliseconds. The
    sizes of anon and file types are in pages.
    
    This file takes the following input:
      + memcg_id node_id gen [swappiness]
      - memcg_id node_id gen [swappiness] [nr_to_reclaim]
    
    The first command line accounts referenced pages to generation
    "max_gen" and creates the next generation "max_gen"+1. In this case,
    "gen" should be equal to "max_gen". A swap file and a non-zero
    "swappiness" are required to scan anon type. If swapping is not
    desired, set vm.swappiness to 0. The second command line evicts
    generations less than or equal to "gen". In this case, "gen" should be
    less than "max_gen"-1 as "max_gen" and "max_gen"-1 are active
    generations and therefore protected from the eviction. Use
    "nr_to_reclaim" to limit the number of pages to evict. Multiple
    command lines are supported, as is concatenation with the delimiters
    "," and ";".
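
    As a rough illustration of the input format above, the following
    Python sketch builds "+" and "-" command lines and applies the two
    constraints on "gen" stated in the text. The helper names and the
    example ids are hypothetical, and treating "nr_to_reclaim" as
    requiring "swappiness" is an assumption drawn from the positional
    syntax.

```python
from typing import Optional


def aging_cmd(memcg_id: int, node_id: int, max_gen: int,
              swappiness: Optional[int] = None) -> str:
    """Build a "+" command line; the text says gen should equal max_gen."""
    fields = ["+", memcg_id, node_id, max_gen]
    if swappiness is not None:
        fields.append(swappiness)
    return " ".join(str(f) for f in fields)


def eviction_cmd(memcg_id: int, node_id: int, gen: int, max_gen: int,
                 swappiness: Optional[int] = None,
                 nr_to_reclaim: Optional[int] = None) -> str:
    """Build a "-" command line that evicts generations <= gen."""
    if gen >= max_gen - 1:
        raise ValueError("max_gen and max_gen-1 are protected from eviction")
    # Assumption: the optional fields are positional, so nr_to_reclaim
    # can only be given after swappiness.
    if nr_to_reclaim is not None and swappiness is None:
        raise ValueError("nr_to_reclaim follows swappiness, so both are needed")
    fields = ["-", memcg_id, node_id, gen]
    if swappiness is not None:
        fields.append(swappiness)
    if nr_to_reclaim is not None:
        fields.append(nr_to_reclaim)
    return " ".join(str(f) for f in fields)


# Hypothetical ids: memcg 1, node 0, max_gen 7 before the aging (8 after).
batch = ";".join([aging_cmd(1, 0, 7), eviction_cmd(1, 0, 4, 8, 0, 1000)])
```

    Writing the resulting string to /sys/kernel/debug/lru_gen needs root
    and a kernel carrying this patch set, so that step is left out.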
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit 48f06a6457a3a4862ae01540dab8b4f11977f5e4
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:52 2021 -0600

    mm: multigenerational lru: eviction
    
    The eviction consumes old generations. Given an lruvec, the eviction
    scans the pages on the per-zone lists indexed by either of min_seq[2].
    It first tries to select a type based on the values of min_seq[2].
    When anon and file types are both available from the same generation,
    it selects the one that has a lower refault rate.
    
    During a scan, the eviction sorts pages according to their new
    generation numbers, if the aging has found them referenced. It also
    moves pages from the tiers that have higher refault rates than tier 0
    to the next generation. When it finds all the per-zone lists of a
    selected type are empty, the eviction increments min_seq[2] indexed by
    this selected type.
    
    With the aging and the eviction in place, we can build page reclaim in
    a straightforward manner:
      1) In order to reduce the latency, direct reclaim only invokes the
      aging when both of min_seq[2] reach max_seq-1; otherwise it invokes
      the eviction.
      2) In order to avoid the aging in the direct reclaim path, kswapd
      does the background aging. It invokes the aging when either of
      min_seq[2] reaches max_seq-1; otherwise it invokes the eviction.
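
    The two rules above differ only in "both" versus "either". A toy
    Python sketch of that decision follows; the function names are
    hypothetical and ">=" is assumed as the reading of "reaches".

```python
from typing import Sequence


def aging_is_due(min_seq: Sequence[int], max_seq: int, background: bool) -> bool:
    """min_seq holds the oldest generation numbers for anon and file."""
    reached = [seq >= max_seq - 1 for seq in min_seq]
    # kswapd (background) ages as soon as either type catches up, so that
    # direct reclaim rarely has to; direct reclaim waits for both.
    return any(reached) if background else all(reached)


def reclaim_step(min_seq: Sequence[int], max_seq: int, background: bool) -> str:
    """Return which component a reclaim pass would invoke."""
    return "aging" if aging_is_due(min_seq, max_seq, background) else "eviction"
```

    For example, with min_seq = (3, 5) and max_seq = 6, this sketch has
    kswapd age while direct reclaim still evicts.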
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit 2f9e3a866620f6f44e127db86fafbb2ad01a1c5e
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:51 2021 -0600

    mm: multigenerational lru: aging
    
    The aging produces young generations. Given an lruvec, the aging scans
    page tables for referenced pages of this lruvec. Upon finding one, the
    aging updates its generation number to max_seq. After each round of
    scan, the aging increments max_seq. The aging is due when both of
    min_seq[2] reach max_seq-1, assuming both anon and file types are
    reclaimable.
    
    The aging uses the following optimizations when scanning page tables:
      1) It will not scan page tables from processes that have been
      sleeping since the last scan.
      2) It will not scan PTE tables under non-leaf PMD entries that do
      not have the accessed bit set, when
      CONFIG_HAVE_ARCH_PARENT_PMD_YOUNG=y.
      3) It will not zigzag between the PGD table and the same PMD or PTE
      table spanning multiple VMAs. In other words, it finishes all the
      VMAs within the range of the same PMD or PTE table before it returns
      to the PGD table. This optimizes workloads that have large numbers
      of tiny VMAs, especially when CONFIG_PGTABLE_LEVELS=5.
    
    The aging also takes advantage of the spatial locality: pages mapped
    around a referenced PTE may also have been referenced. If the rmap
    finds the PTE mapping a page under reclaim referenced, it will call a
    new function lru_gen_scan_around() to scan the vicinity of this PTE.
    And for each additional PTE found referenced, lru_gen_scan_around()
    will update the generation number of the page mapped by this PTE.
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit f537af6ce87d24e67618eb0496545edf41963843
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:50 2021 -0600

    mm: multigenerational lru: mm_struct list
    
    In order to scan page tables, we add an infrastructure to maintain
    either a system-wide mm_struct list or per-memcg mm_struct lists, and
    track whether an mm_struct is being used or has been used since the
    last scan.
    
    Multiple threads can concurrently work on the same mm_struct list, and
    each of them will be given a different mm_struct belonging to a
    process that has been scheduled since the last scan.
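
    A user-space analogy of this hand-out scheme, sketched in Python
    under loose assumptions: the class and method names are
    hypothetical, strings stand in for mm_structs, and a single lock
    replaces whatever synchronization the kernel code really uses.

```python
import threading
from typing import Iterable, List, Optional, Set


class MmList:
    """Hands each walker a different entry that was used since the last scan."""

    def __init__(self, mms: Iterable[str]) -> None:
        self._lock = threading.Lock()
        self._mms: List[str] = list(mms)
        self._used: Set[str] = set()
        self._cursor = 0

    def mark_used(self, mm: str) -> None:
        """Record that a process owning this entry has been scheduled."""
        with self._lock:
            self._used.add(mm)

    def start_scan(self) -> None:
        """Begin a new pass over the list."""
        with self._lock:
            self._cursor = 0

    def next_mm(self) -> Optional[str]:
        """Return the next used entry, or None when the pass is finished."""
        with self._lock:
            while self._cursor < len(self._mms):
                mm = self._mms[self._cursor]
                self._cursor += 1
                if mm in self._used:
                    self._used.discard(mm)  # skip it next time unless used again
                    return mm
            return None
```

    Because the cursor only moves under the lock, two threads calling
    next_mm() in this sketch never receive the same entry in one pass.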
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit 67268761c200722a6f5b8e974e2c69526246f79d
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:49 2021 -0600

    mm: multigenerational lru: activation
    
    For pages accessed multiple times via file descriptors, instead of
    activating them upon the second access, we activate them based on the
    refault rates of their tiers. Each generation contains at most
    MAX_NR_TIERS tiers, and they require additional MAX_NR_TIERS-2 bits in
    page->flags. Pages accessed N times via file descriptors belong to
    tier order_base_2(N). Tier 0 is the base tier and it contains pages
    read ahead, accessed once via file descriptors and accessed only via
    page tables. Pages from the base tier are evicted regardless of the
    refault rate. Pages from upper tiers that have higher refault rates
    than the base tier will be moved to the next generation. A feedback
    loop modeled after the PID controller monitors refault rates across
    all tiers and decides when to activate pages from which upper tiers
    in the reclaim path. The advantages of this model are:
      1) It has a negligible cost in the buffered IO access path because
      activations are done optionally in the reclaim path.
      2) It takes mapped pages into account and avoids overprotecting
      pages accessed multiple times via file descriptors.
      3) More tiers offer better protection to pages accessed more than
      twice when workloads doing intensive buffered IO are under memory
      pressure.
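
    To make the tier arithmetic concrete, here is a small Python sketch.
    order_base_2() is written to match the kernel macro of that name as
    I understand it (ceil(log2(n)), with 0 for n <= 1). The second
    helper is a deliberately simplified stand-in for the feedback loop:
    a plain comparison against the base tier, not a PID controller.

```python
from typing import List, Sequence


def order_base_2(n: int) -> int:
    """ceil(log2(n)) for n > 1, and 0 for n <= 1."""
    return (n - 1).bit_length() if n > 1 else 0


def tier_of(fd_accesses: int) -> int:
    """Tier of a page accessed fd_accesses times via file descriptors."""
    return order_base_2(fd_accesses)


def tiers_to_move(refault_rates: Sequence[float]) -> List[int]:
    """Upper tiers whose refault rate exceeds that of tier 0.

    refault_rates[i] is a hypothetical measured rate for tier i.
    """
    base = refault_rates[0]
    return [t for t, rate in enumerate(refault_rates) if t > 0 and rate > base]
```

    Under these definitions, pages accessed 3 or 4 times land in tier 2,
    and 5 to 8 times in tier 3.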
    
    For pages mapped upon page faults, the accessed bit is set during the
    initial faults. Ideally we add them to the per-zone lists indexed by
    max_seq, i.e., the youngest generation, so that eviction will not
    consider them before the aging has scanned them. For anon pages not in
    swap cache, this can be done easily in the page fault path: we rename
    lru_cache_add_inactive_or_unevictable() to lru_cache_add_page_vma()
    and add a new parameter, which is set to true for pages mapped upon
    page faults. For pages in page cache or swap cache, we cannot
    differentiate the page fault path from the read ahead path at the time
    we call lru_cache_add(). So we add them to the per-zone lists indexed by
    min_seq, i.e., the oldest generation, for now.
    
    Finally, we need to make sure deactivation works when the
    multigenerational lru is enabled. We cannot use PageActive() because
    it is not set on pages from active generations, in order to spare the
    aging the trouble of clearing it when active generations become
    inactive. So we deactivate pages unconditionally since deactivation is
    not a hot code path worth additional optimizations.
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit 7c0e13b4718581fff1252094a4e97690d07904c3
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:48 2021 -0600

    mm: multigenerational lru: groundwork
    
    For each lruvec, evictable pages are divided into multiple
    generations. The youngest generation number is stored in
    lrugen->max_seq for both anon and file types as they are aged on an
    equal footing. The oldest generation numbers are stored in
    lrugen->min_seq[2] separately for anon and file types as clean file
    pages can be evicted regardless of may_swap or may_writepage. These
    three variables are monotonically increasing. Generation numbers are
    truncated into order_base_2(MAX_NR_GENS+1) bits in order to fit into
    page->flags. The sliding window technique is used to prevent truncated
    generation numbers from overlapping. Each truncated generation number
    is an index to
    lrugen->lists[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]. Evictable
    pages are added to the per-zone lists indexed by lrugen->max_seq or
    lrugen->min_seq[2] (modulo MAX_NR_GENS), depending on their types.
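
    The truncation and the sliding window can be sketched in a few
    lines of Python. MAX_NR_GENS = 4 is an assumed value chosen for
    illustration, and the helper names are hypothetical.

```python
from typing import Sequence

MAX_NR_GENS = 4  # assumption for this sketch, not taken from the text


def gen_bits(max_nr_gens: int = MAX_NR_GENS) -> int:
    """Bits needed in page->flags: order_base_2(MAX_NR_GENS + 1).

    For n >= 1, ceil(log2(n + 1)) equals n.bit_length().
    """
    return max_nr_gens.bit_length()


def gen_index(seq: int, max_nr_gens: int = MAX_NR_GENS) -> int:
    """Truncated generation number, used as the first index of lists[]."""
    return seq % max_nr_gens


def window_is_valid(min_seq: Sequence[int], max_seq: int,
                    max_nr_gens: int = MAX_NR_GENS) -> bool:
    """True if no two live generations share a truncated index."""
    return all(0 <= max_seq - seq < max_nr_gens for seq in min_seq)
```

    With these values, min_seq = (5, 7) and max_seq = 9 would be
    invalid: generations 5 and 9 both truncate to index 1.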
    
    Each generation is then divided into multiple tiers. Tiers represent
    levels of usage from file descriptors only. Pages accessed N times via
    file descriptors belong to tier order_base_2(N). Each generation
    contains at most MAX_NR_TIERS tiers, and they require additional
    MAX_NR_TIERS-2 bits in page->flags. In contrast to moving across
    generations which requires the lru lock for the list operations,
    moving across tiers only involves an atomic operation on page->flags
    and therefore has a negligible cost. A feedback loop modeled after the
    PID controller monitors the refault rates across all tiers and decides
    when to activate pages from which tiers in the reclaim path.
    
    The framework comprises two conceptually independent components: the
    aging and the eviction, which can be invoked separately from user
    space for the purpose of working set estimation and proactive reclaim.
    
    The aging produces young generations. Given an lruvec, the aging scans
    page tables for referenced pages of this lruvec. Upon finding one, the
    aging updates its generation number to max_seq. After each round of
    scan, the aging increments max_seq. The aging is due when both of
    min_seq[2] reach max_seq-1, assuming both anon and file types are
    reclaimable.
    
    The eviction consumes old generations. Given an lruvec, the eviction
    scans the pages on the per-zone lists indexed by either of min_seq[2].
    It tries to select a type based on the values of min_seq[2] and
    swappiness. During a scan, the eviction sorts pages according to their
    new generation numbers, if the aging has found them referenced. When
    it finds all the per-zone lists of a selected type are empty, the
    eviction increments min_seq[2] indexed by this selected type.
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit d4b7b401da2714a7adb9622ad9540334069aa1df
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:47 2021 -0600

    mm/workingset.c: refactor pack_shadow() and unpack_shadow()
    
    This patch moves the bucket order and PageWorkingset() out of
    pack_shadow() and unpack_shadow(). It has no merits on its own but
    makes the upcoming changes to mm/workingset.c less diffy.
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit 1a7d050e005e0a8988f697316bb3312523778d30
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:46 2021 -0600

    mm/vmscan.c: refactor shrink_node()
    
    Heuristics that determine scan balance between anon and file LRUs are
    rather independent. Move them into a separate function to improve
    readability.
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit 1ed1d521be2b8db853688c2006c9ca9f8a438ef9
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:45 2021 -0600

    mm, x86: support the access bit on non-leaf PMD entries
    
    Some architectures support the accessed bit on non-leaf PMD entries
    (parents) in addition to leaf PTE entries (children) where pages are
    mapped, e.g., x86_64 sets the accessed bit on a parent when using it
    as part of linear-address translation [1]. Page table walkers that are
    interested in the accessed bit on children can take advantage of this:
    they do not need to search the children when the accessed bit is not
    set on a parent, given that they have previously cleared the accessed
    bit on this parent.
    
    [1]: Intel 64 and IA-32 Architectures Software Developer's Manual
         Volume 3 (October 2019), section 4.8
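
    The saving can be pictured with a toy walker in Python. Page tables
    are modeled as plain dicts, which is purely an illustration device;
    the field names are hypothetical.

```python
from typing import Dict, List, Tuple


def walk_referenced(pmds: List[Dict]) -> Tuple[List[str], int]:
    """Collect referenced pages, skipping children of unaccessed parents.

    Each parent is {"accessed": bool, "ptes": [{"accessed": bool, "page": str}]}.
    Returns the referenced pages and the number of children examined.
    """
    referenced: List[str] = []
    ptes_checked = 0
    for pmd in pmds:
        if not pmd["accessed"]:
            continue  # nothing below was touched since the bit was cleared
        pmd["accessed"] = False  # clear so the next walk can rely on it
        for pte in pmd["ptes"]:
            ptes_checked += 1
            if pte["accessed"]:
                pte["accessed"] = False
                referenced.append(pte["page"])
    return referenced, ptes_checked
```

    The skip is only sound because the walker itself cleared the parent
    bit on the previous pass, as the commit message notes.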
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit ca9c490417125ab19628320ef6edb5c52e95eff4
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:44 2021 -0600

    include/linux/cgroup.h: export cgroup_mutex
    
    cgroup_mutex is needed to synchronize with memcg creations.
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit f65854fc7b78132edab4bb0311941f3bd91a1ced
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:43 2021 -0600

    include/linux/nodemask.h: define next_memory_node() if !CONFIG_NUMA
    
    Currently next_memory_node only exists when CONFIG_NUMA=y. This patch
    adds the macro for !CONFIG_NUMA.
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit f6a5ac2f99019609c46bfbd6ec097d4867e8c377
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu May 20 00:53:42 2021 -0600

    include/linux/memcontrol.h: do not warn in page_memcg_rcu() if !CONFIG_MEMCG
    
    page_memcg_rcu() warns on !rcu_read_lock_held() regardless of
    CONFIG_MEMCG. The following legit code trips the warning when
    !CONFIG_MEMCG, since lock_page_memcg() and unlock_page_memcg() are
    empty for this config.
    
      memcg = lock_page_memcg(page1)
        (rcu_read_lock() if CONFIG_MEMCG=y)
    
      do something to page1
    
      if (page_memcg_rcu(page2) == memcg)
        do something to page2 too as it cannot be migrated away from the
        memcg either.
    
      unlock_page_memcg(page1)
        (rcu_read_unlock() if CONFIG_MEMCG=y)
    
    Locking/unlocking rcu consistently for both configs is rigorous but it
    also forces unnecessary locking upon users who have no interest in
    CONFIG_MEMCG.
    
    This patch removes the assertion for !CONFIG_MEMCG, because
    page_memcg_rcu() has a few callers and there are no concerns regarding
    their correctness at the moment.
    
    Signed-off-by: Yu Zhao <yuzhao@google.com>
    Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>

commit 32cbb0002dfa001849108ad66c6def4fece6fbf0
Author: graysky <graysky@archlinux.us>
Date:   Sun Jun 6 09:41:36 2021 -0400

    x86/kconfig: more uarches for kernel 5.8+
    
    FEATURES
    This patch adds additional CPU options to the Linux kernel accessible under:
     Processor type and features  --->
      Processor family --->
    
    With the release of gcc 11.1 and clang 12.0, several generic 64-bit levels are
    offered which are good for supported Intel or AMD CPUs:
    • x86-64-v2
    • x86-64-v3
    • x86-64-v4
    
    Users of glibc 2.33 and above can see which level is supported by current
    hardware by running:
      /lib/ld-linux-x86-64.so.2 --help | grep supported
    
    Alternatively, compare the flags from /proc/cpuinfo to this list.[1]
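
    For the /proc/cpuinfo route, a Python sketch along these lines can
    do the comparison. The flag sets below reflect my understanding of
    the psABI levels and of the names Linux prints (e.g. "pni" for SSE3,
    "abm" for LZCNT), so treat them as assumptions and check them
    against the list in [1].

```python
LEVELS = [
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2",
                   "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm",
                   "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq",
                   "avx512vl"}),
]


def x86_64_level(flags: str) -> str:
    """Highest level whose flags, and those of all lower levels, are present."""
    have = set(flags.split())
    level = "x86-64"  # baseline
    for name, required in LEVELS:
        if not required <= have:
            break
        level = name
    return level


def read_cpu_flags(path: str = "/proc/cpuinfo") -> str:
    """Return the flags of the first CPU listed, or "" if none are found."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line.split(":", 1)[1].strip()
    return ""
```

    On a Linux machine, x86_64_level(read_cpu_flags()) gives the level
    according to this sketch; the ld.so command above remains the more
    direct check.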
    
    CPU-specific microarchitectures include:
    • AMD Improved K8-family
    • AMD K10-family
    • AMD Family 10h (Barcelona)
    • AMD Family 14h (Bobcat)
    • AMD Family 16h (Jaguar)
    • AMD Family 15h (Bulldozer)
    • AMD Family 15h (Piledriver)
    • AMD Family 15h (Steamroller)
    • AMD Family 15h (Excavator)
    • AMD Family 17h (Zen)
    • AMD Family 17h (Zen 2)
    • AMD Family 19h (Zen 3)†
    • Intel Silvermont low-power processors
    • Intel Goldmont low-power processors (Apollo Lake and Denverton)
    • Intel Goldmont Plus low-power processors (Gemini Lake)
    • Intel 1st Gen Core i3/i5/i7 (Nehalem)
    • Intel 1.5 Gen Core i3/i5/i7 (Westmere)
    • Intel 2nd Gen Core i3/i5/i7 (Sandybridge)
    • Intel 3rd Gen Core i3/i5/i7 (Ivybridge)
    • Intel 4th Gen Core i3/i5/i7 (Haswell)
    • Intel 5th Gen Core i3/i5/i7 (Broadwell)
    • Intel 6th Gen Core i3/i5/i7 (Skylake)
    • Intel 6th Gen Core i7/i9 (Skylake X)
    • Intel 8th Gen Core i3/i5/i7 (Cannon Lake)
    • Intel 10th Gen Core i7/i9 (Ice Lake)
    • Intel Xeon (Cascade Lake)
    • Intel Xeon (Cooper Lake)*
    • Intel 3rd Gen 10nm++ i3/i5/i7/i9-family (Tiger Lake)*
    • Intel 3rd Gen 10nm++ Xeon (Sapphire Rapids)‡
    • Intel 11th Gen i3/i5/i7/i9-family (Rocket Lake)‡
    • Intel 12th Gen i3/i5/i7/i9-family (Alder Lake)‡
    
    Notes: If not otherwise noted, gcc >=9.1 is required for support.
           *Requires gcc >=10.1 or clang >=10.0
           †Requires gcc >=10.3 or clang >=12.0
           ‡Requires gcc >=11.1 or clang >=12.0
    
    It also offers to compile passing the 'native' option which, "selects the CPU
    to generate code for at compilation time by determining the processor type of
    the compiling machine. Using -march=native enables all instruction subsets
    supported by the local machine and will produce code optimized for the local
    machine under the constraints of the selected instruction set."[2]
    
    Users of Intel CPUs should select the 'Intel-Native' option and users of AMD
    CPUs should select the 'AMD-Native' option.
    
    MINOR NOTES RELATING TO INTEL ATOM PROCESSORS
    This patch also changes -march=atom to -march=bonnell in accordance with the
    gcc v4.9 changes. Upstream is using the deprecated -march=atom flag when I
    believe it should use the newer -march=bonnell flag for atom processors.[3]
    
    It is not recommended to compile on Atom-CPUs with the 'native' option.[4] The
    recommendation is to use the 'atom' option instead.
    
    BENEFITS
    Small but real speed increases are measurable using a make endpoint comparing
    a generic kernel to one built with one of the respective microarchs.
    
    See the following experimental evidence supporting this statement:
    https://github.com/graysky2/kernel_gcc_patch
    
    REQUIREMENTS
    linux version >=5.8
    gcc version >=9.0 or clang version >=9.0
    
    ACKNOWLEDGMENTS
    This patch builds on the seminal work by Jeroen.[5]
    
    REFERENCES
    1.  https://gitlab.com/x86-psABIs/x86-64-ABI/-/commit/77566eb03bc6a326811cb7e9
    2.  https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html#index-x86-Options
    3.  https://bugzilla.kernel.org/show_bug.cgi?id=77461
    4.  https://github.com/graysky2/kernel_gcc_patch/issues/15
    5.  http://www.linuxforge.net/docs/linux/linux-gcc.php
    
    Signed-off-by: graysky <graysky@archlinux.us>

commit e0b39748388917a4fc1bc0f4b2c7a27fbb728d69
Author: Arjan van de Ven <arjan@linux.intel.com>
Date:   Wed May 17 01:52:11 2017 +0000

    init: wait for partition and retry scan
    
    Because Clear Linux boots quickly, the device may not be ready when
    the mounting code is reached. Retry the device scan every 0.5 sec
    for at least 40 sec and synchronize the async task.
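
    The retry policy, every 0.5 sec for at least 40 sec, can be sketched
    in user-space Python as below. This only illustrates the timing; the
    kernel change itself is in C, and the function and parameter names
    here are hypothetical.

```python
import time
from typing import Callable


def wait_for_device(probe: Callable[[], bool],
                    interval: float = 0.5,
                    timeout: float = 40.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call probe() until it succeeds or the timeout has elapsed."""
    attempts = int(timeout / interval)
    for _ in range(attempts):
        if probe():
            return True
        sleep(interval)
    # One last try after the full timeout has been spent sleeping.
    return probe()
```

    Passing sleep as a parameter keeps the loop testable without
    actually waiting 40 seconds.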
    
    Signed-off-by: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>

commit 098278378f29876afc66d59a59ffe30306a663b1
Author: Arjan van de Ven <arjan@linux.intel.com>
Date:   Thu Jun 2 23:36:32 2016 -0500

    drivers: initialize ata before graphics
    
    ATA init is the long pole in the boot process, and it's asynchronous.
    Move the graphics init after it so that ATA and graphics initialize
    in parallel.

commit fe8e758cd20aef4208c04f8432e70d3ea4b9b966
Author: Arjan van de Ven <arjan@linux.intel.com>
Date:   Sun Feb 18 23:35:41 2018 +0000

    locking: rwsem: spin faster
    
    tweak rwsem owner spinning a bit
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 22a10a41ec25856026fa786058fb114c89f7f141
Author: William Douglas <william.douglas@intel.com>
Date:   Wed Jun 20 17:23:21 2018 +0000

    firmware: Enable stateless firmware loading
    
    Prefer the order of specific version before generic and /etc before
    /lib to enable the user to give specific overrides for generic
    firmware and distribution firmware.

commit bbf1b8101b5c83dc86a3fc43a5e81d428bc8a461
Author: Arjan van de Ven <arjan@linux.intel.com>
Date:   Sun Sep 22 11:12:35 2019 -0300

    intel_rapl: Silence rapl trace debug

commit 2fbd4c73b5c2a32411b3176bc958b7af751817dc
Author: Christian Brauner <christian@brauner.io>
Date:   Wed Jan 23 21:54:23 2019 +0100

    SAUCE: binder: give binder_alloc its own debug mask file
    
    Currently binder.c and binder_alloc.c both register the
    /sys/module/binder_linux/parameters/debug_mask file, which leads to conflicts
    in sysfs. This commit gives binder_alloc.c its own
    /sys/module/binder_linux/parameters/alloc_debug_mask file.
    
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

commit 90709f29669b5d8bd05949b4ef0b6614ea4453cf
Author: Christian Brauner <christian@brauner.io>
Date:   Wed Jan 16 23:13:25 2019 +0100

    SAUCE: binder: turn into module
    
    The Android binder driver needs to become a module for the sake of shipping
    Anbox. To do this we need to export the following functions since binder is
    currently still using them:
    
    - security_binder_set_context_mgr()
    - security_binder_transaction()
    - security_binder_transfer_binder()
    - security_binder_transfer_file()
    - can_nice()
    - __close_fd_get_file()
    - mmput_async()
    - task_work_add()
    - map_kernel_range_noflush()
    - get_vm_area()
    - zap_page_range()
    - put_ipc_ns()
    - get_ipc_ns_exported()
    - show_init_ipc_ns()
    
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    [ saf: fix additional reference to init_ipc_ns from 5.0-rc6 ]
    Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

commit 4422e944facf99f68f57885af459c818f372a475
Author: Christian Brauner <christian@brauner.io>
Date:   Wed Jun 20 19:21:37 2018 +0200

    SAUCE: ashmem: turn into module
    
    The Android ashmem driver needs to become a module for the sake of Anbox.
    To do this we need to export shmem_zero_setup() since ashmem is currently
    using it.
    Note, the abomination that is the Android ashmem driver will go away in the
    not so distant future in favour of memfds.
    
    Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
    Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

commit a21790055f75d75f27ce6d61ff00acef77770cb5
Author: Serge Hallyn <serge.hallyn@canonical.com>
Date:   Fri May 31 19:12:12 2013 +0100

    sysctl: add sysctl to disallow unprivileged CLONE_NEWUSER by default
    
    This is a short-term patch.  Unprivileged use of CLONE_NEWUSER
    is certainly an intended feature of user namespaces.  However
    for at least saucy we want to make sure that, if any security
    issues are found, we have a fail-safe.
    
    Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
    [bwh: Remove unneeded binary sysctl bits]
    [bwh: Keep this sysctl, but change the default to enabled]

commit 7e6cf37cb565bfa906cd8d90c2493aa7af098a62
Author: Mark Weiman <mark.weiman@markzz.com>
Date:   Sun Aug 12 11:36:21 2018 -0400

    pci: Enable overrides for missing ACS capabilities
    
    This is an updated version of Alex Williamson's patch from:
    https://lkml.org/lkml/2013/5/30/513
    
    Original commit message follows:
    
    PCIe ACS (Access Control Services) is the PCIe 2.0+ feature that
    allows us to control whether transactions are allowed to be redirected
    in various subnodes of a PCIe topology.  For instance, if two
    endpoints are below a root port or downstream switch port, the
    downstream port may optionally redirect transactions between the
    devices, bypassing upstream devices.  The same can happen internally
    on multifunction devices.  The transaction may never be visible to the
    upstream devices.
    
    One upstream device that we particularly care about is the IOMMU.  If
    a redirection occurs in the topology below the IOMMU, then the IOMMU
    cannot provide isolation between devices.  This is why the PCIe spec
    encourages topologies to include ACS support.  Without it, we have to
    assume peer-to-peer DMA within a hierarchy can bypass IOMMU isolation.
    
    Unfortunately, far too many topologies do not support ACS to make this
    a steadfast requirement.  Even the latest chipsets from Intel are only
    sporadically supporting ACS.  We have trouble getting interconnect
    vendors to include the PCIe spec required PCIe capability, let alone
    suggested features.
    
    Therefore, we need to add some flexibility.  The pcie_acs_override=
    boot option lets users opt-in specific devices or sets of devices to
    assume ACS support.  The "downstream" option assumes full ACS support
    on root ports and downstream switch ports.  The "multifunction"
    option assumes the subset of ACS features available on multifunction
    endpoints and upstream switch ports are supported.  The "id:nnnn:nnnn"
    option enables ACS support on devices matching the provided vendor
    and device IDs, allowing more strategic ACS overrides.  These options
    may be combined in any order.  A maximum of 16 id specific overrides
    are available.  It's suggested to use the most limited set of options
    necessary to avoid completely disabling ACS across the topology.
    Note to hardware vendors, we have facilities to permanently quirk
    specific devices which enforce isolation but do not provide an ACS
    capability.  Please contact me to have your devices added and save
    your customers the hassle of this boot option.
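
    As an illustration of the option grammar described above, a Python
    sketch of a parser follows. The comma separator and the reading of
    "nnnn" as hexadecimal are assumptions not stated in the text, and
    the function name is hypothetical.

```python
from typing import Dict

MAX_ID_OVERRIDES = 16  # "A maximum of 16 id specific overrides"


def parse_acs_override(value: str) -> Dict:
    """Parse a pcie_acs_override= value into flags and (vendor, device) pairs."""
    result = {"downstream": False, "multifunction": False, "ids": []}
    for token in value.split(","):  # assumed separator
        token = token.strip()
        if not token:
            continue
        if token in ("downstream", "multifunction"):
            result[token] = True
        elif token.startswith("id:"):
            parts = token.split(":")
            if len(parts) != 3:
                raise ValueError("expected id:nnnn:nnnn, got %r" % token)
            if len(result["ids"]) >= MAX_ID_OVERRIDES:
                raise ValueError("more than 16 id overrides")
            # Assumed to be hexadecimal vendor and device IDs.
            result["ids"].append((int(parts[1], 16), int(parts[2], 16)))
        else:
            raise ValueError("unknown option %r" % token)
    return result
```

    The options may appear in any order, so "id:8086:1c10,downstream"
    and "downstream,id:8086:1c10" parse to the same result here.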
    
    Signed-off-by: Mark Weiman <mark.weiman@markzz.com>

commit 26424c585889e5266866d59448b4409017cfe0c7
Author: Neal Cardwell <ncardwell@google.com>
Date:   Mon Dec 28 19:23:09 2020 -0500

    net-tcp_bbr: v2: don't assume prior_cwnd was set entering CA_Loss
    
    Fix WARN_ON_ONCE() warnings that were firing and pointing to a
    bbr->prior_cwnd of 0 when exiting CA_Loss and transitioning to
    CA_Open.
    
    The issue was that tcp_simple_retransmit() calls:
    
      tcp_set_ca_state(sk, TCP_CA_Loss);
    
    without first calling icsk_ca_ops->ssthresh(sk) (because
    tcp_simple_retransmit() is dealing with losses due to MTU issues and
    not congestion). The lack of this callback means that BBR did not get
    a chance to set bbr->prior_cwnd, and thus upon exiting CA_Loss in such
    cases the WARN_ON_ONCE() would fire due to a zero bbr->prior_cwnd.
    
    This commit removes that warning, since a bbr->prior_cwnd of 0 is a
    valid situation in this state transition.
    
    For setting inflight_lo upon entering CA_Loss, to avoid setting an
    inflight_lo of 0 in this case, this commit switches to taking the max
    of cwnd and prior_cwnd. We plan to remove that line of code when we
    switch to cautious (PRR-style) recovery, so that awkwardness will go
    away.
    
    Change-Id: I575dce871c2f20e91e3e9449e1706f42a07b8118

commit fe92fc6b8b602a9b19d2edb9b008c849bfda1d86
Author: Neal Cardwell <ncardwell@google.com>
Date:   Mon Aug 17 19:10:21 2020 -0400

    net-tcp_bbr: v2: remove cycle_rand parameter that is unused in BBRv2
    
    Change-Id: Iee1df7e41e42de199068d7c89131ed3d228327c0

commit cb1802b2bd6333def771ac74c04b7f31d631e861
Author: Neal Cardwell <ncardwell@google.com>
Date:   Mon Aug 17 19:08:41 2020 -0400

    net-tcp_bbr: v2: remove field bw_rtts that is unused in BBRv2
    
    Change-Id: I58e3346c707748a6f316f3ed060d2da84c32a79b

commit 95c333a67f22ed1f02e7ba2108e348c925a02107
Author: Neal Cardwell <ncardwell@google.com>
Date:   Thu Nov 21 15:28:01 2019 -0500

    net-tcp_bbr: v2: remove unnecessary rs.delivered_ce logic upon loss
    
    There is no reason to compute rs.delivered_ce upon loss.
    
    In fact, we specifically do not want to compute rs.delivered_ce upon loss.
    
    Two issues:
    
    (1) This would be the wrong thing to do, in behavior terms.  With
        RACK's dynamic reordering window, losses can be marked long after
        the sequence hole appears in the ACK/SACK stream. We want to
        catch the ECN mark rate rising too high as quickly as possible,
        which means we want to check for high ECN mark rates at ACK time
        (as BBRv2 currently does) and not loss marking time.
    
    (2) This is dead code. The ECN mark rate cannot be detected as too
        high because the check needs rs->delivered to be > 0 as well:
    
           if (rs->delivered_ce > 0 && rs->delivered > 0 &&
    
        Since we are not setting rs->delivered upon loss, this check
        cannot succeed, so setting delivered_ce is pointless.
    
    This dead and wrong line was discovered by Randall Stewart at Netflix
    as he was reading the BBRv2 code.
    
    Change-Id: I37f83f418a259ec31d8f82de986db071b364b76a

commit 6d1fea18523f193cd3527f36955c26d7f6927a76
Author: Neal Cardwell <ncardwell@google.com>
Date:   Tue Jun 11 12:54:22 2019 -0400

    net-tcp_bbr: v2: BBRv2 ("bbr2") congestion control for Linux TCP
    
    BBR v2 is an enhancement to the BBR v1 algorithm. It's designed to aim for lower
    queues, lower loss, and better Reno/CUBIC coexistence than BBR v1.
    
    BBR v2 maintains the core of BBR v1: an explicit model of the network
    path that is two-dimensional, adapting to estimate the (a) maximum
    available bandwidth and (b) maximum safe volume of data a flow can
    keep in-flight in the network. It maintains the estimated BDP as a
    core guide for estimating an appropriate level of in-flight data.
    
    BBR v2 makes several key enhancements:
    
    o Its bandwidth-probing time scale is adapted, within bounds, to allow improved
    coexistence with Reno and CUBIC. The bandwidth-probing time scale is (a)
    extended dynamically based on estimated BDP to improve coexistence with
    Reno/CUBIC; (b) bounded by an interactive wall-clock time-scale to be more
    scalable and responsive than Reno and CUBIC.
    
    o Rather than being largely agnostic to loss and ECN marks, it explicitly uses
    loss and (DCTCP-style) ECN signals to maintain its model.
    
    o It aims for lower losses than v1 by adjusting its model to attempt to stay
    within loss rate and ECN mark rate bounds (loss_thresh and ecn_thresh,
    respectively).
    
    o It adapts to loss/ECN signals even when the application is running out of
    data ("application-limited"), in case the "application-limited" flow is also
    "network-limited" (the bw and/or inflight available to this flow is lower than
    previously estimated when the flow ran out of data).
    
    o It has a three-part model: the model explicitly tracks three operating points,
    where an operating point is a tuple: (bandwidth, inflight). The three operating
    points are:
    
      o latest:        the latest measurement from the current round trip
      o upper bound:   robust, optimistic, long-term upper bound
      o lower bound:   robust, conservative, short-term lower bound
    
    These are stored in the following state variables:
    
      o latest:  bw_latest, inflight_latest
      o lo:      bw_lo,     inflight_lo
      o hi:      bw_hi[2],  inflight_hi
    
    To gain intuition about the meaning of the three operating points, it
    may help to consider the analogs in CUBIC, which has a somewhat
    analogous three-part model used by its probing state machine:
    
      BBR param     CUBIC param
      -----------   -------------
      latest     ~  cwnd
      lo         ~  ssthresh
      hi         ~  last_max_cwnd
    
    The analogy is only a loose one, though, since the BBR operating
    points are calculated differently, and are 2-dimensional (bw,inflight)
    rather than CUBIC's one-dimensional notion of operating point
    (inflight).
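
    As a rough illustration of the three-part model above, the state could
    be laid out as below. This is a minimal sketch, not the code in
    tcp_bbr.c: the demo_* names are hypothetical, and the rule that the
    usable bound is the tighter of "hi" and "lo" is an assumption made for
    the example.

```c
#include <assert.h>
#include <stdint.h>

/* One operating point: a (bandwidth, inflight) tuple. Units are illustrative. */
struct demo_op_point {
    uint32_t bw;        /* bandwidth, e.g. packets per second */
    uint32_t inflight;  /* volume of data in flight, in packets */
};

/* Hypothetical container mirroring the state variables named above. */
struct demo_model {
    struct demo_op_point latest;  /* bw_latest, inflight_latest */
    struct demo_op_point lo;      /* bw_lo, inflight_lo */
    uint32_t bw_hi[2];            /* two-slot window of bw upper bounds */
    uint32_t inflight_hi;         /* long-term inflight upper bound */
};

uint32_t demo_min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
uint32_t demo_max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/*
 * Assumption for this sketch: the bound actually applied is the tighter of
 * the long-term upper bound ("hi") and the short-term lower bound ("lo").
 */
struct demo_op_point demo_effective_bound(const struct demo_model *m)
{
    struct demo_op_point p;

    assert(m);
    p.bw = demo_min_u32(demo_max_u32(m->bw_hi[0], m->bw_hi[1]), m->lo.bw);
    p.inflight = demo_min_u32(m->inflight_hi, m->lo.inflight);
    return p;
}
```

    With bw_hi = {100, 120}, bw_lo = 80, inflight_hi = 50 and
    inflight_lo = 35, the sketch yields the operating point (80, 35).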
    
    o It uses the three-part model to adapt the magnitude of its bandwidth
    to match the estimated space available in the buffer, rather than (as
    in BBR v1) assuming that it was always acceptable to place 0.25*BDP in
    the bottleneck buffer when probing (commodity datacenter switches
    commonly do not have that much buffer for WAN flows). When BBR v2
    estimates it hit a buffer limit during probing, its bandwidth probing
    then starts gently in case little space is still available in the
    buffer, and then accelerates, slowly at first and then rapidly if it
    can grow inflight without seeing congestion signals. In such cases,
    probing is bounded by inflight_hi + inflight_probe, where
    inflight_probe grows as: [0, 1, 2, 4, 8, 16,...]. This allows BBR to
    keep losses low and bounded if a bottleneck remains congested, while
    rapidly/scalably utilizing free bandwidth when it becomes available.
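
    The growth of the probing bound described above can be sketched as
    follows. The helper names and the notion of a "step" counter are
    assumptions for illustration; only the sequence [0, 1, 2, 4, 8, 16,...]
    and the bound inflight_hi + inflight_probe come from the text.

```c
#include <assert.h>
#include <stdint.h>

/* inflight_probe for the n-th probing step: 0, 1, 2, 4, 8, 16, ... */
uint32_t demo_inflight_probe(unsigned int step)
{
    assert(step <= 32);  /* keeps the shift below within 32 bits */
    return step == 0 ? 0 : (uint32_t)1 << (step - 1);
}

/* Upper bound on inflight while probing: inflight_hi + inflight_probe. */
uint32_t demo_probe_bound(uint32_t inflight_hi, unsigned int step)
{
    return inflight_hi + demo_inflight_probe(step);
}
```

    For example, with inflight_hi = 100 the bound on the fourth step
    (step = 3) is 104 packets, so a still-congested bottleneck sees only a
    few extra packets before probing stops.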
    
    o It has a slightly revised state machine, to achieve the goals above.
        BBR_BW_PROBE_UP:     push up inflight to probe for bw/vol
        BBR_BW_PROBE_DOWN:   drain excess inflight from the queue
        BBR_BW_PROBE_CRUISE: use pipe, w/ headroom in queue/pipe
        BBR_BW_PROBE_REFILL: try to refill the pipe again to 100%, leaving queue empty
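
    One way to picture the four phases is as a simple cycle. The sketch
    below is an assumption, not taken from this message: the DEMO_* names
    are hypothetical, and the order DOWN -> CRUISE -> REFILL -> UP -> DOWN
    is only one plausible reading; the real state machine decides
    transitions from timers and congestion signals.

```c
#include <assert.h>

enum demo_bw_probe_phase {
    DEMO_BW_PROBE_UP,      /* push up inflight to probe for bw */
    DEMO_BW_PROBE_DOWN,    /* drain excess inflight from the queue */
    DEMO_BW_PROBE_CRUISE,  /* use the pipe, leaving headroom */
    DEMO_BW_PROBE_REFILL,  /* refill the pipe, leaving the queue empty */
};

/* Assumed cycle for illustration: DOWN -> CRUISE -> REFILL -> UP -> DOWN. */
enum demo_bw_probe_phase demo_next_phase(enum demo_bw_probe_phase p)
{
    switch (p) {
    case DEMO_BW_PROBE_DOWN:   return DEMO_BW_PROBE_CRUISE;
    case DEMO_BW_PROBE_CRUISE: return DEMO_BW_PROBE_REFILL;
    case DEMO_BW_PROBE_REFILL: return DEMO_BW_PROBE_UP;
    case DEMO_BW_PROBE_UP:     return DEMO_BW_PROBE_DOWN;
    }
    assert(0 && "unknown phase");
    return DEMO_BW_PROBE_DOWN;
}
```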
    
    o The estimated BDP: BBR v2 continues to maintain an estimate of the
    path's two-way propagation delay, by tracking a windowed min_rtt, and
    coordinating (on an as-needed basis) to try to expose the two-way
    propagation delay by draining the bottleneck queue.
    
    BBR v2 continues to use its min_rtt and (currently-applicable) bandwidth
    estimate to estimate the current bandwidth-delay product. The estimated BDP
    still provides one important guideline for bounding inflight data. However,
    because any min-filtered RTT and max-filtered bw inherently tend to both
    overestimate, the estimated BDP is often too high; in this case loss or ECN
    marks can ensue, in which case BBR v2 adjusts inflight_hi and inflight_lo to
    adapt its sending rate and inflight down to match the available capacity of the
    path.
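
    The BDP estimate and its role as one bound on inflight can be sketched
    as below. Units (packets per second, microseconds) and all demo_* names
    are assumptions for the example; the real code uses scaled fixed-point
    bandwidth values.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated BDP in packets: bw (packets/sec) * min_rtt (usec), rounded up. */
uint32_t demo_bdp_pkts(uint32_t bw_pps, uint32_t min_rtt_us)
{
    uint64_t pkts = ((uint64_t)bw_pps * min_rtt_us + 999999) / 1000000;

    assert(pkts <= UINT32_MAX);  /* sketch only handles results that fit */
    return (uint32_t)pkts;
}

/*
 * The BDP is one guideline; inflight_hi and inflight_lo, which are adjusted
 * on loss or ECN marks, can pull the target below an overestimated BDP.
 */
uint32_t demo_bound_inflight(uint32_t bdp, uint32_t inflight_hi,
                             uint32_t inflight_lo)
{
    uint32_t cap = inflight_hi < inflight_lo ? inflight_hi : inflight_lo;

    return bdp < cap ? bdp : cap;
}
```

    For example, 10000 packets/sec with a 20 ms min_rtt gives a BDP of 200
    packets; if loss has pushed inflight_hi down to 150, the sketch caps the
    target at 150.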
    
    o Space: Note that ICSK_CA_PRIV_SIZE increased. This is because BBR v2
    requires more space. Note that much of the space is due to support for
    per-socket parameterization in this release, for research and
    debugging. With that state removed, the full "struct bbr" is 140
    bytes, or 144 with padding. This is an increase of 40 bytes over the
    existing ca_priv space.
    
    o Code: BBR v2 reuses many pieces from BBR v1. But it omits the following
      significant pieces:
    
      o "packet conservation" (bbr_set_cwnd_to_recover_or_restore(),
        bbr_can_grow_inflight())
      o long-term bandwidth estimator ("policer mode")
    
      The code layout tries to keep BBR v2 code near the bottom of the
      file, so that v1-applicable code in the top does not accidentally
      refer to v2 code.
    
    o Docs:
      See the following docs for more details and diagrams describing the BBR v2
      algorithm:
        https://datatracker.ietf.org/meeting/104/materials/slides-104-iccrg-an-update-on-bbr-00
        https://datatracker.ietf.org/meeting/102/materials/slides-102-iccrg-an-update-on-bbr-work-at-google-00
    
    o Internal notes:
      For this upstream rebase, Neal started from:
        git show fed518041ac6:net/ipv4/tcp_bbr.c > net/ipv4/tcp_bbr.c
      then removed dev instrumentation (dynamic get/set for parameters)
      and code that was only used by BBRv1
    
    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 2c84098e60bed6d67dde23cd7538c51dee273102
    Change-Id: I125cf26ba2a7a686f2fa5e87f4c2afceb65f7a05
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 18fc2c83ed5d99742a6b9aea6fe9437a86ca1b02
Author: Neal Cardwell <ncardwell@google.com>
Date:   Sat Nov 16 13:16:25 2019 -0500

    net-tcp: add fast_ack_mode=1: skip rwin check in tcp_fast_ack_mode__tcp_ack_snd_check()
    
    Add logic for an experimental TCP connection behavior, enabled with
    tp->fast_ack_mode = 1, which disables checking the receive window
    before sending an ack in __tcp_ack_snd_check(). If this behavior is
    enabled, the data receiver sends an ACK if the amount of data is >
    RCV.MSS.
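
    The decision described above can be sketched as a predicate. This is a
    simplified illustration, not the body of __tcp_ack_snd_check(): the
    function name and the rwin_check_ok parameter (a stand-in for the result
    of the receive-window check) are hypothetical, and other reasons the
    kernel may ACK immediately are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

bool demo_should_ack_now(uint32_t unacked_bytes, uint32_t rcv_mss,
                         bool rwin_check_ok, int fast_ack_mode)
{
    assert(rcv_mss > 0);
    if (unacked_bytes <= rcv_mss)
        return false;      /* not more than RCV.MSS of data pending */
    if (fast_ack_mode == 1)
        return true;       /* experimental mode: skip the rwin check */
    return rwin_check_ok;  /* default: also require the rwin check */
}
```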
    
    Change-Id: Iaa0a0fd7108221f883137a79d5bfa724f1b096d4

commit 3feef542c4831f2b663782b093a57e61edbc3e0f
Author: Neal Cardwell <ncardwell@google.com>
Date:   Fri Sep 27 17:10:26 2019 -0400

    net-tcp: re-generalize TSO sizing in TCP CC module API
    
    Reorganize the API for CC modules so that the CC module once again
    gets complete control of the TSO sizing decision. This is how the API
    was set up around 2016 and the initial BBRv1 upstreaming. Later Eric
    Dumazet simplified it. But with wider testing it now seems that to
    avoid CPU regressions BBR needs to have a different TSO sizing
    function.
    
    This is necessary to handle cases where there are many flows
    bottlenecked on the sender host's NIC, in which case BBR's pacing rate
    is much lower than CUBIC/Reno/DCTCP's. Why does this happen? Because
    BBR's pacing rate adapts to the low bandwidth share each flow sees. By
    contrast, CUBIC/Reno/DCTCP see no loss or ECN, so they grow a very
    large cwnd, and thus large pacing rate and large TSO burst size.
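
    To see why a low pacing rate leads to small TSO bursts, consider a
    simplified rate-based sizing rule that budgets roughly 1 ms of data per
    burst. The sketch below is an assumption for illustration (the function
    name, the 1/1024-second budget and the min/max clamps are not taken
    from this commit), not the sizing function the CC module supplies.

```c
#include <assert.h>
#include <stdint.h>

uint32_t demo_tso_segs(uint64_t pacing_rate_bytes_per_sec, uint32_t mss,
                       uint32_t min_segs, uint32_t max_segs)
{
    uint64_t burst_bytes, segs;

    assert(mss > 0 && min_segs <= max_segs);
    burst_bytes = pacing_rate_bytes_per_sec >> 10;  /* ~1 ms worth of data */
    segs = burst_bytes / mss;
    if (segs < min_segs)
        return min_segs;
    if (segs > max_segs)
        return max_segs;
    return (uint32_t)segs;
}
```

    With a 1448-byte MSS and clamps of 2 and 44 segments, a flow paced at
    12.5 MB/s gets 8-segment bursts, while a flow paced at 1.25 GB/s hits
    the 44-segment cap. Many small bursts cost more CPU per byte, which is
    the regression this commit lets a CC module address.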
    
    Change-Id: Ic8ccfdbe4010ee8d4bf6a6334c48a2fceb2171ea

commit 5d925d39230e9b38781d3de56f34823d2d0e96ad
Author: Yousuk Seung <ysseung@google.com>
Date:   Wed May 23 17:55:54 2018 -0700

    net-tcp: add new ca opts flag TCP_CONG_WANTS_CE_EVENTS
    
    Add a new ca opts flag TCP_CONG_WANTS_CE_EVENTS that allows a
    congestion control module to receive CE events.
    
    Currently congestion control modules have to set the TCP_CONG_NEEDS_ECN
    bit in opts flag to receive CE events but this may incur changes in ECN
    behavior elsewhere. This patch adds a new bit TCP_CONG_WANTS_CE_EVENTS
    that allows congestion control modules to receive CE events
    independently of TCP_CONG_NEEDS_ECN.
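
    The relationship between the two flags can be sketched as below. The
    DEMO_* bit values and helper names are hypothetical; only the idea that
    either flag enables CE events, while only TCP_CONG_NEEDS_ECN affects ECN
    behavior elsewhere, comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_CONG_NEEDS_ECN        0x1u  /* hypothetical bit value */
#define DEMO_CONG_WANTS_CE_EVENTS  0x2u  /* hypothetical bit value */

/* A module receives CE events if it sets either flag. */
bool demo_gets_ce_events(uint32_t opts)
{
    return (opts & (DEMO_CONG_NEEDS_ECN | DEMO_CONG_WANTS_CE_EVENTS)) != 0;
}

/* Only NEEDS_ECN changes ECN behavior elsewhere in the stack. */
bool demo_changes_ecn_behavior(uint32_t opts)
{
    return (opts & DEMO_CONG_NEEDS_ECN) != 0;
}
```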
    
    Effort: net-tcp
    Origin-9xx-SHA1: 9f7e14716cde760bc6c67ef8ef7e1ee48501d95b
    Change-Id: I2255506985242f376d910c6fd37daabaf4744f24

commit 0d84776c598a28776b29aa8f7ed7fb7b48cb5da1
Author: Neal Cardwell <ncardwell@google.com>
Date:   Tue May 7 22:37:19 2019 -0400

    net-tcp_bbr: v2: set tx.in_flight for skbs in repair write queue
    
    Syzkaller was able to use TCP_REPAIR to reproduce the new warning
    added in tcp_fragment():
    
      WARNING: CPU: 0 PID: 118174 at net/ipv4/tcp_output.c:1487
        tcp_fragment+0xdcc/0x10a0 net/ipv4/tcp_output.c:1487()
      inconsistent: tx.in_flight: 0 old_factor: 53
    
    The warning happens because skbs inserted into the tcp_rtx_queue
    during the repair process go through a sort of "fake send" process,
    and that process was setting pcount but not tx.in_flight, and thus the
    warnings (where old_factor is the old pcount).
    
    The fix of setting tx.in_flight in the TCP_REPAIR code path seems
    simple enough, and indeed makes the repro code from syzkaller stop
    producing warnings. Running through kokonut tests, and will send out
    for review when all tests pass.
    
    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 330f825a08a6fe92cef74d799cc468864c479f63
    Change-Id: I0bc4a790f040fd4239620e1eedd5dc64666c6f05

commit 35560060f6d729ad38606bdc3092a65be1930f62
Author: Neal Cardwell <ncardwell@google.com>
Date:   Wed May 1 20:16:25 2019 -0400

    net-tcp_bbr: v2: adjust skb tx.in_flight upon split in tcp_fragment()
    
    When we fragment an skb that has already been sent, we need to update
    the tx.in_flight for the first skb in the resulting pair ("buff").
    
    Because we were not updating the tx.in_flight, the tx.in_flight value
    was inconsistent with the pcount of the "buff" skb (tx.in_flight would
    be too high). That meant that if the "buff" skb was lost, then
    bbr2_inflight_hi_from_lost_skb() would calculate an inflight_hi value
    that is too high. This could result in longer queues and higher packet
    loss.
    
    Packetdrill testing verified that without this commit, when the second
    half of an skb is SACKed and then later the first half of that skb is
    marked lost, the calculated inflight_hi was incorrect.
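
    One way to keep tx.in_flight consistent after a split is to recompute
    it as if the fragment had been sent by itself. The sketch below is an
    assumption about the arithmetic, with hypothetical names; the commit
    message does not spell out the exact computation.

```c
#include <assert.h>
#include <stdint.h>

uint32_t demo_fragment_in_flight(uint32_t orig_in_flight,
                                 uint32_t orig_pcount, uint32_t frag_pcount)
{
    uint32_t inflight_prev;

    assert(frag_pcount <= orig_pcount);
    /* Packets in flight just before the original skb was sent; clamp at
     * zero if the recorded state is already inconsistent. */
    inflight_prev = orig_in_flight >= orig_pcount ?
                    orig_in_flight - orig_pcount : 0;
    return inflight_prev + frag_pcount;
}
```

    For an skb of 10 segments sent with tx.in_flight = 60, a 4-segment
    fragment would get tx.in_flight = 54 rather than keeping the stale,
    too-high value of 60.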
    
    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 385f1ddc610798fab2837f9f372857438b25f874
    Change-Id: I617f8cab4e9be7a0b8e8d30b047bf8645393354d

commit 579c6a93a97b4b2ebba38d2f9f7e02da5b1c3ff0
Author: Neal Cardwell <ncardwell@google.com>
Date:   Wed May 1 20:16:33 2019 -0400

    net-tcp_bbr: v2: adjust skb tx.in_flight upon merge in tcp_shifted_skb()
    
    When tcp_shifted_skb() updates state as adjacent SACKed skbs are
    coalesced, previously the tx.in_flight was not adjusted, so we could
    get contradictory state where the skb's recorded pcount was bigger
    than the tx.in_flight (the number of segments that were in_flight
    after sending the skb).
    
    Normally having a SACKed skb with contradictory pcount/tx.in_flight
    would not matter. However, with SACK reneging, the SACKed bit is
    removed, and an skb once again becomes eligible for retransmitting,
    fragmenting, SACKing, etc. Packetdrill testing verified the following
    sequence is possible in a kernel that does not have this commit:
    
     - skb N is SACKed
     - skb N+1 is SACKed and combined with skb N using tcp_shifted_skb()
       - tcp_shifted_skb() will increase the pcount of prev,
         but leave tx.in_flight as-is
       - so prev skb can have pcount > tx.in_flight
     - RTO, tcp_timeout_mark_lost(), detect reneg,
       remove "SACKed" bit, mark skb N as lost
       - find pcount of skb N is greater than its tx.in_flight
    
    I suspect this issue is what caused the bbr2_inflight_hi_from_lost_skb():
      WARN_ON_ONCE(inflight_prev < 0)
    to fire in production machines using bbr2.
    
    Tested: See last commit in series for sponge link.
    
    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 1a3e997e613d2dcf32b947992882854ebe873715
    Change-Id: I1b0b75c27519953430c7db51c6f358f104c7af55

commit 76821784fbd3409fed7ee05b43f4adc143edf9d9
Author: Neal Cardwell <ncardwell@google.com>
Date:   Tue May 7 22:36:36 2019 -0400

    net-tcp_bbr: v2: factor out tx.in_flight setting into tcp_set_tx_in_flight()
    
    Factor out the code to set an skb's tx.in_flight field into its own
    function, so that this code can be used for the TCP_REPAIR "fake send"
    code path that inserts skbs into the rtx queue without sending
    them. This is in preparation for the following patch, which fixes an
    issue with TCP_REPAIR and tx.in_flight.
    
    Tested: See last patch in series for sponge link.
    
    Effort: net-tcp_bbr
    Origin-9xx-SHA1: e880fc907d06ea7354333f60f712748ebce9497b
    Change-Id: I4fbd4a6e18a51ab06d50ab1c9ad820ce5bea89af

commit 9d5d2cac73f74a9fc8508f603599c921ec56e85e
Author: Neal Cardwell <ncardwell@google.com>
Date:   Tue Aug 7 21:52:06 2018 -0400

    net-tcp_bbr: v2: introduce ca_ops->skb_marked_lost() CC module callback API
    
    For connections experiencing reordering, RACK can mark packets lost
    long after we receive the SACKs/ACKs hinting that the packets were
    actually lost.
    
    This means that CC modules cannot easily learn the volume of inflight
    data at which packet loss happens by looking at the current inflight
    or even the packets in flight when the most recently SACKed packet was
    sent. To learn this, CC modules need to know how many packets were in
    flight at the time lost packets were sent. This new callback, combined
    with TCP_SKB_CB(skb)->tx.in_flight, allows them to learn this.
    
    This also provides a consistent callback that is invoked whether
    packets are marked lost upon ACK processing, using the RACK reordering
    timer, or at RTO time.
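
    The shape of such a callback can be sketched as below. All demo_*
    types and names are hypothetical stand-ins (the real callback operates
    on the kernel's socket and skb structures); the point is only that one
    hook is called from every loss-marking path and can read the skb's
    tx.in_flight snapshot.

```c
#include <assert.h>
#include <stdint.h>

struct demo_skb {
    uint32_t tx_in_flight;  /* packets in flight when this skb was sent */
    uint32_t pcount;        /* segments carried by this skb */
};

struct demo_cc_state {
    uint32_t inflight_at_loss;  /* tx_in_flight of the latest lost skb */
    uint32_t lost_pkts;         /* total segments marked lost */
};

struct demo_ca_ops {
    void (*skb_marked_lost)(struct demo_cc_state *cc,
                            const struct demo_skb *skb);
};

/* Example CC-module handler: learn the inflight level at which loss hit. */
void demo_on_skb_marked_lost(struct demo_cc_state *cc,
                             const struct demo_skb *skb)
{
    cc->lost_pkts += skb->pcount;
    cc->inflight_at_loss = skb->tx_in_flight;
}

/* Common helper for all loss-marking paths (ACK, reordering timer, RTO). */
void demo_mark_lost(const struct demo_ca_ops *ops, struct demo_cc_state *cc,
                    const struct demo_skb *skb)
{
    assert(ops && cc && skb);
    if (ops->skb_marked_lost)  /* the callback is optional */
        ops->skb_marked_lost(cc, skb);
}
```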
    
    Effort: net-tcp_bbr
    Origin-9xx-SHA1: afcbebe3374e4632ac6714d39e4dc8a8455956f4
    Change-Id: I54826ab53df636be537e5d3c618a46145d12d51a

commit 2567ef4ab9429f43deb2329411d529c44806432c
Author: Neal Cardwell <ncardwell@google.com>
Date:   Mon Nov 19 13:48:36 2018 -0500

    net-tcp_bbr: v2: export FLAG_ECE in rate_sample.is_ece
    
    For understanding the relationship between inflight and ECN signals,
    to try to find the highest inflight value that has acceptable levels
    of ECN marking.
    
    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 3eba998f2898541406c2666781182200934965a8
    Change-Id: I3a964e04cee83e11649a54507043d2dfe769a3b3

commit ec58ebdc0e0b575fdb6088c1864a660f53f7b6c8
Author: Neal Cardwell <ncardwell@google.com>
Date:   Thu Oct 12 23:44:27 2017 -0400

    net-tcp_bbr: v2: count packets lost over TCP rate sampling interval
    
    For understanding the relationship between inflight and packet loss
    signals, to try to find the highest inflight value that has acceptable
    levels of packet losses.
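
    A consumer of the per-sample loss count might compare it against the
    inflight level for that sample, as sketched below. The struct, the
    function name and the percent-based threshold are assumptions for
    illustration, not the fields or tunables added by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_rate_sample {
    uint32_t tx_in_flight;  /* packets in flight when the sample's skb was sent */
    uint32_t lost;          /* packets marked lost over the sampling interval */
};

/* True if lost / tx_in_flight exceeds loss_thresh_pct percent. */
bool demo_loss_too_high(const struct demo_rate_sample *rs,
                        uint32_t loss_thresh_pct)
{
    assert(rs);
    /* Cross-multiply to avoid a division and any rounding. */
    return (uint64_t)rs->lost * 100 >
           (uint64_t)rs->tx_in_flight * loss_thresh_pct;
}
```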
    
    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 4527e26b2bd7756a88b5b9ef1ada3da33dd609ab
    Change-Id: I594c2500868d9c530770e7ddd68ffc87c57f4fd5

commit 7cda445ac95319d80b204fcfdac56f8f9c8e0f63
Author: Neal Cardwell <ncardwell@google.com>
Date:   Sat Aug 5 11:49:50 2017 -0400

    net-tcp_bbr: v2: snapshot packets in flight at transmit time and pass in rate_sample
    
    For understanding the relationship between inflight and losses or ECN
    signals, to try to find the highest inflight value that has acceptable
    levels of loss/ECN marking.
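
    Taking the snapshot at transmit time can be sketched as below, using
    the convention that tx.in_flight counts the segments in flight after
    the skb is sent. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct demo_skb {
    uint32_t pcount;        /* segments carried by this skb */
    uint32_t tx_in_flight;  /* snapshot taken when the skb is sent */
};

/* Record packets in flight including this skb's own segments. */
void demo_snapshot_tx_in_flight(struct demo_skb *skb,
                                uint32_t packets_in_flight_before)
{
    assert(skb && skb->pcount > 0);
    skb->tx_in_flight = packets_in_flight_before + skb->pcount;
}
```

    A consequence of this convention is that an skb's tx.in_flight is
    never smaller than its own pcount.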
    
    Effort: net-tcp_bbr
    Origin-9xx-SHA1: b3eb4f2d20efab4ca001f32c9294739036c493ea
    Change-Id: I7314047d0ff14dd261a04b1969a46dc658c8836a

commit 0afe1df768336756315cad96c36ea538772cc8dc
Author: Neal Cardwell <ncardwell@google.com>
Date:   Sun Jun 24 21:55:59 2018 -0400

    net-tcp_bbr: v2: shrink delivered_mstamp, first_tx_mstamp to u32 to free up 8 bytes
    
    Free up some space for tracking inflight and losses for each
    bw sample, in upcoming commits.
    
    These timestamps are in microseconds, and are now stored in 32
    bits. So they can only hold time intervals up to 2^32 usec, or roughly
    4295 seconds.  But Linux TCP RTT and RTO tracking has the same 32-bit
    microsecond implementation approach and resulting deployment
    limitations. So this is not introducing a new limit. And these should
    not be a limitation for the foreseeable future.
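
    The effect of truncating a microsecond clock to 32 bits can be
    sketched as below (hypothetical helper names). Unsigned subtraction
    wraps modulo 2^32, so an interval is still computed correctly across a
    wrap as long as the true interval is shorter than 2^32 usec.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low 32 bits of a 64-bit microsecond timestamp. */
uint32_t demo_stamp32(uint64_t now_us)
{
    return (uint32_t)now_us;
}

/* Elapsed usec between two truncated stamps, valid below 2^32 usec. */
uint32_t demo_elapsed_us(uint32_t later, uint32_t earlier)
{
    return later - earlier;  /* wraps modulo 2^32 */
}
```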
    
    Effort: net-tcp_bbr
    Origin-9xx-SHA1: 238a7e6b5d51625fef1ce7769826a7b21b02ae55
    Change-Id: I3b779603797263b52a61ad57c565eb91fe42680c

commit 0b6885b342b47d86f4c693eb488655924bd73c81
Author: Yuchung Cheng <ycheng@google.com>
Date:   Tue Mar 27 18:01:46 2018 -0700

    net-tcp_rate: account for CE marks in rate sample
    
    This patch counts the number of delivered packets that have a CE mark
    in the rate sample, using an approach similar to delivery accounting.
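
    The accounting approach can be sketched as a pair of running counters
    that are snapshotted at transmit time and differenced when the sample
    is generated. All demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct demo_conn {
    uint32_t delivered;     /* running count of delivered packets */
    uint32_t delivered_ce;  /* running count of delivered packets with CE */
};

/* Counters saved in the packet's metadata when it is transmitted. */
struct demo_snapshot {
    uint32_t prior_delivered;
    uint32_t prior_delivered_ce;
};

struct demo_sample {
    uint32_t delivered;     /* packets delivered over the sample interval */
    uint32_t delivered_ce;  /* of those, packets that carried a CE mark */
};

struct demo_sample demo_rate_gen(const struct demo_conn *c,
                                 const struct demo_snapshot *s)
{
    struct demo_sample rs;

    assert(c && s);
    rs.delivered = c->delivered - s->prior_delivered;
    rs.delivered_ce = c->delivered_ce - s->prior_delivered_ce;
    return rs;
}
```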
    
    Effort: net-tcp_rate
    Origin-9xx-SHA1: 710644db434c3da335a7c8b72207a671ccbb5cf8
    Change-Id: I0968fb33fe19b5c774e8c3afd2685558a6ec8710

commit 7ef26812cc5241a14abf80e1cc50a80c558e9cc0
Author: Yuchung Cheng <ycheng@google.com>
Date:   Tue Mar 27 18:33:29 2018 -0700

    net-tcp_rate: consolidate inflight tracking approaches in TCP
    
    In order to track CE marks per rate sample (one round trip), we'll
    need to snap the starting tcp delivered_ce count in the packet
    meta header (tcp_skb_cb). But there's not enough space.
    
    Good news is that the "last_in_flight" in the header, used by
    NV congestion control, is almost equivalent to "delivered". In
    fact "delivered" is better because it additionally accounts for
    out-of-order packets.  Therefore we can remove "last_in_flight" to
    make room for the CE tracking.
    
    This would make delayed ACK detection slightly less accurate but the
    impact is negligible since it's not used for any critical control.
    
    Effort: net-tcp_rate
    Origin-9xx-SHA1: ddcd46ec85d5f1c4454258af0c54b3254c0d64a7
    Change-Id: I1a184aad6d101c981ac7f2f275aa9417ff856910

commit 39b24060c7f4792aa1c784ed48ffc352d8626089
Author: Neal Cardwell <ncardwell@google.com>
Date:   Tue Jun 11 12:26:55 2019 -0400

    net-tcp_bbr: broaden app-limited rate sample detection
    
    This commit is a bug fix for the Linux TCP app-limited
    (application-limited) logic that is used for collecting rate
    (bandwidth) samples.
    
    Previously the app-limited logic only looked for "bubbles" of
    silence in between application writes, by checking at the start
    of each sendmsg. But "bubbles" of silence can also happen before
    retransmits: e.g. bubbles can happen between an application write
    and a retransmit, or between two retransmits.
    
    Retransmits are triggered by ACKs or timers. So this commit checks
    for bubbles of app-limited silence upon ACKs or timers.
    
    Why does this commit check for app-limited state at the start of
    ACKs and timer handling? Because at that point we know whether
    inflight was fully using the cwnd.  During processing the ACK or
    timer event we often change the cwnd; after changing the cwnd we
    can't know whether inflight was fully using the old cwnd.
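
    The kind of check that would run at the start of ACK or timer
    processing can be sketched as below. The struct and the exact set of
    conditions are a simplified assumption for illustration; the important
    part is that the check reads inflight and cwnd before the event handler
    changes the cwnd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_tx_state {
    uint32_t unsent_bytes;       /* data written by the app but not yet sent */
    uint32_t mss;
    uint32_t packets_in_flight;
    uint32_t cwnd;
    uint32_t lost_out;           /* packets marked lost */
    uint32_t retrans_out;        /* lost packets already retransmitted */
};

bool demo_is_app_limited(const struct demo_tx_state *s)
{
    assert(s);
    return s->unsent_bytes < s->mss &&        /* under one segment to send */
           s->packets_in_flight < s->cwnd &&  /* cwnd is not fully used */
           s->lost_out <= s->retrans_out;     /* no retransmits pending */
}
```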
    
    Origin-9xx-SHA1: 3fe9b53291e018407780fb8c356adb5666722cbc
    Change-Id: I37221506f5166877c2b110753d39bb0757985e68

commit af3f13dec469240feb2de138caeaa3590619d48f
Author: Con Kolivas <kernel@kolivas.org>
Date:   Mon Dec 14 19:09:01 2020 +0000

    clockevents, hrtimer: Make hrtimer granularity and minimum hrtimeout configurable in sysctl. Set default granularity to 100us and min timeout to 500us
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 1e6491a8e58321c292b21555c33a0f37ca6f7f76
Author: Con Kolivas <kernel@kolivas.org>
Date:   Mon Feb 20 13:32:58 2017 +1100

    time: Don't use hrtimer overlay when pm_freezing since some drivers still don't correctly use freezable timeouts.
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit c885e05107d34b59b509ecf9ec8b0465f4533180
Author: Con Kolivas <kernel@kolivas.org>
Date:   Mon Feb 20 13:30:32 2017 +1100

    hrtimer: Replace all calls to schedule_timeout_uninterruptible of potentially under 50ms to use schedule_msec_hrtimeout_uninterruptible
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 566dae0ba9409d80c5dec523527438504cd7f68e
Author: Con Kolivas <kernel@kolivas.org>
Date:   Mon Feb 20 13:30:07 2017 +1100

    hrtimer: Replace all calls to schedule_timeout_interruptible of potentially under 50ms to use schedule_msec_hrtimeout_interruptible.
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit cdc128ec06453c6d0167f67d0dd07f99106047b2
Author: Con Kolivas <kernel@kolivas.org>
Date:   Mon Feb 15 21:56:16 2021 +0000

    hrtimer: Replace all schedule_timeout(1) with schedule_min_hrtimeout()
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 2362167cbfea9f499f2d77a703d09dfec9765eed
Author: Con Kolivas <kernel@kolivas.org>
Date:   Fri Nov 4 09:25:54 2016 +1100

    timer: Convert msleep to use hrtimers when active.
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 3805e10893b55c065047465e9db7d602831647ce
Author: Con Kolivas <kernel@kolivas.org>
Date:   Sat Nov 5 09:27:36 2016 +1100

    time: Special case calls of schedule_timeout(1) to use the min hrtimeout of 1ms, working around low Hz resolutions.
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 014af161265d0090571328108d390d4b8df383c4
Author: Con Kolivas <kernel@kolivas.org>
Date:   Sat Aug 12 11:53:39 2017 +1000

    hrtimer: Create highres timeout variants of schedule_timeout functions.
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 83d88873e71bc2304cd6558a77eae1d23e1c5378
Author: Alexandre Frade <kernel@xanmod.org>
Date:   Mon Jun 28 12:20:10 2021 +0000

    XANMOD: fair: Remove all energy efficiency functions
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 6e7b703dd3cc2c1be839d5527ca693869c7fb259
Author: Alexandre Frade <kernel@xanmod.org>
Date:   Fri Jun 18 19:10:55 2021 +0000

    XANMOD: Makefile: Turn off loop vectorization for GCC -O3 optimization level
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 34c89a08f5d7ed549da33c6802a85cda003af0ee
Author: Alexandre Frade <kernel@xanmod.org>
Date:   Thu Sep 3 20:36:13 2020 +0000

    XANMOD: init/Kconfig: Enable -O3 KBUILD_CFLAGS optimization for all architectures
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit aa8e8f2b383d5296dad5e0bbecbc7a599907fc82
Author: Alexandre Frade <admfrade@gmail.com>
Date:   Thu Jun 25 16:40:43 2020 -0300

    XANMOD: lib/kconfig.debug: disable default CONFIG_SYMBOLIC_ERRNAME and CONFIG_DEBUG_BUGVERBOSE
    
    Signed-off-by: Alexandre Frade <admfrade@gmail.com>

commit 3f92fd721efba0314bc4c1f59dd966d5873987c0
Author: Alexandre Frade <admfrade@gmail.com>
Date:   Mon Jan 29 17:41:29 2018 +0000

    XANMOD: scripts: disable the localversion "+" tag of a git repo
    
    Signed-off-by: Alexandre Frade <admfrade@gmail.com>

commit eed16a16b4ebf84bc0a33111a828fc60b8cfb2a7
Author: Alexandre Frade <admfrade@gmail.com>
Date:   Tue Mar 31 13:32:08 2020 -0300

    XANMOD: cpufreq: tunes ondemand and conservative governor for performance
    
    Signed-off-by: Alexandre Frade <admfrade@gmail.com>

commit 4c03c7bd9798279aa504ac776c1e49295ddff344
Author: Alexandre Frade <admfrade@gmail.com>
Date:   Mon Jan 29 17:31:25 2018 +0000

    XANMOD: mm/vmscan: vm_swappiness = 30 decreases the amount of swapping
    
    Signed-off-by: Alexandre Frade <admfrade@gmail.com>

commit 5c255f6bcd36c8a40c1c577da1d615346776ad7e
Author: Alexandre Frade <kernel@xanmod.org>
Date:   Thu Aug 13 14:57:06 2020 +0000

    XANMOD: sched/autogroup: Add kernel parameter and config option to enable/disable autogroup feature by default
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 0d114eaffe0e3b0da3172e5b31f8860427cc9668
Author: Alexandre Frade <admfrade@gmail.com>
Date:   Mon Jan 29 16:59:22 2018 +0000

    XANMOD: dcache: cache_pressure = 50 decreases the rate at which VFS caches are reclaimed
    
    Signed-off-by: Alexandre Frade <admfrade@gmail.com>

commit 44984407c3c4f5f15b10b16c45c97aacc2073a6a
Author: Alexandre Frade <admfrade@gmail.com>
Date:   Sun Oct 13 03:10:39 2019 -0300

    XANMOD: kconfig: set PREEMPT and RCU_BOOST without delay by default
    
    Signed-off-by: Alexandre Frade <admfrade@gmail.com>

commit ab3a0ab75ff8b08998544e12f3b2f55409511de1
Author: Alexandre Frade <admfrade@gmail.com>
Date:   Mon Jan 29 17:26:15 2018 +0000

    XANMOD: kconfig: add 500Hz timer interrupt kernel config option
    
    Signed-off-by: Alexandre Frade <admfrade@gmail.com>

commit 149a5b5fc8cae0249b8fe998979bb5a8ae103b8c
Author: Alexandre Frade <kernel@xanmod.org>
Date:   Mon Dec 14 16:24:26 2020 +0000

    XANMOD: block: set rq_affinity to force full multithreading I/O requests
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 3a070a83ae19936b700c48c6d294a746d1cd4d46
Author: Alexandre Frade <kernel@xanmod.org>
Date:   Mon Jun 1 18:23:51 2020 -0300

    XANMOD: block, bfq: change BLK_DEV_ZONED depends to IOSCHED_BFQ
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 72f33c30c0f18ffd03a16f480be0b2e05e0f51cc
Author: Alexandre Frade <kernel@xanmod.org>
Date:   Mon Nov 25 15:13:06 2019 -0300

    XANMOD: elevator: set default scheduler to bfq for blk-mq
    
    Signed-off-by: Alexandre Frade <kernel@xanmod.org>

commit 62fb9874f5da54fdb243003b386128037319b219
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Sun Jun 27 15:21:11 2021 -0700

    Linux 5.13