commit 1d25d6095fa0b7f02c0f17322df4bc95f4a3106c Author: Alexandre Frade Date: Sat Oct 2 18:11:11 2021 +0000 Linux 5.14.9-xanmod2 Signed-off-by: Alexandre Frade commit f164cc43ea1acaa1378b2ea1154f6c412b67505f Author: Tor Vic Date: Fri Sep 10 15:05:58 2021 +0200 zstd: fix fall-through warnings commit 90383d79db7ccfa57657589d56e714807400e679 Author: André Almeida Date: Fri Feb 5 10:34:02 2021 -0300 futex2: Add sysfs entry for syscall numbers In the course of futex2 development, it will be rebased on top of different kernel releases, and the syscall number can change in this process. Expose futex2 syscall number via sysfs so tools that are experimenting with futex2 (like Proton/Wine) can test it and set the syscall number at runtime, rather than setting it at compilation time. Signed-off-by: André Almeida commit 442d05a80b7e337f296ae1c5eb81f65b2a248db2 Author: André Almeida Date: Fri Feb 5 10:34:02 2021 -0300 kernel: Enable waitpid() for futex2 To make pthreads works as expected if they are using futex2, wake clear_child_tid with futex2 as well. This is make applications that uses waitpid() (and clone(CLONE_CHILD_SETTID)) wake while waiting for the child to terminate. Given that apps should not mix futex() and futex2(), any correct app will trigger a harmless noop wakeup on the interface that it isn't using. Signed-off-by: André Almeida commit e5a1578d65cee6cb5421578dbd706998d80241d1 Author: André Almeida Date: Tue Feb 9 13:59:00 2021 -0300 docs: locking: futex2: Add documentation Add a new documentation file specifying both userspace API and internal implementation details of futex2 syscalls. Signed-off-by: André Almeida commit fa85b67049cc0904c3e2f27379a34ec70bbefcc9 Author: André Almeida Date: Thu Feb 11 10:47:23 2021 -0300 futex2: Add compatibility entry point for x86_x32 ABI New syscalls should use the same entry point for x86_64 and x86_x32 paths. Add a wrapper for x32 calls to use parse functions that assumes 32bit pointers. Signed-off-by: André Almeida commit 5f5c483198973d48f3f79a61c796862aef2d9f0f Author: André Almeida Date: Fri Feb 5 10:34:01 2021 -0300 futex2: Implement requeue operation Implement requeue interface similary to FUTEX_CMP_REQUEUE operation. This is the syscall implemented by this patch: futex_requeue(struct futex_requeue *uaddr1, struct futex_requeue *uaddr2, unsigned int nr_wake, unsigned int nr_requeue, unsigned int cmpval, unsigned int flags) struct futex_requeue { void *uaddr; unsigned int flags; }; If (uaddr1->uaddr == cmpval), wake at uaddr1->uaddr a nr_wake number of waiters and then, remove a number of nr_requeue waiters at uaddr1->uaddr and add them to uaddr2->uaddr list. Each uaddr has its own set of flags, that must be defined at struct futex_requeue (such as size, shared, NUMA). The flags argument of the syscall is there just for the sake of extensibility, and right now it needs to be zero. Return the number of the woken futexes + the number of requeued ones on success, error code otherwise. Signed-off-by: André Almeida Rebased-by: Joshua Ashton commit 9970c3df4edb7199bbf6b2f30f7fa8d5e13d7657 Author: André Almeida Date: Fri Feb 5 10:34:00 2021 -0300 futex2: Implement vectorized wait Add support to wait on multiple futexes. This is the interface implemented by this syscall: futex_waitv(struct futex_waitv *waiters, unsigned int nr_futexes, unsigned int flags, struct timespec *timo) struct futex_waitv { void *uaddr; unsigned int val; unsigned int flags; }; Given an array of struct futex_waitv, wait on each uaddr. The thread wakes if a futex_wake() is performed at any uaddr. The syscall returns immediately if any waiter has *uaddr != val. *timo is an optional timeout value for the operation. The flags argument of the syscall should be used solely for specifying the timeout as realtime, if needed. Flags for shared futexes, sizes, etc. should be used on the individual flags of each waiter. Returns the array index of one of the awakened futexes. There’s no given information of how many were awakened, or any particular attribute of it (if it’s the first awakened, if it is of the smaller index...). Signed-off-by: André Almeida Rebased-by: Joshua Ashton commit 1ee81ce1070686361e9d071e6541cf724e0b2dc6 Author: André Almeida Date: Fri Feb 5 10:34:01 2021 -0300 futex2: Add support for shared futexes Add support for shared futexes for cross-process resources. This design relies on the same approach done in old futex to create an unique id for file-backed shared memory, by using a counter at struct inode. There are two types of futexes: private and shared ones. The private are futexes meant to be used by threads that shares the same memory space, are easier to be uniquely identified an thus can have some performance optimization. The elements for identifying one are: the start address of the page where the address is, the address offset within the page and the current->mm pointer. Now, for uniquely identifying shared futex: - If the page containing the user address is an anonymous page, we can just use the same data used for private futexes (the start address of the page, the address offset within the page and the current->mm pointer) that will be enough for uniquely identifying such futex. We also set one bit at the key to differentiate if a private futex is used on the same address (mixing shared and private calls are not allowed). - If the page is file-backed, current->mm maybe isn't the same one for every user of this futex, so we need to use other data: the page->index, an UUID for the struct inode and the offset within the page. Note that members of futex_key doesn't have any particular meaning after they are part of the struct - they are just bytes to identify a futex. Given that, we don't need to use a particular name or type that matches the original data, we only need to care about the bitsize of each component and make both private and shared data fit in the same memory space. Signed-off-by: André Almeida commit ea14247af97759537ac1bea99e3721a2e96a1a5b Author: André Almeida Date: Fri Feb 5 10:34:00 2021 -0300 futex2: Implement wait and wake functions Create a new set of futex syscalls known as futex2. This new interface is aimed to implement a more maintainable code, while removing obsolete features and expanding it with new functionalities. Implements wait and wake semantics for futexes, along with the base infrastructure for future operations. The whole wait path is designed to be used by N waiters, thus making easier to implement vectorized wait. * Syscalls implemented by this patch: - futex_wait(void *uaddr, unsigned int val, unsigned int flags, struct timespec *timo) The user thread is put to sleep, waiting for a futex_wake() at uaddr, if the value at *uaddr is the same as val (otherwise, the syscall returns immediately with -EAGAIN). timo is an optional timeout value for the operation. Return 0 on success, error code otherwise. - futex_wake(void *uaddr, unsigned long nr_wake, unsigned int flags) Wake `nr_wake` threads waiting at uaddr. Return the number of woken threads on success, error code otherwise. ** The `flag` argument The flag is used to specify the size of the futex word (FUTEX_[8, 16, 32]). It's mandatory to define one, since there's no default size. By default, the timeout uses a monotonic clock, but can be used as a realtime one by using the FUTEX_REALTIME_CLOCK flag. By default, futexes are of the private type, that means that this user address will be accessed by threads that shares the same memory region. This allows for some internal optimizations, so they are faster. However, if the address needs to be shared with different processes (like using `mmap()` or `shm()`), they need to be defined as shared and the flag FUTEX_SHARED_FLAG is used to set that. By default, the operation has no NUMA-awareness, meaning that the user can't choose the memory node where the kernel side futex data will be stored. The user can choose the node where it wants to operate by setting the FUTEX_NUMA_FLAG and using the following structure (where X can be 8, 16, or 32): struct futexX_numa { __uX value; __sX hint; }; This structure should be passed at the `void *uaddr` of futex functions. The address of the structure will be used to be waited/waken on, and the `value` will be compared to `val` as usual. The `hint` member is used to defined which node the futex will use. When waiting, the futex will be registered on a kernel-side table stored on that node; when waking, the futex will be searched for on that given table. That means that there's no redundancy between tables, and the wrong `hint` value will led to undesired behavior. Userspace is responsible for dealing with node migrations issues that may occur. `hint` can range from [0, MAX_NUMA_NODES], for specifying a node, or -1, to use the same node the current process is using. When not using FUTEX_NUMA_FLAG on a NUMA system, the futex will be stored on a global table on some node, defined at compilation time. ** The `timo` argument As per the Y2038 work done in the kernel, new interfaces shouldn't add timeout options known to be buggy. Given that, `timo` should be a 64bit timeout at all platforms, using an absolute timeout value. Signed-off-by: André Almeida Rebased-by: Joshua Ashton commit 6afbd4f8cd3c74ca144409f92710c612bd8627b2 Author: Alexandre Frade Date: Sat Oct 2 17:14:02 2021 +0000 futex: Implement mechanism to wait on any of several futexes This is a new futex operation, called FUTEX_WAIT_MULTIPLE, which allows a thread to wait on several futexes at the same time, and be awoken by any of them. In a sense, it implements one of the features that was supported by pooling on the old FUTEX_FD interface. The use case lies in the Wine implementation of the Windows NT interface WaitMultipleObjects. This Windows API function allows a thread to sleep waiting on the first of a set of event sources (mutexes, timers, signal, console input, etc) to signal. Considering this is a primitive synchronization operation for Windows applications, being able to quickly signal events on the producer side, and quickly go to sleep on the consumer side is essential for good performance of those running over Wine. Wine developers have an implementation that uses eventfd, but it suffers from FD exhaustion (there is applications that go to the order of multi-milion FDs), and higher CPU utilization than this new operation. The futex list is passed as an array of `struct futex_wait_block` (pointer, value, bitset) to the kernel, which will enqueue all of them and sleep if none was already triggered. It returns a hint of which futex caused the wake up event to userspace, but the hint doesn't guarantee that is the only futex triggered. Before calling the syscall again, userspace should traverse the list, trying to re-acquire any of the other futexes, to prevent an immediate -EWOULDBLOCK return code from the kernel. This was tested using three mechanisms: 1) By reimplementing FUTEX_WAIT in terms of FUTEX_WAIT_MULTIPLE and running the unmodified tools/testing/selftests/futex and a full linux distro on top of this kernel. 2) By an example code that exercises the FUTEX_WAIT_MULTIPLE path on a multi-threaded, event-handling setup. 3) By running the Wine fsync implementation and executing multi-threaded applications, in particular modern games, on top of this implementation. Changes were tested for the following ABIs: x86_64, i386 and x32. Support for x32 applications is not implemented since it would take a major rework adding a new entry point and splitting the current futex 64 entry point in two and we can't change the current x32 syscall number without breaking user space compatibility. Included Valve's Proton compatibility code. CC: Steven Rostedt Cc: Richard Yao Cc: Thomas Gleixner Cc: Peter Zijlstra Co-developed-by: Zebediah Figura Signed-off-by: Zebediah Figura Co-developed-by: Steven Noonan Signed-off-by: Steven Noonan Co-developed-by: Pierre-Loup A. Griffais Signed-off-by: Pierre-Loup A. Griffais Signed-off-by: Gabriel Krisman Bertazi [Added compatibility code] Co-developed-by: André Almeida Signed-off-by: André Almeida Rebased-by: Alexandre Frade commit 12aea22c5b44afca07cd0a19923b6c57c5ac0851 Author: Alexandre Frade Date: Sat Oct 2 15:56:10 2021 +0000 futex*: remove FUTEX_WAIT_MULTIPLE operation and futex2 patchset Signed-off-by: Alexandre Frade