commit 41ff3162bc65f39329052e806762e24908211c2e
Author: Alexandre Frade
Date:   Mon Aug 3 19:02:09 2020 +0000

    Linux 5.8.0-xanmod1

    Signed-off-by: Alexandre Frade

commit bafdc4b2d355ff81814678ba221d538763b497a0
Author: graysky
Date:   Mon Aug 3 18:17:59 2020 +0000

    x86/kconfig: Enable additional cpu optimizations for gcc v10.1+ kernel v5.8

    WARNING
    This patch works with gcc versions 10.1+ and with kernel version 5.8 and should NOT be applied when compiling on older versions of gcc due to key name changes of the march flags introduced with the version 4.9 release of gcc.[1]

    For older versions of gcc, use the older version of this patch hosted in the same GitHub repository.

    FEATURES
    This patch adds additional CPU options to the Linux kernel accessible under:
     Processor type and features --->
      Processor family --->

    The expanded microarchitectures include:
    * AMD Improved K8-family
    * AMD K10-family
    * AMD Family 10h (Barcelona)
    * AMD Family 14h (Bobcat)
    * AMD Family 16h (Jaguar)
    * AMD Family 15h (Bulldozer)
    * AMD Family 15h (Piledriver)
    * AMD Family 15h (Steamroller)
    * AMD Family 15h (Excavator)
    * AMD Family 17h (Zen)
    * AMD Family 17h (Zen 2)
    * Intel Silvermont low-power processors
    * Intel Goldmont low-power processors (Apollo Lake and Denverton)
    * Intel Goldmont Plus low-power processors (Gemini Lake)
    * Intel 1st Gen Core i3/i5/i7 (Nehalem)
    * Intel 1.5 Gen Core i3/i5/i7 (Westmere)
    * Intel 2nd Gen Core i3/i5/i7 (Sandybridge)
    * Intel 3rd Gen Core i3/i5/i7 (Ivybridge)
    * Intel 4th Gen Core i3/i5/i7 (Haswell)
    * Intel 5th Gen Core i3/i5/i7 (Broadwell)
    * Intel 6th Gen Core i3/i5/i7 (Skylake)
    * Intel 6th Gen Core i7/i9 (Skylake X)
    * Intel 8th Gen Core i3/i5/i7 (Cannon Lake)
    * Intel 10th Gen Core i7/i9 (Ice Lake)
    * Intel Xeon (Cascade Lake)
    * Intel Xeon (Cooper Lake)
    * Intel 3rd Gen 10nm++ i3/i5/i7/i9-family (Tiger Lake)

    It also offers to compile with the 'native' option, which "selects the CPU to generate code for at compilation time by determining the processor type of the compiling machine.
    Using -march=native enables all instruction subsets supported by the local machine and will produce code optimized for the local machine under the constraints of the selected instruction set."[2]

    Do NOT try using the 'native' option on AMD Piledriver, Steamroller, or Excavator CPUs (-march=bdver{2,3,4} flag). The build will error out due to the kernel's objtool issue with these.[3a,b]

    MINOR NOTES
    This patch also changes 'atom' to 'bonnell' in accordance with the gcc v4.9 changes. Note that upstream is using the deprecated 'march=atom' flag when I believe it should use the newer 'march=bonnell' flag for Atom processors.[4]

    It is not recommended to compile on Atom CPUs with the 'native' option.[5] The recommendation is to use the 'atom' option instead.

    BENEFITS
    Small but real speed increases are measurable using a make endpoint that compares a generic kernel with one built for the respective microarchitecture.

    See the following experimental evidence supporting this statement:
    https://github.com/graysky2/kernel_gcc_patch

    REQUIREMENTS
    linux version >=5.8
    gcc version >=10.1

    ACKNOWLEDGMENTS
    This patch builds on the seminal work by Jeroen.[6]

    REFERENCES
    1.  https://gcc.gnu.org/gcc-4.9/changes.html
    2.  https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
    3a. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95671#c11
    3b. https://github.com/graysky2/kernel_gcc_patch/issues/55
    4.  https://bugzilla.kernel.org/show_bug.cgi?id=77461
    5.  https://github.com/graysky2/kernel_gcc_patch/issues/15
    6.  http://www.linuxforge.net/docs/linux/linux-gcc.php

commit d2abc6660bbf329b5d682fd20bddc032b14ad891
Author: Andy Lutomirski
Date:   Wed Jun 24 17:48:40 2020 -0700

    x86/fsgsbase: Fix Xen PV support

    On Xen PV, SWAPGS doesn't work. Teach __rdgsbase_inactive() and __wrgsbase_inactive() to use rdmsrl()/wrmsrl() on Xen PV. The Xen pvop code will understand this and issue the correct hypercalls.
    Cc: Boris Ostrovsky
    Cc: Juergen Gross
    Cc: Stefano Stabellini
    Cc: xen-devel@lists.xenproject.org
    Signed-off-by: Andy Lutomirski

commit 409ca12910c347c32d0e860010a7fd689187c819
Author: Andy Lutomirski
Date:   Wed May 27 16:02:36 2020 -0700

    selftests/x86: Add a syscall_arg_fault_64 test for negative GSBASE

    If the kernel erroneously allows WRGSBASE and user code writes a negative value, paranoid_entry will get confused. Check for this by writing a negative value to GSBASE and doing SYSENTER with TF set. A successful run looks like:

        [RUN]   SYSENTER with TF, invalid state, and GSBASE < 0
        [SKIP]  Illegal instruction

    A failed run causes a kernel hang, and I believe it's because we double-fault and then get a never-ending series of page faults and, when we exhaust the double fault stack, we double fault again, starting the process over.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Borislav Petkov
    Link: https://lkml.kernel.org/r/f4f71efc91b9eae5e3dae21c9aee1c70cf5f370e.1590620529.git.luto@kernel.org

commit 622c4163f95fde29d3bb22c337f73da289bca163
Author: Andy Lutomirski
Date:   Sat Jun 20 08:29:44 2020 -0700

    x86/ptrace: Fix 32-bit PTRACE_SETREGS vs fsbase and gsbase

    Debuggers expect that doing PTRACE_GETREGS, then poking at a tracee and maybe letting it run for a while, then doing PTRACE_SETREGS will put the tracee back where it was. In the specific case of a 32-bit tracer and tracee, the PTRACE_GETREGS/SETREGS data structure doesn't have fs_base or gs_base fields, so FSBASE and GSBASE fields are never stored anywhere. Everything used to still work because nonzero FS or GS would result in full reloads of the segment registers when the tracee resumes, and the bases associated with FS==0 or GS==0 are irrelevant to 32-bit code.

    Adding FSGSBASE support broke this: when FSGSBASE is enabled, FSBASE and GSBASE are now restored independently of FS and GS for all tasks when context-switched in.
    This means that, if a 32-bit tracer restores a previous state using PTRACE_SETREGS but the tracee's pre-restore and post-restore bases don't match, then the tracee is resumed with the wrong base.

    Fix it by explicitly loading the base when a 32-bit tracer pokes FS or GS on a 64-bit kernel.

    Also add a test case.

    Fixes: 673903495c85 ("x86/process/64: Use FSBSBASE in switch_to() if available")
    Cc: Sasha Levin
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Oleksandr Natalenko

commit 72b043f2a25293ba3b5528dba56a4559c7f6119d
Author: Andy Lutomirski
Date:   Fri Jun 19 22:20:35 2020 -0700

    selftests/x86/fsgsbase: Add a missing memory constraint

    The manual call to set_thread_area() via int $0x80 was missing any indication that the descriptor was a pointer, causing gcc to occasionally generate wrong code. Add the missing constraint.

    Signed-off-by: Andy Lutomirski

commit 094004f153cf530724e0571b2f0b69fa77977df7
Author: Andy Lutomirski
Date:   Fri Jun 19 16:46:33 2020 -0700

    selftests/x86/fsgsbase: Fix a comment in the ptrace_write_gsbase test

    A comment was unclear. Fix it.

    Fixes: 5e7ec8578fa3 ("selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE")
    Signed-off-by: Andy Lutomirski

commit bf2165bcef2666e9146b0fba12cc4d94369dc0e7
Author: Chang S. Bae
Date:   Thu May 28 16:14:02 2020 -0400

    selftests/x86/fsgsbase: Test ptracer-induced GS base write with FSGSBASE

    This validates that GS selector and base are independently preserved in ptrace commands.

    Suggested-by: Andy Lutomirski
    Signed-off-by: Chang S. Bae
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Tony Luck
    Link: https://lkml.kernel.org/r/20200528201402.1708239-17-sashal@kernel.org

commit f985091ae2da0911e66476a8ac50a475fac4ebb0
Author: Chang S. Bae
Date:   Thu May 28 16:14:01 2020 -0400

    selftests/x86/fsgsbase: Test GS selector on ptracer-induced GS base write

    The test validates that the selector is not changed when a ptracer writes the ptracee's GS base.
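These tests rely on the fact that the GS selector and the GS base are separate pieces of state. The x86-64-only sketch below reads both from user space. The selector comes from a `mov` out of `%gs`, and the base comes from the `arch_prctl(ARCH_GET_GS)` system call, which reports it whether or not FSGSBASE is enabled. The helper name `read_gs_state()` is hypothetical and is not taken from the selftests. The `ARCH_GET_GS` fallback value is assumed to match `<asm/prctl.h>`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>   /* SYS_arch_prctl */
#include <unistd.h>        /* syscall() */

#ifndef ARCH_GET_GS
#define ARCH_GET_GS 0x1004 /* value from <asm/prctl.h> on x86-64 */
#endif

/*
 * Hypothetical helper, x86-64 Linux only. Reads the two independent
 * halves of the GS state: the 16-bit selector currently loaded in %gs,
 * and the 64-bit GS base as reported by arch_prctl(ARCH_GET_GS).
 * Returns 0 on success, -1 if arch_prctl() fails.
 */
static int read_gs_state(unsigned short *selector, unsigned long *base)
{
    unsigned short sel;

    assert(selector != NULL && base != NULL);
    __asm__ volatile("mov %%gs, %0" : "=r"(sel)); /* selector only */
    *selector = sel;

    /* The base lives outside the selector and needs a separate query. */
    return syscall(SYS_arch_prctl, ARCH_GET_GS, base) == 0 ? 0 : -1;
}
```

In an ordinary 64-bit process both values are typically 0. The sketch does not depend on that, since the point is only that the two values are read, and can change, independently.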
    Originally-by: Andy Lutomirski
    Signed-off-by: Chang S. Bae
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Tony Luck
    Link: https://lkml.kernel.org/r/20200528201402.1708239-16-sashal@kernel.org

commit 169e0a20429abf92adb03e5018c8dc0f468814c9
Author: Thomas Gleixner
Date:   Thu May 28 16:14:00 2020 -0400

    Documentation/x86/64: Add documentation for GS/FS addressing mode

    Explain how the GS/FS based addressing can be utilized in user space applications along with the differences between the generic prctl() based GS/FS base control and the FSGSBASE version available on newer CPUs.

    Originally-by: Andi Kleen
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Chang S. Bae
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Tony Luck
    Link: https://lkml.kernel.org/r/20200528201402.1708239-15-sashal@kernel.org

commit 138288fd19ddee25b9234ff70d2511297dce98cb
Author: Andi Kleen
Date:   Thu May 28 16:13:59 2020 -0400

    x86/elf: Enumerate kernel FSGSBASE capability in AT_HWCAP2

    The kernel needs to explicitly enable FSGSBASE. So, the application needs to know if it can safely use these instructions. Just looking at the CPUID bit is not enough because it may be running in a kernel that does not enable the instructions.

    One way for the application would be to just try and catch the SIGILL. But that is difficult to do in libraries which may not want to overwrite the signal handlers of the main application.

    Enumerate the enabled FSGSBASE capability in bit 1 of AT_HWCAP2 in the ELF aux vector. AT_HWCAP2 is already used by PPC for similar purposes.

    The application can access it open coded or by using the getauxval() function in newer versions of glibc.

    [ tglx: Massaged changelog ]

    Signed-off-by: Andi Kleen
    Signed-off-by: Chang S. Bae
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/1557309753-24073-18-git-send-email-chang.seok.bae@intel.com
    Link: https://lkml.kernel.org/r/20200528201402.1708239-14-sashal@kernel.org

commit 3cdabcc93c7d63b56cb34b2b8270d62045132fb1
Author: Andy Lutomirski
Date:   Thu May 28 16:13:58 2020 -0400

    x86/cpu: Enable FSGSBASE on 64bit by default and add a chicken bit

    Now that FSGSBASE is fully supported, remove unsafe_fsgsbase, enable FSGSBASE by default, and add nofsgsbase to disable it.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Chang S. Bae
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Andi Kleen
    Link: https://lkml.kernel.org/r/1557309753-24073-17-git-send-email-chang.seok.bae@intel.com
    Link: https://lkml.kernel.org/r/20200528201402.1708239-13-sashal@kernel.org

commit ec27f0dec0553a60eefd4b0ffa11e6b6a8eada5d
Author: Chang S. Bae
Date:   Thu May 28 16:13:57 2020 -0400

    x86/entry/64: Handle FSGSBASE enabled paranoid entry/exit

    Without FSGSBASE, user space cannot change GSBASE other than through a PRCTL. The kernel enforces that the user space GSBASE value is positive, as negative values are used for detecting the kernel space GSBASE value in the paranoid entry code.

    If FSGSBASE is enabled, user space can set arbitrary GSBASE values without kernel intervention, including negative ones, which breaks the paranoid entry assumptions.

    To avoid this, paranoid entry needs to unconditionally save the current GSBASE value independent of the interrupted context, retrieve and write the kernel GSBASE, and unconditionally restore the saved value on exit. The restore happens either in paranoid_exit or in the special exit path of the NMI low level code.

    All other entry code paths which use unconditional SWAPGS are not affected as they do not depend on the actual content.

    [ tglx: Massaged changelogs and comments ]

    Suggested-by: H. Peter Anvin
    Suggested-by: Andy Lutomirski
    Suggested-by: Thomas Gleixner
    Signed-off-by: Chang S. Bae
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/1557309753-24073-13-git-send-email-chang.seok.bae@intel.com
    Link: https://lkml.kernel.org/r/20200528201402.1708239-12-sashal@kernel.org

commit bef7c9e845d99f39598ef3afbf89a1c091ad564a
Author: Chang S. Bae
Date:   Thu May 28 16:13:56 2020 -0400

    x86/entry/64: Introduce the FIND_PERCPU_BASE macro

    GSBASE is used to find per-CPU data in the kernel. But when GSBASE is unknown, the per-CPU base can be found from the per_cpu_offset table with a CPU NR. The CPU NR is extracted from the limit field of the CPUNODE entry in GDT, or by the RDPID instruction. This is a prerequisite for using FSGSBASE in the low level entry code.

    Also, add the GAS-compatible RDPID macro as binutils 2.23 does not support it. Support is added in version 2.27.

    [ tglx: Massaged changelog ]

    Suggested-by: H. Peter Anvin
    Signed-off-by: Chang S. Bae
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/1557309753-24073-12-git-send-email-chang.seok.bae@intel.com
    Link: https://lkml.kernel.org/r/20200528201402.1708239-11-sashal@kernel.org

commit d77c4392dabf2bf50648082b7dc1038a41dc57cd
Author: Chang S. Bae
Date:   Thu May 28 16:13:55 2020 -0400

    x86/entry/64: Switch CR3 before SWAPGS in paranoid entry

    When FSGSBASE is enabled, the GSBASE handling in paranoid entry will need to retrieve the kernel GSBASE, which requires that the kernel page table is active. As the CR3 switch to the kernel page tables (when PTI is active) does not depend on kernel GSBASE, move the CR3 switch in front of the GSBASE handling.

    Comment the EBX content while at it.

    No functional change.

    [ tglx: Rewrote changelog and comments ]

    Signed-off-by: Chang S. Bae
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/1557309753-24073-11-git-send-email-chang.seok.bae@intel.com
    Link: https://lkml.kernel.org/r/20200528201402.1708239-10-sashal@kernel.org

commit 0103466ef93c1a786788772b13796650504fb76b
Author: Tony Luck
Date:   Thu May 28 16:13:54 2020 -0400

    x86/speculation/swapgs: Check FSGSBASE in enabling SWAPGS mitigation

    Before enabling FSGSBASE, the kernel could safely assume that the content of GS base was a user address. Thus any speculative access as the result of a mispredicted branch controlling the execution of SWAPGS would be to a user address. So systems with speculation-proof SMAP did not need to add additional LFENCE instructions to mitigate.

    With FSGSBASE enabled, a hostile user can set GS base to a kernel address. So they can make the kernel speculatively access data they wish to leak via a side channel. This means that SMAP provides no protection.

    Add FSGSBASE as an additional condition to enable the fence-based SWAPGS mitigation.

    Signed-off-by: Tony Luck
    Signed-off-by: Chang S. Bae
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20200528201402.1708239-9-sashal@kernel.org

commit 030eebf498dd304269e6710cc6578046ed3268dd
Author: Chang S. Bae
Date:   Thu May 28 16:13:53 2020 -0400

    x86/process/64: Use FSGSBASE instructions on thread copy and ptrace

    When FSGSBASE is enabled, copying threads and reading fsbase and gsbase using ptrace must read the actual values.

    When copying a thread, use save_fsgs() and copy the saved values. For ptrace, the bases must be read from memory regardless of the selector if FSGSBASE is enabled.

    [ tglx: Invoke __rdgsbase_inactive() with interrupts disabled ]
    [ luto: Massage changelog ]

    Suggested-by: Andy Lutomirski
    Signed-off-by: Chang S. Bae
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/1557309753-24073-9-git-send-email-chang.seok.bae@intel.com
    Link: https://lkml.kernel.org/r/20200528201402.1708239-8-sashal@kernel.org

commit 161a41720c508b6d2e9d411cb832a8211656b477
Author: Andy Lutomirski
Date:   Thu May 28 16:13:51 2020 -0400

    x86/process/64: Use FSBSBASE in switch_to() if available

    With the new FSGSBASE instructions, FSBASE and GSBASE can be efficiently read and written in __switch_to(). Use that capability to preserve the full state.

    This will enable user code to do whatever it wants with the new instructions without any kernel-induced gotchas. (There can still be architectural gotchas: movl %gs,%eax; movl %eax,%gs may change GSBASE if WRGSBASE was used, but users are expected to read the CPU manual before doing things like that.)

    This is a considerable speedup. It seems to save about 100 cycles per context switch compared to the baseline 4.6-rc1 behavior on a Skylake laptop. This is mostly due to avoiding the WRMSR operation.

    [ chang: 5~10% performance improvements (relative to the 4.16 baseline) were seen with a context switch benchmark that ran threads with different FS/GSBASE values. Minor edit on the changelog. ]

    [ tglx: Massage changelog ]

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Chang S. Bae
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Andi Kleen
    Link: https://lkml.kernel.org/r/1557309753-24073-8-git-send-email-chang.seok.bae@intel.com
    Link: https://lkml.kernel.org/r/20200528201402.1708239-6-sashal@kernel.org

commit 4d0325de135cd16bf6c1dcc4531b3ac1607b14b0
Author: Thomas Gleixner
Date:   Thu May 28 16:13:52 2020 -0400

    x86/process/64: Make save_fsgs_for_kvm() ready for FSGSBASE

    save_fsgs_for_kvm() is invoked via

      vcpu_enter_guest()
        kvm_x86_ops.prepare_guest_switch(vcpu)
          vmx_prepare_switch_to_guest()
            save_fsgs_for_kvm()

    with preemption disabled, but interrupts enabled.
    The upcoming FSGSBASE based GS save needs interrupts to be disabled. This could be done in the helper function, but that function is also called from switch_to(), which has interrupts disabled already.

    Disable interrupts inside save_fsgs_for_kvm() and rename the function to current_save_fsgs() so it can be invoked from other places.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/20200528201402.1708239-7-sashal@kernel.org

commit 80c266610841652649bde697687169d7e46bdc5e
Author: Chang S. Bae
Date:   Thu May 28 16:13:50 2020 -0400

    x86/fsgsbase/64: Enable FSGSBASE instructions in helper functions

    Add CPU-feature-conditional FSGSBASE access to the relevant helper functions. That allows certain FS/GS base operations to be accelerated in subsequent changes.

    Note that, while possible, the user space entry/exit GSBASE operations are not going to use the new FSGSBASE instructions. The reason is that it would require additional storage for the user space value, which adds more complexity to the low level code, and experiments have shown marginal benefit. This may be revisited later, but for now the SWAPGS based handling in the entry code is preserved except for the paranoid entry/exit code.

    To preserve the SWAPGS entry mechanism, introduce __[rd|wr]gsbase_inactive() helpers. Note, for Xen PV, paravirt hooks can be added later as they might allow a very efficient but different implementation.

    [ tglx: Massaged changelog, convert it to noinstr and force inline native_swapgs() ]

    Signed-off-by: Chang S. Bae
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/1557309753-24073-7-git-send-email-chang.seok.bae@intel.com
    Link: https://lkml.kernel.org/r/20200528201402.1708239-5-sashal@kernel.org

commit 44442fee11be272d1e99ea8ae9f5502462b828d5
Author: Andi Kleen
Date:   Thu May 28 16:13:49 2020 -0400

    x86/fsgsbase/64: Add intrinsics for FSGSBASE instructions

    [ luto: Rename the variables from FS and GS to FSBASE and GSBASE and make it safe to include on 32-bit kernels. ]

    Signed-off-by: Andi Kleen
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Chang S. Bae
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Andy Lutomirski
    Reviewed-by: Andi Kleen
    Link: https://lkml.kernel.org/r/1557309753-24073-6-git-send-email-chang.seok.bae@intel.com
    Link: https://lkml.kernel.org/r/20200528201402.1708239-4-sashal@kernel.org

commit 2274cc22ae0f027c23d9f73de43aa197627d0a5e
Author: Andy Lutomirski
Date:   Thu May 28 16:13:48 2020 -0400

    x86/cpu: Add 'unsafe_fsgsbase' to enable CR4.FSGSBASE

    This is temporary. It will allow the next few patches to be tested incrementally.

    Setting unsafe_fsgsbase is a root hole. Don't do it.

    Signed-off-by: Andy Lutomirski
    Signed-off-by: Chang S. Bae
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Andi Kleen
    Reviewed-by: Andy Lutomirski
    Link: https://lkml.kernel.org/r/1557309753-24073-4-git-send-email-chang.seok.bae@intel.com
    Link: https://lkml.kernel.org/r/20200528201402.1708239-3-sashal@kernel.org

commit de7704133da4461703a5bad8ccadf600a569446b
Author: Chang S. Bae
Date:   Thu May 28 16:13:47 2020 -0400

    x86/ptrace: Prevent ptrace from clearing the FS/GS selector

    When a ptracer writes a ptracee's FS/GSBASE with a different value, the selector is also cleared. This behavior is not correct, as the selector should be preserved.

    Update only the base value and leave the selector intact.
    To simplify the code further, remove the conditional check for the same value, as this code is not performance critical.

    The only recognizable downside of this change is when the selector is already nonzero on write. The base will be reloaded according to the selector. But this case is highly unexpected in real usage.

    [ tglx: Massage changelog ]

    Suggested-by: Andy Lutomirski
    Signed-off-by: Chang S. Bae
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Sasha Levin
    Signed-off-by: Thomas Gleixner
    Link: https://lkml.kernel.org/r/9040CFCD-74BD-4C17-9A01-B9B713CF6B10@intel.com
    Link: https://lkml.kernel.org/r/20200528201402.1708239-2-sashal@kernel.org

commit 407bee0a0ac4da977c7fdc02a46821c4c48aa273
Author: Ben Hutchings
Date:   Tue Jun 26 16:59:01 2018 +0100

    android: Export symbols needed by Android drivers

    We want to enable use of the Android ashmem and binder drivers to support Anbox, but they should not be built-in as that would waste resources and increase security attack surface on systems that don't need them.

    Export the currently un-exported symbols they depend on.

commit 09e4c5ffee7ab383f0233bcc69f8866d1e33ac07
Author: Ben Hutchings
Date:   Fri Jun 22 17:27:00 2018 +0100

    android: Enable building ashmem and binder as modules

    We want to enable use of the Android ashmem and binder drivers to support Anbox, but they should not be built-in as that would waste resources and increase security attack surface on systems that don't need them.
    - Add a MODULE_LICENSE declaration to ashmem
    - Change the Makefiles to build each driver as an object with the "_linux" suffix (which is what Anbox expects)
    - Change config symbol types to tristate

commit a45bb94ca506ea9c0ebe142afd772b9d49bb1ec9
Author: Arjan van de Ven
Date:   Sun Feb 18 23:35:41 2018 +0000

    locking: rwsem: spin faster

    tweak rwsem owner spinning a bit

commit a291ccc237ad46106d34d6f867d80646d885ffad
Author: William Douglas
Date:   Wed Jun 20 17:23:21 2018 +0000

    firmware: Enable stateless firmware loading

    Look up version-specific firmware before generic firmware, and /etc before /lib, so that the user can provide specific overrides for both generic firmware and distribution firmware.

commit 8f639c95d9369a888841d98b208d418e5609f4be
Author: Arjan van de Ven
Date:   Sun Sep 22 11:12:35 2019 -0300

    intel_rapl: Silence rapl trace debug

commit 36b5665989f7640faf2b7926501ca818455b9b45
Author: Nick Terrell
Date:   Thu Jul 30 11:05:59 2020 -0700

    Documentation: dontdiff: Add zstd compressed files

    For now, that's arch/x86/boot/compressed/vmlinux.bin.zst but probably more will come, thus let's be consistent with all other compressors.

    Signed-off-by: Nick Terrell

commit 31fb93d3a517ed548eb690fa6338c4860a7ad4ee
Author: Adam Borowski
Date:   Wed Mar 18 20:43:11 2020 -0700

    .gitignore: add ZSTD-compressed files

    For now, that's arch/x86/boot/compressed/vmlinux.bin.zst but probably more will come, thus let's be consistent with all other compressors.

    Tested-by: Sedat Dilek
    Reviewed-by: Kees Cook
    Signed-off-by: Nick Terrell
    Signed-off-by: Adam Borowski

commit 50d1707225da8b320941be842c987c603b5420ed
Author: Nick Terrell
Date:   Thu Jun 20 15:18:36 2019 -0700

    x86: Add support for ZSTD compressed kernel

    * Add support for zstd compressed kernel
    * Define __DISABLE_EXPORTS in Makefile
    * Remove __DISABLE_EXPORTS definition from kaslr.c
    * Bump the heap size for zstd.
    * Update the documentation.

    Integrate the ZSTD decompression code into the x86 pre-boot code.
    Zstandard requires slightly more memory during the kernel decompression on x86 (192 KB vs 64 KB), and the memory usage is independent of the window size.

    __DISABLE_EXPORTS is now defined in the Makefile, which covers both the existing use in kaslr.c and the use needed by the zstd decompressor in misc.c.

    This patch has been boot tested with both a zstd and gzip compressed kernel on i386 and x86_64 using buildroot and QEMU.

    Additionally, this has been tested in production on x86_64 devices. We saw a 2 second boot time reduction by switching kernel compression from xz to zstd.

    Reviewed-by: Kees Cook
    Tested-by: Sedat Dilek
    Signed-off-by: Nick Terrell

commit 723179e01d134e07f21251d2b979b53516acb9ce
Author: Nick Terrell
Date:   Thu Jun 20 15:17:17 2019 -0700

    x86: bump ZO_z_extra_bytes margin for zstd

    Bump the ZO_z_extra_bytes margin for zstd.

    Zstd needs 3 bytes per 128 KB, and has a 22 byte fixed overhead. Zstd needs to maintain 128 KB of space at all times, since that is the maximum block size. See the comments regarding in-place decompression added in lib/decompress_unzstd.c for details.

    The existing code is written so that all the compression algorithms use the same ZO_z_extra_bytes. It is taken to be the maximum of the growth rate plus the maximum fixed overhead. The comments just above this diff state that:

    Reviewed-by: Kees Cook
    Tested-by: Sedat Dilek
    Signed-off-by: Nick Terrell

commit ff56ebd466c785a18a3588c563127eb7533e9ffc
Author: Nick Terrell
Date:   Thu Jun 20 15:15:46 2019 -0700

    usr: add support for zstd compressed initramfs

    * Add support for a zstd compressed initramfs.
    * Add support for compressing the built-in initramfs with zstd.

    I have tested this patch by boot testing with buildroot and QEMU. Specifically, I booted the kernel with both a zstd and gzip compressed initramfs, both built into the kernel and separate. I ensured that the correct compression algorithm was used. I tested on arm, aarch64, i386, and x86_64.
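Using "the correct compression algorithm" comes down to recognising each format's magic bytes at the start of the image. The sketch below shows that idea in user space. It is a minimal, hypothetical illustration and not the kernel's decompress_method(), whose table may match fewer bytes. It assumes the standard magics. A zstd frame starts with 0xFD2FB528 stored little-endian, i.e. `28 b5 2f fd`. lz4 is matched here on its legacy-format magic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical magic-number table using the standard format magics. */
struct compress_magic {
    const char *name;
    unsigned char magic[6];
    size_t len;
};

static const struct compress_magic formats[] = {
    { "gzip",  { 0x1f, 0x8b },                     2 },
    { "bzip2", { 'B', 'Z', 'h' },                  3 },
    { "xz",    { 0xfd, '7', 'z', 'X', 'Z', 0x00 }, 6 },
    { "lz4",   { 0x02, 0x21, 0x4c, 0x18 },         4 }, /* legacy format */
    { "zstd",  { 0x28, 0xb5, 0x2f, 0xfd },         4 }, /* 0xFD2FB528, LE */
};

/* Return the name of the format whose magic prefixes buf, or NULL. */
static const char *detect_compression(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < sizeof(formats) / sizeof(formats[0]); i++) {
        if (len >= formats[i].len &&
            memcmp(buf, formats[i].magic, formats[i].len) == 0)
            return formats[i].name;
    }
    return NULL; /* unknown or uncompressed */
}
```

Checking `len` before `memcmp()` keeps a short or truncated buffer from being read past its end.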
    This patch has been tested in production on aarch64 and x86_64 devices.

    Additionally, I have performance measurements from internal use in production. On an aarch64 device we saw a 19 second boot time improvement from switching from lzma to zstd (27 seconds to 8 seconds). On an x86_64 device we saw a 9 second boot time reduction from switching from xz to zstd.

    Reviewed-by: Kees Cook
    Tested-by: Sedat Dilek
    Signed-off-by: Nick Terrell

commit 687f0349e4944bc24207f1f3e31f6c27863a74d7
Author: Nick Terrell
Date:   Thu Jun 20 15:15:08 2019 -0700

    init: add support for zstd compressed kernel

    * Adds the zstd and zstd22 cmds to scripts/Makefile.lib
    * Adds the HAVE_KERNEL_ZSTD and KERNEL_ZSTD options

    Architecture specific support is still needed for decompression.

    Reviewed-by: Kees Cook
    Tested-by: Sedat Dilek
    Signed-off-by: Nick Terrell

commit b61df72ba55028cba9a708afae36dc4e630ed4f2
Author: Nick Terrell
Date:   Thu Jun 20 15:14:00 2019 -0700

    lib: add zstd support to decompress

    * Add unzstd() and the zstd decompress interface.
    * Add zstd support to decompress_method().

    The decompress_method() and unzstd() functions are used to decompress the initramfs and the initrd. The __decompress() function is used in the preboot environment to decompress a zstd compressed kernel.

    The zstd decompression function allows the input and output buffers to overlap because that is used by x86 kernel decompression.

    Reviewed-by: Kees Cook
    Tested-by: Sedat Dilek
    Signed-off-by: Nick Terrell

commit bfb37404837db0953d70938a0f9793272cf039b4
Author: Nick Terrell
Date:   Thu Jun 20 15:03:27 2019 -0700

    lib: prepare zstd for preboot environment

    * Remove a double definition of the CHECK_F macro when the zstd library is amalgamated.
    * Switch ZSTD_copy8() to __builtin_memcpy(), because in the preboot environment on x86 gcc can't inline `memcpy()` otherwise.
    * Limit the gcc hack in ZSTD_wildcopy() to the broken gcc version. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81388.
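The two helpers named in the list above are small enough to sketch. The following is a simplified illustration of the technique, not the code from lib/zstd, and the names `copy8()` and `wildcopy()` are stand-ins. `copy8()` calls `__builtin_memcpy()` with a constant size so GCC can expand it as a single 8-byte move instead of an out-of-line `memcpy()` call. `wildcopy()` copies in 8-byte steps and deliberately overruns the requested length in exchange for a tight loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy exactly 8 bytes. The constant size lets GCC expand
 * __builtin_memcpy() inline rather than calling memcpy(). */
static inline void copy8(void *dst, const void *src)
{
    __builtin_memcpy(dst, src, 8);
}

/*
 * "Wild" copy of length bytes in 8-byte steps. It always copies at least
 * 8 bytes and rounds up to a multiple of 8, so both buffers need at least
 * 8 bytes of slack past dst + length and src + length.
 */
static inline void wildcopy(void *dst, const void *src, ptrdiff_t length)
{
    const uint8_t *ip = src;
    uint8_t *op = dst;
    uint8_t *const oend = op + length;

    assert(length >= 0);
    do {
        copy8(op, ip);
        op += 8;
        ip += 8;
    } while (op < oend);
}
```

For example, `wildcopy(dst, src, 13)` writes 16 bytes. The extra 3 are the price of having no per-byte tail loop on the hot path.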
    These changes are necessary to get the build to work in the preboot environment, and to get reasonable performance. ZSTD_copy8() and ZSTD_wildcopy() are in the core of the zstd hot loop. So outlining these calls to memcpy() and having an extra branch are very detrimental to performance.

    Reviewed-by: Kees Cook
    Tested-by: Sedat Dilek
    Signed-off-by: Nick Terrell

commit 6fbf15e9aebb9d04aab09367e0ed6e2d561db2b6
Author: Gabriel Krisman Bertazi
Date:   Thu Feb 13 18:45:25 2020 -0300

    selftests: futex: Add FUTEX_WAIT_MULTIPLE wake up test

    Add a test for the wait-on-multiple-futexes mechanism. Skip the test if it's an x32 application and the kernel returned the appropriate error, since this ABI is not supported for this operation.

    Signed-off-by: Gabriel Krisman Bertazi
    Co-developed-by: André Almeida
    Signed-off-by: André Almeida
    Signed-off-by: Alexandre Frade

commit 44987d0be4a19db5e42f24f10d6fcf21ba0a6b17
Author: Gabriel Krisman Bertazi
Date:   Thu Feb 13 18:45:24 2020 -0300

    selftests: futex: Add FUTEX_WAIT_MULTIPLE wouldblock test

    Add a test for the wouldblock return when waiting for multiple futexes. Skip the test if it's an x32 application and the kernel returned the appropriate error, since this ABI is not supported for this operation.

    Signed-off-by: Gabriel Krisman Bertazi
    Co-developed-by: André Almeida
    Signed-off-by: André Almeida
    Signed-off-by: Alexandre Frade

commit c11b1f0cf2e7378c47d4cfb4faf54c8d2c891705
Author: Gabriel Krisman Bertazi
Date:   Thu Feb 13 18:45:23 2020 -0300

    selftests: futex: Add FUTEX_WAIT_MULTIPLE timeout test

    Add a test for the timeout when waiting for multiple futexes. Skip the test if it's an x32 application and the kernel returned the appropriate error, since this ABI is not supported for this operation.
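FUTEX_WAIT_MULTIPLE only exists with this out-of-tree patch set. The sketch below therefore uses the upstream single-futex FUTEX_WAIT to show the two non-wakeup outcomes that the wouldblock and timeout tests check for. If the futex word does not hold the expected value, the kernel refuses to sleep and fails with EAGAIN (EWOULDBLOCK on Linux). If it does hold the value and nobody wakes the waiter, the relative timeout expires with ETIMEDOUT. `futex_wait_errno()` is a hypothetical helper. glibc has no `futex()` wrapper, so the helper calls `syscall(SYS_futex, ...)` directly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>   /* FUTEX_WAIT_PRIVATE */
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>   /* SYS_futex */
#include <time.h>
#include <unistd.h>

/*
 * One FUTEX_WAIT attempt with a relative timeout of timeout_ns.
 * Returns 0 if the thread was woken, otherwise the errno value, e.g.:
 *   EAGAIN    - *uaddr != expected, the kernel did not sleep at all
 *   ETIMEDOUT - *uaddr == expected, but nobody woke us in time
 *   EINTR     - a signal interrupted the wait
 */
static int futex_wait_errno(uint32_t *uaddr, uint32_t expected, long timeout_ns)
{
    struct timespec ts = { .tv_sec = 0, .tv_nsec = timeout_ns };

    assert(timeout_ns >= 0 && timeout_ns < 1000000000L);
    if (syscall(SYS_futex, uaddr, FUTEX_WAIT_PRIVATE, expected, &ts, NULL, 0) == 0)
        return 0;
    return errno;
}
```

With `uint32_t word = 0`, waiting for the value 1 returns EAGAIN immediately, while waiting for the value 0 blocks until the timeout and then returns ETIMEDOUT.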
    Signed-off-by: Gabriel Krisman Bertazi
    Co-developed-by: André Almeida
    Signed-off-by: André Almeida
    Signed-off-by: Alexandre Frade

commit 01508e4b79d99cca21c9da4de0a0b803eef615f6
Author: Gabriel Krisman Bertazi
Date:   Thu Feb 13 18:45:22 2020 -0300

    futex: Implement mechanism to wait on any of several futexes

    This is a new futex operation, called FUTEX_WAIT_MULTIPLE, which allows a thread to wait on several futexes at the same time, and be awoken by any of them. In a sense, it implements one of the features that was supported by polling on the old FUTEX_FD interface.

    The use case lies in the Wine implementation of the Windows NT interface WaitForMultipleObjects. This Windows API function allows a thread to sleep waiting on the first of a set of event sources (mutexes, timers, signal, console input, etc) to signal. Considering this is a primitive synchronization operation for Windows applications, being able to quickly signal events on the producer side, and quickly go to sleep on the consumer side, is essential for good performance of those running over Wine.

    Wine developers have an implementation that uses eventfd, but it suffers from FD exhaustion (there are applications that go to the order of multi-million FDs), and higher CPU utilization than this new operation.

    The futex list is passed as an array of `struct futex_wait_block` (pointer, value, bitset) to the kernel, which will enqueue all of them and sleep if none was already triggered. It returns a hint of which futex caused the wake up event to userspace, but the hint doesn't guarantee that it is the only futex triggered. Before calling the syscall again, userspace should traverse the list, trying to re-acquire any of the other futexes, to prevent an immediate -EWOULDBLOCK return code from the kernel.

    This was tested using three mechanisms:

    1) By reimplementing FUTEX_WAIT in terms of FUTEX_WAIT_MULTIPLE and running the unmodified tools/testing/selftests/futex and a full linux distro on top of this kernel.
    2) By example code that exercises the FUTEX_WAIT_MULTIPLE path on a multi-threaded, event-handling setup.

    3) By running Wine's fsync implementation together with Valve's Proton compatibility code and executing multi-threaded applications, in particular modern games, on top of this implementation.

    Changes were tested for the following ABIs: x86_64, i386 and x32. Support for x32 applications is not implemented, since it would require a major rework (adding a new entry point and splitting the current 64-bit futex entry point in two), and we can't change the current x32 syscall number without breaking user space compatibility.

    Cc: Steven Rostedt
    Cc: Richard Yao
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Co-developed-by: Zebediah Figura
    Signed-off-by: Zebediah Figura
    Co-developed-by: Steven Noonan
    Signed-off-by: Steven Noonan
    Co-developed-by: Pierre-Loup A. Griffais
    Signed-off-by: Pierre-Loup A. Griffais
    Signed-off-by: Gabriel Krisman Bertazi
    [Added compatibility code]
    Co-developed-by: André Almeida
    Signed-off-by: André Almeida
    Signed-off-by: Alexandre Frade

commit a9549d9943c90d19672cc3b0f9e7429ed6fc9afe
Author: Scott James Remnant
Date:   Tue Oct 27 10:05:32 2009 +0000

    trace: add trace events for open(), exec() and uselib() (for v3.7+)

    BugLink: http://bugs.launchpad.net/bugs/462111

    This patch uses TRACE_EVENT to add tracepoints for the open(), exec() and uselib() syscalls so that ureadahead can cheaply trace the boot sequence to determine what to read ahead to speed up the next boot.

    It's not upstream because it will need to be rebased onto the syscall trace events whenever that gets merged, and is a stop-gap.

    [apw@canonical.com: updated for v3.7 and later.]
    [apw@canonical.com: updated for v3.19 and later.]
BugLink: http://bugs.launchpad.net/bugs/1085766
Signed-off-by: Scott James Remnant
Acked-by: Stefan Bader
Acked-by: Andy Whitcroft
Signed-off-by: Stefan Bader

Conflicts:
	fs/open.c

Signed-off-by: Tim Gardner

commit 3c229f434aca65c4ca61772bc03c3e0370817b92
Author: Alexandre Frade
Date: Mon Aug 3 17:05:04 2020 +0000

mm: set 2 megabytes for address_space-level file read-ahead pages size

Signed-off-by: Alexandre Frade

commit 881e4852aea54a90b0ea5e64832a0a16961c8625
Author: Alexandre Frade
Date: Thu Jun 25 16:40:43 2020 -0300

lib/kconfig.debug: disable default CONFIG_SYMBOLIC_ERRNAME and CONFIG_DEBUG_BUGVERBOSE

Signed-off-by: Alexandre Frade

commit c0531be649949d034e4d3a904df2f8bc7f98cbb1
Author: Alexandre Frade
Date: Mon Jan 29 18:29:13 2018 +0000

sched/core: nr_migrate = 256 increases number of tasks to iterate in a single balance run.

Signed-off-by: Alexandre Frade

commit 2b65a1329cb220b43c19c4d0de5833fae9e2b22d
Author: Alexandre Frade
Date: Wed Oct 24 16:58:52 2018 -0300

net/sched: allow configuring cake qdisc as default

Signed-off-by: Alexandre Frade

commit 329cf1c917d59097f06149ec5a2f72c3137aa931
Author: Alexandre Frade
Date: Tue Mar 31 13:32:08 2020 -0300

cpufreq: tunes ondemand and conservative governor for performance

Signed-off-by: Alexandre Frade

commit acc49f33a10f61dc66c423888cbb883ba46710e4
Author: Alexandre Frade
Date: Mon Jan 29 17:41:29 2018 +0000

scripts: disable the localversion "+" tag of a git repo

Signed-off-by: Alexandre Frade

commit a92783513ab522be01b5ee68d5fcc4d2c528f630
Author: Mark Weiman
Date: Sun Aug 12 11:36:21 2018 -0400

pci: Enable overrides for missing ACS capabilities

This is an updated version of Alex Williamson's patch from:
https://lkml.org/lkml/2013/5/30/513

Original commit message follows:

PCIe ACS (Access Control Services) is the PCIe 2.0+ feature that allows us to control whether transactions are allowed to be redirected in various subnodes of a PCIe topology.
For instance, if two endpoints are below a root port or downstream switch port, the downstream port may optionally redirect transactions between the devices, bypassing upstream devices. The same can happen internally on multifunction devices. The transaction may never be visible to the upstream devices.

One upstream device that we particularly care about is the IOMMU. If a redirection occurs in the topology below the IOMMU, then the IOMMU cannot provide isolation between devices. This is why the PCIe spec encourages topologies to include ACS support. Without it, we have to assume peer-to-peer DMA within a hierarchy can bypass IOMMU isolation.

Unfortunately, too many topologies lack ACS support for us to make it a steadfast requirement. Even the latest chipsets from Intel only sporadically support ACS. We have trouble getting interconnect vendors to include the PCIe capabilities the spec requires, let alone the features it merely suggests.

Therefore, we need to add some flexibility. The pcie_acs_override= boot option lets users opt in specific devices or sets of devices to assume ACS support. The "downstream" option assumes full ACS support on root ports and downstream switch ports. The "multifunction" option assumes that the subset of ACS features available on multifunction endpoints and upstream switch ports is supported. The "id:nnnn:nnnn" option enables ACS support on devices matching the provided vendor and device IDs, allowing more strategic ACS overrides. These options may be combined in any order. A maximum of 16 id-specific overrides are available. It's suggested to use the most limited set of options necessary to avoid completely disabling ACS across the topology.

Note to hardware vendors: we have facilities to permanently quirk specific devices which enforce isolation but do not provide an ACS capability. Please contact me to have your devices added and save your customers the hassle of this boot option.
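The option grammar described above can be modeled as a small userspace parser. This is a hypothetical sketch, not the code from the patch. The comma separator between options, the struct and function names, and the -1 error convention are all assumptions. Only the three option kinds ("downstream", "multifunction", "id:nnnn:nnnn"), their order independence, and the 16-entry id limit come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace model of the pcie_acs_override= option set.
 * Names, the ',' separator and the return convention are assumptions. */
#define ACS_MAX_ID_OVERRIDES 16 /* limit stated in the commit message */

struct acs_id_override {
	unsigned int vendor;
	unsigned int device;
};

struct acs_override_opts {
	bool downstream;    /* full ACS on root ports and downstream switch ports */
	bool multifunction; /* ACS subset on multifunction endpoints/upstream ports */
	int n_ids;          /* number of valid entries in ids[] */
	struct acs_id_override ids[ACS_MAX_ID_OVERRIDES];
};

/* Parse options given in any order. Returns 0 on success, or -1 on an
 * empty/unknown token or on more than ACS_MAX_ID_OVERRIDES id: entries. */
int parse_acs_override(const char *arg, struct acs_override_opts *opts)
{
	memset(opts, 0, sizeof(*opts));

	while (*arg) {
		char tok[32];
		size_t len = strcspn(arg, ",");
		unsigned int vendor, device;
		char extra;

		if (len == 0 || len >= sizeof(tok))
			return -1;
		memcpy(tok, arg, len);
		tok[len] = '\0';

		if (strcmp(tok, "downstream") == 0) {
			opts->downstream = true;
		} else if (strcmp(tok, "multifunction") == 0) {
			opts->multifunction = true;
		} else if (sscanf(tok, "id:%4x:%4x%c", &vendor, &device, &extra) == 2) {
			/* exactly two hex IDs and nothing trailing */
			if (opts->n_ids == ACS_MAX_ID_OVERRIDES)
				return -1;
			opts->ids[opts->n_ids].vendor = vendor;
			opts->ids[opts->n_ids].device = device;
			opts->n_ids++;
		} else {
			return -1;
		}

		arg += len;       /* skip the token... */
		if (*arg == ',')
			arg++;        /* ...and its separator */
	}
	return 0;
}

/* True if vendor:device was listed through an id:nnnn:nnnn override. */
bool acs_id_overridden(const struct acs_override_opts *opts,
		       unsigned int vendor, unsigned int device)
{
	for (int i = 0; i < opts->n_ids; i++)
		if (opts->ids[i].vendor == vendor && opts->ids[i].device == device)
			return true;
	return false;
}
```

For example, under the assumed separator, "id:8086:1234,downstream" and "downstream,id:8086:1234" yield the same option set (the IDs are illustrative). Keeping the id list bounded by a fixed array mirrors the stated 16-entry cap rather than allocating per entry.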
Signed-off-by: Mark Weiman
Signed-off-by: Alexandre Frade

commit 7fc483d271599f22c347ccf9f1c26c3da1355568
Author: Alexandre Frade
Date: Mon Jan 29 17:31:25 2018 +0000

mm/vmscan: vm_swappiness = 30 decreases the amount of swapping

Signed-off-by: Alexandre Frade

commit c304f43d14e98d4bf1215fc10bc5012f554bdd8a
Author: Alexandre Frade
Date: Mon Jan 29 16:59:22 2018 +0000

dcache: cache_pressure = 50 decreases the rate at which VFS caches are reclaimed

Signed-off-by: Alexandre Frade

commit c9f3ba648f9266af835eb9e15c194f60894f7ff4
Author: Alexandre Frade
Date: Sun Oct 13 03:10:39 2019 -0300

kconfig: set PREEMPT and RCU_BOOST without delay by default

Signed-off-by: Alexandre Frade

commit a7372927fb39602a7182c0a1440cd560af65b19f
Author: Alexandre Frade
Date: Mon Jan 29 17:26:15 2018 +0000

kconfig: add 500Hz timer interrupt kernel config option

Signed-off-by: Alexandre Frade

commit e2111bc5989131c675659d40e0cc4f214df2f990
Author: Alexandre Frade
Date: Fri May 10 16:45:59 2019 -0300

block: set rq_affinity = 2 for full multithreading I/O requests

Signed-off-by: Alexandre Frade

commit f44d2469c53c8aeb53fcf8219138a1ebe4a6befc
Author: Alexandre Frade
Date: Mon Jun 1 18:23:51 2020 -0300

block, bfq: change BLK_DEV_ZONED depends to IOSCHED_BFQ

Signed-off-by: Alexandre Frade

commit 977812938da7c7226415778c340832141d9278b7
Author: Alexandre Frade
Date: Mon Nov 25 15:13:06 2019 -0300

elevator: set default scheduler to bfq for blk-mq

Signed-off-by: Alexandre Frade

commit bcf876870b95592b52519ed4aafcf9d95999bc9c
Author: Linus Torvalds
Date: Sun Aug 2 14:21:45 2020 -0700

Linux 5.8
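Several of the xanmod commits above change defaults that are visible at runtime. These are vm_swappiness (upstream 60, set to 30), the VFS cache pressure (upstream 100, set to 50) and the block queue's rq_affinity (upstream 1, set to 2). The sketch below checks what a running kernel actually uses by reading the corresponding procfs and sysfs files. The function names and the example device name "sda" are hypothetical. Live values can also differ from the compiled-in defaults if sysctl.conf or udev rules override them.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from a procfs/sysfs file such as /proc/sys/vm/swappiness.
 * Returns 0 on success, -1 if the file is missing or does not hold a number. */
int read_knob(const char *path, long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%ld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/* Print each knob's live value next to the default set by the commits above.
 * blockdev is a device name such as "sda" (illustrative only). */
void report_defaults(const char *blockdev)
{
	char rq_path[128];

	snprintf(rq_path, sizeof(rq_path), "/sys/block/%s/queue/rq_affinity", blockdev);

	const struct {
		const char *path;
		long patched; /* value the corresponding commit sets */
	} knobs[] = {
		{ "/proc/sys/vm/swappiness", 30 },
		{ "/proc/sys/vm/vfs_cache_pressure", 50 },
		{ rq_path, 2 },
	};

	for (size_t i = 0; i < sizeof(knobs) / sizeof(knobs[0]); i++) {
		long v;

		if (read_knob(knobs[i].path, &v) == 0)
			printf("%-40s %ld (patched default %ld)\n",
			       knobs[i].path, v, knobs[i].patched);
		else
			printf("%-40s unavailable\n", knobs[i].path);
	}
}
```

Calling report_defaults("nvme0n1") on an NVMe system, for instance, prints one line per knob and reports "unavailable" instead of failing when a file does not exist.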