Commit 3efc573

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm updates from Paolo Bonzini:
 "x86:

   - KVM currently invalidates the entirety of the page tables, not just
     those for the memslot being touched, when a memslot is moved or
     deleted. This does not traditionally have particularly noticeable
     overhead, but Intel's TDX will require the guest to re-accept private
     pages if they are dropped from the secure EPT, which is a non-starter.
     Actually, the only reason why this is not already being done is a bug
     that was never fully investigated and caused VM instability with
     assigned GeForce GPUs, so allow userspace to opt into the new behavior.

   - Advertise AVX10.1 to userspace (effectively prep work for the "real"
     AVX10 functionality that is on the horizon).

   - Rework common MSR handling code to suppress errors on userspace
     accesses to unsupported-but-advertised MSRs. This will allow removing
     (almost?) all of KVM's exemptions for userspace access to MSRs that
     shouldn't exist based on the vCPU model (the actual cleanup is
     non-trivial future work).

   - Rework KVM's handling of the x2APIC ICR, again, because AMD (x2AVIC)
     splits the 64-bit value into the legacy ICR and ICR2 storage, whereas
     Intel (APICv) stores the entire 64-bit value at the ICR offset.

   - Fix a bug where KVM would fail to exit to userspace if one was
     triggered by a fastpath exit handler.

   - Add fastpath handling of HLT VM-Exit to expedite re-entering the
     guest when there's already a pending wake event at the time of the
     exit.

   - Fix a WARN caused by RSM entering a nested guest from SMM with
     invalid guest state, by forcing the vCPU out of guest mode prior to
     signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
     nested guest).

   - Overhaul the "unprotect and retry" logic to more precisely identify
     cases where retrying is actually helpful, and to harden all retry
     paths against putting the guest into an infinite retry loop.

   - Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
     rmaps in the shadow MMU.

   - Refactor pieces of the shadow MMU related to aging SPTEs in
     preparation for adding multi-generation LRU support in KVM.

   - Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
     enabled, i.e. when the CPU has already flushed the RSB.

   - Trace the per-CPU host save area as a VMCB pointer to improve
     readability and clean up the retrieval of the SEV-ES host save area.

   - Remove unnecessary accounting of temporary nested VMCB related
     allocations.

   - Set FINAL/PAGE in the page fault error code for EPT violations if and
     only if the GVA is valid. If the GVA is NOT valid, there is no
     guest-side page table walk and so stuffing paging-related metadata is
     nonsensical.

   - Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
     instead of emulating posted interrupt delivery to L2.

   - Add a lockdep assertion to detect unsafe accesses of vmcs12
     structures.

   - Harden eVMCS loading against an impossible NULL pointer deref
     (really truly should be impossible).

   - Minor SGX fix and a cleanup.

   - Misc cleanups.

  Generic:

   - Register KVM's cpuhp and syscore callbacks when enabling
     virtualization in hardware, as the sole purpose of said callbacks is
     to disable and re-enable virtualization as needed.

   - Enable virtualization when KVM is loaded, not right before the first
     VM is created. Together with the previous change, this greatly
     simplifies the logic of the callbacks, because their very existence
     implies virtualization is enabled.

   - Fix a bug that results in KVM prematurely exiting to userspace for
     coalesced MMIO/PIO in many cases, clean up the related code, and add
     a testcase.

   - Fix a bug in kvm_clear_guest() where it would trigger a buffer
     overflow _if_ the gpa+len crosses a page boundary, which thankfully
     is guaranteed to not happen in the current code base. Add WARNs in
     more helpers that read/write guest memory to detect similar bugs.

  Selftests:

   - Fix a goof that caused some Hyper-V tests to be skipped when run on
     bare metal, i.e. NOT in a VM.

   - Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
     guest.

   - Explicitly include one-off assets in .gitignore. Past Sean was
     completely wrong about not being able to detect missing .gitignore
     entries.

   - Verify userspace single-stepping works when KVM happens to handle a
     VM-Exit in its fastpath.

   - Misc cleanups"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
  Documentation: KVM: fix warning in "make htmldocs"
  s390: Enable KVM_S390_UCONTROL config in debug_defconfig
  selftests: kvm: s390: Add VM run test case
  KVM: SVM: let alternatives handle the cases when RSB filling is required
  KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
  KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
  KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
  KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
  KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
  KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
  KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
  KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
  KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
  KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
  KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
  KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
  KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
  KVM: x86: Update retry protection fields when forcing retry on emulation failure
  KVM: x86: Apply retry protection to "unprotect on failure" path
  KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
  ...
2 parents e08d227 + efbc6bd commit 3efc573


82 files changed: +2876, -1525 lines

Documentation/admin-guide/kernel-parameters.txt

+17
@@ -2677,6 +2677,23 @@
 
 			Default is Y (on).
 
+	kvm.enable_virt_at_load=[KVM,ARM64,LOONGARCH,MIPS,RISCV,X86]
+			If enabled, KVM will enable virtualization in hardware
+			when KVM is loaded, and disable virtualization when KVM
+			is unloaded (if KVM is built as a module).
+
+			If disabled, KVM will dynamically enable and disable
+			virtualization on-demand when creating and destroying
+			VMs, i.e. on the 0=>1 and 1=>0 transitions of the
+			number of VMs.
+
+			Enabling virtualization at module load avoids potential
+			latency for creation of the 0=>1 VM, as KVM serializes
+			virtualization enabling across all online CPUs. The
+			"cost" of enabling virtualization when KVM is loaded is
+			that doing so may interfere with using out-of-tree
+			hypervisors that want to "own" virtualization hardware.
+
 	kvm.enable_vmware_backdoor=[KVM] Support VMware backdoor PV interface.
 			Default is false (don't support).

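For reference, the new parameter can be supplied either on the kernel command line (built-in KVM) or as a module option; an illustrative fragment (exact config file path depends on the distribution):

```
# Kernel command line, KVM built in:
kvm.enable_virt_at_load=0

# /etc/modprobe.d/kvm.conf (path illustrative), KVM built as a module:
options kvm enable_virt_at_load=0
```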
Documentation/virt/kvm/api.rst

+27-4
@@ -4214,7 +4214,9 @@ whether or not KVM_CAP_X86_USER_SPACE_MSR's KVM_MSR_EXIT_REASON_FILTER is
 enabled. If KVM_MSR_EXIT_REASON_FILTER is enabled, KVM will exit to userspace
 on denied accesses, i.e. userspace effectively intercepts the MSR access. If
 KVM_MSR_EXIT_REASON_FILTER is not enabled, KVM will inject a #GP into the guest
-on denied accesses.
+on denied accesses. Note, if an MSR access is denied during emulation of MSR
+loads/stores during VMX transitions, KVM ignores KVM_MSR_EXIT_REASON_FILTER.
+See the below warning for full details.
 
 If an MSR access is allowed by userspace, KVM will emulate and/or virtualize
 the access in accordance with the vCPU model. Note, KVM may still ultimately
@@ -4229,9 +4231,22 @@ filtering. In that mode, ``KVM_MSR_FILTER_DEFAULT_DENY`` is invalid and causes
 an error.
 
 .. warning::
-   MSR accesses as part of nested VM-Enter/VM-Exit are not filtered.
-   This includes both writes to individual VMCS fields and reads/writes
-   through the MSR lists pointed to by the VMCS.
+   MSR accesses that are side effects of instruction execution (emulated or
+   native) are not filtered as hardware does not honor MSR bitmaps outside of
+   RDMSR and WRMSR, and KVM mimics that behavior when emulating instructions
+   to avoid pointless divergence from hardware. E.g. RDPID reads MSR_TSC_AUX,
+   SYSENTER reads the SYSENTER MSRs, etc.
 
+   MSRs that are loaded/stored via dedicated VMCS fields are not filtered as
+   part of VM-Enter/VM-Exit emulation.
+
+   MSRs that are loaded/stored via VMX's load/store lists _are_ filtered as
+   part of VM-Enter/VM-Exit emulation. If an MSR access is denied on VM-Enter,
+   KVM synthesizes a consistency check VM-Exit (EXIT_REASON_MSR_LOAD_FAIL). If
+   an MSR access is denied on VM-Exit, KVM synthesizes a VM-Abort. In short,
+   KVM extends Intel's architectural list of MSRs that cannot be loaded/saved
+   via the VM-Enter/VM-Exit MSR list. It is the platform owner's
+   responsibility to communicate any such restrictions to their end users.
 
 x2APIC MSR accesses cannot be filtered (KVM silently ignores filters that
 cover any x2APIC MSRs).
@@ -8082,6 +8097,14 @@ KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS By default, KVM emulates MONITOR/MWAIT (if
                                     guest CPUID on writes to MISC_ENABLE if
                                     KVM_X86_QUIRK_MISC_ENABLE_NO_MWAIT is
                                     disabled.
+
+KVM_X86_QUIRK_SLOT_ZAP_ALL          By default, KVM invalidates all SPTEs in
+                                    a fast way for memslot deletion when VM
+                                    type is KVM_X86_DEFAULT_VM.
+                                    When this quirk is disabled or when VM
+                                    type is other than KVM_X86_DEFAULT_VM,
+                                    KVM zaps only leaf SPTEs that are within
+                                    the range of the memslot being deleted.
 =================================== ============================================
 
 7.32 KVM_CAP_MAX_VCPU_ID

Documentation/virt/kvm/locking.rst

+24-8
@@ -11,6 +11,8 @@ The acquisition orders for mutexes are as follows:
 
 - cpus_read_lock() is taken outside kvm_lock
 
+- kvm_usage_lock is taken outside cpus_read_lock()
+
 - kvm->lock is taken outside vcpu->mutex
 
 - kvm->lock is taken outside kvm->slots_lock and kvm->irq_lock
@@ -24,6 +26,13 @@ The acquisition orders for mutexes are as follows:
   are taken on the waiting side when modifying memslots, so MMU notifiers
   must not take either kvm->slots_lock or kvm->slots_arch_lock.
 
+cpus_read_lock() vs kvm_lock:
+
+- Taking cpus_read_lock() outside of kvm_lock is problematic, despite that
+  being the official ordering, as it is quite easy to unknowingly trigger
+  cpus_read_lock() while holding kvm_lock. Use caution when walking vm_list,
+  e.g. avoid complex operations when possible.
+
 For SRCU:
 
 - ``synchronize_srcu(&kvm->srcu)`` is called inside critical sections
@@ -227,10 +236,16 @@ time it will be set using the Dirty tracking mechanism described above.
 :Type: mutex
 :Arch: any
 :Protects: - vm_list
-           - kvm_usage_count
+
+``kvm_usage_lock``
+^^^^^^^^^^^^^^^^^^
+
+:Type: mutex
+:Arch: any
+:Protects: - kvm_usage_count
            - hardware virtualization enable/disable
-:Comment: KVM also disables CPU hotplug via cpus_read_lock() during
-          enable/disable.
+:Comment: Exists to allow taking cpus_read_lock() while kvm_usage_count is
+          protected, which simplifies the virtualization enabling logic.
 
 ``kvm->mn_invalidate_lock``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -290,11 +305,12 @@ time it will be set using the Dirty tracking mechanism described above.
   wakeup.
 
 ``vendor_module_lock``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^
 :Type: mutex
 :Arch: x86
 :Protects: loading a vendor module (kvm_amd or kvm_intel)
-:Comment: Exists because using kvm_lock leads to deadlock. cpu_hotplug_lock is
-  taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and
-  many operations need to take cpu_hotplug_lock when loading a vendor module,
-  e.g. updating static calls.
+:Comment: Exists because using kvm_lock leads to deadlock. kvm_lock is taken
+  in notifiers, e.g. __kvmclock_cpufreq_notifier(), that may be invoked while
+  cpu_hotplug_lock is held, e.g. from cpufreq_boost_trigger_state(), and many
+  operations need to take cpu_hotplug_lock when loading a vendor module, e.g.
+  updating static calls.

arch/arm64/kvm/arm.c

+3-3
@@ -2164,7 +2164,7 @@ static void cpu_hyp_uninit(void *discard)
 	}
 }
 
-int kvm_arch_hardware_enable(void)
+int kvm_arch_enable_virtualization_cpu(void)
 {
 	/*
 	 * Most calls to this function are made with migration
@@ -2184,7 +2184,7 @@ int kvm_arch_hardware_enable(void)
 	return 0;
 }
 
-void kvm_arch_hardware_disable(void)
+void kvm_arch_disable_virtualization_cpu(void)
 {
 	kvm_timer_cpu_down();
 	kvm_vgic_cpu_down();
@@ -2380,7 +2380,7 @@ static int __init do_pkvm_init(u32 hyp_va_bits)
 
 	/*
 	 * The stub hypercalls are now disabled, so set our local flag to
-	 * prevent a later re-init attempt in kvm_arch_hardware_enable().
+	 * prevent a later re-init attempt in kvm_arch_enable_virtualization_cpu().
 	 */
 	__this_cpu_write(kvm_hyp_initialized, 1);
 	preempt_enable();

arch/loongarch/kvm/main.c

+2-2
@@ -261,7 +261,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
 	return -ENOIOCTLCMD;
 }
 
-int kvm_arch_hardware_enable(void)
+int kvm_arch_enable_virtualization_cpu(void)
 {
 	unsigned long env, gcfg = 0;
 
@@ -300,7 +300,7 @@ int kvm_arch_hardware_enable(void)
 	return 0;
 }
 
-void kvm_arch_hardware_disable(void)
+void kvm_arch_disable_virtualization_cpu(void)
 {
 	write_csr_gcfg(0);
 	write_csr_gstat(0);

arch/mips/include/asm/kvm_host.h

+2-2
@@ -728,8 +728,8 @@ struct kvm_mips_callbacks {
 	int (*handle_fpe)(struct kvm_vcpu *vcpu);
 	int (*handle_msa_disabled)(struct kvm_vcpu *vcpu);
 	int (*handle_guest_exit)(struct kvm_vcpu *vcpu);
-	int (*hardware_enable)(void);
-	void (*hardware_disable)(void);
+	int (*enable_virtualization_cpu)(void);
+	void (*disable_virtualization_cpu)(void);
 	int (*check_extension)(struct kvm *kvm, long ext);
 	int (*vcpu_init)(struct kvm_vcpu *vcpu);
 	void (*vcpu_uninit)(struct kvm_vcpu *vcpu);

arch/mips/kvm/mips.c

+4-4
@@ -125,14 +125,14 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
-int kvm_arch_hardware_enable(void)
+int kvm_arch_enable_virtualization_cpu(void)
 {
-	return kvm_mips_callbacks->hardware_enable();
+	return kvm_mips_callbacks->enable_virtualization_cpu();
 }
 
-void kvm_arch_hardware_disable(void)
+void kvm_arch_disable_virtualization_cpu(void)
 {
-	kvm_mips_callbacks->hardware_disable();
+	kvm_mips_callbacks->disable_virtualization_cpu();
 }
 
 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)

arch/mips/kvm/vz.c

+4-4
@@ -2869,7 +2869,7 @@ static unsigned int kvm_vz_resize_guest_vtlb(unsigned int size)
 	return ret + 1;
 }
 
-static int kvm_vz_hardware_enable(void)
+static int kvm_vz_enable_virtualization_cpu(void)
 {
 	unsigned int mmu_size, guest_mmu_size, ftlb_size;
 	u64 guest_cvmctl, cvmvmconfig;
@@ -2983,7 +2983,7 @@ static int kvm_vz_hardware_enable(void)
 	return 0;
 }
 
-static void kvm_vz_hardware_disable(void)
+static void kvm_vz_disable_virtualization_cpu(void)
 {
 	u64 cvmvmconfig;
 	unsigned int mmu_size;
@@ -3280,8 +3280,8 @@ static struct kvm_mips_callbacks kvm_vz_callbacks = {
 	.handle_msa_disabled = kvm_trap_vz_handle_msa_disabled,
 	.handle_guest_exit = kvm_trap_vz_handle_guest_exit,
 
-	.hardware_enable = kvm_vz_hardware_enable,
-	.hardware_disable = kvm_vz_hardware_disable,
+	.enable_virtualization_cpu = kvm_vz_enable_virtualization_cpu,
+	.disable_virtualization_cpu = kvm_vz_disable_virtualization_cpu,
 	.check_extension = kvm_vz_check_extension,
 	.vcpu_init = kvm_vz_vcpu_init,
 	.vcpu_uninit = kvm_vz_vcpu_uninit,

arch/riscv/kvm/main.c

+2-2
@@ -20,7 +20,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
 	return -EINVAL;
 }
 
-int kvm_arch_hardware_enable(void)
+int kvm_arch_enable_virtualization_cpu(void)
 {
 	csr_write(CSR_HEDELEG, KVM_HEDELEG_DEFAULT);
 	csr_write(CSR_HIDELEG, KVM_HIDELEG_DEFAULT);
@@ -35,7 +35,7 @@ int kvm_arch_hardware_enable(void)
 	return 0;
 }
 
-void kvm_arch_hardware_disable(void)
+void kvm_arch_disable_virtualization_cpu(void)
 {
 	kvm_riscv_aia_disable();

arch/s390/configs/debug_defconfig

+1
@@ -59,6 +59,7 @@ CONFIG_CMM=m
 CONFIG_APPLDATA_BASE=y
 CONFIG_S390_HYPFS_FS=y
 CONFIG_KVM=m
+CONFIG_KVM_S390_UCONTROL=y
 CONFIG_S390_UNWIND_SELFTEST=m
 CONFIG_S390_KPROBES_SANITY_TEST=m
 CONFIG_S390_MODULES_SANITY_TEST=m

arch/s390/kvm/kvm-s390.c

+18-9
@@ -348,20 +348,29 @@ static inline int plo_test_bit(unsigned char nr)
 	return cc == 0;
 }
 
-static __always_inline void __insn32_query(unsigned int opcode, u8 *query)
+static __always_inline void __sortl_query(u8 (*query)[32])
 {
 	asm volatile(
 		"	lghi	0,0\n"
-		"	lgr	1,%[query]\n"
+		"	la	1,%[query]\n"
 		/* Parameter registers are ignored */
-		"	.insn	rrf,%[opc] << 16,2,4,6,0\n"
+		"	.insn	rre,0xb9380000,2,4\n"
+		: [query] "=R" (*query)
 		:
-		: [query] "d" ((unsigned long)query), [opc] "i" (opcode)
-		: "cc", "memory", "0", "1");
+		: "cc", "0", "1");
 }
 
-#define INSN_SORTL 0xb938
-#define INSN_DFLTCC 0xb939
+static __always_inline void __dfltcc_query(u8 (*query)[32])
+{
+	asm volatile(
+		"	lghi	0,0\n"
+		"	la	1,%[query]\n"
+		/* Parameter registers are ignored */
+		"	.insn	rrf,0xb9390000,2,4,6,0\n"
+		: [query] "=R" (*query)
+		:
+		: "cc", "0", "1");
+}
 
 static void __init kvm_s390_cpu_feat_init(void)
 {
@@ -415,10 +424,10 @@ static void __init kvm_s390_cpu_feat_init(void)
 		     kvm_s390_available_subfunc.kdsa);
 
 	if (test_facility(150)) /* SORTL */
-		__insn32_query(INSN_SORTL, kvm_s390_available_subfunc.sortl);
+		__sortl_query(&kvm_s390_available_subfunc.sortl);
 
 	if (test_facility(151)) /* DFLTCC */
-		__insn32_query(INSN_DFLTCC, kvm_s390_available_subfunc.dfltcc);
+		__dfltcc_query(&kvm_s390_available_subfunc.dfltcc);
 
 	if (MACHINE_HAS_ESOP)
 		allow_cpu_feat(KVM_S390_VM_CPU_FEAT_ESOP);

arch/x86/include/asm/cpuid.h

+1
@@ -179,6 +179,7 @@ static __always_inline bool cpuid_function_is_indexed(u32 function)
 	case 0x1d:
 	case 0x1e:
 	case 0x1f:
+	case 0x24:
 	case 0x8000001d:
 		return true;
 	}

arch/x86/include/asm/kvm-x86-ops.h

+3-3
@@ -14,8 +14,8 @@ BUILD_BUG_ON(1)
  * be __static_call_return0.
  */
 KVM_X86_OP(check_processor_compatibility)
-KVM_X86_OP(hardware_enable)
-KVM_X86_OP(hardware_disable)
+KVM_X86_OP(enable_virtualization_cpu)
+KVM_X86_OP(disable_virtualization_cpu)
 KVM_X86_OP(hardware_unsetup)
 KVM_X86_OP(has_emulated_msr)
 KVM_X86_OP(vcpu_after_set_cpuid)
@@ -125,7 +125,7 @@ KVM_X86_OP_OPTIONAL(mem_enc_unregister_region)
 KVM_X86_OP_OPTIONAL(vm_copy_enc_context_from)
 KVM_X86_OP_OPTIONAL(vm_move_enc_context_from)
 KVM_X86_OP_OPTIONAL(guest_memory_reclaimed)
-KVM_X86_OP(get_msr_feature)
+KVM_X86_OP(get_feature_msr)
 KVM_X86_OP(check_emulate_instruction)
 KVM_X86_OP(apic_init_signal_blocked)
 KVM_X86_OP_OPTIONAL(enable_l2_tlb_flush)
