
feat: add AMD SEV-SNP attestation support #1

Draft
clawdbot-glitch003 wants to merge 10 commits into master from feat/amd-sev-snp-conversion

Conversation

@clawdbot-glitch003
Owner

@clawdbot-glitch003 clawdbot-glitch003 commented May 26, 2026

Summary

  • add AMD SEV-SNP as a dstack attestation mode and v1 platform evidence variant
  • collect SNP reports from configfs TSM or /dev/sev-guest inside the guest
  • add VMM platform = "auto" | "tdx" | "amd-sev-snp" selection and QEMU SEV-SNP launch args
  • keep KMS/verifier exhaustive over the new mode while leaving full AMD cert-chain policy verification as follow-up work

Proof / validation

  • cargo fmt --all
  • cargo test -p dstack-attest --all-features
  • cargo test -p dstack-vmm --all-features
  • cargo check --workspace --all-features
  • remote AMD SEV-SNP smoke test on chris@173.234.27.162 booted an SNP guest and generated a hardware report via configfs TSM:
Memory Encryption Features active: AMD SEV SEV-ES SEV-SNP
SEV: SNP running at VMPL0.
sev-guest sev-guest: Initialized SEV guest driver (using vmpck_id 0)
DSTACK_SEV_SNP_ATTESTATION_PROOF_BEGIN
source=configfs-tsm
report_size=1184
expected_report_data=a0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedf
report_data_offset=80
report_contains_expected_report_data=true
DSTACK_SEV_SNP_ATTESTATION_PROOF_END

Note: full proof log is saved locally at /home/chris/sev-snp-dstack-conversion/amd-sev-snp-attestation-proof.log; plan/status notes are in /home/chris/sev-snp-dstack-conversion/plan.md.

Closes Dstack-TEE#443.

@clawdbot-glitch003
Owner Author

Hardening follow-up after reviewing:

  • Decentriq: "Swiss cheese to cheddar: securing AMD SEV-SNP early boot"
  • Trail of Bits: "What we learned about TEE security from auditing WhatsApp's Private Inference"

Applied in commit 5a60fc78:

  • SNP report collection now fails closed if the returned hardware report does not contain the requested 64-byte report_data challenge at the SNP report-data field (0x50..0x90).
  • v1 with_report_data() now patches raw SEV-SNP report evidence as well as stack report data, matching the existing TDX rebinding behavior and avoiding stale/non-fresh SNP evidence.
  • Added negative/positive unit tests for SNP report-data mismatch and v1 SNP report-data patching.
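The fail-closed report-data check described above can be sketched as follows. The function and error names are hypothetical; only the field range (0x50..0x90) and the 64-byte challenge size come from this comment:

```rust
/// REPORT_DATA location inside a raw SNP report (0x50..0x90), as stated above.
const REPORT_DATA_OFFSET: usize = 0x50;
const REPORT_DATA_LEN: usize = 64;

#[derive(Debug, PartialEq, Eq)]
pub enum ReportDataError {
    /// The buffer is too short to contain the report-data field at all.
    ReportTooShort { len: usize },
    /// The field is present but does not equal the requested challenge.
    Mismatch,
}

/// Hypothetical helper: returns Ok only if the hardware report echoes the
/// exact challenge that was requested.
pub fn check_report_data(
    report: &[u8],
    expected: &[u8; REPORT_DATA_LEN],
) -> Result<(), ReportDataError> {
    let end = REPORT_DATA_OFFSET + REPORT_DATA_LEN;
    let field = report
        .get(REPORT_DATA_OFFSET..end)
        .ok_or(ReportDataError::ReportTooShort { len: report.len() })?;
    if field == expected.as_slice() {
        Ok(())
    } else {
        Err(ReportDataError::Mismatch)
    }
}
```

Using `get(..)` instead of direct indexing means a short or truncated report produces an error rather than a panic.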

Validation rerun:

cargo fmt --all
cargo test -p dstack-attest --all-features
cargo check --workspace --all-features

I also added a local hardening review at /home/chris/sev-snp-dstack-conversion/hardening-review.md and updated the plan. Bottom line from that review: this PR is a good first functional enablement, but it should stay in draft and must not be treated as production key-release-ready until three things are implemented: full AMD VCEK/TCB verification, early-boot input hardening (ACPI/OVMF/q35 or minimal measured firmware), and SNP-specific app config/env binding.

@clawdbot-glitch003
Owner Author

Ran a 3-subagent adversarial review pass (security, code-correctness, integration) over origin/master...HEAD. Full local review artifact: /home/chris/sev-snp-dstack-conversion/adversarial-review.md.

Consolidated highest-priority findings:

  1. BLOCKER: SNP reports are generated but not cryptographically verified. Need AMD report sig verification, ARK/ASK/VCEK chain, VCEK TCB extension enforcement, policy/debug/migration/SMT checks, and launch measurement validation before production KMS/key release.
  2. BLOCKER: SNP app/runtime config is not bound for KMS auth. TDX has RTMR/event-log composition; SNP currently does not. Do not use os_image_hash as a compose/app identity substitute.
  3. HIGH/BLOCKER: Early boot / OVMF / ACPI / launch measurement hardening missing. kernel-hashes=on helps but is insufficient without recomputing/verifying the SNP measurement and handling ACPI/fw_cfg/OVMF inputs.
  4. CRITICAL compatibility: SCALE enum discriminants may have shifted. DstackAmdSevSnp was inserted before existing variants in AttestationQuote / AttestationMode, which can break v0 GCP/Nitro compatibility. Likely fix: append new variants or explicitly version/migrate.
  5. BLOCKER/HIGH correctness: configfs TSM detection can misclassify TDX as SNP. /sys/kernel/config/tsm/report is generic; must check provider is sev-guest, and prefer /dev/tdx_guest when present.
  6. HIGH: VMM auto-selects SNP from CPU flag only. Should be explicit/experimental or validate /dev/sev, KVM caps, QEMU object support, firmware, and host parameters.
  7. HIGH: QEMU SNP args hard-code host-specific/security-sensitive values (cbitpos=51, reduced-phys-bits=1, policy=0x30000). Need detection/config + named policy bits enforced by verifier.
  8. HIGH correctness: SNP memory backend size can diverge from final -m after hugepage/NUMA rounding. Compute final mem first and pass it to SNP machine config.

Medium-severity findings: with_report_data() mutates signed SNP report bytes (this should be simulator/test-only); configfs report directories can leak and can collide when named by PID; ioctl error details and busy-retry handling are missing; ABI layout assertions are missing; and the legacy raw quote API returns an empty quote/event_log for SNP.

Recommended immediate fixes on this draft PR: SCALE enum compatibility + TSM provider detection + explicit experimental SNP gating. Keep the PR draft until real SNP verifier/app-binding work lands.
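To illustrate the SCALE compatibility concern (finding 4): SCALE encodes an enum variant as a leading index byte that by default follows declaration order, so inserting a variant in the middle renumbers everything after it. The sketch below uses a plain `#[repr(u8)]` enum as a stand-in for the derive-based codec; the enum name and all variant names except `DstackAmdSevSnp` are illustrative, not the project's actual definitions:

```rust
/// Hypothetical stand-in for the wire enum. Explicit discriminants model the
/// index byte that SCALE would emit for each variant.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
pub enum AttestationModeSketch {
    Tdx = 0,
    Gcp = 1,
    Nitro = 2,
    // Appended last so that indices 0..=2 keep their existing meaning.
    DstackAmdSevSnp = 3,
}

/// Returns the index byte this sketch would put on the wire.
pub fn wire_index(mode: AttestationModeSketch) -> u8 {
    mode as u8
}
```

A regression test that pins each index to a literal value catches an accidental reordering at compile-and-test time instead of at decode time in the field.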

@clawdbot-glitch003
Owner Author

Pushed the first adversarial-review fix batch: 64b0c8b2b961bb79ba9f4c9448a5b7aefe327cfd.

Fixed in this batch:

  • Preserved legacy SCALE wire compatibility by moving new SEV-SNP variants after existing TDX/GCP/Nitro variants, with discriminant regression tests.
  • Fixed configfs TSM detection so generic /sys/kernel/config/tsm/report no longer implies SNP; it now requires an SEV-SNP provider (sev_guest / sev-guest) and prefers TDX when /dev/tdx_guest is present.
  • Made VMM platform = "auto" conservative while SNP is still experimental; SNP launch now requires explicit platform = "amd-sev-snp".
  • Fixed the SEV-SNP memory-backend-memfd size to use the final rounded memory size after hugepage/NUMA adjustment instead of the original manifest value.
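The corrected detection order can be sketched as a pure decision function. This is an assumption-laden illustration: the names are hypothetical, and reading `/dev/tdx_guest` and the configfs provider string is left to the caller so the logic stays testable:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GuestPlatform {
    Tdx,
    SevSnp,
    Unknown,
}

/// Hypothetical classifier.
/// `tdx_guest_dev_present`: whether /dev/tdx_guest exists.
/// `tsm_provider`: contents of the configfs TSM report provider attribute, if readable.
pub fn classify_guest(tdx_guest_dev_present: bool, tsm_provider: Option<&str>) -> GuestPlatform {
    // Prefer TDX whenever its device node is present.
    if tdx_guest_dev_present {
        return GuestPlatform::Tdx;
    }
    // The generic configfs directory alone proves nothing; require an SNP provider.
    match tsm_provider.map(str::trim) {
        Some("sev_guest") | Some("sev-guest") => GuestPlatform::SevSnp,
        _ => GuestPlatform::Unknown,
    }
}
```

Trimming the provider string matters because sysfs/configfs attributes usually end with a newline.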

Validation:

cargo fmt --all
cargo test -p dstack-attest --all-features
cargo test -p dstack-vmm --all-features
cargo check --workspace --all-features

Also ran an independent subagent review of this fix batch. It flagged the first provider-detection revision for likely rejecting real sev_guest; I corrected that before this push.

Still intentionally draft/blocked for production until SNP cryptographic verification + app/config binding are implemented.

@clawdbot-glitch003
Owner Author

Pushed the first SEV-SNP verifier-core slice: 375bc9bdff777ace9458a10e8a8529272a66b515.

What changed:

  • Added dstack-attest/src/amd_sev_snp.rs using the sev crate.
  • Added AMD SEV-SNP report signature verification for the narrow first slice:
    • parses a raw 1184-byte SNP report;
    • verifies hardcoded Genoa ARK -> supplied ASK -> supplied VCEK;
    • verifies the report signature using VCEK;
    • extracts measurement, report_data, and chip_id;
    • enforces verified report report_data == stack.report_data.
  • Wired PlatformEvidence::SevSnp { report, cert_chain } into AttestationV1::verify_with_time() instead of unconditionally rejecting it.
  • Kept key-release/app authorization fail-closed: decode_app_info_ex() and verifier OS-image/app checks still reject SNP until measurement/app identity binding is implemented.
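For orientation, the field extraction step might look like the sketch below. It is not the PR's code (which uses the sev crate): the struct and function names are hypothetical, the report_data offset comes from earlier in this thread, and the measurement (0x90) and chip_id (0x1A0) offsets are my reading of AMD's SNP attestation report layout and should be checked against the ABI specification. Signature and chain verification are deliberately out of scope here:

```rust
use std::convert::TryInto;

/// Size of a raw SNP attestation report, as stated above.
pub const SNP_REPORT_LEN: usize = 1184;

/// Hypothetical container for the three fields the verifier extracts.
#[derive(Debug, PartialEq, Eq)]
pub struct SnpReportFields {
    pub report_data: [u8; 64],
    pub measurement: [u8; 48],
    pub chip_id: [u8; 64],
}

/// Extracts fields from a raw report. Offsets 0x90 and 0x1A0 are assumptions
/// based on the published report layout. Only call this on a report whose
/// signature has already been verified.
pub fn extract_fields(report: &[u8]) -> Option<SnpReportFields> {
    if report.len() != SNP_REPORT_LEN {
        return None;
    }
    Some(SnpReportFields {
        report_data: report[0x50..0x90].try_into().ok()?,
        measurement: report[0x90..0xC0].try_into().ok()?,
        chip_id: report[0x1A0..0x1E0].try_into().ok()?,
    })
}
```

The exact-length check up front makes the fixed-offset slicing below it panic-free.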

Validation:

cargo fmt --all
cargo test -p dstack-attest --all-features
cargo check --workspace --all-features

Independent review note: this is intentionally not “production-ready full SNP attestation” yet. It is the first cryptographic verifier core. Remaining blockers:

  • normalize real configfs/ioctl cert blobs into ASK/VCEK inputs (cert_chain currently expects [ASK PEM, VCEK PEM]);
  • enforce AMD TCB/VCEK extension policy, revocation/advisories, and non-Genoa roots/products;
  • bind verified SNP measurement to OS/app/compose/config identity before KMS key release.
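For the first blocker, normalizing kernel cert blobs could start from a bounds-checked walk of the certificate table. The sketch below assumes the table layout I understand the kernel to return for extended reports: 24-byte entries (16-byte GUID, little-endian u32 offset, little-endian u32 length) terminated by an all-zero entry, with offsets relative to the start of the blob. The names and the entry cap are hypothetical:

```rust
use std::convert::TryInto;

const ENTRY_LEN: usize = 24;
/// Arbitrary illustrative cap so a malformed table cannot loop for long.
const MAX_ENTRIES: usize = 32;

#[derive(Debug, PartialEq, Eq)]
pub struct CertEntry<'a> {
    pub guid: [u8; 16],
    pub data: &'a [u8],
}

/// Hypothetical parser: every slice access goes through `get`, so malformed
/// attacker-supplied input yields an error instead of a panic.
pub fn parse_cert_table(blob: &[u8]) -> Result<Vec<CertEntry<'_>>, String> {
    let mut out = Vec::new();
    for i in 0..MAX_ENTRIES {
        let start = i * ENTRY_LEN;
        let raw = blob
            .get(start..start + ENTRY_LEN)
            .ok_or("truncated table header")?;
        // An all-zero entry terminates the table.
        if raw.iter().all(|b| *b == 0) {
            return Ok(out);
        }
        // `raw` is exactly 24 bytes, so these fixed sub-slices cannot fail.
        let guid: [u8; 16] = raw[..16].try_into().unwrap();
        let offset = u32::from_le_bytes(raw[16..20].try_into().unwrap()) as usize;
        let length = u32::from_le_bytes(raw[20..24].try_into().unwrap()) as usize;
        let end = offset.checked_add(length).ok_or("offset overflow")?;
        let data = blob.get(offset..end).ok_or("cert data out of bounds")?;
        out.push(CertEntry { guid, data });
    }
    Err("too many entries or missing terminator".into())
}
```

Mapping GUIDs to ASK/VCEK roles is left out because the GUID values are not given in this thread.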

@clawdbot-glitch003
Owner Author

Update: continued the production-attestation hardening path.

New commit: 5b3fc3858ca4ffcdf3fc3b58e5bc9d1a2c218bbb

What changed:

  • Normalized SEV-SNP collateral so verifier accepts:
    • existing two-item ASK/VCEK PEM or DER chains; and
    • kernel certificate-table auxblobs from configfs/extended ioctl.
  • Changed /dev/sev-guest collection to use the sev crate extended-report ioctl path so generated evidence can include certificate collateral when provided by the kernel/VMM.
  • Replaced unsafe cert-table parsing with a bounds-checked parser after review found the sev helper can panic on malformed attacker-supplied auxblob input.
  • Added fail-closed SNP report policy checks after signature verification:
    • report version 2/3 only;
    • VMPL0 only;
    • debug disabled;
    • migration-agent disabled;
    • VCEK signing only;
    • unmasked chip key;
    • basic SMT/RAPL/ciphertext-hiding policy/platform consistency.
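A subset of these policy checks (version, VMPL, debug, migration agent) can be sketched directly over the raw report bytes. The field offsets (version at 0x00, policy at 0x08, VMPL at 0x30) and policy bit positions (bit 18 migration agent, bit 19 debug) are my reading of the SNP ABI and are assumptions here; the launch policy 0x30000 quoted earlier in this thread is consistent with them, since it sets only bits 16 and 17. Function names are hypothetical:

```rust
use std::convert::TryInto;

// Assumed guest-policy bit positions; verify against the AMD SNP ABI spec.
const POLICY_MIGRATE_MA: u64 = 1 << 18;
const POLICY_DEBUG: u64 = 1 << 19;

fn read_u32(buf: &[u8], off: usize) -> Result<u32, &'static str> {
    let bytes = buf.get(off..off + 4).ok_or("report too short")?;
    Ok(u32::from_le_bytes(bytes.try_into().map_err(|_| "report too short")?))
}

fn read_u64(buf: &[u8], off: usize) -> Result<u64, &'static str> {
    let bytes = buf.get(off..off + 8).ok_or("report too short")?;
    Ok(u64::from_le_bytes(bytes.try_into().map_err(|_| "report too short")?))
}

/// Hypothetical fail-closed check; run it only after signature verification.
pub fn check_basic_policy(report: &[u8]) -> Result<(), &'static str> {
    let version = read_u32(report, 0x00)?;
    let policy = read_u64(report, 0x08)?;
    let vmpl = read_u32(report, 0x30)?;
    if version != 2 && version != 3 {
        return Err("unsupported report version");
    }
    if vmpl != 0 {
        return Err("report not requested from VMPL0");
    }
    if policy & POLICY_DEBUG != 0 {
        return Err("debug is allowed by guest policy");
    }
    if policy & POLICY_MIGRATE_MA != 0 {
        return Err("migration agent is allowed by guest policy");
    }
    Ok(())
}
```

The signing-key, chip-key-mask, and SMT/RAPL/ciphertext-hiding checks from the list are omitted from this sketch.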

Validation:

cargo fmt --all
cargo test -p dstack-attest --all-features
cargo check --workspace --all-features
git diff --check

This still intentionally keeps the PR draft. Next blocker is SNP measurement/app binding before any KMS key release path should be enabled.

@clawdbot-glitch003
Owner Author

Progress update: added the next fail-closed KMS measurement/app-binding slice without enabling AMD key release.

What changed:

  • added kms/src/main_service/amd_attest.rs with a pure SNP binding helper;
  • added optional KMS sev_snp config (ovmf_path, guest_features); the section is optional in the KMS config, but the helper requires it for SNP validation;
  • helper rejects missing config, malformed/wrong-length app/compose/rootfs/kernel/initrd/OVMF hashes, zero vCPU count, missing vCPU type, missing OVMF metadata when no OVMF path exists, unsafe guest feature config, malformed OVMF section metadata, and missing/mismatched trusted expected measurement;
  • kept AMD key release disabled: no get_app_key_amd RPC/path was added and SNP app-info decode remains fail-closed.
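A few of the rejections listed above (wrong-length hashes, zero vCPU count, missing vCPU type) can be sketched as one validation function. The struct, its field names, and the 32-byte hash length are illustrative assumptions, not the helper's real shape:

```rust
/// Illustrative length only; the real helper may expect different sizes per field.
const HASH_LEN: usize = 32;

/// Hypothetical view of the launch inputs the binding helper receives.
pub struct SnpLaunchInputs<'a> {
    pub compose_hash: &'a [u8],
    pub rootfs_hash: &'a [u8],
    pub kernel_hash: &'a [u8],
    pub vcpu_count: u32,
    pub vcpu_type: Option<&'a str>,
}

/// Fails closed on the first malformed input it finds.
pub fn validate_launch_inputs(inputs: &SnpLaunchInputs<'_>) -> Result<(), String> {
    let hashes = [
        ("compose_hash", inputs.compose_hash),
        ("rootfs_hash", inputs.rootfs_hash),
        ("kernel_hash", inputs.kernel_hash),
    ];
    for (name, hash) in hashes {
        if hash.len() != HASH_LEN {
            return Err(format!("{name}: expected {HASH_LEN} bytes, got {}", hash.len()));
        }
    }
    if inputs.vcpu_count == 0 {
        return Err("vcpu_count must be non-zero".to_string());
    }
    match inputs.vcpu_type {
        Some(t) if !t.trim().is_empty() => Ok(()),
        _ => Err("missing vcpu_type".to_string()),
    }
}
```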

Validation:

  • cargo fmt --all
  • cargo test -p dstack-kms --all-features
  • cargo check --workspace --all-features
  • git diff --check

Important caveat before any future key release: trusted_expected_measurement must be produced by in-KMS/verifier recomputation from validated inputs + trusted config, never copied from untrusted caller input. The next slice should port/adapt the PR Dstack-TEE#630 pure measurement recomputation helpers and then design the SNP-specific BootInfo binding path.

@clawdbot-glitch003
Owner Author

Progress update: ported the pure AMD SEV-SNP launch measurement recomputation helper into KMS, still with no AMD key release enabled.

What changed:

  • kms/src/main_service/amd_attest.rs now recomputes expected SNP MEASUREMENT internally from validated launch inputs/config instead of accepting any caller-supplied expected measurement.
  • Ported/adapted the pure pieces from PR Dstack-TEE/dstack#630 (feat: AMD SEV-SNP app key provisioning (GetAppKeyAmd)):
    • GCTX launch digest updates
    • SEV hashes table page construction
    • OVMF footer/SEV metadata parsing
    • QEMU VMSA page construction
    • vCPU type mapping
    • metadata section replay and final measurement comparison
  • Hardened validation:
    • missing/invalid hashes fail closed
    • all-zero app id rejected
    • vCPU count capped
    • OVMF section count/page count capped
    • request-provided ovmf_hash cannot override configured ovmf_path mode
    • missing SNP_KERNEL_HASHES section rejects
    • measurement mismatch rejects
  • Added deterministic tests, including a synthetic golden measurement vector to catch accidental drift.
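As background for the "GCTX launch digest updates" item: each update hashes a fixed-size PAGE_INFO record that chains the previous digest with the new page's digest. The sketch below only builds that record. The 0x70-byte layout is my understanding of the SNP PAGE_INFO structure as used by sev-snp-measure and should be checked against the ported code; the function name is hypothetical, and SHA-384 itself is left to the caller to keep the example dependency-free:

```rust
/// Assumed PAGE_INFO size (0x70 bytes).
pub const PAGE_INFO_LEN: usize = 0x70;

/// Builds the record whose SHA-384 becomes the next launch digest.
/// Assumed layout: current digest (48) | page contents digest (48) |
/// length u16 LE | page type u8 | is_imi u8 | vmpl3/2/1 perms (3 x u8) |
/// reserved u8 | guest physical address u64 LE.
pub fn build_page_info(
    current_digest: &[u8; 48],
    contents_digest: &[u8; 48],
    page_type: u8,
    gpa: u64,
) -> [u8; PAGE_INFO_LEN] {
    let mut buf = [0u8; PAGE_INFO_LEN];
    buf[0..48].copy_from_slice(current_digest);
    buf[48..96].copy_from_slice(contents_digest);
    buf[96..98].copy_from_slice(&(PAGE_INFO_LEN as u16).to_le_bytes());
    buf[98] = page_type;
    // Bytes 99..104 (is_imi, VMPL permissions, reserved) stay zero in this sketch.
    buf[104..112].copy_from_slice(&gpa.to_le_bytes());
    buf
}
```

Because every record embeds the previous digest, the final value depends on the order of all measured pages, which is why the metadata section replay has to follow the same order the hypervisor uses at launch.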

Validation:

  • cargo fmt --all
  • cargo test -p dstack-kms --all-features
  • cargo check --workspace --all-features
  • git diff --check

Still intentionally not enabled:

  • no GetAppKeyAmd
  • no AMD RPC/proto changes
  • no SNP key release path
  • decode_app_info_ex() remains fail-closed for SNP

Remaining blocker before release: SNP still needs a dedicated BootInfo/authorization construction path. Current QEMU SNP launch measurement does not directly bind app_id, so app-id authorization must be modeled explicitly before app-key release is safe.

@clawdbot-glitch003
Owner Author

Progress update: added a helper-only AMD SEV-SNP BootInfo construction path, still with no SNP key release enabled.

What changed:

  • Added build_amd_snp_boot_info(...) in kms/src/main_service/amd_attest.rs.
  • The helper first validates/recomputes SNP launch measurement, then builds deterministic auth-facing BootInfo fields:
    • attestation_mode = DstackAmdSevSnp
    • mr_aggregated = verified 48-byte SNP launch MEASUREMENT
    • device_id = verified 64-byte SNP chip_id
    • app_id / compose_hash decoded from launch/auth input
    • os_image_hash = rootfs_hash, because rootfs hash is carried in the measured SNP kernel cmdline model
    • mr_system, key_provider_info, and instance_id as domain-separated SHA-256 digests over SNP-specific launch/auth inputs
  • Documented the important semantic split: current QEMU SNP launch measurement does not directly hardware-bind app_id; the helper treats app_id as authorization input and keeps production release gated behind a future SNP auth policy.
  • Kept tcb_status = "snp-verified-basic-policy" rather than "UpToDate", so current auth flows remain fail-closed unless explicitly updated for SNP.
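One common way to get domain-separated digests like those above is to hash a preimage that starts with a domain label and length-prefixes every field. The sketch below shows only the preimage construction; whether the helper encodes its inputs this way is an assumption, the function names are hypothetical, and the SHA-256 step is left to the caller:

```rust
/// Appends an 8-byte little-endian length followed by the bytes themselves.
fn push_len_prefixed(out: &mut Vec<u8>, bytes: &[u8]) {
    out.extend_from_slice(&(bytes.len() as u64).to_le_bytes());
    out.extend_from_slice(bytes);
}

/// Hypothetical preimage builder: the domain label keeps digests for different
/// purposes (e.g. instance id vs. key provider info) from ever colliding, and
/// the length prefixes keep field boundaries unambiguous.
pub fn domain_separated_preimage(domain: &str, fields: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    push_len_prefixed(&mut out, domain.as_bytes());
    for field in fields {
        push_len_prefixed(&mut out, field);
    }
    out
}
```

Without the length prefixes, the field pairs ("ab", "c") and ("a", "bc") would produce the same byte string and therefore the same digest.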

Tests:

  • BootInfo builds only for matching recomputed measurement.
  • Changing app_id changes auth-binding fields without changing launch measurement.
  • Changing measured inputs rejects stale measurement until recomputed.
  • chip_id maps to device_id and changes chip-bound digests.

Validation:

  • cargo fmt --all
  • cargo test -p dstack-kms --all-features
  • cargo check --workspace --all-features
  • git diff --check

Still intentionally not enabled:

  • no GetAppKeyAmd
  • no AMD RPC/proto changes
  • no SNP app-key release path
  • decode_app_info_ex() still rejects SNP for current key release paths

Next remaining blocker: add real OVMF/kernel/initrd golden-vector proof from sev-snp-measure or the remote SEV-SNP host, then finalize the explicit SNP authorization/TCB policy before wiring any release path.

@clawdbot-glitch003
Owner Author

Progress update: added the SEV-SNP measurement golden-vector/live proof check.

What changed:

  • Added ignored test main_service::amd_attest::tests::recomputation_matches_sev_snp_measure_live_golden_vector.
  • The test runs dstack's pure KMS SNP MEASUREMENT recomputation and compares it against /usr/local/bin/sev-snp-measure.
  • It uses the SNP-capable OVMF at /opt/AMDSEV/usr/local/share/qemu/OVMF.fd, deterministic kernel/initrd fixture bytes, EPYC-v4, vcpus=2, and guest_features=0x1.
  • Golden measurement locked in the test:
    859c646870cffdb4620077c20ea81702c1bd0bde9c967887ddbd430ebe31a89d2832a442b8d8d83e4bdd70b52bb3f009
  • Saved the local proof log outside the repo at:
    /home/chris/sev-snp-dstack-conversion/sev-snp-measure-golden-vector-proof.log
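The comparison against the locked golden value can be sketched as a strict hex decode followed by a byte comparison. The function names are hypothetical and this is not the test in the PR; the constant is the measurement quoted above:

```rust
/// Golden SNP launch measurement quoted above (96 hex characters, 48 bytes).
pub const GOLDEN_MEASUREMENT_HEX: &str =
    "859c646870cffdb4620077c20ea81702c1bd0bde9c967887ddbd430ebe31a89d2832a442b8d8d83e4bdd70b52bb3f009";

/// Decodes exactly 96 hex digits into a 48-byte measurement, or returns None.
pub fn decode_measurement(hex: &str) -> Option<[u8; 48]> {
    // Checking for hex digits first also rules out sign characters that
    // `from_str_radix` would otherwise accept.
    if hex.len() != 96 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 48];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

/// Returns true only if the recomputed measurement equals the golden vector.
pub fn matches_golden(recomputed: &[u8; 48]) -> bool {
    decode_measurement(GOLDEN_MEASUREMENT_HEX).map_or(false, |golden| &golden == recomputed)
}
```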

Validation passed:

  • cargo fmt --all
  • cargo test -p dstack-kms --all-features recomputation_matches_sev_snp_measure_live_golden_vector -- --ignored --nocapture
  • cargo test -p dstack-kms --all-features
  • cargo check --workspace --all-features
  • git diff --check

Still intentionally not enabled:

  • no AMD key-release RPC/proto path
  • no GetAppKeyAmd
  • SNP authorization/release remains fail-closed until explicit SNP auth/TCB policy is finalized

@clawdbot-glitch003
Owner Author

Progress update: added the explicit SNP authorization/TCB policy helper layer, still without wiring key release.

What changed:

  • Added helper-only AmdSnpAuthPolicy in kms/src/main_service/amd_attest.rs.
  • Added validate_amd_snp_auth_policy(...) to fail closed unless SNP BootInfo matches explicit allowlists for:
    • verified SNP launch MEASUREMENT
    • app id
    • compose hash
    • OS/rootfs hash
    • chip/device id
    • TCB status
    • advisory ids
  • Added shape checks for SNP BootInfo before any policy evaluation:
    • mode must be DstackAmdSevSnp
    • measurement/app/config/device/digest fields must have the expected lengths
    • blank TCB status rejects
  • Made BootInfo cloneable for safe policy materialization/testing.
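The fail-closed allowlist idea can be sketched with a reduced set of fields. The types below are hypothetical simplifications of AmdSnpAuthPolicy and BootInfo (only three of the listed dimensions are modeled), intended to show why an empty allowlist must reject rather than match everything:

```rust
use std::collections::HashSet;

/// Hypothetical, reduced view of the verified SNP identity.
pub struct SnpIdentity {
    pub measurement: Vec<u8>,
    pub app_id: Vec<u8>,
    pub tcb_status: String,
}

/// Hypothetical, reduced policy: every dimension needs an explicit allowlist.
pub struct AuthPolicySketch {
    pub allowed_measurements: HashSet<Vec<u8>>,
    pub allowed_app_ids: HashSet<Vec<u8>>,
    pub allowed_tcb_statuses: HashSet<String>,
}

pub fn validate_policy(id: &SnpIdentity, policy: &AuthPolicySketch) -> Result<(), &'static str> {
    // An empty allowlist is treated as a misconfiguration, never as "allow all".
    if policy.allowed_measurements.is_empty()
        || policy.allowed_app_ids.is_empty()
        || policy.allowed_tcb_statuses.is_empty()
    {
        return Err("empty or partial allowlist");
    }
    if id.tcb_status.trim().is_empty() {
        return Err("blank TCB status");
    }
    if !policy.allowed_measurements.contains(&id.measurement) {
        return Err("measurement not allowlisted");
    }
    if !policy.allowed_app_ids.contains(&id.app_id) {
        return Err("app id not allowlisted");
    }
    if !policy.allowed_tcb_statuses.contains(&id.tcb_status) {
        return Err("TCB status not allowlisted");
    }
    Ok(())
}
```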

Tests added:

  • exact verified SNP identity satisfies an exact policy
  • compose/app/config identity mismatches reject
  • non-SNP mode rejects
  • unexpected TCB status rejects
  • unallowlisted advisories reject
  • empty/partial allowlists reject

Validation passed:

  • cargo fmt --all
  • cargo test -p dstack-kms --all-features explicit_snp_auth_policy -- --nocapture
  • cargo test -p dstack-kms --all-features
  • cargo check --workspace --all-features
  • git diff --check

Still intentionally not enabled:

  • no AMD key-release RPC/proto path
  • no GetAppKeyAmd
  • no SNP app-key release path
  • this is still a reviewable policy scaffold, not production key release



Development

Successfully merging this pull request may close these issues.

Feature Request: Add Support for AMD SEV-SNP

1 participant