Skip to content

storage: storvsc user-mode implementation #1256

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 1, 2025
Merged

Conversation

eric135
Copy link
Contributor

@eric135 eric135 commented Apr 28, 2025

This is an implementation of storvsc (client complement to storvsp) for SCSI storage over VMBus.

#273

@eric135 eric135 marked this pull request as ready for review May 6, 2025 22:23
@eric135 eric135 requested review from a team as code owners May 6, 2025 22:23
struct StorvscInner {
next_transaction_id: AtomicU64,
new_request_receiver: Receiver<StorvscRequest>,
transactions: Mutex<HashMap<u64, PendingOperation>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use a slab::Slab<PendingOperation> instead of a HashMap. The slab will give you a usize key which you can use as the transaction ID.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see from the Slab docs that it can reuse keys for removed items. Could there be any issues with reused, non-sequential keys on the storvsp side?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a protocol violation--storvsp can only send a completion for a transaction once.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does slab still work for this if it can reuse keys, given that it can result in duplicate transaction IDs?

@jstarks
Copy link
Member

jstarks commented May 12, 2025

Looking good!

@eric135
Copy link
Contributor Author

eric135 commented May 28, 2025

Pinging this issue because it still needs reviews

Copy link

At least one Petri test failed.

Comment on lines 845 to 868
async fn test_request_response(driver: DefaultDriver) {
let (guest, host) = connected_async_channels(16 * 1024);
let host_queue = Queue::new(host).unwrap();
let test_guest_mem = GuestMemory::allocate(16384);

let storvsp = TestStorvspWorker::start(
driver.clone(),
test_guest_mem.clone(),
host_queue,
Vec::new(),
);
let mut storvsc = TestStorvscWorker::new();
storvsc.start(driver.clone(), guest);

let mut timer = PolledTimer::new(&driver);
timer.sleep(time::Duration::from_secs(1)).await;

storvsc.stop().await;
assert!(storvsc.get_mut().has_negotiated);
storvsc.resume().await;

// Send SCSI write request
let write_buf = [7u8; 4096];
test_guest_mem.write_at(4096, &write_buf).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like the same code that is in storage_tests/tests/storvsc.rs. What's the difference between them?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. These tests don't have a "host" side controller or disk, whereas the ones in storage_tests do. I'm not sure what we get by having the tests with a backend disk. Presumably it sets us up for some more interesting tests in the future? Was that your intention?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests here are unit tests, which the ones in storage_test are integration tests. I think it is important to test the component in isolation, as well as with a validated storvsp to ensure proper protocol operations.

Comment on lines 590 to 600
let (payload_bytes, padding_bytes) = if len > storvsp_protocol::SCSI_REQUEST_LEN_MAX {
(
&payload[..storvsp_protocol::SCSI_REQUEST_LEN_MAX - size_of_val(&header)],
&[][..],
)
} else {
(
payload,
&padding[..storvsp_protocol::SCSI_REQUEST_LEN_MAX - len],
)
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have expected us to just panic if the vmbus packet is bigger than what we expect for our protocol, since we're truncating it anyways in a way that is potentially semantically unsafe. But I imagine there are some quirks in the vstor protocol ( :) ), so what quirks are we hitting here that require the code to hande this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The storvsp implementation also does this unusual truncation:

// Zero pad or truncate the payload to the queue's packet size. This is
// necessary because Windows guests check that each packet's size is
// exactly the largest possible packet size for the negotiated protocol
// version.
let len = size_of_val(&header) + size_of_val(payload);
let padding = [0; storvsp_protocol::SCSI_REQUEST_LEN_MAX];
let (payload_bytes, padding_bytes) = if len > packet_size {
(&payload[..packet_size - size_of_val(&header)], &[][..])
} else {
(payload, &padding[..packet_size - len])
};
assert_eq!(
size_of_val(&header) + payload_bytes.len() + padding_bytes.len(),
packet_size
);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but I don't think the storvsp behavior is sufficient justification. They're different ends of the pipe, and storvsp has to make sure that it is compatible with various guests.

I don't think we should truncate without understanding it is safe.
Furthermore, the padding is only necessary if storvsp (in either the openvmm or Hyper-V implementations) require it for packets incoming to storvsp.

Silently truncating packets seems like a recipe for difficult debugging later. I'd rather panic here and then know we need to re-add this code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have looked over this code multiple times, including the comments in storvsp. The comments linked from storvsp talk about padding or truncating the payload specifically, but it seems to also consider the length of the header when padding/truncating. The size of a SCSI request (the payload) is 52 bytes, which is the length limited to in storvsp and here. Therefore, I believe the intention is to pad/truncate the payload only. I have modified the code here in storvsc (not in storvsp) to remove truncation (to avoid any hidden issues) and to pad only the payload. It is worth revisiting the code in storvsp to see if this is an error there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I think that's reasonable. Please ensure that this is covered in a test case, and then I'm happy to consider this resolved.

@mattkur
Copy link
Contributor

mattkur commented Jun 12, 2025

Overall this is looking quite good. I've left some suggestions and asked some questions to help me understand both your thinking and the design here. Thanks for working on this!

I have ideas, but it is unclear to me how and where we plug in this storvsc_driver (this the TL;DR of my comment here: #1256 (comment)). What are your thoughts?

mattkur
mattkur previously approved these changes Jun 24, 2025
Copy link
Contributor

@mattkur mattkur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some questions and comments that I'd like you to address, but that can happen in a follow-up PR. Please review my comments before completing. Thanks for the hard work to get this in, Eric!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been mulling this over based on other comments that you made. I now understand the value here. I do think there are changes that you should make so that this is more useful to our testing:

  • there are several cases where this code logs an error or warning due to unexpected state. Those should all be assert or panic calls, so that we catch if storvsc is misbehaving instantly (rather than tracing a warning and returning an error packet back to storvsp).
  • the fake storvsp here does not provide configurability to test interesting cases (at least I don't think it does). For example: testing various subsets of interesting versions.

}

#[derive(Debug)]
#[allow(dead_code)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is dead_code here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All of these fields are necessary because they are in the packet structure, but we don't actually end up using most of them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I don't see any value in keeping dead code around. This packet is not a byte representation of anything, so we don't need the padding.

However, I think it could be interesting to log the details when we see something unexpected, like in this case:

        // Wait for completion
        let completion = match self.next_packet(&mut reader).await? {
            Packet::Completion(packet) => Ok(packet),
            Packet::Data(_) => Err(StorvscError(StorvscErrorInner::PacketError(
                PacketError::InvalidPacketType,
            ))),
        }?;

No need to make a change to this PR, we can do this as a follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These fields will also likely be used if we add support for data packets that actually do something (and are not no-ops like EnumerateBus)

@eric135 eric135 enabled auto-merge (squash) June 26, 2025 17:19
@benhillis benhillis disabled auto-merge June 27, 2025 16:57
@mattkur mattkur requested a review from Copilot June 27, 2025 19:49
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements the storvsc user‐mode driver (the client counterpart to storvsp) with updated feature flag usage and enhanced testing support. Key changes include renaming the "fuzz_helpers" feature to "test" in multiple Cargo.toml files and source code, adding new test helper functions for integration tests, and updating integration tests for storvsc and storvsp functionality.

Reviewed Changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
vm/devices/storage/storvsp_protocol/Cargo.toml Removed the public "fuzz_helpers" array in favor of the new "test" feature
vm/devices/storage/storvsp/src/test_helpers.rs Updated feature flag from "fuzz_helpers" to "test" and added a new teardown_or_panic method
vm/devices/storage/storvsp/src/lib.rs Replaced conditional compilation from "fuzz_helpers" to "test" for the test_helpers module
vm/devices/storage/storvsp/fuzz/Cargo.toml Changed dependency feature list from "fuzz_helpers" to "test" for storvsp
vm/devices/storage/storvsc_driver/* Added a new test_helpers module and updated integration test helpers with corresponding feature flag changes
vm/devices/storage/storage_tests/tests/{storvsc.rs, tests.rs} Added and updated integration tests exercising storvsp and storvsc workflows
Comments suppressed due to low confidence (2)

vm/devices/storage/storvsp_protocol/Cargo.toml:10

  • Removing the fuzz_helpers field in favor of the new 'test' feature is fine, but please ensure that downstream consumers are updated to refer to 'test' consistently in their documentation and configuration.
# Enable generating arbitrary values of types useful for fuzzing.

vm/devices/storage/storvsp/src/test_helpers.rs:52

  • [nitpick] The conversion from 'fuzz_helpers' to 'test' improves clarity but the name 'test' can be ambiguous because of the built-in #[cfg(test)] attribute; consider a more descriptive feature name such as 'integration_tests' if applicable.
    #[cfg(feature = "test")]

Comment on lines +304 to +338
self.inner
.send_packet_and_expect_completion(
&mut self.queue,
storvsp_protocol::Operation::BEGIN_INITIALIZATION,
1,
&(),
)
.await?;

// Step 2: QUERY_PROTOCOL_VERSION - request latest version
self.inner
.send_packet_and_expect_completion(
&mut self.queue,
storvsp_protocol::Operation::QUERY_PROTOCOL_VERSION,
2,
&self.version,
)
.await
.map_err(|err| match err {
StorvscError(StorvscErrorInner::PacketError(PacketError::UnexpectedStatus(
storvsp_protocol::NtStatus::INVALID_DEVICE_STATE,
))) => StorvscError(StorvscErrorInner::UnsupportedProtocolVersion),
_ => err,
})?;

// Step 3: QUERY_PROPERTIES
let properties_packet = self
.inner
.send_packet_and_expect_completion(
&mut self.queue,
storvsp_protocol::Operation::QUERY_PROPERTIES,
3,
&(),
)
.await?;
Copy link
Preview

Copilot AI Jun 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nitpick] The current pattern of sending the request and immediately propagating Ok(()) using '?' could be refactored to directly propagate the result of the send operation to enhance clarity and maintainability.

Suggested change
self.inner
.send_packet_and_expect_completion(
&mut self.queue,
storvsp_protocol::Operation::BEGIN_INITIALIZATION,
1,
&(),
)
.await?;
// Step 2: QUERY_PROTOCOL_VERSION - request latest version
self.inner
.send_packet_and_expect_completion(
&mut self.queue,
storvsp_protocol::Operation::QUERY_PROTOCOL_VERSION,
2,
&self.version,
)
.await
.map_err(|err| match err {
StorvscError(StorvscErrorInner::PacketError(PacketError::UnexpectedStatus(
storvsp_protocol::NtStatus::INVALID_DEVICE_STATE,
))) => StorvscError(StorvscErrorInner::UnsupportedProtocolVersion),
_ => err,
})?;
// Step 3: QUERY_PROPERTIES
let properties_packet = self
.inner
.send_packet_and_expect_completion(
&mut self.queue,
storvsp_protocol::Operation::QUERY_PROPERTIES,
3,
&(),
)
.await?;
self.inner.send_packet_and_expect_completion(
&mut self.queue,
storvsp_protocol::Operation::BEGIN_INITIALIZATION,
1,
&(),
).await?;
// Step 2: QUERY_PROTOCOL_VERSION - request latest version
self.inner.send_packet_and_expect_completion(
&mut self.queue,
storvsp_protocol::Operation::QUERY_PROTOCOL_VERSION,
2,
&self.version,
).await.map_err(|err| match err {
StorvscError(StorvscErrorInner::PacketError(PacketError::UnexpectedStatus(
storvsp_protocol::NtStatus::INVALID_DEVICE_STATE,
))) => StorvscError(StorvscErrorInner::UnsupportedProtocolVersion),
_ => err,
})?;
// Step 3: QUERY_PROPERTIES
let properties_packet = self.inner.send_packet_and_expect_completion(
&mut self.queue,
storvsp_protocol::Operation::QUERY_PROPERTIES,
3,
&(),
).await?;

Copilot uses AI. Check for mistakes.

Copy link
Member

@chris-oo chris-oo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i reviewed only the dependency changes

@eric135 eric135 merged commit 07faf46 into microsoft:main Jul 1, 2025
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants