Add support for vng --blk-disk and appending disk I/O / topology options to either --*disk#389
Add support for vng --blk-disk and appending disk I/O / topology options to either --*disk#389intelfx wants to merge 6 commits into
vng --blk-disk and appending disk I/O / topology options to either --*disk#389Conversation
f0b6c10 to
c7f018f
Compare
marcosps
left a comment
There was a problem hiding this comment.
Seems good to me. (I saw that this commit is also present on your patchset to add blk-disk, but this one is easier merged here :) )
arighi
left a comment
There was a problem hiding this comment.
Looks good, thanks for working on this!!
Just one comment, it'd be nice to not change the behavior of --disk (BTW, with the new syntax how do we specify virtio-blk disks?).
Maybe adding a separate --scsi-disk would be nicer, because we don't break the current behavior, someone may currently use --disk expecting to create virtio-blk devices.
Like I wrote, I made
OK, so you want I did not do it this way because I thought that mapping A to B and B to C would be highly confusing, but I'll do whatever you decide on. |
Could we eventually use aliases not to change the behaviours? So having |
|
So, introduce and prominently document |
Ah alright, it makes sense as it is then, thanks for clarifying it. Also I think not many people are using --disk, so let's stick with this syntax, which seems more sane. Edit: I should read the whole thread before replying. This #389 (comment) would be the best, thanks! |
matttbe
left a comment
There was a problem hiding this comment.
(Explicitly marking this PR as "Request Changes", because it is currently marked as "Approve" while the CI is not happy and there are suggested changes.)
|
@intelfx here as well, thank you for this PR. Out-of-curiosity, do you still plan to work on it? |
428ff10 to
4d29c1a
Compare
4d29c1a to
5ee6208
Compare
matttbe
left a comment
There was a problem hiding this comment.
thank you for the new version. A few suggestions, but my main issue is with the last commit: can it be simpler?
| "warning: `--disk` is deprecated, use `--scsi-disk` instead.\n" | ||
| ) | ||
| args.scsi_disk.extend(args.disk) | ||
| args.disk = [] |
There was a problem hiding this comment.
Instead of having to deal with an extra variable, maybe easier to allow --scsi-disk to be used as an alias, e.g.
g.add_argument(
"--scsi-disk",
"--disk",
(...)
)Except if we do want to deprecate --disk?
There was a problem hiding this comment.
Instead of having to deal with an extra variable, maybe easier to allow --scsi-disk to be used as an alias
Yes, but that way I wouldn't have been able to throw a warning. My idea was to deprecate --disk as its meaning is three-way inconsistent (between virtme-run, vng, and vng's help). If the maintainers disagree, I can make it an alias as you suggested.
There was a problem hiding this comment.
vng has been introduced also to be able to change some default values from virtme-run. So as long as vng and the help menu is consistent (which is the case now), that's fine for me. But OK to wait to get feedback from others.
| "warning: `--disk` is deprecated, use `--blk-disk` instead.\n" | ||
| ) | ||
| args.blk_disk.extend(args.disk) | ||
| args.disk = [] |
There was a problem hiding this comment.
same here, we could have:
parser.add_argument(
"--blk-disk",
"--disk",
"-D",
(...)
(or --disk first with dest='blk_disk' if we prefer for the help menu)
| def scsi_device_id(name: str, max_len: int) -> str: | ||
| """ | ||
| Trim a longer string which may or may not be a path to fit within `max_len` | ||
| characters. |
There was a problem hiding this comment.
will it still be unique after this trimming? Or does it not need to be?
There was a problem hiding this comment.
- The path is trimmed from the beginning (i.e.
/long/path/foo/bar->foo/bar) to minimize the chances of that happening. - It is necessary to trim them somehow because otherwise QEMU will fail to start, this is a pre-existing problem with vng.
- If it nonetheless happens that the
device_ids are not unique, the/dev/disk/by-idsymlinks will clobber each other. I'm not aware of any other effects.
If you think that even with (1), the minuscule chance of (3) happening is unacceptable, I can do a dedicated pass to check for collisions and add a numerical disambiguator in that case.
There was a problem hiding this comment.
I have also just noticed that the SCSI serial itself is also limited to 36 characters, so I'm going to apply the same treatment there. The full SCSI serial isn't used for anything meaningful, so there's even less problems with that.
There was a problem hiding this comment.
If you think that even with (1), the minuscule chance of (3) happening is unacceptable, I can do a dedicated pass to check for collisions and add a numerical disambiguator in that case.
I guess it should be fine. I'm not sure how people are using this option: is it common to use it multiple times?
Also, do you know what the error message will be if the device IDs are not unique? If it is clear, no need to do anything else.
There was a problem hiding this comment.
I guess it should be fine. I'm not sure how people are using this option: is it common to use it multiple times?
No idea. It is very common for me to use it multiple times because I use vng to work on multi-disk filesystems, but I can't speak for anyone else.
Also, do you know what the error message will be if the device IDs are not unique? If it is clear, no need to do anything else.
There is no error message as is not an "error" per se (nothing requires these IDs be unique). It only causes the by-id symlinks to clobber each other, with no diagnostic emitted.
| "if=none", | ||
| f"id={driveid}", | ||
| f"file={disk.path}", | ||
| "format=raw", |
There was a problem hiding this comment.
is there a risk of breaking the behaviour here "just to avoid a warning"? If yes, maybe only set the format if specified (in the next commit)?
There was a problem hiding this comment.
It's not just a warning — QEMU restricts writing to sector 0 if format autodetection is used, which kinda makes the entire feature useless because then you cannot format or partition these disks. But yes, sure, someone may have been passing a preformatted qcow2 and relying on autodetection (and not relying on the ability to write to sector 0). How probable is that?
| help="Add a read/write virtio-scsi disk, available at " | ||
| + "/dev/disk/by-id/scsi-0virtme_disk_NAME. " | ||
| + "OPTs are comma-separated KEY[=VALUE] pairs; " | ||
| + "pass 'help' to list them.", |
There was a problem hiding this comment.
can we not refer to qemu help for this and simply pass extra values? This last commit is quite specific, long, and not easy to maintain. I would prefer to have a simpler version in virtme, and refer to qemu for additional specific attributes, no?
There was a problem hiding this comment.
can we not refer to qemu help for this and simply pass extra values?
No, because qemu's help is atrocious, these parameters are strewn across multiple QEMU devices and the naming convention is all over the place (some use dashes, some underscores, etc). As described in the commit message, I chose to use a naming convention reminiscent of lsblk --topology for these parameters as it is much cleaner.
There was a problem hiding this comment.
Mmh, I see your point, but I don't think any of us are going to proactively maintain this mapping :-/
There was a problem hiding this comment.
I don't think it is expected to change any time soon. Qualitatively new properties of disk devices aren't introduced very often. Or did I misunderstand your concern?
There was a problem hiding this comment.
Regarding "just" referring people to QEMU's help and passing these options 1:1, it took me a ≈full day of digging through QEMU help, mailing lists, and QEMU and Linux kernel source code to come up with a good mental model of how these parameters work and how they have to be passed. The point here was to save everyone else the effort.
(not to mention, again, that these parameters have to be arranged across multiple devices in the QEMU emulated device model, so it is technically not possible to blindly pass them through 1:1)
There was a problem hiding this comment.
I don't know, maintaining this in the code feels wrong, I would have added such useful info in the README instead. But let's see with @arighi
There was a problem hiding this comment.
The main problem with the "make it simpler" approach, like I said, is that it is physically impossible to pass these parameters 1:1 because they must be distributed between several QEMU devices (i.e. multiple -drive / -device flags) in a very specific manner, and entirely new iothread -objects have to be created for each iothread.
Either way, this info has to be maintained somewhere. Virtme already encodes a lot of high-level QEMU knowledge (device types, machine types, which device types are allowed for which machines, etc), and this isn't too different.
5ee6208 to
c35468d
Compare
The historical behavior of the `--disk` argument is inconsistent between `vng` and `virtme-run` (and `vng --help`). In `vng`, `--disk` is claimed to create a virtio-scsi disk, however, it is in fact translated into `virtme-run --blk-disk`, creating a virtio-blk disk instead. At the same time, in `virtme-run`, `--disk` is indeed creating a virtio-scsi disk. Deprecate `virtme-run --disk` and add `virtme-run --scsi-disk` instead with a non-ambiguous meaning. Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Previously, `vng --disk` was translated into `virtme-run --blk-disk`, contrary to the help claiming that `vng --disk` adds a virtio-scsi disk. At the same time, `virtme-run --disk` (same-named parameter) was indeed creating a virtio-scsi disk, making it a three-way inconsistency. Deprecate `vng --disk` (preserving its actual behavior of adding a virtio-blk disk) and add a `vng --blk-disk` with unambiguous meaning. Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
In vng, only expand the `PATH` shorthand into `NAME=PATH` if the equals sign is not present in the parameter value. In virtme-run, parse the values into a dataclass and refactor QEMU controller/drive/device generation to support adding further parameters to either. Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
…t fit
The scsi-hd `serial=` parameter is used by QEMU as a default for the
`device_id=` parameter (which is what in fact ends up in the by-id
path).
The `serial=` parameter is set by `virtme-run` using the disk "name"
provided in the `--{,blk-}disk` argument, which is in turn set by `vng`
to the disk path, when not explicitly provided.
When all of this is chained, the full path to disk image ends up being
used as the SCSI device_id:
1. `vng --disk /path/to/disk` leads to
2. `virtme-run --disk /path/to/disk=/path/to/disk` leads to
3. `qemu -device scsi-hd,serial=/path/to/disk` leads to (internally)
4. `qemu -device scsi-hd,serial=/path/to/disk,device_id=/path/to/disk`
However, QEMU (or perhaps it is SCSI) limits the device_id to
20 characters:
```
qemu-system-x86_64: -device scsi-hd,drive=disk0,vendor=virtme,product=disk,serial=/long/path/to/image/file: The serial number can't be longer than 20 characters when it is also used as the default for device_id
```
Likewise, the serial itself is limited to 36 characters.
This problem is indirectly exposed since the previous commit, because
`vng --disk` is now a shorthand for a SCSI disk, not a virtio-blk disk.
Fix this by truncating the device path manually and using the truncated
values as the SCSI serial and device_id.
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
Not specifying a disk image file format leads to QEMU issuing a warning
and write-protecting block 0. This is seldom desirable:
```
WARNING: Image format was not specified for '/path/to/file' and probing guessed raw.
Automatically detecting the format is dangerous for raw images, write operations on block 0 will be restricted.
Specify the 'raw' format explicitly to remove the restrictions.
```
To squelch the warning, specify `format=raw` explicitly. This will be
made configurable in the next commit.
Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
This implements support for various disk topology and I/O driver options for both `--disk` and `--blk-disk` arguments. Options are accepted as a comma-separated list after the image file path e.g., `--disk /path/to/file,format=qcow2,...`. In addition to general QEMU options (format=), I/O driver options (cache=, aio=, discard=, detect-zeroes=, queues=) and topology options (log-sec=, phy-sec=, min-io=, opt-io=, disc-gran=) a "topology=" metaoption is accepted to pass through host device queue limits. The names for these options were chosen to match `lsblk` columns rather than QEMU's own -drive/-device options, because the latter are underdocumented and non-uniform. Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
c35468d to
7fbe3b0
Compare
|
Hello, I just checked again this PR, and, even if it is quite a lot of code and mapping, I think it can indeed help to better support disk images. Because I don't know well this area, I still have a few opened questions:
It would be good if someone else can have a look at this PR :) @arighi @marcosps : you commented this PR earlier, WDYT about it? |
| disk.pop_opt_qemu("aio", None), | ||
| f"discard={'unmap' if discard else 'ignore'}" | ||
| if discard is not None | ||
| else None, |
There was a problem hiding this comment.
This seems too complicated, maybe they could be simplified? Same applies top below. This is not a nack for the patch, but too many nested if else clauses can get confusing quite easy.
|
It has been a couple years already since I used the |
|
in my tests, if you add two But for But if I add a name to one of the disks, it then has one dir, for the unnamed disk, and one symlink, as expected: Is it expected? |
arighi
left a comment
There was a problem hiding this comment.
I left a few comments (mostly stylish stuff). Overall looks good to me. I rarely use the --disk options, so I don't have a strong opinion on the redesign / behavior shifts.
But I share the same concern mentioned by @matttbe, we won't be able to maintain the scsi-disk opts mappings. However, I also don't have a strong opinion on this, if you guys think it's helpful and it can save time to hunt the right qemu options, I'm ok with that.
So, on my side, once those small nits that I reported are fixed, I'm ok to merge this.
| # unused | ||
| disk.opts.pop("rota", None) | ||
|
|
||
| if disk.pop_opt("iothread", bool, False): |
There was a problem hiding this comment.
Should use strtobool here instead of bool?
|
|
||
| if args.disk: | ||
| qemuargs.extend(["-device", "{},id=scsi".format(arch.virtio_dev_type("scsi"))]) | ||
| if disk.pop_opt("iothread", bool, False): |
| if disk.opts: | ||
| raise ValueError( | ||
| f"invalid --disk parameter: {d!r}\n(keys were not consumed: {disk.opts.keys()})" | ||
| ) |
There was a problem hiding this comment.
We should probably use arg_fail() here instead of raise. Something like this?
arg_fail(f"unrecognized --blk-disk options for {d!r}: {', '.join(sorted(disk.opts))}")
| if disk.opts: | ||
| raise ValueError( | ||
| f"invalid --disk parameter: {d!r}\n(keys were not consumed: {disk.opts.keys()})" | ||
| ) |
There was a problem hiding this comment.
Ditto, arg_fail() instead of raise, like:
arg_fail(f"unrecognized --scsi-disk options for {d!r}: {', '.join(sorted(disk.opts))}")
|
|
||
| def topology(self) -> dict[str, str]: | ||
| # Get the real device name (handles symlinks like /dev/mapper -> /dev/dm-X) | ||
| real_path = os.path.realpath(self.path, strict=True) |
There was a problem hiding this comment.
Maybe add something like this to check if the path exists:
try:
real_path = os.path.realpath(self.path, strict=True)
except FileNotFoundError:
arg_fail(
f"disk path does not exist (needed for topology=): {self.path!r}"
)
Expand
vng's capabilities for configuring emulated disk drives. This has two meaningful parts.First, this PR adds support to the
vngwrapper for connecting both types of emulated disks to the VM. Originally,vngonly accepted a--diskargument (which was internally translated tovng --blk-disk, despite the documentation), with no option to attach a SCSI disk using thevngwrapper (while at the same time,virtme-ng --diskhad an opposite meaning).This PR adds support for both
vng --scsi-diskandvng --blk-disk, which now map 1:1 to identicalvirtme-ngoptions, and convertsvng --disk/virtme-ng --diskinto a legacy alias (with a warning on usage) due to its inconsistent behavior.Second, this PR adds support to both
vngandvirtme-runfor passing extended options in both--scsi-diskand--blk-diskarguments. Each value is now treated as a comma-separated list with a number of I/O driver options and disk topology options available.There are several I/O driver options:
format=(raw|qcow2)(withrawnow explicitly passed to QEMU by default, since QEMU considers implicit format detection a deprecated feature and prohibits write operations to sector 0 if it is used)cache=(...)corresponds to QEMU's-blockdev cache.*, or-drive cache=...aio=(...)corresponds to QEMU's-blockdev aio=..., or-drive aio=...discard=(on|off)corresponds to QEMU's-blockdev discard=(ignore|unmap)or-drive discard=...detect-zeroes=(on|off)corresponds to QEMU's-blockdev detect-zeroes=(on|off|unmap)or-drive detect-zeroes=...queues=(int)corresponds to QEMU's-device num_queues=...onvirtio-blk-deviceorvirtio-scsi-devicenodesThen, there are topology options which change the characteristics and limits of the disk reported to the guest OS:
log-sec=(bytes)sets the logical sector (LBA) size, e.g. to emulate 4Kn disksphy-sec=(bytes)sets the physical (underlying) sector sizemin-io=(bytes)sets the advisory minimum I/O sizeopt-io=(bytes)sets the advisory optimal I/O sizedisc-gran=(bytes)sets the advisory TRIM/UNMAP request granularityrota=(on|off)indicates whether the emulated drive is reported as rotational mediumFinally, there are two metaoptions:
iothread=(on|off)adds a dedicated I/O thread for the disk and/or SCSI controller, andtopology=(on|off)extracts host device topology data from sysfs and translates that into the values of the options above (naturally, this only works when the host path points to an actual block device).