Commit 77a0cfa

Merge tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:

 - NVMe updates via Keith:
     - Use uring_cmd helper (Pavel)
     - Host Memory Buffer allocation enhancements (Christoph)
     - Target persistent reservation support (Guixin)
     - Persistent reservation tracing (Guixin)
     - NVMe 2.1 specification support (Keith)
     - Rotational Meta Support (Matias, Wang, Keith)
     - Volatile cache detection enhancement (Guixin)

 - MD updates via Song:
     - Maintainers update
     - raid5 sync IO fix
     - Enhance handling of faulty and blocked devices
     - raid5-ppl atomic improvement
     - md-bitmap fix

 - Support for manually defining embedded partition tables

 - Zone append fixes and cleanups

 - Stop sending the queued requests in the plug list to the driver
   ->queue_rqs() handler in reverse order.

 - Zoned write plug cleanups

 - Clean up disk stats tracking and add support for disk stats for
   passthrough IO

 - Add preparatory support for file system atomic writes

 - Add lockdep support for queue freezing. Already found a bunch of
   issues, and some fixes for that are in here. More will be coming.

 - Fix race between queue stopping/quiescing and IO queueing

 - ublk recovery improvements

 - Fix ublk mmap for 64k pages

 - Various fixes and cleanups

* tag 'for-6.13/block-20241118' of git://git.kernel.dk/linux: (118 commits)
  MAINTAINERS: Update git tree for mdraid subsystem
  block: make struct rq_list available for !CONFIG_BLOCK
  block/genhd: use seq_put_decimal_ull for diskstats decimal values
  block: don't reorder requests in blk_mq_add_to_batch
  block: don't reorder requests in blk_add_rq_to_plug
  block: add a rq_list type
  block: remove rq_list_move
  virtio_blk: reverse request order in virtio_queue_rqs
  nvme-pci: reverse request order in nvme_queue_rqs
  btrfs: validate queue limits
  block: export blk_validate_limits
  nvmet: add tracing of reservation commands
  nvme: parse reservation commands's action and rtype to string
  nvmet: report ns's vwc not present
  md/raid5: Increase r5conf.cache_name size
  block: remove the ioprio field from struct request
  block: remove the write_hint field from struct request
  nvme: check ns's volatile write cache not present
  nvme: add rotational support
  nvme: use command set independent id ns if available
  ...
2 parents 3d1b536 + 88d47f6 commit 77a0cfa

88 files changed, +3579 -905 lines changed

Documentation/ABI/stable/sysfs-block

+7
@@ -424,6 +424,13 @@ Description:
 		[RW] This file is used to control (on/off) the iostats
 		accounting of the disk.
 
+What:		/sys/block/<disk>/queue/iostats_passthrough
+Date:		October 2024
+
+Description:
+		[RW] This file is used to control (on/off) the iostats
+		accounting of the disk for passthrough commands.
+
 
 What:		/sys/block/<disk>/queue/logical_block_size
 Date:		May 2009
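
The new attribute is an ordinary read/write sysfs file, so toggling it from user space is a plain file write. The sketch below is illustrative only: `set_queue_flag` is a hypothetical helper, and writing the strings "1" and "0" for on and off is an assumption based on the "(on/off)" wording above.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel or libc API): write "1" or "0" to a
 * sysfs-style attribute file such as
 * /sys/block/<disk>/queue/iostats_passthrough.
 * The "1"/"0" encoding is an assumption taken from the "(on/off)" wording.
 * Returns 0 on success and -1 if the file cannot be opened or written.
 */
static int set_queue_flag(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fputs(enable ? "1" : "0", f) >= 0;
	if (fclose(f) != 0)	/* sysfs may report a store error only at close */
		ok = 0;

	return ok ? 0 : -1;
}
```

A caller would pass the full attribute path for the disk in question and check the return value, since the file is absent on kernels that predate this change.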

Documentation/block/cmdline-partition.rst

+4 -1
@@ -39,13 +39,16 @@ blkdevparts=<blkdev-def>[;<blkdev-def>]
     create a link to block device partition with the name "PARTNAME".
     User space application can access partition by partition name.
 
+ro
+    read-only. Flag the partition as read-only.
+
 Example:
 
     eMMC disk names are "mmcblk0" and "mmcblk0boot0".
 
   bootargs::
 
-    'blkdevparts=mmcblk0:1G(data0),1G(data1),-;mmcblk0boot0:1m(boot),-(kernel)'
+    'blkdevparts=mmcblk0:1G(data0),1G(data1),-;mmcblk0boot0:1m(boot)ro,-(kernel)'
 
   dmesg::
 

Documentation/block/ublk.rst

+18 -6
@@ -199,24 +199,36 @@ managing and controlling ublk devices with help of several control commands:
 
 - user recovery feature description
 
-  Two new features are added for user recovery: ``UBLK_F_USER_RECOVERY`` and
-  ``UBLK_F_USER_RECOVERY_REISSUE``.
-
-  With ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
+  Three new features are added for user recovery: ``UBLK_F_USER_RECOVERY``,
+  ``UBLK_F_USER_RECOVERY_REISSUE``, and ``UBLK_F_USER_RECOVERY_FAIL_IO``. To
+  enable recovery of ublk devices after the ublk server exits, the ublk server
+  should specify the ``UBLK_F_USER_RECOVERY`` flag when creating the device. The
+  ublk server may additionally specify at most one of
+  ``UBLK_F_USER_RECOVERY_REISSUE`` and ``UBLK_F_USER_RECOVERY_FAIL_IO`` to
+  modify how I/O is handled while the ublk server is dying/dead (this is called
+  the ``nosrv`` case in the driver code).
+
+  With just ``UBLK_F_USER_RECOVERY`` set, after one ubq_daemon(ublk server's io
   handler) is dying, ublk does not delete ``/dev/ublkb*`` during the whole
   recovery stage and ublk device ID is kept. It is ublk server's
   responsibility to recover the device context by its own knowledge.
   Requests which have not been issued to userspace are requeued. Requests
   which have been issued to userspace are aborted.
 
-  With ``UBLK_F_USER_RECOVERY_REISSUE`` set, after one ubq_daemon(ublk
-  server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
+  With ``UBLK_F_USER_RECOVERY_REISSUE`` additionally set, after one ubq_daemon
+  (ublk server's io handler) is dying, contrary to ``UBLK_F_USER_RECOVERY``,
   requests which have been issued to userspace are requeued and will be
   re-issued to the new process after handling ``UBLK_CMD_END_USER_RECOVERY``.
   ``UBLK_F_USER_RECOVERY_REISSUE`` is designed for backends who tolerate
   double-write since the driver may issue the same I/O request twice. It
   might be useful to a read-only FS or a VM backend.
 
+  With ``UBLK_F_USER_RECOVERY_FAIL_IO`` additionally set, after the ublk server
+  exits, requests which have issued to userspace are failed, as are any
+  subsequently issued requests. Applications continuously issuing I/O against
+  devices with this flag set will see a stream of I/O errors until a new ublk
+  server recovers the device.
+
   Unprivileged ublk device is supported by passing ``UBLK_F_UNPRIVILEGED_DEV``.
   Once the flag is set, all control commands can be sent by unprivileged
   user. Except for command of ``UBLK_CMD_ADD_DEV``, permission check on
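
The flag rules in the updated ublk text (a base recovery flag plus at most one of two modifiers) can be sketched as a small user-space check. This is a sketch, not the driver's validation code: `ublk_recovery_flags_valid` is a hypothetical helper, the bit positions are assumed here and should be checked against `include/uapi/linux/ublk_cmd.h`, and the word "additionally" is read as meaning the modifiers require the base flag.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bit positions are assumptions made for this sketch; verify them against
 * include/uapi/linux/ublk_cmd.h before relying on them.
 */
#define UBLK_F_USER_RECOVERY		(1ULL << 3)
#define UBLK_F_USER_RECOVERY_REISSUE	(1ULL << 4)
#define UBLK_F_USER_RECOVERY_FAIL_IO	(1ULL << 9)

/*
 * Hypothetical helper, not part of the ublk API: returns true when the
 * recovery-related bits in @flags form a combination the documentation
 * describes as allowed.
 */
static bool ublk_recovery_flags_valid(uint64_t flags)
{
	const uint64_t modifiers = UBLK_F_USER_RECOVERY_REISSUE |
				   UBLK_F_USER_RECOVERY_FAIL_IO;
	uint64_t set = flags & modifiers;

	/* No modifier: either plain recovery or no recovery at all. */
	if (!set)
		return true;

	/* Assumption: a modifier is only meaningful with the base flag. */
	if (!(flags & UBLK_F_USER_RECOVERY))
		return false;

	/* At most one modifier: exactly one bit may remain set. */
	return (set & (set - 1)) == 0;
}
```

A ublk server could run such a check on its own flag word before issuing `UBLK_CMD_ADD_DEV`, so that a bad combination is caught with a clear local error.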

Documentation/devicetree/bindings/mmc/mmc-card.yaml

+52
@@ -13,6 +13,10 @@ description: |
   This documents describes the devicetree bindings for a mmc-host controller
   child node describing a mmc-card / an eMMC.
 
+  It's possible to define a fixed partition table for an eMMC for the user
+  partition, the 2 BOOT partition (boot1/2) and the 4 GP (gp1/2/3/4) if supported
+  by the eMMC.
+
 properties:
   compatible:
     const: mmc-card
@@ -26,6 +30,24 @@ properties:
       Use this to indicate that the mmc-card has a broken hpi
       implementation, and that hpi should not be used.
 
+patternProperties:
+  "^partitions(-boot[12]|-gp[14])?$":
+    $ref: /schemas/mtd/partitions/partitions.yaml
+
+    patternProperties:
+      "^partition@[0-9a-f]+$":
+        $ref: /schemas/mtd/partitions/partition.yaml
+
+        properties:
+          reg:
+            description: Must be multiple of 512 as it's converted
+              internally from bytes to SECTOR_SIZE (512 bytes)
+
+        required:
+          - reg
+
+        unevaluatedProperties: false
+
 required:
   - compatible
   - reg
@@ -42,6 +64,36 @@ examples:
             compatible = "mmc-card";
             reg = <0>;
             broken-hpi;
+
+            partitions {
+                compatible = "fixed-partitions";
+
+                #address-cells = <1>;
+                #size-cells = <1>;
+
+                partition@0 {
+                    label = "kernel"; /* Kernel */
+                    reg = <0x0 0x2000000>; /* 32 MB */
+                };
+
+                partition@2000000 {
+                    label = "rootfs";
+                    reg = <0x2000000 0x40000000>; /* 1GB */
+                };
+            };
+
+            partitions-boot1 {
+                compatible = "fixed-partitions";
+
+                #address-cells = <1>;
+                #size-cells = <1>;
+
+                partition@0 {
+                    label = "bl";
+                    reg = <0x0 0x2000000>; /* 32MB */
+                    read-only;
+                };
+            };
         };
     };
 
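
The binding requires each partition `reg` value to be a multiple of 512 because byte offsets and sizes are converted to 512-byte sectors. The conversion can be sketched in user-space C as below; `reg_to_sectors` and `struct part_sectors` are hypothetical names for illustration and do not appear in the kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT	9
#define SECTOR_SIZE	(1u << SECTOR_SHIFT)	/* 512 bytes */

/* Hypothetical result type: partition extent expressed in sectors. */
struct part_sectors {
	uint64_t start;
	uint64_t count;
};

/*
 * Hypothetical helper: convert a devicetree "reg = <offset size>" pair from
 * bytes to 512-byte sectors. Returns false, leaving @out untouched, when
 * either value is not a multiple of SECTOR_SIZE.
 */
static bool reg_to_sectors(uint64_t offset, uint64_t size,
			   struct part_sectors *out)
{
	if ((offset | size) & (SECTOR_SIZE - 1))
		return false;

	out->start = offset >> SECTOR_SHIFT;
	out->count = size >> SECTOR_SHIFT;
	return true;
}
```

For the "rootfs" node in the example above, `reg = <0x2000000 0x40000000>` maps to a start sector of 65536 and a length of 2097152 sectors.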

MAINTAINERS

+2 -2
@@ -21393,11 +21393,11 @@ F: include/linux/property.h
 
 SOFTWARE RAID (Multiple Disks) SUPPORT
 M:	Song Liu <[email protected]>
-R:	Yu Kuai <[email protected]>
+M:	Yu Kuai <[email protected]>
 
 S:	Supported
 Q:	https://patchwork.kernel.org/project/linux-raid/list/
-T:	git git://git.kernel.org/pub/scm/linux/kernel/git/song/md.git
+T:	git git://git.kernel.org/pub/scm/linux/kernel/git/mdraid/linux.git
 F:	drivers/md/Kconfig
 F:	drivers/md/Makefile
 F:	drivers/md/md*

block/bio-integrity.c

+5 -8
@@ -199,7 +199,7 @@ EXPORT_SYMBOL(bio_integrity_add_page);
 
 static int bio_integrity_copy_user(struct bio *bio, struct bio_vec *bvec,
 				   int nr_vecs, unsigned int len,
-				   unsigned int direction, u32 seed)
+				   unsigned int direction)
 {
 	bool write = direction == ITER_SOURCE;
 	struct bio_integrity_payload *bip;
@@ -247,7 +247,6 @@ static int bio_integrity_copy_user(struct bio *bio, struct bio_vec *bvec,
 	}
 
 	bip->bip_flags |= BIP_COPY_USER;
-	bip->bip_iter.bi_sector = seed;
 	bip->bip_vcnt = nr_vecs;
 	return 0;
 free_bip:
@@ -258,7 +257,7 @@ static int bio_integrity_copy_user(struct bio *bio, struct bio_vec *bvec,
 }
 
 static int bio_integrity_init_user(struct bio *bio, struct bio_vec *bvec,
-				   int nr_vecs, unsigned int len, u32 seed)
+				   int nr_vecs, unsigned int len)
 {
 	struct bio_integrity_payload *bip;
 
@@ -267,7 +266,6 @@ static int bio_integrity_init_user(struct bio *bio, struct bio_vec *bvec,
 		return PTR_ERR(bip);
 
 	memcpy(bip->bip_vec, bvec, nr_vecs * sizeof(*bvec));
-	bip->bip_iter.bi_sector = seed;
 	bip->bip_iter.bi_size = len;
 	bip->bip_vcnt = nr_vecs;
 	return 0;
@@ -303,8 +301,7 @@ static unsigned int bvec_from_pages(struct bio_vec *bvec, struct page **pages,
 	return nr_bvecs;
 }
 
-int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes,
-			   u32 seed)
+int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes)
 {
 	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
 	unsigned int align = blk_lim_dma_alignment_and_pad(&q->limits);
@@ -350,9 +347,9 @@ int bio_integrity_map_user(struct bio *bio, void __user *ubuf, ssize_t bytes,
 
 	if (copy)
 		ret = bio_integrity_copy_user(bio, bvec, nr_bvecs, bytes,
-					      direction, seed);
+					      direction);
 	else
-		ret = bio_integrity_init_user(bio, bvec, nr_bvecs, bytes, seed);
+		ret = bio_integrity_init_user(bio, bvec, nr_bvecs, bytes);
 	if (ret)
 		goto release_pages;
 	if (bvec != stack_vec)

block/bio.c

+12 -69
@@ -1064,39 +1064,6 @@ int bio_add_pc_page(struct request_queue *q, struct bio *bio,
 }
 EXPORT_SYMBOL(bio_add_pc_page);
 
-/**
- * bio_add_zone_append_page - attempt to add page to zone-append bio
- * @bio: destination bio
- * @page: page to add
- * @len: vec entry length
- * @offset: vec entry offset
- *
- * Attempt to add a page to the bio_vec maplist of a bio that will be submitted
- * for a zone-append request. This can fail for a number of reasons, such as the
- * bio being full or the target block device is not a zoned block device or
- * other limitations of the target block device. The target block device must
- * allow bio's up to PAGE_SIZE, so it is always possible to add a single page
- * to an empty bio.
- *
- * Returns: number of bytes added to the bio, or 0 in case of a failure.
- */
-int bio_add_zone_append_page(struct bio *bio, struct page *page,
-			     unsigned int len, unsigned int offset)
-{
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	bool same_page = false;
-
-	if (WARN_ON_ONCE(bio_op(bio) != REQ_OP_ZONE_APPEND))
-		return 0;
-
-	if (WARN_ON_ONCE(!bdev_is_zoned(bio->bi_bdev)))
-		return 0;
-
-	return bio_add_hw_page(q, bio, page, len, offset,
-			       queue_max_zone_append_sectors(q), &same_page);
-}
-EXPORT_SYMBOL_GPL(bio_add_zone_append_page);
-
 /**
  * __bio_add_page - add page(s) to a bio in a new segment
  * @bio: destination bio
@@ -1206,21 +1173,12 @@ EXPORT_SYMBOL_GPL(__bio_release_pages);
 
 void bio_iov_bvec_set(struct bio *bio, struct iov_iter *iter)
 {
-	size_t size = iov_iter_count(iter);
-
 	WARN_ON_ONCE(bio->bi_max_vecs);
 
-	if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-		size_t max_sectors = queue_max_zone_append_sectors(q);
-
-		size = min(size, max_sectors << SECTOR_SHIFT);
-	}
-
 	bio->bi_vcnt = iter->nr_segs;
 	bio->bi_io_vec = (struct bio_vec *)iter->bvec;
 	bio->bi_iter.bi_bvec_done = iter->iov_offset;
-	bio->bi_iter.bi_size = size;
+	bio->bi_iter.bi_size = iov_iter_count(iter);
 	bio_set_flag(bio, BIO_CLONED);
 }
 
@@ -1245,20 +1203,6 @@ static int bio_iov_add_folio(struct bio *bio, struct folio *folio, size_t len,
 	return 0;
 }
 
-static int bio_iov_add_zone_append_folio(struct bio *bio, struct folio *folio,
-		size_t len, size_t offset)
-{
-	struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-	bool same_page = false;
-
-	if (bio_add_hw_folio(q, bio, folio, len, offset,
-			queue_max_zone_append_sectors(q), &same_page) != len)
-		return -EINVAL;
-	if (same_page && bio_flagged(bio, BIO_PAGE_PINNED))
-		unpin_user_folio(folio, 1);
-	return 0;
-}
-
 static unsigned int get_contig_folio_len(unsigned int *num_pages,
 					 struct page **pages, unsigned int i,
 					 struct folio *folio, size_t left,
@@ -1365,14 +1309,7 @@ static int __bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 		len = get_contig_folio_len(&num_pages, pages, i,
 					   folio, left, offset);
 
-		if (bio_op(bio) == REQ_OP_ZONE_APPEND) {
-			ret = bio_iov_add_zone_append_folio(bio, folio, len,
-					folio_offset);
-			if (ret)
-				break;
-		} else
-			bio_iov_add_folio(bio, folio, len, folio_offset);
-
+		bio_iov_add_folio(bio, folio, len, folio_offset);
 		offset = 0;
 	}
 
@@ -1728,16 +1665,22 @@ struct bio *bio_split(struct bio *bio, int sectors,
 {
 	struct bio *split;
 
-	BUG_ON(sectors <= 0);
-	BUG_ON(sectors >= bio_sectors(bio));
+	if (WARN_ON_ONCE(sectors <= 0))
+		return ERR_PTR(-EINVAL);
+	if (WARN_ON_ONCE(sectors >= bio_sectors(bio)))
+		return ERR_PTR(-EINVAL);
 
 	/* Zone append commands cannot be split */
 	if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
-		return NULL;
+		return ERR_PTR(-EINVAL);
+
+	/* atomic writes cannot be split */
+	if (bio->bi_opf & REQ_ATOMIC)
+		return ERR_PTR(-EINVAL);
 
 	split = bio_alloc_clone(bio->bi_bdev, bio, gfp, bs);
 	if (!split)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	split->bi_iter.bi_size = sectors << 9;
