Skip to content

Commit 2434533

Browse files
adam900710gregkh
authored andcommitted
btrfs: do proper folio cleanup when run_delalloc_nocow() failed
commit c2b47df upstream. [BUG] With CONFIG_DEBUG_VM set, test case generic/476 has some chance to crash with the following VM_BUG_ON_FOLIO(): BTRFS error (device dm-3): cow_file_range failed, start 1146880 end 1253375 len 106496 ret -28 BTRFS error (device dm-3): run_delalloc_nocow failed, start 1146880 end 1253375 len 106496 ret -28 page: refcount:4 mapcount:0 mapping:00000000592787cc index:0x12 pfn:0x10664 aops:btrfs_aops [btrfs] ino:101 dentry name(?):"f1774" flags: 0x2fffff80004028(uptodate|lru|private|node=0|zone=2|lastcpupid=0xfffff) page dumped because: VM_BUG_ON_FOLIO(!folio_test_locked(folio)) ------------[ cut here ]------------ kernel BUG at mm/page-writeback.c:2992! Internal error: Oops - BUG: 00000000f2000800 [#1] SMP CPU: 2 UID: 0 PID: 3943513 Comm: kworker/u24:15 Tainted: G OE 6.12.0-rc7-custom+ #87 Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE Hardware name: QEMU KVM Virtual Machine, BIOS unknown 2/2/2022 Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs] pc : folio_clear_dirty_for_io+0x128/0x258 lr : folio_clear_dirty_for_io+0x128/0x258 Call trace: folio_clear_dirty_for_io+0x128/0x258 btrfs_folio_clamp_clear_dirty+0x80/0xd0 [btrfs] __process_folios_contig+0x154/0x268 [btrfs] extent_clear_unlock_delalloc+0x5c/0x80 [btrfs] run_delalloc_nocow+0x5f8/0x760 [btrfs] btrfs_run_delalloc_range+0xa8/0x220 [btrfs] writepage_delalloc+0x230/0x4c8 [btrfs] extent_writepage+0xb8/0x358 [btrfs] extent_write_cache_pages+0x21c/0x4e8 [btrfs] btrfs_writepages+0x94/0x150 [btrfs] do_writepages+0x74/0x190 filemap_fdatawrite_wbc+0x88/0xc8 start_delalloc_inodes+0x178/0x3a8 [btrfs] btrfs_start_delalloc_roots+0x174/0x280 [btrfs] shrink_delalloc+0x114/0x280 [btrfs] flush_space+0x250/0x2f8 [btrfs] btrfs_async_reclaim_data_space+0x180/0x228 [btrfs] process_one_work+0x164/0x408 worker_thread+0x25c/0x388 kthread+0x100/0x118 ret_from_fork+0x10/0x20 Code: 910a8021 a90363f7 a9046bf9 94012379 (d4210000) ---[ end trace 0000000000000000 ]--- [CAUSE] The first two lines of extra debug messages show the problem is caused by the error handling of run_delalloc_nocow(). E.g. we have the following dirtied range (4K blocksize 4K page size): 0 16K 32K |//////////////////////////////////////| | Pre-allocated | And the range [0, 16K) has a preallocated extent. - Enter run_delalloc_nocow() for range [0, 16K) Which found range [0, 16K) is preallocated, can do the proper NOCOW write. - Enter fallback_to_fow() for range [16K, 32K) Since the range [16K, 32K) is not backed by preallocated extent, we have to go COW. - cow_file_range() failed for range [16K, 32K) So cow_file_range() will do the clean up by clearing folio dirty, unlock the folios. Now the folios in range [16K, 32K) is unlocked. - Enter extent_clear_unlock_delalloc() from run_delalloc_nocow() Which is called with PAGE_START_WRITEBACK to start page writeback. But folios can only be marked writeback when it's properly locked, thus this triggered the VM_BUG_ON_FOLIO(). Furthermore there is another hidden but common bug that run_delalloc_nocow() is not clearing the folio dirty flags in its error handling path. This is the common bug shared between run_delalloc_nocow() and cow_file_range(). [FIX] - Clear folio dirty for range [@start, @cur_offset) Introduce a helper, cleanup_dirty_folios(), which will find and lock the folio in the range, clear the dirty flag and start/end the writeback, with the extra handling for the @locked_folio. - Introduce a helper to clear folio dirty, start and end writeback - Introduce a helper to record the last failed COW range end This is to trace which range we should skip, to avoid double unlocking. - Skip the failed COW range for the error handling CC: [email protected] Reviewed-by: Boris Burkov <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
1 parent 692cf71 commit 2434533

File tree

2 files changed

+102
-6
lines changed

2 files changed

+102
-6
lines changed

fs/btrfs/inode.c

+89-6
Original file line numberDiff line numberDiff line change
@@ -1967,6 +1967,53 @@ static int can_nocow_file_extent(struct btrfs_path *path,
19671967
return ret < 0 ? ret : can_nocow;
19681968
}
19691969

1970+
/*
1971+
* Cleanup the dirty folios which will never be submitted due to error.
1972+
*
1973+
* When running a delalloc range, we may need to split the ranges (due to
1974+
* fragmentation or NOCOW). If we hit an error in the later part, we will error
1975+
* out and previously successfully executed range will never be submitted, thus
1976+
* we have to cleanup those folios by clearing their dirty flag, starting and
1977+
* finishing the writeback.
1978+
*/
1979+
static void cleanup_dirty_folios(struct btrfs_inode *inode,
1980+
struct folio *locked_folio,
1981+
u64 start, u64 end, int error)
1982+
{
1983+
struct btrfs_fs_info *fs_info = inode->root->fs_info;
1984+
struct address_space *mapping = inode->vfs_inode.i_mapping;
1985+
pgoff_t start_index = start >> PAGE_SHIFT;
1986+
pgoff_t end_index = end >> PAGE_SHIFT;
1987+
u32 len;
1988+
1989+
ASSERT(end + 1 - start < U32_MAX);
1990+
ASSERT(IS_ALIGNED(start, fs_info->sectorsize) &&
1991+
IS_ALIGNED(end + 1, fs_info->sectorsize));
1992+
len = end + 1 - start;
1993+
1994+
/*
1995+
* Handle the locked folio first.
1996+
* The btrfs_folio_clamp_*() helpers can handle range out of the folio case.
1997+
*/
1998+
btrfs_folio_clamp_finish_io(fs_info, locked_folio, start, len);
1999+
2000+
for (pgoff_t index = start_index; index <= end_index; index++) {
2001+
struct folio *folio;
2002+
2003+
/* Already handled at the beginning. */
2004+
if (index == locked_folio->index)
2005+
continue;
2006+
folio = __filemap_get_folio(mapping, index, FGP_LOCK, GFP_NOFS);
2007+
/* Cache already dropped, no need to do any cleanup. */
2008+
if (IS_ERR(folio))
2009+
continue;
2010+
btrfs_folio_clamp_finish_io(fs_info, locked_folio, start, len);
2011+
folio_unlock(folio);
2012+
folio_put(folio);
2013+
}
2014+
mapping_set_error(mapping, error);
2015+
}
2016+
19702017
/*
19712018
* when nowcow writeback call back. This checks for snapshots or COW copies
19722019
* of the extents that exist in the file, and COWs the file as required.
@@ -1982,6 +2029,11 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
19822029
struct btrfs_root *root = inode->root;
19832030
struct btrfs_path *path;
19842031
u64 cow_start = (u64)-1;
2032+
/*
2033+
* If not 0, represents the inclusive end of the last fallback_to_cow()
2034+
* range. Only for error handling.
2035+
*/
2036+
u64 cow_end = 0;
19852037
u64 cur_offset = start;
19862038
int ret;
19872039
bool check_prev = true;
@@ -2142,6 +2194,7 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
21422194
found_key.offset - 1);
21432195
cow_start = (u64)-1;
21442196
if (ret) {
2197+
cow_end = found_key.offset - 1;
21452198
btrfs_dec_nocow_writers(nocow_bg);
21462199
goto error;
21472200
}
@@ -2215,24 +2268,54 @@ static noinline int run_delalloc_nocow(struct btrfs_inode *inode,
22152268
cow_start = cur_offset;
22162269

22172270
if (cow_start != (u64)-1) {
2218-
cur_offset = end;
22192271
ret = fallback_to_cow(inode, locked_folio, cow_start, end);
22202272
cow_start = (u64)-1;
2221-
if (ret)
2273+
if (ret) {
2274+
cow_end = end;
22222275
goto error;
2276+
}
22232277
}
22242278

22252279
btrfs_free_path(path);
22262280
return 0;
22272281

22282282
error:
2283+
/*
2284+
* There are several error cases:
2285+
*
2286+
* 1) Failed without falling back to COW
2287+
* start cur_offset end
2288+
* |/////////////| |
2289+
*
2290+
* For range [start, cur_offset) the folios are already unlocked (except
2291+
* @locked_folio), EXTENT_DELALLOC already removed.
2292+
* Only need to clear the dirty flag as they will never be submitted.
2293+
* Ordered extent and extent maps are handled by
2294+
* btrfs_mark_ordered_io_finished() inside run_delalloc_range().
2295+
*
2296+
* 2) Failed with error from fallback_to_cow()
2297+
* start cur_offset cow_end end
2298+
* |/////////////|-----------| |
2299+
*
2300+
* For range [start, cur_offset) it's the same as case 1).
2301+
* But for range [cur_offset, cow_end), the folios have dirty flag
2302+
* cleared and unlocked, EXTENT_DEALLLOC cleared by cow_file_range().
2303+
*
2304+
* Thus we should not call extent_clear_unlock_delalloc() on range
2305+
* [cur_offset, cow_end), as the folios are already unlocked.
2306+
*
2307+
* So clear the folio dirty flags for [start, cur_offset) first.
2308+
*/
2309+
if (cur_offset > start)
2310+
cleanup_dirty_folios(inode, locked_folio, start, cur_offset - 1, ret);
2311+
22292312
/*
22302313
* If an error happened while a COW region is outstanding, cur_offset
2231-
* needs to be reset to cow_start to ensure the COW region is unlocked
2232-
* as well.
2314+
* needs to be reset to @cow_end + 1 to skip the COW range, as
2315+
* cow_file_range() will do the proper cleanup at error.
22332316
*/
2234-
if (cow_start != (u64)-1)
2235-
cur_offset = cow_start;
2317+
if (cow_end)
2318+
cur_offset = cow_end + 1;
22362319

22372320
/*
22382321
* We need to lock the extent here because we're clearing DELALLOC and

fs/btrfs/subpage.h

+13
Original file line numberDiff line numberDiff line change
@@ -137,6 +137,19 @@ DECLARE_BTRFS_SUBPAGE_OPS(writeback);
137137
DECLARE_BTRFS_SUBPAGE_OPS(ordered);
138138
DECLARE_BTRFS_SUBPAGE_OPS(checked);
139139

140+
/*
141+
* Helper for error cleanup, where a folio will have its dirty flag cleared,
142+
* with writeback started and finished.
143+
*/
144+
static inline void btrfs_folio_clamp_finish_io(struct btrfs_fs_info *fs_info,
145+
struct folio *locked_folio,
146+
u64 start, u32 len)
147+
{
148+
btrfs_folio_clamp_clear_dirty(fs_info, locked_folio, start, len);
149+
btrfs_folio_clamp_set_writeback(fs_info, locked_folio, start, len);
150+
btrfs_folio_clamp_clear_writeback(fs_info, locked_folio, start, len);
151+
}
152+
140153
bool btrfs_subpage_clear_and_test_dirty(const struct btrfs_fs_info *fs_info,
141154
struct folio *folio, u64 start, u32 len);
142155

0 commit comments

Comments
 (0)