Skip to content

Commit 0ec8bc9

Browse files
ryncsnakpm00
authored andcommitted
mm, swap: fix allocation and scanning race with swapoff
There are two flags used to synchronize allocation and scanning with swapoff: SWP_WRITEOK and SWP_SCANNING. SWP_WRITEOK: Swapoff will first unset this flag, at this point any further swap allocation or scanning on this device should just abort so no more new entries will be referencing this device. Swapoff will then unuse all existing swap entries. SWP_SCANNING: This flag is set when device is being scanned. Swapoff will wait for all scanner to stop before the final release of the swap device structures to avoid UAF. Note this flag is the highest used bit of si->flags so it could be added up arithmetically, if there are multiple scanner. commit 5f843a9 ("mm: swap: separate SSD allocation from scan_swap_map_slots()") ignored SWP_SCANNING and SWP_WRITEOK flags while separating cluster allocation path from the old allocation path. Add the flags back to fix swapoff race. The race is hard to trigger as si->lock prevents most parallel operations, but si->lock could be dropped for reclaim or discard. This issue is found during code review. This commit fixes this problem. For SWP_SCANNING, Just like before, set the flag before scan and remove it afterwards. For SWP_WRITEOK, there are several places where si->lock could be dropped, it will be error-prone and make the code hard to follow if we try to cover these places one by one. So just do one check before the real allocation, which is also very similar like before. With new cluster allocator it may waste a bit of time iterating the clusters but won't take long, and swapoff is not performance sensitive. Link: https://lkml.kernel.org/r/[email protected] Fixes: 5f843a9 ("mm: swap: separate SSD allocation from scan_swap_map_slots()") Reported-by: "Huang, Ying" <[email protected]> Closes: https://lore.kernel.org/linux-mm/[email protected]/ Signed-off-by: Kairui Song <[email protected]> Cc: Barry Song <[email protected]> Cc: Chris Li <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Kalesh Singh <[email protected]> Cc: Ryan Roberts <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
1 parent dcf32ea commit 0ec8bc9

File tree

1 file changed

+19
-3
lines changed

1 file changed

+19
-3
lines changed

mm/swapfile.c

+19-3
Original file line numberDiff line numberDiff line change
@@ -664,12 +664,15 @@ static bool cluster_scan_range(struct swap_info_struct *si,
664664
return true;
665665
}
666666

667-
static void cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster_info *ci,
667+
static bool cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster_info *ci,
668668
unsigned int start, unsigned char usage,
669669
unsigned int order)
670670
{
671671
unsigned int nr_pages = 1 << order;
672672

673+
if (!(si->flags & SWP_WRITEOK))
674+
return false;
675+
673676
if (cluster_is_free(ci)) {
674677
if (nr_pages < SWAPFILE_CLUSTER) {
675678
list_move_tail(&ci->list, &si->nonfull_clusters[order]);
@@ -690,6 +693,8 @@ static void cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster
690693
list_move_tail(&ci->list, &si->full_clusters);
691694
ci->flags = CLUSTER_FLAG_FULL;
692695
}
696+
697+
return true;
693698
}
694699

695700
static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigned long offset,
@@ -713,7 +718,10 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigne
713718

714719
while (offset <= end) {
715720
if (cluster_scan_range(si, ci, offset, nr_pages)) {
716-
cluster_alloc_range(si, ci, offset, usage, order);
721+
if (!cluster_alloc_range(si, ci, offset, usage, order)) {
722+
offset = SWAP_NEXT_INVALID;
723+
goto done;
724+
}
717725
*foundp = offset;
718726
if (ci->count == SWAPFILE_CLUSTER) {
719727
offset = SWAP_NEXT_INVALID;
@@ -805,7 +813,11 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o
805813
if (!list_empty(&si->free_clusters)) {
806814
ci = list_first_entry(&si->free_clusters, struct swap_cluster_info, list);
807815
offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), &found, order, usage);
808-
VM_BUG_ON(!found);
816+
/*
817+
* Either we didn't touch the cluster due to swapoff,
818+
* or the allocation must success.
819+
*/
820+
VM_BUG_ON((si->flags & SWP_WRITEOK) && !found);
809821
goto done;
810822
}
811823

@@ -1041,6 +1053,8 @@ static int cluster_alloc_swap(struct swap_info_struct *si,
10411053

10421054
VM_BUG_ON(!si->cluster_info);
10431055

1056+
si->flags += SWP_SCANNING;
1057+
10441058
while (n_ret < nr) {
10451059
unsigned long offset = cluster_alloc_swap_entry(si, order, usage);
10461060

@@ -1049,6 +1063,8 @@ static int cluster_alloc_swap(struct swap_info_struct *si,
10491063
slots[n_ret++] = swp_entry(si->type, offset);
10501064
}
10511065

1066+
si->flags -= SWP_SCANNING;
1067+
10521068
return n_ret;
10531069
}
10541070

0 commit comments

Comments
 (0)