Kernel update breaks loopbacks on ZFS volumes #17277
Using kernel-6.13.12 with Fedora, it fails but doesn't hang:
Error message in shell session:
Error message from journald:
Checking the filesystem:
Shows:
Similar results using ext4.
Error reported in shell session:
journald:
I ran a check on the ext4 volume:
Which shows:
So filesystem checks succeed for both xfs and ext4, but neither will mount. The encrypted zvol was created long ago on a previous version of Linux.
Thank you for the reply, but unless I'm missing something, you are using a zvol, not a loopback image. It does look like there is some issue with zvols and that kernel, but zvols still appear to work normally for me with kernel 6.12.25. The problem I've seen is only with loopback images stored on top of a ZFS dataset. With Unraid they are mostly used to store the Docker image; regular folders or a dataset can also be used, and those still work fine with kernel 6.12.25, as do loopback images on btrfs or xfs. The problem is only with ZFS.
I can reproduce this following the steps for xfs using: NixOS dataset properties:
git bisect 6.12.24..6.12.25 reveals:
Building 6.12.25 with 78253d44e9d343258d7163ab70f4ecff4430a9b4 reverted fixes this.
Other notes:
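The bisect-and-revert workflow mentioned in this comment can be sketched roughly as follows. This is not the commenter's actual script: `KERNEL_SRC` is an assumed path to a linux-stable checkout, and the `run` helper and `DRY_RUN` switch are hypothetical conveniences so the commands are only printed by default. The tags `v6.12.24` and `v6.12.25` and the commit hash come from the comment itself.

```shell
#!/bin/sh
# Sketch only. KERNEL_SRC is an assumed linux-stable checkout.
# DRY_RUN=1 (default) prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
KERNEL_SRC="${KERNEL_SRC:-./linux-stable}"
SUSPECT=78253d44e9d343258d7163ab70f4ecff4430a9b4   # commit named in the comment

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

bisect_start() {
    run git -C "$KERNEL_SRC" bisect start
    run git -C "$KERNEL_SRC" bisect bad v6.12.25    # first release that hangs
    run git -C "$KERNEL_SRC" bisect good v6.12.24   # last release that works
    # After each build-and-boot test, mark the result manually with
    # "git bisect good" or "git bisect bad" until git names one commit.
}

revert_suspect() {
    run git -C "$KERNEL_SRC" checkout v6.12.25
    run git -C "$KERNEL_SRC" revert --no-edit "$SUSPECT"
}
```

Calling `bisect_start` followed by `revert_suspect` prints the five git commands in order. The build, boot, and loopback mount test between bisect steps is manual and is not automated here.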
Just to confirm, is your backing disk ( |
Unfortunately it does not.
Possibly the same issue for me. There is a rootfs image stored on encrypted ZFS in a zpool. I used
[May 3 12:03] sysrq: Show Blocked State
[ +0.001393] task:txg_sync state:D stack:0 pid:49224 tgid:49224 ppid:2 task_flags:0x288040 flags:0x00004000
[ +0.000009] Call Trace:
[ +0.000002] <TASK>
[ +0.000004] __schedule+0x460/0x1ff0
[ +0.000022] ? ttwu_queue_wakelist+0xf9/0x110
[ +0.000008] ? try_to_wake_up+0x325/0x730
[ +0.000005] schedule+0x27/0xf0
[ +0.000003] schedule_timeout+0x84/0x100
[ +0.000003] ? __pfx_process_timeout+0x10/0x10
[ +0.000004] io_schedule_timeout+0x5b/0x90
[ +0.000006] __cv_timedwait_io+0xbe/0x150 [spl af12c84ae427751769114f0474b67e3a2df37a77]
[ +0.000014] ? __pfx_autoremove_wake_function+0x10/0x10
[ +0.000007] zio_wait+0x13a/0x350 [zfs de4dde959c5ceb29641386757f258284b9ad1d65]
[ +0.000248] dsl_pool_sync+0xe9/0x5c0 [zfs de4dde959c5ceb29641386757f258284b9ad1d65]
[ +0.000247] ? add_timer+0x183/0x210
[ +0.000005] spa_sync+0x597/0x1070 [zfs de4dde959c5ceb29641386757f258284b9ad1d65]
[ +0.000237] ? spa_txg_history_init_io+0x19d/0x1c0 [zfs de4dde959c5ceb29641386757f258284b9ad1d65]
[ +0.000219] txg_sync_thread+0x20b/0x3b0 [zfs de4dde959c5ceb29641386757f258284b9ad1d65]
[ +0.000216] ? __pfx_txg_sync_thread+0x10/0x10 [zfs de4dde959c5ceb29641386757f258284b9ad1d65]
[ +0.000195] ? __pfx_thread_generic_wrapper+0x10/0x10 [spl af12c84ae427751769114f0474b67e3a2df37a77]
[ +0.000011] thread_generic_wrapper+0x5a/0x70 [spl af12c84ae427751769114f0474b67e3a2df37a77]
[ +0.000008] kthread+0xec/0x230
[ +0.000004] ? __pfx_kthread+0x10/0x10
[ +0.000002] ret_from_fork+0x31/0x50
[ +0.000004] ? __pfx_kthread+0x10/0x10
[ +0.000002] ret_from_fork_asm+0x1a/0x30
[ +0.000004] </TASK>
[May 3 12:04] loop0: detected capacity change from 0 to 1000000000
[May 3 12:06] loop0: detected capacity change from 0 to 1000000000
[ +3.512477] EXT4-fs (loop0): mounted filesystem 82651462-72e2-47de-8fa1-c2a1be805445 r/w with ordered data mode. Quota mode: none.
[May 3 12:08] INFO: task kworker/u80:14:965 blocked for more than 122 seconds.
[ +0.000005] Tainted: P OE 6.14.4-zen1-2-zen #1
[ +0.000000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000001] task:kworker/u80:14 state:D stack:0 pid:965 tgid:965 ppid:2 task_flags:0x4248060 flags:0x00004000
[ +0.000003] Workqueue: writeback wb_workfn (flush-7:0)
[ +0.000005] Call Trace:
[ +0.000001] <TASK>
[ +0.000001] ? __pfx_wbt_cleanup_cb+0x10/0x10
[ +0.000003] __schedule+0x460/0x1ff0
[ +0.000004] ? __pfx_wbt_inflight_cb+0x10/0x10
[ +0.000001] ? __pfx_wbt_cleanup_cb+0x10/0x10
[ +0.000001] io_schedule+0x57/0x140
[ +0.000001] rq_qos_wait+0xbc/0x130
[ +0.000002] ? __pfx_rq_qos_wake_function+0x10/0x10
[ +0.000001] ? __pfx_wbt_inflight_cb+0x10/0x10
[ +0.000001] wbt_wait+0xc3/0x180
[ +0.000001] __rq_qos_throttle+0x24/0x50
[ +0.000002] blk_mq_submit_bio+0x21c/0x990
[ +0.000003] __submit_bio+0xc3/0x270
[ +0.000002] submit_bio_noacct_nocheck+0x31f/0x400
[ +0.000002] ext4_io_submit+0x24/0x40
[ +0.000003] ext4_do_writepages+0x494/0x1120
[ +0.000004] ? ext4_writepages+0xab/0x170
[ +0.000001] ext4_writepages+0xab/0x170
[ +0.000003] do_writepages+0x87/0x280
[ +0.000003] ? __blk_rq_map_sg+0xb0/0x440
[ +0.000002] ? __sbitmap_get_word+0x2b/0x70
[ +0.000001] ? sbitmap_get+0x14f/0x390
[ +0.000002] __writeback_single_inode+0x41/0x350
[ +0.000001] writeback_sb_inodes+0x256/0x5d0
[ +0.000003] __writeback_inodes_wb+0x4c/0xf0
[ +0.000001] wb_writeback+0x323/0x3b0
[ +0.000002] wb_workfn+0x39a/0x5d0
[ +0.000001] ? finish_task_switch.isra.0+0x99/0x2e0
[ +0.000002] ? __schedule+0x468/0x1ff0
[ +0.000002] process_one_work+0x190/0x360
[ +0.000003] worker_thread+0x24f/0x380
[ +0.000001] ? __pfx_worker_thread+0x10/0x10
[ +0.000002] kthread+0xec/0x230
[ +0.000001] ? __pfx_kthread+0x10/0x10
[ +0.000001] ret_from_fork+0x31/0x50
[ +0.000003] ? __pfx_kthread+0x10/0x10
[ +0.000001] ret_from_fork_asm+0x1a/0x30
[ +0.000002] </TASK>
[ +0.000251] INFO: task dd:71207 blocked for more than 122 seconds.
[ +0.000001] Tainted: P OE 6.14.4-zen1-2-zen #1
[ +0.000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000000] task:dd state:D stack:0 pid:71207 tgid:71207 ppid:71206 task_flags:0x440100 flags:0x00004002
[ +0.000002] Call Trace:
[ +0.000001] <TASK>
[ +0.000000] ? __pfx_wbt_cleanup_cb+0x10/0x10
[ +0.000002] __schedule+0x460/0x1ff0
[ +0.000001] ? kmem_cache_alloc_noprof+0xe1/0x410
[ +0.000002] ? __pfx_wbt_inflight_cb+0x10/0x10
[ +0.000001] ? __pfx_wbt_cleanup_cb+0x10/0x10
[ +0.000001] io_schedule+0x57/0x140
[ +0.000001] rq_qos_wait+0xbc/0x130
[ +0.000001] ? __pfx_rq_qos_wake_function+0x10/0x10
[ +0.000001] ? __pfx_wbt_inflight_cb+0x10/0x10
[ +0.000001] wbt_wait+0xc3/0x180
[ +0.000001] __rq_qos_throttle+0x24/0x50
[ +0.000001] blk_mq_submit_bio+0x21c/0x990
[ +0.000002] __submit_bio+0xc3/0x270
[ +0.000002] submit_bio_noacct_nocheck+0x31f/0x400
[ +0.000002] ext4_io_submit+0x24/0x40
[ +0.000003] ext4_do_writepages+0x494/0x1120
[ +0.000003] ? ext4_writepages+0xab/0x170
[ +0.000001] ext4_writepages+0xab/0x170
[ +0.000002] do_writepages+0x87/0x280
[ +0.000002] ? file_tty_write.isra.0+0x20c/0x350
[ +0.000003] __filemap_fdatawrite_range+0xb0/0xd0
[ +0.000003] file_write_and_wait_range+0xc9/0x160
[ +0.000002] ext4_sync_file+0x86/0x3b0
[ +0.000003] __x64_sys_fdatasync+0x4c/0x90
[ +0.000002] do_syscall_64+0x7b/0x190
[ +0.000002] ? do_syscall_64+0x87/0x190
[ +0.000001] ? __x64_sys_write+0x71/0xf0
[ +0.000002] ? syscall_exit_to_user_mode+0x10/0x210
[ +0.000002] ? do_syscall_64+0x87/0x190
[ +0.000001] ? irq_exit_rcu+0x55/0x100
[ +0.000002] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000003] RIP: 0033:0x73869e62a006
[ +0.000029] RSP: 002b:00007fff2fb11120 EFLAGS: 00000202 ORIG_RAX: 000000000000004b
[ +0.000002] RAX: ffffffffffffffda RBX: 0000000000004200 RCX: 000073869e62a006
[ +0.000000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[ +0.000001] RBP: 00007fff2fb11140 R08: 0000000000000000 R09: 0000000000000000
[ +0.000001] R10: 0000000000000000 R11: 0000000000000202 R12: 000073869e5956c8
[ +0.000000] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000080000
[ +0.000001] </TASK>
[May 3 12:10] INFO: task kworker/u80:14:965 blocked for more than 245 seconds.
[ +0.000005] Tainted: P OE 6.14.4-zen1-2-zen #1
[ +0.000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000001] task:kworker/u80:14 state:D stack:0 pid:965 tgid:965 ppid:2 task_flags:0x4248060 flags:0x00004000
[ +0.000003] Workqueue: writeback wb_workfn (flush-7:0)
[ +0.000006] Call Trace:
[ +0.000001] <TASK>
[ +0.000001] ? __pfx_wbt_cleanup_cb+0x10/0x10
[ +0.000004] __schedule+0x460/0x1ff0
[ +0.000003] ? __pfx_wbt_inflight_cb+0x10/0x10
[ +0.000001] ? __pfx_wbt_cleanup_cb+0x10/0x10
[ +0.000001] io_schedule+0x57/0x140
[ +0.000002] rq_qos_wait+0xbc/0x130
[ +0.000002] ? __pfx_rq_qos_wake_function+0x10/0x10
[ +0.000001] ? __pfx_wbt_inflight_cb+0x10/0x10
[ +0.000001] wbt_wait+0xc3/0x180
[ +0.000001] __rq_qos_throttle+0x24/0x50
[ +0.000002] blk_mq_submit_bio+0x21c/0x990
[ +0.000003] __submit_bio+0xc3/0x270
[ +0.000002] submit_bio_noacct_nocheck+0x31f/0x400
[ +0.000002] ext4_io_submit+0x24/0x40
[ +0.000004] ext4_do_writepages+0x494/0x1120
[ +0.000004] ? ext4_writepages+0xab/0x170
[ +0.000001] ext4_writepages+0xab/0x170
[ +0.000003] do_writepages+0x87/0x280
[ +0.000003] ? __blk_rq_map_sg+0xb0/0x440
[ +0.000002] ? __sbitmap_get_word+0x2b/0x70
[ +0.000002] ? sbitmap_get+0x14f/0x390
[ +0.000001] __writeback_single_inode+0x41/0x350
[ +0.000002] writeback_sb_inodes+0x256/0x5d0
[ +0.000002] __writeback_inodes_wb+0x4c/0xf0
[ +0.000002] wb_writeback+0x323/0x3b0
[ +0.000001] wb_workfn+0x39a/0x5d0
[ +0.000001] ? finish_task_switch.isra.0+0x99/0x2e0
[ +0.000003] ? __schedule+0x468/0x1ff0
[ +0.000001] process_one_work+0x190/0x360
[ +0.000003] worker_thread+0x24f/0x380
[ +0.000002] ? __pfx_worker_thread+0x10/0x10
[ +0.000002] kthread+0xec/0x230
[ +0.000002] ? __pfx_kthread+0x10/0x10
[ +0.000001] ret_from_fork+0x31/0x50
[ +0.000002] ? __pfx_kthread+0x10/0x10
[ +0.000001] ret_from_fork_asm+0x1a/0x30
[ +0.000002] </TASK>
[ +0.000269] INFO: task dd:71207 blocked for more than 245 seconds.
[ +0.000001] Tainted: P OE 6.14.4-zen1-2-zen #1
[ +0.000001] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ +0.000000] task:dd state:D stack:0 pid:71207 tgid:71207 ppid:71206 task_flags:0x440100 flags:0x00004002
[ +0.000002] Call Trace:
[ +0.000001] <TASK>
[ +0.000000] ? __pfx_wbt_cleanup_cb+0x10/0x10
[ +0.000002] __schedule+0x460/0x1ff0
[ +0.000002] ? kmem_cache_alloc_noprof+0xe1/0x410
[ +0.000002] ? __pfx_wbt_inflight_cb+0x10/0x10
[ +0.000001] ? __pfx_wbt_cleanup_cb+0x10/0x10
[ +0.000001] io_schedule+0x57/0x140
[ +0.000001] rq_qos_wait+0xbc/0x130
[ +0.000001] ? __pfx_rq_qos_wake_function+0x10/0x10
[ +0.000002] ? __pfx_wbt_inflight_cb+0x10/0x10
[ +0.000001] wbt_wait+0xc3/0x180
[ +0.000001] __rq_qos_throttle+0x24/0x50
[ +0.000001] blk_mq_submit_bio+0x21c/0x990
[ +0.000002] __submit_bio+0xc3/0x270
[ +0.000002] submit_bio_noacct_nocheck+0x31f/0x400
[ +0.000002] ext4_io_submit+0x24/0x40
[ +0.000002] ext4_do_writepages+0x494/0x1120
[ +0.000003] ? ext4_writepages+0xab/0x170
[ +0.000002] ext4_writepages+0xab/0x170
[ +0.000001] do_writepages+0x87/0x280
[ +0.000002] ? file_tty_write.isra.0+0x20c/0x350
[ +0.000003] __filemap_fdatawrite_range+0xb0/0xd0
[ +0.000003] file_write_and_wait_range+0xc9/0x160
[ +0.000002] ext4_sync_file+0x86/0x3b0
[ +0.000002] __x64_sys_fdatasync+0x4c/0x90
[ +0.000002] do_syscall_64+0x7b/0x190
[ +0.000002] ? do_syscall_64+0x87/0x190
[ +0.000001] ? __x64_sys_write+0x71/0xf0
[ +0.000002] ? syscall_exit_to_user_mode+0x10/0x210
[ +0.000002] ? do_syscall_64+0x87/0x190
[ +0.000001] ? irq_exit_rcu+0x55/0x100
[ +0.000002] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000003] RIP: 0033:0x73869e62a006
[ +0.000033] RSP: 002b:00007fff2fb11120 EFLAGS: 00000202 ORIG_RAX: 000000000000004b
[ +0.000001] RAX: ffffffffffffffda RBX: 0000000000004200 RCX: 000073869e62a006
[ +0.000001] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
[ +0.000001] RBP: 00007fff2fb11140 R08: 0000000000000000 R09: 0000000000000000
[ +0.000001] R10: 0000000000000000 R11: 0000000000000202 R12: 000073869e5956c8
[ +0.000000] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000080000
[ +0.000001] </TASK>
Note that I am an Arch user; the kernel I am using is the latest one, and the zfs version is:
which cherry-picks the fixes needed to make it work on the latest kernel. I am pretty sure that the cherry-picks have nothing to do with this issue, because I also reproduced it on the latest LTS kernel (6.12.25-2). I am currently at work and cannot reboot into the LTS kernel right away. After more tests, I found that this sync issue also persists with a btrfs image. I hope the information I gave helps you locate the problem.
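For anyone collecting similar logs, a small filter along these lines may help condense the hung-task warnings. It is a sketch, not part of the original report: the function name `hung_tasks` is made up for illustration, and it assumes the kernel log uses the same "INFO: task NAME:PID blocked for more than N seconds." wording seen in the trace above.

```shell
#!/bin/sh
# Sketch: summarise hung-task warnings from a kernel log read on stdin.
# Prints one line per task: "NAME:PID MAX_SECONDS", sorted by task name.
# A fresh blocked-task dump like the one above can be requested (as root)
# with: echo w > /proc/sysrq-trigger
hung_tasks() {
    # Keep only the hung-task lines and reduce them to "task seconds".
    sed -n 's/.*INFO: task \(.*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2/p' |
    # Remember the largest blocked time reported for each task.
    awk '{ if ($2 + 0 > max[$1]) max[$1] = $2 + 0 }
         END { for (t in max) print t, max[t] }' |
    sort
}
```

Typical use would be `dmesg | hung_tasks` or `journalctl -k | hung_tasks`. Lines that are not hung-task warnings, such as the call traces, are ignored.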
Possible fix in #17298. If it's right, then it turns out it's an ancient bug that this kernel change just happened to start tickling. Thank you for bisecting it, very useful!
Add a test case to reproduce issue openzfs#17277:
1. Make a pool
2. Write a file to the pool
3. Mount the file as a loopback device
4. Make an XFS filesystem on the loopback device
5. Mount the XFS filesystem... <hangs>
Signed-off-by: Tony Hutter <[email protected]>
System information
Describe the problem you're observing
This looks more like a kernel issue to me, but if I report it there I expect them to say that ZFS is not supported, so I thought I would report it here first to see what you think.
After updating the kernel from 6.12.24 to 6.12.25, loopbacks on a ZFS volume hang.
Describe how to reproduce the problem
Create a new loopback image on a ZFS volume (a single-device ZFS volume is enough), format the loopback device as xfs, and attempt to mount it. The mount hangs:
There are no panic/crashes logged, it stops here:
top shows that process using 100% CPU; it cannot be killed, and the server needs a hard reboot to recover:
P.S. If I format the loopback device as btrfs, it doesn't hang on mount, but it does hang immediately after some I/O. Reverting the kernel to 6.12.24 resolves the issue. If any other info is needed, please let me know.
Thanks.
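The reproduction steps described above can be sketched roughly like this. It is an approximation, not the exact commands from the report: `IMG` and `MNT` are assumed paths (the image must live on a ZFS dataset), the 1G size is arbitrary, and the `run` helper and `DRY_RUN` switch are hypothetical additions so that nothing is executed by default.

```shell
#!/bin/sh
# Sketch of the reproduction steps. IMG and MNT are assumed paths; IMG must
# be on a ZFS dataset. With DRY_RUN=1 (the default) commands are only
# printed. Set DRY_RUN=0 and run as root on a disposable machine to execute.
DRY_RUN="${DRY_RUN:-1}"
IMG="${IMG:-/tank/test/loop.img}"
MNT="${MNT:-/mnt/looptest}"

run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

repro() {
    run truncate -s 1G "$IMG"                 # sparse image file on ZFS
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ losetup --find --show $IMG"
        dev=/dev/loopN                        # placeholder name for dry runs
    else
        dev=$(losetup --find --show "$IMG")   # attach image, capture device
    fi
    run mkfs.xfs "$dev"                       # format the loop device as XFS
    run mkdir -p "$MNT"
    run mount "$dev" "$MNT"                   # the step reported to hang
}
```

Calling `repro` in dry-run mode prints the five commands. On an affected kernel the final mount is the step that would be expected to hang, so a real run should only be attempted on a machine that can be hard rebooted.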
Include any warning/errors/backtraces from the system logs