Description
Hello, I'm trying to checkpoint and restore a process tree in which one of the subprocesses runs in a different user namespace, and it fails. My final goal is to make runc + user namespaces (rootless containers) work with criu checkpoint/restore.
I first hit something similar to the issue reported here:
ctrox/zeropod#96
#2651: this one is marked as stale, but it definitely should not be.
So I tried to take runc out of the equation and narrow the issue down to the way criu interacts with user namespaces.
Here is a small script to reproduce the issue:
https://github.com/oOraph/userns_remap/
This program creates a process in a different user namespace, with root remapped to uid/gid 100000 (when argv[1] is not provided). No other namespace is changed; in particular, both processes share the same pid namespace.
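For readers who don't want to open the repository, here is a minimal sketch of the id-mapping step, assuming a single contiguous range of 65536 ids (the count is my assumption; only the 100000 base comes from the setup above). It is not the code from oOraph/userns_remap: `struct id_map`, `format_id_map`, `map_to_parent` and `write_id_map` are hypothetical names. The parent writes one "inside outside count" line to the child's /proc/<pid>/uid_map and gid_map, after which uid 0 inside the namespace is seen as 100000 from the host, which matches what ps reports for `sleep 3600` below.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* One line of /proc/<pid>/uid_map or gid_map: "inside outside count". */
struct id_map {
    unsigned long inside;  /* first id inside the user namespace */
    unsigned long outside; /* first id in the parent user namespace */
    unsigned long count;   /* length of the range (65536 is an assumption) */
};

/* Format the map line; returns its length, or -1 if buf is too small. */
int format_id_map(const struct id_map *m, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%lu %lu %lu\n", m->inside, m->outside, m->count);
    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Translate an id seen inside the namespace to the parent's view; -1 if unmapped. */
long map_to_parent(const struct id_map *m, unsigned long id)
{
    if (id < m->inside || id - m->inside >= m->count)
        return -1;
    return (long)(m->outside + (id - m->inside));
}

/*
 * Hypothetical helper: write the map to /proc/<pid>/<which>, where which is
 * "uid_map" or "gid_map". Must run from the parent user namespace with enough
 * privilege; an unprivileged writer would also have to write "deny" to
 * /proc/<pid>/setgroups before gid_map (not shown).
 */
int write_id_map(pid_t pid, const char *which, const struct id_map *m)
{
    char path[64], line[96];
    int n = format_id_map(m, line, sizeof line);
    if (n < 0)
        return -1;
    snprintf(path, sizeof path, "/proc/%d/%s", (int)pid, which);
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    /* The kernel expects the whole map in a single write(), and only once. */
    ssize_t w = write(fd, line, (size_t)n);
    close(fd);
    return w == n ? 0 : -1;
}
```

With `m = {0, 100000, 65536}`, `format_id_map` produces "0 100000 65536\n" and `map_to_parent(&m, 0)` returns 100000, i.e. the uid that ps shows for the process in the new namespace.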
# setsid /root/remap < /dev/null > /tmp/remap.log 2>&1 &
[1] 85986
#
[1]+ Done setsid /root/remap < /dev/null > /tmp/remap.log 2>&1
# ps faux | grep -A 1 remap
[...]
root 85987 0.0 0.0 2684 1580 ? Ss 09:57 0:00 /root/remap
100000 85989 0.0 0.0 6112 1920 ? S 09:57 0:00 \_ sleep 3600
Checkpointing it using criu works fine.
root@hostname:/tmp/dump# ~/criu/criu/criu dump -t $(ps aux | grep remap | grep -v grep | awk '{print $2}' | tee /tmp/dumped_pid) -vvv -o dump.log && echo OK
OK
root@hostname:/tmp/dump# cat /tmp/dumped_pid
85987
However, with the current criu-dev branch, when trying to restore the tree, the following issue occurs:
root@hostname:/tmp/dump# ~/criu/criu/criu restore -d -vvv && echo OK
[...]
(00.041764) Forking task with 85987 pid (flags 0x10000000)
(00.041773) Creating process using clone3()
(00.042017) PID: real 85987 virt 85987
(00.042193) Wait until namespaces are created
(00.042278) 85987: timens: monotonic -354 446598598
(00.042297) 85987: timens: boottime -354 446579771
(00.042373) Running setup-namespaces scripts
(00.042465) 85987: cg: Cgroup namespace inherited from parent
(00.042499) 85987: cg: Cgroups 1 inherited from parent
(00.042509) 85987: Calling restore_sid() for init
(00.042518) 85987: Restoring 85987 to 85987 sid
(00.042555) 85987: Error (criu/util.c:1621): Unable to open the proc file system: Operation not permitted
(00.042639) uns: calling exit_usernsd (-1, 1)
(00.042683) uns: daemon calls 0x5c7e76520e20 (88320, -1, 1)
(00.042697) uns: `- daemon exits w/ 0
(00.043505) Error (criu/cr-restore.c:1262): 85987 killed by signal 9: Killed
(00.043521) uns: daemon stopped
(00.043524) Error (criu/cr-restore.c:2324): Restoring FAILED.
So we hit an error when trying to mount /proc here:
Line 1614 in 08fa6c3:
if (mount_proc())
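My understanding of the kernel rule (an assumption worth checking against the user_namespaces(7) and mount_namespaces(7) man pages) is that mounting requires CAP_SYS_ADMIN in the user namespace that owns the caller's mount namespace, and a procfs mount additionally needs it in the user namespace owning the pid namespace. In this reproducer neither namespace is unshared, so both are owned by the initial user namespace, and root inside the new one should not be allowed to mount. The sketch below checks that precondition with the `NS_GET_USERNS` ioctl; `userns_owns` and `same_ns` are hypothetical helpers, not CRIU functions.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <linux/nsfs.h> /* NS_GET_USERNS, Linux >= 4.9 */

/* Two nsfs files refer to the same namespace iff device and inode match. */
int same_ns(const struct stat *a, const struct stat *b)
{
    return a->st_dev == b->st_dev && a->st_ino == b->st_ino;
}

/*
 * Hypothetical helper: returns 1 if the caller's user namespace owns the
 * namespace behind ns_path (e.g. "/proc/self/ns/mnt"), 0 if it does not,
 * -1 on any other error.
 */
int userns_owns(const char *ns_path)
{
    struct stat self_user, owner;
    int nsfd, ownerfd, ret = -1;

    if (stat("/proc/self/ns/user", &self_user) < 0)
        return -1;
    nsfd = open(ns_path, O_RDONLY);
    if (nsfd < 0)
        return -1;
    ownerfd = ioctl(nsfd, NS_GET_USERNS);
    if (ownerfd < 0) {
        /* EPERM: the owner is outside our scope, e.g. an ancestor user namespace. */
        ret = (errno == EPERM) ? 0 : -1;
        close(nsfd);
        return ret;
    }
    if (fstat(ownerfd, &owner) == 0)
        ret = same_ns(&self_user, &owner);
    close(ownerfd);
    close(nsfd);
    return ret;
}
```

Run from the restored task after it has joined the new user namespace, I would expect `userns_owns("/proc/self/ns/mnt")` to return 0 here, which would be consistent with the EPERM in the log.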
I looked at the criu processes right before it tries to mount procfs:
root@hostname:/tmp/dump# ps aux | grep criu
100000 85987 0.0 0.0 13356 2868 ? Ss 10:37 0:00 /root/criu/criu/criu restore -d -vvv
root 93006 0.0 0.0 13356 6552 pts/1 S+ 10:37 0:00 /root/criu/criu/criu restore -d -vvv
root 93008 0.0 0.0 13356 2108 pts/1 S+ 10:37 0:00 /root/criu/criu/criu restore -d -vvv
So we fail to mount procfs while restoring pid 85987. I guess this error should be expected: at that point the process is already in the new user namespace (where root is actually user 100000 on the host, not 0), so it is too late, isn't it?
Is this a bug, or am I misunderstanding something?
Optional question: why do we try to mount /proc at all, given that in this case the parent and the child share the same pid namespace?
I guess this is to be as generic as possible and handle every case without having to distinguish between them, am I wrong? (That is at least what the comment in the code suggests.)
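For illustration only, distinguishing the shared-pid-namespace case is cheap from the host side: two processes are in the same namespace of a given type when their /proc/<pid>/ns/<type> files have the same device and inode. This is a hypothetical helper (`shares_ns` is my name), not a description of what CRIU does.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns 1 if processes a and b are in the same
 * namespace of the given type ("pid", "user", "mnt", ...), 0 if not,
 * -1 if either /proc entry cannot be stat'ed.
 */
int shares_ns(pid_t a, pid_t b, const char *type)
{
    char path[64];
    struct stat sa, sb;

    snprintf(path, sizeof path, "/proc/%d/ns/%s", (int)a, type);
    if (stat(path, &sa) < 0)
        return -1;
    snprintf(path, sizeof path, "/proc/%d/ns/%s", (int)b, type);
    if (stat(path, &sb) < 0)
        return -1;
    /* A namespace is identified by the device and inode of its nsfs file. */
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

Given the setup described above, I would expect `shares_ns(85987, 85989, "pid")` to return 1 and `shares_ns(85987, 85989, "user")` to return 0 while the original tree is running.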