
Unable to restore a process tree with a child in a separate user namespace #2778

@oOraph


Hello, I'm trying to checkpoint and restore a process tree in which one of the subprocesses runs in a different user namespace, and the restore fails. My final goal is to make CRIU checkpoint/restore work with runc and user namespaces (rootless containers).

I first hit something similar to the issues reported here:

ctrox/zeropod#96
#2651 (marked as stale, but it definitely should not be)

So I tried to take runc out of the equation and narrow the issue down to the way CRIU interacts with user namespaces.

Here is a small script to reproduce the issue:
https://github.com/oOraph/userns_remap/

This code creates a child process in a new user namespace in which root is mapped to uid/gid 100000 (when argv[1] is not provided). No other namespace is changed; in particular, the parent and the child share the same pid namespace.

# setsid /root/remap < /dev/null > /tmp/remap.log 2>&1 &
[1] 85986
# 
[1]+  Done                    setsid /root/remap < /dev/null > /tmp/remap.log 2>&1
# ps faux | grep -A 1 remap
[...]
root       85987  0.0  0.0   2684  1580 ?        Ss   09:57   0:00 /root/remap
100000     85989  0.0  0.0   6112  1920 ?        S    09:57   0:00  \_ sleep 3600
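For readers without the linked repository at hand, a tree like the one above can be approximated with the C sketch below. It is not the reproducer's actual code: the helper names (format_id_map, write_map, spawn_remapped_sleep), the use of fork() + unshare(CLONE_NEWUSER), and the 65536-id mapping range are all assumptions. The parent (host root) writes "0 100000 65536" to the child's uid_map and gid_map, then the child switches to uid/gid 0 inside the namespace, which the host sees as 100000. Only the user namespace is new, so the pid namespace stays shared, as in the report.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <grp.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Format one uid_map/gid_map line: "<first id inside> <first id outside> <count>\n".
 * Returns the line length, or -1 if it does not fit in buf. */
int format_id_map(char *buf, size_t size, unsigned inside, unsigned outside,
                  unsigned count)
{
    int n = snprintf(buf, size, "%u %u %u\n", inside, outside, count);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Write "0 <outside> 65536" to /proc/<pid>/<file> (uid_map or gid_map).
 * Assumption: the 65536-id range is illustrative, not taken from the reproducer. */
static int write_map(pid_t pid, const char *file, unsigned outside)
{
    char path[64], line[64];
    snprintf(path, sizeof(path), "/proc/%d/%s", (int)pid, file);
    int len = format_id_map(line, sizeof(line), 0, outside, 65536);
    if (len < 0)
        return -1;
    int fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    ssize_t written = write(fd, line, (size_t)len);
    close(fd);
    return written == len ? 0 : -1;
}

/* Hypothetical helper: fork a child that unshares only its user namespace
 * (pid, mount, ... namespaces stay shared) and execs "sleep 3600" as root
 * inside that namespace, i.e. as host uid/gid `outside`.
 * Must be run as host root. Returns the child's pid, or -1 on error. */
pid_t spawn_remapped_sleep(unsigned outside)
{
    int ready[2], mapped[2];
    char c = 0;

    if (pipe(ready) < 0)
        return -1;
    if (pipe(mapped) < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }
    pid_t pid = fork();
    if (pid == 0) {
        close(ready[0]);
        close(mapped[1]);
        if (unshare(CLONE_NEWUSER) < 0)
            _exit(1);
        /* Tell the parent the namespace exists, then wait for the maps. */
        if (write(ready[1], &c, 1) != 1 || read(mapped[0], &c, 1) != 1)
            _exit(1);
        /* Become root inside the namespace, i.e. host uid/gid `outside`. */
        if (setgroups(0, NULL) < 0 || setresgid(0, 0, 0) < 0 ||
            setresuid(0, 0, 0) < 0)
            _exit(1);
        execlp("sleep", "sleep", "3600", (char *)NULL);
        _exit(127);
    }
    close(ready[1]);
    close(mapped[0]);
    int ok = pid > 0 && read(ready[0], &c, 1) == 1 &&
             write_map(pid, "uid_map", outside) == 0 &&
             write_map(pid, "gid_map", outside) == 0 &&
             write(mapped[1], &c, 1) == 1;
    close(ready[0]);
    close(mapped[1]);
    if (!ok) {
        if (pid > 0)
            waitpid(pid, NULL, 0); /* child exits once mapped[1] is closed */
        return -1;
    }
    return pid;
}
```

Run as root, a call such as spawn_remapped_sleep(100000) should leave a `sleep 3600` child that ps lists under uid 100000, similar to the output above. The two pipes make sure the maps are written after the child has created its namespace and before it switches ids.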

Checkpointing it with CRIU works fine:

root@hostname:/tmp/dump# ~/criu/criu/criu dump   -t $(ps aux | grep remap | grep -v grep | awk '{print $2}' | tee /tmp/dumped_pid) -vvv -o dump.log && echo OK
OK
root@hostname:/tmp/dump# cat /tmp/dumped_pid 
85987

However, with the current criu-dev branch, restoring the tree fails as follows:

root@hostname:/tmp/dump# ~/criu/criu/criu restore -d -vvv     && echo OK
[...]
(00.041764) Forking task with 85987 pid (flags 0x10000000)
(00.041773) Creating process using clone3()
(00.042017) PID: real 85987 virt 85987
(00.042193) Wait until namespaces are created
(00.042278)  85987: timens: monotonic -354 446598598
(00.042297)  85987: timens: boottime -354 446579771
(00.042373) Running setup-namespaces scripts
(00.042465)  85987: cg: Cgroup namespace inherited from parent
(00.042499)  85987: cg: Cgroups 1 inherited from parent
(00.042509)  85987: Calling restore_sid() for init
(00.042518)  85987: Restoring 85987 to 85987 sid
(00.042555)  85987: Error (criu/util.c:1621): Unable to open the proc file system: Operation not permitted
(00.042639) uns: calling exit_usernsd (-1, 1)
(00.042683) uns: daemon calls 0x5c7e76520e20 (88320, -1, 1)
(00.042697) uns: `- daemon exits w/ 0
(00.043505) Error (criu/cr-restore.c:1262): 85987 killed by signal 9: Killed
(00.043521) uns: daemon stopped
(00.043524) Error (criu/cr-restore.c:2324): Restoring FAILED.

So we hit an error when trying to mount /proc here:

if (mount_proc())
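Assuming the failure reduces to "a task that has unshared only its user namespace cannot mount procfs", that step can be reproduced in isolation, outside CRIU, with the sketch below (the helper name is hypothetical). A process that creates a new user namespace has no capabilities in the parent namespace, and mount() requires CAP_SYS_ADMIN in the user namespace that owns the caller's mount namespace, which here is still the initial one. The mount is therefore expected to fail with EPERM, consistent with "Operation not permitted" in the log.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that enters a new user namespace only
 * (mount and pid namespaces stay shared with the caller, as in the failing
 * restore), then try to mount procfs on `target`, an existing empty directory.
 * Returns 0 if the mount succeeded, otherwise the errno seen by the child
 * (from unshare() or mount()), or -1 if fork()/waitpid() failed. */
int try_proc_mount_in_new_userns(const char *target)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (unshare(CLONE_NEWUSER) < 0)
            _exit(errno & 0xff);
        if (mount("proc", target, "proc", 0, NULL) < 0)
            _exit(errno & 0xff); /* expected: EPERM */
        /* Not expected to happen; detach it so the shared mount table stays clean. */
        umount2(target, MNT_DETACH);
        _exit(0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling it on a fresh directory from mkdtemp() should return EPERM (1 on Linux) when unshare() succeeds; if unprivileged user namespaces are disabled, the errno from unshare() is returned instead.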

I looked at the criu processes right before we try to mount the procfs:

root@hostname:/tmp/dump# ps aux | grep criu
100000     85987  0.0  0.0  13356  2868 ?        Ss   10:37   0:00 /root/criu/criu/criu restore -d -vvv
root       93006  0.0  0.0  13356  6552 pts/1    S+   10:37   0:00 /root/criu/criu/criu restore -d -vvv
root       93008  0.0  0.0  13356  2108 pts/1    S+   10:37   0:00 /root/criu/criu/criu restore -d -vvv

So mounting procfs fails while restoring pid 85987. I guess this error is to be expected: at that point we are already in the new user namespace (where root is actually uid 100000 on the host, not 0), so isn't it too late?
Is this a bug, or am I misunderstanding something?
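To make the "root in this namespace is actually user 100000" point concrete: the kernel translates ids through the namespace's uid_map, whose lines read "<first id inside> <first id outside> <count>". The sketch below (hypothetical helper name) performs that translation on the text of such a file; with a map like the one assumed for the reproducer, "0 100000 65536", in-namespace uid 0 becomes host uid 100000, which is why the restored task shows up as 100000 in the ps output above.

```c
#include <stdio.h>

/* Translate an id as seen inside a user namespace into the id seen from the
 * parent namespace, given the text of a uid_map or gid_map file. Each line
 * has the form "<first id inside> <first id outside> <count>".
 * Returns 0 and stores the result in *outside, or -1 if no line covers the
 * id (the kernel then shows it as the overflow id, usually 65534). */
int map_id_to_outside(const char *map_text, unsigned long inside,
                      unsigned long *outside)
{
    const char *p = map_text;
    unsigned long first_in, first_out, count;
    int consumed;

    /* sscanf skips the padding spaces and newlines between entries. */
    while (sscanf(p, "%lu %lu %lu%n", &first_in, &first_out, &count,
                  &consumed) == 3) {
        if (inside >= first_in && inside - first_in < count) {
            *outside = first_out + (inside - first_in);
            return 0;
        }
        p += consumed;
    }
    return -1;
}
```

The text can be read from /proc/<pid>/uid_map of the restored task; the parser accepts the kernel's space-padded column layout as well as the compact form written by a setup program.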

Side question: why does CRIU try to mount /proc at all, given that in this case the parent and the child share the same pid namespace?
I guess this is to stay as generic as possible and handle every case without having to distinguish between them (that is at least what the comment in the code suggests). Am I wrong?
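Whether two tasks actually share a given namespace can be checked from userspace: per namespaces(7), two processes are in the same namespace when their /proc/<pid>/ns/<type> links refer to the same inode on the same device. The sketch below (hypothetical helper name) does that comparison; it only illustrates the check and makes no claim about how CRIU decides whether to mount /proc.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Return 1 if processes `a` and `b` are in the same namespace of type `ns`
 * ("pid", "user", "mnt", ...), 0 if not, -1 on error. stat() follows the
 * /proc/<pid>/ns/<ns> symlink to the namespace's nsfs inode. */
int same_namespace(pid_t a, pid_t b, const char *ns)
{
    char pa[64], pb[64];
    struct stat sa, sb;

    snprintf(pa, sizeof pa, "/proc/%d/ns/%s", (int)a, ns);
    snprintf(pb, sizeof pb, "/proc/%d/ns/%s", (int)b, ns);
    if (stat(pa, &sa) < 0 || stat(pb, &sb) < 0)
        return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

For the original tree (85987 and its child 85989), this check would be expected to report a shared "pid" namespace and different "user" namespaces, matching the setup described above.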
