Skip to content

Commit 9a1565a

Browse files
berrangeotubo
authored andcommitted
seccomp: don't kill process for resource control syscalls
The Mesa library tries to set process affinity on some of its threads in order to optimize its performance. Currently this results in QEMU being immediately terminated when seccomp is enabled. Mesa doesn't consider failure of the process affinity settings to be fatal to its operation, but our seccomp policy gives it no choice in gracefully handling this denial. It is reasonable to consider that malicious code using the resource control syscalls to be a less serious attack than if they were trying to spawn processes or change UIDs and other such things. Generally speaking changing the resource control setting will "merely" affect quality of service of processes on the host. With this in mind, rather than kill the process, we can relax the policy for these syscalls to return the EPERM errno value. This allows callers to detect that QEMU does not want them to change resource allocations, and apply some reasonable fallback logic. The main downside to this is for code which uses these syscalls but does not check the return value, blindly assuming they will always succeeed. Returning an errno could result in sub-optimal behaviour. Arguably though such code is already broken & needs fixing regardless. Signed-off-by: Daniel P. Berrangé <[email protected]> Reviewed-by: Marc-André Lureau <[email protected]> Signed-off-by: Eduardo Otubo <[email protected]>
1 parent 49fc899 commit 9a1565a

File tree

1 file changed

+25
-7
lines changed

1 file changed

+25
-7
lines changed

qemu-seccomp.c

+25-7
Original file line numberDiff line numberDiff line change
@@ -121,20 +121,37 @@ qemu_seccomp(unsigned int operation, unsigned int flags, void *args)
121121
#endif
122122
}
123123

124-
static uint32_t qemu_seccomp_get_kill_action(void)
124+
static uint32_t qemu_seccomp_get_action(int set)
125125
{
126+
switch (set) {
127+
case QEMU_SECCOMP_SET_DEFAULT:
128+
case QEMU_SECCOMP_SET_OBSOLETE:
129+
case QEMU_SECCOMP_SET_PRIVILEGED:
130+
case QEMU_SECCOMP_SET_SPAWN: {
126131
#if defined(SECCOMP_GET_ACTION_AVAIL) && defined(SCMP_ACT_KILL_PROCESS) && \
127132
defined(SECCOMP_RET_KILL_PROCESS)
128-
{
129-
uint32_t action = SECCOMP_RET_KILL_PROCESS;
133+
static int kill_process = -1;
134+
if (kill_process == -1) {
135+
uint32_t action = SECCOMP_RET_KILL_PROCESS;
130136

131-
if (qemu_seccomp(SECCOMP_GET_ACTION_AVAIL, 0, &action) == 0) {
137+
if (qemu_seccomp(SECCOMP_GET_ACTION_AVAIL, 0, &action) == 0) {
138+
kill_process = 1;
139+
}
140+
kill_process = 0;
141+
}
142+
if (kill_process == 1) {
132143
return SCMP_ACT_KILL_PROCESS;
133144
}
134-
}
135145
#endif
146+
return SCMP_ACT_TRAP;
147+
}
148+
149+
case QEMU_SECCOMP_SET_RESOURCECTL:
150+
return SCMP_ACT_ERRNO(EPERM);
136151

137-
return SCMP_ACT_TRAP;
152+
default:
153+
g_assert_not_reached();
154+
}
138155
}
139156

140157

@@ -143,7 +160,6 @@ static int seccomp_start(uint32_t seccomp_opts)
143160
int rc = 0;
144161
unsigned int i = 0;
145162
scmp_filter_ctx ctx;
146-
uint32_t action = qemu_seccomp_get_kill_action();
147163

148164
ctx = seccomp_init(SCMP_ACT_ALLOW);
149165
if (ctx == NULL) {
@@ -157,10 +173,12 @@ static int seccomp_start(uint32_t seccomp_opts)
157173
}
158174

159175
for (i = 0; i < ARRAY_SIZE(blacklist); i++) {
176+
uint32_t action;
160177
if (!(seccomp_opts & blacklist[i].set)) {
161178
continue;
162179
}
163180

181+
action = qemu_seccomp_get_action(blacklist[i].set);
164182
rc = seccomp_rule_add_array(ctx, action, blacklist[i].num,
165183
blacklist[i].narg, blacklist[i].arg_cmp);
166184
if (rc < 0) {

0 commit comments

Comments
 (0)