
feat(gdb): Multithreading support #1298

Merged
jounathaen merged 6 commits into hermit-os:main from fogti:gdb-stub-inline on Mar 18, 2026

@fogti
Contributor

@fogti fogti commented Mar 10, 2026

This is an experiment.

Supersedes #1196.
Fixes #1088.

@codecov

codecov bot commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 85.39604% with 59 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.58%. Comparing base (7dea91c) to head (c124f38).
⚠️ Report is 6 commits behind head on main.

| File with missing lines | Patch % | Lines |
|---|---|---|
| src/linux/mod.rs | 64.91% | 20 missing |
| src/linux/gdb/resume.rs | 86.13% | 14 missing |
| src/lib.rs | 0.00% | 12 missing |
| src/linux/gdb/mod.rs | 92.20% | 12 missing |
| src/linux/gdb/breakpoints.rs | 98.41% | 1 missing |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #1298      +/-   ##
==========================================
+ Coverage   75.79%   76.58%   +0.78%     
==========================================
  Files          27       29       +2     
  Lines        4033     4266     +233     
==========================================
+ Hits         3057     3267     +210     
- Misses        976      999      +23     
```


@fogti fogti force-pushed the gdb-stub-inline branch 3 times, most recently from 4659754 to c3cfe88 on March 10, 2026 23:21
@fogti
Contributor Author

fogti commented Mar 10, 2026

As far as I know, the commits before the [WIP] multithreading one work as-is (and are mergeable; maybe I'll port them to a separate branch).

The rest is still a bit WIP while @n0toose and I iron out the remaining bugs.

@fogti fogti changed the title chore(gdb): inline run_blocking to make further refactoring easier feat(gdb): Multithreading support Mar 11, 2026
Comment thread src/linux/gdb/mod.rs Outdated
@fogti
Contributor Author

fogti commented Mar 11, 2026

Besides the mentioned issue (which is a fixable race condition), this should now mostly work.

@fogti
Contributor Author

fogti commented Mar 11, 2026

And the failure that I think stems from the issue mentioned above is:

```
/tmp/.tmphJS6yJ/commands:5: Error in sourced command file:
Remote connection closed
test gdb ... FAILED

failures:

---- gdb stdout ----
Building test kernel: gdb
Waiting for a local GDB connection on port 1234...
Debugger connected from 127.0.0.1:41290

thread '<unnamed>' panicked at src/linux/mod.rs:123:26:
GDB incoming_data error: GdbStubError { kind: PacketParse(MalformedCommand) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread 'gdb' panicked at tests/gdb.rs:96:5:
assertion failed: status.success()


failures:
    gdb

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.96s

error: test failed, to rerun pass `--test gdb`
```

@fogti fogti mentioned this pull request Mar 11, 2026
@fogti
Contributor Author

fogti commented Mar 11, 2026

Successful GDB sessions now look like this:

```
(gdb) break multi_thread::main
Breakpoint 1 at 0x26b787: file src/bin/multi-thread.rs, line 14.
(gdb) target remote :6677
Remote debugging using :6677
0x000000001f4e7240 in _start ()
(gdb) c
Continuing.

Breakpoint 1, multi_thread::main () at src/bin/multi-thread.rs:14
14		println!("Multi threading test");
(gdb) step 100
core::alloc::layout::Layout::pad_to_align (self=0x400002519e98)
    at /home/fogti/.rustup/toolchains/nightly-2025-06-01-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/alloc/layout.rs:339
339	       let new_size = self.size_rounded_up_to_custom_align(self.align);
(gdb) c
Continuing.
[Inferior 1 (process 1) exited normally]
(gdb) exit
```

@n0toose n0toose requested review from jounathaen and mkroening March 11, 2026 08:25
Comment thread src/linux/mod.rs
Comment thread src/linux/mod.rs Outdated
@fogti fogti marked this pull request as ready for review March 11, 2026 09:40
@fogti
Contributor Author

fogti commented Mar 11, 2026

There are still cases where this runs into

```
/tmp/.tmpLEcOLe/commands:5: Error in sourced command file:
Remote connection closed
test gdb ... FAILED

failures:

---- gdb stdout ----
Building test kernel: gdb
Waiting for a local GDB connection on port 1234...
Debugger connected from 127.0.0.1:41622

thread '<unnamed>' (21473) panicked at src/linux/mod.rs:121:26:
GDB incoming_data error: GdbStubError { kind: PacketParse(MalformedCommand) }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread 'gdb' (21446) panicked at tests/gdb.rs:96:5:
assertion failed: status.success()


failures:
    gdb

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.30s

error: test failed, to rerun pass `-p uhyve --test gdb`
```

but idk what causes those.

@fogti fogti force-pushed the gdb-stub-inline branch from de81fe2 to 43977df on March 11, 2026 09:43
@fogti
Contributor Author

fogti commented Mar 11, 2026

uhyve-gdb-malformed-01.pdf is an example TCP stream of a session that ended with the above malformed-packet error.

@fogti
Contributor Author

fogti commented Mar 11, 2026

uhyve-gdb-not-malformed-01.pdf is an example of a successful run.

@fogti
Contributor Author

fogti commented Mar 11, 2026

ohno

```
bad:  $vCont;s:p1.-358a3940#bf
good: $vCont;s:p1.527f66c0#c4
```

We might have to truncate thread ids to 31 bits instead of 32 bits...
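Both packets are well-formed GDB Remote Serial Protocol packets (`$payload#cc`, where `cc` is the sum of the payload bytes modulo 256 as two hex digits), so the framing is fine and only the thread-id field differs. The "bad" id is what a 32-bit value with bit 31 set looks like when it is rendered as a signed number: `0xca75c6c0` read as `i32` is `-0x358a3940`. The standalone sketch below reproduces both packets and their checksums. `rsp_packet` and `signed_hex` are hypothetical helpers written for illustration, not gdbstub or uhyve functions, and the assumption that GDB treats the id as a signed 32-bit value is inferred from the packets rather than taken from GDB's source.

```rust
/// Frame a payload as a GDB RSP packet: `$<payload>#<checksum>`,
/// where the checksum is the sum of the payload bytes modulo 256.
fn rsp_packet(payload: &str) -> String {
    let sum = payload.bytes().fold(0u8, |acc, b| acc.wrapping_add(b));
    format!("${payload}#{sum:02x}")
}

/// Render a 32-bit thread id as a signed 32-bit hex number (sign + magnitude).
/// Hypothetical helper that mimics the id in the "bad" packet.
fn signed_hex(tid: u32) -> String {
    let v = tid as i32;
    if v < 0 {
        format!("-{:x}", v.unsigned_abs())
    } else {
        format!("{:x}", v)
    }
}

fn main() {
    // Bit 31 clear: rendered as a plain positive hex number.
    let good = rsp_packet(&format!("vCont;s:p1.{}", signed_hex(0x527f_66c0)));
    // Bit 31 set: the same bits read as i32 are negative.
    let bad = rsp_packet(&format!("vCont;s:p1.{}", signed_hex(0xca75_c6c0)));

    assert_eq!(good, "$vCont;s:p1.527f66c0#c4");
    assert_eq!(bad, "$vCont;s:p1.-358a3940#bf");
    println!("good: {good}\nbad:  {bad}");
}
```

Since the checksums match the captured packets, the transport is not at fault; a parser that expects an unsigned hex thread id would plausibly reject the leading `-`, which is consistent with the `PacketParse(MalformedCommand)` error in the logs.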

@fogti
Contributor Author

fogti commented Mar 11, 2026

Looks good now.

@n0toose
Member

n0toose commented Mar 11, 2026

> We might have to truncate thread ids to 31 bits instead of 32 bits...

Yeah, I just noticed that when looking at this:

[image]

Comment thread src/linux/gdb/mod.rs Outdated
Comment thread src/linux/gdb/mod.rs Outdated
Comment thread src/linux/gdb/mod.rs Outdated
}

impl Target for GdbUhyve {
impl Freewheel {
Member


I'd prefer to move the Freewheel impls up to be directly after the struct declaration.

Comment thread src/linux/gdb/mod.rs Outdated
Comment thread src/linux/mod.rs Outdated
Comment thread src/lib.rs Outdated
@fogti
Contributor Author

fogti commented Mar 16, 2026

Unfortunately, a force-push was necessary to get rid of the merge conflict.

@fogti
Contributor Author

fogti commented Mar 16, 2026

cc @n0toose: I had to squash fogti#1; I hope that's okay.

Also, don't forget to send me your latest results regarding the GDB architecture-independence work.

@jounathaen jounathaen mentioned this pull request Mar 17, 2026
3 tasks
@jounathaen
Member

I've created a follow-up issue so we can merge this now: #1314

@fogti @n0toose should we keep the 35 commits, or is squashing the way to go?

@n0toose
Member

n0toose commented Mar 17, 2026

I think separating concerns would work better, and I'd suggest not expanding the original scope of the PR.

I currently have rather limited capacities; I can't predictably tell when I'd have the time to combine some of the changes with the tree that was worked on in parallel. Planning accordingly would be my advice.

@n0toose
Member

n0toose commented Mar 17, 2026

> squashing the way to go

I'd be for squashing, e.g. for the typo fixes.

@fogti
Contributor Author

fogti commented Mar 17, 2026

Should I squash this PR and put the squashed version on this branch, or should I leave this as-is and let @jounathaen squash-merge this?

@n0toose
Member

n0toose commented Mar 18, 2026

The former, I believe.

@jounathaen
Member

Please squash it.

@fogti
Contributor Author

fogti commented Mar 18, 2026

Okay, before I do that, I'll first try to merge this into the arch-indep branch, so that I don't have to untangle it later.

@fogti fogti force-pushed the gdb-stub-inline branch 2 times, most recently from 1dc4a0e to 2a8c716 on March 18, 2026 09:21
@fogti
Contributor Author

fogti commented Mar 18, 2026

Okay, squashed all the chore/nit commits into one, and added reference URLs for the gdbstub Tid oddity.

This is a combination of 30 commits.

nit(gdb): remove single gdb restriction
feat(gdb): set affinity in threads, fetch vcpu id from object
nit(gdb): Improve cpu_affinity handling
nit(gdb): have swap use AcqRel ordering

fix(gdb): GDB truncates thread-ids supplied by us, so truncate them before supplying them to GDB to ensure consistency

```
[TRACE uhyvelib::linux::gdb] tid2vcpu = {140098616854208: 0}
[TRACE uhyvelib::linux] Active thread: 140098616854208
[TRACE uhyvelib::linux::gdb] tid_to_vcpuw(1078625984)

thread 'main' (4706) panicked at src/linux/gdb/mod.rs:237:37:
no entry found for key
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
```

```
>>> 140098616854208 & 1078625984
1078625984
>>> 140098616854208 ^ 1078625984
140097538228224
>>> x = 140098616854208 ^ 1078625984
>>> f"{x:#x}"
'0x7f6b00000000'
```
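The trace and the Python session above show that GDB only echoes back the low 32 bits of the 64-bit pthread id: `140098616854208` is `0x7f6b_404a_86c0`, and masking it with `0xffff_ffff` leaves `1078625984` (`0x404a_86c0`), so indexing the map with the echoed value panics with `no entry found for key`. A minimal sketch of the fix described in the commit title, assuming a `tid2vcpu`-style map from thread id to vCPU index: truncate once when the id is handed to GDB, key the map by the truncated value, and look it up with `get` so a mismatch becomes `None` instead of a panic. The `TidMap` type and its methods are hypothetical, not uhyve's actual code.

```rust
use std::collections::HashMap;

/// Keep only the bits GDB reports back (32 here; the later fix uses 31).
fn truncate_tid(raw: u64) -> u64 {
    raw & 0xffff_ffff
}

/// Hypothetical thread-id -> vCPU index map keyed by the *truncated* id,
/// so ids coming back from GDB match the keys that were stored.
struct TidMap {
    tid2vcpu: HashMap<u64, usize>,
}

impl TidMap {
    fn new() -> Self {
        Self { tid2vcpu: HashMap::new() }
    }

    /// Register a vCPU thread; returns the id that should be reported to GDB.
    fn register(&mut self, raw_tid: u64, vcpu: usize) -> u64 {
        let tid = truncate_tid(raw_tid);
        self.tid2vcpu.insert(tid, vcpu);
        tid
    }

    /// Resolve an id supplied by GDB; `None` instead of an indexing panic.
    fn vcpu_for(&self, gdb_tid: u64) -> Option<usize> {
        self.tid2vcpu.get(&gdb_tid).copied()
    }
}

fn main() {
    let raw: u64 = 140_098_616_854_208; // pthread id from the trace
    // Only the upper 32 bits differ between the raw and the echoed id.
    assert_eq!(raw ^ truncate_tid(raw), 0x7f6b_0000_0000);

    let mut map = TidMap::new();
    let reported = map.register(raw, 0);
    assert_eq!(reported, 1_078_625_984);
    // GDB echoes back the truncated id, and the lookup now succeeds.
    assert_eq!(map.vcpu_for(1_078_625_984), Some(0));
    // The untruncated id is no longer a key.
    assert_eq!(map.vcpu_for(raw), None);
}
```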

fix(gdb): Fix most cases of GDB not being able to interrupt the target

These were actually multiple issues in a trenchcoat:
- set_scheduler_lock is called very late (often just before resume)
- for cleanliness, we now always check if we're stopped (per vcpu), even on the first iteration
- wait in the vcpu threads for the gdb connection to be fully initialized,
  to ensure that the breakpoints are set correctly (`full_init_event`)

It appears that this leads to a breakpoint getting hit in _start,
but apart from that it appears to work correctly in almost all runs.
Remaining failures are probably due to the unintended handling of `full_init_event`
(it should have been paired with an `AtomicBool` so that spurious wakeups don't lead to protocol errors).
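The parenthetical above describes a common pitfall: a bare condition-variable wait can return spuriously, so a vCPU thread waiting for "GDB is fully initialized" needs a flag that it re-checks after every wakeup. A sketch of that pattern using only `std::sync::{Mutex, Condvar}`; `InitGate` and `wait_initialized` are hypothetical names, `set_finished_initializing` merely echoes the renamed method in the commit list and is not its actual implementation, and uhyve's `full_init_event` may well use a different primitive. Here the flag lives inside the `Mutex` rather than in a separate `AtomicBool`, which also rules out a lost wakeup between checking the flag and starting to wait.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical one-shot "initialization finished" gate.
/// The bool is the source of truth; the Condvar only wakes waiters up.
struct InitGate {
    done: Mutex<bool>,
    cv: Condvar,
}

impl InitGate {
    fn new() -> Self {
        Self { done: Mutex::new(false), cv: Condvar::new() }
    }

    /// Called once by the GDB side after breakpoints etc. are set up.
    fn set_finished_initializing(&self) {
        *self.done.lock().unwrap() = true;
        self.cv.notify_all();
    }

    /// Called by each vCPU thread before it starts running guest code.
    /// `wait_while` re-checks the flag after every wakeup, so a spurious
    /// wakeup just goes back to sleep instead of proceeding early.
    fn wait_initialized(&self) {
        let guard = self.done.lock().unwrap();
        let _guard = self.cv.wait_while(guard, |done| !*done).unwrap();
    }
}

fn main() {
    let gate = Arc::new(InitGate::new());

    let vcpus: Vec<_> = (0..4)
        .map(|id| {
            let gate = Arc::clone(&gate);
            thread::spawn(move || {
                gate.wait_initialized();
                id // a real vCPU thread would enter its run loop here
            })
        })
        .collect();

    gate.set_finished_initializing();

    let ids: Vec<usize> = vcpus.into_iter().map(|h| h.join().unwrap()).collect();
    assert_eq!(ids, vec![0, 1, 2, 3]);
}
```

Because `wait_while` checks the flag before sleeping, a vCPU thread that arrives after `set_finished_initializing` has already run returns immediately rather than waiting for a notification that will never come.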

fix(gdb): Truncate TIDs to be provided to GDB to 31 bits

See also:
- daniel5151/gdbstub#193
- https://sourceware.org/bugzilla/show_bug.cgi?id=33979
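Masking to 31 bits guarantees that the reported id always fits in a positive `i32`, so it can never be rendered with a leading minus sign. A small sketch, using the hypothetical helper name `gdb_tid` for the mapping from a raw OS thread id to the id reported to GDB. It makes one edge case explicit: a masked value can be `0`, which the remote protocol reserves for "any thread" (and `-1` for "all threads"), so the helper returns `Option<NonZeroUsize>`. Distinct threads can also collide after truncation; detecting that is left to the caller and is not shown.

```rust
use std::num::NonZeroUsize;

/// Mask keeping the low 31 bits; the result is at most `i32::MAX`.
const TID_MASK_31: u64 = 0x7fff_ffff;

/// Hypothetical helper: map a raw OS thread id to the id reported to GDB.
/// Returns `None` if the low 31 bits are all zero, since 0 means
/// "any thread" in the remote protocol and cannot name a specific thread.
fn gdb_tid(raw: u64) -> Option<NonZeroUsize> {
    NonZeroUsize::new((raw & TID_MASK_31) as usize)
}

fn main() {
    // From the "bad" packet: bit 31 set, so it was rendered as -0x358a3940.
    let bad_low32: u64 = 0xca75_c6c0;
    let tid = gdb_tid(bad_low32).expect("non-zero");
    assert_eq!(tid.get(), 0x4a75_c6c0);
    assert!(tid.get() <= i32::MAX as usize); // fits in a positive i32

    // The pthread id from the earlier trace already had bit 31 clear.
    assert_eq!(gdb_tid(140_098_616_854_208).map(NonZeroUsize::get), Some(1_078_625_984));

    // All low 31 bits zero: no usable id.
    assert_eq!(gdb_tid(0x8000_0000), None);
}
```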

nit(gdb): Rename 'struct Freewheel' to 'struct GdbVcpuManager'
nit(gdb): Rename 'fn backend_report_invalid_value' to 'fn backend_invalid_value'
nit(gdb): Rename Tid resolvers
nit(gdb): Fix VcpuWrapper::tid comment
nit(gdb): Rename 'ResumeMode::Freewheel' to 'ResumeMode::FreeWheeling'
nit(gdb/resume): rename '.free_wheel' to '.r#continue'
nit(gdb): Rename '.finished_initializing' to '.set_finished_initializing'
chore(gdb): Mark GdbVcpuManager getters as non-pub

Co-authored-by: Panagiotis "Ivory" Vasilopoulos <git@n0toose.net>
@fogti fogti force-pushed the gdb-stub-inline branch from 2a8c716 to c124f38 on March 18, 2026 09:24
@fogti
Contributor Author

fogti commented Mar 18, 2026

(Sorry, I had to fix the attribution in the second multithreading commit.) This is now ready to merge, afaik.

@fogti fogti added enhancement New feature or request rust Pull requests that update Rust code feature/debugger Features involving the debugging facilities of uhyve, i.e. the GDBstub implementation. labels Mar 18, 2026
@jounathaen jounathaen added this pull request to the merge queue Mar 18, 2026
@jounathaen
Member

Thank you both!

Merged via the queue into hermit-os:main with commit eab0ac0 Mar 18, 2026
11 checks passed
@fogti fogti deleted the gdb-stub-inline branch March 18, 2026 12:11

Labels

enhancement New feature or request feature/debugger Features involving the debugging facilities of uhyve, i.e. the GDBstub implementation. rust Pull requests that update Rust code

Development

Successfully merging this pull request may close these issues.

KickSignal: decouple pthread_kill, move to vCPU

3 participants