-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Debugger: New Feature: source-level debugging #13444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
To somewhat belatedly provide feedback on this, I have a really bad feeling about this new file format. It's a custom binary format which the MAME debugger itself can't create, so if support from other tools never flourishes, the whole thing stands to rot. The coupling to MAME's CPU state interface will also easily get worse considering the number of CPU types MAME supports. Speaking from the standpoint of someone interested in reverse engineering, I would much prefer a text file format for enhanced debugging information. The increased parsing overhead for something like JSON should matter less in this context than the ability to easily create and edit files without specialized tools. |
First off, source-level debugging is a top ask from Apple II users who run MAME or Ample, so I love that someone's done something with it. By way of some initial feedback, I'm not a fan of having separate source and assembly step and run commands. The majority of debuggers automatically show source if it's available for the current program counter and assembly if not. |
Regarding the file format, I do think it's important that MAME be able to just directly read whatever the assembler or compiler outputs for a given system, in much the same way that we prefer certain disk image formats but accept a wide variety of them. |
The trouble with this analogy is that MAME does not literally "read whatever the assembler or compiler outputs for a given system" in most cases. The primary purpose of object files is for their sections to be linked together and loaded into memory. For virtually any object file format sophisticated enough to even allow for extra debugging information, that is going to be performed not by MAME itself but by the emulated operating system or monitor program. I don't think storing debugging information in binary object files is the best solution here, but if that's to be supported, I think it would be better to use an existing format like ELF that provides robust debugging support for a wide range of architectures. |
Yes, I hear you on the concern that tools-writers must choose to support the file format for the feature not to rot away. However we slice it, there are platforms with popular tools that do not generate sufficient information for source-level debugging (in any form), and they would need to change one way or another for this feature to work. My plan is, should this feature be accepted into MAME, focus first on the CoCo / Dragon tools I've already been prototyping for: lwtools assembler, ugBasic compiler, and CMOC (C-like) compiler. I would work on PRs for them which turn my prototype code into real code they'd be willing to accept. (the ugBasic developer already expressed interest in supporting this feature back in October). Once CoCo / Dragon have decent support sufficiently visible, I would hope that interest would start growing to other platforms, and I'd be happy to help out other interested platforms with integrating the support. Point being that I'm expecting to continue to actively build up this feature from the tools end, as that is crucial.
Are you referring to the register #defines? They can be de-coupled via a lookup table if we agree that's the right way to go. Sorry if I'm misinterpreting.
I have considered using text instead of binary for the format, and I generally tend to lean toward binary for a few reasons. Binary is unambiguous to specify and easier to parse. Text tends to be verbose, and with json in particular, there's the repeating field names, which can add up with a tool like CMOC when debugging information is generated for the entire standard library. I also worry that a text format will give a false impression to tools vendors that it would be easier to hand-roll it themselves instead of using a library we provide which is guaranteed to do it right. There could be subtle things like which fields or sections are optional, or how to manage cross-references between sections, which could cause frustrating surprises down the road for the tool writer. A binary format is a signal to go down the (right) path of using a generator library instead. I would be curious to understand more your scenario of doing reverse engineering, and how that would benefit from a text format. My first thought would be to provide a simple tool that translates between a text and the binary format to give a human the ability to manually manipulate the file. (Though I wouldn't recommend build tools taking advantage of this.) |
@rb6502 :
Having separate source vs. disassembly stepping commands allow for scenarios where you're doing source-debugging, but temporarily want to step through individual assembly instructions (e.g., to understand what the compiler just did, or to diagnose a potential issue with the compiler). I've seen modern debuggers handle this by choosing the stepping type based on which window is active (thus, the MAME keyboard & menu commands adapt in the same way, based on what the main window is showing). For purposes of use of console or scripting, we'd need an unambiguous way to specify what type of stepping is requested. |
Yeah, as @ajrhacker mentioned, some tools simply don't provide sufficient information in any form to enable source-level debugging. The idea of adopting an existing format, such as ELF object files (and specifically the DWARF debugging format embedded inside them) is something which I've also considered and am not recommending. A quick summary why:
Overall, one of my goals is to make it as easy as possible for tools to participate. DWARF introduces significnant obstacles, without much upside. |
Curious if anyone has started reviewing the code, or if there's anything I can do to make a review more manageable? |
It's been 4 weeks, but it doesn't look like a review has started. I'm willing to put in more work to move this forward, both on this PR and on the tools side. Is it possible to start a conversation with a reviewer? Or is there something about the idea of source-level debugging that is fundamentally incompatible with MAME's vision? |
There appears to be conflicts with your branch and the main branch that need to be fixed. Once those are sorted out, you can have the dev team review it. |
The conflicts you saw had just arisen after a commit March 31, almost 4 weeks after I opened this pr (the pr was opened clean). I just fixed those conflicts, but new conflicts will likely crop up as long as the pr is open. As I mentioned I'm willing to put in a lot of additional work on this, but it would help if I knew whether the idea of source-code-level debugging is something the owners would even consider. |
As I said above, I'm personally enthusiastic about having source-level debugging in MAME. Standalone computer and even console emulators are increasingly using those kinds of capabilities as differentiators and we're fairly well positioned to have it across a lot of platforms. Speaking for myself, I would have preferred MAMEdev be in the conversation much earlier instead of having a big pile of code dropped on us at once. I dislike doing full-stack code reviews at work too, it's nothing personal :-) I do think this might be a good candidate for something I've been thinking about recently: a staging repo similar to WINE where we can put things that we're not comfortable dropping directly into master but would like to see get there and hopefully get more eyes and testing on first. @galibert and @cuavas thoughts? |
Thanks for the message, and yeah, I realize I'm kind of hoisting a whole bunch of work (unannounced!) on the owners. 😵 I figured a complete, working PR might be a more compelling prospect than trying to start a conversation without much to show for it. But of course this has its own disadvantages! If there's anything I can do to lighten the load, like an audio call (Discord?), or a set of testable scenarios you can run, etc., please let me know. I'm also open to a staged release, etc. |
I agree that it would be better to support elf/dwarf.
Even if you are using tools that don't support elf output, writing a
converter to output elf would be a better use of your time than a custom
binary format only used by MAME.
…On 10/03/2025 17:35, ajrhacker wrote:
I don't think storing debugging information in binary object files is
the best solution here, but if that's to be supported, I think it
would be better to use an existing format like ELF that provides
robust debugging support for a wide range of architectures.
|
In case you missed them, I listed some of my concerns about using DWARF above. I'm curious what your thoughts are about those? I'm open to doing whatever we think makes the most sense for MAME as a whole. When we chatted about source-level debugging on Discord last year, a couple scenarios came up, including Z80 C compilers (SDCC’s z80 target) and reverse-engineering (Ghidra). SDCC’s DWARF support looks too limited to be applicable here, but Ghidra seems to have some kind of support through a plug-in. Is that what you have in mind, or are there other scenarios that's motivating an interest in DWARF? I'd like to read further on them. |
Hey, just a friendly reminder that I'm still here and happy to help when you're ready to start reviewing the code. Also happy to continue the above discussion when you're ready. |
Hi @dave-br, I'm not part of the dev team so I'm not going to attempt to review the entire PR. I am curious whether you had considered using the gdbstub for source-level debugging? It is already possible to have source-level debugging with IDA Pro (even decompiling on certain platforms), and I suspect it would also work with Ghidra. I have seen people using the gdbstub through Visual Studio Code for source-level debugging of z80. You can also use it with any gdb client that is targeted for the platform you're debugging. By using alternative debuggers, the heavy lifting of parsing file formats and dealing with symbols is already done for you. I'm aware that the gdbstub could be improved (for example, you need to relaunch mame every time you want to attach), and I'd be happy to improve that if people can tell me their use-cases and how it could be improved. Regarding the new debugging information file format, this makes it harder for existing debuggers to support MAME files. Whereas if you target an existing format like DWARF, you will get functionality with other debuggers for free. Yes, it is a complex format, but in the long run it will benefit everyone. Also I think it might help to split the PR into multiple commits. This makes it easier to review and merge incrementally. Nobody has the patience to review 6000 lines of code in one go! :P |
Hi, @ramiropolla, thank you for the input! A while back I tried gdbstub with gdb-multiarch against 6809. I am able to set breakpoints, but cannot step or view registers. It is possible I need some kind of unofficial fork of gdb to work with 6809 properly? To be honest, even if I could get it to work, I would still be pushing to get source-level debugging working directly in MAME's built-in debuggers. It is so much easier using something built in and documented vs. going on a hunt for the right version of gdb to install, a GUI front end to use, and then figuring out how to link it together properly. The built-in debuggers are a nice differentiator for MAME, and I'd like to see them continue to improve. I looked into Ghidra 6809 support last month. Ghidra includes per-processor data files used by its Java classes to implement DWARF support. But those data files do not exist for 6809, 6502, or any 8-bit CPU I could recognize. There is also a ghidra2dwarf plugin, but it depends on Ghidra's built-in DWARF support, so it doesn't add anything there. This may be because DWARF itself is not a complete specification--it relies on the ABI authoring committee for each architecture to define the mapping from registers to numbers. And I have not been able to find any such mapping for the 6809. There is a commonly-used port of gcc for 6809, but even that does not generate DWARF (only STABS). That said, I'm intrigued that you saw folks doing source-level debugging of z80 via gdbstub. I would be interested to research the toolset involved if you can point me in the right direction. There may well be scenarios where MAME supporting DWARF is beneficial. But I'd like to see a solution for all processors. Perhaps that solution is DWARF for some CPUs, and a custom format for others. Thank you for your idea of splitting into multiple commits. This is similar to @rb6502's idea of a staging repo where things can be approved & merged piecemeal. I'd be happy to engage with the dev team to come up with a plan for reviewing that they would find feasible. |
I have not reviewed this exhaustively (i.e. - I have not tried building and running these changes), nor am I in a good position to review platforms that I am not familiar with (e.g. - macOS/Cocoa stuff), nor do I have any specific expertise in debugging formats, so bear that in mind when reviewing this feedback. I think there are small opportunities to commit uncontraversial/mundane aspects of this PR in independent of the overall PR. These are small things (e.g. - changing There is a great deal of chatter about the merits of DWARF, and text-vs-binary file formats. Its clear that there are people on this thread that have thought about this way more than I have, and I'm well aware of the pitfalls of "big" file formats with myriad features (don't get me started about PDF), and the fact that these formats have not been really brought into the retrocomputing world. But I do have great concerns about introducing a new MAME-centric debugging symbol format (especially binary formats) and trying to implicitly sponsor an ecosystem of tools around that format. MAME is already huge, and all else being equal I'd rather try to avoid seeing it accrete more (I know this from experience; This feels like a "pick your poison" scenario. I'm absolutely sure that DWARF is massive, and that in practice any DWARF support we might create would in practice be a subset of that format. But simply using a format that already exists seems to be a good idea, and we can "infill" that support as needed. If we have to declare things like register mappings for CoCo 6[8|3]09 or in absense of that some MAME defaults, then lets do that. |
It would be interesting to see what would need to happen to get to a point where:
|
Thank you for your interest! As for the less controversial item, I agree that even if we just got the enums for m_views indices committed and nothing else, that would be a net positive for MAME. I'll take a look at extracting that part. I'll also take another look through the PR for similar items, though I wouldn't get my hopes up too much on finding many. I can't think of anything else off-hand that is similarly self-contained and noncontroversial. As for DWARF, there is some discussion on that right now in the MAME Discord, so I'll comment here more on that later. |
Regarding debugging file formats, here are my preferences (in order):
|
@dave-br, I suggest you could try using the |
(As described at PR mamedev#13444)
@dave-br, I have rebased your code against latest git These commands should clean up this PR: (note: I presume that you already have a remote called
|
I would be grateful if someone could review the commands I posted above to make sure they're correct before @dave-br uses them. Just to be sure I did not make any silly mistake. I hope I'm helping here with this. |
@felipesanches Thank you for the help, and I agree a rebase would be cleaner. I started off doing rebases, and then got lazy at some point and just did regular merges. It looks like in your version of pr-sd, in scripts/src/emu.lua, these entries If you'd like another go at it, let me know, otherwise I can fix it up. |
(As described at PR mamedev#13444)
Done. The commit on my fork was amended now. |
@felipesanches I completed your steps, and things look good so far. Thank you for your help! |
(As described at PR mamedev#13444)
(As described at PR mamedev#13444)
Quick update...
I do see the new merge conflict. I will continue occasionally checking for these and fixing them. |
Crazy idea - support DWARF (even if it is not exhaustive support), but provide an option for the user to provide a register assignment? |
(As described at PR mamedev#13444)
I think I see where you're coming from on that, it provides a way we can tell users to unstuck themselves if conflicting register maps arise. But it would be messy trying to come up with a way an end user can specify a custom map for any supported CPU. And really should the burden be on the end user? My preferred solution would be to see if the DWARF committee could somehow host our register mapping to make it findable and “official”. But solving the undefined register mapping problem sidesteps a more general problem I alluded to earlier on this PR, and that was recently confirmed for me more explicitly: Different compilers generate incompatible DWARF information (even when register mappings are officially defined for that architecture). One of the responders from the discussion list wrote this: “For our debugger, you'd be surprised how much we depend on knowing the exact compiler producer and compiler version in order to interpret the DWARF correctly.” Debuggers can’t “support DWARF”; debuggers support specific DWARF producers. This is why it's important to choose exact scenarios to enable. For example, as discussed on the Discord, eventually it will make sense to talk about supporting modern GCC builds targeting PS1. For now, though, I have not encountered any DWARF-producing assemblers / compilers for the 6809. No ports of gcc or gdb, lwtools, ugBasic, or Ghidra. My guess is that bringing the complexity of DWARF to the 6809 would be counterproductive not only to getting tools owners onboard, but also to getting this already large PR off the ground. It would introduce significant extra code to maintain in the MAME code base (and likely bugs). And I don’t see it buying us anything. For anyone unfamiliar with the complexity of DWARF, I'd recommend spending about 10 minutes scanning through the DWARF 5 specification or the API's for the DWARF consuming and producing libraries just to get a feel for the scope. This is something to take on only if there are sufficiently compelling scenarios that offset the cost. The good news is that this PR already has clear interface boundaries between the file format itself, the low-level code that reads / writes the file format, and the debugger code that queries line-mapping and symbol info. So, should we deem it worthwhile, we have the flexibility to support specific DWARF producers in the future without affecting the higher-level debugger code. |
(As described at PR mamedev#13444)
(As described at PR mamedev#13444)
(As described at PR mamedev#13444)
SUMMARY:
This feature is targeted at MAME debugger users who have access to original source code that is assembled or compiled for emulated machines (for example, developers of new games to run on emulated machines). Add the ability to view, set breakpoints in, and step through the original source code instead of just the disassembly. Add symbols from the original source to MAME’s symbol tables for expression evaluation. Mostly useful for earlier 8-bit machines, tested with Tandy CoCo 2 and 3.
An early video I sent around the MAME Discord demonstrates what this looks like: https://youtu.be/2tu4t2bBjzo
COMPONENTS:
IMPLEMENTATION DETAILS:
GUI: new menu items to toggle between showing the source and showing the disassembly inside the main console debugger window. Free-floating disassembly windows remain unchanged. When source is shown, pre-existing keyboard shortcuts for stepping or setting breakpoints automatically invoke the corresponding source-level commands. When disassembly is shown in the main console debugger window, those shortcuts revert back to the old disassembly stepping commands.
Source level stepping: implementation reuses the corresponding disassembly stepping command, but with “slipping” at the end to ensure the stepping ends at a reasonable location in the source.
Symbol tables: add 2 new symbol tables, one for local variables from source, and one for global variables from source, chained in front of the pre-existing CPU and global symbol table. Support case sensitive symbol lookup, falling back to case insensitive symbol lookup as necessary. Source-level symbols, when present, eclipse any conflicting pre-existing symbols, but syntax is provided (“ns\”) to allow users to force references to pre-existing symbols.
File format: source level debugging information is stored in .mdi files. These act as containers, which can theoretically house different underlying formats, though only one format (“simple”) is supported so far in this pull request. I am familiar with 6809 and the TRS-80 CoCo, and believe this simple format is sufficient for that machine. I expect it will be sufficient for other similar machines and processors. I also allow for the possibility that experts in other machines might know of other (possibly incompatible) needs. Thus, .mdi files can be used to store other formats that better support fundamentally different machines. Inside the debugger implementation is an internal interface that can be used to read this simple format, and can be extended to read new formats that might be invented as necessary. The goal is to keep as small a quantity of debugger code as possible format-specific, with the remainder of the debugging code simply querying the interface without any knowledge of the underlying format.
File format library (mame_srcdbg_[static/shared]): this library is intended to be consumed by cross assemblers and cross compilers that target emulated machines. I have tested it so far with a 6809 assembler (lwasm / lwlink), a 6809 C compiler (CMOC), and a multi-platform basic compiler (ugBasic, though only the 6809-targeting compiler so far). The library is a C++ library with a pure C interface, so it can be consumed by tools written in either C or C++. Both a static and shared version of the library are built, so in theory even non C/C++ tools could dynamically load and call into the shared library, assuming the language supports that. MAME itself and the srcdbgdump tool link to the static version of this library for reading the format.
TODOs: I intentionally left a couple TODOs in the code for discussion with the reviewers.