Debugger: New Feature: source-level debugging #13444
To somewhat belatedly provide feedback on this, I have a really bad feeling about this new file format. It's a custom binary format which the MAME debugger itself can't create, so if support from other tools never flourishes, the whole thing stands to rot. The coupling to MAME's CPU state interface will also easily get worse considering the number of CPU types MAME supports. Speaking from the standpoint of someone interested in reverse engineering, I would much prefer a text file format for enhanced debugging information. The increased parsing overhead for something like JSON should matter less in this context than the ability to easily create and edit files without specialized tools.
First off, source-level debugging is a top ask from Apple II users who run MAME or Ample, so I love that someone's done something with it. By way of some initial feedback, I'm not a fan of having separate source and assembly step and run commands. The majority of debuggers automatically show source if it's available for the current program counter and assembly if not.
Regarding the file format, I do think it's important that MAME be able to just directly read whatever the assembler or compiler outputs for a given system, in much the same way that we prefer certain disk image formats but accept a wide variety of them.
The trouble with this analogy is that MAME does not literally "read whatever the assembler or compiler outputs for a given system" in most cases. The primary purpose of object files is for their sections to be linked together and loaded into memory. For virtually any object file format sophisticated enough to even allow for extra debugging information, that is going to be performed not by MAME itself but by the emulated operating system or monitor program. I don't think storing debugging information in binary object files is the best solution here, but if that's to be supported, I think it would be better to use an existing format like ELF that provides robust debugging support for a wide range of architectures.
Yes, I hear you on the concern that tools-writers must choose to support the file format for the feature not to rot away. However we slice it, there are platforms with popular tools that do not generate sufficient information for source-level debugging (in any form), and they would need to change one way or another for this feature to work. My plan, should this feature be accepted into MAME, is to focus first on the CoCo / Dragon tools I've already been prototyping for: the lwtools assembler, the ugBasic compiler, and the CMOC (C-like) compiler. I would work on PRs for them which turn my prototype code into real code they'd be willing to accept. (The ugBasic developer already expressed interest in supporting this feature back in October.) Once CoCo / Dragon support is solid and sufficiently visible, I would hope that interest would start growing on other platforms, and I'd be happy to help other interested platforms integrate the support. The point is that I expect to keep actively building up this feature from the tools end, as that is crucial.
Are you referring to the register #defines? They can be de-coupled via a lookup table if we agree that's the right way to go. Sorry if I'm misinterpreting.
I have considered using text instead of binary for the format, and I generally tend to lean toward binary for a few reasons. Binary is unambiguous to specify and easier to parse. Text tends to be verbose, and with JSON in particular, there are the repeating field names, which can add up with a tool like CMOC when debugging information is generated for the entire standard library. I also worry that a text format will give a false impression to tools vendors that it would be easier to hand-roll it themselves instead of using a library we provide which is guaranteed to do it right. There could be subtle things like which fields or sections are optional, or how to manage cross-references between sections, which could cause frustrating surprises down the road for the tool writer. A binary format is a signal to go down the (right) path of using a generator library instead. I would be curious to understand more about your reverse-engineering scenario, and how it would benefit from a text format. My first thought would be to provide a simple tool that translates between a text format and the binary format to give a human the ability to manually manipulate the file. (Though I wouldn't recommend build tools taking advantage of this.)
@rb6502:
Having separate source vs. disassembly stepping commands allows for scenarios where you're doing source-level debugging, but temporarily want to step through individual assembly instructions (e.g., to understand what the compiler just did, or to diagnose a potential issue with the compiler). I've seen modern debuggers handle this by choosing the stepping type based on which window is active (thus, the MAME keyboard & menu commands adapt in the same way, based on what the main window is showing). For use from the console or from scripts, we'd need an unambiguous way to specify which type of stepping is requested.
Yeah, as @ajrhacker mentioned, some tools simply don't provide sufficient information in any form to enable source-level debugging. The idea of adopting an existing format, such as ELF object files (and specifically the DWARF debugging format embedded inside them) is something which I've also considered and am not recommending. A quick summary why:
Overall, one of my goals is to make it as easy as possible for tools to participate. DWARF introduces significant obstacles, without much upside.
SUMMARY:
This feature is targeted at MAME debugger users who have access to original source code that is assembled or compiled for emulated machines (for example, developers of new games to run on emulated machines). Add the ability to view, set breakpoints in, and step through the original source code instead of just the disassembly. Add symbols from the original source to MAME’s symbol tables for expression evaluation. Mostly useful for earlier 8-bit machines, tested with Tandy CoCo 2 and 3.
An early video I sent around the MAME Discord demonstrates what this looks like: https://youtu.be/2tu4t2bBjzo
COMPONENTS:
IMPLEMENTATION DETAILS:
GUI: new menu items to toggle between showing the source and showing the disassembly inside the main console debugger window. Free-floating disassembly windows remain unchanged. When source is shown, pre-existing keyboard shortcuts for stepping or setting breakpoints automatically invoke the corresponding source-level commands. When disassembly is shown in the main console debugger window, those shortcuts revert back to the old disassembly stepping commands.
Source level stepping: implementation reuses the corresponding disassembly stepping command, but with “slipping” at the end to ensure the stepping ends at a reasonable location in the source.
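To make the "slipping" idea concrete, here is a minimal sketch of one way it could work. It is not the code in this pull request: it assumes that a "reasonable location" means an address that begins a source line, and the names LineTable, SourceLine, step_source, and max_slip are all hypothetical.

```cpp
#include <cstdint>
#include <functional>
#include <map>

// Hypothetical record: which source line begins at a given address.
struct SourceLine { int file_index; int line_number; };

// Maps the address of the first instruction of each source line to that line.
using LineTable = std::map<std::uint32_t, SourceLine>;

// Performs one ordinary instruction step, then "slips": keeps stepping until
// the program counter lands on an address that starts a source line.
// step_instruction stands in for the existing disassembly step and returns
// the new program counter. max_slip is an assumed safety limit so that code
// with no line information cannot make the step run forever.
std::uint32_t step_source(std::uint32_t pc,
                          const LineTable &lines,
                          const std::function<std::uint32_t(std::uint32_t)> &step_instruction,
                          int max_slip = 1000)
{
    pc = step_instruction(pc);
    for (int i = 0; i < max_slip && lines.find(pc) == lines.end(); ++i)
        pc = step_instruction(pc);
    return pc;
}
```

With line starts at 0x100 and 0x104 and a stand-in CPU that advances one byte per step, a source step from 0x100 stops at 0x104 rather than at 0x101.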
Symbol tables: add 2 new symbol tables, one for local variables from source, and one for global variables from source, chained in front of the pre-existing CPU and global symbol table. Support case sensitive symbol lookup, falling back to case insensitive symbol lookup as necessary. Source-level symbols, when present, eclipse any conflicting pre-existing symbols, but syntax is provided (“ns\”) to allow users to force references to pre-existing symbols.
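The chaining and case fallback described here can be sketched as follows. This is an illustration only: SymbolTable is a hypothetical stand-in, not MAME's symbol table class, and trying an exact match across the whole chain before any case-insensitive match is an assumption about how the fallback is ordered. The "ns\" override syntax is not modelled.

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a debugger symbol table with a parent link.
class SymbolTable {
public:
    explicit SymbolTable(const SymbolTable *parent = nullptr) : m_parent(parent) {}

    void add(const std::string &name, std::uint64_t value) { m_symbols[name] = value; }

    // Returns a pointer to the symbol's value, or nullptr if it is not found.
    // Tables nearer the front of the chain are searched first, so their
    // symbols eclipse same-named symbols in the tables behind them.
    const std::uint64_t *find(const std::string &name) const {
        // Pass 1: exact, case-sensitive match anywhere in the chain.
        for (const SymbolTable *t = this; t != nullptr; t = t->m_parent) {
            auto it = t->m_symbols.find(name);
            if (it != t->m_symbols.end())
                return &it->second;
        }
        // Pass 2: case-insensitive fallback, again front of the chain first.
        const std::string wanted = lower(name);
        for (const SymbolTable *t = this; t != nullptr; t = t->m_parent) {
            for (auto it = t->m_symbols.begin(); it != t->m_symbols.end(); ++it) {
                if (lower(it->first) == wanted)
                    return &it->second;
            }
        }
        return nullptr;
    }

private:
    static std::string lower(std::string s) {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return s;
    }

    const SymbolTable *m_parent;
    std::map<std::string, std::uint64_t> m_symbols;
};
```

In use, a source-locals table would be constructed with the source-globals table as its parent, which in turn has the pre-existing table as its parent, and lookups would start at the locals table.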
File format: source level debugging information is stored in .mdi files. These act as containers, which can theoretically house different underlying formats, though only one format (“simple”) is supported so far in this pull request. I am familiar with 6809 and the TRS-80 CoCo, and believe this simple format is sufficient for that machine. I expect it will be sufficient for other similar machines and processors. I also allow for the possibility that experts in other machines might know of other (possibly incompatible) needs. Thus, .mdi files can be used to store other formats that better support fundamentally different machines. Inside the debugger implementation is an internal interface that can be used to read this simple format, and can be extended to read new formats that might be invented as necessary. The goal is to keep as small a quantity of debugger code as possible format-specific, with the remainder of the debugging code simply querying the interface without any knowledge of the underlying format.
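The separation between format-specific readers and the rest of the debugger could look roughly like the sketch below. All names here (DebugInfo, SimpleFormatInfo, LineRef, open_debug_info) are hypothetical and are not taken from this pull request. Decoding the actual .mdi bytes is omitted, so the line table is passed in already parsed.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical result of an address lookup.
struct LineRef { std::string file; int line; };

// Format-agnostic interface: the only thing the rest of the debugger sees.
class DebugInfo {
public:
    virtual ~DebugInfo() = default;
    // Fills 'out' with the source line covering 'address'; false if none.
    virtual bool address_to_line(std::uint32_t address, LineRef &out) const = 0;
};

// Reader for one underlying format. The map key is the first address of a line.
class SimpleFormatInfo : public DebugInfo {
public:
    explicit SimpleFormatInfo(std::map<std::uint32_t, LineRef> lines)
        : m_lines(std::move(lines)) {}

    bool address_to_line(std::uint32_t address, LineRef &out) const override {
        // Find the last line whose starting address is <= address.
        auto it = m_lines.upper_bound(address);
        if (it == m_lines.begin())
            return false;
        out = std::prev(it)->second;
        return true;
    }

private:
    std::map<std::uint32_t, LineRef> m_lines;
};

// Chooses a reader from the format name stored in the container. A new
// format would add one more branch here and nothing elsewhere.
std::unique_ptr<DebugInfo> open_debug_info(const std::string &format_name,
                                           std::map<std::uint32_t, LineRef> lines)
{
    if (format_name == "simple")
        return std::unique_ptr<DebugInfo>(new SimpleFormatInfo(std::move(lines)));
    throw std::runtime_error("unsupported debug info format: " + format_name);
}
```

Callers hold only a DebugInfo pointer, which is the property the paragraph above is after: format knowledge stays inside the reader classes and the factory.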
File format library (mame_srcdbg_[static/shared]): this library is intended to be consumed by cross assemblers and cross compilers that target emulated machines. I have tested it so far with a 6809 assembler (lwasm / lwlink), a 6809 C compiler (CMOC), and a multi-platform BASIC compiler (ugBasic, though only the 6809-targeting compiler so far). The library is a C++ library with a pure C interface, so it can be consumed by tools written in either C or C++. Both a static and a shared version of the library are built, so in theory even non-C/C++ tools could dynamically load and call into the shared library, assuming the language supports that. MAME itself and the srcdbgdump tool link to the static version of this library for reading the format.
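For readers unfamiliar with the "C++ library with a pure C interface" pattern, the sketch below shows the general shape: an opaque handle plus extern "C" functions that never let an exception cross the boundary. The example_writer_* names and the record layout are invented for illustration and are not the actual mame_srcdbg API.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

namespace {
// C++ implementation detail, invisible to C callers.
class Writer {
public:
    void add_line(std::uint32_t address, std::uint32_t line) {
        m_records.push_back({address, line});
    }
    std::size_t count() const { return m_records.size(); }

private:
    struct Record { std::uint32_t address; std::uint32_t line; };
    std::vector<Record> m_records;
};
} // namespace

extern "C" {

// Opaque handle type: C code only ever holds a pointer to it.
typedef struct example_writer example_writer;

// Returns a new handle, or a null pointer if allocation fails.
example_writer *example_writer_open(void) {
    return reinterpret_cast<example_writer *>(new (std::nothrow) Writer);
}

// Returns 0 on success, -1 on failure. Exceptions are caught here so that
// none can propagate into a C caller.
int example_writer_add_line(example_writer *w, std::uint32_t address, std::uint32_t line) {
    if (w == nullptr)
        return -1;
    try {
        reinterpret_cast<Writer *>(w)->add_line(address, line);
        return 0;
    } catch (...) {
        return -1;
    }
}

// Number of line records added so far.
unsigned example_writer_count(const example_writer *w) {
    return static_cast<unsigned>(reinterpret_cast<const Writer *>(w)->count());
}

// Releases the handle.
void example_writer_close(example_writer *w) {
    delete reinterpret_cast<Writer *>(w);
}

} // extern "C"
```

Because only plain C types and an opaque pointer appear in the function signatures, the same entry points work from a C tool, a C++ tool, or a language runtime that loads the shared library dynamically.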
TODOs: I intentionally left a couple TODOs in the code for discussion with the reviewers.