Debugger: New Feature: source-level debugging #13444

Open
wants to merge 2 commits into
base: master

dave-br
Contributor

@dave-br dave-br commented Mar 5, 2025

SUMMARY:

This feature is targeted at MAME debugger users who have access to original source code that is assembled or compiled for emulated machines (for example, developers of new games to run on emulated machines). Add the ability to view, set breakpoints in, and step through the original source code instead of just the disassembly. Add symbols from the original source to MAME’s symbol tables for expression evaluation. Mostly useful for earlier 8-bit machines, tested with Tandy CoCo 2 and 3.

An early video I sent around the MAME Discord demonstrates what this looks like: https://youtu.be/2tu4t2bBjzo

COMPONENTS:

  • Create specification for a simple MAME Debugging Information File format with source mapping and symbol information.
  • Provide a library (mame_srcdbg_static.a, mame_srcdbg_shared.so/.dll) for cross assemblers and cross compilers that target emulated machines to easily generate MAME Debugging Information Files.
  • Add command-line options for loading MAME Debugging Information Files
  • Add command-line tool (srcdbgdump) for dumping MAME Debugging Information Files
  • Add debugger console commands for source-level stepping
  • Add source-level global symbols, local “fixed” symbols (which are scoped but have “fixed”, constant values), and local “relative” symbols (which are scoped and have values determined by an offset to a register, e.g., stack-local symbols) to the debugger symbol tables.
  • Add source-file + line number to the expression evaluator (e.g., for setting breakpoints)
  • Support runtime address offsetting for operating systems that relocate loaded code
  • Add GUI support for source-level debugging to the Windows, Mac, and Qt debuggers. Thanks to @tlindner for writing the Mac implementation.
  • Add Sphinx documentation for the entire feature (start reading at docs/source/debugger/general.rst)
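
To make the distinction between local “fixed” and local “relative” symbols concrete, here is a minimal sketch. It is not code from this PR: the class names, the `scope` address range, the register name, and the 16-bit wrap are all assumptions chosen for a 6809-style machine. Whether the debugger then dereferences the resulting address is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class LocalFixed:
    """Scoped symbol whose value is a constant."""
    name: str
    scope: Tuple[int, int]  # hypothetical: inclusive address range where the symbol is visible
    value: int


@dataclass(frozen=True)
class LocalRelative:
    """Scoped symbol whose value is a register plus a constant offset."""
    name: str
    scope: Tuple[int, int]
    register: str  # hypothetical register name, e.g. the 6809 "S" stack pointer
    offset: int


def resolve(symbol: Union[LocalFixed, LocalRelative], pc: int,
            registers: Dict[str, int]) -> Optional[int]:
    """Return the symbol's value at the current PC, or None when out of scope."""
    low, high = symbol.scope
    if not (low <= pc <= high):
        return None
    if isinstance(symbol, LocalRelative):
        # Assumption: 16-bit address space, so wrap the sum to 16 bits.
        return (registers[symbol.register] + symbol.offset) & 0xFFFF
    return symbol.value


counter = LocalRelative("counter", (0x3000, 0x30FF), "S", 2)
limit = LocalFixed("limit", (0x3000, 0x30FF), 100)
regs = {"S": 0x7FF0}
print(hex(resolve(counter, 0x3010, regs)))  # 0x7ff2
print(resolve(limit, 0x3010, regs))         # 100
print(resolve(counter, 0x4000, regs))       # None (PC is outside the scope)
```

The point of the split is that a fixed symbol can be evaluated from the debug file alone, while a relative symbol needs live register state at evaluation time.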

IMPLEMENTATION DETAILS:

GUI: new menu items toggle between showing the source and showing the disassembly inside the main console debugger window. Free-floating disassembly windows remain unchanged. When source is shown, pre-existing keyboard shortcuts for stepping or setting breakpoints automatically invoke the corresponding source-level commands. When disassembly is shown in the main console debugger window, those shortcuts revert to the existing disassembly stepping commands.

Source level stepping: implementation reuses the corresponding disassembly stepping command, but with “slipping” at the end to ensure the stepping ends at a reasonable location in the source.
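
One plausible reading of “slipping”, sketched below, is: perform the ordinary instruction-level step, then keep single-stepping until the program counter lands on an address that begins a source line. This is an illustration only, not the PR's implementation; `step_source`, `step_instruction`, `line_starts`, and the `max_slip` safety limit are hypothetical names.

```python
from typing import Callable, Dict, Tuple


def step_source(pc: int,
                step_instruction: Callable[[int], int],
                line_starts: Dict[int, Tuple[str, int]],
                max_slip: int = 1000) -> int:
    """Step one instruction, then slip forward to the next source-line start."""
    pc = step_instruction(pc)  # the reused disassembly-level step
    slipped = 0
    # Slip: keep stepping while the PC is in the middle of a source line.
    while pc not in line_starts and slipped < max_slip:
        pc = step_instruction(pc)
        slipped += 1
    return pc


# Toy machine: every instruction is 2 bytes long and execution is linear.
toy_step = lambda pc: pc + 2
toy_lines = {0x1000: ("main.asm", 10), 0x1006: ("main.asm", 11)}
print(hex(step_source(0x1000, toy_step, toy_lines)))  # 0x1006
```

The `max_slip` bound is there so that stepping into code with no source mapping cannot loop forever in this sketch; a real debugger would need its own policy for that case.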

Symbol tables: add two new symbol tables, one for local variables from source and one for global variables from source, chained in front of the pre-existing CPU and global symbol tables. Support case-sensitive symbol lookup, falling back to case-insensitive lookup as necessary. Source-level symbols, when present, eclipse any conflicting pre-existing symbols, but syntax is provided (“ns\”) to allow users to force references to pre-existing symbols.
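
A rough sketch of that lookup order follows. It is not MAME's symbol table code: the `SymbolTable` class, the `from_source` flag, and the choice to apply the case-insensitive fallback per table (rather than after an exact-match pass over the whole chain) are assumptions made for illustration.

```python
from typing import Dict, Optional


class SymbolTable:
    """One link in a chain of symbol tables (hypothetical structure)."""

    def __init__(self, symbols: Dict[str, int],
                 parent: Optional["SymbolTable"] = None,
                 from_source: bool = False) -> None:
        self.symbols = symbols
        self.parent = parent
        self.from_source = from_source

    def _find_here(self, name: str) -> Optional[int]:
        # Exact, case-sensitive match first.
        if name in self.symbols:
            return self.symbols[name]
        # Fall back to a case-insensitive match.
        folded = name.lower()
        for key, value in self.symbols.items():
            if key.lower() == folded:
                return value
        return None

    def lookup(self, name: str) -> Optional[int]:
        # The "ns\" prefix skips source-level tables entirely.
        skip_source = name.startswith("ns\\")
        if skip_source:
            name = name[len("ns\\"):]
        table: Optional[SymbolTable] = self
        while table is not None:
            if not (skip_source and table.from_source):
                value = table._find_here(name)
                if value is not None:
                    return value
            table = table.parent
        return None


# Chain: source locals -> source globals -> CPU -> global.
global_table = SymbolTable({"temp0": 1})
cpu_table = SymbolTable({"pc": 0x1000, "a": 0x12}, parent=global_table)
source_globals = SymbolTable({"Start": 0x3000, "a": 0x4000},
                             parent=cpu_table, from_source=True)
source_locals = SymbolTable({"count": 0x7FF2},
                            parent=source_globals, from_source=True)

print(hex(source_locals.lookup("a")))       # 0x4000 (source symbol wins)
print(hex(source_locals.lookup("ns\\a")))   # 0x12 (forced to the CPU register)
print(hex(source_locals.lookup("start")))   # 0x3000 (case-insensitive fallback)
```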

File format: source level debugging information is stored in .mdi files. These act as containers, which can theoretically house different underlying formats, though only one format (“simple”) is supported so far in this pull request. I am familiar with 6809 and the TRS-80 CoCo, and believe this simple format is sufficient for that machine. I expect it will be sufficient for other similar machines and processors. I also allow for the possibility that experts in other machines might know of other (possibly incompatible) needs. Thus, .mdi files can be used to store other formats that better support fundamentally different machines. Inside the debugger implementation is an internal interface that can be used to read this simple format, and can be extended to read new formats that might be invented as necessary. The goal is to keep as small a quantity of debugger code as possible format-specific, with the remainder of the debugging code simply querying the interface without any knowledge of the underlying format.
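
The separation described above could look roughly like the following. This is a sketch, not the PR's internal interface: `DebugInfoProvider`, `SimpleFormatProvider`, and both method names are hypothetical, and the in-memory mapping list stands in for a parsed .mdi file.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class DebugInfoProvider(ABC):
    """Format-agnostic queries the rest of the debugger would use."""

    @abstractmethod
    def address_to_line(self, address: int) -> Optional[Tuple[str, int]]:
        ...

    @abstractmethod
    def line_to_address(self, path: str, line: int) -> Optional[int]:
        ...


class SimpleFormatProvider(DebugInfoProvider):
    """Stand-in for a reader of the "simple" format."""

    def __init__(self, mappings: List[Tuple[int, str, int]]) -> None:
        self._by_address: Dict[int, Tuple[str, int]] = {}
        self._by_line: Dict[Tuple[str, int], int] = {}
        for address, path, line in mappings:
            self._by_address[address] = (path, line)
            # Keep the first address seen for each source line.
            self._by_line.setdefault((path, line), address)

    def address_to_line(self, address: int) -> Optional[Tuple[str, int]]:
        return self._by_address.get(address)

    def line_to_address(self, path: str, line: int) -> Optional[int]:
        return self._by_line.get((path, line))


def describe_pc(provider: DebugInfoProvider, pc: int) -> str:
    """Debugger-side code: knows only the interface, not the format."""
    location = provider.address_to_line(pc)
    if location is None:
        return f"{pc:04X}: (no source)"
    path, line = location
    return f"{pc:04X}: {path}:{line}"


provider = SimpleFormatProvider([(0x3000, "main.asm", 10),
                                 (0x3003, "main.asm", 11)])
print(describe_pc(provider, 0x3003))  # 3003: main.asm:11
print(describe_pc(provider, 0x3001))  # 3001: (no source)
```

A new format would add another subclass; `describe_pc` and similar callers would not change.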

File format library (mame_srcdbg_[static/shared]): this library is intended to be consumed by cross assemblers and cross compilers that target emulated machines. I have tested it so far with a 6809 assembler (lwasm / lwlink), a 6809 C compiler (CMOC), and a multi-platform BASIC compiler (ugBasic, though only the 6809-targeting compiler so far). The library is a C++ library with a pure C interface, so it can be consumed by tools written in either C or C++. Both a static and a shared version of the library are built, so in theory even non-C/C++ tools could dynamically load and call into the shared library, assuming the language supports that. MAME itself and the srcdbgdump tool link to the static version of this library for reading the format.

TODOs: I intentionally left a couple of TODOs in the code for discussion with the reviewers.

@ajrhacker
Contributor

To somewhat belatedly provide feedback on this, I have a really bad feeling about this new file format. It's a custom binary format which the MAME debugger itself can't create, so if support from other tools never flourishes, the whole thing stands to rot. The coupling to MAME's CPU state interface will also easily get worse considering the number of CPU types MAME supports.

Speaking from the standpoint of someone interested in reverse engineering, I would much prefer a text file format for enhanced debugging information. The increased parsing overhead for something like JSON should matter less in this context than the ability to easily create and edit files without specialized tools.

@rb6502
Contributor

rb6502 commented Mar 10, 2025

First off, source-level debugging is a top ask from Apple II users who run MAME or Ample, so I love that someone's done something with it.

By way of some initial feedback, I'm not a fan of having separate source and assembly step and run commands. The majority of debuggers automatically show source if it's available for the current program counter and assembly if not.

@rb6502
Contributor

rb6502 commented Mar 10, 2025

Regarding the file format, I do think it's important that MAME be able to just directly read whatever the assembler or compiler outputs for a given system, in much the same way that we prefer certain disk image formats but accept a wide variety of them.

@ajrhacker
Contributor

> I do think it's important that MAME be able to just directly read whatever the assembler or compiler outputs for a given system, in much the same way that we prefer certain disk image formats but accept a wide variety of them.

The trouble with this analogy is that MAME does not literally "read whatever the assembler or compiler outputs for a given system" in most cases. The primary purpose of object files is for their sections to be linked together and loaded into memory. For virtually any object file format sophisticated enough to even allow for extra debugging information, that is going to be performed not by MAME itself but by the emulated operating system or monitor program.

I don't think storing debugging information in binary object files is the best solution here, but if that's to be supported, I think it would be better to use an existing format like ELF that provides robust debugging support for a wide range of architectures.

@dave-br
Contributor Author

dave-br commented Mar 10, 2025

@ajrhacker :

Yes, I hear you on the concern that tools-writers must choose to support the file format for the feature not to rot away. However we slice it, there are platforms with popular tools that do not generate sufficient information for source-level debugging (in any form), and they would need to change one way or another for this feature to work. My plan, should this feature be accepted into MAME, is to focus first on the CoCo / Dragon tools I've already been prototyping for: the lwtools assembler, the ugBasic compiler, and the CMOC (C-like) compiler. I would work on PRs for them that turn my prototype code into real code they'd be willing to accept. (The ugBasic developer already expressed interest in supporting this feature back in October.) Once CoCo / Dragon support is solid and sufficiently visible, I would hope that interest would start growing on other platforms, and I'd be happy to help other interested platforms integrate the support. The point is that I expect to keep actively building up this feature from the tools end, as that is crucial.

> The coupling to MAME's CPU state interface will also easily get worse considering the number of CPU types MAME supports.

Are you referring to the register #defines? They can be de-coupled via a lookup table if we agree that's the right way to go. Sorry if I'm misinterpreting.
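
As a rough illustration of what such a lookup table might look like (hypothetical throughout: the register IDs, the `"m6809"` key, and the state names below are invented for this sketch and are not taken from the PR or from MAME's headers):

```python
from typing import Dict

# Hypothetical: register IDs as stored in the debug file, mapped per CPU type
# to the name of the corresponding entry in the emulator's CPU state.
REGISTER_MAP: Dict[str, Dict[int, str]] = {
    "m6809": {0: "PC", 1: "S", 2: "U", 3: "X", 4: "Y"},
}


def read_register(cpu_type: str, format_register_id: int,
                  cpu_state: Dict[str, int]) -> int:
    """Translate a file-format register ID into a live register value."""
    try:
        state_name = REGISTER_MAP[cpu_type][format_register_id]
    except KeyError:
        raise ValueError(
            f"no mapping for register {format_register_id} on {cpu_type}"
        ) from None
    return cpu_state[state_name]


state = {"PC": 0x3000, "S": 0x7FF0, "U": 0, "X": 0, "Y": 0}
print(hex(read_register("m6809", 1, state)))  # 0x7ff0
```

With a table like this, the file format would only depend on its own register numbering, and changes to the emulator's state indices would be absorbed in one place.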

> Speaking from the standpoint of someone interested in reverse engineering, I would much prefer a text file format for enhanced debugging information.

I have considered using text instead of binary for the format, and I lean toward binary for a few reasons. Binary is unambiguous to specify and easier to parse. Text tends to be verbose, and with JSON in particular, the repeated field names can add up with a tool like CMOC, where debugging information is generated for the entire standard library. I also worry that a text format would give tools vendors the false impression that it is easier to hand-roll the output themselves than to use a library we provide, which is guaranteed to do it right. There could be subtle things, like which fields or sections are optional or how to manage cross-references between sections, that cause frustrating surprises down the road for the tool writer. A binary format is a signal to go down the (right) path of using a generator library instead.
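
To give a feel for the verbosity point, here is a small, self-contained comparison. The record layout (`address`, `file_index`, `line` packed as two 16-bit fields and one 32-bit field) is invented for the comparison and is not the .mdi layout; the JSON is encoded with Python's default separators, so a more compact JSON encoding would narrow the gap somewhat.

```python
import json
import struct

# Hypothetical line-mapping records; not the actual .mdi record layout.
mappings = [
    {"address": 0x3000 + 3 * i, "file_index": 0, "line": 10 + i}
    for i in range(1000)
]

# Text encoding: field names are repeated in every record.
text_size = len(json.dumps(mappings).encode("utf-8"))

# Binary encoding: little-endian, no padding, 2 + 2 + 4 = 8 bytes per record.
binary_size = len(b"".join(
    struct.pack("<HHI", m["address"], m["file_index"], m["line"])
    for m in mappings
))

print(f"JSON:   {text_size} bytes")
print(f"binary: {binary_size} bytes")  # 8000 bytes
```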

I would be curious to understand more about your reverse-engineering scenario and how it would benefit from a text format. My first thought would be to provide a simple tool that translates between a text format and the binary format, giving a human the ability to manipulate the file manually. (Though I wouldn't recommend that build tools take advantage of this.)

@dave-br
Contributor Author

dave-br commented Mar 10, 2025

@rb6502 :

> By way of some initial feedback, I'm not a fan of having separate source and assembly step and run commands. The majority of debuggers automatically show source if it's available for the current program counter and assembly if not.

Having separate source vs. disassembly stepping commands allows for scenarios where you're doing source-level debugging but temporarily want to step through individual assembly instructions (e.g., to understand what the compiler just did, or to diagnose a potential issue with the compiler). I've seen modern debuggers handle this by choosing the stepping type based on which window is active (thus, the MAME keyboard and menu commands adapt in the same way, based on what the main window is showing). For console or scripting use, we'd need an unambiguous way to specify which type of stepping is requested.

@dave-br
Contributor Author

dave-br commented Mar 10, 2025

@rb6502

> Regarding the file format, I do think it's important that MAME be able to just directly read whatever the assembler or compiler outputs for a given system, in much the same way that we prefer certain disk image formats but accept a wide variety of them.

Yeah, as @ajrhacker mentioned, some tools simply don't provide sufficient information in any form to enable source-level debugging.

The idea of adopting an existing format, such as ELF object files (and specifically the DWARF debugging format embedded inside them) is something which I've also considered and am not recommending. A quick summary why:

  • DWARF is staggeringly complex. The DWARF 5 specification is 477 pages long, its reader library has over 300 API calls, and its writer library adds another 70 API calls plus callbacks. My PR's generator library has only 9 API calls.
  • "Supporting DWARF" is an ambiguous claim. DIEs have optional attributes, and there are multiple compression schemes. Some components of DWARF use a stack-based interpreted language to express values, and others use a finite state machine program (e.g., the line table encoding). MAME would need a way to clearly specify what portions of those are supported so that tools-writers know what to generate.
  • DWARF may actually not even be sufficient. Its specification defers to the ABI for some things, like register numbers, but there may be no ABIs for some vintage processors. I couldn't find anything formally specified for 6809 or 6502, for example, though there is what appears to be an unofficial fork of gcc for 6809.

Overall, one of my goals is to make it as easy as possible for tools to participate. DWARF introduces significant obstacles, without much upside.
