@cmd05 (Contributor) commented Oct 27, 2025

Fixes #1146. Capstone uses LLVM TableGen for its RISC-V dependencies. However, some sections of the code can be generated automatically.

The script generate_csr_switch.py generates the list of CSRs in the function getCSRSystemRegisterName. Status of the script:

  • It adds some extra CSRs to the code, which may be filtered by the extension list. However, the output is a switch statement, so the extra cases should not be an issue.
  • The following CSRs are missing in UDB and hence are missing from the output right now: ustatus, uie, utvec, uscratch, uepc, ucause, utval, uip, sedeleg, sideleg, sie, dscratch
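
As a rough illustration of the generation step described above, the sketch below renders `case` entries for a switch from a name-to-address mapping. The function name `render_csr_cases`, the sample mapping, and the exact C layout are assumptions for illustration; the real script reads the resolved CSR YAML files and may format its output differently.

```python
def render_csr_cases(csrs: dict[str, int]) -> str:
    """Render C `case` entries, sorted by CSR address.

    Hypothetical helper: the layout is illustrative, not Capstone's exact style.
    """
    lines = []
    for name, address in sorted(csrs.items(), key=lambda item: item[1]):
        if not 0 <= address < 2 ** 12:
            raise ValueError(f"{name}: CSR address {address:#x} is not 12 bits")
        lines.append(f"case 0x{address:04x}:")
        lines.append(f'\treturn "{name}";')
    return "\n".join(lines)


# Sample input; a real run would take names and addresses from the CSR YAML.
sample = {"mstatus": 0x300, "fflags": 0x001, "cycle": 0xC00}
print(render_csr_cases(sample))
```

Sorting by address keeps the generated text stable between runs, which makes it easier to diff against a reference.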

Output:

root@edb48186766d:/workspaces/riscv-unified-db# ./do gen:capstone
Using devcontainer environment
Recreating bundle config...
Running with 1 job(s)
/workspaces/riscv-unified-db/.home/.venv/bin/python3 /workspaces/riscv-unified-db/backends/generators/capstone/generate_csr_switch.py --csr-dir=/workspaces/riscv-unified-db/gen/resolved_spec/_/csr --arch=BOTH --output=/workspaces/riscv-unified-db/gen/capstone/csr_switch.c
INFO:: Searching for CSR files in /workspaces/riscv-unified-db/gen/resolved_spec/_/csr for target architecture BOTH
INFO:: Found 382 CSR definitions in 382 files
INFO:: Added 382 CSRs to the output
Generated: /workspaces/riscv-unified-db/gen/capstone/csr_switch.c

@ThinkOpenly @AFOliveira

@AFOliveira changed the title from "Add generator for capstone" to "feat: add generator for capstone" Oct 27, 2025
codecov bot commented Oct 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 46.05%. Comparing base (3d9129d) to head (e378ea3).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1209   +/-   ##
=======================================
  Coverage   46.05%   46.05%           
=======================================
  Files          11       11           
  Lines        4942     4942           
  Branches     1345     1345           
=======================================
  Hits         2276     2276           
  Misses       2666     2666           
Flag Coverage Δ
idlc 46.05% <ø> (ø)

@dhower-qc (Collaborator) left a comment

Cool, thanks. A few requests:

  1. There is a background effort to try to move what is in backends to tools. Since this is a new tool, could you please move it there? Something like tools/python-packages/udb-capstone.
  2. Give yourself credit! Add copyright/license information to the new files. See other files in the repo for an example that is Reuse-compatible.
  3. Add a regression test so that we can make sure this continues to work going forward.

@ThinkOpenly added the generators label (Related to backend/generator) Oct 29, 2025
@ThinkOpenly ThinkOpenly moved this to In progress in UDB Generators Oct 29, 2025
@ThinkOpenly (Collaborator)

The script generate_csr_switch.py generates the list of CSRs in the function getCSRSystemRegisterName. Status of the script:

  • It adds some extra CSRs to the code, which may be filtered by the extension list. However, it is a switch case statement so it should not be an issue.

I would agree that since they're all part of a switch statement, the "extra" CSRs will have no effect.

  • The following CSRs are missing in UDB and hence are missing from the output right now: ustatus, uie, utvec, uscratch, uepc, ucause, utval, uip, sedeleg, sideleg, sie, dscratch

If you only need the name and the CSR address, maybe you could add incomplete YAML files for each of these CSRs that has the information you need (and anything else you are willing to add). What do you think?
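
A minimal sketch of what generating such stub files could look like. The `name` and `address` keys are assumptions about the fields the generator reads, not a verified UDB schema, and only the three addresses visible in the Capstone output later in this thread are filled in; `render_stub` is a hypothetical helper.

```python
# Addresses taken from the Capstone disassembly output in this thread.
MISSING_CSRS = {"ustatus": 0x000, "uie": 0x004, "utvec": 0x005}


def render_stub(name: str, address: int) -> str:
    """Render a deliberately incomplete CSR stub as YAML text."""
    return f"name: {name}\naddress: 0x{address:03x}\n"


for csr_name, csr_address in MISSING_CSRS.items():
    print(f"# {csr_name}.yaml")
    print(render_stub(csr_name, csr_address))
```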

@ThinkOpenly (Collaborator)

  2. Give yourself credit! Add copyright/license information to the new files. See other files in the repo for an example that is Reuse-compatible.

Indeed!

  3. Add a regression test so that we can make sure this continues to work going forward.

Thinking about this, it's not straightforward, given you are generating the contents of an internal function. Brainstorming a bit... a few ideas:

  • The function is very basic. You could download a (known version of) the source, extract the function and build it with a test routine that invokes it with each of the possible parameter values it expects. Then, build the same test routine with your generated function body, and verify the results are the same.
  • You could download and build Capstone and write a test that uses Capstone to exercise the function in question, then replace the function body with your generated content and compare.
  • You could also create a "golden reference" output based on the Capstone source, and compare your test output with that.

The hard part is the extract/replace.
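
One way to sidestep the extract/replace step for the "golden reference" idea is to compare address-to-name pairs rather than compiled behavior. The sketch below assumes both the reference and the generated code contain entries of the form `case 0x...: return "name";` (an assumed layout, not verified against Capstone's source), pulls the pairs out with a regular expression, and reports the differences. `extract_cases` and `diff_cases` are hypothetical names.

```python
import re

# Assumed layout: `case 0x0300:` followed by `return "mstatus";`.
CASE_RE = re.compile(r'case\s+(0x[0-9a-fA-F]+)\s*:\s*return\s+"([^"]+)"\s*;')


def extract_cases(c_source: str) -> dict[int, str]:
    """Collect address -> name pairs from switch cases in C source text."""
    return {int(addr, 16): name for addr, name in CASE_RE.findall(c_source)}


def diff_cases(golden: dict[int, str], generated: dict[int, str]) -> dict[str, list]:
    """Report addresses missing from, added to, or renamed in the generated output."""
    return {
        "missing": sorted(set(golden) - set(generated)),
        "extra": sorted(set(generated) - set(golden)),
        "renamed": sorted(
            a for a in set(golden) & set(generated) if golden[a] != generated[a]
        ),
    }


golden_src = 'case 0x0000:\n\treturn "ustatus";\ncase 0x0300:\n\treturn "mstatus";\n'
generated_src = 'case 0x0300:\n\treturn "mstatus";\ncase 0x0301:\n\treturn "misa";\n'
print(diff_cases(extract_cases(golden_src), extract_cases(generated_src)))
# → {'missing': [0], 'extra': [769], 'renamed': []}
```

Since extra cases are harmless in a switch, a test built on this could fail only on `missing` and `renamed` and merely log `extra`.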

@ThinkOpenly (Collaborator)

Here's a basic script which exercises the function in Capstone:

#!/usr/bin/env python3
from capstone import Cs, CS_ARCH_RISCV, CS_MODE_32

md = Cs(CS_ARCH_RISCV, CS_MODE_32)
# CSR addresses are 12 bits wide, so cover 0x000 through 0xfff inclusive.
for csr in range(2 ** 12):
    # csrr ra, <csr>: CSR number in the top 12 bits, 0x020f3 in the low 20 bits.
    csrr_hex = f"{csr:03x}020f3"
    # Byte swap: the instruction word is stored little-endian.
    csrr_bytes = bytes.fromhex(csrr_hex)[::-1]
    for insn in md.disasm(csrr_bytes, 0x1000):
        print("%4d(%03x) %s\t%s" % (csr, csr, insn.mnemonic, insn.op_str))
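
The hex string in the script can be read as the encoding of `csrrs ra, <csr>, x0` (the expansion of `csrr ra, <csr>`): the CSR number sits in bits 31:20 and the low 20 bits are the constant `0x020f3`. A minimal sketch that builds the same bytes arithmetically, with `encode_csrr_ra` as a hypothetical helper name:

```python
def encode_csrr_ra(csr: int) -> bytes:
    """Encode `csrr ra, <csr>` as 4 little-endian bytes."""
    if not 0 <= csr < 2 ** 12:
        raise ValueError("CSR numbers are 12 bits wide")
    # bits 31:20 = csr, rs1 = x0, funct3 = 0b010 (csrrs), rd = x1 (ra), opcode = 0x73
    word = (csr << 20) | 0x020F3
    return word.to_bytes(4, "little")


print(encode_csrr_ra(0x300).hex())  # → f3200030
```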

Note that not every CSR address maps to a name, and some CSRs are associated with pseudoinstructions, which Capstone emits instead, so you'll see something like:

   0(000) csrr  ra, ustatus
   1(001) frflags       ra
   2(002) frrm  ra
   3(003) frcsr ra
   4(004) csrr  ra, uie
   5(005) csrr  ra, utvec
   6(006) csrr  ra, 6
   7(007) csrr  ra, 7
   8(008) csrr  ra, 8
   9(009) csrr  ra, 9
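
For a regression test, output like the above could be reduced to an address-to-name map. The sketch below assumes the exact line format printed by the script and keeps only `csrr` lines whose operand is a name rather than a number; it also assumes unnamed CSRs are printed as decimal or `0x`-prefixed hex. `parse_csr_names` and `is_number` are hypothetical helpers, and pseudoinstruction lines such as `frflags` are skipped rather than mapped back to a CSR name.

```python
import re

# Matches lines such as "   4(004) csrr  ra, uie" from the script's output.
LINE_RE = re.compile(r"^\s*\d+\(([0-9a-f]{3})\)\s+csrr\s+\w+,\s*(\S+)\s*$")


def is_number(text: str) -> bool:
    """True for decimal or 0x-prefixed operands (assumed numeric formats)."""
    try:
        int(text, 0)
    except ValueError:
        return False
    return True


def parse_csr_names(output: str) -> dict[int, str]:
    """Map CSR address -> name for csrr lines with a named operand."""
    names = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # pseudoinstruction (e.g. frflags) or unrelated line
        address, operand = int(match.group(1), 16), match.group(2)
        if is_number(operand):
            continue  # no name was printed for this address
        names[address] = operand
    return names


sample = """\
   0(000) csrr\tra, ustatus
   1(001) frflags\tra
   4(004) csrr\tra, uie
   6(006) csrr\tra, 6
"""
print(parse_csr_names(sample))  # → {0: 'ustatus', 4: 'uie'}
```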
