ghidra: Updates capa_explorer.py to enable users to select if namespaces, comments and bookmarks are added. Closes #1977 #2652

Open
wants to merge 4 commits into
base: master

Shajal-Kumar

closes #1977

Enhances the capa_explorer.py script by adding user-selectable options for annotations in Ghidra. Users can now choose whether to add labels/namespaces, comments, or bookmarks during analysis.

  • Lets the user choose, through a dialog box built on Ghidra's askChoices API, whether to add labels/namespaces, comments, or bookmarks.
  • label_matches() has been split into create_capa_namespaces() and create_capa_comments().
  • parse_json() takes new arguments that are passed on to the CapaMatchData class to facilitate guard clauses in create_capa_namespaces() and create_capa_comments().
  • CHANGELOG.md updated
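
A minimal sketch of how a multi-select dialog result could be turned into the three flags described above. The names `ANNOTATION_CHOICES`, `AnnotationOptions`, `select_annotations`, and the `ask` parameter are hypothetical and not taken from the PR. Inside a Ghidra script, `ask` would presumably be the script's `askChoices(title, message, choices)` call; here it is injected so the snippet runs outside Ghidra.

```python
# Hypothetical sketch: map a multi-select dialog result to annotation flags.
# `ask` stands in for Ghidra's askChoices so the example runs without Ghidra.
from typing import Callable, List, NamedTuple

ANNOTATION_CHOICES = ["namespaces", "comments", "bookmarks"]  # assumed labels


class AnnotationOptions(NamedTuple):
    do_labels: bool
    do_comments: bool
    do_bookmarks: bool


def select_annotations(ask: Callable[[str, str, List[str]], List[str]]) -> AnnotationOptions:
    """Ask the user which annotations to add and return one flag per kind."""
    selected = set(ask("capa explorer", "Select annotations to add", ANNOTATION_CHOICES))
    return AnnotationOptions(
        do_labels="namespaces" in selected,
        do_comments="comments" in selected,
        do_bookmarks="bookmarks" in selected,
    )


# Simulate a user who ticks only "comments" and "bookmarks".
options = select_annotations(lambda title, message, choices: ["comments", "bookmarks"])
print(options)
```

Passing the dialog in as a callable is only a convenience for this sketch; it keeps the flag mapping testable without a running Ghidra instance.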

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

@gemini-code-assist gemini-code-assist bot left a comment

Hello @Shajal-Kumar, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request enhances the capa_explorer.py script in Ghidra, providing users with more control over the annotation process. Specifically, it introduces a dialog box that allows users to select whether to add labels/namespaces, comments, or bookmarks during the analysis. The label_matches() function has been split into create_capa_namespaces() and create_capa_comments() for better organization. The parse_json() function now accepts new arguments to facilitate the use of guard clauses in the new functions. Additionally, the CHANGELOG.md file has been updated to reflect these changes.

Highlights

  • User-Selectable Annotations: Introduces a dialog box using Ghidra's askChoices API, allowing users to choose whether to add labels/namespaces, comments, or bookmarks during analysis.
  • Code Reorganization: Splits the label_matches() function into create_capa_namespaces() and create_capa_comments() for improved code structure and readability.
  • Parameterization of parse_json(): Updates the parse_json() function to accept arguments that control the creation of namespaces, comments, and bookmarks, enabling more granular control over the analysis process.
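
The guard-clause pattern mentioned in the highlights can be sketched as follows. `MatchDataSketch` is a hypothetical, heavily simplified stand-in for CapaMatchData: it only records the annotations it would create instead of calling Ghidra, and `add_bookmarks` is an assumed method name, not one confirmed by the PR.

```python
# Hypothetical, simplified stand-in for CapaMatchData. The real class takes
# more arguments and calls Ghidra APIs; this one only records what it would do.
class MatchDataSketch:
    def __init__(self, capability, addresses, do_labels, do_comments, do_bookmarks):
        self.capability = capability
        self.addresses = addresses
        self.do_labels = do_labels
        self.do_comments = do_comments
        self.do_bookmarks = do_bookmarks
        self.actions = []  # (kind, address, capability) tuples

    def create_capa_namespaces(self):
        if not self.do_labels:  # guard clause: the user opted out of labels
            return
        self._record("namespace")

    def create_capa_comments(self):
        if not self.do_comments:  # guard clause: the user opted out of comments
            return
        self._record("comment")

    def add_bookmarks(self):  # hypothetical method name
        if not self.do_bookmarks:  # guard clause: the user opted out of bookmarks
            return
        self._record("bookmark")

    def _record(self, kind):
        for addr in self.addresses:
            self.actions.append((kind, hex(addr), self.capability))


match = MatchDataSketch("run as service", [0x401000],
                        do_labels=False, do_comments=True, do_bookmarks=False)
match.create_capa_namespaces()
match.create_capa_comments()
match.add_bookmarks()
print(match.actions)  # [('comment', '0x401000', 'run as service')]
```

Because each method checks its own flag first, the caller can invoke all three unconditionally and the user's selection decides what is actually written.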

Changelog

  • CHANGELOG.md
    • Adds user options for adding comments, bookmarks, and namespaces via a dialog box in capa_explorer.py.
    • Implements granular control over annotations with create_capa_namespaces() and create_capa_comments().
  • capa/ghidra/capa_explorer.py
    • Adds do_labels, do_comments, and do_bookmarks parameters to the CapaMatchData class to control annotation creation.
    • Splits label_matches() into create_capa_namespaces() and create_capa_comments() to handle namespace creation/labeling and commenting, respectively.
    • Modifies parse_json() to accept do_labels, do_comments, and do_bookmarks parameters and pass them to CapaMatchData.
    • Adds a dialog box to allow users to select annotation options before analysis begins (lines 398-402).
  • rules
    • Updates the subproject commit to 66975132455e3e22520a84dca14cca1d3afd292a

A script's enhanced,
Choices now in user's hands,
Annotations bloom,
Banishing all gloom,
Analysis understands.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request enhances the capa_explorer.py script by adding user-selectable options for annotations in Ghidra, which is a valuable improvement. The code is generally well-structured, but there are a few areas that could be improved for clarity and efficiency.

Summary of Findings

  • Duplicated Code: There is duplicated code in label_matches and create_capa_comments for resolving the encompassing function and handling subscope matches. This duplication can be reduced by creating a helper function.
  • Conditional Logic: The nested conditional logic in label_matches and create_capa_comments can be simplified to improve readability.

Merge Readiness

The pull request introduces useful functionality and is generally well-implemented. However, addressing the duplicated code and simplifying the conditional logic would improve the code's maintainability and readability. I recommend addressing these issues before merging. I am unable to approve this pull request, and users should have others review and approve this code before merging.

Comment on lines 186 to 207
for sub_match in self.matches.get(addr):
    for loc, node in sub_match.items():
        sub_ghidra_addr = toAddr(hex(loc))  # type: ignore [name-defined] # noqa: F821

        if node != {}:
            if func is not None:
                # basic block/ insn scope under resolved function
                # this would be a global/file scoped main match
                # try to resolve the encompassing function via the subscope match, instead
                # Ex. "run as service" rule
                sub_func = getFunctionContaining(sub_ghidra_addr)  # type: ignore [name-defined] # noqa: F821
                if sub_func is not None:
                    sub_func_addr = sub_func.getEntryPoint()
                    # place function in capa namespace & create the subscope match label in Ghidra's global namespace
                    create_label(sub_func_addr, sub_func.getName(), capa_namespace)
                else:
                    # addr is in some other file section like .data
                    # represent this location with a label symbol under the capa namespace
                    # Ex. See "Reference Base64 String" rule
                    # in many cases, these will be ghidra-labeled data, so just add the existing
                    # label symbol to the capa namespace
                    for sym in symbol_table.getSymbols(sub_ghidra_addr):
                        if sym.getSymbolType() == SymbolType.LABEL:
                            sym.setNamespace(capa_namespace)

medium

This block of code is also present in create_capa_comments. Consider refactoring this into a helper function to reduce duplication.
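
One possible shape for such a helper, as a sketch only. `resolve_subscope_target` and its return labels are hypothetical, and `get_function_containing` is injected as a stand-in for Ghidra's `getFunctionContaining` so the snippet runs outside Ghidra. It follows the branch order used in the quoted comments code (resolved function first, then the function containing the subscope address, then plain data).

```python
# Hypothetical helper: classify a subscope match once so that both the
# namespace and the comment code paths can share the lookup.
def resolve_subscope_target(func, sub_addr, get_function_containing):
    """Return ("function", func), ("sub_function", sub_func) or ("data", None)."""
    if func is not None:
        # basic block / insn scope under an already resolved function
        return "function", func
    sub_func = get_function_containing(sub_addr)
    if sub_func is not None:
        # global/file scoped match resolved through the subscope address
        return "sub_function", sub_func
    # address lies outside any function, e.g. in .data
    return "data", None


# Stand-in for the Ghidra lookup: a dict from address to function name.
functions = {0x401010: "FUN_00401000"}
lookup = functions.get

print(resolve_subscope_target(None, 0x401010, lookup))
print(resolve_subscope_target(None, 0x500000, lookup))
```

Each caller would then branch on the returned kind and apply only its own annotation (labels in one, comments in the other), which keeps the resolution logic in a single place.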

Comment on lines +189 to +207
if func is not None:
    # basic block/ insn scope under resolved function
    # this would be a global/file scoped main match
    # try to resolve the encompassing function via the subscope match, instead
    # Ex. "run as service" rule
    sub_func = getFunctionContaining(sub_ghidra_addr)  # type: ignore [name-defined] # noqa: F821
    if sub_func is not None:
        sub_func_addr = sub_func.getEntryPoint()
        # place function in capa namespace & create the subscope match label in Ghidra's global namespace
        create_label(sub_func_addr, sub_func.getName(), capa_namespace)
    else:
        # addr is in some other file section like .data
        # represent this location with a label symbol under the capa namespace
        # Ex. See "Reference Base64 String" rule
        # in many cases, these will be ghidra-labeled data, so just add the existing
        # label symbol to the capa namespace
        for sym in symbol_table.getSymbols(sub_ghidra_addr):
            if sym.getSymbolType() == SymbolType.LABEL:
                sym.setNamespace(capa_namespace)

medium

The nested if func is not None and if sub_func is not None can be simplified by using guard clauses or combining the conditions. This would improve readability.

Comment on lines +231 to +253
if node != {}:
if func is not None:
# basic block / insn scope under resolved function
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
else:
# this would be a global/file scoped main match
# try to resolve the encompassing function via the subscope match, instead
# Ex. "run as service" rule
sub_func = getFunctionContaining(sub_ghidra_addr) # type: ignore [name-defined] # noqa: F821
if sub_func is not None:
sub_func_addr = sub_func.getEntryPoint()
self.set_plate_comment(sub_func_addr)
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
else:
# this would be a global/file scoped main match
# try to resolve the encompassing function via the subscope match, instead
# Ex. "run as service" rule
sub_func = getFunctionContaining(sub_ghidra_addr) # type: ignore [name-defined] # noqa: F821
if sub_func is not None:
sub_func_addr = sub_func.getEntryPoint()
# place function in capa namespace & create the subscope match label in Ghidra's global namespace
create_label(sub_func_addr, sub_func.getName(), capa_namespace)
self.set_plate_comment(sub_func_addr)
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
else:
# addr is in some other file section like .data
# represent this location with a label symbol under the capa namespace
# Ex. See "Reference Base64 String" rule
for sub_type, description in parse_node(node):
# in many cases, these will be ghidra-labeled data, so just add the existing
# label symbol to the capa namespace
for sym in symbol_table.getSymbols(sub_ghidra_addr):
if sym.getSymbolType() == SymbolType.LABEL:
sym.setNamespace(capa_namespace)
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
# addr is in some other file section like .data
# represent this location with a label symbol under the capa namespace
# Ex. See "Reference Base64 String" rule
# in many cases, these will be ghidra-labeled data, so just add the existing
# label symbol to the capa namespace
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)

medium

This block of code is similar to the one in label_matches. Consider refactoring this into a helper function to reduce duplication.

Comment on lines +232 to +253
if func is not None:
# basic block / insn scope under resolved function
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
else:
# this would be a global/file scoped main match
# try to resolve the encompassing function via the subscope match, instead
# Ex. "run as service" rule
sub_func = getFunctionContaining(sub_ghidra_addr) # type: ignore [name-defined] # noqa: F821
if sub_func is not None:
sub_func_addr = sub_func.getEntryPoint()
self.set_plate_comment(sub_func_addr)
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
else:
# this would be a global/file scoped main match
# try to resolve the encompassing function via the subscope match, instead
# Ex. "run as service" rule
sub_func = getFunctionContaining(sub_ghidra_addr) # type: ignore [name-defined] # noqa: F821
if sub_func is not None:
sub_func_addr = sub_func.getEntryPoint()
# place function in capa namespace & create the subscope match label in Ghidra's global namespace
create_label(sub_func_addr, sub_func.getName(), capa_namespace)
self.set_plate_comment(sub_func_addr)
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
else:
# addr is in some other file section like .data
# represent this location with a label symbol under the capa namespace
# Ex. See "Reference Base64 String" rule
for sub_type, description in parse_node(node):
# in many cases, these will be ghidra-labeled data, so just add the existing
# label symbol to the capa namespace
for sym in symbol_table.getSymbols(sub_ghidra_addr):
if sym.getSymbolType() == SymbolType.LABEL:
sym.setNamespace(capa_namespace)
self.set_pre_comment(sub_ghidra_addr, sub_type, description)
# addr is in some other file section like .data
# represent this location with a label symbol under the capa namespace
# Ex. See "Reference Base64 String" rule
# in many cases, these will be ghidra-labeled data, so just add the existing
# label symbol to the capa namespace
for sub_type, description in parse_node(node):
self.set_pre_comment(sub_ghidra_addr, sub_type, description)

medium

The nested if func is not None and if sub_func is not None can be simplified by using guard clauses or combining the conditions. This would improve readability.
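
A sketch of what the guard-clause version might look like for the comment path. It assumes the branching reads as the quoted lines suggest: a pre comment is always placed at the subscope address, and a plate comment is added only when the function has to be resolved through the subscope match. `comment_targets` and the injected `get_function_containing` are hypothetical names; the generator yields what would be annotated rather than calling Ghidra.

```python
# Hypothetical sketch: the nested if/else flattened with early `continue`s.
def comment_targets(sub_matches, func, get_function_containing):
    """Yield (kind, target) pairs describing which comments would be set."""
    for loc, node in sub_matches.items():
        if not node:
            # guard: empty node, nothing to annotate
            continue
        if func is not None:
            # guard: match already sits under a resolved function
            yield ("pre", loc)
            continue
        sub_func = get_function_containing(loc)
        if sub_func is not None:
            # resolved via the subscope match: also mark the function entry
            yield ("plate", sub_func)
        # reached for both the sub-function and the plain-data case
        yield ("pre", loc)


sub_matches = {0x10: {"type": "api"}, 0x20: {}, 0x30: {"type": "string"}}
entry_points = {0x10: 0x0}  # stand-in for Ghidra: address -> function entry

print(list(comment_targets(sub_matches, None, entry_points.get)))
# [('plate', 0), ('pre', 16), ('pre', 48)]
```

The flattened form keeps at most one level of nesting inside the loop, and the shared pre-comment step appears once instead of in every branch.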

Development

Successfully merging this pull request may close these issues.

ghidra: update capa_explorer.py to enable users to select if comments and bookmarks are added
1 participant