ghidra: Updates capa_explorer.py to enable users to select if namespaces, comments and bookmarks are added. Closes #1977 #2652

Shajal-Kumar · 2025-04-06T09:51:25Z

closes #1977

Enhances the capa_explorer.py script by adding user-selectable options for annotations in Ghidra. Users can now choose whether to add labels/namespaces, comments, or bookmarks during analysis.

Allows users to select whether to add labels/namespaces, comments, or bookmarks through a dialog box built with Ghidra's askChoices API.
label_matches() has been split into create_capa_namespaces() and create_capa_comments().
parse_json() has new arguments, which are passed on to the CapaMatchData class to facilitate the use of guard clauses in create_capa_namespaces() and create_capa_comments().
CHANGELOG.md updated
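A minimal sketch of how the dialog's result could be turned into the three flags, assuming the three-argument `askChoices(title, message, choices)` overload of GhidraScript, which returns the subset of `choices` the user ticked. The option strings, `AnnotationFlags` and `selection_to_flags` are hypothetical (not taken from the PR), and the Ghidra call is left as a comment so the sketch runs outside Ghidra.

```python
from typing import Iterable, NamedTuple

# Hypothetical option labels; the strings used in the PR may differ.
OPT_NAMESPACES = "add labels/namespaces"
OPT_COMMENTS = "add comments"
OPT_BOOKMARKS = "add bookmarks"
ANNOTATION_OPTIONS = [OPT_NAMESPACES, OPT_COMMENTS, OPT_BOOKMARKS]


class AnnotationFlags(NamedTuple):
    do_labels: bool
    do_comments: bool
    do_bookmarks: bool


def selection_to_flags(selected: Iterable[object]) -> AnnotationFlags:
    """Map the items ticked in the dialog to one boolean per annotation type."""
    # str() normalizes Java strings handed back through the Ghidra Python bridge
    chosen = {str(item) for item in selected}
    return AnnotationFlags(
        do_labels=OPT_NAMESPACES in chosen,
        do_comments=OPT_COMMENTS in chosen,
        do_bookmarks=OPT_BOOKMARKS in chosen,
    )


# Inside a Ghidra script the selection would come from the dialog, e.g.:
#   selected = askChoices("capa explorer", "Select the annotations to add", ANNOTATION_OPTIONS)
#   flags = selection_to_flags(selected)
# and the flags would then be handed to parse_json() (hypothetical call shape;
# the real parse_json signature may differ).
```

Keeping the mapping in a pure function means the dialog wording can change without touching the annotation code.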

Checklist

No CHANGELOG update needed
No new tests needed
No documentation update needed


gemini-code-assist

Hello @Shajal-Kumar, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

This pull request enhances the capa_explorer.py script in Ghidra, providing users with more control over the annotation process. Specifically, it introduces a dialog box that allows users to select whether to add labels/namespaces, comments, or bookmarks during the analysis. The label_matches() function has been split into create_capa_namespaces() and create_capa_comments() for better organization. The parse_json() function now accepts new arguments to facilitate the use of guard clauses in the new functions. Additionally, the CHANGELOG.md file has been updated to reflect these changes.

Highlights

User-Selectable Annotations: Introduces a dialog box using Ghidra's askChoices API, allowing users to choose whether to add labels/namespaces, comments, or bookmarks during analysis.
Code Reorganization: Splits the label_matches() function into create_capa_namespaces() and create_capa_comments() for improved code structure and readability.
Parameterization of parse_json(): Updates the parse_json() function to accept arguments that control the creation of namespaces, comments, and bookmarks, enabling more granular control over the analysis process.

Changelog

CHANGELOG.md
- Adds user-options for adding comments, bookmarks, and namespaces via a dialog box in capa_explorer.py.
- Implements granular control over annotations with create_capa_namespace and create_capa_comments.
capa/ghidra/capa_explorer.py
- Adds do_labels, do_comments, and do_bookmarks parameters to the CapaMatchData class to control annotation creation.
- Splits label_matches() into create_capa_namespace() and create_capa_comments() to handle namespace creation/labeling and commenting, respectively.
- Modifies parse_json() to accept do_labels, do_comments, and do_bookmarks parameters and pass them to CapaMatchData.
- Adds a dialog box to allow users to select annotation options before analysis begins (lines 398-402).
rules
- Updates the subproject commit to 66975132455e3e22520a84dca14cca1d3afd292a
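The flag-plus-guard-clause pattern listed for CapaMatchData can be sketched as below. `MatchAnnotator`, `create_bookmarks` and the `applied` list are hypothetical stand-ins: the real class calls Ghidra APIs to create namespaces, comments and bookmarks, whereas this sketch only records which step would have run.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MatchAnnotator:
    """Hypothetical stand-in for CapaMatchData that carries the user's choices."""

    rule_name: str
    do_labels: bool = True
    do_comments: bool = True
    do_bookmarks: bool = True
    # records what would have been written to Ghidra, for illustration only
    applied: List[str] = field(default_factory=list)

    def create_capa_namespace(self) -> None:
        if not self.do_labels:
            return  # guard clause: the user opted out of labels/namespaces
        self.applied.append(f"namespace:{self.rule_name}")

    def create_capa_comments(self) -> None:
        if not self.do_comments:
            return  # guard clause: the user opted out of comments
        self.applied.append(f"comment:{self.rule_name}")

    def create_bookmarks(self) -> None:
        if not self.do_bookmarks:
            return  # guard clause: the user opted out of bookmarks
        self.applied.append(f"bookmark:{self.rule_name}")

    def annotate(self) -> None:
        # each step decides for itself whether to run, so the caller stays flat
        self.create_capa_namespace()
        self.create_capa_comments()
        self.create_bookmarks()
```

For example, `MatchAnnotator("run as service", do_comments=False)` followed by `annotate()` records only the namespace and bookmark steps.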


A script's enhanced,
Choices now in user's hands,
Annotations bloom,
Banishing all gloom,
Analysis understands.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.

gemini-code-assist

Code Review

This pull request enhances the capa_explorer.py script by adding user-selectable options for annotations in Ghidra, which is a valuable improvement. The code is generally well-structured, but there are a few areas that could be improved for clarity and efficiency.

Summary of Findings

Duplicated Code: There is duplicated code in label_matches and create_capa_comments for resolving the encompassing function and handling subscope matches. This duplication can be reduced by creating a helper function.
Conditional Logic: The nested conditional logic in label_matches and create_capa_comments can be simplified to improve readability.

Merge Readiness

The pull request introduces useful functionality and is generally well-implemented. However, addressing the duplicated code and simplifying the conditional logic would improve the code's maintainability and readability. I recommend addressing these issues before merging. I am unable to approve this pull request, and users should have others review and approve this code before merging.

gemini-code-assist · 2025-04-06T09:51:44Z

capa/ghidra/capa_explorer.py

                for sub_match in self.matches.get(addr):
                    for loc, node in sub_match.items():
                        sub_ghidra_addr = toAddr(hex(loc))  # type: ignore [name-defined] # noqa: F821
-
-                        if node != {}:
-                            if func is not None:
-                                # basic block/ insn scope under resolved function
+                        if func is not None:
+                            # basic block/ insn scope under resolved function
+                            # this would be a global/file scoped main match
+                            # try to resolve the encompassing function via the subscope match, instead
+                            # Ex. "run as service" rule
+                            sub_func = getFunctionContaining(sub_ghidra_addr)  # type: ignore [name-defined] # noqa: F821
+                            if sub_func is not None:
+                                sub_func_addr = sub_func.getEntryPoint()
+                                # place function in capa namespace & create the subscope match label in Ghidra's global namespace
+                                create_label(sub_func_addr, sub_func.getName(), capa_namespace)
+                            else:
+                                # addr is in some other file section like .data
+                                # represent this location with a label symbol under the capa namespace
+                                # Ex. See "Reference Base64 String" rule
+                                # in many cases, these will be ghidra-labeled data, so just add the existing
+                                # label symbol to the capa namespace
+                                for sym in symbol_table.getSymbols(sub_ghidra_addr):
+                                    if sym.getSymbolType() == SymbolType.LABEL:
+                                        sym.setNamespace(capa_namespace)


This block of code is also present in create_capa_comments. Consider refactoring this into a helper function to reduce duplication.
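One way to act on this suggestion, sketched under assumptions: move the "which function owns this subscope match?" decision into a single helper that both create_capa_namespace and create_capa_comments call, then branch on its result. `resolve_subscope` and `SubscopeTarget` are hypothetical, and `get_function_containing` stands in for Ghidra's `getFunctionContaining` so the sketch runs outside Ghidra. The branch order follows the removed (`-`) lines of the diff.

```python
from enum import Enum
from typing import Any, Callable, Optional, Tuple


class SubscopeTarget(Enum):
    IN_RESOLVED_FUNCTION = "in_resolved_function"  # bb/insn match under an already-known function
    IN_SUB_FUNCTION = "in_sub_function"  # file-scoped match; function found via the subscope address
    OUTSIDE_FUNCTION = "outside_function"  # e.g. a labeled location in .data


def resolve_subscope(
    func: Optional[Any],
    sub_addr: Any,
    get_function_containing: Callable[[Any], Optional[Any]],
) -> Tuple[SubscopeTarget, Optional[Any]]:
    """Classify where a subscope match lives; shared by the namespace and comment passes."""
    if func is not None:
        return SubscopeTarget.IN_RESOLVED_FUNCTION, func
    sub_func = get_function_containing(sub_addr)
    if sub_func is not None:
        return SubscopeTarget.IN_SUB_FUNCTION, sub_func
    return SubscopeTarget.OUTSIDE_FUNCTION, None
```

Each caller would then do `kind, owner = resolve_subscope(func, sub_ghidra_addr, getFunctionContaining)` and keep only its own labeling or commenting logic per `kind`.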

gemini-code-assist · 2025-04-06T09:51:44Z

capa/ghidra/capa_explorer.py

+                        if func is not None:
+                            # basic block/ insn scope under resolved function
+                            # this would be a global/file scoped main match
+                            # try to resolve the encompassing function via the subscope match, instead
+                            # Ex. "run as service" rule
+                            sub_func = getFunctionContaining(sub_ghidra_addr)  # type: ignore [name-defined] # noqa: F821
+                            if sub_func is not None:
+                                sub_func_addr = sub_func.getEntryPoint()
+                                # place function in capa namespace & create the subscope match label in Ghidra's global namespace
+                                create_label(sub_func_addr, sub_func.getName(), capa_namespace)
+                            else:
+                                # addr is in some other file section like .data
+                                # represent this location with a label symbol under the capa namespace
+                                # Ex. See "Reference Base64 String" rule
+                                # in many cases, these will be ghidra-labeled data, so just add the existing
+                                # label symbol to the capa namespace
+                                for sym in symbol_table.getSymbols(sub_ghidra_addr):
+                                    if sym.getSymbolType() == SymbolType.LABEL:
+                                        sym.setNamespace(capa_namespace)


The nested if func is not None and if sub_func is not None can be simplified by using guard clauses or combining the conditions. This would improve readability.
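A hedged sketch of the guard-clause shape suggested here, applied to the namespace pass: each early `continue` disposes of one case, so the loop body never nests more than one level. `label_subscope_matches` and its callable parameters (`get_function_containing`, `create_label`, `move_data_labels`) are hypothetical stand-ins for the Ghidra calls in the diff, and treating a bb/insn match under a resolved function as needing no extra label is an assumption based on the removed lines.

```python
from typing import Any, Callable, Dict, Optional


def label_subscope_matches(
    func: Optional[Any],
    sub_matches: Dict[int, dict],
    get_function_containing: Callable[[int], Optional[Any]],
    create_label: Callable[[Any], None],
    move_data_labels: Callable[[int], None],
) -> None:
    """Namespace pass over subscope matches, flattened with guard clauses."""
    for sub_addr, node in sub_matches.items():
        if not node:
            continue  # empty match node: nothing to annotate
        if func is not None:
            continue  # bb/insn scope under a resolved function: assumed to need no extra label
        sub_func = get_function_containing(sub_addr)
        if sub_func is None:
            # address outside any function (e.g. .data): move existing labels into the namespace
            move_data_labels(sub_addr)
            continue
        # file-scoped main match: label the function found via the subscope address
        create_label(sub_func)
```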

gemini-code-assist · 2025-04-06T09:51:44Z

capa/ghidra/capa_explorer.py

+                    if node != {}:
+                        if func is not None:
+                            # basic block / insn scope under resolved function
+                            for sub_type, description in parse_node(node):
+                                self.set_pre_comment(sub_ghidra_addr, sub_type, description)
+                        else:
+                            # this would be a global/file scoped main match
+                            # try to resolve the encompassing function via the subscope match, instead
+                            # Ex. "run as service" rule
+                            sub_func = getFunctionContaining(sub_ghidra_addr)  # type: ignore [name-defined] # noqa: F821
+                            if sub_func is not None:
+                                sub_func_addr = sub_func.getEntryPoint()
+                                self.set_plate_comment(sub_func_addr)
                                for sub_type, description in parse_node(node):
                                    self.set_pre_comment(sub_ghidra_addr, sub_type, description)
                            else:
-                                # this would be a global/file scoped main match
-                                # try to resolve the encompassing function via the subscope match, instead
-                                # Ex. "run as service" rule
-                                sub_func = getFunctionContaining(sub_ghidra_addr)  # type: ignore [name-defined] # noqa: F821
-                                if sub_func is not None:
-                                    sub_func_addr = sub_func.getEntryPoint()
-                                    # place function in capa namespace & create the subscope match label in Ghidra's global namespace
-                                    create_label(sub_func_addr, sub_func.getName(), capa_namespace)
-                                    self.set_plate_comment(sub_func_addr)
-                                    for sub_type, description in parse_node(node):
-                                        self.set_pre_comment(sub_ghidra_addr, sub_type, description)
-                                else:
-                                    # addr is in some other file section like .data
-                                    # represent this location with a label symbol under the capa namespace
-                                    # Ex. See "Reference Base64 String" rule
-                                    for sub_type, description in parse_node(node):
-                                        # in many cases, these will be ghidra-labeled data, so just add the existing
-                                        # label symbol to the capa namespace
-                                        for sym in symbol_table.getSymbols(sub_ghidra_addr):
-                                            if sym.getSymbolType() == SymbolType.LABEL:
-                                                sym.setNamespace(capa_namespace)
-                                        self.set_pre_comment(sub_ghidra_addr, sub_type, description)
+                                # addr is in some other file section like .data
+                                # represent this location with a label symbol under the capa namespace
+                                # Ex. See "Reference Base64 String" rule
+                                # in many cases, these will be ghidra-labeled data, so just add the existing
+                                # label symbol to the capa namespace
+                                for sub_type, description in parse_node(node):
+                                    self.set_pre_comment(sub_ghidra_addr, sub_type, description)


This block of code is similar to the one in label_matches. Consider refactoring this into a helper function to reduce duplication.
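In the comment pass every branch ends with the same pre-comment loop; only the plate comment depends on how the function was resolved. A minimal sketch under that reading, with the loop written once and the plate comment behind a guard: `comment_subscope_match` is hypothetical, and the callables stand in for `getFunctionContaining`, `parse_node`, `set_plate_comment` and `set_pre_comment` (here the plate-comment stub takes the function rather than its entry point).

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def comment_subscope_match(
    func: Optional[Any],
    sub_addr: int,
    node: dict,
    get_function_containing: Callable[[int], Optional[Any]],
    parse_node: Callable[[dict], Iterable[Tuple[str, str]]],
    set_plate_comment: Callable[[Any], None],
    set_pre_comment: Callable[[int, str, str], None],
) -> None:
    """Comment one subscope match; the pre-comment loop is shared by all cases."""
    if not node:
        return  # empty match node: nothing to comment
    if func is None:
        # file-scoped main match: try to resolve the function via the subscope address
        sub_func = get_function_containing(sub_addr)
        if sub_func is not None:
            set_plate_comment(sub_func)
    # identical in every branch of the original, so it is written once
    for sub_type, description in parse_node(node):
        set_pre_comment(sub_addr, sub_type, description)
```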

gemini-code-assist · 2025-04-06T09:51:44Z

capa/ghidra/capa_explorer.py

+                        if func is not None:
+                            # basic block / insn scope under resolved function
+                            for sub_type, description in parse_node(node):
+                                self.set_pre_comment(sub_ghidra_addr, sub_type, description)
+                        else:
+                            # this would be a global/file scoped main match
+                            # try to resolve the encompassing function via the subscope match, instead
+                            # Ex. "run as service" rule
+                            sub_func = getFunctionContaining(sub_ghidra_addr)  # type: ignore [name-defined] # noqa: F821
+                            if sub_func is not None:
+                                sub_func_addr = sub_func.getEntryPoint()
+                                self.set_plate_comment(sub_func_addr)
                                for sub_type, description in parse_node(node):
                                    self.set_pre_comment(sub_ghidra_addr, sub_type, description)
                            else:
-                                # this would be a global/file scoped main match
-                                # try to resolve the encompassing function via the subscope match, instead
-                                # Ex. "run as service" rule
-                                sub_func = getFunctionContaining(sub_ghidra_addr)  # type: ignore [name-defined] # noqa: F821
-                                if sub_func is not None:
-                                    sub_func_addr = sub_func.getEntryPoint()
-                                    # place function in capa namespace & create the subscope match label in Ghidra's global namespace
-                                    create_label(sub_func_addr, sub_func.getName(), capa_namespace)
-                                    self.set_plate_comment(sub_func_addr)
-                                    for sub_type, description in parse_node(node):
-                                        self.set_pre_comment(sub_ghidra_addr, sub_type, description)
-                                else:
-                                    # addr is in some other file section like .data
-                                    # represent this location with a label symbol under the capa namespace
-                                    # Ex. See "Reference Base64 String" rule
-                                    for sub_type, description in parse_node(node):
-                                        # in many cases, these will be ghidra-labeled data, so just add the existing
-                                        # label symbol to the capa namespace
-                                        for sym in symbol_table.getSymbols(sub_ghidra_addr):
-                                            if sym.getSymbolType() == SymbolType.LABEL:
-                                                sym.setNamespace(capa_namespace)
-                                        self.set_pre_comment(sub_ghidra_addr, sub_type, description)
+                                # addr is in some other file section like .data
+                                # represent this location with a label symbol under the capa namespace
+                                # Ex. See "Reference Base64 String" rule
+                                # in many cases, these will be ghidra-labeled data, so just add the existing
+                                # label symbol to the capa namespace
+                                for sub_type, description in parse_node(node):
+                                    self.set_pre_comment(sub_ghidra_addr, sub_type, description)


The nested if func is not None and if sub_func is not None can be simplified by using guard clauses or combining the conditions. This would improve readability.

Shajal-Kumar added 3 commits April 6, 2025 14:59

Fixes issue mandiant#1977. Adds user choices, guard clauses and separate functions for comment and namespace creation.

d3804cd

Fixes create_capa_namespace and create_capa_comments.

7390fc0

Updates CHANGELOG.md

f2fae35

gemini-code-assist bot reviewed Apr 6, 2025


gemini-code-assist bot suggested changes Apr 6, 2025


Fixes code_style.

cd76fd7
