fix: refactor LLM model selection and attack surface analysis #233
base: release/2.1.1
Conversation
The changes involve refactoring the codebase to replace references to "GPT" with "LLM" for generating vulnerability reports and related functionalities. This includes renaming functions, classes, variables, and configuration settings to reflect the new terminology. The update affects various components such as tasks, views, models, templates, and configuration files across the application.
- Introduced a new modal for selecting LLM models for attack surface analysis, allowing users to choose from a list of available models.
- Added functionality to update the selected model in the database before proceeding with analysis (see the sketch after this list).
- Implemented a new API endpoint to fetch available LLM models and the currently selected model.
- Refactored the show_attack_surface_modal function to include model selection and handle errors more robustly.
- Removed hardcoded LLM definitions and prompts from definitions.py and moved them to a new configuration file.
- Added new classes and methods for managing LLM configurations and generating vulnerability reports and attack suggestions.
- Updated tests to cover new functionalities and removed obsolete tests related to LLM attack suggestions.
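As a rough illustration of the "update the selected model" step, persisting the choice might look like this (a sketch only; `OllamaSettings`, its import path, and the `selected_model` field are assumptions about the settings model, not confirmed by the diff):

```python
from dashboard.models import OllamaSettings  # assumed location of the settings model

def set_selected_model(model_name: str) -> None:
    """Persist the user's model choice so the analysis task can pick it up."""
    settings, _ = OllamaSettings.objects.get_or_create(id=1)
    settings.selected_model = model_name  # assumed field name
    settings.save(update_fields=['selected_model'])
```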
Reviewer's Guide by Sourcery

This PR enhances the LLM (Large Language Model) integration by introducing a new model selection modal and implementing a more robust API endpoint. The changes include refactoring the GPT-specific code into a more generic LLM framework, improving error handling, and adding support for multiple LLM providers including OpenAI and Ollama.

Updated Class Diagram for LLM Vulnerability Report

```mermaid
classDiagram
    class LLMVulnerabilityReport {
        +String url_path
        +String title
        +Text description
        +Text impact
        +Text remediation
        +Text references
        +String formatted_description()
        +String formatted_impact()
        +String formatted_remediation()
        +String formatted_references()
    }
    class Vulnerability {
        +String name
        +String http_url
        +Text description
        +Text impact
        +Text remediation
        +Text references
        +boolean is_llm_used
        +String formatted_description()
        +String formatted_impact()
        +String formatted_remediation()
        +String formatted_references()
    }
    LLMVulnerabilityReport --|> Vulnerability : updates
    note for LLMVulnerabilityReport "Replaces GPTVulnerabilityReport with LLM support"
```
File-Level Changes
Assessment against linked issues
Possibly linked issues
```python
return Response({
    'status': False,
    'error': 'Failed to fetch LLM models',
    'message': str(e)
}, status=500)
```
Check warning — Code scanning / CodeQL: Information exposure through an exception (Medium): stack trace information.
Copilot Autofix (AI), about 3 hours ago:
To fix the problem, we need to ensure that detailed error messages are not exposed to the user. Instead, we should log the detailed error message on the server and return a generic error message to the user. This can be achieved by modifying the exception handling in the `LLMModelsManager` class:

- Log the detailed error message using the `logger.error` method.
- Return a generic error message to the user without including the exception details.
```diff
@@ -3097,3 +3097,3 @@
     'error': 'Failed to fetch LLM models',
-    'message': str(e)
+    'message': 'An internal error has occurred. Please try again later.'
 }, status=500)
```
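Put together, the corrected handler would follow this general pattern (a sketch rather than the PR's exact code; `fetch_available_models` is a hypothetical helper):

```python
import logging

from rest_framework.response import Response
from rest_framework.views import APIView

logger = logging.getLogger(__name__)

class LLMModelsManager(APIView):
    def get(self, request):
        try:
            models = fetch_available_models()  # hypothetical helper
            return Response({'status': True, 'models': models})
        except Exception as e:
            # The stack trace stays in the server logs, not in the API response.
            logger.error('Failed to fetch LLM models: %s', e, exc_info=True)
            return Response({
                'status': False,
                'error': 'Failed to fetch LLM models',
                'message': 'An internal error has occurred. Please try again later.'
            }, status=500)
```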
- Enhanced logging in tasks.py to include both the title and path of vulnerabilities when exceptions occur.
- Updated variable naming from llm to vuln for clarity in vulnerability handling.
- Minor text punctuation corrections in custom.js to ensure consistency in user messages.
- Removed an unused import.
- Improved the UI of the LLM toolkit by updating the layout and adding badges for model status and capabilities.
- Refactored the model management logic to use a centralized API call for fetching model data, improving error handling and reducing code duplication.
- Updated the model requirements configuration to enhance readability and consistency in the description of model capabilities.
- Adjusted the modal size for displaying model options to provide a better user experience.
- Introduced a method to convert markdown to HTML with added Bootstrap classes in the LLMAttackSuggestion class (see the sketch below).
- Implemented HTML sanitization using DOMPurify in various JavaScript functions to prevent XSS vulnerabilities.
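On the Python side, such a conversion method might look roughly like this (assuming the `markdown` package; the exact Bootstrap classes chosen are illustrative, not taken from the diff):

```python
import markdown

class LLMAttackSuggestion:
    def markdown_to_html(self, md_text: str) -> str:
        """Render LLM markdown output as HTML styled for the UI."""
        html = markdown.markdown(md_text, extensions=['tables', 'fenced_code'])
        # Tag generated elements with Bootstrap classes so they match the theme.
        html = html.replace('<table>', '<table class="table table-bordered">')
        html = html.replace('<blockquote>', '<blockquote class="blockquote">')
        return html
```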
…on features
- Updated the URL validation function to correctly escape backslashes in the regex pattern.
- Enhanced the send_llm__attack_surface_api_request function to support additional parameters for model selection and analysis options.
- Introduced new functions regenerateAttackSurface, deleteAttackSurfaceAnalysis, and showAttackSurfaceModal to manage the attack surface analysis lifecycle, including regeneration and deletion.
- Refactored the modal handling logic to improve user interaction for model selection and analysis display.
- Added input validation for model names in the LLMAttackSuggestionGenerator class and updated the attack suggestion generation process to accommodate model-specific requests.
- Enhanced markdown rendering by adding new extensions for better list handling and definition-list support (see the sketch below).
- Improved HTML formatting by converting ordered lists to unordered and cleaning up line breaks.
- Removed the "Beta" label from the LLM Toolkit page title and modal dialog title in the UI.
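For reference, enabling that kind of behavior with the Python-Markdown package would look like the following (the PR's exact extension list isn't quoted above; `sane_lists` and `def_list` are the standard extensions for stricter list handling and definition lists):

```python
import markdown

md_text = "Term\n: A definition-list entry."
# def_list renders a <dl>/<dt>/<dd> structure; sane_lists tightens list parsing.
html = markdown.markdown(md_text, extensions=['sane_lists', 'def_list'])
print(html)
```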
- Replaced the get_vulnerability_llm_report function with llm_vulnerability_report for generating and storing vulnerability reports using the LLM.
- Enhanced the LLM vulnerability report generation process by splitting it into distinct sections: technical description, business impact, remediation steps, and references (see the sketch after this list).
- Updated the data model to store references as text fields instead of using a separate VulnerabilityReference model.
- Improved the HTML rendering of vulnerability descriptions, impacts, and remediations by converting markdown to HTML with proper styling.
- Refactored the LLM response handling to use a dictionary format for easier manipulation and storage.
- Removed redundant code and streamlined the process of updating vulnerabilities with LLM-generated data.
- Adjusted the configuration and prompts for the LLM to support more detailed and structured report generation.
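A simplified view of the dictionary-based flow described above (a sketch; `ask_llm` stands in for the real LLM call, and the section names follow the commit message):

```python
import markdown

def llm_vulnerability_report(vulnerability_id: int) -> dict:
    # Query the LLM once per section instead of one monolithic prompt.
    sections = {
        'description': ask_llm('technical description', vulnerability_id),
        'impact': ask_llm('business impact', vulnerability_id),
        'remediation': ask_llm('remediation steps', vulnerability_id),
        'references': ask_llm('references', vulnerability_id),
    }
    # Convert each markdown section to HTML before storing it on the model.
    return {name: markdown.markdown(text) for name, text in sections.items()}
```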
Hey @psyray - I've reviewed your changes and found some issues that need to be addressed.
Blocking issues:
- Avoid exposing raw exception messages in API responses (link)
Overall Comments:
- Consider adding more comprehensive test coverage for the new LLM functionality, particularly around error handling and edge cases.
- The key LLM interface methods would benefit from more detailed docstrings documenting expected inputs, outputs, and error scenarios.
Here's what I looked at during the review
- 🟡 General issues: 3 issues found
- 🔴 Security: 1 blocking issue, 1 other issue
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
```python
if query.strip():
    qs = qs & self.special_lookup(query.strip())
elif '|' in search_value:
    qs = Subdomain.objects.none()
```
issue (bug_risk): Incorrect model class used for empty queryset initialization
This should be EndPoint.objects.none() to maintain type consistency with the rest of the method.
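In other words, the branch should presumably read:

```python
elif '|' in search_value:
    qs = EndPoint.objects.none()  # same model as the rest of the method
```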
return Response({"dorks": serializer.data}) | ||
def get(self, request, format=None): | ||
req = self.request | ||
scan_id = safe_int_cast(req.query_params.get('scan_id')) |
issue (code-quality): We've found these issues:
- Use named expression to simplify assignment and conditional (`use-named-expression`)
- Hoist repeated code outside conditional statement [×2] (`hoist-statement-from-if`)
```python
scan_id = safe_int_cast(req.query_params.get('scan_id'))
if scan_id:
```
suggestion (code-quality): Use named expression to simplify assignment and conditional (`use-named-expression`)

```diff
-scan_id = safe_int_cast(req.query_params.get('scan_id'))
-if scan_id:
+if scan_id := safe_int_cast(req.query_params.get('scan_id')):
```
```python
def get(self, request, format=None):
    req = self.request
    scan_id = safe_int_cast(req.query_params.get('scan_id'))
    type = req.query_params.get('type')
```
issue (code-quality): Don't assign to builtin variable `type` (`avoid-builtin-shadow`)

Explanation: Python has a number of builtin variables: functions and constants that form a part of the language, such as `list`, `getattr`, and `type` (see https://docs.python.org/3/library/functions.html). It is valid, in the language, to re-bind such variables:

```python
list = [1, 2, 3]
```

However, this is considered poor practice.
- It will confuse other developers.
- It will confuse syntax highlighters and linters.
- It means you can no longer use that builtin for its original purpose.

How can you solve this? Rename the variable to something more specific, such as `integers`. In a pinch, `my_list` and similar names are colloquially recognized placeholders.
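Applied to the line flagged above, the rename could simply be:

```python
scan_type = req.query_params.get('type')  # no longer shadows the builtin `type`
```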
```python
scan_id = safe_int_cast(req.query_params.get('scan_id'))
if scan_id:
```
suggestion (code-quality): Use named expression to simplify assignment and conditional (`use-named-expression`)

```diff
-scan_id = safe_int_cast(req.query_params.get('scan_id'))
-if scan_id:
+if scan_id := safe_int_cast(req.query_params.get('scan_id')):
```
```python
subdomain_names = []
for id in subdomain_ids:
    subdomain_names.append(Subdomain.objects.get(id=id).name)
if subdomain_names:
    return Response({'status': True, "results": subdomain_names})
```
suggestion (code-quality): We've found these issues:
- Convert for loop into list comprehension (`list-comprehension`)
- Use named expression to simplify assignment and conditional (`use-named-expression`)
```diff
-subdomain_names = []
-for id in subdomain_ids:
-    subdomain_names.append(Subdomain.objects.get(id=id).name)
-if subdomain_names:
+if subdomain_names := [
+    Subdomain.objects.get(id=id).name for id in subdomain_ids
+]:
     return Response({'status': True, "results": subdomain_names})
```
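Going one step further than the suggestion, a single filtered query would also remove the per-id database lookups (a sketch against the same `Subdomain` model):

```python
# One query via id__in instead of a Subdomain.objects.get() per id.
if subdomain_names := list(
    Subdomain.objects.filter(id__in=subdomain_ids).values_list('name', flat=True)
):
    return Response({'status': True, 'results': subdomain_names})
```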
Too many Sourcery reviews here.
Uncommented the code responsible for generating technical, impact, and remediation sections in the LLMVulnerabilityReportGenerator class.
- Removed extensive data from targetApp.json, including historical IPs, related domains, registrars, domain registrations, WHOIS status, nameservers, DNS records, and domain info.
- Updated auth.json to modify permission names and codenames, and removed several user and group entries.
- Added new todo notes in recon_note.json and updated existing ones.
- Updated dashboard.json to modify project details and remove unused settings.
- Modified scanEngine.json to update YAML configurations and add a new interesting lookup model.
- Enabled remote debugging in docker-compose.dev.yml.
- Updated a test in test_vulnerability.py to patch a different method for generating vulnerability reports.
Overview
- Improved handling of CVE references by parsing string representations of arrays and generating appropriate HTML content (see the sketch below).
- Updated the template to handle references more flexibly, displaying them as a list or paragraph based on their format.
- Converted markdown to HTML for various sections in the LLM vulnerability report.
- Removed redundant code in the LLM vulnerability report generator.
- Simplified the get_refs_str method in the Vulnerability model to return references directly.
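The "string representations of arrays" parsing might reduce to something like this (a sketch; the PR's actual helper isn't quoted here):

```python
import ast

def parse_references(raw: str) -> list[str]:
    """Accept a plain string or a stringified list such as "['url1', 'url2']"."""
    try:
        parsed = ast.literal_eval(raw)
        if isinstance(parsed, list):
            return [str(ref) for ref in parsed]
    except (ValueError, SyntaxError):
        pass  # not a literal list; fall through and treat it as plain text
    return [raw] if raw else []
```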
Modified the llm_vulnerability_report function in tasks.py to convert the references field from a list to a single string before converting it to HTML.
Didn't have time yet to do an extensive test, but this is what I initially found. Will continue tomorrow.
Adjusted the logic for displaying the dropdown menu in the LLM Toolkit settings to only show when a model is not selected.
- Introduced a new modal for adding and managing models in the LLM Toolkit, allowing users to view and select recommended models with detailed information on RAM requirements.
- Enhanced the model download process with progress tracking and error handling, including the ability to cancel downloads.
- Implemented a new API endpoint to fetch available models from the Ollama library, caching the results for improved performance (see the sketch below).
- Updated the server configuration to support streaming responses for model downloads, improving user feedback during long operations.
- Added new recommended models to the configuration, providing descriptions and size options for each model.
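The caching part could be as simple as the following (a sketch; the source URL, cache key, and one-hour TTL are assumptions, not values from the PR):

```python
import requests
from django.core.cache import cache

def get_ollama_library_models() -> list:
    """Fetch the Ollama library model list once, then serve it from cache."""
    models = cache.get('ollama_library_models')  # assumed cache key
    if models is None:
        # Placeholder URL standing in for the actual Ollama library source.
        resp = requests.get('https://example.com/ollama/library.json', timeout=10)
        resp.raise_for_status()
        models = resp.json()
        cache.set('ollama_library_models', models, timeout=60 * 60)  # assumed 1h TTL
    return models
```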
Summary
Fixes #232
Enhance the LLM model selection process and attack surface analysis by introducing a new modal for model selection and implementing a new API endpoint. Refactor existing functions to support these changes and improve error handling. Update tests to reflect the new functionalities and remove outdated tests.
New Features:
- Introduce a new modal for selecting LLM models for attack surface analysis, allowing users to choose from a list of available models.
- Implement a new API endpoint to fetch available LLM models and the currently selected model.
Enhancements:
Tests:
Todo