
fix: refactor LLM model selection and attack surface analysis #233

Open

psyray wants to merge 15 commits into base: release/2.1.1
Conversation

@psyray (Contributor) commented Nov 11, 2024

Summary

Fixes #232

Enhance the LLM model selection process and attack surface analysis by introducing a new modal for model selection and implementing a new API endpoint. Refactor existing functions to support these changes and improve error handling. Update tests to reflect the new functionalities and remove outdated tests.

New Features:

  • Introduce a new modal for selecting LLM models for attack surface analysis, allowing users to choose from a list of available models.
  • Implement a new API endpoint to fetch available LLM models and the currently selected model (see the sketch after this list).
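
A minimal sketch (assuming Django REST Framework) of what such an endpoint could look like. `LLMModelsManager` is the class name referenced later in the CodeQL discussion; the helper functions here are placeholders, not the PR's actual code.

```python
import logging

from rest_framework.views import APIView
from rest_framework.response import Response

logger = logging.getLogger(__name__)


def get_available_llm_models():
    # Placeholder: the real implementation would enumerate OpenAI/Ollama models.
    return [{'name': 'llama3', 'provider': 'ollama'}]


def get_selected_llm_model():
    # Placeholder: the real implementation would read the selection from the database.
    return 'llama3'


class LLMModelsManager(APIView):
    """Return the available LLM models and the currently selected one."""

    def get(self, request):
        return Response({
            'status': True,
            'models': get_available_llm_models(),
            'selected_model': get_selected_llm_model(),
        })
```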

Enhancements:

  • Refactor the show_attack_surface_modal function to include model selection and handle errors more robustly.
  • Remove hardcoded LLM definitions and prompts from definitions.py and move them to a new configuration file (a sketch of one possible shape follows this list).
  • Add new classes and methods for managing LLM configurations and generating vulnerability reports and attack suggestions.
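
As a rough illustration of the configuration-file approach, the module below shows one plausible shape for web/reNgine/llm/config.py. The keys, model names, and prompt text are assumptions, not the PR's contents; the report section names follow the structure described later in this PR (technical description, business impact, remediation steps, references).

```python
# web/reNgine/llm/config.py -- illustrative sketch only, not the PR's actual code.
LLM_CONFIG = {
    'providers': {
        'openai': {'default_model': 'gpt-4'},    # assumed default
        'ollama': {'default_model': 'llama3'},   # assumed default
    },
    'prompts': {
        'attack_suggestion': (
            'You are a penetration tester. Given the reconnaissance data '
            'below, suggest plausible attack vectors.'
        ),
        'vulnerability_report': (
            'Write a vulnerability report with these sections: technical '
            'description, business impact, remediation steps, references.'
        ),
    },
}
```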

Tests:

  • Update tests to cover new functionalities and remove obsolete tests related to LLM attack suggestions.

Todo

  • Test Attack Surface
  • Test Report Generation

Summary by Sourcery

The changes refactor the codebase to replace references to "GPT" with "LLM" for generating vulnerability reports and related functionality. This includes renaming functions, classes, variables, and configuration settings to reflect the new terminology, and it affects components such as tasks, views, models, templates, and configuration files across the application. In addition to the features, enhancements, and test updates listed above, the selected model is now updated in the database before analysis proceeds.
@sourcery-ai bot (Contributor) commented Nov 11, 2024

Reviewer's Guide by Sourcery

This PR enhances the LLM (Large Language Model) integration by introducing a new model selection modal and implementing a more robust API endpoint. The changes include refactoring the GPT-specific code into a more generic LLM framework, improving error handling, and adding support for multiple LLM providers, including OpenAI and Ollama.

Updated Class Diagram for LLM Vulnerability Report

```mermaid
classDiagram
    class LLMVulnerabilityReport {
        +String url_path
        +String title
        +Text description
        +Text impact
        +Text remediation
        +Text references
        +String formatted_description()
        +String formatted_impact()
        +String formatted_remediation()
        +String formatted_references()
    }
    class Vulnerability {
        +String name
        +String http_url
        +Text description
        +Text impact
        +Text remediation
        +Text references
        +boolean is_llm_used
        +String formatted_description()
        +String formatted_impact()
        +String formatted_remediation()
        +String formatted_references()
    }
    LLMVulnerabilityReport --|> Vulnerability : updates
    note for LLMVulnerabilityReport "Replaces GPTVulnerabilityReport with LLM support"
```
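
For orientation, here is a minimal Django sketch matching the diagram above; the field types and options are inferred from the diagram, not copied from web/startScan/models.py.

```python
from django.db import models


class LLMVulnerabilityReport(models.Model):
    url_path = models.CharField(max_length=1000)
    title = models.CharField(max_length=1000)
    description = models.TextField(blank=True, null=True)
    impact = models.TextField(blank=True, null=True)
    remediation = models.TextField(blank=True, null=True)
    # Per this PR, references become a plain text field rather than a
    # separate VulnerabilityReference model.
    references = models.TextField(blank=True, null=True)
```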

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Refactored GPT-specific code into a generic LLM framework | Created a new LLM configuration module with model definitions and system prompts; implemented a base LLM generator class with common functionality; added specific generators for vulnerability reports and attack suggestions; created validators for LLM input/output data | web/reNgine/llm/config.py, web/reNgine/llm/llm.py, web/reNgine/llm/validators.py, web/reNgine/llm/utils.py |
| Enhanced model selection with new modal interface | Added a new modal for LLM model selection with detailed model information; implemented model capabilities display and selection UI; added support for regenerating and deleting attack surface analysis | web/static/custom/custom.js, web/scanEngine/templates/scanEngine/settings/llm_toolkit.html |
| Improved vulnerability reporting and display | Added Markdown-to-HTML conversion for vulnerability reports; enhanced report formatting with Bootstrap styling; updated the vulnerability model to use a text field for references | web/reNgine/llm/utils.py, web/templates/report/template.html, web/startScan/models.py |
| Updated API endpoints and handlers | Renamed GPT-specific endpoints to LLM endpoints; added a new endpoint for LLM model management; improved error handling and response formatting | web/api/views.py, web/api/urls.py |
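
The first row of the table describes a base generator class with specialized subclasses. A structural sketch of that idea follows; the method names and prompt text are assumptions, although LLMVulnerabilityReportGenerator and LLMAttackSuggestionGenerator are class names mentioned elsewhere in this PR.

```python
class BaseLLMGenerator:
    """Shared plumbing: model selection, prompt building, provider dispatch."""

    def __init__(self, model_name: str):
        self.model_name = model_name

    def build_prompt(self, context: str) -> str:
        raise NotImplementedError

    def _call_provider(self, prompt: str) -> str:
        # The real implementation would dispatch to the OpenAI or Ollama
        # client depending on the configured model; omitted here.
        raise NotImplementedError

    def generate(self, context: str) -> str:
        return self._call_provider(self.build_prompt(context))


class LLMVulnerabilityReportGenerator(BaseLLMGenerator):
    def build_prompt(self, context: str) -> str:
        return f"Write a vulnerability report for:\n{context}"


class LLMAttackSuggestionGenerator(BaseLLMGenerator):
    def build_prompt(self, context: str) -> str:
        return f"Suggest attack vectors for this recon data:\n{context}"
```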

Assessment against linked issues

| Issue | Objective | Addressed | Explanation |
| --- | --- | --- | --- |
| #232 | Allow using the selected Ollama LLM model for attack surface detection instead of hardcoding llama2-uncensored | | |


@psyray changed the base branch from master to release/2.1.1 on November 11, 2024 12:23
Comment on lines +2946 to +2950:

```python
return Response({
    'status': False,
    'error': 'Failed to fetch LLM models',
    'message': str(e)
}, status=500)
```

Check warning — Code scanning / CodeQL

Information exposure through an exception (Medium): stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix (AI)

To fix the problem, we need to ensure that detailed error messages are not exposed to the user. Instead, we should log the detailed error message on the server and return a generic error message to the user. This can be achieved by modifying the exception handling in the LLMModelsManager class.

  1. Log the detailed error message using the logger.error method.
  2. Return a generic error message to the user without including the exception details.
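
A hedged sketch of that pattern; the wrapper function is illustrative, and only the response shape comes from the patch below.

```python
import logging

from rest_framework.response import Response

logger = logging.getLogger(__name__)


def safe_llm_models_response(fetch_models):
    """Log full details server-side; return only a generic message to the client."""
    try:
        return Response({'status': True, 'models': fetch_models()})
    except Exception as e:
        logger.error(f'Failed to fetch LLM models: {e}')  # detail stays in server logs
        return Response({
            'status': False,
            'error': 'Failed to fetch LLM models',
            'message': 'An internal error has occurred. Please try again later.'
        }, status=500)
```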
Suggested changeset 1: web/api/views.py

Run the following command in your local git repository to apply this patch:

```
cat << 'EOF' | git apply
diff --git a/web/api/views.py b/web/api/views.py
--- a/web/api/views.py
+++ b/web/api/views.py
@@ -3097,3 +3097,3 @@
                 'error': 'Failed to fetch LLM models',
-                'message': str(e)
-            }, status=500)
+                'message': 'An internal error has occurred. Please try again later.'
+            }, status=500)
EOF
```
Code scanning alerts marked as fixed in: web/reNgine/llm/validators.py, web/reNgine/tasks.py (×3), web/tests/test_llm.py, web/static/custom/custom.js (×2)
@psyray self-assigned this Nov 11, 2024
@psyray added the bug (Something isn't working) and enhancement (New feature or request) labels Nov 11, 2024
@psyray linked an issue Nov 11, 2024 that may be closed by this pull request
- Enhanced logging in tasks.py to include both the title and path of vulnerabilities when exceptions occur.
- Updated variable naming from llm to vuln for clarity in vulnerability handling.
- Minor text punctuation corrections in custom.js to ensure consistency in user messages.
- Remove unused import
- Improved the UI of the LLM toolkit by updating the layout and adding badges for model status and capabilities.
- Refactored the model management logic to use a centralized API call for fetching model data, improving error handling and reducing code duplication.
- Updated the model requirements configuration to enhance readability and consistency in the description of model capabilities.
- Adjusted the modal size for displaying model options to provide a better user experience.
- Introduced a method to convert markdown to HTML with added Bootstrap classes in the LLMAttackSuggestion class.
- Implemented HTML sanitization using DOMPurify in various JavaScript functions to prevent XSS vulnerabilities.
…on features

- Updated the URL validation function to correctly escape backslashes in the regex pattern.
- Enhanced the send_llm__attack_surface_api_request function to support additional parameters for model selection and analysis options.
- Introduced new functions regenerateAttackSurface, deleteAttackSurfaceAnalysis, and showAttackSurfaceModal to manage attack surface analysis lifecycle, including regeneration and deletion.
- Refactored the modal handling logic to improve user interaction for model selection and analysis display.
- Added input validation for model names in the LLMAttackSuggestionGenerator class and updated the attack suggestion generation process to accommodate model-specific requests.
- Enhanced markdown rendering by adding new extensions for better list handling and definition-list support.
- Improved HTML formatting by converting ordered lists to unordered and cleaning up line breaks (see the sketch below).
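
A sketch of what such a markdown-to-HTML helper could look like, assuming the Python `markdown` package (whose `sane_lists` and `def_list` extensions cover the list and definition-list handling mentioned above); the function name and Bootstrap class mapping are illustrative, not the PR's actual code.

```python
import markdown


def markdown_to_bootstrap_html(text: str) -> str:
    html = markdown.markdown(text, extensions=['extra', 'sane_lists', 'def_list'])
    # Convert ordered lists to unordered, as described above.
    html = html.replace('<ol>', '<ul>').replace('</ol>', '</ul>')
    # Attach Bootstrap classes (illustrative mapping).
    html = html.replace('<ul>', '<ul class="list-group">')
    html = html.replace('<li>', '<li class="list-group-item">')
    return html
```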
- Removed "Beta" label from the LLM Toolkit page title and modal dialog title in the UI.
- Replaced the get_vulnerability_llm_report function with llm_vulnerability_report for generating and storing vulnerability reports using LLM.
- Enhanced the LLM vulnerability report generation process by splitting it into distinct sections: technical description, business impact, remediation steps, and references.
- Updated the data model to store references as text fields instead of using a separate VulnerabilityReference model.
- Improved the HTML rendering of vulnerability descriptions, impacts, and remediations by converting markdown to HTML with proper styling.
- Refactored the LLM response handling to use a dictionary format for easier manipulation and storage.
- Removed redundant code and streamlined the process of updating vulnerabilities with LLM-generated data.
- Adjusted the configuration and prompts for LLM to support more detailed and structured report generation.
@psyray psyray marked this pull request as ready for review November 12, 2024 22:41
Code scanning alerts marked as fixed in: web/reNgine/llm/llm.py (×2)
@sourcery-ai bot (Contributor) left a comment

Hey @psyray - I've reviewed your changes and found some issues that need to be addressed.

Blocking issues:

  • Avoid exposing raw exception messages in API responses (link)

Overall Comments:

  • Consider adding more comprehensive test coverage for the new LLM functionality, particularly around error handling and edge cases.
  • The key LLM interface methods would benefit from more detailed docstrings documenting expected inputs, outputs, and error scenarios.
Here's what I looked at during the review
  • 🟡 General issues: 3 issues found
  • 🔴 Security: 1 blocking issue, 1 other issue
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


Resolved review threads on: web/static/custom/custom.js, web/api/views.py (×3)
```python
if query.strip():
    qs = qs & self.special_lookup(query.strip())
elif '|' in search_value:
    qs = Subdomain.objects.none()
```

issue (bug_risk): Incorrect model class used for empty queryset initialization

This should be EndPoint.objects.none() to maintain type consistency with the rest of the method.
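
Applied as a suggested change, the corrected branch would read:

```python
elif '|' in search_value:
    qs = EndPoint.objects.none()
```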

```python
return Response({"dorks": serializer.data})

def get(self, request, format=None):
    req = self.request
    scan_id = safe_int_cast(req.query_params.get('scan_id'))
```

issue (code-quality): We've found these issues:

Comment on lines +1597 to +1598:

```python
scan_id = safe_int_cast(req.query_params.get('scan_id'))
if scan_id:
```

suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change:

```python
if scan_id := safe_int_cast(req.query_params.get('scan_id')):
```

```python
def get(self, request, format=None):
    req = self.request
    scan_id = safe_int_cast(req.query_params.get('scan_id'))
    type = req.query_params.get('type')
```

issue (code-quality): Don't assign to builtin variable type (avoid-builtin-shadow)

Explanation: Python has a number of builtins: functions and constants that form a part of the language, such as list, getattr, and type (see https://docs.python.org/3/library/functions.html). It is valid, in the language, to re-bind such variables:

```python
list = [1, 2, 3]
```

However, this is considered poor practice:

  • It will confuse other developers.
  • It will confuse syntax highlighters and linters.
  • It means you can no longer use that builtin for its original purpose.

How can you solve this?

Rename the variable to something more specific, such as integers. In a pinch, my_list and similar names are colloquially-recognized placeholders.
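
Applying the suggested rename to the snippet above (the new name `lookup_type` is illustrative; `safe_int_cast` comes from the surrounding code):

```python
def get(self, request, format=None):
    req = self.request
    scan_id = safe_int_cast(req.query_params.get('scan_id'))
    lookup_type = req.query_params.get('type')  # no longer shadows the builtin `type`
```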

Comment on lines +1631 to +1632:

```python
scan_id = safe_int_cast(req.query_params.get('scan_id'))
if scan_id:
```

suggestion (code-quality): Use named expression to simplify assignment and conditional (use-named-expression)

Suggested change:

```python
if scan_id := safe_int_cast(req.query_params.get('scan_id')):
```

Comment on lines +1715 to +1720:

```python
subdomain_names = []

for id in subdomain_ids:
    subdomain_names.append(Subdomain.objects.get(id=id).name)

if subdomain_names:
    return Response({'status': True, "results": subdomain_names})
```

suggestion (code-quality): We've found these issues:

Suggested change:

```python
if subdomain_names := [
    Subdomain.objects.get(id=id).name for id in subdomain_ids
]:
```

@psyray (Contributor, Author) commented Nov 12, 2024

There are a lot of Sourcery review comments here because it scans all the modified files; they are not related to this PR. I'll fix them later in a security review.
@psyray changed the title from "feat: enhance LLM model selection and attack surface analysis" to "fix: refactor LLM model selection and attack surface analysis" Nov 12, 2024
Uncommented the code responsible for generating technical, impact, and remediation sections in the LLMVulnerabilityReportGenerator class.
Code scanning alerts marked as fixed in: web/reNgine/llm/llm.py (×3)
- Removed extensive data from targetApp.json, including historical IPs, related domains, registrars, domain registrations, WHOIS status, nameservers, DNS records, and domain info.
- Updated auth.json to modify permission names and codenames, and removed several user and group entries.
- Added new todo notes in recon_note.json and updated existing ones.
- Updated dashboard.json to modify project details and remove unused settings.
- Modified scanEngine.json to update YAML configurations and add a new interesting lookup model.
- Enabled remote debugging in docker-compose.dev.yml.
- Updated a test in test_vulnerability.py to patch a different method for generating vulnerability reports
Overview
- Improved handling of CVE references by parsing string representations of arrays and generating appropriate HTML content (see the sketch after this list).
- Updated the template to handle references more flexibly, displaying them as a list or paragraph based on their format.
- Converted markdown to HTML for various sections in the LLM vulnerability report.
- Removed redundant code in the LLM vulnerability report generator.
- Simplified the get_refs_str method in the Vulnerability model to return references directly.
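
One safe way to parse a string representation of a references array is `ast.literal_eval` with a fallback; this sketch is illustrative, not the PR's actual code.

```python
import ast


def parse_references(raw: str) -> list:
    """Turn "['https://a', 'https://b']" into a list; otherwise keep the raw text."""
    try:
        value = ast.literal_eval(raw)
        if isinstance(value, list):
            return [str(ref) for ref in value]
    except (ValueError, SyntaxError):
        pass
    return [raw]  # fall back to a single plain-text reference
```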
Modified the llm_vulnerability_report function in tasks.py to convert the references field from a list to a single string before converting it to HTML.
@AnonymousWP (Member) left a comment
Didn't have time yet to do an extensive test, but this is what I initially found. Will continue tomorrow.

Adjusted the logic for displaying the dropdown menu in the LLM Toolkit settings to only show when a model is not selected.
- Introduced a new modal for adding and managing models in the LLM Toolkit, allowing users to view and select recommended models with detailed information on RAM requirements.
- Enhanced the model download process with progress tracking and error handling, including the ability to cancel downloads.
- Implemented a new API endpoint to fetch available models from the Ollama library, caching the results for improved performance (sketched below).
- Updated the server configuration to support streaming responses for model downloads, improving user feedback during long operations.
- Added new recommended models to the configuration, providing descriptions and size options for each model.
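
A hedged sketch of the fetch-and-cache idea using Django's cache framework; the library URL, the parsing helper, the cache key, and the one-hour TTL are all assumptions, not the PR's actual code.

```python
import requests
from django.core.cache import cache


def get_ollama_library_models():
    models = cache.get('ollama_library_models')
    if models is None:
        resp = requests.get('https://ollama.com/library', timeout=10)
        resp.raise_for_status()
        models = parse_ollama_library_page(resp.text)  # assumed HTML-parsing helper
        cache.set('ollama_library_models', models, timeout=3600)  # cache for 1 hour
    return models
```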
Labels: bug (Something isn't working), enhancement (New feature or request)
Projects: None yet
2 participants