@ziadhany (Collaborator) commented Aug 30, 2025

I created an initial script that parses Git commit messages and can be easily integrated with our model. It takes a Git repository as input, parses all commits, and returns each CVE along with its corresponding fix commits.
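A minimal sketch of the approach described above, not the script from this PR: it scans commit messages for CVE identifiers and groups commit URLs by CVE. The function name `group_fix_commits`, the regular expression, and the `(sha, message)` input shape are assumptions made for illustration; a real script would feed it commits read from the cloned repository.

```python
import re
from collections import defaultdict

# Assumed pattern: "CVE-", a 4-digit year, then 4 or more digits.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def group_fix_commits(repo_url, commits):
    """Map each CVE ID found in a commit message to the URLs of the commits mentioning it.

    `commits` is an iterable of (sha, message) pairs (hypothetical input shape).
    """
    grouped = defaultdict(list)
    for sha, message in commits:
        # Normalize case and de-duplicate IDs repeated within one message.
        cve_ids = {match.upper() for match in CVE_PATTERN.findall(message)}
        for cve_id in sorted(cve_ids):
            grouped[cve_id].append(f"{repo_url}/commit/{sha}")
    return dict(grouped)
```

Note that a commit mentioning a CVE is not necessarily the commit that fixes it, so matches produced this way are candidates that may need further filtering.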

Issues:

results:

Found 192 unique CVEs
{
  "CVE-2025-4575": [
    "https://github.com/openssl/openssl/commit/0eb9acc24febb1f3f01f0320cfba9654cf66b0ac",
    "https://github.com/openssl/openssl/commit/e96d22446e633d117e6c9904cb15b4693e956eaa"
  ],
  "CVE-2024-12797": [
    "https://github.com/openssl/openssl/commit/6ae8e947d8e3f3f03eeb7d9ad993e341791900bc",
    "https://github.com/openssl/openssl/commit/798779d43494549b611233f92652f0da5328fbe7",
    "https://github.com/openssl/openssl/commit/87ebd203feffcf92ad5889df92f90bb0ee10a699",
    "https://github.com/openssl/openssl/commit/738d4f9fdeaad57660dcba50a619fafced3fd5e9"
  ],
  "CVE-2024-13176": [
    "https://github.com/openssl/openssl/commit/2af62e74fb59bc469506bc37eb2990ea408d9467",
    "https://github.com/openssl/openssl/commit/07272b05b04836a762b4baa874958af51d513844",
    "https://github.com/openssl/openssl/commit/fcebf0a79a0a69f63721b66e94b01400a7de332e",
    "https://github.com/openssl/openssl/commit/78f6c35b83713d33b263fb85e3727543463d6fd5",
    "https://github.com/openssl/openssl/commit/77c608f4c8857e63e98e66444e2e761c9627916f",
    "https://github.com/openssl/openssl/commit/4b1cb94a734a7d4ec363ac0a215a25c181e11f65",
    "https://github.com/openssl/openssl/commit/392dcb336405a0c94486aa6655057f59fd3a0902",
    "https://github.com/openssl/openssl/commit/3fc4b112da2e2107a65ae2556fb6137098e08801",
    "https://github.com/openssl/openssl/commit/f15294228451217b5e58e2b7f5ad4c7a42303212",
    "https://github.com/openssl/openssl/commit/7d8a8c20e1370e43b0cad17e47a460a6f8e81a34",
    "https://github.com/openssl/openssl/commit/63c40a66c5dc287485705d06122d3a6e74a6a203",
    "https://github.com/openssl/openssl/commit/c3144e102571517df6c15ccc049fa3660ab3cb0a"
  ],

openssl.json

Add a test for CollectRepoFixCommitPipeline

Signed-off-by: ziad hany <[email protected]>

def clone(self):
    """Clone the repository."""
    self.repo_url = "https://github.com/torvalds/linux"
@ziadhany (Collaborator, Author) commented:

This part should not be static; the repository URL should be configurable rather than hard-coded.
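One way to avoid the hard-coded URL is to pass it in when the object is created. This is only a sketch under assumptions: `RepoFixCommitCollector` and `commit_url` are hypothetical names, not the pipeline class or methods from this PR.

```python
class RepoFixCommitCollector:
    """Hypothetical stand-in for the pipeline class; not the class from this PR."""

    def __init__(self, repo_url):
        # The URL is supplied by the caller instead of being assigned inside clone().
        self.repo_url = repo_url.rstrip("/")

    def commit_url(self, commit_id):
        """Build the web URL for a commit in the configured repository."""
        return f"{self.repo_url}/commit/{commit_id}"
```

With this shape, the same code can be pointed at any repository, for example `RepoFixCommitCollector("https://github.com/openssl/openssl")`.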

@keshav-space (Member) left a comment:

Thanks @ziadhany, see some suggestions.

self.log("Generating AdvisoryData objects from grouped commits.")
grouped_commits = self.collect_fix_commits()
for vuln_id, commits in grouped_commits.items():
    references = [ReferenceV2(url=f"{self.repo_url}/commit/{cid}") for cid, _ in commits]
@keshav-space (Member) commented:

Where are we storing the actual fix commit? Just keeping it in a reference is not sufficient, IMO.

@ziadhany (Collaborator, Author) commented Oct 18, 2025:

@keshav-space I was relying on the CollectFixCommitsPipeline pipeline to create the fix commits. The issue is that CommitFixV2 is currently tied to both affected_package and advisory. IMO, storing fix commits as references can be useful.

summary_lines = [f"- {cid}: {msg}" for cid, msg in commits]
summary = f"Commits fixing {vuln_id}:\n" + "\n".join(summary_lines)
yield AdvisoryData(
    advisory_id=vuln_id,
@keshav-space (Member) commented:

This will be problematic, since we intend to collect fix commits from multiple different repos here. Suppose we get a fix commit for CVE-000-000 in two different repos: we will hit a conflict when inserting the advisory, because we create the unique AVID by prefixing the advisory_id with the pipeline_id. In this case we will end up with the same AVID for fix commits imported from two different Git repos.

@ziadhany (Collaborator, Author) commented:

@keshav-space I'm not sure what the best solution is, but as I understand it, we could make the pipeline_id dynamic, with one per Git repository:

  • django_fix_commit
  • django_restframework_fix_commit

For example:

avid: "django_fix_commit/PYSEC-2020-2233"
avid: "django_restframework_fix_commit/PYSEC-2020-2233"

This would generate a different avid for each Git repository.
I'm not sure if this completely solves the problem, though.
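The idea can be sketched as follows. Both helper names and the slug rule (last URL path segment, lowercased, hyphens turned into underscores, with a `_fix_commit` suffix) are assumptions for illustration; only the `pipeline_id/advisory_id` shape comes from the examples above.

```python
from urllib.parse import urlparse


def pipeline_id_for_repo(repo_url):
    """Derive a per-repository pipeline ID from the last path segment of the URL (assumed scheme)."""
    path = urlparse(repo_url).path.rstrip("/")
    repo_name = path.rsplit("/", 1)[-1]
    if repo_name.endswith(".git"):
        repo_name = repo_name[: -len(".git")]
    slug = repo_name.lower().replace("-", "_")
    return f"{slug}_fix_commit"


def make_avid(pipeline_id, advisory_id):
    """Join the pipeline ID and advisory ID in the pipeline_id/advisory_id shape."""
    return f"{pipeline_id}/{advisory_id}"
```

With a single static pipeline ID, the same advisory ID from two repositories maps to one AVID; with a per-repository ID the two AVIDs differ. In this sketch, two repositories that share a name under different owners (forks, for instance) would still collide, so including the owner in the slug may be needed.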

Signed-off-by: ziad hany <[email protected]>