bug: out of memory during OSV database load #4710
Comments
Disabling data sources OSV, GAD and RSD allowed the database to be created.
I think this is a duplicate of #4592, but I'm still not sure what the right fix is.
Actually, I'm going to re-open this because I think it's the most concisely described of the several issues related to this problem. Some stuff I know so far:
Some conjecture:
Next steps:
I'm open to more suggestions; most of that was from a quick brainstorming session this morning.
Update the database in stages for each data source instead of all at once.
@terriko I think the issue might be with the OSV database load, as removing this data source solved the problem. Committing every 1000 records or so, rather than one big commit at the end, may also be a useful improvement. Given the number of records and continued growth, I think we may be getting to the stage of rethinking the database architecture. That might be a bit too ambitious for GSoC 2025, but maybe some useful work could be done to move things along.
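The "commit every 1000 records" idea above could look roughly like the sketch below. This is an illustration only, assuming a plain sqlite3 connection; the table name, schema, and function name are hypothetical and do not reflect cve-bin-tool's actual code.

```python
import sqlite3


def insert_records_batched(db_path, records, batch_size=1000):
    """Insert (id, data) pairs, committing every batch_size rows.

    Periodic commits keep the pending transaction small instead of
    accumulating the entire data source in one uncommitted transaction.
    NOTE: table/column names are hypothetical, for illustration only.
    """
    conn = sqlite3.connect(db_path)
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS cve (id TEXT PRIMARY KEY, data TEXT)")
    for i, (cve_id, data) in enumerate(records, start=1):
        cur.execute("INSERT OR REPLACE INTO cve VALUES (?, ?)", (cve_id, data))
        if i % batch_size == 0:
            conn.commit()  # flush this batch rather than one big commit at the end
    conn.commit()  # commit the final partial batch
    conn.close()
```

Whether this actually reduces peak memory depends on where the memory is going; if the records are all parsed into a Python list before insertion, batched commits alone won't help, so this would pair with streaming the records in.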
I'm flagging this for the hackathon folk: the problem, as far as we know, is happening during the OSV database load. We need some way to reduce the memory usage there. Ideas are above, but knowing the talent coming in for the hackathon, I suspect some of you may know better than I do how to fix this. The OSV data source code can be found here: https://github.com/intel/cve-bin-tool/blob/main/cve_bin_tool/data_sources/osv_source.py Note that unlike many of our other data sources, OSV uses a Google-based backend, so we're using gsutil. This may be a factor in why it's worse than other data sources, or it may just be the sheer amount of data as more people move away from NVD. I'm still game to have a pre-parser that allows us to mirror the OSV data on cve-b.in, then use our own mirrors as we do with NVD, if that seems like the best solution. But I won't be shocked if our code just needs some tweaks to handle memory more appropriately.
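One common memory-reduction pattern for a load like this is to stream advisories one file at a time with a generator, instead of collecting everything into one big in-memory list before writing to the database. A minimal sketch, assuming (hypothetically) that the downloaded OSV data is a directory of one-advisory-per-file JSON documents; the function name and layout are illustrative, not cve-bin-tool's actual implementation:

```python
import json
from pathlib import Path


def iter_osv_records(osv_dir):
    """Yield OSV advisories one at a time from a directory of JSON files.

    Only a single advisory is held in memory at once, so the consumer
    (e.g. a batched database insert) bounds peak memory regardless of
    how large the full dataset grows.
    NOTE: the one-file-per-advisory layout is an assumption for this sketch.
    """
    for path in sorted(Path(osv_dir).glob("*.json")):
        with open(path, encoding="utf-8") as f:
            yield json.load(f)  # parse lazily, on demand
```

A consumer can then iterate and insert in batches without ever materializing the whole dataset, which is the usual fix when a loader is killed by the OOM killer partway through.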
A note about the hackathon label: I've flagged a bunch of issues for folk participating in the Open Source Ecosystems Hackathon, March 3-7. Please leave these issues to hackathon participants. If they're not claimed after, say, March 10th, they're fair game for other people (including GSoC participants).
It seems that nobody has claimed this issue yet. I would like to work on it. |
Description
Attempting to create an initial database results in the cve-bin-tool process being killed with an out-of-memory message.
To reproduce
`cve-bin-tool -u now -n json-mirror afile`
Expected behaviour:
Database is created
Actual behaviour:
Process is killed part way through the database load and the database file is not created
Version/platform info
Version of CVE-bin-tool (e.g. output of `cve-bin-tool --version`): 3.4
Installed from pypi or github? pypi
Operating system: WSL2 on Windows 11
Python version (e.g. `python3 --version`): 3.10.12
Running in any particular CI environment we should know about? (e.g. GitHub Actions): Running in WSL2 (10GB RAM)