-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: reduce consumption of virtual address space. #295
Conversation
…for reducing address space usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @plusvic and reviewers,
I'm Gemini, and I'm currently reviewing this pull request. In the meantime, I'm providing this summary to help everyone quickly understand the changes and intent.
This pull request addresses issue #292 by reducing the virtual address space consumption of YARA-X's WASM modules. The core problem is that each WASM module, by default, reserves a large amount of virtual address space (4GB on 64-bit systems), even if it doesn't need it. This is especially problematic in environments with limited virtual address space, such as Docker containers.
Here's a summary of the changes:
- Intent: Reduce the virtual address space reserved by each WASM module to improve YARA-X's compatibility with resource-constrained environments.
- Changes:
lib/src/scanner/mod.rs
(line 265): Modified theMemoryType
to explicitly set a maximum memory size. TheNone
value was replaced withSome(mem_size)
, limiting memory allocation.lib/src/wasm/mod.rs
(lines 725-744): Added configuration options to the WASM engine to limit memory reservation to 16MB (config.memory_reservation(0x1000000)
), prevent memory growth (config.memory_reservation_for_growth(0)
), and indicate that memory won't move (config.memory_may_move(false)
). This ensures that the WASM module only reserves 16MB of virtual address space, significantly reducing the overall consumption.
I'll have a more detailed review shortly.
And now, a little haiku to lighten the mood:
Memory's vast sea,
Reduced to a calm, still pond,
Code runs smoothly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull request aims to reduce the virtual address space consumed by WASM modules within YARA-X. The approach of limiting each module's memory to 16MB seems reasonable, especially given the explanation that this accommodates large rule sets. The added comments and documentation explaining the rationale and referencing the related issue are helpful. Here are a few suggestions for improvement.
@@ -262,7 +262,7 @@ impl<'r> Scanner<'r> { | |||
// Create module's main memory. | |||
let main_memory = wasmtime::Memory::new( | |||
wasm_store.as_context_mut(), | |||
MemoryType::new(mem_size, None), | |||
MemoryType::new(mem_size, Some(mem_size)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since Memory::new
can fail, it's good practice to handle the potential error. While unwrapping is acceptable in some cases, explicitly handling the error enhances code robustness. Consider using the ?
operator or a match
statement to handle the Result
returned by Memory::new
. What specific error handling would be appropriate if memory allocation fails? Should the scan be aborted, or is there a fallback mechanism?
MemoryType::new(mem_size, Some(mem_size))?
// 16MB should be enough for each WASM module. Each module needs a | ||
// fixed amount of memory that is only a few KB long, plus a variable | ||
// amount that depends on the number of rules and patterns (1 bit per | ||
// rule and 1 bit per pattern). With 16MB there's enough space for | ||
// millions of rules and patterns. By default, this is 4GB in 64-bits | ||
// systems, which causes a reservation of 4GB of virtual address space | ||
// (not physical RAM) per module (and therefore per Scanner). In some | ||
// scenarios where virtual address space is limited (i.e: Docker | ||
// instances) this is problematic. See: | ||
// https://github.com/VirusTotal/yara-x/issues/292 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment explaining the rationale for the 16MB limit is very clear. However, per the Rust API Guidelines, comments should focus on why something is done, rather than what is done. The what should be clear from the code itself. Consider rephrasing the comment to emphasize the why even more. For example, instead of "By default, this is 4GB...causes a reservation of 4GB...", you could say "Limiting to 16MB mitigates excessive virtual address space consumption, particularly in environments like Docker, where address space is limited (see #292)."
// Limiting to 16MB mitigates excessive virtual address space consumption,
// particularly in environments like Docker, where address space is limited (see #292).
// Each module requires a fixed amount of memory (a few KB), plus a variable
// amount based on the number of rules and patterns (1 bit per rule and pattern).
// 16MB is sufficient for millions of rules and patterns.
Thanks for looking into this! |
Each WASM module tries to reserve a large amount of virtual address space (4GB in 64-bits systems). This doesn't means that they actually needs this amount of RAM, it only means that it reserves address space in case the memory required by the module grows dynamically. As each YARA-X scanner needs one of these WASM modules, this implies that YARA-X ends up reserving a very large amount of virtual address space. This is problematic in some environments where the amount of virtual address space per process is limited (in Docker containers, for instance).
This PR reduces the amount of virtual address space reserved by each WASM module, mainly by limiting the amount of memory of each WASM module to 16MB. This is enough to accommodate very large sets of rules, with millions of rules and patterns.
Closes #292.