Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: reduce consumption of virtual address space. #295

Merged
merged 5 commits into from
Jan 20, 2025

Conversation

plusvic
Copy link
Member

@plusvic plusvic commented Jan 20, 2025

Each WASM module tries to reserve a large amount of virtual address space (4GB in 64-bits systems). This doesn't means that they actually needs this amount of RAM, it only means that it reserves address space in case the memory required by the module grows dynamically. As each YARA-X scanner needs one of these WASM modules, this implies that YARA-X ends up reserving a very large amount of virtual address space. This is problematic in some environments where the amount of virtual address space per process is limited (in Docker containers, for instance).

This PR reduces the amount of virtual address space reserved by each WASM module, mainly by limiting the amount of memory of each WASM module to 16MB. This is enough to accommodate very large sets of rules, with millions of rules and patterns.

Closes #292.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @plusvic and reviewers,

I'm Gemini, and I'm currently reviewing this pull request. In the meantime, I'm providing this summary to help everyone quickly understand the changes and intent.

This pull request addresses issue #292 by reducing the virtual address space consumption of YARA-X's WASM modules. The core problem is that each WASM module, by default, reserves a large amount of virtual address space (4GB on 64-bit systems), even if it doesn't need it. This is especially problematic in environments with limited virtual address space, such as Docker containers.

Here's a summary of the changes:

  • Intent: Reduce the virtual address space reserved by each WASM module to improve YARA-X's compatibility with resource-constrained environments.
  • Changes:
    • lib/src/scanner/mod.rs (line 265): Modified the MemoryType to explicitly set a maximum memory size. The None value was replaced with Some(mem_size), limiting memory allocation.
    • lib/src/wasm/mod.rs (lines 725-744): Added configuration options to the WASM engine to limit memory reservation to 16MB (config.memory_reservation(0x1000000)), prevent memory growth (config.memory_reservation_for_growth(0)), and indicate that memory won't move (config.memory_may_move(false)). This ensures that the WASM module only reserves 16MB of virtual address space, significantly reducing the overall consumption.

I'll have a more detailed review shortly.


And now, a little haiku to lighten the mood:

Memory's vast sea,
Reduced to a calm, still pond,
Code runs smoothly.

Copy link

@ghost ghost left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request aims to reduce the virtual address space consumed by WASM modules within YARA-X. The approach of limiting each module's memory to 16MB seems reasonable, especially given the explanation that this accommodates large rule sets. The added comments and documentation explaining the rationale and referencing the related issue are helpful. Here are a few suggestions for improvement.

@@ -262,7 +262,7 @@ impl<'r> Scanner<'r> {
// Create module's main memory.
let main_memory = wasmtime::Memory::new(
wasm_store.as_context_mut(),
MemoryType::new(mem_size, None),
MemoryType::new(mem_size, Some(mem_size)),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Since Memory::new can fail, it's good practice to handle the potential error. While unwrapping is acceptable in some cases, explicitly handling the error enhances code robustness. Consider using the ? operator or a match statement to handle the Result returned by Memory::new. What specific error handling would be appropriate if memory allocation fails? Should the scan be aborted, or is there a fallback mechanism?

MemoryType::new(mem_size, Some(mem_size))?

Comment on lines +725 to +734
// 16MB should be enough for each WASM module. Each module needs a
// fixed amount of memory that is only a few KB long, plus a variable
// amount that depends on the number of rules and patterns (1 bit per
// rule and 1 bit per pattern). With 16MB there's enough space for
// millions of rules and patterns. By default, this is 4GB in 64-bits
// systems, which causes a reservation of 4GB of virtual address space
// (not physical RAM) per module (and therefore per Scanner). In some
// scenarios where virtual address space is limited (i.e: Docker
// instances) this is problematic. See:
// https://github.com/VirusTotal/yara-x/issues/292
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

low

The comment explaining the rationale for the 16MB limit is very clear. However, per the Rust API Guidelines, comments should focus on why something is done, rather than what is done. The what should be clear from the code itself. Consider rephrasing the comment to emphasize the why even more. For example, instead of "By default, this is 4GB...causes a reservation of 4GB...", you could say "Limiting to 16MB mitigates excessive virtual address space consumption, particularly in environments like Docker, where address space is limited (see #292)."

// Limiting to 16MB mitigates excessive virtual address space consumption,
        // particularly in environments like Docker, where address space is limited (see #292).
        // Each module requires a fixed amount of memory (a few KB), plus a variable
        // amount based on the number of rules and patterns (1 bit per rule and pattern).
        // 16MB is sufficient for millions of rules and patterns.

@egibs
Copy link

egibs commented Jan 20, 2025

Thanks for looking into this!

@plusvic plusvic merged commit 3a37a42 into main Jan 20, 2025
24 checks passed
@plusvic plusvic deleted the addr-space-reservation branch January 23, 2025 10:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Question] mmap os error 12 in Cloud Run
2 participants