perf: memoized some module functions #311

Merged
merged 2 commits into from
Mar 3, 2025

Contributor

@JonathanAnbary JonathanAnbary commented Feb 26, 2025

There are a number of functions in the elf, pe, and macho modules whose results can be memoized (imphash, import_md5, etc.).
This can give a significant performance boost when these functions are called many times (usually because a large number of rules are used in a single scanner).
This is very similar to what is done in the hash module.
I'm not very experienced with Rust, so apologies if I've missed anything obvious.

A simple way to demonstrate the benefit is to run the following rule on any PE file:

import "pe"

rule memoize_example {
    condition:
        pe.calculate_checksum() == 0 or
        pe.calculate_checksum() == 1 or
        pe.calculate_checksum() == 2 or
        pe.calculate_checksum() == 3 or
        ...
        pe.calculate_checksum() == 100
}

This is obviously a very contrived example, but the same thing can happen when many different rules that each call these functions are compiled together.
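
To make the idea concrete, here is a minimal sketch of per-scan memoization. It is not the code from this PR or from yara-x: `ScanContext`, its fields, and the byte-sum "checksum" are hypothetical stand-ins chosen for illustration, and the only assumption is that the result depends solely on the scanned data.

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical per-scan state (not a yara-x type): the scanned bytes plus
/// a cache slot for one expensive result.
struct ScanContext<'a> {
    data: &'a [u8],
    checksum: OnceCell<u32>,
    /// Counts how often the expensive path runs, for demonstration only.
    computations: Cell<usize>,
}

impl<'a> ScanContext<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, checksum: OnceCell::new(), computations: Cell::new(0) }
    }

    /// Stand-in for an expensive module function. The closure runs on the
    /// first call only; later calls return the cached value.
    fn calculate_checksum(&self) -> u32 {
        *self.checksum.get_or_init(|| {
            self.computations.set(self.computations.get() + 1);
            // Placeholder computation: wrapping sum of all bytes.
            self.data.iter().fold(0u32, |acc, &b| acc.wrapping_add(u32::from(b)))
        })
    }
}

fn main() {
    let data = vec![1u8, 2, 3, 4];
    let ctx = ScanContext::new(&data);

    // Mimics the rule above: 101 comparisons, each calling the function.
    let matches = (0..=100u32).filter(|&n| ctx.calculate_checksum() == n).count();

    println!("matches: {matches}, computations: {}", ctx.computations.get());
}
```

Because the cache lives in the per-scan context rather than in any single rule, the same saving applies whether the repeated calls come from one condition or from many rules evaluated against the same file. A fresh context per scanned file keeps results from leaking between scans.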

google-cla bot commented Feb 26, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@JonathanAnbary JonathanAnbary changed the title from "memoized hash's" to "perf: memoized some module functions" Feb 26, 2025
@plusvic
Member

plusvic commented Feb 27, 2025

For the time being I'm not merging this, because the goal is to implement common-subexpression elimination, which would optimize cases like these when the rules are compiled, regardless of which functions are called. Also note that the with statement can be used to write expressions that are easier to read and more performant. See: https://virustotal.github.io/yara-x/docs/writing_rules/rule-conditions/#the-with-statement.

@plusvic plusvic closed this Feb 27, 2025
@JonathanAnbary
Contributor Author

I'm not sure if this is the right place to ask about this, but I was wondering if the plan is to have common sub-expression elimination across rule and namespace boundaries?
Meaning if I compiled a hundred different rules all with a call to the same function (or having the same sub-expression in general) would those be eliminated?

@plusvic
Member

plusvic commented Mar 2, 2025

> I'm not sure if this is the right place to ask about this, but I was wondering if the plan is to have common sub-expression elimination across rule and namespace boundaries? Meaning if I compiled a hundred different rules all with a call to the same function (or having the same sub-expression in general) would those be eliminated?

Good point. Common-subexpression elimination would work at the rule level, which means there's still some value in implementing memoization for commonly used functions.

@plusvic plusvic reopened this Mar 3, 2025
@plusvic plusvic merged commit 0e90769 into VirusTotal:main Mar 3, 2025
16 checks passed