Skip to content

[draft] feat: improve performance of mapping #4166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 14 commits into from
Closed

Conversation

talboren
Copy link
Member

@talboren talboren commented Mar 22, 2025

close #4165

Improves the performance of mapping by moving the logic to the DB side.
Since rows is just a simple "table" (e.g., a CSV file), we can flatten the JSON column with DB functions (i.e., JSON_TABLE, or

WITH flattened AS (
                SELECT
                    value AS row_data
                FROM mappingrule, json_each(mappingrule.rows)
                WHERE mappingrule.id = :rule_id
            )
            SELECT row_data FROM flattened

in the case of SQLite, and run the respective query in the DB.

In cases where we have a lot of rows (>50K), it's much more performant then loading everything in memory on every event..

TBD:

  • Write tests for MySQL
  • Write tests for SQlite
  • Write tests for PG

Useful: https://docs.sqlalchemy.org/en/14/orm/loading_columns.html#deferred-column-loading

Copy link

vercel bot commented Mar 22, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
keep ⬜️ Ignored (Inspect) Visit Preview Apr 7, 2025 3:21pm

Copy link

codecov bot commented Mar 22, 2025

Codecov Report

Attention: Patch coverage is 7.72059% with 251 lines in your changes missing coverage. Please review.

Project coverage is 30.59%. Comparing base (b1d0c31) to head (b8dedc0).
Report is 39 commits behind head on main.

Files with missing lines Patch % Lines
keep/api/bl/mapping_rule_matcher.py 8.29% 177 Missing ⚠️
keep/api/bl/enrichments_bl.py 5.26% 72 Missing ⚠️
keep/api/core/db.py 33.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4166      +/-   ##
==========================================
- Coverage   30.95%   30.59%   -0.36%     
==========================================
  Files          92       93       +1     
  Lines       10257    10613     +356     
==========================================
+ Hits         3175     3247      +72     
- Misses       7082     7366     +284     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@talboren
Copy link
Member Author

@skynetigor i'm not sure if cel_to_sql is the right place to put it, but it kinda reminded me with what you're doing with formatting the query to work on the DB. That's why I put it over there.
Would love your review btw, because I think you have a lot of context about formatting queries ;)

@VladimirFilonov
Copy link
Contributor

@talboren crazy question - don't we want to convert CSV with table on upload, but not on runtime?

@skynetigor
Copy link
Contributor

@talboren Great idea about utilizing DB for this 👍
To help keep the folder structure clean and organized, I believe this file aligns more with the mapping rule business logic rather than being a generic utility.
Since CEL to SQL is a generic mechanism that can be reused for querying or faceting data across the DB, I’d suggest placing mapping_rule_matcher closer to the bl folder instead.

@talboren talboren force-pushed the chore/mapping-perf branch 3 times, most recently from 71d4156 to d533568 Compare March 25, 2025 20:04
@talboren talboren changed the title feat: improve performance of mapping [draft] feat: improve performance of mapping Mar 27, 2025
Copy link
Contributor

github-actions bot commented Mar 27, 2025

Hey there and thank you for opening this pull request! 👋🏼

We require pull request titles to follow the Conventional Commits specification and it looks like your proposed title needs to be adjusted.

Details:

No release type found in pull request title "[draft] feat: improve performance of mapping". Add a prefix to indicate what kind of release this pull request corresponds to. For reference, see https://www.conventionalcommits.org/

Available types:
 - feat: A new feature
 - fix: A bug fix
 - docs: Documentation only changes
 - style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
 - refactor: A code change that neither fixes a bug nor adds a feature
 - perf: A code change that improves performance
 - test: Adding missing tests or correcting existing tests
 - build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
 - ci: Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
 - chore: Other changes that don't modify src or test files
 - revert: Reverts a previous commit

@talboren
Copy link
Member Author

Still working on it! Will try to finalize next week.

@talboren talboren closed this Apr 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[🐛 Bug]: Mapping performance improvements
3 participants