
feat(snowflake): support pushdown allow + using LIKE rather than exact match for query log optimization #13649


Open. Wants to merge 1 commit into base: master

gabe-lyons
Contributor

The pushdown_deny_usernames param gave us an effective strategy for reducing noise in the audit log. However, it didn't have the full flexibility to completely optimize our query. This PR adds two improvements: support for LIKE patterns, and the ability to write an allowlist rather than a deny list.

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label May 28, 2025

codecov bot commented May 28, 2025

Codecov Report

Attention: Patch coverage is 13.33333% with 13 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ub/ingestion/source/snowflake/snowflake_queries.py | 7.14% | 13 Missing ⚠️ |

❌ Your patch check has failed because the patch coverage (13.33%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.


@datahub-cyborg datahub-cyborg bot added the needs-review Label for PRs that need review from a maintainer. label May 28, 2025
deny_conditions = []
for pattern in deny_usernames:
    # Ensure pattern is uppercase for case-insensitive LIKE
    deny_conditions.append(f"UPPER(user_name) NOT LIKE '{pattern.upper()}'")
Collaborator

we can use ILIKE here
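
A minimal sketch of what the ILIKE suggestion could look like for the deny conditions. Snowflake's ILIKE matches case-insensitively, so the UPPER() calls on both sides become unnecessary. The function name build_deny_conditions, the field_name parameter, and the quote-escaping step are assumptions for illustration, not code from this PR.

```python
from typing import List


def build_deny_conditions(
    deny_usernames: List[str], field_name: str = "user_name"
) -> List[str]:
    """Hypothetical helper: one NOT ILIKE condition per deny pattern."""
    conditions = []
    for pattern in deny_usernames:
        # Assumption: double any single quote so the pattern stays inside the SQL literal.
        escaped = pattern.replace("'", "''")
        # ILIKE is case-insensitive, so neither side needs UPPER().
        conditions.append(f"{field_name} NOT ILIKE '{escaped}'")
    return conditions
```

With this shape the pattern is passed through as written, and the case handling is left to the database.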

Contributor Author

ah nice

default=[],
description="List of snowflake usernames (SQL LIKE patterns, e.g., 'ANALYST_%', '%_USER', 'MAIN_ACCOUNT') which WILL be considered for lineage/usage/queries extraction. "
"This is primarily useful for improving performance by filtering in only specific users. "
"If not specified, all users not in deny list are included.",
Collaborator

are the allow and deny config flags mutually exclusive? it's not super clear if you can use both together, or if that'd even make sense to do

Collaborator

also should call out that these are case insensitive

Contributor Author

thanks, I'll clarify
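
For reference, a rough Python emulation of how the two lists appear to interact, judging from the filter-building code in this PR: deny conditions are ANDed, allow conditions are ORed, and the two groups are ANDed together. On that reading the flags are not mutually exclusive: a user is kept only if it matches no deny pattern and, when an allow list is given, matches at least one allow pattern. The names like_to_regex and user_is_included are hypothetical, and this is a sketch of the assumed semantics, not the actual implementation.

```python
import re
from typing import List


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a SQL LIKE pattern (% and _ wildcards) into a case-insensitive regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")  # % matches any sequence of characters
        elif ch == "_":
            parts.append(".")  # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def user_is_included(user: str, allow: List[str], deny: List[str]) -> bool:
    """Assumed semantics: deny always wins; an empty allow list allows everyone else."""
    if any(like_to_regex(p).fullmatch(user) for p in deny):
        return False
    if not allow:
        return True
    return any(like_to_regex(p).fullmatch(user) for p in allow)
```

Under these assumptions, allow=['ANALYST_%'] with deny=['%_BOT'] keeps analyst_bob but drops analyst_bot, regardless of case.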

if allow_conditions:
    user_filters.append(f"({' OR '.join(allow_conditions)})")

users_filter_clause = " AND ".join(user_filters) if user_filters else "TRUE"
Collaborator

might make sense to have a _build_user_filter(field_name) -> str helper method, ideally with some more focused unit tests

@datahub-cyborg datahub-cyborg bot added pending-submitter-merge and removed needs-review Label for PRs that need review from a maintainer. labels May 28, 2025
Labels
ingestion PR or Issue related to the ingestion of metadata pending-submitter-merge

2 participants