[FLINK-32609] Support Projection Pushdown #174


Open: wants to merge 1 commit into base branch v4.0 from support_projection_pushdown

Conversation


@fqshopify fqshopify commented May 9, 2025

Implements the SupportsProjectionPushDown interface for KafkaDynamicSource.

Benefits

  1. Improved resiliency
    • Currently SQL queries will fail if any field in the table undergoes a breaking schema change, even if the SQL query itself does not depend on that field.
    • After the changes in this PR, SQL queries will continue to work even if fields that they do not depend on experience breaking schema changes. See testProjectionPushdownWithJsonFormatAndBreakingSchemaChange for an example.
    • This particular improvement only applies to Formats that implement the ProjectableDecodingFormat interface and that can decode messages independently/dynamically, e.g. json, avro-confluent, debezium-avro-confluent.
  2. Improved performance
    • Unneeded columns will be filtered out at an earlier stage of processing (specifically in the TableSourceScan node).
    • The amount of improvement in performance will vary depending on:
      • The number of columns selected
      • The number of columns in the source data
      • The DecodingFormat
      • etc.
    • Note: we cannot push projections all the way down into Kafka.
      • Projection pushdown is typically used as an I/O optimization technique by pushing projections down all the way into the storage layer.
      • Unfortunately the storage layer in our context, Kafka, does not support projection pushdown. As a result, we are not able to push projections down further than the deserialization step of our Flink pipelines.
      • We're still able to improve performance by eliminating unnecessary columns at an earlier stage of processing within our Flink pipeline, but the gains are smaller than they would be with true storage-level pushdown.
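As context for how the planner hands a projection to the source: Flink's SupportsProjectionPushDown ability passes the source an `int[][]` of field index paths, where each outer entry is one projected column and the inner indices descend into nested rows. The toy projector below is an illustrative sketch of how such paths select fields (it is not code from this PR; the class name and Object[] row representation are invented for the example):

```java
import java.util.Arrays;

// Illustrative sketch only: applies int[][] field index paths (as used by
// Flink's SupportsProjectionPushDown#applyProjection) to a row modeled as
// a nested Object[]. Real sources apply the same idea inside deserialization.
public class ProjectionDemo {
    static Object[] project(Object[] row, int[][] paths) {
        Object[] out = new Object[paths.length];
        for (int i = 0; i < paths.length; i++) {
            Object cur = row;
            for (int idx : paths[i]) {
                cur = ((Object[]) cur)[idx]; // descend one nesting level
            }
            out[i] = cur;
        }
        return out;
    }

    public static void main(String[] args) {
        // row: (id BIGINT, user ROW<name STRING, age INT>, payload STRING)
        Object[] row = {42L, new Object[]{"alice", 30}, "ignored"};
        // SELECT user.name, id  ->  paths [[1, 0], [0]]
        Object[] projected = project(row, new int[][]{{1, 0}, {0}});
        System.out.println(Arrays.toString(projected)); // prints [alice, 42]
    }
}
```

Note that the unprojected `payload` column is never touched, which is also why a breaking schema change in an unselected field can be tolerated once projection happens at (or before) decoding.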

Limitations

Different formats have different levels of support for projection pushdown:

  • Even though the AvroDecodingFormat implements the ProjectableDecodingFormat interface, it appears to have issues with various projections (see FLINK-35324 for more details).
  • The CSV format only supports top-level projection pushdown (probably because the CSV format has no standardized way to represent nested fields).
  • The JSON format supports both top-level and nested projection pushdown.

To account for the variety of supported projection pushdown levels, we expose a new pair of configs that let the user set the level of projection pushdown supported by the key and value formats, respectively.
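The PR text does not spell out the new config keys, so the option names below are hypothetical placeholders purely to illustrate the shape of a per-format setting; consult the PR diff for the actual keys and allowed values:

```sql
-- Hypothetical option names for illustration only.
CREATE TABLE orders (
  order_id BIGINT,
  customer ROW<name STRING, region STRING>,
  amount DECIMAL(10, 2)
) WITH (
  'connector' = 'kafka',
  'topic' = 'orders',
  'value.format' = 'json',
  -- hypothetical key: e.g. 'none' | 'top-level' | 'nested'
  'value.projection-pushdown-level' = 'nested'
);
```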

@fqshopify fqshopify force-pushed the support_projection_pushdown branch 3 times, most recently from 218aec0 to 1192c9b on May 9, 2025 13:00
@fqshopify fqshopify force-pushed the support_projection_pushdown branch from 1192c9b to e22ed76 on June 15, 2025 10:17
@fqshopify fqshopify force-pushed the support_projection_pushdown branch from e22ed76 to 3dc9030 on June 25, 2025 18:00
@fqshopify fqshopify changed the base branch from main to v4.0 on June 25, 2025 18:00
@fqshopify fqshopify force-pushed the support_projection_pushdown branch from 3dc9030 to acc9e2e on June 25, 2025 18:04
@fqshopify fqshopify marked this pull request as ready for review on June 25, 2025 20:35
@fqshopify fqshopify force-pushed the support_projection_pushdown branch from acc9e2e to cc64369 on June 25, 2025 20:36
@fqshopify fqshopify force-pushed the support_projection_pushdown branch from cc64369 to 722df65 on June 25, 2025 20:38