[Joins] Source Reader implementation for Joins #15874

@harshavamsi

Is your feature request related to a problem? Please describe

As part of milestone 1 for #15185, we plan on introducing a source reader abstraction for join operations.

Describe the solution you'd like

Copied from #15185:

Purpose
Read rows from the data source. Depending on the query passed to it, the reader can use an index or simply scan all rows. It does not optimize the query; it executes the query it was given at initialization as-is. It must support pagination and produce rows efficiently in batches.

For the Lucene-based implementation, the SourceReader will have access to the corresponding shard, which is a Lucene index, and will execute the given Lucene query. It will use a customized Collector to collect matching documents and generate rows containing the docID (optional) and the requested fields.

Properties
Type: Lucene
Source identifier: Shard ID
Input
Query: Lucene query for the Lucene-based implementation
Pagination info: page size
Fields: fields to fetch
Output
Iterator of matching rows. A row is a tuple of <docID, f1, f2, f3>. The output is a non-serialized iterator; for the Java implementation it will be a new Iterator class with a method such as nextPage() that fetches all rows in the next page.
Note: It is the responsibility of the stream to consume this iterator and to serialize the rows if they need to be sent over the network.
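
To make the input/output contract above concrete, here is a minimal Java sketch. It is not the proposed implementation: every name in it (SourceReaderSketch, Row, RowPageIterator, SourceReader, InMemorySourceReader) is hypothetical, and an in-memory list filtered by a Predicate stands in for the shard and the Lucene query so that the example runs without Lucene. It only illustrates the shape described above: query, page size, and fields go in at construction, and a page-wise iterator of <docID, f1, ...> rows comes out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Hypothetical sketch; none of these names come from the OpenSearch code base.
final class SourceReaderSketch {

    /** A row: the docID followed by the requested field values, in request order. */
    static final class Row {
        final int docId;
        final List<Object> values;

        Row(int docId, List<Object> values) {
            this.docId = docId;
            this.values = values;
        }
    }

    /** Non-serialized, page-wise iterator over matching rows. */
    interface RowPageIterator {
        boolean hasNextPage();

        List<Row> nextPage();
    }

    /** Assumed reader contract: the query is fixed at construction and executed as-is. */
    interface SourceReader {
        RowPageIterator read();
    }

    /** In-memory stand-in for a shard: docID is the list index, the "query" is a Predicate. */
    static final class InMemorySourceReader implements SourceReader {
        private final List<Map<String, Object>> docs;
        private final Predicate<Map<String, Object>> query;
        private final List<String> fields;
        private final int pageSize;

        InMemorySourceReader(List<Map<String, Object>> docs,
                             Predicate<Map<String, Object>> query,
                             List<String> fields,
                             int pageSize) {
            if (pageSize <= 0) {
                throw new IllegalArgumentException("pageSize must be positive");
            }
            this.docs = docs;
            this.query = query;
            this.fields = fields;
            this.pageSize = pageSize;
        }

        @Override
        public RowPageIterator read() {
            return new RowPageIterator() {
                private int cursor = 0; // next docID to examine

                @Override
                public boolean hasNextPage() {
                    // Skip non-matching docs so that an empty page is never promised.
                    while (cursor < docs.size() && !query.test(docs.get(cursor))) {
                        cursor++;
                    }
                    return cursor < docs.size();
                }

                @Override
                public List<Row> nextPage() {
                    if (!hasNextPage()) {
                        throw new NoSuchElementException("no more pages");
                    }
                    List<Row> page = new ArrayList<>(pageSize);
                    // Scan forward until the page is full or the source is exhausted.
                    while (cursor < docs.size() && page.size() < pageSize) {
                        Map<String, Object> doc = docs.get(cursor);
                        if (query.test(doc)) {
                            List<Object> values = new ArrayList<>(fields.size());
                            for (String field : fields) {
                                values.add(doc.get(field));
                            }
                            page.add(new Row(cursor, values));
                        }
                        cursor++;
                    }
                    return page;
                }
            };
        }
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
            Map.<String, Object>of("user", "a", "age", 31),
            Map.<String, Object>of("user", "b", "age", 17),
            Map.<String, Object>of("user", "c", "age", 45),
            Map.<String, Object>of("user", "d", "age", 52));
        SourceReader reader = new InMemorySourceReader(
            docs, d -> (Integer) d.get("age") >= 18, List.of("user"), 2);
        RowPageIterator pages = reader.read();
        // The consumer (the "stream" in the note above) drains pages and would serialize them if needed.
        while (pages.hasNextPage()) {
            for (Row row : pages.nextPage()) {
                System.out.println(row.docId + " " + row.values);
            }
        }
    }
}
```

In a Lucene-backed version, the Predicate would presumably be replaced by a Lucene Query run against the shard's searcher, with a custom Collector filling each page; how the cursor is carried between pages there is left open by this sketch.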

Related component

Search:Performance

Describe alternatives you've considered

No response

Additional context

No response
