Introduce a streaming read_parquet node #574
Conversation
I added some comments. Should we also add some C++ tests?
First pass, looks good.
It would be good to have some basic C++ tests.
Added C++ tests, ready for another look.
    std::shared_ptr<Channel> ch_out,
    std::ptrdiff_t max_tickets,
    cudf::io::parquet_reader_options options,
    cudf::size_type num_rows_per_chunk
This num_rows_per_chunk argument is the only "knob" for tuning the chunk size, which is understandable.
We may have trouble using this variation of the read_parquet node in cudf-polars unless we implement dynamic partitioning, or we force the metadata sampling to gather all footer metadata before the streaming network is constructed. Recall that the statically-partitioned version of cudf-polars needs to know how many chunks the Scan operation will produce ahead of time.
We are already planning to do dynamic partitioning in the future, so temporarily collecting extra/redundant parquet metadata up-front probably isn't a big deal. Even so, I'm wondering if there is a more general way to configure the chunk size here?
For example, it would be great if we could set an upper limit on the row count per chunk, but could also ask for "N chunks per file" or "N files per chunk" (to temporarily satisfy the statically-partitioned case). Even if we do focus on a row-count-based configuration, it may make sense to align chunks with row-group boundaries as long as the row count doesn't exceed the upper limit. (A rough sketch of what such a configuration could look like follows.)
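For concreteness, one hypothetical shape for such a configuration. All names here are illustrative and do not exist in rapidsmpf or cudf; this is only a sketch of the options discussed above:

```cpp
// Illustrative only: a hypothetical way to express the chunking policy
// discussed above. None of these names are part of the actual API.
#include <cstdint>
#include <optional>
#include <variant>

// Static alternatives the statically-partitioned cudf-polars case could use.
struct ChunksPerFile {
    int n;  // split every file into n chunks
};
struct FilesPerChunk {
    int n;  // concatenate n files into one chunk
};

struct ChunkingPolicy {
    // Upper bound on the row count of any emitted chunk.
    std::int64_t max_rows_per_chunk;
    // Optional static partitioning hint; when absent, chunks could instead be
    // aligned to row-group boundaries, subject to max_rows_per_chunk.
    std::optional<std::variant<ChunksPerFile, FilesPerChunk>> static_hint;
};
```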
> Recall that the statically-partitioned version of cudf-polars needs to know how many chunks the Scan operation will produce ahead of time.
I still struggle to understand this statement. I understand that the output of a shuffle needs to specify ahead of time the number of partitions it is going to use, but the scan is an input.
> We are already planning to do dynamic partitioning in the future, so temporarily collecting extra/redundant parquet metadata up-front probably isn't a big deal. Even so, I'm wondering if there is a more general way to configure the chunk size here?
I am very happy to try different chunking strategies but they have to be amenable to the multi-threaded read approach I have here.
> I still struggle to understand this statement. I understand that the output of a shuffle needs to specify ahead of time the number of partitions it is going to use, but the scan is an input.
Yeah, I'm hoping that this is mostly true. The only immediate problem I can think of is if the "plan" tells us we will get a single chunk, but the IO node actually produces 2 or more. Since we cannot dynamically inject a concatenation or shuffle yet, we will have an issue if we run into an operation that is not supported for multiple partitions.
> I am very happy to try different chunking strategies but they have to be amenable to the multi-threaded read approach I have here.
My naive hypothesis is that row-group aligned reads would be easier to do well, but I could be wrong.
This is ready for another look. I've updated things so that the read_parquet node sends chunks into the channel "in-order". I will do some benchmarking tomorrow to check everything still looks good.
Looks very good
OK, I did this and it looks as before for the examples I have, so I think this is good.
Had some questions mostly.
    RAPIDSMPF_EXPECTS(
        files.size() < std::numeric_limits<int>::max(), "Trying to read too many files"
    );
    // TODO: Handle case where multiple ranks are reading from a single file.
nit, shall we open an issue to track these TODOs?
We advertise in the type stubs that these are generics so we should probably allow them to be used as such.
It is convenient if we send zero-sized chunks so that we don't need to have out-of-band approaches to communicating the metadata (and hence shape) of any inserted chunks.
The provided options are collective across the participating ranks. The input file list will be split (approximately evenly) across ranks.
The caller can control the number of rows that are read per chunk sent to the output channel, as well as the number of producer tasks that may be suspended, waiting, after having read a chunk. Increasing the number of producer tasks can increase pipelining and latency-hiding opportunities for the execution scheduler, at the cost of higher memory pressure.
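For orientation, a rough usage sketch. The options builder below is standard cudf API; the node invocation itself is hypothetical (the full constructor signature is not reproduced in this excerpt), so it appears only as a comment:

```cpp
#include <cudf/io/parquet.hpp>

#include <string>
#include <vector>

// Options are collective: every rank constructs them the same way, and the
// node itself splits the file list (approximately evenly) across ranks.
auto make_options(std::vector<std::string> const& files)
{
    return cudf::io::parquet_reader_options::builder(cudf::io::source_info{files})
        .build();
}

// Hypothetical invocation (names and argument order illustrative only):
//   read_parquet(ch_out,
//                /*max_tickets=*/4,               // up to 4 producers may hold a finished chunk
//                make_options(files),
//                /*num_rows_per_chunk=*/1'000'000);
```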
To ensure that, in this multi-producer setup, chunks are still inserted into the output channel in increasing sequence-number order, we implement a
Lineariser utility that should always be used when sending from multiple producers. It is created with one input channel per producer; each producer sends its sequence of chunks into its input channel in order, and the Lineariser buffers them and sends them to the output in global total order.
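As a rough illustration of the idea, here is a minimal single-threaded sketch with illustrative names and a string payload standing in for a chunk; the actual Lineariser works with channels and asynchronous producers:

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

class LineariserSketch {
  public:
    explicit LineariserSketch(std::function<void(std::string)> emit)
        : emit_{std::move(emit)} {}

    // Called by any producer. Each producer inserts its own chunks in order,
    // but different producers may interleave arbitrarily.
    void insert(std::uint64_t seq, std::string chunk)
    {
        buffer_.emplace(seq, std::move(chunk));
        // Flush the longest run of consecutive sequence numbers we now hold,
        // so the output only ever sees chunks in global total order.
        for (auto it = buffer_.find(next_); it != buffer_.end(); it = buffer_.find(next_)) {
            emit_(std::move(it->second));
            buffer_.erase(it);
            ++next_;
        }
    }

  private:
    std::function<void(std::string)> emit_;
    std::map<std::uint64_t, std::string> buffer_;  // out-of-order chunks awaiting their turn
    std::uint64_t next_{0};                        // next sequence number to emit
};
```

The key invariant is the same as described above: producers never reorder their own chunks, so buffering by sequence number and draining consecutive runs is enough to restore the global order.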