@raghuvanshraj raghuvanshraj commented Nov 10, 2025

IndexingMemoryController Integration for VSRs and ArrowWriters for parquet-data-format module

  • Integrated native memory tracking for VSRs and ArrowWriters in IndexingMemoryController
  • Fixed ArrowBufferPool allocator creation logic to have a single RootAllocator per shard and ChildAllocators for each ParquetWriter
  • Fixed VSR rotation bugs in ParquetDocumentInput.addToWriter code path
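
The allocator layout described in the second bullet (one root per shard, one child per writer) can be modeled roughly as below. This is a stdlib-only sketch of the accounting idea, not the code in this PR and not Arrow's API: `ShardAllocator`, `WriterAllocator`, and the byte limits are hypothetical names and values chosen for illustration.

```java
/** Hypothetical stand-in for a per-shard root allocator (not Arrow's RootAllocator). */
final class ShardAllocator {
    private final long limitBytes;
    private long allocatedBytes;

    ShardAllocator(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Creates a child whose allocations all count against this shard's limit. */
    WriterAllocator newChild(String name) {
        return new WriterAllocator(this, name);
    }

    synchronized void reserve(long bytes) {
        if (allocatedBytes + bytes > limitBytes) {
            throw new IllegalStateException("shard limit exceeded: requested " + bytes + " bytes");
        }
        allocatedBytes += bytes;
    }

    synchronized void release(long bytes) {
        allocatedBytes -= bytes;
    }

    /** Single number a memory controller could poll for the whole shard. */
    synchronized long allocatedBytes() {
        return allocatedBytes;
    }
}

/** Hypothetical stand-in for a per-writer child allocator. */
final class WriterAllocator implements AutoCloseable {
    private final ShardAllocator parent;
    private final String name;
    private long allocatedBytes;

    WriterAllocator(ShardAllocator parent, String name) {
        this.parent = parent;
        this.name = name;
    }

    /** Reserves from the parent first, so the shard-wide limit is always enforced. */
    void allocate(long bytes) {
        parent.reserve(bytes);
        allocatedBytes += bytes;
    }

    long allocatedBytes() {
        return allocatedBytes;
    }

    String name() {
        return name;
    }

    /** Returns everything this writer still holds to the shard. */
    @Override
    public void close() {
        parent.release(allocatedBytes);
        allocatedBytes = 0;
    }
}
```

With a single root per shard, the shard's native usage is one value that a controller can read, and closing a writer's child returns its share without affecting the other writers.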

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

|                                                         Metric |         Task |       Value |   Unit |
|---------------------------------------------------------------:|-------------:|------------:|-------:|
|                     Cumulative indexing time of primary shards |              |           0 |    min |
|             Min cumulative indexing time across primary shards |              |           0 |    min |
|          Median cumulative indexing time across primary shards |              |           0 |    min |
|             Max cumulative indexing time across primary shards |              |           0 |    min |
|            Cumulative indexing throttle time of primary shards |              |     10.8082 |    min |
|    Min cumulative indexing throttle time across primary shards |              |     10.8082 |    min |
| Median cumulative indexing throttle time across primary shards |              |     10.8082 |    min |
|    Max cumulative indexing throttle time across primary shards |              |     10.8082 |    min |
|                        Cumulative merge time of primary shards |              |           0 |    min |
|                       Cumulative merge count of primary shards |              |           0 |        |
|                Min cumulative merge time across primary shards |              |           0 |    min |
|             Median cumulative merge time across primary shards |              |           0 |    min |
|                Max cumulative merge time across primary shards |              |           0 |    min |
|               Cumulative merge throttle time of primary shards |              |           0 |    min |
|       Min cumulative merge throttle time across primary shards |              |           0 |    min |
|    Median cumulative merge throttle time across primary shards |              |           0 |    min |
|       Max cumulative merge throttle time across primary shards |              |           0 |    min |
|                      Cumulative refresh time of primary shards |              |           0 |    min |
|                     Cumulative refresh count of primary shards |              |           2 |        |
|              Min cumulative refresh time across primary shards |              |           0 |    min |
|           Median cumulative refresh time across primary shards |              |           0 |    min |
|              Max cumulative refresh time across primary shards |              |           0 |    min |
|                        Cumulative flush time of primary shards |              |     0.34075 |    min |
|                       Cumulative flush count of primary shards |              |        2205 |        |
|                Min cumulative flush time across primary shards |              |     0.34075 |    min |
|             Median cumulative flush time across primary shards |              |     0.34075 |    min |
|                Max cumulative flush time across primary shards |              |     0.34075 |    min |
|                                        Total Young Gen GC time |              |       3.534 |      s |
|                                       Total Young Gen GC count |              |         175 |        |
|                                          Total Old Gen GC time |              |           0 |      s |
|                                         Total Old Gen GC count |              |           0 |        |
|                                                     Store size |              | 1.93715e-07 |     GB |
|                                                  Translog size |              | 5.12227e-08 |     GB |
|                                         Heap used for segments |              |           0 |     MB |
|                                       Heap used for doc values |              |           0 |     MB |
|                                            Heap used for terms |              |           0 |     MB |
|                                            Heap used for norms |              |           0 |     MB |
|                                           Heap used for points |              |           0 |     MB |
|                                    Heap used for stored fields |              |           0 |     MB |
|                                                  Segment count |              |           0 |        |
|                                                 Min Throughput | index-append |     42552.7 | docs/s |
|                                                Mean Throughput | index-append |     43295.4 | docs/s |
|                                              Median Throughput | index-append |     42993.1 | docs/s |
|                                                 Max Throughput | index-append |     46663.1 | docs/s |
|                                        50th percentile latency | index-append |     784.922 |     ms |
|                                        90th percentile latency | index-append |        1093 |     ms |
|                                        99th percentile latency | index-append |     1433.91 |     ms |
|                                      99.9th percentile latency | index-append |     1753.09 |     ms |
|                                     99.99th percentile latency | index-append |     2087.86 |     ms |
|                                       100th percentile latency | index-append |     2279.25 |     ms |
|                                   50th percentile service time | index-append |     784.922 |     ms |
|                                   90th percentile service time | index-append |        1093 |     ms |
|                                   99th percentile service time | index-append |     1433.91 |     ms |
|                                 99.9th percentile service time | index-append |     1753.09 |     ms |
|                                99.99th percentile service time | index-append |     2087.86 |     ms |
|                                  100th percentile service time | index-append |     2279.25 |     ms |
|                                                     error rate | index-append |           0 |      % |

------------------------------------
[INFO] ✅ SUCCESS (took 2362 seconds)
------------------------------------
[Image: Screenshot 2025-11-10 at 1 19 44 PM]

@github-actions

❌ Gradle check result for 8990894: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions

❌ Gradle check result for 801e698: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

…rquet-data-format module

- Integrated native memory tracking for VSRs and ArrowWriters in IndexingMemoryController
- Fixed ArrowBufferPool allocator creation logic to have a single RootAllocator per shard and ChildAllocators for each ParquetWriter
- Fixed VSR rotation bugs in ParquetDocumentInput.addToWriter code path

Signed-off-by: Raghuvansh Raj <[email protected]>
@raghuvanshraj raghuvanshraj force-pushed the feature/datafusion-imc-pr branch from 801e698 to 8792c8b Compare November 12, 2025 09:18
@github-actions

❌ Gradle check result for 839ffd0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


public static final Setting<String> INDEX_MAX_NATIVE_ALLOCATION = Setting.simpleString(
    "index.parquet.max_native_allocation",
    DEFAULT_MAX_NATIVE_ALLOCATION,

Please add validation for the syntax of the setting value.
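
A rough sketch of what such a check might look like follows. The accepted syntax is an assumption, since the thread does not say what `index.parquet.max_native_allocation` should accept: here a value is either a percentage such as `10%` or an integer byte size with a `b`, `kb`, `mb`, or `gb` suffix. `NativeAllocationValue` is a hypothetical helper, not a class from this PR.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical validator; the accepted syntax below is assumed, not taken from the PR. */
final class NativeAllocationValue {
    // Assumed form 1: a percentage such as "10%" or "12.5%".
    private static final Pattern PERCENT = Pattern.compile("(\\d+(?:\\.\\d+)?)%");
    // Assumed form 2: an integer byte size such as "512mb" or "1gb".
    private static final Pattern BYTES = Pattern.compile("(\\d+)(b|kb|mb|gb)");

    private NativeAllocationValue() {}

    /** Throws IllegalArgumentException if the value matches neither assumed form. */
    static void validate(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("value must not be null");
        }
        String value = raw.trim().toLowerCase(Locale.ROOT);

        Matcher percent = PERCENT.matcher(value);
        if (percent.matches()) {
            double p = Double.parseDouble(percent.group(1));
            if (p <= 0.0 || p > 100.0) {
                throw new IllegalArgumentException("percentage must be in (0, 100]: " + raw);
            }
            return;
        }
        if (BYTES.matcher(value).matches()) {
            return;
        }
        throw new IllegalArgumentException(
            "expected a percentage like 10% or a size like 512mb, got: " + raw);
    }
}
```

A check like this could be called from a validator attached to the setting, assuming the `Setting.simpleString` overload in use accepts one, so that a malformed value is rejected when the setting is applied rather than when a writer first allocates.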

private final VSRPool vsrPool;

- public VSRManager(String fileName, Schema schema) {
+ public VSRManager(String fileName, Schema schema, ArrowBufferPool arrowBufferPool) {

Shouldn't this only need to consume a VSRPool?

@bharath-techie bharath-techie merged commit 459ead6 into opensearch-project:feature/datafusion Nov 12, 2025
5 of 29 checks passed