@t0mpere t0mpere commented Oct 14, 2025

See: #17014

This change makes MergeRollupTask read output.segment.dir.uri from the task config first, and only fall back to the controller's data directory when the task config does not set it.

Currently the value is derived only from controller.data.dir. If that path points to a local directory, it is impossible to run a MergeRollup job with METADATA push mode.

Example:

Controller config

controller.data.dir=/var/pinot/controller/data

Task config

"MergeRollupTask": {
  "1day.mergeType": "concat",
  "1day.bucketTimePeriod": "1d",
  "1day.bufferTimePeriod": "1d",
  "1day.maxNumRecordsPerSegment": "100000",
  "1day.maxNumRecordsPerTask": "500000",
  "1day.maxNumParallelBuckets": "10",
  "minionInstanceTag": "merge",
  "push.mode": "METADATA",
  "output.segment.dir.uri": "gs://my-bucket/LOADED_HOURLY/merged",
  "schedule": "0 1 * * * ?"
}

Result

{
  "configs": {
    "push.mode": "TAR",
    ...
    "output.segment.dir.uri": "/var/pinot/controller/data/LOADED_HOURLY",
    ...
  },
  "tableName": "LOADED_HOURLY_OFFLINE",
  "taskId": "Task_MergeRollupTask_5cd6364b-7012-4f60-8e8b-58ee4a2196c1_1759943074336_0",
  "taskType": "MergeRollupTask"
}

Expected

{
  "configs": {
    "push.mode": "METADATA",
    ...
    "output.segment.dir.uri": "gs://my-bucket/LOADED_HOURLY/merged",
    ...
  },
  "tableName": "LOADED_HOURLY_OFFLINE",
  "taskId": "Task_MergeRollupTask_5cd6364b-7012-4f60-8e8b-58ee4a2196c1_1759943074336_0",
  "taskType": "MergeRollupTask"
}
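The difference between the Result and Expected outputs comes down to a precedence rule for the output directory. The following is a minimal Java sketch of that rule, not the code from this PR: the class name `OutputDirResolutionSketch` and the method `resolveOutputSegmentDir` are hypothetical, and the fallback of appending the raw table name to controller.data.dir is an assumption inferred from the Result example above.

```java
import java.net.URI;
import java.util.Map;

// Hypothetical sketch; not the actual MinionTaskUtils implementation.
final class OutputDirResolutionSketch {
  static final String OUTPUT_SEGMENT_DIR_URI_KEY = "output.segment.dir.uri";

  private OutputDirResolutionSketch() {
  }

  /**
   * Returns the task-level output.segment.dir.uri when it is set.
   * Otherwise falls back to controllerDataDir + "/" + rawTableName
   * (assumption based on the Result example in this description).
   */
  static URI resolveOutputSegmentDir(Map<String, String> taskConfigs, String controllerDataDir,
      String rawTableName) {
    String fromTaskConfig = taskConfigs.get(OUTPUT_SEGMENT_DIR_URI_KEY);
    if (fromTaskConfig != null && !fromTaskConfig.trim().isEmpty()) {
      // Task-level setting wins over the controller data dir.
      return URI.create(fromTaskConfig.trim());
    }
    String base = controllerDataDir.endsWith("/") ? controllerDataDir : controllerDataDir + "/";
    return URI.create(base + rawTableName);
  }
}
```

With the task config from the example, this sketch would resolve to the gs:// location; with an empty task config it would resolve to the local path under controller.data.dir.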

Copilot AI left a comment

Pull Request Overview

This PR fixes the configuration behavior for MergeRollupTask to properly prioritize the output.segment.dir.uri value from TaskConfig over the controller's data directory. This enables METADATA push mode for merge rollup operations when using remote storage.

Key Changes:

  • Refactored getPushTaskConfig method to prioritize task-level configuration for output segment directory
  • Added new helper method getOutputSegmentDirURI to handle URI resolution with proper precedence
  • Restructured the push mode logic using a switch statement for better clarity

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File                      Description
MinionTaskUtils.java      Refactored push task configuration logic to prioritize task config over controller data dir and improved code structure
MinionTaskUtilsTest.java  Added comprehensive test coverage for various push mode scenarios and configuration combinations

Comment on lines +130 to +132
URI outputSegmentDirURI = getOutputSegmentDirURI(taskConfigs, clusterInfoAccessor, tableName);
if (!isLocalOutputDir(outputSegmentDirURI.getScheme())) {
  switch (segmentPushType) {

Copilot AI Oct 14, 2025

[nitpick] Consider extracting the switch statement logic into a separate method to improve readability and reduce the complexity of getPushTaskConfig.
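As a rough illustration of what such an extraction might look like, here is a small Java sketch. It is not taken from the PR: `PushModeSketch`, `effectivePushMode`, and the local `PushMode` enum are hypothetical names, the `isLocalOutputDir` body is a guess at what a scheme check could do, and the fallback to TAR for local output directories is an assumption based on the Result example in the PR description.

```java
import java.net.URI;

// Hypothetical sketch of pulling the push-mode decision into its own method.
final class PushModeSketch {
  // Stand-in enum for illustration; the real code uses its own push type.
  enum PushMode { TAR, URI, METADATA }

  private PushModeSketch() {
  }

  // Assumption: a missing scheme or "file" means the directory is local.
  static boolean isLocalOutputDir(String scheme) {
    return scheme == null || scheme.equalsIgnoreCase("file");
  }

  /** Decides which push mode a task would end up with for the given output directory. */
  static PushMode effectivePushMode(PushMode requested, URI outputSegmentDir) {
    if (isLocalOutputDir(outputSegmentDir.getScheme())) {
      // Assumption: with a local output dir, the task falls back to TAR.
      return PushMode.TAR;
    }
    switch (requested) {
      case METADATA:
      case URI:
        // Remote output dir: keep the requested mode.
        return requested;
      case TAR:
      default:
        return PushMode.TAR;
    }
  }
}
```

Keeping the decision in one small method would let the tests exercise each combination of push mode and output directory scheme without building a full task config.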

…c/test/java/org/apache/pinot/plugin/minion/tasks/MinionTaskUtilsTest.java

Co-authored-by: Copilot <[email protected]>

codecov-commenter commented Oct 14, 2025

Codecov Report

❌ Patch coverage is 77.77778% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.20%. Comparing base (a23c610) to head (26b1baf).
⚠️ Report is 23 commits behind head on master.

Files with missing lines                                Patch %   Lines
...che/pinot/plugin/minion/tasks/MinionTaskUtils.java   77.77%    4 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17015      +/-   ##
============================================
+ Coverage     63.18%   63.20%   +0.02%     
- Complexity     1425     1433       +8     
============================================
  Files          3123     3124       +1     
  Lines        184813   185318     +505     
  Branches      28320    28334      +14     
============================================
+ Hits         116765   117124     +359     
- Misses        59028    59163     +135     
- Partials       9020     9031      +11     
Flag                  Coverage Δ
custom-integration1   100.00% <ø> (ø)
integration           100.00% <ø> (ø)
integration1          100.00% <ø> (ø)
integration2          0.00% <ø> (ø)
java-11               63.16% <77.77%> (-0.01%) ⬇️
java-21               63.15% <77.77%> (+7.25%) ⬆️
temurin               63.20% <77.77%> (+0.02%) ⬆️
unittests             63.19% <77.77%> (+0.02%) ⬆️
unittests1            55.79% <ø> (-0.15%) ⬇️
unittests2            33.78% <77.77%> (+0.07%) ⬆️

@shounakmk219 shounakmk219 left a comment

Thanks a lot for adding this functionality!

t0mpere commented Oct 16, 2025

@Jackie-Jiang could you please have a look? Thanks :)

@Jackie-Jiang
Please take a look at the failed test, and also resolve the merge conflict.

t0mpere commented Nov 19, 2025

@Jackie-Jiang: Fixed the conflict, and it appears that an unrelated test is now failing. Should I just try to re-run?

@Jackie-Jiang Jackie-Jiang merged commit 61edb7a into apache:master Nov 19, 2025
17 of 18 checks passed
@Jackie-Jiang
@t0mpere Thanks for the contribution! Could you please also help update the pinot doc about this change?

@t0mpere t0mpere changed the title from "MergeRollupTask config behaviour fix for output.segment.dir.uri" to "MergeRollupTask config behaviour enhancement for output.segment.dir.uri" Nov 20, 2025