Open
Description
Based on this discussion on Slack.
I've found a bug: output.segment.dir.uri is never read from the task config; it is always derived from the controller config. As a result, if the deep store is not configured globally, it is impossible to run a metadata push job. In the example below, the generated task falls back to push.mode TAR and an output directory under controller.data.dir.
The function getPushTaskConfig should prioritise the taskConfig over the global controllerConfig.
I will open a PR to fix and refactor the function if we agree on the behaviour.
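To make the proposed behaviour concrete, here is a minimal sketch of the precedence I have in mind. It is not the current getPushTaskConfig code: the class name, the resolvePushConfig helper and its parameters are hypothetical, and the fallback values (TAR and the controller data dir plus the raw table name) are only assumed from the "Result" output in the example below.

```java
import java.util.HashMap;
import java.util.Map;

public class PushConfigPrecedence {
  static final String PUSH_MODE = "push.mode";
  static final String OUTPUT_SEGMENT_DIR_URI = "output.segment.dir.uri";

  // Hypothetical helper: values set in the task config win; the
  // controller-level value is only used as a fallback.
  static Map<String, String> resolvePushConfig(Map<String, String> taskConfig,
      String controllerDataDir, String rawTableName) {
    Map<String, String> resolved = new HashMap<>();

    String taskOutputDir = taskConfig.get(OUTPUT_SEGMENT_DIR_URI);
    if (taskOutputDir != null && !taskOutputDir.isEmpty()) {
      // Task-level override, e.g. a gs:// URI when no global deep store exists.
      resolved.put(OUTPUT_SEGMENT_DIR_URI, taskOutputDir);
    } else {
      // Assumed fallback, mirroring the path seen in the "Result" output.
      resolved.put(OUTPUT_SEGMENT_DIR_URI, controllerDataDir + "/" + rawTableName);
    }

    // Assumed default of TAR when the task config does not set a push mode.
    resolved.put(PUSH_MODE, taskConfig.getOrDefault(PUSH_MODE, "TAR"));
    return resolved;
  }

  public static void main(String[] args) {
    Map<String, String> taskConfig = new HashMap<>();
    taskConfig.put(PUSH_MODE, "METADATA");
    taskConfig.put(OUTPUT_SEGMENT_DIR_URI, "gs://my-bucket/LOADED_HOURLY/merged");

    Map<String, String> resolved =
        resolvePushConfig(taskConfig, "/var/pinot/controller/data", "LOADED_HOURLY");
    System.out.println(resolved.get(PUSH_MODE));
    System.out.println(resolved.get(OUTPUT_SEGMENT_DIR_URI));
  }
}
```

With the task config from the example, this sketch keeps both task-level values; with an empty task config it falls back to the controller-derived directory.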
Example:
Controller config
controller.data.dir=/var/pinot/controller/data
Task config
"MergeRollupTask": {
  "1day.mergeType": "concat",
  "1day.bucketTimePeriod": "1d",
  "1day.bufferTimePeriod": "1d",
  "1day.maxNumRecordsPerSegment": "100000",
  "1day.maxNumRecordsPerTask": "500000",
  "1day.maxNumParallelBuckets": "10",
  "minionInstanceTag": "merge",
  "push.mode": "METADATA",
  "output.segment.dir.uri": "gs://my-bucket/LOADED_HOURLY/merged",
  "schedule": "0 1 * * * ?"
}
Result
{
  "configs": {
    "push.mode": "TAR",
    ...
    "output.segment.dir.uri": "/var/pinot/controller/data/LOADED_HOURLY",
    ...
  },
  "tableName": "LOADED_HOURLY_OFFLINE",
  "taskId": "Task_MergeRollupTask_5cd6364b-7012-4f60-8e8b-58ee4a2196c1_1759943074336_0",
  "taskType": "MergeRollupTask"
}
Expected
{
  "configs": {
    "push.mode": "METADATA",
    ...
    "output.segment.dir.uri": "gs://my-bucket/LOADED_HOURLY/merged",
    ...
  },
  "tableName": "LOADED_HOURLY_OFFLINE",
  "taskId": "Task_MergeRollupTask_5cd6364b-7012-4f60-8e8b-58ee4a2196c1_1759943074336_0",
  "taskType": "MergeRollupTask"
}
cc: @shounakmk219