[Pluggable Dataformat] Adding support for FieldTypeCapabilities by darjisagar7 · Pull Request #21733 · opensearch-project/OpenSearch

darjisagar7 · 2026-05-19T08:52:08Z

Description

[Describe what this change achieves]

Related Issues

Resolves #[Issue number to be closed when this PR is merged]

Check List

Functionality includes testing.
API changes companion pull request created, if applicable.
Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Sagar Darji <darsaga@amazon.com>

github-actions · 2026-05-19T08:53:40Z

PR Reviewer Guide 🔍

(Review updated until commit `022ce52`)

Here are some key observations to aid the review process:

🧪 No relevant tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review Missing License Header The file is missing the Apache 2.0 license header that is present in all other files in this PR. This violates the project's licensing requirements. package org.opensearch.parquet.fields.core.metadata; Logic Inversion Bug The condition checks if type equals `NESTED_CONTENT_TYPE` AND pluggable data format is enabled, then sets `nested = true`. However, the original code set `nested = true` when type equals `NESTED_CONTENT_TYPE` unconditionally. The new logic inverts this: it only sets `nested = true` when the feature is enabled, but the subsequent code expects `nested` to be set whenever the type is `NESTED_CONTENT_TYPE`. This breaks nested object handling when pluggable data format is disabled. } else if (type.equals(NESTED_CONTENT_TYPE) && isPluggableDataFormatEnabled(parserContext.getSettings())) { nested = true; Null Dereference Risk If `fieldType.getCapabilityMap()` returns null (though the API suggests it should not), the code will throw NullPointerException at line 100 when calling `capMap.isEmpty()`. While the current implementation may guarantee non-null, defensive coding would check for null before calling methods on `capMap`. Map<DataFormat, Set<FieldTypeCapabilities.Capability>> capMap = fieldType.getCapabilityMap();

github-actions · 2026-05-19T08:54:28Z

PR Code Suggestions ✨

Latest suggestions up to 022ce52
Explore these optional code suggestions:

Category	Suggestion	Impact
Possible issue	Restore nested type parsing compatibility The nested type parsing is now gated behind `isPluggableDataFormatEnabled`, but the original logic allowed nested types unconditionally. This change breaks backward compatibility for indices created before the pluggable data format feature. Remove the feature flag check to preserve existing behavior. server/src/main/java/org/opensearch/index/mapper/ObjectMapper.java [420-422] if (type.equals(CONTENT_TYPE)) { builder.nested = Nested.NO; -} else if (type.equals(NESTED_CONTENT_TYPE) && isPluggableDataFormatEnabled(parserContext.getSettings())) { +} else if (type.equals(NESTED_CONTENT_TYPE)) { nested = true; } Suggestion importance[1-10]: 9 __ Why: The change gates `NESTED_CONTENT_TYPE` parsing behind `isPluggableDataFormatEnabled`, which breaks backward compatibility for existing indices. This is a critical issue that could prevent nested documents from being parsed correctly when the feature flag is disabled, affecting production workloads.	High
Possible issue	Prevent indexing failure for metadata field The `IndexParquetField` throws an exception when `addToGroup` is called, but this method is invoked during document indexing. If the `_index` field is registered and reaches this code path, indexing will fail. Consider either implementing proper handling or filtering out this field type earlier in the pipeline to prevent runtime failures. sandbox/plugins/parquet-data-format/src/main/java/org/opensearch/parquet/fields/core/metadata/IndexParquetField.java [14-17] @Override protected void addToGroup(MappedFieldType fieldType, ManagedVSR managedVSR, Object parseValue) { - throw new IllegalStateException("Index field is not supposed to be added to group"); + // _index field is metadata-only and should not be stored in Parquet + // Silently skip to avoid indexing failures } Suggestion importance[1-10]: 7 __ Why: The `addToGroup` method throws an `IllegalStateException`, which could cause indexing failures if the `_index` field reaches this code path. The suggestion to silently skip is valid, though the PR may intentionally prevent this field from being indexed. The score reflects that this is a legitimate concern about potential runtime failures.	Medium
General	Validate format configuration before assignment When `configuredFormats` is empty but the field requests capabilities, the method silently sets an empty capability map. This causes fields to be skipped during indexing without clear indication of misconfiguration. Throw an exception or log a warning when required capabilities cannot be satisfied due to missing format configuration. server/src/main/java/org/opensearch/index/engine/dataformat/FieldCapabilityAssigner.java [44-55] public void assign(MappedFieldType fieldType) { Set<FieldTypeCapabilities.Capability> requested = fieldType.requestedCapabilities(); if (requested.isEmpty()) { fieldType.setCapabilityMap(Map.of()); return; } if (configuredFormats.isEmpty()) { - fieldType.setCapabilityMap(Map.of()); - return; + throw new MapperParsingException( + "Field [" + fieldType.name() + "] requires capabilities " + requested + + " but no data formats are configured for this index" + ); } Suggestion importance[1-10]: 8 __ Why: When `configuredFormats` is empty but the field requests capabilities, silently setting an empty map causes fields to be skipped during indexing without clear error messages. Throwing an exception would provide better feedback about misconfiguration. This is a significant improvement for error handling and user experience.	Medium
General	Add debug logging for skipped fields When the capability map is empty, the field is silently skipped. However, this could hide configuration errors where a field type should be indexed but isn't due to missing capability declarations. Consider logging a warning when skipping fields with empty capability maps to aid debugging. sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/index/LuceneDocumentInput.java [100-104] Map<DataFormat, Set<FieldTypeCapabilities.Capability>> capMap = fieldType.getCapabilityMap(); if (capMap.isEmpty()) { // No capability map set — no format declared support for this field type, skip it + logger.debug("Skipping field [{}] with type [{}]: no format declared support", fieldType.name(), fieldType.typeName()); return; } Suggestion importance[1-10]: 5 __ Why: Adding debug logging for skipped fields would improve observability and help with troubleshooting configuration issues. However, this is a minor enhancement that doesn't address a functional bug, and the silent skip behavior may be intentional for the per-format filtering design.	Low

Previous suggestions

Suggestions up to commit 3bde29b

Category	Suggestion	Impact
General	Document capability assignment order The capability assignment for metadata fields happens after the recursive walk of the root mapper. If metadata mappers contain nested structures or if the order of assignment matters for validation, this could lead to inconsistent state. Consider documenting why metadata fields are assigned separately or verify that the order is intentional. server/src/main/java/org/opensearch/index/mapper/DocumentMapper.java [126-136] if (dataFormatRegistry != null) { + // Assign capabilities to all non-metadata fields first assignCapabilitiesRecursive(rootObjectMapper, builderContext); + // Then assign to metadata fields - order matters for validation consistency for (MetadataFieldMapper metadataMapper : metadataMappers.values()) { builderContext.assignCapabilities(metadataMapper.fieldType()); } } Suggestion importance[1-10]: 5 __ Why: Adding a comment to clarify why metadata fields are assigned separately improves code maintainability and helps future developers understand the intentional ordering, though the code itself is functionally correct.	Low
	Remove redundant empty set check The condition `ownedCaps.isEmpty() == false` is redundant because `capMap.get(owningFormat)` returns `null` when the format doesn't own the field. The check should verify `ownedCaps != null` only, as an empty set would indicate the format owns zero capabilities (which shouldn't occur in practice but would still be a valid ownership state). sandbox/plugins/parquet-data-format/src/main/java/org/opensearch/parquet/writer/ParquetDocumentInput.java [56-65] Map<DataFormat, Set<FieldTypeCapabilities.Capability>> capMap = fieldType.getCapabilityMap(); if (capMap.isEmpty()) { // No capability map set — no format declared support for this field type, skip it return; } Set<FieldTypeCapabilities.Capability> ownedCaps = capMap.get(owningFormat); -if (ownedCaps != null && ownedCaps.isEmpty() == false) { +if (ownedCaps != null) { collectedFields.add(new FieldValuePair(fieldType, value)); } Suggestion importance[1-10]: 4 __ Why: The suggestion correctly identifies that checking `ownedCaps.isEmpty() == false` is redundant when `ownedCaps != null` already filters out non-ownership. However, the impact is minimal as the current code is functionally correct, just slightly verbose.	Low
	Use immutable list collector The error message construction uses `Collectors.toList()` which creates a mutable list unnecessarily. For error messages, use `Collectors.toUnmodifiableList()` or collect to a string directly to avoid potential modification issues and improve clarity. server/src/main/java/org/opensearch/index/engine/dataformat/FieldCapabilityAssigner.java [72-85] if (remaining.isEmpty() == false) { throw new MapperParsingException( "Field [" + fieldType.name() + "] of type [" + typeName + "] requires capabilities " + requested + " but configured data formats cannot collectively cover them. Unsupported capabilities: " + remaining + ". Configured formats: " - + configuredFormats.stream().map(DataFormat::name).collect(Collectors.toList()) + + configuredFormats.stream().map(DataFormat::name).collect(Collectors.toUnmodifiableList()) ); } Suggestion importance[1-10]: 3 __ Why: Using `Collectors.toUnmodifiableList()` in error message construction is a minor improvement for immutability, but the impact is negligible since the list is only used for display and immediately discarded.	Low
Possible issue	Fix empty capability set handling The logic incorrectly treats an empty capability set as "another format owns this field". An empty set returned from `capMap.get(owningFormat)` means this format owns the field but with zero capabilities, which is semantically different from `null` (format doesn't own the field at all). The check should only verify `ownedCaps ==` `null`. sandbox/plugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/index/LuceneDocumentInput.java [100-109] Map<DataFormat, Set<FieldTypeCapabilities.Capability>> capMap = fieldType.getCapabilityMap(); if (capMap.isEmpty()) { // No capability map set — no format declared support for this field type, skip it return; } Set<FieldTypeCapabilities.Capability> ownedCaps = capMap.get(owningFormat); -if (ownedCaps == null \|\| ownedCaps.isEmpty()) { +if (ownedCaps == null) { // Another format owns this field — silently skip return; } Suggestion importance[1-10]: 4 __ Why: The suggestion raises a valid semantic point about distinguishing `null` (format doesn't own field) from empty set (format owns field with zero capabilities). However, in practice, an empty capability set for an owned field is unlikely and the current defensive check is reasonable.	Low

mgodwan · 2026-05-19T09:01:07Z

     * @param fieldFactoryRegistry the registry to use for field creation
     */
-    public LuceneDocumentInput(LuceneFieldFactoryRegistry fieldFactoryRegistry) {
+    public LuceneDocumentInput(DataFormat owningFormat, LuceneFieldFactoryRegistry fieldFactoryRegistry) {


We can hardcode to Lucene format here

mgodwan · 2026-05-19T09:03:49Z

+        if (ownedCaps == null || ownedCaps.isEmpty()) {
+            // Another format owns this field — silently skip
+            return;
+        }


Lets add tests for this

mgodwan · 2026-05-19T09:04:34Z

+        new FieldTypeCapabilities(IgnoredFieldMapper.CONTENT_TYPE, Set.of(FieldTypeCapabilities.Capability.STORED_FIELDS,
+            FieldTypeCapabilities.Capability.FULL_TEXT_SEARCH)),
+        new FieldTypeCapabilities(IdFieldMapper.CONTENT_TYPE, Set.of(FieldTypeCapabilities.Capability.STORED_FIELDS,
+            FieldTypeCapabilities.Capability.FULL_TEXT_SEARCH)),
+        new FieldTypeCapabilities(RoutingFieldMapper.CONTENT_TYPE, Set.of(FieldTypeCapabilities.Capability.STORED_FIELDS,
+            FieldTypeCapabilities.Capability.FULL_TEXT_SEARCH)),
+        new FieldTypeCapabilities(SourceFieldMapper.CONTENT_TYPE, Set.of(FieldTypeCapabilities.Capability.STORED_FIELDS,
+            FieldTypeCapabilities.Capability.FULL_TEXT_SEARCH)),
+        new FieldTypeCapabilities(SeqNoFieldMapper.CONTENT_TYPE, Set.of(FieldTypeCapabilities.Capability.COLUMNAR_STORAGE,
+            FieldTypeCapabilities.Capability.POINT_RANGE)),
+        new FieldTypeCapabilities(IndexFieldMapper.CONTENT_TYPE, Set.of(FieldTypeCapabilities.Capability.COLUMNAR_STORAGE,
+            FieldTypeCapabilities.Capability.FULL_TEXT_SEARCH)),
+        new FieldTypeCapabilities(NestedPathFieldMapper.NAME, Set.of(FieldTypeCapabilities.Capability.STORED_FIELDS,
+            FieldTypeCapabilities.Capability.FULL_TEXT_SEARCH)),
+        new FieldTypeCapabilities(VersionFieldMapper.CONTENT_TYPE, Set.of(FieldTypeCapabilities.Capability.STORED_FIELDS,
+            FieldTypeCapabilities.Capability.FULL_TEXT_SEARCH)),
+        new FieldTypeCapabilities(DocCountFieldMapper.CONTENT_TYPE, Set.of(FieldTypeCapabilities.Capability.STORED_FIELDS,
+            FieldTypeCapabilities.Capability.FULL_TEXT_SEARCH)),
+        new FieldTypeCapabilities("_feature", Set.of(FieldTypeCapabilities.Capability.STORED_FIELDS,
+            FieldTypeCapabilities.Capability.FULL_TEXT_SEARCH)),
+        new FieldTypeCapabilities(FieldNamesFieldMapper.CONTENT_TYPE, Set.of(FieldTypeCapabilities.Capability.STORED_FIELDS,
+            FieldTypeCapabilities.Capability.FULL_TEXT_SEARCH))


how is this determined?

github-actions · 2026-05-19T09:04:59Z

❌ Gradle check result for 3bde29b: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

mgodwan · 2026-05-19T09:06:37Z


+        @Override
+        protected FieldTypeCapabilities.Capability searchCapability() {
+            return FieldTypeCapabilities.Capability.POINT_RANGE;


is this even supported?

mgodwan · 2026-05-19T09:07:00Z


+        @Override
+        protected FieldTypeCapabilities.Capability searchCapability() {
+            return FieldTypeCapabilities.Capability.POINT_RANGE;


Can you add for PrimaryTermFieldType as well?

Same POINT_RANGE right?

github-actions · 2026-05-19T17:30:56Z

Persistent review updated to latest commit 022ce52

github-actions · 2026-05-19T17:44:51Z

❌ Gradle check result for 022ce52: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Signed-off-by: Sagar Darji <darsaga@amazon.com>

github-actions · 2026-05-19T18:15:30Z

PR Code Analyzer ❗

AI-powered 'Code-Diff-Analyzer' found issues on commit 496920e.

Path	Line	Severity	Description
server/src/main/java/org/opensearch/index/mapper/ObjectMapper.java	421	medium	Nested type parsing is now gated behind `isPluggableDataFormatEnabled`. Any OpenSearch index that uses nested field mappings but does NOT have the experimental pluggable data format feature enabled will receive a MapperParsingException when creating or updating mappings. Nested types are a core, widely-used feature; silently breaking them for the majority of users unless an opt-in experimental flag is set is a significant and suspicious regression that is unrelated to the stated purpose of the per-format capability routing feature.
server/src/main/java/org/opensearch/index/engine/exec/PrimaryTermFieldType.java	41	low	The `PrimaryTermFieldType` constructor changes `hasDocValues` from `false` to `true`. This modifies the behaviour of an internal metadata field in a way that will cause it to request `COLUMNAR_STORAGE` via `requestedCapabilities()`, potentially routing it into the capability-assignment pipeline and altering how it is stored. The change is unexplained in the diff and could cause unintended side effects in capability assignment for this critical internal field.
sandbox/plugins/parquet-data-format/src/main/java/org/opensearch/parquet/fields/core/metadata/IndexParquetField.java	16	low	New `IndexParquetField` class throws `IllegalStateException` from `addToGroup`, which is the core ingestion path. While the per-format capability filter should prevent it from being invoked at runtime, the class is registered as the handler for `IndexFieldMapper.CONTENT_TYPE` in `MetadataFieldPlugin`. If capability map filtering is ever bypassed or misconfigured, this would cause an unhandled exception during document indexing. The absence of any graceful handling or override (getArrowType/getFieldType both return null) makes this pattern unusual and worth review.

The table above displays the top 10 most important findings.

Total: 3 | Critical: 0 | High: 0 | Medium: 1 | Low: 2

Pull Requests Author(s): Please update your Pull Request according to the report above.

Repository Maintainer(s): You can bypass diff analyzer by adding label skip-diff-analyzer after reviewing the changes carefully, then re-run failed actions. To re-enable the analyzer, remove the label, then re-run all actions.

⚠️ Note: The Code-Diff-Analyzer helps protect against potentially harmful code patterns. Please ensure you have thoroughly reviewed the changes beforehand.

Thanks.

[Pluggable Dataformat] Adding support for FieldTypeCapabilities

3bde29b

Signed-off-by: Sagar Darji <darsaga@amazon.com>

darjisagar7 requested review from a team, Bukhtawar, andrross, ashking94, cwperks, dbwiddis, gbbafna, jed326, kotwanikunal, mch2, msfroh, owaiskazi19, reta, sachinpkale, saratvemulapalli, shwetathareja and sohami as code owners May 19, 2026 08:52

mgodwan reviewed May 19, 2026

View reviewed changes

Comment thread ...lugins/analytics-backend-lucene/src/main/java/org/opensearch/be/lucene/LuceneDataFormat.java Outdated

mgodwan reviewed May 19, 2026

View reviewed changes

Addressing comments and more changes

496920e

Signed-off-by: Sagar Darji <darsaga@amazon.com>

darjisagar7 force-pushed the FTC branch from 022ce52 to 496920e Compare May 19, 2026 18:14

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Pluggable Dataformat] Adding support for FieldTypeCapabilities#21733

[Pluggable Dataformat] Adding support for FieldTypeCapabilities#21733
darjisagar7 wants to merge 2 commits into
opensearch-project:mainfrom
darjisagar7:FTC

darjisagar7 commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 19, 2026 •

edited

Loading

Uh oh!

mgodwan May 19, 2026

Uh oh!

mgodwan May 19, 2026

Uh oh!

mgodwan May 19, 2026

Uh oh!

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

mgodwan May 19, 2026

Uh oh!

mgodwan May 19, 2026

Uh oh!

darjisagar7 May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

darjisagar7 commented May 19, 2026

Description

Related Issues

Check List

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Reviewer Guide 🔍

(Review updated until commit 022ce52)

Uh oh!

github-actions Bot commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Previous suggestions

Uh oh!

mgodwan May 19, 2026

Choose a reason for hiding this comment

Uh oh!

mgodwan May 19, 2026

Choose a reason for hiding this comment

Uh oh!

mgodwan May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

mgodwan May 19, 2026

Choose a reason for hiding this comment

Uh oh!

mgodwan May 19, 2026

Choose a reason for hiding this comment

Uh oh!

darjisagar7 May 19, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

github-actions Bot commented May 19, 2026

PR Code Analyzer ❗

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

github-actions Bot commented May 19, 2026 •

edited

Loading

(Review updated until commit `022ce52`)

github-actions Bot commented May 19, 2026 •

edited

Loading