[BUG] Fields in nested attribute after text_embedding processor are lost #1244

Closed
StApostol opened this issue Mar 19, 2025 · 2 comments
Labels
bug Something isn't working

StApostol commented Mar 19, 2025

Describe the bug

After the text_embedding processor is applied, each object in a nested array keeps only one of its fields; the others are lost. This behaviour was observed after upgrading to version 2.19+.

The source document contains two arrays of nested objects:

"nested_field": [
  { "field1": "value1", "field2": "value2" },
  { "field1": "value3", "field2": "value4" }
],
"categories": [
  { "id": "value1", "name": "value2", "test": true },
  { "id": "value3", "name": "value4", "test": true }
],

Result after applying the processor:

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "nested_field": [
            {
              "field1": "value1"
            },
            {
              "field1": "value3"
            }
          ],
          "price": {
            "USD": 100,
            "EUR": 125
          },
          "name": "Shoes name",
          "categories": [
            {
              "test": true
            },
            {
              "test": true
            }
          ],
          "ml_embedding": [
            -0.4356411,
            0.09718768,
            0.36040825,
            0.08407341,
            ....
          ],
          "ml_text": "Shoes"
        },
        "_ingest": {
          "timestamp": "2025-03-18T09:24:22.931543921Z"
        }
      }
    }
  ]
}

Related component

Other

To Reproduce

  1. Deploy an ML model (huggingface/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 was used).
  2. Simulate the pipeline on a document:
POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "text_embedding": {
          "model_id": "model_id",
          "field_map": {
            "ml_text": "ml_embedding"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "name": "Shoes name",
        "ml_text": "Shoes",
        "nested_field": [
          { "field1": "value1", "field2": "value2" },
          { "field1": "value3", "field2": "value4" }
        ],
        "categories": [
          { "id": "value1", "name": "value2", "test": true },
          { "id": "value3", "name": "value4", "test": true }
        ],
        "ml_embedding": null,
        "price": {
          "USD": 100,
          "EUR": 125
        }
      }
    }
  ]
}

Expected behavior

The output document should keep the structure of the source data, with the ml_embedding field added:

{
  "docs": [
    {
      "doc": {
        "_index": "_index",
        "_id": "_id",
        "_source": {
          "nested_field": [
            {
              "field2": "value2",
              "field1": "value1"
            },
            {
              "field2": "value4",
              "field1": "value3"
            }
          ],
          "price": {
            "USD": 100,
            "EUR": 125
          },
          "name": "Shoes name",
          "categories": [
            {
              "name": "value2",
              "test": true,
              "id": "value1"
            },
            {
              "name": "value4",
              "test": true,
              "id": "value3"
            }
          ],
          "ml_embedding": [
            -0.4356411,
            0.09718768,
            0.36040825,
            0.08407341,
            .....
          ],
          "ml_text": "Shoes"
        },
        "_ingest": {
          "timestamp": "2025-03-18T09:22:40.359954328Z"
        }
      }
    }
  ]
}
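
To make the difference between the expected and the observed output easy to check, here is a minimal Python sketch that lists every field path present in the input `_source` but absent from the simulated `_source`. The helper name `missing_paths` is hypothetical and not part of OpenSearch or neural-search; the two dictionaries are copied from this report, with the embedding vector shortened because the report itself truncates it.

```python
from typing import Any, List


def missing_paths(expected: Any, actual: Any, prefix: str = "") -> List[str]:
    """Return paths that exist in `expected` but are absent from `actual`.

    Hypothetical helper for illustration only; it compares structure
    (dict keys and list positions), not leaf values.
    """
    missing: List[str] = []
    if isinstance(expected, dict):
        if not isinstance(actual, dict):
            return [prefix or "<root>"]
        for key, value in expected.items():
            path = f"{prefix}.{key}" if prefix else key
            if key not in actual:
                missing.append(path)
            else:
                missing.extend(missing_paths(value, actual[key], path))
    elif isinstance(expected, list):
        if not isinstance(actual, list):
            return [prefix or "<root>"]
        for index, item in enumerate(expected):
            path = f"{prefix}[{index}]"
            if index >= len(actual):
                missing.append(path)
            else:
                missing.extend(missing_paths(item, actual[index], path))
    return missing


# Input document from the "To Reproduce" section.
source = {
    "name": "Shoes name",
    "ml_text": "Shoes",
    "nested_field": [
        {"field1": "value1", "field2": "value2"},
        {"field1": "value3", "field2": "value4"},
    ],
    "categories": [
        {"id": "value1", "name": "value2", "test": True},
        {"id": "value3", "name": "value4", "test": True},
    ],
    "ml_embedding": None,
    "price": {"USD": 100, "EUR": 125},
}

# Observed "_source" from the simulate response (embedding shortened here).
simulated = {
    "nested_field": [{"field1": "value1"}, {"field1": "value3"}],
    "price": {"USD": 100, "EUR": 125},
    "name": "Shoes name",
    "categories": [{"test": True}, {"test": True}],
    "ml_embedding": [-0.4356411, 0.09718768],
    "ml_text": "Shoes",
}

for path in missing_paths(source, simulated):
    print(path)
```

Run against the data above, the helper reports `field2` in both `nested_field` entries and `id` and `name` in both `categories` entries, which matches the loss described in this issue. The same check could be pointed at a real `_simulate` response to confirm whether a given build is affected.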

Additional Details

Plugins
opensearch opensearch-alerting 2.19.1.0
opensearch opensearch-anomaly-detection 2.19.1.0
opensearch opensearch-asynchronous-search 2.19.1.0
opensearch opensearch-cross-cluster-replication 2.19.1.0
opensearch opensearch-custom-codecs 2.19.1.0
opensearch opensearch-flow-framework 2.19.1.0
opensearch opensearch-geospatial 2.19.1.0
opensearch opensearch-index-management 2.19.1.0
opensearch opensearch-job-scheduler 2.19.1.0
opensearch opensearch-knn 2.19.1.0
opensearch opensearch-ltr 2.19.1.0
opensearch opensearch-ml 2.19.1.0
opensearch opensearch-neural-search 2.19.1.0
opensearch opensearch-notifications 2.19.1.0
opensearch opensearch-notifications-core 2.19.1.0
opensearch opensearch-observability 2.19.1.0
opensearch opensearch-performance-analyzer 2.19.1.0
opensearch opensearch-reports-scheduler 2.19.1.0
opensearch opensearch-security 2.19.1.0
opensearch opensearch-security-analytics 2.19.1.0
opensearch opensearch-skills 2.19.1.0
opensearch opensearch-sql 2.19.1.0
opensearch opensearch-system-templates 2.19.1.0
opensearch query-insights 2.19.1.0

Host/Environment:

  • OS: Docker image
  • Version 2.19.1
  • Build hash 2e4741fb45d1b150aaeeadf66d41445b23ff5982
  • Build date 2025-02-27T01:16:47.726162386Z

StApostol added the bug and untriaged labels on Mar 19, 2025
@andrross (Member)

I believe https://github.com/opensearch-project/neural-search/ is where the text_embedding processor is defined.

@opensearch-project/admin Can this be transferred to https://github.com/opensearch-project/neural-search/ ?

gaiksaya transferred this issue from opensearch-project/OpenSearch on Mar 21, 2025
@heemin32 (Collaborator)

The issue is resolved by #1204
