Request to Modify Code to Enable TEXT_SPLITTER_EMBEDDING_MODEL Customization through Configuration File #27

Open
@shawn-z11

Description

I am building a Chinese RAG demo service using RetrievalAugmentedGeneration.

However, the model used by the default SentenceTransformersTokenTextSplitter in RetrievalAugmentedGeneration/common/utils.py is hardcoded as 'intfloat/e5-large-v2'. This model's tokenizer produces a large number of [UNK] tokens when processing Chinese text.
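To put a number on the [UNK] problem, a small helper along these lines could be used. This is only a sketch: `unk_ratio` is a hypothetical name, and it assumes a tokenizer object that exposes `tokenize()` and `unk_token`, as Hugging Face tokenizers do.

```python
from typing import Iterable


def unk_ratio(tokenizer, texts: Iterable[str]) -> float:
    """Return the fraction of produced tokens that equal the tokenizer's unknown token.

    `tokenizer` is assumed to expose `tokenize(text)` and `unk_token`
    (Hugging Face tokenizers do); `unk_ratio` itself is a hypothetical helper.
    """
    total = 0
    unknown = 0
    for text in texts:
        tokens = tokenizer.tokenize(text)
        total += len(tokens)
        unknown += sum(1 for token in tokens if token == tokenizer.unk_token)
    # Avoid dividing by zero when no text (or only empty text) was supplied.
    return unknown / total if total else 0.0


# Possible usage (needs the `transformers` package and a model download):
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("intfloat/e5-large-v2")
#   print(unk_ratio(tok, ["some Chinese sample text here"]))
```

Running the same check against a candidate replacement model would show whether it handles Chinese text better before it is wired into the splitter.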

I would like to be able to specify the text splitter model in config.yaml, the same way the embedding model can already be specified there.
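A minimal sketch of what I have in mind, not the project's actual code: the `text_splitter.model_name` config key, the `APP_TEXTSPLITTER_MODELNAME` environment variable, and both function names are assumptions made up for illustration. Only `SentenceTransformersTokenTextSplitter` and its `model_name`, `tokens_per_chunk`, and `chunk_overlap` parameters come from LangChain.

```python
import os
from typing import Any, Mapping, Optional

# The value that is currently hardcoded, kept as the fallback.
DEFAULT_SPLITTER_MODEL = "intfloat/e5-large-v2"


def resolve_splitter_model(
    config: Mapping[str, Any],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the splitter model: env var first, then config, then the default.

    The env var name and the config key are hypothetical.
    """
    env = os.environ if env is None else env
    override = env.get("APP_TEXTSPLITTER_MODELNAME")
    if override:
        return override
    section = config.get("text_splitter") or {}
    return section.get("model_name") or DEFAULT_SPLITTER_MODEL


def build_text_splitter(
    config: Mapping[str, Any],
    tokens_per_chunk: Optional[int] = None,
    chunk_overlap: int = 50,
):
    """Create the splitter with the resolved model instead of a hardcoded one."""
    # Imported lazily so the config logic can be used without LangChain installed.
    try:
        from langchain_text_splitters import SentenceTransformersTokenTextSplitter
    except ImportError:
        from langchain.text_splitter import SentenceTransformersTokenTextSplitter

    return SentenceTransformersTokenTextSplitter(
        model_name=resolve_splitter_model(config),
        tokens_per_chunk=tokens_per_chunk,
        chunk_overlap=chunk_overlap,
    )
```

With something like this, the parsed config.yaml would be passed in as `config`, and a `text_splitter` section with a `model_name` entry pointing at a multilingual model (for example `intfloat/multilingual-e5-large`) should be enough for Chinese text, while existing deployments keep the current default.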

Thank you for your assistance and support.

Labels: enhancement (New feature or request)
