Add BERT Tokenizer as OpenSearch built-in analyzer #3719
Signed-off-by: zhichao-aws <[email protected]>
We can merge this PR after 3.0.0-beta1 is released.
Would you consider option 3 from the issue? #3708 (comment)
Force-pushed from a626e03 to dd84704.
LGTM!
@zane-neo @xinyual @zhichao-aws We are already in code freeze for new feature PRs, and this PR missed the code freeze deadline. Could you work with the release team to get an exception? Otherwise, please revert this PR.
Checked with @peterzhuamazon. We can still push feature code to main, as long as it is not backported to the 3.0 branch.
Thanks for double-checking.
* bert analyzer
* add license header
* add rest test case
* load from zip
* address comments
* retry for init
Signed-off-by: zhichao-aws <[email protected]>
(cherry picked from commit 088c1a5)
…ct#3719)
* bert analyzer
* add license header
* add rest test case
* load from zip
* address comments
* retry for init
Signed-off-by: zhichao-aws <[email protected]>
Signed-off-by: Abdul Muneer Kolarkunnu <[email protected]>
Description
This PR adds the bert-base-uncased and bert-base-multilingual-uncased tokenizers as OpenSearch built-in analyzers/tokenizers. Users can call them through the analyze API without any special settings.
We also have a follow-up PR in neural-search to make neural-sparse search work with the analyzer.
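As an illustration of calling the analyzer through the analyze API, here is a minimal Python sketch. The analyzer name `bert-uncased`, the local endpoint `http://localhost:9200`, and the helper names `build_analyze_request` and `analyze` are assumptions made for this example; none of them is stated on this page, so check the registered analyzer name against your cluster before relying on it.

```python
import json
import urllib.request

# Assumption: the analyzer is registered under this name; the page does not state it.
ANALYZER_NAME = "bert-uncased"
# Assumption: an unsecured local cluster; adjust the URL and add auth as needed.
BASE_URL = "http://localhost:9200"


def build_analyze_request(text, analyzer=ANALYZER_NAME, base_url=BASE_URL):
    """Build (but do not send) a POST /_analyze request for the given text."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, **kwargs):
    """Send the request and return the token strings from the response."""
    request = build_analyze_request(text, **kwargs)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # The analyze API returns a "tokens" array; each entry carries a "token" field.
    return [entry["token"] for entry in payload["tokens"]]
```

Against a running cluster with the plugin installed, `analyze("hello world")` would return the list of tokens the analyzer produces. No index settings or custom analyzer definition are involved in the request, which matches the "no special settings" claim above.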
Related Issues
Resolves #3708
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.