This repository was archived by the owner on Nov 2, 2020. It is now read-only.

Search for words including special characters does not produce expected result  #3

@kahlep

Searching for e.g. "wa§§er" does not match or highlight the results correctly: the results include hits for "wa" and "er".

EDIT:
When processing the query, the tokenizer of Solr omits search words that are put in quotes.
E.g. searching for "wa§§er" (with quotes) reduces the number of hits from 121 to 20 in collection 4048 (users are actively working in this collection, so the numbers may have changed by now).
However, the hits still include strings like "wa= §§er".
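
The observed hits would be consistent with an analyzer that splits text on every character that is neither a letter nor a digit. Whether the Solr schema used here is configured that way is an assumption; the issue does not state it. The following minimal sketch (the class TokenSplitSketch and its tokenize method are hypothetical, not part of TranskribusSearch or Solr) shows that under such splitting both "wa§§er" and "wa= §§er" end up as the same two tokens:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: mimics an analyzer that splits on
// anything that is not a Unicode letter or digit. The real Solr field
// type may be configured differently.
final class TokenSplitSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on every run of characters that are neither letters nor digits.
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // \u00a7 is the section sign used in the search term of this issue.
        String query = "wa\u00a7\u00a7er";
        String indexed = "wa= \u00a7\u00a7er";
        System.out.println(tokenize(query));   // prints [wa, er]
        System.out.println(tokenize(indexed)); // prints [wa, er]
    }
}
```

If the analyzer behaves like this, an unquoted query would match "wa" or "er" on their own, and a quoted query would match any text where the two tokens are adjacent.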

While experimenting with the search feature for this issue, I came across the following things:

  • Special character escaping: TrpSearcher contains a searchText.trim().replaceAll(" ", "\ ") call, which handles single spaces. Older versions of Solr required escaping the characters reserved by the query syntax (see https://lucene.apache.org/core/2_9_4/queryparsersyntax.html#Escaping%20Special%20Characters). Might this be needed here too, in order to make those characters searchable, or is it already done elsewhere?
  • Although the quoted search term narrows the results, the highlighting does not work as expected. The postprocessing of the Solr result should be checked for a possible issue.
  • Faulty pagination of results in TranskribusSwtGui: it is unclear whether this is a bug in TranskribusSearch or in another component. Each page should show 10 of the 20 results, but with the query from above the first page includes 8 hits and the second page 5.
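
For the escaping question in the first point, a minimal sketch of escaping all reserved query characters rather than spaces only. The class QueryEscapeSketch and the method escapeReserved are hypothetical names, and the reserved set below is an assumption modelled on the Lucene query syntax; SolrJ also ships ClientUtils.escapeQueryChars, which could be used instead if SolrJ is on the classpath. Whether TrpSearcher already escapes elsewhere cannot be told from this issue.

```java
// Hypothetical helper, not taken from TranskribusSearch.
final class QueryEscapeSketch {

    // Characters assumed to be reserved by the query syntax.
    private static final String RESERVED = "\\+-!():^[]\"{}~*?|&;/";

    // Prefixes every reserved character and every whitespace character
    // with a backslash; all other characters are copied unchanged.
    static String escapeReserved(String searchText) {
        StringBuilder sb = new StringBuilder(searchText.length() + 8);
        for (int i = 0; i < searchText.length(); i++) {
            char c = searchText.charAt(i);
            if (RESERVED.indexOf(c) >= 0 || Character.isWhitespace(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeReserved("a b"));  // prints a\ b
        System.out.println(escapeReserved("1+1"));  // prints 1\+1
    }
}
```

Note that the section sign is not part of this reserved set, so escapeReserved leaves "wa§§er" unchanged. Escaping alone would therefore probably not make that term searchable if the analyzer drops the character at index or query time.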
