Commit baf0faa

result highlighting
1 parent 1143038 commit baf0faa

54 files changed (+2202, -1667 lines)

README.md (+76, -11)
@@ -106,6 +106,7 @@ All persistent variants are optimized for larger sized indexes under heavy workl
 - [basic-resolver](example/browser-legacy/basic-resolver)
 - [basic-worker](example/browser-legacy/basic-worker)
 - [document](example/browser-legacy/document)
+- [document-highlighting](example/browser-legacy/document-highlighting)
 - [document-persistent](example/browser-legacy/document-persistent)
 - [document-worker](example/browser-legacy/document-worker)
 - [language-pack](example/browser-legacy/language-pack)
@@ -117,6 +118,7 @@ All persistent variants are optimized for larger sized indexes under heavy workl
 - [basic-worker](example/browser-module/basic-worker)
 - [basic-worker-extern-config](example/browser-module/basic-worker-extern-config)
 - [document](example/browser-module/document)
+- [document-highlighting](example/browser-module/document-highlighting)
 - [document-persistent](example/browser-module/document-persistent)
 - [document-worker](example/browser-module/document-worker)
 - [document-worker-extern-config](example/browser-module/document-worker-extern-config)
@@ -1101,6 +1103,69 @@ const result = index.search("a short query", {
 });
 ```
 
+## Result Highlighting
+
+Result highlighting can only be enabled when using a Document index with the data store enabled (`store: true`). Even if you just want to add id-content pairs, you'll need a Document index for this feature (just define a simple document descriptor as shown below).
+
+```js
+// create the document index
+const index = new Document({
+    document: {
+        store: true,
+        index: [{
+            field: "title",
+            tokenize: "forward",
+            encoder: Charset.LatinBalance
+        }]
+    }
+});
+
+// add data
+index.add({
+    "id": 1,
+    "title": "Carmencita"
+});
+index.add({
+    "id": 2,
+    "title": "Le clown et ses chiens"
+});
+
+// perform a query
+const result = index.search({
+    query: "karmen or clown or not found",
+    suggest: true,
+    // set enrich to true (required)
+    enrich: true,
+    // highlight template
+    // $1 is a placeholder for the matched partial
+    highlight: "<b>$1</b>"
+});
+```
+
+The result will look like:
+
+```js
+[{
+    "field": "title",
+    "result": [{
+        "id": 1,
+        "doc": {
+            "id": 1,
+            "title": "Carmencita"
+        },
+        "highlight": "<b>Carmen</b>cita"
+    },{
+        "id": 2,
+        "doc": {
+            "id": 2,
+            "title": "Le clown et ses chiens"
+        },
+        "highlight": "Le <b>clown</b> et ses chiens"
+    }
+    ]
+}]
+```
+
 ## Big In-Memory Keystores
 
 The default maximum keystore limit for the In-Memory index is 2^24 distinct terms/partials (so-called "cardinality"). An additional register can be enabled, which divides the index into self-balanced partitions.
@@ -1560,7 +1625,7 @@ You can overcome this issue by passing the filepath to the worker file like `wor
 Fuzzy search describes a basic concept of how to make queries more tolerant. FlexSearch provides several methods to achieve fuzziness:
 
 1. Use a tokenizer: `forward`, `reverse` or `full`
-2. Don't forget to use any of the builtin encoder `simple` > `balanced` > `advanced` > `extra` > `soundex` (sorted by fuzziness)
+2. Don't forget to use one of the built-in encoders `simple` > `balance` > `advanced` > `extra` > `soundex` (sorted by fuzziness)
 3. Use one of the language-specific presets, e.g. `/lang/en.js` for en-US specific content
 4. Enable suggestions by passing the search option `suggest: true`

@@ -1573,13 +1638,13 @@ Original term which was indexed: "Struldbrugs"
 <table>
 <tr>
 <th align="left">Encoder:</th>
-<th><code>exact</code></th>
-<th><code>default</code></th>
-<th><code>simple</code></th>
-<th><code>balance</code></th>
-<th><code>advanced</code></th>
-<th><code>extra</code></th>
-<th><code>soundex</code></th>
+<th><code>LatinExact</code></th>
+<th><code>LatinDefault</code></th>
+<th><code>LatinSimple</code></th>
+<th><code>LatinBalance</code></th>
+<th><code>LatinAdvanced</code></th>
+<th><code>LatinExtra</code></th>
+<th><code>LatinSoundex</code></th>
 </tr>
 <tr>
 <th align="left">Index Size</th>
@@ -1694,7 +1759,7 @@ If you get some good results please feel free to share your encoder.
 
 > This is an experimental feature with limited support, which might be dropped in a future release. You're welcome to give some feedback.
 
-When using Server-Side-Rendering you can create a different export which instantly boot up. Especially when using Server-side rendered content, this could help to restore a __static__ index on page load. Document-Indexes aren't supported yet for this method.
+When using server-side rendering (SSR), you can create a different export that boots up instantly. Especially with server-side rendered content, this can help to restore a __<u>static</u>__ index on page load. Document indexes aren't supported yet for this method.
 
 > When your index is too large, you should use the default export/import mechanism.
 
@@ -1720,7 +1785,7 @@ function inject(index){
 }
 ```
 
-You could store this function by e.g. `fs.writeFileSync("inject.js", fn_string);` or place it as string in your markup.
+You can save this function, e.g. with `fs.writeFileSync("inject.js", fn_string);`, or place it as a string in your SSR-generated markup.
 
 After creating the index on the client side, just call the inject method like this:

@@ -2201,7 +2266,7 @@ The custom build will be saved to `dist/flexsearch.custom.xxxx.min.js` or when f
 
 ### Misc
 
-A formula to determine a well balanced value for the `resolution` is: $2*floor(\sqrt{content.length})$ where content is the value pushed by `index.add()`. Here the maximum length of all contents should be used.
+A formula to determine a well-balanced value for the `resolution` is $2 \cdot \lfloor \sqrt{\text{content.length}} \rfloor$, where `content` is the value passed to `index.add()`. Use the maximum length across all contents.
 
 ## Migration
 
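
The `resolution` formula from the Misc section can be turned into a small helper. This is a minimal sketch: `suggestResolution` is a hypothetical name and not part of the FlexSearch API, and it simply applies 2 * floor(sqrt(n)) to the length n of the longest content string, as the README suggests.

```js
// Hypothetical helper, not part of FlexSearch: suggests a `resolution`
// value as 2 * floor(sqrt(n)), where n is the length of the longest
// content string that will be passed to index.add().
function suggestResolution(contents) {
    let maxLength = 0;
    for (const content of contents) {
        maxLength = Math.max(maxLength, content.length);
    }
    return 2 * Math.floor(Math.sqrt(maxLength));
}

suggestResolution(["Carmencita", "Le clown et ses chiens"]); // → 8 (n = 22)
```

Length is measured here with `String.prototype.length` (UTF-16 code units), which is one reasonable reading of `content.length` in the formula.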

dist/db/indexeddb/index.cjs (+60, -14)
@@ -2331,6 +2331,11 @@ function intersect$1(arrays, resolution, limit, offset, suggest, boost, resolve)
 
 id = ids[z];
 
+// todo the persistent implementation will count term matches
+// and also aggregate the score (group by id)
+// min(score): suggestions off (already covered)
+// sum(score): suggestions on (currently not covered)
+
 if((count = check[id])){
     check[id]++;
     // tmp.count++;
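
The TODO above describes grouping term matches by id, counting them, and aggregating the score with `min` when suggestions are off and `sum` when they are on. A minimal sketch of that aggregation, assuming each match is a plain `{ id, score }` record; this data layout and the `aggregateById` name are illustrative assumptions, not the persistent adapter's actual implementation:

```js
// Hypothetical match records: { id, score } (illustrative layout only).
// Groups matches by id, counts term matches per id and aggregates the
// score with min() when suggest is off, or sum() when suggest is on.
function aggregateById(matches, suggest) {
    const groups = new Map(); // Map keeps first-seen order of ids
    for (const { id, score } of matches) {
        const group = groups.get(id);
        if (!group) {
            groups.set(id, { id, count: 1, score });
        } else {
            group.count++;
            group.score = suggest
                ? group.score + score
                : Math.min(group.score, score);
        }
    }
    return [...groups.values()];
}

const matches = [
    { id: 1, score: 3 },
    { id: 2, score: 1 },
    { id: 1, score: 2 }
];
aggregateById(matches, false); // → [{ id: 1, count: 2, score: 2 }, { id: 2, count: 1, score: 1 }]
aggregateById(matches, true);  // → [{ id: 1, count: 2, score: 5 }, { id: 2, count: 1, score: 1 }]
```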
@@ -2399,17 +2404,21 @@ function intersect$1(arrays, resolution, limit, offset, suggest, boost, resolve)
                         break;
                     }
                 }
-                return final.length > 1
+                result = final.length > 1
                     ? concat(final)
                     : final[0];
             }
+
+            return result;
         }
     }
     else {
 
         result = result.length > 1
             ? union$1(result, offset, limit, resolve, 0)
-            : result[0];
+            : ((result = result[0]).length > limit) || offset
+                ? result.slice(offset, limit + offset)
+                : result;
     }
 }
 
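
The single-result branch now applies `offset` and `limit` itself instead of returning `result[0]` unchanged, while still avoiding a copy when nothing needs to be cut. A standalone sketch of that check, assuming a plain array of ids; `paginate` is a hypothetical helper name:

```js
// Hypothetical helper mirroring the single-result branch: slice only when
// the array exceeds `limit` or an `offset` is requested; otherwise return
// the original array without copying it.
function paginate(ids, limit, offset = 0) {
    return (ids.length > limit) || offset
        ? ids.slice(offset, limit + offset)
        : ids;
}

const ids = [1, 2, 3, 4, 5];
paginate(ids, 10);    // → ids itself (no copy)
paginate(ids, 2);     // → [1, 2]
paginate(ids, 2, 2);  // → [3, 4]
```

Because `||` binds tighter than `?:`, the condition reads as `(ids.length > limit || offset) ? ... : ...`, which matches the intent of the expression in the diff.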

@@ -3123,6 +3132,7 @@ Document.prototype.search = function(query, limit, options, _promises){
         continue;
     }
     else {
+
         res = index.search(query, limit, opt);
         // restore enrich state
         opt && enrich && (opt.enrich = enrich);
@@ -3306,8 +3316,9 @@ Document.prototype.search = function(query, limit, options, _promises){
 
 /*
 
-some matching term
-
+karmen or clown or not found
+[Carmen]cita
+Le [clown] et ses chiens
 
 */

@@ -3318,31 +3329,66 @@ function highlight_fields(result, query, index, field, tree, template, limit, of
     // }
 
     let encoder;
+    let query_enc;
+    let tokenize;
 
-    for(let i = 0, res, field, enc, path; i < result.length; i++){
+    for(let i = 0, res, res_field, enc, idx, path; i < result.length; i++){
 
         res = result[i].result;
-        field = result[i].field;
-        enc = index.get(field).encoder;
-        path = tree[field.indexOf(field)];
+        res_field = result[i].field;
+        idx = index.get(res_field);
+        enc = idx.encoder;
+        tokenize = idx.tokenize;
+        path = tree[field.indexOf(res_field)];
 
         if(enc !== encoder){
             encoder = enc;
-            encoder.encode(query);
+            query_enc = encoder.encode(query);
         }
 
         for(let j = 0; j < res.length; j++){
             let str = "";
             let content = parse_simple(res[j].doc, path);
+            let doc_enc = encoder.encode(content);
+            let doc_org = content.split(encoder.split);
+
+            for(let k = 0, doc_enc_cur, doc_org_cur; k < doc_enc.length; k++){
+                doc_enc_cur = doc_enc[k];
+                doc_org_cur = doc_org[k];
+                let found;
+                for(let l = 0, query_enc_cur; l < query_enc.length; l++){
+                    query_enc_cur = query_enc[l];
+                    // todo: tokenize could also be custom when "strict" was used
+                    if(tokenize === "strict"){
+                        if(doc_enc_cur === query_enc_cur){
+                            str += (str ? " " : "") + template.replace("$1", doc_org_cur);
+                            found = true;
+                            break;
+                        }
+                    }
+                    else {
+                        const position = doc_enc_cur.indexOf(query_enc_cur);
+                        if(position > -1){
+                            str += (str ? " " : "") +
+                                // prefix
+                                doc_org_cur.substring(0, position) +
+                                // match (substring takes an end index, not a length)
+                                template.replace("$1", doc_org_cur.substring(position, position + query_enc_cur.length)) +
+                                // suffix
+                                doc_org_cur.substring(position + query_enc_cur.length);
+                            found = true;
+                            break;
+                        }
+                    }
 
-            let split = encoder.encode(content);
+                    //str += doc_enc[k].replace(new RegExp("(" + doc_enc[k] + ")", "g"), template.replace("$1", content))
+                }
 
-            for(let k = 0; k < split.length; k++){
-                str += split[k].replace(new RegExp("(" + split[k] + ")", "g"), template.replace("$1", content));
+                if(!found){
+                    str += (str ? " " : "") + doc_org[k];
+                }
             }
 
-            console.log(result,index, template);
-
             res[j].highlight = str;
         }
     }
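
The per-token logic of `highlight_fields` (prefix, templated match, suffix; unmatched words kept as-is) can be tried outside the library with the sketch below. It is a minimal illustration under stated assumptions: `toyEncode` and `highlight` are hypothetical names, the toy encoder only lowercases and splits on whitespace (unlike FlexSearch encoders such as `Charset.LatinBalance`, which also normalize phonetically, so "karmen" would not match "Carmencita" here), and each encoded word is assumed to keep the same length as its original word.

```js
// Toy encoder (assumption, not the FlexSearch API): lowercases and splits
// on whitespace only, so encoded words stay aligned with original words.
function toyEncode(text) {
    return text.toLowerCase().split(/\s+/).filter(Boolean);
}

// Wraps the matched part of each word in `template` ("$1" = matched part).
// The first matching query term wins; words without a match are unchanged.
function highlight(content, query, template) {
    const queryTerms = toyEncode(query);
    const words = content.split(/\s+/).filter(Boolean);

    return words.map((word) => {
        const encoded = word.toLowerCase();
        for (const term of queryTerms) {
            const position = encoded.indexOf(term);
            if (position > -1) {
                const end = position + term.length; // end index, not a length
                const match = word.substring(position, end);
                // Function replacer avoids "$&"-style patterns in `match`.
                return word.substring(0, position)
                    + template.replace("$1", () => match)
                    + word.substring(end);
            }
        }
        return word;
    }).join(" ");
}

highlight("Le clown et ses chiens", "clown", "<b>$1</b>");
// → "Le <b>clown</b> et ses chiens"
highlight("Carmencita", "cita", "<b>$1</b>");
// → "Carmen<b>cita</b>"
```

The second call shows why the end index matters: the match starts at position 6, so passing only the term length as the second argument to `substring` would select the wrong characters.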
