Commit 6d1e770

Add span queries
Signed-off-by: Fanit Kolchina <[email protected]>
1 parent 0cb3f7c commit 6d1e770

11 files changed (+1108, -24 lines changed)

_query-dsl/span-query.md

Lines changed: 0 additions & 24 deletions
This file was deleted.

_query-dsl/span/index.md

Lines changed: 101 additions & 0 deletions
@@ -0,0 +1,101 @@
---
layout: default
title: Span queries
has_children: true
nav_order: 60
redirect_from:
  - /opensearch/query-dsl/span-query/
  - /query-dsl/query-dsl/span-query/
  - /query-dsl/span-query/
  - /query-dsl/span/index/
---

# Span queries

You can use span queries to perform precise positional searches. Span queries are low-level, specific queries that provide control over the order and proximity of specified query terms. They are primarily used to search legal documents and patents.

Span queries include the following query types:

- [**Span containing**]({{site.url}}{{site.baseurl}}/query-dsl/span/span-containing/): Returns larger spans that contain smaller spans within them. Useful for finding specific terms or phrases within a broader context. The opposite of the `span_within` query.

- [**Span field masking**]({{site.url}}{{site.baseurl}}/query-dsl/span/span-field-masking/): Allows span queries to work across different fields by making one field appear as another. Particularly useful when the same text is indexed using different analyzers.

- [**Span first**]({{site.url}}{{site.baseurl}}/query-dsl/span/span-first/): Matches terms or phrases that appear within a specified number of positions from the start of a field. Useful for finding content at the beginning of text.

- [**Span multi-term**]({{site.url}}{{site.baseurl}}/query-dsl/span/span-multi-term/): Enables multi-term queries (such as prefix, wildcard, or fuzzy queries) to work within span queries, allowing more flexible matching patterns in span searches.

- [**Span near**]({{site.url}}{{site.baseurl}}/query-dsl/span/span-near/): Finds terms or phrases that appear within a specified distance of each other. Can require matches to appear in a specific order and controls how many words can appear between them.

- [**Span not**]({{site.url}}{{site.baseurl}}/query-dsl/span/span-not/): Excludes matches that overlap with another span query. Useful for finding terms except when they appear in specific phrases or contexts.

- [**Span or**]({{site.url}}{{site.baseurl}}/query-dsl/span/span-or/): Matches documents that satisfy any of the provided span queries. Combines multiple span patterns with OR logic.

- [**Span term**]({{site.url}}{{site.baseurl}}/query-dsl/span/span-term/): The basic building block for span queries. Matches a single term while maintaining position information for use in other span queries.

- [**Span within**]({{site.url}}{{site.baseurl}}/query-dsl/span/span-within/): Returns smaller spans that are enclosed by larger spans. The opposite of the `span_containing` query.
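
Span query types are designed to be nested inside one another. As an illustrative sketch (not one of the examples on the linked pages), the following request wraps a `span_or` query and a `span_multi` query (which wraps a `prefix` query) inside a `span_near` query. It assumes the `clothing` index described in the following Setup section; the terms "shirt" and "dress" and the prefix `coll` are arbitrary choices:

```json
GET /clothing/_search
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_or": {
            "clauses": [
              { "span_term": { "description": "shirt" } },
              { "span_term": { "description": "dress" } }
            ]
          }
        },
        {
          "span_multi": {
            "match": {
              "prefix": { "description": { "value": "coll" } }
            }
          }
        }
      ],
      "slop": 5,
      "in_order": true
    }
  }
}
```
{% include copy-curl.html %}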

## Setup

To try the examples in this section, use the following steps to configure an example index.

### Step 1: Create an index

First, create an index for an e-commerce clothing website. The `description` field uses the default `standard` analyzer, while the `description.stemmed` subfield applies the `english` analyzer to enable stemming:

```json
PUT /clothing
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "stemmed": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }
    }
  }
}
```
{% include copy-curl.html %}
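
Optionally, you can check how each field tokenizes text by calling the `_analyze` API with a `field` parameter. The following request is a sketch that shows the tokens that the `english` analyzer produces for the `description.stemmed` subfield; the sample text is an arbitrary choice. Changing the field to `description` shows the unstemmed tokens produced by the `standard` analyzer for comparison. Stemming is the reason the span field masking example searches for the stemmed term `sleev`:

```json
GET /clothing/_analyze
{
  "field": "description.stemmed",
  "text": "Long-sleeved dress shirt"
}
```
{% include copy-curl.html %}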

### Step 2: Index data

Index sample documents into the index:

```json
POST /clothing/_doc/1
{
  "description": "Long-sleeved dress shirt with a formal collar and button cuffs. "
}
```
{% include copy-curl.html %}

```json
POST /clothing/_doc/2
{
  "description": "Beautiful long dress in red silk, perfect for formal events."
}
```
{% include copy-curl.html %}

```json
POST /clothing/_doc/3
{
  "description": "Short-sleeved shirt with a button-down collar, can be dressed up or down."
}
```
{% include copy-curl.html %}

```json
POST /clothing/_doc/4
{
  "description": "A set of two midi silk shirt dresses with long fluttered sleeves in black. "
}
```
{% include copy-curl.html %}
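
Newly indexed documents become visible to search only after the index is refreshed, which happens automatically at a periodic interval (every second by default). If you run the examples immediately after indexing and some documents are missing from the results, you can refresh the index explicitly:

```json
POST /clothing/_refresh
```
{% include copy-curl.html %}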

_query-dsl/span/span-containing.md

Lines changed: 112 additions & 0 deletions
@@ -0,0 +1,112 @@
---
layout: default
title: Span containing
parent: Span queries
grand_parent: Query DSL
nav_order: 10
---

# Span containing query

The `span_containing` query finds matches where a larger text pattern (such as a phrase or a set of words) contains a smaller text pattern within its boundaries. Think of it as finding a word or phrase, but only when it appears within a specific larger context.

For example, you can use the `span_containing` query to perform the following searches:

- Find the word "quick" only when it appears within a span of text that also mentions foxes and their behavior.
- Ensure that certain terms appear within the context of other terms, not just anywhere in the document.
- Search for specific words that appear within larger meaningful phrases.

## Example

To try the examples in this section, complete the [setup steps]({{site.url}}{{site.baseurl}}/query-dsl/span/#setup).
{: .tip}

The following query searches for spans in which the words "silk" and "dress" (in any order) appear within 5 words of each other and that also contain the word "red":

```json
GET /clothing/_search
{
  "query": {
    "span_containing": {
      "little": {
        "span_term": {
          "description": "red"
        }
      },
      "big": {
        "span_near": {
          "clauses": [
            {
              "span_term": {
                "description": "silk"
              }
            },
            {
              "span_term": {
                "description": "dress"
              }
            }
          ],
          "slop": 5,
          "in_order": false
        }
      }
    }
  }
}
```
{% include copy-curl.html %}

The query matches document 2 because:

- It finds a span in which "dress" and "silk" appear within 5 words of each other ("...dress in red silk..."). The two terms are separated by 2 words ("in" and "red"), which is within the allowed `slop` of 5.
- Within this larger span, it finds the term "red".

<details markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1.1577396,
    "hits": [
      {
        "_index": "clothing",
        "_id": "2",
        "_score": 1.1577396,
        "_source": {
          "description": "Beautiful long dress in red silk, perfect for formal events."
        }
      }
    ]
  }
}
```

</details>

Both the `little` and `big` parameters can contain any type of span query, so you can nest span queries to build complex patterns when needed.
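
For example, the following sketch (a variation on the preceding example, not part of the original page) nests a `span_or` query inside `little`, so the query looks for `big` spans that contain either "red" or "black". The color terms are illustrative choices:

```json
GET /clothing/_search
{
  "query": {
    "span_containing": {
      "little": {
        "span_or": {
          "clauses": [
            { "span_term": { "description": "red" } },
            { "span_term": { "description": "black" } }
          ]
        }
      },
      "big": {
        "span_near": {
          "clauses": [
            { "span_term": { "description": "silk" } },
            { "span_term": { "description": "dress" } }
          ],
          "slop": 5,
          "in_order": false
        }
      }
    }
  }
}
```
{% include copy-curl.html %}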

## Parameters

The following table lists all top-level parameters supported by `span_containing` queries. All parameters are required.

| Parameter | Data type | Description |
|:-----------|:------|:-------------|
| `little` | Object | The span query that must be contained within the `big` span. This defines the span you're looking for within a larger context. |
| `big` | Object | The containing span query that defines the boundaries within which the `little` span must appear. This establishes the context for your search. |
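
The [`span_within`]({{site.url}}{{site.baseurl}}/query-dsl/span/span-within/) query accepts the same `little` and `big` parameters but returns the matching `little` span instead of the enclosing `big` span. As a sketch for comparison, the following request reuses the clauses from the preceding example with `span_within`:

```json
GET /clothing/_search
{
  "query": {
    "span_within": {
      "little": {
        "span_term": { "description": "red" }
      },
      "big": {
        "span_near": {
          "clauses": [
            { "span_term": { "description": "silk" } },
            { "span_term": { "description": "dress" } }
          ],
          "slop": 5,
          "in_order": false
        }
      }
    }
  }
}
```
{% include copy-curl.html %}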

Lines changed: 116 additions & 0 deletions
@@ -0,0 +1,116 @@
---
layout: default
title: Span field masking
parent: Span queries
grand_parent: Query DSL
nav_order: 20
---

# Span field masking query

The `field_masking_span` query allows span queries to match across different fields by "masking" the true field of a query. This is particularly useful when working with multi-fields (the same content indexed with different analyzers) or when you need to run span queries such as `span_near` or `span_or` across different fields (which is normally not allowed).

For example, you can use the `field_masking_span` query to:

- Match terms across a raw field and its stemmed version
- Combine span queries on different fields in a single span operation
- Work with the same content indexed using different analyzers

When using field masking, the relevance score is calculated using the characteristics (norms) of the masked field rather than the actual field being searched. This means that if the masked field has different properties (such as length or boost values) than the field being searched, you might receive unexpected scoring results.
{: .note}

## Example

To try the examples in this section, complete the [setup steps]({{site.url}}{{site.baseurl}}/query-dsl/span/#setup).
{: .tip}

The following query searches for the word "long" in the `description` field near variations of the word "sleeve" in the `description.stemmed` field:

```json
GET /clothing/_search
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_term": {
            "description": "long"
          }
        },
        {
          "field_masking_span": {
            "query": {
              "span_term": {
                "description.stemmed": "sleev"
              }
            },
            "field": "description"
          }
        }
      ],
      "slop": 1,
      "in_order": true
    }
  }
}
```
{% include copy-curl.html %}
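
The query matches documents 1 and 4:

- The term "long" appears in the `description` field in both documents.
- Document 1 contains the word "sleeved", and document 4 contains the word "sleeves". The `english` analyzer stems both words to "sleev" in the `description.stemmed` field.
- The `field_masking_span` makes the stemmed field match appear as if it were in the raw field.
- The terms appear in the specified order ("long" before "sleeve") and are separated by at most 1 position, as allowed by a `slop` of 1. In document 1, "long" is directly followed by "sleeved"; in document 4, the word "fluttered" appears between "long" and "sleeves".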

<details markdown="block">
  <summary>
    Response
  </summary>
  {: .text-delta}

```json
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 0.7444251,
    "hits": [
      {
        "_index": "clothing",
        "_id": "1",
        "_score": 0.7444251,
        "_source": {
          "description": "Long-sleeved dress shirt with a formal collar and button cuffs. "
        }
      },
      {
        "_index": "clothing",
        "_id": "4",
        "_score": 0.4291246,
        "_source": {
          "description": "A set of two midi silk shirt dresses with long fluttered sleeves in black. "
        }
      }
    ]
  }
}
```

</details>
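
To see why masking is needed, consider the same query with the `field_masking_span` wrapper removed, so that the two `span_near` clauses target different fields. Span queries such as `span_near` require all of their clauses to use the same field, so you should expect this request to be rejected with an error rather than return results (the exact error message may vary by version):

```json
GET /clothing/_search
{
  "query": {
    "span_near": {
      "clauses": [
        { "span_term": { "description": "long" } },
        { "span_term": { "description.stemmed": "sleev" } }
      ],
      "slop": 1,
      "in_order": true
    }
  }
}
```
{% include copy-curl.html %}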

## Parameters

The following table lists all top-level parameters supported by `field_masking_span` queries. All parameters are required.

| Parameter | Data type | Description |
|:----------|:-----|:------------|
| `query` | Object | The span query to execute on the actual field. |
| `field` | String | The field name to mask the query as. Other span queries treat this query as if it were executing on this field. |
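
The same technique applies to other multi-clause span queries such as `span_or`. The following request is a sketch (the terms are illustrative choices) that matches either the raw term "shirt" in the `description` field or the stemmed term "dress" in the `description.stemmed` field, masked as `description` so that both clauses report the same field. Because the stemmed field is searched, the second clause should also cover inflected forms such as "dresses":

```json
GET /clothing/_search
{
  "query": {
    "span_or": {
      "clauses": [
        { "span_term": { "description": "shirt" } },
        {
          "field_masking_span": {
            "query": { "span_term": { "description.stemmed": "dress" } },
            "field": "description"
          }
        }
      ]
    }
  }
}
```
{% include copy-curl.html %}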
