Commit ecb2592

Introduce SearchResult, SearchResults, Score and Similarity.
Closes: #3285
1 parent 0032561 commit ecb2592

25 files changed: +1445 -41 lines changed

src/main/antora/modules/ROOT/nav.adoc

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@
 ** xref:repositories/query-methods.adoc[]
 ** xref:repositories/definition.adoc[]
 ** xref:repositories/query-methods-details.adoc[]
+** xref:repositories/vector-search.adoc[]
 ** xref:repositories/create-instances.adoc[]
 ** xref:repositories/custom-implementations.adoc[]
 ** xref:repositories/core-domain-events.adoc[]
Lines changed: 167 additions & 0 deletions
@@ -0,0 +1,167 @@
1+
[[vector-search]]
2+
= Vector Search
3+
4+
With the rise of generative AI, vector databases have gained strong traction.
5+
These databases enable efficient storage and querying of high-dimensional vectors, making them well-suited for tasks such as semantic search, recommendation systems, and natural language understanding.
6+
7+
Vector search is a technique that retrieves semantically similar data by comparing vector representations (also known as embeddings) rather than relying on traditional exact-match queries.
8+
This approach enables intelligent, context-aware applications that go beyond keyword-based retrieval.
9+
10+
In the context of Spring Data, vector search opens new possibilities for building intelligent, context-aware applications, particularly in domains like natural language processing, recommendation systems, and generative AI.
11+
By modelling vector-based querying with familiar repository abstractions, Spring Data lets developers integrate vector-capable databases and similarity search with the simplicity and consistency of the Spring Data programming model.
12+
13+
ifdef::vector-search-intro-include[]
14+
include::{vector-search-intro-include}[]
15+
endif::[]
16+
17+
[[vector-search.model]]
18+
== Vector Model
19+
20+
To support vector search in a type-safe and idiomatic way, Spring Data introduces the following core abstractions:
21+
22+
* <<vector-search.model.vector,`Vector`>>
23+
* <<vector-search.model.search-result,`SearchResults<T>` and `SearchResult<T>`>>
24+
* <<vector-search.model.scoring,`Score`, `Similarity` and Scoring Functions>>
25+
26+
[[vector-search.model.vector]]
27+
=== `Vector`
28+
29+
The `Vector` type represents an n-dimensional numerical embedding, typically produced by embedding models.
30+
In Spring Data, it is defined as a lightweight wrapper around an array of floating-point numbers, ensuring immutability and consistency.
31+
This type can be used as an input for search queries or as a property on a domain entity to store the associated vector representation.
32+
33+
====
34+
[source,java]
35+
----
36+
Vector vector = Vector.of(0.23f, 0.11f, 0.77f);
37+
----
38+
====
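To illustrate what a lightweight, immutable wrapper around an array of floating-point numbers can look like, here is a minimal sketch. It is not Spring Data's `Vector` implementation: `SimpleVector` and its methods are hypothetical names chosen for this example, and the defensive copies are simply one way to keep the wrapped values from changing after creation.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; not the Spring Data Vector type.
final class SimpleVector {

    private final float[] values;

    private SimpleVector(float[] values) {
        this.values = values;
    }

    // Copies the input so later changes to the caller's array are not visible here.
    static SimpleVector of(float... values) {
        return new SimpleVector(Arrays.copyOf(values, values.length));
    }

    int size() {
        return values.length;
    }

    // Returns a copy so callers cannot modify the internal state.
    float[] toFloatArray() {
        return Arrays.copyOf(values, values.length);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SimpleVector other && Arrays.equals(values, other.values);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(values);
    }

    public static void main(String[] args) {
        float[] source = { 0.23f, 0.11f, 0.77f };
        SimpleVector vector = SimpleVector.of(source);
        source[0] = 42f; // mutating the source array does not affect the vector
        System.out.println(vector.toFloatArray()[0]);
    }
}
```

Value-based `equals` and `hashCode` let such a wrapper be compared and used as a map key, which raw arrays do not support.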
39+
40+
Using `Vector` in your domain model removes the need to work with raw arrays or lists of numbers, providing a more type-safe and expressive way to handle vector data.
41+
This abstraction also allows for easy integration with various vector databases and libraries.
42+
It also allows for implementing vendor-specific optimizations such as binary or quantized vectors that do not map to a standard floating point (`float` and `double` as of https://en.wikipedia.org/wiki/IEEE_754[IEEE 754]) representation.
43+
A domain object can have a vector property, which can be used for similarity searches.
44+
Consider the following example:
45+
46+
ifdef::vector-search-model-include[]
47+
include::{vector-search-model-include}[]
48+
endif::[]
49+
50+
NOTE: Associating a vector with a domain object results in the vector being loaded and stored as part of the entity lifecycle, which may introduce additional overhead on retrieval and persistence operations.
51+
52+
[[vector-search.model.search-result]]
53+
=== Search Results
54+
55+
The `SearchResult<T>` type encapsulates a single result of a vector similarity query.
56+
It includes both the matched domain object and a relevance score that indicates how closely it matches the query vector.
57+
This abstraction provides a structured way to handle result ranking and enables developers to easily work with both the data and its contextual relevance.
58+
59+
ifdef::vector-search-repository-include[]
60+
include::{vector-search-repository-include}[]
61+
endif::[]
62+
63+
In this example, the `searchByCountryAndEmbeddingNear` method returns a `SearchResults<Comment>` object, which contains a list of `SearchResult<Comment>` instances.
64+
Each result includes the matched `Comment` entity and its relevance score.
65+
66+
The relevance score is a numerical value that indicates how closely the matched vector aligns with the query vector.
67+
Depending on whether a score represents distance or similarity, a higher score can mean either a more distant or a closer match.
68+
69+
The scoring function used to calculate this score can vary based on the underlying database, index or input parameters.
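Because the meaning of a score depends on the scoring function, ranking code has to know which direction counts as "closer". The following sketch uses a hypothetical `Hit` record and a hypothetical `rank` helper (neither is part of Spring Data) to order results ascending for distance scores and descending for similarity scores.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical types for illustration; they are not part of Spring Data.
final class ScoreOrdering {

    record Hit(String id, double score) {}

    // Orders hits so that the closest match comes first.
    // Pass lowerIsBetter = true for distances and false for similarities.
    static List<Hit> rank(List<Hit> hits, boolean lowerIsBetter) {
        Comparator<Hit> byScore = Comparator.comparingDouble(Hit::score);
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(lowerIsBetter ? byScore : byScore.reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit("a", 0.9), new Hit("b", 0.1), new Hit("c", 0.5));
        System.out.println(rank(hits, true).get(0).id());  // scores treated as distances
        System.out.println(rank(hits, false).get(0).id()); // scores treated as similarities
    }
}
```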
70+
71+
[[vector-search.model.scoring]]
72+
=== Score, Similarity, and Scoring Functions
73+
74+
The `Score` type holds a numerical value indicating the relevance of a search result.
75+
It can be used to rank results based on their similarity to the query vector.
76+
The `Score` type is typically a floating-point number, and its interpretation (higher is better or lower is better) depends on the specific similarity function used.
77+
Scores are a by-product of vector search and are not required for a successful search operation.
78+
Score values are not part of a domain model and are therefore best represented as out-of-band data.
79+
80+
Generally, a Score is computed by a `ScoringFunction`.
81+
The actual scoring function used to calculate this score depends on the underlying database and can be obtained from a search index or input parameters.
82+
83+
Spring Data declares constants for commonly used functions such as:
84+
85+
Euclidean distance:: Calculates the straight-line distance in n-dimensional space as the square root of the sum of squared differences.
86+
Cosine similarity:: Measures the cosine of the angle between two vectors by calculating the dot product and dividing it by the product of the vectors' lengths.
87+
Dot product:: Computes the sum of element-wise multiplications.
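The three functions described above can be sketched in plain Java as follows. This is only an illustration of the arithmetic: the class and method names are hypothetical, and a database computes these scores natively rather than through code like this.

```java
// Plain-Java sketch of the three scoring functions; all names are hypothetical.
final class ScoringSketch {

    // Sum of element-wise multiplications.
    static double dotProduct(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Square root of the sum of squared differences.
    static double euclideanDistance(double[] a, double[] b) {
        requireSameLength(a, b);
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Dot product divided by the product of both vector lengths.
    // Yields NaN if either vector has zero length (all components zero).
    static double cosineSimilarity(double[] a, double[] b) {
        double lengths = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        return dotProduct(a, b) / lengths;
    }

    private static void requireSameLength(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
    }

    public static void main(String[] args) {
        double[] a = { 1, 2, 3 };
        double[] b = { 4, 5, 6 };
        System.out.println(dotProduct(a, b));
        System.out.println(euclideanDistance(a, b));
        System.out.println(cosineSimilarity(a, b));
    }
}
```

Note that Euclidean distance grows as vectors move apart, whereas cosine similarity and dot product grow as vectors align, which is why the direction of "better" differs between functions.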
88+
89+
The choice of similarity function can impact both the performance and semantics of the search and is often determined by the underlying database or index being used.
90+
Spring Data adapts to the database's native scoring function capabilities and to whether the score can be used to limit results.
91+
92+
ifdef::vector-search-scoring-include[]
93+
include::{vector-search-scoring-include}[]
94+
endif::[]
95+
96+
[[vector-search.methods]]
97+
== Vector Search Methods
98+
99+
Vector search methods are defined in repositories using the same conventions as standard Spring Data query methods.
100+
These methods return `SearchResults<T>` and require a `Vector` parameter to define the query vector.
101+
The implementation depends on the internals of the underlying data store and its capabilities around vector search.
102+
103+
NOTE: If you are new to Spring Data repositories, make sure to familiarize yourself with the xref:repositories/core-concepts.adoc[basics of repository definitions and query methods].
104+
105+
Generally, you have the choice of declaring a search method using two approaches:
106+
107+
* Query Derivation
108+
* Declaring a String-based Query
109+
110+
In both cases, vector search methods must declare a `Vector` parameter to define the query vector.
111+
112+
[[vector-search.method.derivation]]
113+
=== Derived Search Methods
114+
115+
A derived search method uses the name of the method to derive the query.
116+
Vector search supports the following keywords when declaring a search method:
117+
118+
.Query predicate keywords
119+
[options="header",cols="1,3"]
120+
|===============
121+
|Logical keyword|Keyword expressions
122+
|`NEAR`|`Near`, `IsNear`
123+
|`WITHIN`|`Within`, `IsWithin`
124+
|===============
125+
126+
ifdef::vector-search-method-derived-include[]
127+
include::{vector-search-method-derived-include}[]
128+
endif::[]
129+
130+
Derived search methods are typically easier to read and maintain, as they rely on the method name to express the query intent.
131+
However, to limit search results by their score, a derived search method must declare a `Score`, `Range<Score>`, or `ScoringFunction` as the second argument to the `Near`/`Within` keyword.
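Conceptually, limiting by score means keeping only the results whose score falls inside a given range. The sketch below shows that idea in plain Java with inclusive bounds on both ends; the `Scored` record and the `within` helper are hypothetical, and in a real application the data store applies this limit rather than application code.

```java
import java.util.List;

// Hypothetical types for illustration; they are not part of Spring Data.
final class ScoreLimit {

    record Scored(String id, double score) {}

    // Keeps only results whose score lies within [min, max], both bounds inclusive.
    static List<Scored> within(List<Scored> results, double min, double max) {
        return results.stream()
                .filter(result -> result.score() >= min && result.score() <= max)
                .toList();
    }

    public static void main(String[] args) {
        List<Scored> results = List.of(new Scored("a", 0.2), new Scored("b", 0.5), new Scored("c", 0.8));
        System.out.println(within(results, 0.5, 0.8));
    }
}
```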
132+
133+
[[vector-search.method.string]]
134+
=== Annotated Search Methods
135+
136+
Annotated methods provide full control over the query semantics and parameters.
137+
Unlike derived methods, they do not rely on method name conventions.
138+
139+
ifdef::vector-search-method-annotated-include[]
140+
include::{vector-search-method-annotated-include}[]
141+
endif::[]
142+
143+
With more control over the actual query, Spring Data can make fewer assumptions about the query and its parameters.
144+
For example, `Similarity` normalization uses the native score function within the query to normalize the given similarity into a score predicate value and vice versa.
145+
If an annotated query does not define, for example, the score, then the score value in the returned `SearchResult<T>` is zero.
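How a similarity translates into a native score and back depends entirely on the store's scoring function. As a rough illustration only, the sketch below assumes one possible convention: a cosine value `c` in [-1, 1] maps to the similarity `(1 + c) / 2`, and a Euclidean distance `d` maps to `1 / (1 + d)`. Both formulas and all method names are assumptions made for this example, not necessarily what Spring Data or a particular database applies.

```java
// Sketch of similarity normalization under assumed formulas; all names are hypothetical.
final class SimilarityNormalization {

    // Maps a cosine value in [-1, 1] to a similarity in [0, 1].
    static double cosineToSimilarity(double cosine) {
        return (1 + cosine) / 2;
    }

    // Inverse of cosineToSimilarity: turns a similarity into a cosine threshold.
    static double similarityToCosine(double similarity) {
        return 2 * similarity - 1;
    }

    // Maps a non-negative distance to a similarity in (0, 1]; distance 0 yields 1.
    static double euclideanToSimilarity(double distance) {
        return 1 / (1 + distance);
    }

    // Inverse of euclideanToSimilarity: turns a similarity into a distance threshold.
    static double similarityToEuclidean(double similarity) {
        return 1 / similarity - 1;
    }

    public static void main(String[] args) {
        System.out.println(cosineToSimilarity(0.0));
        System.out.println(similarityToEuclidean(0.5));
    }
}
```

Having both directions is what allows a similarity threshold passed to a query to be turned into a predicate on the native score, and a native score in the result to be reported as a similarity.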
146+
147+
[[vector-search.method.sorting]]
148+
=== Sorting
149+
150+
By default, search results are ordered according to their score.
151+
You can override sorting by using the `Sort` parameter:
152+
153+
.Using `Sort` in Repository Search Methods
154+
====
155+
[source,java]
156+
----
157+
interface CommentRepository extends Repository<Comment, String> {
158+
159+
SearchResults<Comment> searchByEmbeddingNearOrderByCountry(Vector vector, Score score);
160+
161+
SearchResults<Comment> searchByEmbeddingWithin(Vector vector, Score score, Sort sort);
162+
}
163+
----
164+
====
165+
166+
Please note that custom sorting does not allow expressing the score as a sorting criterion.
167+
You can only refer to domain properties.

src/main/java/org/springframework/data/domain/Page.java

Lines changed: 1 addition & 0 deletions
@@ -69,4 +69,5 @@ static <T> Page<T> empty(Pageable pageable) {
 */
 @Override
 <U> Page<U> map(Function<? super T, ? extends U> converter);
+
 }

src/main/java/org/springframework/data/domain/Range.java

Lines changed: 2 additions & 2 deletions
@@ -223,7 +223,7 @@ public boolean contains(T value, Comparator<T> comparator) {
 /**
 * Apply a mapping {@link Function} to the lower and upper boundary values.
 *
-* @param mapper must not be {@literal null}. If the mapper returns {@code null}, then the corresponding boundary
+* @param mapper must not be {@literal null}. If the mapper returns {@literal null}, then the corresponding boundary
 * value represents an {@link Bound#unbounded()} boundary.
 * @return a new {@link Range} after applying the value to the mapper.
 * @param <R> target type of the mapping function.
@@ -430,7 +430,7 @@ public boolean isInclusive() {
 /**
 * Apply a mapping {@link Function} to the boundary value.
 *
-* @param mapper must not be {@literal null}. If the mapper returns {@code null}, then the boundary value
+* @param mapper must not be {@literal null}. If the mapper returns {@literal null}, then the boundary value
 * corresponds with {@link Bound#unbounded()}.
 * @return a new {@link Bound} after applying the value to the mapper.
 * @param <R>
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
1+
/*
2+
* Copyright 2025 the original author or authors.
3+
*
4+
* Licensed under the Apache License, Version 2.0 (the "License");
5+
* you may not use this file except in compliance with the License.
6+
* You may obtain a copy of the License at
7+
*
8+
* https://www.apache.org/licenses/LICENSE-2.0
9+
*
10+
* Unless required by applicable law or agreed to in writing, software
11+
* distributed under the License is distributed on an "AS IS" BASIS,
12+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
* See the License for the specific language governing permissions and
14+
* limitations under the License.
15+
*/
16+
package org.springframework.data.domain;
17+
18+
import java.io.Serializable;
19+
20+
import org.springframework.util.ObjectUtils;
21+
22+
/**
23+
* Value object representing a search result score computed via a {@link ScoringFunction}.
24+
* <p>
25+
* Encapsulates the numeric score and the scoring function used to derive it. Scores are primarily used to rank search
26+
* results. Depending on the used {@link ScoringFunction}, higher scores can indicate either a higher distance or a
27+
* higher similarity. Use the {@link Similarity} class to indicate usage of a normalized score that effectively
28+
* represents the similarity.
29+
* <p>
30+
* Instances of this class are immutable and suitable for use in comparison, sorting, and range operations.
31+
*
32+
* @author Mark Paluch
33+
* @since 4.0
34+
* @see Similarity
35+
*/
36+
public sealed class Score implements Serializable permits Similarity {
37+
38+
private final double value;
39+
private final ScoringFunction function;
40+
41+
Score(double value, ScoringFunction function) {
42+
this.value = value;
43+
this.function = function;
44+
}
45+
46+
/**
47+
* Creates a new {@link Score} from a plain {@code score} value using {@link ScoringFunction#unspecified()}.
48+
*
49+
* @param score the score value without a specific {@link ScoringFunction}.
50+
* @return the new {@link Score}.
51+
*/
52+
public static Score of(double score) {
53+
return of(score, ScoringFunction.unspecified());
54+
}
55+
56+
/**
57+
* Creates a new {@link Score} from a {@code score} value using the given {@link ScoringFunction}.
58+
*
59+
* @param score the score value.
60+
* @param function the scoring function that has computed the {@code score}.
61+
* @return the new {@link Score}.
62+
*/
63+
public static Score of(double score, ScoringFunction function) {
64+
return new Score(score, function);
65+
}
66+
67+
/**
68+
* Creates a {@link Range} from the given minimum and maximum {@code Score} values.
69+
*
70+
* @param min the lower score value, must not be {@literal null}.
71+
* @param max the upper score value, must not be {@literal null}.
72+
* @return a {@link Range} over {@link Score} bounds.
73+
*/
74+
public static Range<Score> between(Score min, Score max) {
75+
return Range.from(Range.Bound.inclusive(min)).to(Range.Bound.inclusive(max));
76+
}
77+
78+
/**
79+
* Returns the raw numeric value of the score.
80+
*
81+
* @return the score value.
82+
*/
83+
public double getValue() {
84+
return value;
85+
}
86+
87+
/**
88+
* Returns the {@link ScoringFunction} that was used to compute this score.
89+
*
90+
* @return the associated scoring function.
91+
*/
92+
public ScoringFunction getFunction() {
93+
return function;
94+
}
95+
96+
@Override
97+
public boolean equals(Object o) {
98+
if (!(o instanceof Score other)) {
99+
return false;
100+
}
101+
if (Double.compare(value, other.value) != 0) {
102+
return false;
103+
}
104+
return ObjectUtils.nullSafeEquals(function, other.function);
105+
}
106+
107+
@Override
108+
public int hashCode() {
109+
return ObjectUtils.nullSafeHash(value, function);
110+
}
111+
112+
@Override
113+
public String toString() {
114+
return function instanceof UnspecifiedScoringFunction ? Double.toString(value)
115+
: "%s (%s)".formatted(Double.toString(value), function.getName());
116+
}
117+
118+
}
