-
Notifications
You must be signed in to change notification settings - Fork 25.2k
Add a simple 'fetch fields' phase. #55639
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a simple 'fetch fields' phase. #55639
Conversation
10d498d
to
358ade3
Compare
search: | ||
index: test | ||
body: | ||
fields: [keyword, integer_range] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll no longer use a flat list of fields once we introduce support for formatting the values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You just have this temporarily because it is simpler to parse?
import static org.hamcrest.Matchers.equalTo; | ||
import static org.hamcrest.Matchers.hasItems; | ||
|
||
public class FetchFieldsPhaseTests extends ESSingleNodeTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried hard to write proper unit tests (in the style of FetchSourcePhaseTests
), but it's currently challenging to stub out the required pieces like source lookup. I think some broader refactors are necessary before it's possible to unit test these classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you made something like:
public void hitsExecute(Function<SearchHit, Map<String, Object>> sourceLookup, SearchHit[] hits)
and unit tested that? Then and the only thing missing a unit test would be the map from search context to "simple" classes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had been hesitant to do this, because I thought it would make FetchFieldsPhase
harder to read... but I actually think it turned out pretty cleanly. I don't think I followed your suggestion exactly though, curious to hear what you think.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
358ade3
to
facc1cc
Compare
Pinging @elastic/es-search (:Search/Search) |
👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a great start! I left some "for later" stuff and some "maybe now" stuff.
search: | ||
index: test | ||
body: | ||
fields: [keyword, integer_range] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You just have this temporarily because it is simpler to parse?
// all. | ||
// This call doesn't really need to support writing any kind of object, since the values | ||
// here are always serializable to xContent. Each value could be a leaf types like a string, | ||
// number, or boolean, a list of such values, or a map of such values with string keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah! Except it won't be for stuff like geo_points
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do wonder if it'd be nicer to have the "type" of the field available so we don't need all the nasty reflection in value
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But that is a problem for another time, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is a really unfortunate part of the current PR. It's on the top of my mind in terms of what needs figuring out for a follow-up.
@@ -244,6 +247,12 @@ public SearchSourceBuilder(StreamInput in) throws IOException { | |||
sliceBuilder = in.readOptionalWriteable(SliceBuilder::new); | |||
collapse = in.readOptionalWriteable(CollapseBuilder::new); | |||
trackTotalHitsUpTo = in.readOptionalInt(); | |||
|
|||
if (in.getVersion().onOrAfter(Version.V_8_0_0)) { | |||
if (in.readBoolean()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
readOptionalStringList
?
@@ -298,6 +307,13 @@ public void writeTo(StreamOutput out) throws IOException { | |||
out.writeOptionalWriteable(sliceBuilder); | |||
out.writeOptionalWriteable(collapse); | |||
out.writeOptionalInt(trackTotalHitsUpTo); | |||
|
|||
if (out.getVersion().onOrAfter(Version.V_8_0_0)) { | |||
out.writeBoolean(fetchFields != null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
writeOptionalStringCollection
?
@@ -1120,6 +1154,11 @@ public void parseXContent(XContentParser parser, boolean checkTrailingTokens) th | |||
while ((token = parser.nextToken()) != XContentParser.Token.END_ARRAY) { | |||
docValueFields.add(FieldAndFormat.fromXContent(parser)); | |||
} | |||
} else if (FETCH_FIELDS_FIELD.match(currentFieldName, parser.getDeprecationHandler())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder when it'll be time to convert this to ObjectParser
.....
Object value = entry.getValue(); | ||
List<Object> values = value instanceof List | ||
? (List<Object>) value | ||
: List.of(value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see wanting to switch this to something a little less instanceof
soon. But this'll get the job done for now!
@@ -201,6 +202,10 @@ public InnerHitsContext innerHits() { | |||
|
|||
public abstract SearchContext docValuesContext(FetchDocValuesContext docValuesContext); | |||
|
|||
public abstract FetchFieldsContext fetchFieldsContext(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have javadoc for this.
/** | ||
* For each of the provided paths, return its value in the source. Note that | ||
*/ | ||
public Map<String, Object> extractValues(Collection<String> paths) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this'd be better kept in the new fetch phase, especially if we're going to migrate away from it.
I wonder if a better interface for collecting the values is Map<String, Consumer<?>>
or something shaped like that. I feel like it'd let you build flexible casting and formatting pretty naturally. But that can wait too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, I'll move this up into FetchFieldsPhase
. I'll wait to refine the interface until I have a better handle on the approach to value formatting.
@@ -188,7 +188,7 @@ public void testFromXContentLenientParsing() throws IOException { | |||
XContentType xContentType = randomFrom(XContentType.values()); | |||
SearchHit searchHit = createTestItem(xContentType, true, true); | |||
BytesReference originalBytes = toXContent(searchHit, xContentType, true); | |||
Predicate<String> pathsToExclude = path -> (path.endsWith("highlight") || path.endsWith("fields") || path.contains("_source") | |||
Predicate<String> pathsToExclude = path -> (path.endsWith("highlight") || path.contains("fields") || path.contains("_source") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
This on looks like it took a while to find!
import static org.hamcrest.Matchers.equalTo; | ||
import static org.hamcrest.Matchers.hasItems; | ||
|
||
public class FetchFieldsPhaseTests extends ESSingleNodeTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if you made something like:
public void hitsExecute(Function<SearchHit, Map<String, Object>> sourceLookup, SearchHit[] hits)
and unit tested that? Then and the only thing missing a unit test would be the map from search context to "simple" classes?
And none of it really blocks merging, especially to a feature branch. But I do think now would be a simpler time to talk about the unit test and where to put the source extraction logic. |
Thanks @nik9000 for the speedy review, it's ready for another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
/** | ||
* The context needed to retrieve fields. | ||
*/ | ||
public class FetchFieldsContext { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough!
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. There are several aspects that need improvement -- this PR just lays out the initial class structure and tests.
Note: the PR is opened against the
field-retrieval
feature branch.Currently the phase just looks up each field name in the _source and returns its values in the 'fields' section of the response. This PR just lays out the initial class structure and tests -- much of the interesting functionality will come in subsequent PRs.
Known issues that will be addressed in future commits:
SourceLookup
method they have planned to efficiently load particular values: Partially parsesource
documents to speed upsource
access #52591. I plan to make use of this when it's available, but for now added a placeholder methodextractValues
.Relates to #55363.