-
Notifications
You must be signed in to change notification settings - Fork 25.2k
ES|QL query log #124094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ES|QL query log #124094
Conversation
|
||
private static Map<String, Object> prepareMap(Result esqlResult, String query) { | ||
Map<String, Object> messageFields = new HashMap<>(); | ||
messageFields.put("elasticsearch.slowlog.message", esqlResult.executionInfo().clusterAliases()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit surprising, message contains cluster aliases, is it intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure we need that at all.
SearchSlowLog has searchContext.indexShard().shardId()
in there, maybe we want something different
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm removing it for now, in case we'll reintroduce it
messageFields.put("elasticsearch.slowlog.planning.took_millis", esqlResult.executionInfo().planningTookTime().millis()); | ||
messageFields.put("elasticsearch.slowlog.search_type", "ESQL"); | ||
String source = escapeJson(query); | ||
messageFields.put("elasticsearch.slowlog.source", source); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think source is a bit ambiguous. Should we call it query here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I don't like it very much, query
would be much better.
I'm using source
only because it's what Search does, but I'm not sure we want to be consistent with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, source for _search is the actual request body, while in our case it's only the query string.
I'll rename it as "query" for now.
return; // TODO review, it happens in some tests, not sure if it's a thing also in prod | ||
} | ||
long tookInNanos = esqlResult.executionInfo().overallTook().nanos(); | ||
var queryWarnThreshold = clusterSettings.get(ESQL_SLOWLOG_THRESHOLD_QUERY_WARN_SETTING).nanos(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: org.elasticsearch.common.settings.Setting#get logic is a bit tricky.
We should copy the value to the field and watch for updates using org.elasticsearch.common.settings.AbstractScopedSettings#initializeAndWatch
final String node1, node2; | ||
if (randomBoolean()) { | ||
internalCluster().ensureAtLeastNumDataNodes(2); | ||
node1 = randomDataNode().getName(); | ||
node2 = randomValueOtherThan(node1, () -> randomDataNode().getName()); | ||
} else { | ||
node1 = randomDataNode().getName(); | ||
node2 = randomDataNode().getName(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Added some comments/questions
TimeValue.timeValueNanos(-1), | ||
TimeValue.timeValueMillis(-1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some reason for the default to be -1ns instead of -1ms? I guess both would work the same, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's the same. let me change it
if (execInfo.isCrossClusterSearch()) { | ||
execInfo.markEndPlanning(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this a bug? Was it working when not CCS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really.
This method was introduced with CCS implementation, and planningTookTime
was only used in that contex.
Now we need it also for other purposes, so I'll set it also for non-CCS queries.
latch.countDown(); | ||
} | ||
})); | ||
latch.await(30, TimeUnit.SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we failing the test if the countdown isn't 0? I think await()
does nothing about it, right?
Not sure what will happen here exactly if the query fails.
I see other test doing an assert like:
assertEquals("All requests must respond, requests: " + requests, 0, latch.getCount());
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use safeAwait(latch);
instead. It should handle correctly the cases when operation times out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see other test doing an assert like:
One more assert won't hurt, I'm adding it.
|
||
private static final Logger queryLogger = LogManager.getLogger(SLOWLOG_PREFIX + ".query"); | ||
|
||
private static final ToXContent.Params FORMAT_PARAMS = new ToXContent.MapParams(Collections.singletonMap("pretty", "false")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we using this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @luigidellaquila, I've created a changelog YAML for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments.
It would be really useful to see an example of what the esql slow logs looks like (I assume it will be part of the doc).
planTelemetryManager.publish(planTelemetry, true); | ||
slowLog.onQueryPhase(x, request.query()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To help with encapsulation, please create two internal methods called when the query is successfully executed or is failing and move the metrics, telemetry and now the slow log there (e.g. onQuerySuccesful(), onQueryFailure())
this(settings, null); | ||
} | ||
|
||
public void onQueryPhase(Result esqlResult, String query) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method and onQueryFailure are very similar - same code should be reused across both of them.
jsonFields.put("elasticsearch.slowlog.error.message", exception.getMessage() == null ? "" : exception.getMessage()); | ||
jsonFields.put("elasticsearch.slowlog.error.type", exception.getClass().getName()); | ||
jsonFields.put("elasticsearch.slowlog.took", took); | ||
jsonFields.put("elasticsearch.slowlog.took_millis", took / 1_000_000); | ||
return new ESLogMessage().withFields(jsonFields); | ||
} | ||
|
||
private static Map<String, Object> prepareMap(Result esqlResult, String query, boolean success, User user) { | ||
Map<String, Object> messageFields = new HashMap<>(); | ||
if (user != null) { | ||
messageFields.put("user.name", user.principal()); | ||
} | ||
if (esqlResult != null) { | ||
messageFields.put("elasticsearch.slowlog.took", esqlResult.executionInfo().overallTook().nanos()); | ||
messageFields.put("elasticsearch.slowlog.took_millis", esqlResult.executionInfo().overallTook().millis()); | ||
messageFields.put("elasticsearch.slowlog.planning.took", esqlResult.executionInfo().planningTookTime().nanos()); | ||
messageFields.put("elasticsearch.slowlog.planning.took_millis", esqlResult.executionInfo().planningTookTime().millis()); | ||
} | ||
String source = escapeJson(query); | ||
messageFields.put("elasticsearch.slowlog.success", success); | ||
messageFields.put("elasticsearch.slowlog.search_type", "ESQL"); | ||
messageFields.put("elasticsearch.slowlog.query", source); | ||
return messageFields; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extract constants out of these repetitive strings including during their construction, such as the common prefix ("elasticsearch.slowlog")
Also there's a repetition on took/took_millis, which seems to be overridden always in of method.
Thanks for the feedback folks! @costin I added two examples of slow log output to the PR description. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just a comment on combining the SlowLogFieldProviders
.
@@ -48,6 +50,11 @@ public Map<String, String> searchFields() { | |||
} | |||
return Map.of(); | |||
} | |||
|
|||
@Override | |||
public Map<String, String> queryFields() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
return fields.stream() | ||
.flatMap(f -> f.indexFields().entrySet().stream()) | ||
.collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue)); | ||
SlowLogFieldProvider slowLogFieldProvider = new SlowLogFieldProvider() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of this is that it will merge all field providers provided by the plugins into one (in practice I think that's only the one in the security plugin). So this should ideally have an implementation for merging queryFields
too and then this instance should be passed to the PluginService
(instead of the list of providers).
If you don't want to implement the merge for indexFields
and searchFields
in the provider without index settings you could add an empty default implementation in the interface in the same way you did for queryFields
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for implementing the suggested changes. LGTM pending CI and the comment I left above.
Thanks @jfreden! |
SlowLog -> QueryLog The intention is to make it more generic in the future, so that we can log queries not based on the execution time, but also based on other criteria. |
Docs PR here elastic/docs-content#816 |
💔 Backport failed
You can use sqren/backport to manually backport by running |
Adding Query Log to ES|QL.
It can be configured at cluster level, using the following cluster settings (TimeValue):
By default, queries will be logged to
Example log in case of success:
Example log in case of failure:
TODO