Skip to content

Add DELETE task API for stored task results#21727

Open
cwperks wants to merge 1 commit into
opensearch-project:mainfrom
cwperks:cleanup-task
Open

Add DELETE task API for stored task results#21727
cwperks wants to merge 1 commit into
opensearch-project:mainfrom
cwperks:cleanup-task

Conversation

@cwperks
Copy link
Copy Markdown
Member

@cwperks cwperks commented May 19, 2026

Description

Adds Layer 1 of #21724 with a dedicated DELETE /_tasks/{task_id} API for deleting stored completed task results.

The new delete API removes a completed task result from the .tasks index and returns an acknowledged response. Running tasks are not cancelled or deleted by this API; callers should continue to use the cancel tasks API for running tasks.

The change wires the new action through transport, REST routing, REST API spec, the Java transport client, and the high-level REST client.

API Behavior

Delete a stored completed task result:

curl -XDELETE "http://localhost:9200/_tasks/node-1:12345"

Successful response:

{
  "acknowledged": true
}

After the stored result is deleted, a follow-up GET /_tasks/node-1:12345 returns 404 because the task is no longer running and no stored task result remains.

This API intentionally does not cancel or remove running tasks. If the task is still running on the node that owns it, OpenSearch returns a conflict response:

{
  "error": {
    "type": "opensearch_status_exception",
    "reason": "task [node-1:12345] is still running and cannot be deleted; use the cancel tasks API to cancel running tasks"
  },
  "status": 409
}

If the task id does not correspond to a running task or a stored result in .tasks, OpenSearch returns 404:

{
  "error": {
    "type": "resource_not_found_exception",
    "reason": "task [node-1:12345] isn't running and hasn't stored its results"
  },
  "status": 404
}

If a task id belongs to a node that is no longer part of the cluster, the delete API still attempts to remove a stored result from .tasks. If no stored result exists, OpenSearch returns 404 with context that the owning node is no longer in the cluster.

If a stored task result is found but is not marked completed, OpenSearch returns 409 and leaves the stored document in place.

Related Issues

Resolves #21724

Check List

  • New functionality includes testing.
  • All tests pass.
  • New functionality has been documented.

Testing

  • ./gradlew :server:test --tests org.opensearch.action.admin.cluster.node.tasks.delete.DeleteTaskRequestTests :client:rest-high-level:test --tests org.opensearch.client.TasksRequestConvertersTests
  • ./gradlew :server:internalClusterTest --tests org.opensearch.action.admin.cluster.node.tasks.TasksIT.testTaskStoringSuccessfulResult --tests org.opensearch.action.admin.cluster.node.tasks.TasksIT.testTaskStoringFailureResult --tests org.opensearch.action.admin.cluster.node.tasks.TasksIT.testNodeNotFoundButTaskFound
  • ./gradlew :server:precommit :client:rest-high-level:precommit :rest-api-spec:validateRestSpec :rest-api-spec:validateNoKeywords
Manual REST verification against a local 3.7.0-SNAPSHOT node

With OpenSearch running at http://localhost:9200:

Start from a clean slate so the test indices do not inherit state from a previous run.

curl -s -XDELETE 'http://localhost:9200/movies_source,movies_archive?ignore_unavailable=true'

Create a small source index with no replicas so the test is deterministic on a single local node.

curl -s -XPUT 'http://localhost:9200/movies_source' \
  -H 'Content-Type: application/json' \
  -d '{"settings":{"number_of_shards":1,"number_of_replicas":0}}'

Index a couple of documents and refresh immediately so the async reindex task has known source content.

curl -s -XPOST 'http://localhost:9200/movies_source/_bulk?refresh=true' \
  -H 'Content-Type: application/x-ndjson' \
  -d '{"index":{"_id":"1"}}
{"title":"The Matrix","year":1999}
{"index":{"_id":"2"}}
{"title":"Spirited Away","year":2001}
'

Start _reindex asynchronously. This returns a task id and causes OpenSearch to persist the completed task result in .tasks.

TASK_ID=$(curl -s -XPOST 'http://localhost:9200/_reindex?wait_for_completion=false' \
  -H 'Content-Type: application/json' \
  -d '{"source":{"index":"movies_source"},"dest":{"index":"movies_archive"}}' \
  | jq -r '.task')

Fetch the task with wait_for_completion=true. This waits for completion and confirms the completed stored task result can be read.

curl -s -i "http://localhost:9200/_tasks/${TASK_ID}?wait_for_completion=true"

Fetch the same task again. This confirms GET /_tasks/{task_id} remains read-only and does not remove the stored result.

curl -s -i "http://localhost:9200/_tasks/${TASK_ID}"

Delete the stored completed task result using the dedicated delete API.

curl -s -i -XDELETE "http://localhost:9200/_tasks/${TASK_ID}"

Fetch the task again. The expected 404 confirms the stored completed task result was deleted.

curl -s -i "http://localhost:9200/_tasks/${TASK_ID}"

Remove the temporary indices created by the manual test.

curl -s -XDELETE 'http://localhost:9200/movies_source,movies_archive?ignore_unavailable=true'

Expected behavior:

  • The first fetch returns 200 for the completed task result.
  • The second fetch also returns 200, confirming GET preserved the stored result.
  • DELETE /_tasks/${TASK_ID} returns 200 with "acknowledged":true.
  • A subsequent fetch returns 404, confirming the stored completed task result was deleted.

@github-actions github-actions Bot added the enhancement Enhancement or improvement to existing feature or request label May 19, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Reviewer Guide 🔍

(Review updated until commit fec3337)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Race Condition

Between checking if a task is running (line 109) and deleting its result from the index (lines 122-134), the task could complete and store its result. This creates a window where a completed task's result might be incorrectly reported as "still running" or where the deletion could fail unexpectedly. The check at line 109 uses taskManager.getTask() which only sees in-memory tasks, while the deletion targets the persistent index.

if (taskManager.getTask(request.getTaskId().getId()) != null) {
    listener.onFailure(
        new OpenSearchStatusException(
            "task [{}] is still running and cannot be deleted; use the cancel tasks API to cancel running tasks",
            RestStatus.CONFLICT,
            request.getTaskId()
        )
    );
    return;
}
deleteFinishedTaskFromIndex(thisTask, request, listener);
Incomplete Error Handling

When the GET request fails with an IndexNotFoundException (line 127), the code wraps it in a ResourceNotFoundException stating the task "isn't running and hasn't stored its results". However, an IndexNotFoundException could also occur if the .tasks index was deleted after the task completed, which is a different scenario than a task never having stored results. This conflates two distinct failure modes.

client.get(get, ActionListener.wrap(r -> onGetFinishedTaskFromIndex(r, thisTask, listener), e -> {
    if (ExceptionsHelper.unwrap(e, IndexNotFoundException.class) != null) {
        listener.onFailure(
            new ResourceNotFoundException("task [{}] isn't running and hasn't stored its results", e, request.getTaskId())
        );
    } else {
        listener.onFailure(e);
    }
}));

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 19, 2026

PR Code Suggestions ✨

Latest suggestions up to fec3337

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
General
Add reference equality check first

The equals method should check for reference equality first for performance
optimization. If this == obj, return true immediately before performing more
expensive checks.

client/rest-high-level/src/main/java/org/opensearch/client/tasks/DeleteTaskRequest.java [38-47]

 @Override
 public boolean equals(Object obj) {
+    if (this == obj) {
+        return true;
+    }
     if (obj == null) {
         return false;
     }
     if (getClass() != obj.getClass()) {
         return false;
     }
     DeleteTaskRequest other = (DeleteTaskRequest) obj;
     return Objects.equals(nodeId, other.nodeId) && taskId == other.taskId;
 }
Suggestion importance[1-10]: 5

__

Why: Valid optimization that adds a reference equality check before more expensive comparisons. While correct, this is a minor performance improvement that has limited impact on overall functionality.

Low
Use idiomatic boolean negation

Using false == is an unconventional pattern in Java. Replace with the more idiomatic
!getTaskId().isSet() for better readability and consistency with standard Java
coding practices.

server/src/main/java/org/opensearch/action/admin/cluster/node/tasks/delete/DeleteTaskRequest.java [57-64]

 @Override
 public ActionRequestValidationException validate() {
     ActionRequestValidationException validationException = null;
-    if (false == getTaskId().isSet()) {
+    if (!getTaskId().isSet()) {
         validationException = addValidationError("task id is required", validationException);
     }
     return validationException;
 }
Suggestion importance[1-10]: 4

__

Why: The suggestion correctly identifies a style issue where false == should be replaced with !. While this improves readability and follows Java conventions, it's a minor stylistic change with no functional impact.

Low

Previous suggestions

Suggestions up to commit 09efb27
CategorySuggestion                                                                                                                                    Impact
General
Add validation for cleanup parameter

The validation should verify that cleanup is only used with waitForCompletion, since
cleanup only makes sense for completed tasks. Without this validation, users could
set cleanup=true without waitForCompletion, which would have no effect and could
cause confusion.

server/src/main/java/org/opensearch/action/admin/cluster/node/tasks/get/GetTaskRequest.java [141-148]

 public ActionRequestValidationException validate() {
     ActionRequestValidationException validationException = null;
     if (waitForCompletion == false && timeout != null) {
         validationException = addValidationError("timeout can only be specified if wait_for_completion is true", validationException);
     }
+    if (cleanup && !waitForCompletion) {
+        validationException = addValidationError("cleanup can only be specified if wait_for_completion is true", validationException);
+    }
     return validationException;
 }
Suggestion importance[1-10]: 7

__

Why: This is a valid suggestion that adds important validation logic. The cleanup parameter should only be used with waitForCompletion=true since cleanup only applies to completed tasks. Without this validation, users could set cleanup=true without waitForCompletion, leading to confusing behavior where the cleanup flag is silently ignored.

Medium
Remove redundant exception handling

The cleanup operation should only be attempted when the task result was successfully
retrieved and is completed. Currently, cleanupTaskResult is called after parsing the
response, but there's no verification that the cleanup flag was actually set.
Consider adding a check to ensure cleanup is only performed when explicitly
requested.

server/src/main/java/org/opensearch/action/admin/cluster/node/tasks/get/TransportGetTaskAction.java [267-275]

 private void cleanupTaskResult(String taskResultId) {
-    try {
-        client.delete(new DeleteRequest(TaskResultsService.TASK_INDEX, taskResultId), ActionListener.wrap(response -> {
-            logger.trace("Cleaned up stored task result [{}]", taskResultId);
-        }, e -> { logger.debug(() -> new ParameterizedMessage("Failed to clean up stored task result [{}]", taskResultId), e); }));
-    } catch (Exception e) {
-        logger.debug(() -> new ParameterizedMessage("Failed to clean up stored task result [{}]", taskResultId), e);
-    }
+    client.delete(new DeleteRequest(TaskResultsService.TASK_INDEX, taskResultId), ActionListener.wrap(response -> {
+        logger.trace("Cleaned up stored task result [{}]", taskResultId);
+    }, e -> { logger.debug(() -> new ParameterizedMessage("Failed to clean up stored task result [{}]", taskResultId), e); }));
 }
Suggestion importance[1-10]: 5

__

Why: The suggestion correctly identifies redundant exception handling in cleanupTaskResult. The outer try-catch duplicates the error handling already present in the ActionListener, making the code unnecessarily verbose. However, the suggestion's description incorrectly mentions verifying the cleanup flag, which is already checked at line 261 before calling this method.

Low
Suggestions up to commit 210ff74
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix cleanup default for backward compatibility

When deserializing from older versions, the cleanup field defaults to false, but the
class default is true. This inconsistency could cause unexpected behavior where
tasks from older nodes won't be cleaned up even when the user expects them to be.

server/src/main/java/org/opensearch/action/admin/cluster/node/tasks/get/GetTaskRequest.java [70]

-cleanup = in.getVersion().onOrAfter(Version.V_3_7_0) ? in.readBoolean() : false;
+cleanup = in.getVersion().onOrAfter(Version.V_3_7_0) ? in.readBoolean() : true;
Suggestion importance[1-10]: 8

__

Why: This identifies a critical inconsistency where deserialization from older versions defaults cleanup to false, while the class field default is true. This could cause unexpected behavior in mixed-version clusters where tasks from older nodes won't be cleaned up despite the user's expectation based on the default value.

Medium
General
Add refresh policy for cleanup

The cleanup operation should not be attempted if the task result was already deleted
or doesn't exist. Add a check to verify the task result exists before attempting
deletion to avoid unnecessary delete operations and potential race conditions.

server/src/main/java/org/opensearch/action/admin/cluster/node/tasks/get/TransportGetTaskAction.java [267-275]

 private void cleanupTaskResult(String taskResultId) {
     try {
-        client.delete(new DeleteRequest(TaskResultsService.TASK_INDEX, taskResultId), ActionListener.wrap(response -> {
+        DeleteRequest deleteRequest = new DeleteRequest(TaskResultsService.TASK_INDEX, taskResultId);
+        deleteRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
+        client.delete(deleteRequest, ActionListener.wrap(response -> {
             logger.trace("Cleaned up stored task result [{}]", taskResultId);
         }, e -> { logger.debug(() -> new ParameterizedMessage("Failed to clean up stored task result [{}]", taskResultId), e); }));
     } catch (Exception e) {
         logger.debug(() -> new ParameterizedMessage("Failed to clean up stored task result [{}]", taskResultId), e);
     }
 }
Suggestion importance[1-10]: 3

__

Why: The suggestion adds a refresh policy to the delete request, but the existing_code and improved_code are nearly identical except for the refresh policy line. The suggestion doesn't address the stated concern about checking if the task result exists before deletion. The refresh policy addition is a minor optimization that may not be necessary for all use cases.

Low
Suggestions up to commit 536a025
CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix default cleanup value for old versions

When deserializing from older versions, the cleanup field should default to true
(matching the field's default value) instead of false. This ensures backward
compatibility and consistent behavior across versions.

server/src/main/java/org/opensearch/action/admin/cluster/node/tasks/get/GetTaskRequest.java [70]

-cleanup = in.getVersion().onOrAfter(Version.V_3_7_0) ? in.readBoolean() : false;
+cleanup = in.getVersion().onOrAfter(Version.V_3_7_0) ? in.readBoolean() : true;
Suggestion importance[1-10]: 9

__

Why: This is a critical bug. The cleanup field is initialized to true at line 57, but when deserializing from older versions (pre-V_3_7_0), it defaults to false at line 70. This inconsistency breaks backward compatibility and could lead to unexpected behavior where tasks are not cleaned up when they should be.

High

@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 210ff74

@cwperks cwperks marked this pull request as ready for review May 19, 2026 02:15
@cwperks cwperks requested a review from a team as a code owner May 19, 2026 02:15
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit 09efb27

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 09efb27: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

❌ Gradle check result for 09efb27: null

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

✅ Gradle check result for 09efb27: SUCCESS

@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.50%. Comparing base (8f2d058) to head (fec3337).
⚠️ Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
...earch/transport/client/support/AbstractClient.java 0.00% 5 Missing ⚠️
...c/main/java/org/opensearch/client/TasksClient.java 0.00% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21727      +/-   ##
============================================
+ Coverage     73.46%   73.50%   +0.04%     
- Complexity    74825    74842      +17     
============================================
  Files          5997     5996       -1     
  Lines        339688   339694       +6     
  Branches      48961    48961              
============================================
+ Hits         249558   249700     +142     
+ Misses        70272    70117     -155     
- Partials      19858    19877      +19     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kkhatua kkhatua requested a review from sgup432 May 19, 2026 15:44
@rishabhmaurya
Copy link
Copy Markdown
Contributor

Thanks @cwperks for working on it. I'm not sure if passing a param to GET api is the right way to cleanup stored tasks here.

@cwperks
Copy link
Copy Markdown
Member Author

cwperks commented May 19, 2026

Thanks @cwperks for working on it. I'm not sure if passing a param to GET api is the right way to cleanup stored tasks here.

@rishabhmaurya what do you suggest for cleanup? FYI if you look at the linked issue this PR is one of 2 proposals. The other is a TTL on .tasks entries so they can periodically be cleaned up on expiry (if the task is completed). wdyt?

I suggested this because I think it would make sense for a caller to explicitly request cleanup when they are done with the task.

@rishabhmaurya
Copy link
Copy Markdown
Contributor

rishabhmaurya commented May 19, 2026

@cwperks can we support POST on this API?
TTL based approach might be a breaking change for users as polling for async task beyond this TTL would result in empty response. Ideally, we can start with a high TTL, say 15 or 30 days as default and also have this POST api with cleanup param. wdyt?

@sgup432
Copy link
Copy Markdown
Contributor

sgup432 commented May 19, 2026

I am not sure if adding a cleanup parameter is the right way to do it.
I believe we are trying to provide a way for users to easily manage their growing .tasks index file. If yes, then introducing a cleanup parameter is as good as explicitly calling DELETE API on a task_id, as it requires extra manual step to do it. In fact existing DELETE api is a better way to do it, with the reasons @rishabhmaurya mentioned.

If we really want to solve this, a better way might be to introduce a dynamic cluster type setting where user can define their own time interval, and retain the task results until a certain point after it got completed or fetched. By default it would be disabled i.e. the current behavior.

@cwperks
Copy link
Copy Markdown
Member Author

cwperks commented May 19, 2026

@cwperks can we support POST on this API? TTL based approach might be a breaking change for users as polling for async task beyond this TTL would result in empty response. Ideally, we can start with a high TTL, say 15 or 30 days as default and also have this POST api with cleanup param. wdyt?

We can make sure the cleanup would only clean completed tasks and make sure they don't linger in perpetuity.

I like @sgup432 suggestion of a separate DELETE _task API as well. This actually doesn't exist.

See https://docs.opensearch.org/latest/api-reference/tasks/tasks/ for the list of Tasks APIs.

@sgup432
Copy link
Copy Markdown
Contributor

sgup432 commented May 20, 2026

I like @sgup432 suggestion of a separate DELETE _task API as well. This actually doesn't exist.

Oh wow! I am surprised that we don't have any way to delete a task_id!? We certainly need this then.

Actually considering .tasks is a system index, can we not use the document API to delete a task_id using DELETE /.tasks/_doc/<task_id>? @cwperks

Signed-off-by: Craig Perkins <craig5008@gmail.com>
@cwperks cwperks requested a review from peternied as a code owner May 20, 2026 00:17
@github-actions
Copy link
Copy Markdown
Contributor

Persistent review updated to latest commit fec3337

@cwperks cwperks changed the title Add cleanup parameter for get task Add DELETE task API for stored task results May 20, 2026
@cwperks
Copy link
Copy Markdown
Member Author

cwperks commented May 20, 2026

Actually considering .tasks is a system index, can we not use the document API to delete a task_id using DELETE /.tasks/_doc/<task_id>? @cwperks

Not in a cluster with FGAC. Any mutating operation to a system index is blocked when using FGAC as writing to a system index can cause corruption and any writing should be done through dedicated APIs where the system is the one doing system index access.

I pivoted this PR to reveal a DELETE _task/{task_id} API to delete from this index and it properly guards to ensure that in-flight tasks cannot be deleted from the index.

@github-actions
Copy link
Copy Markdown
Contributor

❕ Gradle check result for fec3337: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Enhancement or improvement to existing feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature Request] Automatic cleanup of completed task results in .tasks index

3 participants