Support wait_for_status and timeout query params on root endpoint #18377

estolfo · 2025-10-28T13:32:49Z

This PR adds support for two query params on the root api: wait_for_status and timeout.
They mirror what the same query params do on the elasticsearch cluster health status endpoint.

wait_for_status: One of green, yellow or red. timeout is required along with a status. The request waits, up to the provided timeout, until the status of the service changes to the one provided or better, i.e. green > yellow > red.

timeout: Period to wait for the status to reach the requested target status. If the target status is not reached before the timeout expires, the request returns http status 408.

The status of the service will be checked with an exponential backoff until the timeout is reached.
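
As an illustration of how a client might combine the two params, here is a minimal sketch that only builds the request URL. The helper name `root_api_uri` is hypothetical, and the host and port (`localhost:9600`) are assumptions about a local Logstash instance, not something this PR defines.

```ruby
require "uri"

# Hypothetical helper (not part of this PR): builds the URL for the
# Logstash root API with the two query params described above.
# The default host and port are assumptions for a local instance.
def root_api_uri(host: "localhost", port: 9600, wait_for_status: nil, timeout: nil)
  params = {}
  params["wait_for_status"] = wait_for_status if wait_for_status
  params["timeout"] = timeout if timeout
  # Leave the query off entirely when no params are given.
  query = params.empty? ? nil : URI.encode_www_form(params)
  URI::HTTP.build(host: host, port: port, path: "/", query: query)
end

uri = root_api_uri(wait_for_status: "green", timeout: "30s")
puts uri
```

The resulting URI could then be passed to `Net::HTTP.get_response(uri)`; per the description above, the response status would be 200, 400, or 408 depending on the params and on whether the target status is reached in time.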

Short description of the behavior:

- valid timeout is provided with no status: return immediately
- valid status is provided with no timeout: return an error response stating that timeout is required with status, and http status 400
- invalid status is provided (i.e. not one of [green, yellow, red]): return error response and http status 400
- invalid timeout is provided (the required input is an integer with units): return error response and http status 400
- valid status is provided with a valid timeout: wait for the given status or a better one (green > yellow > red). When the target status or a better one is reached, return normal response
- valid status is provided with a valid timeout: wait for the given status or a better one (green > yellow > red). When the timeout is reached and neither the target status nor a better one has been reached, return error response and http status 408
- neither status nor timeout provided: return normal response
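
The cases above can be summarized as a small decision function. This is only a sketch of the described behavior, not the code in `root.rb`: the method name, the constants, and the loose timeout pattern (digits followed by a lowercase unit) are assumptions, and the actual waiting is reduced to a `status_reached` flag.

```ruby
# Assumption: the three valid target statuses, as listed above.
VALID_STATUSES = %w[green yellow red].freeze
# Assumption: "an integer with units" means digits followed by a lowercase unit.
TIMEOUT_FORMAT = /\A\d+[a-z]+\z/

# Hypothetical helper: returns the HTTP status code the description above
# implies for a given combination of query params.
# status_reached stands in for "target status or better seen before timeout".
def expected_http_status(wait_for_status: nil, timeout: nil, status_reached: true)
  return 400 if wait_for_status && !VALID_STATUSES.include?(wait_for_status)
  return 400 if timeout && !TIMEOUT_FORMAT.match?(timeout)
  return 400 if wait_for_status && timeout.nil? # timeout is required with a status
  return 200 if wait_for_status.nil?            # nothing to wait for
  status_reached ? 200 : 408
end

puts expected_http_status(wait_for_status: "green", timeout: "30s", status_reached: false)
```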

Open Questions/ToDo:

- Right now, the implementation doesn't wait for a status that is "better" (green > yellow > red), as the Elasticsearch implementation does. Do we want to adjust our implementation to also have this behavior, or is that overkill? Update: the implementation now does the same as Elasticsearch; it waits for a status that matches the target or "better".
- If the timeout expires on the Elasticsearch cluster health endpoint before the target status (or a better one) is reached, the request fails and returns an error. This implementation currently just returns as normal. Do we want the same behavior as Elasticsearch? Update: the request now returns an error status if it times out before the target status is reached, as Elasticsearch does (initially 503, later changed to 408; see below).
- ~~What examples should be used for "host" and "name" in the openapi documentation? The guidelines suggest api.example.com but that doesn't seem to fit the example of calling logstash's root api.~~ Update: used logstash-pipelines, logstash-pipelines.example.com
- ~~Confirm that the status code 503 should be used when the request times out. Define message based on what elasticsearch returns.~~ Update: testing showed that status code 408 is returned when the request times out.
- When an invalid timeout or status is provided, status 400 should be used with an error message.
- Should a timeout unit be required, as in ES (e.g. "1s" for the timeout)?
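
To make the "integer with units" requirement concrete, here is a minimal parsing sketch. It is not the PR's `parse_timeout_s`; the method name `parse_timeout_seconds` and the accepted unit set (`ms`, `s`, `m`) are assumptions chosen for illustration.

```ruby
# Assumption: only these units are accepted; the PR may support a different set.
UNIT_SECONDS = { "ms" => 0.001, "s" => 1.0, "m" => 60.0 }.freeze

# Hypothetical helper: converts a value like "30s" into seconds (Float).
# Returns nil for anything that is not an integer followed by a known unit,
# which a caller could translate into an http status 400 response.
def parse_timeout_seconds(input)
  match = /\A(\d+)(ms|s|m)\z/.match(input.to_s)
  return nil unless match

  match[1].to_i * UNIT_SECONDS.fetch(match[2])
end

p parse_timeout_seconds("30s")
p parse_timeout_seconds("30")
```

Returning `nil` on invalid input fits the `return ... unless (value = parse(...))` pattern discussed in the review further down.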

Resolves #17457

github-actions · 2025-10-28T13:33:01Z

🤖 GitHub comments

Expand to view the GitHub comments

Just comment with:

run docs-build : Re-trigger the docs validation. (use unformatted text in the comment!)

mergify · 2025-10-28T13:33:26Z

This pull request does not have a backport label. Could you fix it @estolfo? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit.
If no backport is necessary, please add the backport-skip label

yaauie

IIRC, the original requirement is to have a non-200 status code if the status is not met.

logstash-core/lib/logstash/api/modules/root.rb

estolfo · 2025-10-29T15:58:29Z

~~note: handle the unknown status being in the Status enum. Should it be removed from the HEALTH_STATUS constant?~~
Update: removed unknown as a valid Status.

Co-authored-by: Rye Biesemeyer <[email protected]>

estolfo · 2025-11-04T13:48:09Z

Note: I tested with Elasticsearch and found that some assumptions about its behavior were incorrect:

Elasticsearch returns HTTP status code 408 when the request times out waiting for the target status, not 503. Fixed in 6fdf433
If no timeout is provided, the request blocks indefinitely until the target status is reached. This is surprising given that the documentation says "By default, will not wait for any status."

estolfo · 2025-11-05T09:28:52Z

Update: changed behavior to require a valid timeout with a valid status. This differs from Elasticsearch's behavior; Elasticsearch will wait until the network request timeout for the status if no timeout query param is provided.

elasticmachine · 2025-11-05T11:00:37Z

💛 Build succeeded, but was flaky

Buildkite Build
Commit: 0081e59

Failed CI Steps

:java: Java unit tests

History

💛 Build #3839 was flaky ba52df2
💚 Build #3826 succeeded 6fdf433
💚 Build #3822 succeeded 405589b
💚 Build #3817 succeeded fa8816c
💛 Build #3803 was flaky 2a54b33
💚 Build #3801 succeeded 8f0dfea

yaauie · 2025-11-10T20:17:09Z

logstash-core/lib/logstash/api/modules/root.rb

+          if input_status
+            return status_error_response(input_status) unless target_status = parse_status(input_status)
+          end
+
+          if input_timeout
+            return timeout_error_response(input_timeout) unless timeout_s = parse_timeout_s(input_timeout)
+          end


The assignment in the nested modifier condition is easy to lose track of. The Ruby Style Guide calls for wrapping assignment-in-conditionals in parentheses, which helps a bit:

Suggested change

-          if input_status
-            return status_error_response(input_status) unless target_status = parse_status(input_status)
-          end
-
-          if input_timeout
-            return timeout_error_response(input_timeout) unless timeout_s = parse_timeout_s(input_timeout)
-          end
+          if input_status
+            return status_error_response(input_status) unless (target_status = parse_status(input_status))
+          end
+
+          if input_timeout
+            return timeout_error_response(input_timeout) unless (timeout_s = parse_timeout_s(input_timeout))
+          end

But I think that the complexity is still buried.

If we pull the assignment up into the top conditional, I think it meaningfully pulls the complexity of the assignment forward (instead of deferring it to the modifier clause):

Suggested change

-          if input_status
-            return status_error_response(input_status) unless target_status = parse_status(input_status)
-          end
-
-          if input_timeout
-            return timeout_error_response(input_timeout) unless timeout_s = parse_timeout_s(input_timeout)
-          end
+          if input_status && !(target_status = parse_status(input_status))
+            return status_error_response(input_status)
+          end
+
+          if input_timeout && !(timeout_s = parse_timeout_s(input_timeout))
+            return timeout_error_response(input_timeout)
+          end

yaauie · 2025-11-10T20:19:26Z

logstash-core/lib/logstash/api/modules/root.rb

+            current_status = HEALTH_STATUS.index(agent.health_observer.status.external_value)
+            break if current_status <= HEALTH_STATUS.index(target_status)
+
+            if Time.now > deadline


If we're already at the deadline, no use spinning another wait cycle:

Suggested change

-            if Time.now > deadline
+            if Time.now >= deadline
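
The status comparison in the diff above relies on the ordering of `HEALTH_STATUS`: a lower index means a better status, so "target or better" becomes an index comparison. A stand-alone sketch follows, assuming the constant is ordered green, yellow, red and using plain strings instead of the `org.logstash.health.Status` values the PR uses; `status_reached?` is a hypothetical name.

```ruby
# Assumption: ordered from best to worst, so a lower index is a better status.
HEALTH_STATUS = %w[green yellow red].freeze

# Hypothetical helper: true when the current status is the target or better.
def status_reached?(current, target)
  HEALTH_STATUS.index(current) <= HEALTH_STATUS.index(target)
end

puts status_reached?("green", "yellow")
puts status_reached?("red", "yellow")
```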

yaauie · 2025-11-10T20:24:18Z

logstash-core/lib/logstash/api/modules/root.rb

+            if Time.now > deadline
+              return respond_with(RequestTimeout.new(TIMED_OUT_WAITING_FOR_STATUS_MESSAGE % [target_status]))
+            end
+
+            sleep(wait_interval)
+            wait_interval = wait_interval * 2


If we are doubling our sleep with each attempt, we risk over-sleeping.

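| iteration | `wait_interval` (s) | effective timeout (s) |
|---|---|---|
| 1 | 0.2 | 0.2 |
| 2 | 0.4 | 0.6 |
| 3 | 0.8 | 1.4 |
| 4 | 1.6 | 3.0 |
| 5 | 3.2 | 6.2 |
| 6 | 6.4 | 12.6 |
| 7 | 12.8 | 25.4 |
| 8 | 25.6 | 51.0 |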

For example, given a request with timeout=30s, if the wait_for_status condition has not been met after ~25.4s, the current code will sleep another 25.6s and not check again until a total of 51s has elapsed.
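
The cumulative sleep can be recomputed with a few lines of Ruby; with an initial interval of 0.2s and doubling, the total after n sleeps is 0.2 × (2^n − 1). The helper name `backoff_schedule` is hypothetical, and the 0.2s starting interval is taken from the table above.

```ruby
# Hypothetical helper: returns [iteration, wait_interval, cumulative sleep]
# rows for an uncapped doubling backoff.
def backoff_schedule(initial:, iterations:)
  elapsed = 0.0
  interval = initial
  (1..iterations).map do |i|
    elapsed += interval
    row = [i, interval, elapsed.round(1)]
    interval *= 2 # double the sleep for the next attempt
    row
  end
end

backoff_schedule(initial: 0.2, iterations: 8).each { |row| puts row.join("\t") }
```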

We can keep the doubling factor and limit the last sleep to no more than the requested amount:

Suggested change

-            if Time.now > deadline
-              return respond_with(RequestTimeout.new(TIMED_OUT_WAITING_FOR_STATUS_MESSAGE % [target_status]))
-            end
-
-            sleep(wait_interval)
-            wait_interval = wait_interval * 2
+            time_remaining = deadline - Time.now
+            if time_remaining <= 0
+              return respond_with(RequestTimeout.new(TIMED_OUT_WAITING_FOR_STATUS_MESSAGE % [target_status]))
+            end
+
+            sleep((time_remaining <= wait_interval) ? time_remaining : wait_interval)
+            wait_interval = wait_interval * 2
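
For experimenting with the capped backoff outside of the API module, here is a self-contained sketch of the same idea. It is not the code in `root.rb`: the method name `wait_for`, the injectable `clock` and `sleeper` lambdas, and the use of a monotonic clock instead of `Time.now` are all assumptions made so the loop can be exercised without real sleeping.

```ruby
# Hypothetical stand-alone wait loop: polls the given block with a doubling
# sleep, never sleeping past the deadline. Returns true if the block became
# truthy in time, false if the timeout expired first.
def wait_for(timeout_s:, initial_interval: 0.2,
             clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
             sleeper: ->(seconds) { sleep(seconds) })
  deadline = clock.call + timeout_s
  wait_interval = initial_interval

  loop do
    return true if yield

    time_remaining = deadline - clock.call
    return false if time_remaining <= 0

    # Cap the sleep so the final check happens at the deadline, not after it.
    sleeper.call([time_remaining, wait_interval].min)
    wait_interval *= 2
  end
end

# Usage with a fake clock, so nothing actually sleeps.
now = 0.0
slept = []
reached = wait_for(timeout_s: 30,
                   clock: -> { now },
                   sleeper: ->(s) { slept << s; now += s }) { false }
puts "reached=#{reached} sleeps=#{slept.length}"
```

With the fake clock, the last sleep is trimmed to the time remaining, so the total time slept matches the requested timeout instead of overshooting it.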

estolfo added 3 commits October 28, 2025 12:40

Support wait_for_status query param on root endpoint

6034a90

Use Java::OrgLogstashHealth::Status enum values for status constant

870b8aa

Add docs for root api including new query params

75a1338

Be consistent with return statuses definition in specs

804ce9d

yaauie reviewed Oct 28, 2025: logstash-core/lib/logstash/api/modules/root.rb (outdated, resolved)

estolfo commented Oct 28, 2025: logstash-core/lib/logstash/api/modules/root.rb (outdated, resolved)

yaauie reviewed Oct 28, 2025: logstash-core/lib/logstash/api/modules/root.rb (outdated, resolved)

Use org.logstash.health.Status for constant and in tests

265952e

Return http status code 503 when timeout and support target status or better

d7b7429

Update docs with note about returning status code 503

e329a4b

estolfo added 4 commits October 30, 2025 10:00

Use Timeout instead while waiting for the status

fdf667b

Only use green, yellow, and red statuses in constant (exclude unknown)

13a1c27

Handle non integer timeout string

c68ba40

Add the loop back in to timeout block

4741401

Update tests

aec5543

estolfo force-pushed the wait_for_status branch from 71ecbbd to aec5543 Compare October 30, 2025 09:46

Refactor and fix tests

00351c4

Not necessary to handle status code in app helpers anymore

ea0bb63

Fix shared examples name

6f1e092

estolfo and others added 2 commits November 4, 2025 09:51

Update docs/static/spec/openapi/logstash-api.yaml

68e3539

Co-authored-by: Rye Biesemeyer <[email protected]>

Update docs/static/spec/openapi/logstash-api.yaml

fa8816c

Co-authored-by: Rye Biesemeyer <[email protected]>

estolfo added 2 commits November 4, 2025 11:53

Use deadline instead of Timeout

e11d22a

Be more explicit about class being Float

596c1e9

No need for timeout

537ed6a

Test timeout units in ms

a0b51c8

Minor updates to specs

5657d07

Update comment

405589b

estolfo marked this pull request as ready for review November 4, 2025 11:15

Use status code 408 when the request times out

6fdf433

Require timeout along with wait_for_status

ba52df2

estolfo requested a review from yaauie November 5, 2025 10:19

Update docs to be clearer about timeout being required with wait_for_status

83ebd0b

Move before block to shared examples

0081e59

yaauie reviewed Nov 10, 2025
