-
Notifications
You must be signed in to change notification settings - Fork 3.5k
Support wait_for_status and timeout query params on root endpoint #18377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🤖 GitHub commentsExpand to view the GitHub comments
Just comment with:
|
|
This pull request does not have a backport label. Could you fix it @estolfo? 🙏
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC, the origial requirement is to have a non-200 status code if the status is not met.
|
|
71ecbbd to
aec5543
Compare
Co-authored-by: Rye Biesemeyer <[email protected]>
Co-authored-by: Rye Biesemeyer <[email protected]>
|
Note: I tested with Elasticsearch and found that some assumptions about its behavior were incorrect:
|
|
Update: changed behavior to require a valid timeout with a valid status. This differs from Elasticsearch's behavior; Elasticsearch will wait until the network request timeout for the status if no timeout query param is provided. |
💛 Build succeeded, but was flaky
Failed CI StepsHistory
|
| if input_status | ||
| return status_error_response(input_status) unless target_status = parse_status(input_status) | ||
| end | ||
|
|
||
| if input_timeout | ||
| return timeout_error_response(input_timeout) unless timeout_s = parse_timeout_s(input_timeout) | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assignment in the nested modifier condition is easy to lose track of. The Ruby Style Guide calls out wrapping assignment-in-conditionals in parenthesis, which helps a bit:
| if input_status | |
| return status_error_response(input_status) unless target_status = parse_status(input_status) | |
| end | |
| if input_timeout | |
| return timeout_error_response(input_timeout) unless timeout_s = parse_timeout_s(input_timeout) | |
| end | |
| if input_status | |
| return status_error_response(input_status) unless (target_status = parse_status(input_status)) | |
| end | |
| if input_timeout | |
| return timeout_error_response(input_timeout) unless (timeout_s = parse_timeout_s(input_timeout)) | |
| end |
But I think that the complexity is still buried.
If we pull the assignment up into the top conditional, I think it meaningfully pulls the complexity of the assignment forward (instead of deferring it to the modifier clause):
| if input_status | |
| return status_error_response(input_status) unless target_status = parse_status(input_status) | |
| end | |
| if input_timeout | |
| return timeout_error_response(input_timeout) unless timeout_s = parse_timeout_s(input_timeout) | |
| end | |
| if input_status && !(target_status = parse_status(input_status)) | |
| return status_error_response(input_status) | |
| end | |
| if input_timeout && !(timeout_s = parse_timeout_s(input_timeout)) | |
| return timeout_error_response(input_timeout) | |
| end |
| current_status = HEALTH_STATUS.index(agent.health_observer.status.external_value) | ||
| break if current_status <= HEALTH_STATUS.index(target_status) | ||
|
|
||
| if Time.now > deadline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're already at the deadline, no use spinning another wait cycle:
| if Time.now > deadline | |
| if Time.now >= deadline |
| if Time.now > deadline | ||
| return respond_with(RequestTimeout.new(TIMED_OUT_WAITING_FOR_STATUS_MESSAGE % [target_status])) | ||
| end | ||
|
|
||
| sleep(wait_interval) | ||
| wait_interval = wait_interval * 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are doubling our sleep with each attempt, we risk over-sleeping.
| ittr | wait_interval |
effective timeout |
|---|---|---|
| 1 | 0.2 | 0.2 |
| 2 | 0.4 | 0.6 |
| 3 | 0.8 | 1.4 |
| 4 | 1.6 | 3.0 |
| 5 | 3.2 | 6.2 |
| 6 | 6.4 | 12.4 |
| 7 | 12.8 | 25.4 |
| 8 | 25.6 | 51.0 |
For example, a request for timeout=30s, and the wait_for_status condition has not been met after ~25.4s, the current code will sleep another 25.6s and not check again until a total of 51s has elapsed.
We can keep the doubling factor and limit the last sleep to no more than the requested amount:
| if Time.now > deadline | |
| return respond_with(RequestTimeout.new(TIMED_OUT_WAITING_FOR_STATUS_MESSAGE % [target_status])) | |
| end | |
| sleep(wait_interval) | |
| wait_interval = wait_interval * 2 | |
| time_remaining = deadline - Time.now | |
| if time_remaining <= 0 | |
| return respond_with(RequestTimeout.new(TIMED_OUT_WAITING_FOR_STATUS_MESSAGE % [target_status])) | |
| end | |
| sleep((time_remaining <= wait_interval) ? time_remaining : wait_interval) | |
| wait_interval = wait_interval * 2 |
This PR adds support for two query params on the root api:
wait_for_statusandtimeout.They mirror what the same query params do on the elasticsearch cluster health status endpoint.
wait_for_status: One of green, yellow or red.timeoutis required along with a status. Will wait (until the timeout provided) until the status of the service changes to the one provided or better, i.e. green > yellow > red.timeout: Period to wait for the status to reach the requested target status. If the target status is not reached before the timeout expires, the request returns http status 408.The status of the service will be checked with an exponential backoff until the timeout is reached.
Short description of the behavior:
green,yellow,red] - return error response and http status 400Open Questions/ToDo:
Right now, the implementation doesn't wait for a status that is "better, i.e. green > yellow > red", as the Elasticsearch implementation does. Do we want to adjust our implementation to also have this behavior or is that overkill?The implementation will do the same as Elasticsearch-- it will wait for a status that matches the target or "better".If the timeout is expired on the Elasticsearch cluster health endpoint before the target status is reached (or a better one), the request fails and returns an error. This implementation currently just returns as normal. Do we want to have the same behavior as Elasticsearch?The request will return 503 if the request times out and the target status is not reached, as does ElasticsearchWhat examples should be used for "host" and "name" in the openapi documentation? The guidelines suggestUpdate: usedapi.example.combut that doesn't seem to fit the example of calling logstash's root api.logstash-pipelines,logstash-pipelines.example.comConfirm that the status code 503 should be used when the request times out. Define message based on what elasticsearch returns.Update: testing showed that status code 408 is returned when the request times out.Resolves #17457