Add a min-requests constraint #700

Draft
sjmonson wants to merge 3 commits into main from feat/min_constraints

@sjmonson
Collaborator

Summary

Adds a --min-requests constraint that acts like --max-requests but keeps scheduling requests until the last request under the threshold completes.

Details

The normal --max-requests variant can have unexpectedly low per-request throughput / latency due to request trail-off at the end of the benchmark. The current workaround is to set --max-requests so high that trail-off time is a small proportion of total benchmark time. If we continue to schedule requests even after hitting the constraint, we ensure that the requested rate is maintained for the entire duration of measurement.
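To make the trail-off concrete, here is a toy simulation. It is only a sketch under stated assumptions, not GuideLLM's scheduler: `simulate` and all of its parameters are hypothetical, requests take fixed durations, and a fixed number of workers each pick up a new request as soon as one finishes. It records how many requests were in flight at each counted completion.

```python
import heapq


def simulate(limit, concurrency, durations, keep_scheduling):
    """Toy model (hypothetical, not GuideLLM code).

    Returns the number of in-flight requests seen at each of the first
    `limit` completions. With keep_scheduling=False, no new request is
    started once `limit` requests have been started (max-style). With
    keep_scheduling=True, a replacement request is always started
    (min-style), so load stays constant while measuring.
    """
    in_flight = []  # min-heap of finish times
    started = 0
    now = 0.0
    observed = []

    def start():
        nonlocal started
        duration = durations[started % len(durations)]
        heapq.heappush(in_flight, now + duration)
        started += 1

    # Fill the initial concurrency slots.
    for _ in range(concurrency):
        if keep_scheduling or started < limit:
            start()

    while len(observed) < limit:
        now = heapq.heappop(in_flight)
        # Count the request that just finished plus those still running.
        observed.append(len(in_flight) + 1)
        if keep_scheduling or started < limit:
            start()

    return observed


print(simulate(6, 3, [1.0], keep_scheduling=False))  # [3, 3, 3, 3, 2, 1]
print(simulate(6, 3, [1.0], keep_scheduling=True))   # [3, 3, 3, 3, 3, 3]
```

In the max-style run the last two measured requests complete under reduced load, which is the trail-off described above. In the min-style run every measured request sees the full concurrency.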

Note that --min-requests is a bit of a misnomer when combined with other constraints, since any other active constraints can trigger the benchmark to end before --min-requests. Other name suggestions are welcome.

Test Plan

Here is an example benchmark which should cause --min-requests to behave differently from --max-requests:

guidellm benchmark run \
    --target http://127.0.0.1:8000 \
    --request-format /v1/completions \
    --profile concurrent \
    --rate 30 \
    --data prompt_tokens=50,output_tokens=50 \
    --min-requests 50

  • "I certify that all code in this PR is my own, except as noted below."

Use of AI

  • Includes AI-assisted code completion
  • Includes code generated by an AI application
  • Includes AI-generated tests (NOTE: AI written tests should have a docstring that includes ## WRITTEN BY AI ##)

Signed-off-by: Samuel Monson <smonson@redhat.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
Signed-off-by: Samuel Monson <smonson@redhat.com>
@sjmonson sjmonson requested a review from jaredoconnell April 20, 2026 20:50
@sjmonson sjmonson self-assigned this Apr 20, 2026
@sjmonson sjmonson added the priority-low and feature (Represents a new user-visible feature) labels Apr 20, 2026
Collaborator

@jaredoconnell jaredoconnell left a comment

I'm not 100% sure of the arg name. I can see arguments both ways.

But I think this would be a good opportunity to add a markdown file detailing all of the constraints.

I added two comments.

"""
Constraint that limits execution based on minimum request counts.

Like MinNumberConstraint but instead of stopping request generation after reaching
Collaborator

Mistake: It should say "Like MaxNumberConstraint"

I think this wording doesn't emphasize the nuances of this implementation enough. Maybe clarify the distinction between request generation and request processing, and why this may be helpful. It's identical except that it doesn't stop queueing until the max processed quantity is reached.
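One way to spell out the generation vs. processing distinction is a sketch like the following. It is not the code in this PR: `Counts`, `max_style`, and `min_style` are hypothetical names, and it assumes the max-style constraint stops queueing once the number of created requests reaches the limit, while the min-style one keeps queueing until the number of processed requests reaches it.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    created: int    # requests handed to the scheduler so far
    processed: int  # requests that have finished


def max_style(counts: Counts, limit: int) -> tuple[bool, bool]:
    """Return (stop_queueing, stop_processing) for a max-style limit.

    Assumption: queueing stops as soon as `limit` requests were created,
    so the final in-flight requests drain under shrinking load.
    """
    return counts.created >= limit, counts.processed >= limit


def min_style(counts: Counts, limit: int) -> tuple[bool, bool]:
    """Return (stop_queueing, stop_processing) for a min-style limit.

    Assumption: queueing continues until `limit` requests have been
    processed, so more than `limit` requests may be created in total.
    """
    done = counts.processed >= limit
    return done, done


# 50 requests created, 40 finished, limit of 50:
counts = Counts(created=50, processed=40)
print(max_style(counts, 50))  # (True, False): no new requests are queued
print(min_style(counts, 50))  # (False, False): new requests keep being queued
```

Under these assumptions the two variants agree on when processing ends and differ only in when queueing ends.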



@ConstraintsInitializerFactory.register( # type: ignore[arg-type]
["min_number", "min_num", "min_requests", "min_req"]
Collaborator

It may make sense to instead rename this to max-processed. I think this would be less confusing. But I can see the argument for min, since it's going to keep scheduling past that until the max-processed is reached. So I'm not sure what should be done.

Collaborator Author

Yeah... max-processed is both a little too vague and also incorrect, since we can end up processing more requests than the number set. I think min is fine actually. I'll just add some notes to the docs that clarify constraints are combined with OR, not AND. Maybe in the future we can support AND constraint combinations.
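A minimal sketch of what OR semantics means here, using hypothetical names (`Progress`, `min_requests_reached`, `time_limit_reached`, `should_stop`) rather than GuideLLM's actual classes: the benchmark ends as soon as any one active constraint is satisfied, so a time limit can end the run before the request threshold is reached.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Progress:
    completed_requests: int
    elapsed_seconds: float


def min_requests_reached(threshold: int) -> Callable[[Progress], bool]:
    """Hypothetical check: enough requests have completed."""
    return lambda p: p.completed_requests >= threshold


def time_limit_reached(limit_seconds: float) -> Callable[[Progress], bool]:
    """Hypothetical check: the benchmark has run long enough."""
    return lambda p: p.elapsed_seconds >= limit_seconds


def should_stop(progress: Progress, checks: list[Callable[[Progress], bool]]) -> bool:
    # OR semantics: any single satisfied constraint ends the benchmark.
    return any(check(progress) for check in checks)


checks = [min_requests_reached(50), time_limit_reached(30.0)]

# Only 10 requests completed, but the time limit has passed: the run ends.
print(should_stop(Progress(completed_requests=10, elapsed_seconds=31.0), checks))  # True
# Neither constraint is satisfied yet: the run continues.
print(should_stop(Progress(completed_requests=10, elapsed_seconds=5.0), checks))   # False
```

An AND combination would replace `any` with `all`, which is the future option mentioned above.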

@sjmonson sjmonson marked this pull request as draft April 22, 2026 15:34
