[tx] Add separate JIT vs post-JIT timing measurement to memory benchmark#1021

Merged
pcmoritz merged 4 commits into NovaSky-AI:main from raulchen:benchmark-warm
Feb 4, 2026
Conversation

@raulchen
Contributor

@raulchen raulchen commented Feb 4, 2026

Summary

  • Separate JIT compilation time from post-JIT runtime measurement by sending multiple requests per test
  • First request triggers JIT compilation and is measured separately
  • Subsequent requests (default: 3) measure actual post-JIT performance and are averaged
  • Extract common timing logic into _run_timed_requests helper method
  • Use random tokens (with fixed seed) for reproducible benchmarks

Changes

  • Add jit_e2e_sec field to capture first request (JIT) time
  • Rename client_e2e_sec to post_jit_e2e_sec for clarity
  • Add --num-measurement-iters flag (default: 3) for post-JIT iterations
  • Update CSV output and summary tables to show both metrics
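The new flag and CSV fields could be wired up along these lines. This is a hedged sketch, not the PR's code: the `--num-measurement-iters` flag and the `jit_e2e_sec` / `post_jit_e2e_sec` field names are from the PR, while `test_name` and the sample values are illustrative assumptions.

```python
import argparse
import csv
import io

parser = argparse.ArgumentParser(description="Memory benchmark (sketch)")
parser.add_argument(
    "--num-measurement-iters",
    type=int,
    default=3,
    help="Number of post-JIT requests to average per test (default: 3)",
)
# Parse an explicit argv here so the sketch is self-contained.
args = parser.parse_args(["--num-measurement-iters", "5"])

# CSV output carrying both metrics; test_name is a hypothetical column.
buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["test_name", "jit_e2e_sec", "post_jit_e2e_sec"]
)
writer.writeheader()
writer.writerow(
    {"test_name": "demo", "jit_e2e_sec": 1.23, "post_jit_e2e_sec": 0.45}
)
```

Renaming `client_e2e_sec` to `post_jit_e2e_sec` makes the column self-describing: a reader of the CSV can tell at a glance that the value excludes compilation overhead.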

The benchmark now sends multiple requests per test: the first request
triggers JIT compilation and subsequent requests measure actual post-JIT
runtime. This provides clearer insight into compilation overhead vs
steady-state performance.

- Add jit_e2e_sec field to capture first request (JIT) time
- Rename client_e2e_sec to post_jit_e2e_sec for clarity
- Add --num-measurement-iters flag (default: 3) for post-JIT iterations
- Update CSV output and summary tables to show both metrics

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request successfully introduces separate timing measurements for JIT compilation and post-JIT runtime, providing a clearer understanding of performance characteristics. The changes involve updating the BenchmarkConfig and TestResult data structures, modifying the test execution logic to perform warmup and multiple measurement iterations, and updating output formats (CLI summary and CSV) to reflect the new metrics. The addition of the --num-measurement-iters CLI flag makes the benchmark more configurable.

@pcmoritz pcmoritz added the tx label Feb 4, 2026
Address PR feedback:
- Rename parameter to match CLI flag name for consistency
- Remove default value since it's always passed from caller
- Extract common timing logic into _run_timed_requests helper method
- Use random tokens for each request

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@raulchen
Contributor Author

raulchen commented Feb 4, 2026

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request successfully introduces separate timing measurements for JIT compilation and post-JIT runtime, significantly enhancing the memory benchmark's insights. The changes are well-integrated across the configuration, data structures, execution logic, and reporting. The addition of _run_timed_requests effectively encapsulates the new measurement methodology, and the updated output formats provide clear visibility into the new metrics.

raulchen and others added 2 commits February 4, 2026 13:26
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@pcmoritz pcmoritz merged commit ae4917d into NovaSky-AI:main Feb 4, 2026
4 of 5 checks passed
@raulchen raulchen deleted the benchmark-warm branch February 4, 2026 22:25
