perf: revisit varrangechecker #2401
Tuanlinh12312 wants to merge 52 commits into develop-v2 from perf/revisit-varrangechecker
After merging, I will rebase `develop-v1.6.0` onto `develop-new-hintstore`
Closes INT-5318, INT-5320, INT-5392. Co-authored-by: Stephen Hwang <stephenh@intrinsictech.xyz>
Resolves INT-5394
Co-authored-by: stephenh-axiom-xyz <stephenh@intrinsictech.xyz>
- Add a `generate_gpu_memory_chart` function to generate a GPU memory usage chart and table
- [sample markdown](https://hackmd.io/yugFLvPqSm2HraUqDvcFHQ?view#GPU-Memory-Usage)
Updates `openvm-prof` to handle the new metrics from openvm-org/stark-backend#184. Tested locally.
- Replace `SegmentationLimits` with `SegmentationConfig` in `SystemConfig`
- Add an `interaction_cell_weight` parameter to `SegmentationConfig` that specifies how many cells an interaction contributes at each row
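A minimal sketch of how such a weight could enter a segmentation check. The struct and function names below are hypothetical illustrations, not the actual `SegmentationConfig` API:

```rust
// Hypothetical sketch: fold interactions into a per-trace cell estimate
// using an interaction_cell_weight, as the PR description suggests.
// Field and function names are illustrative, not the real openvm API.
struct SegmentationConfig {
    max_cells: u64,
    interaction_cell_weight: u64,
}

/// Estimated cost of one trace: real trace cells plus a weighted
/// charge for each interaction on every row.
fn weighted_cells(
    height: u64,
    width: u64,
    interactions_per_row: u64,
    cfg: &SegmentationConfig,
) -> u64 {
    height * (width + cfg.interaction_cell_weight * interactions_per_row)
}

fn main() {
    let cfg = SegmentationConfig {
        max_cells: 1 << 30,
        interaction_cell_weight: 2,
    };
    // 2^20 rows, 8 real columns, 3 interactions per row:
    // each row costs 8 + 2 * 3 = 14 "cells".
    let cost = weighted_cells(1 << 20, 8, 3, &cfg);
    assert!(cost <= cfg.max_cells, "would trigger segmentation");
    println!("estimated cost: {cost}");
}
```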
Resolves INT-5611
For large metrics, mermaid is too much text, so detailed metrics are now output to a separate markdown file. In the benchmark CI, we still cat the detailed metrics back into the main markdown file. The SVG chart is uploaded to public S3, similar to flamegraphs, so it can be viewed from the markdown.

For later: we could store the detailed metrics in a sqlite file, so they can be downloaded and processed more easily for complex metrics. For now they are split into a separate markdown file for simplicity.
closes INT-5391
`segment_ctx.rs`:
- `DEFAULT_MAX_CELLS` → `DEFAULT_MAX_MEMORY` = 15 GB
- `max_cells` → `max_memory` in `SegmentationLimits`
- `set_max_cells` → `set_max_memory`

`ctx.rs`:
- `with_max_cells` → `with_max_memory`

`metered_cost.rs`:
- Updated import to use `DEFAULT_MAX_MEMORY`

`cli/src/commands/prove.rs`:
- Updated import to `DEFAULT_MAX_MEMORY`
- `segment_max_cells` → `segment_max_memory`
- `with_max_cells` → `with_max_memory`

`benchmarks/prove/src/bin/async_regex.rs`:
- `segment_max_cells` → `segment_max_memory`
- `set_max_cells` → `set_max_memory`

`benchmarks/prove/src/util.rs`:
- `segment_max_cells` → `segment_max_memory`
- `set_max_cells` → `set_max_memory`

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Due to the difference in error types, switched to using `should_panic` for now. We should go through later and switch everything back to the precise error types. closes INT-5904
Resolves INT-5850.
Also shows delta for "Parallel Proof Time (N provers)" column
Resolves INT-5851.
Resolves INT-5853 and INT-5906.
…n] (#2366) Resolves INT-5852.
…and ECC (#2372)

- **Replace expensive BigUint computation during preflight with fast native field arithmetic** (halo2curves/blstrs) for all known field types (K256, P256, BN254, BLS12-381) and ECC curve operations. The trace filler already re-executes with BigUint for constraint generation, so preflight only needs to compute outputs for memory writes.
- **Cache modulus constants** with `once_cell::Lazy<BigUint>` to eliminate repeated hex string parsing in `get_field_type()`/`get_fp2_field_type()` and `get_curve_type()` (previously called on every instruction).
- **Cache `FieldType`/`CurveType` on executor structs** at construction time, eliminating per-instruction BigUint comparisons in preflight.
- **Remove `DynArray` heap allocations** in preflight by using stack-allocated typed arrays directly from adapter read/write, with `as_flattened()` for zero-cost conversions.
- **Add `adapter()` accessor** to `FieldExpressionExecutor` for use by custom `PreflightExecutor` implementations. SETUP operations and unknown field types fall back to `run_field_expression_precomputed`.

Testing:
- [x] `cargo nextest run -p openvm-algebra-circuit` — all 18 non-pre-existing-failure tests pass (8 modular addsub/muldiv, 2 is_equal positive, 8 fp2_chip)
- [x] `cargo nextest run -p openvm-ecc-circuit` — all 8 tests pass (3 add_ne, 5 double including nonzero_a)
- [x] `cargo clippy -p openvm-algebra-circuit -p openvm-ecc-circuit --all-targets` — no new warnings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
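As a rough illustration of the modulus-caching idea (a sketch, not the PR's code): the standard library's `std::sync::LazyLock` behaves like `once_cell::Lazy`, so the hex parse runs once on first access instead of on every instruction. To stay dependency-free, the constant below is only the low 128 bits of the secp256k1 (K256) base-field modulus; the real code caches full `BigUint` moduli.

```rust
use std::sync::LazyLock;

// Sketch of the caching pattern: parse the hex string once, lazily,
// and reuse the result on every later access. The PR uses
// once_cell::Lazy<BigUint>; this stand-in uses std's LazyLock and
// u128 (low 128 bits of the K256 base-field modulus only).
static K256_P_LOW128: LazyLock<u128> = LazyLock::new(|| {
    u128::from_str_radix("fffffffffffffffffffffffefffffc2f", 16)
        .expect("valid hex constant")
});

fn main() {
    // First access parses the string; later accesses hit the cache.
    let p_low = *K256_P_LOW128;
    // The K256 modulus is odd, so its low bits are too.
    assert_eq!(p_low % 2, 1);
    println!("{p_low:x}");
}
```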
keccak (`p3_inner_tracegen`):
- force `__noinline__` → -92% stack size

`BigUintGPU::mod_div`:
- force `__noinline__` → -85% stack size

sha256 (first and second pass), -76% and -90%:
- `generate_block_trace`, `generate_missing_cells` → `__noinline__`
- `generate_carry_ae`, `generate_intermed_4`, `generate_intermed_12` → compute on-the-fly

The goal is to reduce the memory peak (and get close to the mem tracker report). Was +2 GB, reduced to +0.9 GB. Should be tested on various blocks (was tested on 21M).
@Tuanlinh12312 please clean up this PR
Remove `selector_inverse` and `is_not_wrap` columns from `VariableRangeCheckerAir` by using a "monotonic sum" constraint approach instead of selector-based branching. This reduces the trace width from 6 to 4 columns while maintaining a max constraint degree of 2.

Key insight: `(value + two_to_max_bits)` equals `(row_index + 1)`, forming a strictly increasing sequence. By constraining this sum to increase by exactly 1 each row, combined with constraints that `value` can only be 0 or increment and `max_bits` can only stay or increment, the trace is forced into the correct enumeration. The last-row constraint acts as a checksum.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
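The enumeration argument can be checked directly. A sketch (illustrative names, not the actual `VariableRangeCheckerAir` columns) that builds the `(value, max_bits)` rows in order and asserts the monotonic-sum invariant:

```rust
// Sketch: enumerate the rows a variable range checker covers, i.e.
// every (value, max_bits) with value < 2^max_bits, in order, and
// check that value + 2^max_bits == row_index + 1 on every row.
// Names are illustrative; the real AIR enforces this as a degree-2
// transition constraint rather than by re-enumeration.
fn main() {
    let range_max_bits = 5u32; // small bound, for illustration only
    let mut rows: Vec<(u32, u32)> = Vec::new(); // (value, max_bits)
    for max_bits in 0..=range_max_bits {
        for value in 0..(1u32 << max_bits) {
            rows.push((value, max_bits));
        }
    }
    for (row_index, &(value, max_bits)) in rows.iter().enumerate() {
        let two_to_max_bits = 1u32 << max_bits;
        // The sum increases by exactly 1 per row, so constraining the
        // first row, the per-row delta, and the last row pins down the
        // whole enumeration without selector columns.
        assert_eq!(value + two_to_max_bits, row_index as u32 + 1);
    }
    println!("checked {} rows", rows.len());
}
```

For `range_max_bits = 5` this covers 1 + 2 + 4 + 8 + 16 + 32 = 63 rows, and the sum runs from 1 to 63 with no gaps, matching the "checksum" role of the last-row constraint.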
Add missing imports (VerificationError, BabyBearBlake3Engine, StarkFriEngine) and remove unused dead code to allow circuit-primitives tests to compile. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Code review: No issues found. Checked for bugs and CLAUDE.md compliance.
Resolves INT-5965.