@Tuanlinh12312

Resolves INT-5965.

Maillew and others added 27 commits February 9, 2026 21:18
After merging, I will rebase `develop-v1.6.0` onto
`develop-new-hintstore`
closes INT-5318, INT-5320, INT-5392

---------

Co-authored-by: Stephen Hwang <stephenh@intrinsictech.xyz>
Co-authored-by: stephenh-axiom-xyz <stephenh@intrinsictech.xyz>
- add a `generate_gpu_memory_chart` function that generates a GPU memory
usage chart and table
- [sample
markdown](https://hackmd.io/yugFLvPqSm2HraUqDvcFHQ?view#GPU-Memory-Usage)
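
As a rough illustration of the table half of that function, here is a minimal sketch (the signature, phase names, and layout below are assumptions for illustration, not the actual `openvm-prof` code, which also emits the SVG chart):

```rust
/// Sketch only: render (phase, peak GiB) samples as a markdown table.
/// The real `generate_gpu_memory_chart` has a different signature and
/// also produces the chart itself.
fn gpu_memory_table(samples: &[(String, f64)]) -> String {
    let mut md = String::from("| phase | peak GPU memory (GiB) |\n|---|---|\n");
    for (phase, gib) in samples {
        md.push_str(&format!("| {phase} | {gib:.2} |\n"));
    }
    md
}

fn main() {
    // Hypothetical data purely for illustration.
    let samples = vec![
        ("execute".to_string(), 1.3),
        ("tracegen".to_string(), 9.8),
        ("prove".to_string(), 14.2),
    ];
    println!("{}", gpu_memory_table(&samples));
}
```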
Updating `openvm-prof` to handle the new metrics from
openvm-org/stark-backend#184

I have tested it locally.
- replace `SegmentationLimits` with `SegmentationConfig` in `SystemConfig`
- add an `interaction_cell_weight` parameter to `SegmentationConfig` that
specifies how many cells an interaction contributes to each row
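
A minimal sketch of how a weighted cell estimate like this could look (the struct layout, field names, and formula are illustrative assumptions, not the actual `SegmentationConfig` definition):

```rust
/// Illustrative sketch only: field names and the estimate formula are
/// assumptions, not the actual openvm `SegmentationConfig`.
pub struct SegmentationConfig {
    /// Maximum number of (weighted) cells allowed in a segment.
    pub max_cells: usize,
    /// How many cells a single interaction is counted as on each row.
    pub interaction_cell_weight: usize,
}

impl SegmentationConfig {
    /// Estimate a chip's cell cost, counting each interaction as
    /// `interaction_cell_weight` extra cells per row.
    pub fn weighted_cells(&self, height: usize, width: usize, interactions: usize) -> usize {
        height * (width + interactions * self.interaction_cell_weight)
    }

    /// Whether the estimated cost exceeds the segment limit.
    pub fn should_segment(&self, weighted_cells: usize) -> bool {
        weighted_cells > self.max_cells
    }
}
```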
`main_cells_used` is inaccurate because trace height calculations are not
accurate when CUDA tracegen is enabled. To avoid confusion, don't emit
these metrics.
shuklaayush and others added 23 commits February 9, 2026 21:18
For large metrics, the Mermaid output is too much text.

Also switched to outputting detailed metrics in a separate markdown
file.
In the benchmark CI, we still cat the detailed metrics back into the
main markdown file.
The SVG chart is uploaded to public S3, similar to the flamegraphs, so it
can be viewed from the markdown.

For later:
- I think we could switch to storing the detailed metrics in a SQLite file
so they can be downloaded and processed more easily for complex metrics.
For now I just split them into a separate markdown file for simplicity.
segment_ctx.rs:
- DEFAULT_MAX_CELLS → DEFAULT_MAX_MEMORY = 15 GB
- max_cells → max_memory in SegmentationLimits
- set_max_cells → set_max_memory

ctx.rs:
- with_max_cells → with_max_memory

metered_cost.rs:
- Updated import to use DEFAULT_MAX_MEMORY

cli/src/commands/prove.rs:
- Updated import to DEFAULT_MAX_MEMORY
- segment_max_cells → segment_max_memory
- with_max_cells → with_max_memory

benchmarks/prove/src/bin/async_regex.rs:
- segment_max_cells → segment_max_memory
- set_max_cells → set_max_memory

benchmarks/prove/src/util.rs:
- segment_max_cells → segment_max_memory
- set_max_cells → set_max_memory

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Due to the difference in error types, switched to using `should_panic`
for now. We should go through later and switch everything back to the
precise error types.

closes INT-5904
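
For illustration, a minimal sketch of the temporary `should_panic` style (the function under test is a stand-in, not openvm code):

```rust
/// Stand-in for a verification call that currently panics instead of
/// returning a typed error; purely illustrative.
fn verify_with_bad_trace() {
    panic!("constraints not satisfied");
}

// Temporary approach: assert only that verification panics, rather than
// matching a precise error type such as a specific VerificationError variant.
#[test]
#[should_panic(expected = "constraints not satisfied")]
fn bad_trace_is_rejected() {
    verify_with_bad_trace();
}
```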
Also shows the delta for the "Parallel Proof Time (N provers)" column.
…and ECC (#2372)

- **Replace expensive BigUint computation during preflight with fast
native field arithmetic** (halo2curves/blstrs) for all known field types
(K256, P256, BN254, BLS12-381) and ECC curve operations. The trace
filler already re-executes with BigUint for constraint generation, so
preflight only needs to compute outputs for memory writes.
- **Cache modulus constants** with `once_cell::Lazy<BigUint>` to
eliminate repeated hex string parsing in
`get_field_type()`/`get_fp2_field_type()` and `get_curve_type()`
(previously called on every instruction); see the sketch after this list.
- **Cache `FieldType`/`CurveType` on executor structs** at construction
time, eliminating per-instruction BigUint comparisons in preflight.
- **Remove `DynArray` heap allocations** in preflight by using
stack-allocated typed arrays directly from adapter read/write, with
`as_flattened()` for zero-cost conversions.
- **Add `adapter()` accessor** to `FieldExpressionExecutor` for use by
custom `PreflightExecutor` implementations. SETUP operations and unknown
field types fall back to `run_field_expression_precomputed`.
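
For the modulus-constant caching, a minimal sketch of the pattern (the constant name, the BN254-only check, and the surrounding logic are illustrative; the actual field-type detection code differs):

```rust
use num_bigint::BigUint;
use num_traits::Num;
use once_cell::sync::Lazy;

// Parse the modulus hex string once and reuse the BigUint on every call,
// instead of re-parsing it per instruction. The constant name is illustrative.
static BN254_MODULUS: Lazy<BigUint> = Lazy::new(|| {
    BigUint::from_str_radix(
        "30644e72e131a029b85045b68181585d97816a916871ca8d3c208c16d87cfd47",
        16,
    )
    .expect("valid hex modulus")
});

fn is_bn254_modulus(modulus: &BigUint) -> bool {
    // Comparison against the cached constant; no string parsing on the hot path.
    modulus == &*BN254_MODULUS
}
```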

- [x] `cargo nextest run -p openvm-algebra-circuit` — all 18
non-pre-existing-failure tests pass (8 modular addsub/muldiv, 2 is_equal
positive, 8 fp2_chip)
- [x] `cargo nextest run -p openvm-ecc-circuit` — all 8 tests pass (3
add_ne, 5 double including nonzero_a)
- [x] `cargo clippy -p openvm-algebra-circuit -p openvm-ecc-circuit
--all-targets` — no new warnings

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
- keccak (`p3_inner_tracegen`): force `__noinline__`, -92% stack size
- `BigUintGPU::mod_div`: force `__noinline__`, -85% stack size
- sha256 (first and second pass): -76% and -90%:
  - `generate_block_trace`, `generate_missing_cells` → `__noinline__`
  - `generate_carry_ae`, `generate_intermed_4`, `generate_intermed_12` → compute on the fly

The goal is to reduce the memory peak (and get closer to the mem tracker report).
It was +2 GB; now reduced to +0.9 GB.
Should be tested on various blocks (was tested on 21M).
@stephenh-axiom-xyz

@Tuanlinh12312 please clean up this PR

Tuanlinh12312 and others added 2 commits February 10, 2026 15:32
Remove selector_inverse and is_not_wrap columns from VariableRangeCheckerAir
by using a "monotonic sum" constraint approach instead of selector-based
branching. This reduces trace width from 6 to 4 columns while maintaining
max constraint degree of 2.

Key insight: (value + two_to_max_bits) equals (row_index + 1), forming a
strictly increasing sequence. By constraining this sum to increase by
exactly 1 each row, combined with constraints that value can only reset to 0
or increment and max_bits can only stay the same or increment, the trace is
forced into the correct enumeration. The last-row constraint acts as a
checksum (sketched after this commit message).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
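
Spelling out the constraints described above as a sketch (v is value, b is max_bits, rows indexed from 0, N is the number of enumerated rows; the exact last-row and padding handling in the real AIR may differ):

$$
\begin{aligned}
&\text{first row:} && v_0 = 0, \qquad 2^{b_0} = 1,\\
&\text{transition:} && \bigl(v_{i+1} + 2^{b_{i+1}}\bigr) - \bigl(v_i + 2^{b_i}\bigr) = 1,\\
&\text{row shape:} && v_{i+1} \in \{0,\ v_i + 1\}, \qquad b_{i+1} \in \{b_i,\ b_i + 1\},\\
&\text{last row (checksum):} && v_{N-1} + 2^{b_{N-1}} = N.
\end{aligned}
$$

Under the transition constraint the sum tracks row_index + 1, so a reset of value to 0 is only consistent when max_bits steps up in the same row; each row-shape constraint can be written as a degree-2 product, keeping the max constraint degree at 2.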
Add missing imports (VerificationError, BabyBearBlake3Engine, StarkFriEngine)
and remove unused dead code to allow circuit-primitives tests to compile.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@Tuanlinh12312 force-pushed the perf/revisit-varrangechecker branch from efe9ccd to 1cabea1 on February 10, 2026 at 15:32

claude bot commented Feb 10, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.
