
Conversation

@shreyas-londhe
Member

@shreyas-londhe shreyas-londhe commented Mar 28, 2025

Note

Replaces the old Rust compiler and extensive Circom tests with a new Bun/TypeScript scripts suite (gen-regex, gen-inputs), shared utilities, and Jest-based tests.

  • Tooling/Scripts (new):
    • Add Bun/TypeScript scripts for generating regex artifacts: circom/scripts/gen-regex.ts, noir/scripts/gen-regex.ts, noir/scripts/gen-inputs.ts.
    • Introduce shared utilities (utils/*) for logging, file ops, subprocess, and types.
    • Add project config: tsconfig.json, jest.config.ts, jest.setup.ts, bun.lock, and package.json.
  • Tests (new):
    • Add Jest integration and unit tests under scripts/__tests__ for script workflows and utilities.
  • Compiler (removed):
    • Remove legacy Rust compiler package (packages/compiler/*).
  • Circom Tests (removed):
    • Remove numerous legacy Circom test circuits and test cases under packages/circom/tests/*.

Written by Cursor Bugbot for commit 3d37c31. This will update automatically on new commits.

Summary by CodeRabbit

  • New Features
    • Adds many new regex verification circuits for email headers and addresses and a compiler to generate them; Noir support added alongside Circom.
  • Infrastructure & Build
    • New CI/workflow and pre-push hook to keep generated templates in sync; new package/tsconfig/bundler configs for tooling.
  • Documentation
    • Major README overhaul and new docs for compiler, Circom, and Noir subprojects with usage and troubleshooting.
  • Tests
    • End-to-end circuit tests and sample inputs added for common email regexes.
  • Chores
    • Simplified lint/format configs and updated workspace/manifest and .gitignore rules.

@BornPsych

@cursor please review this

@cursor

cursor bot commented Sep 30, 2025

I've thoroughly reviewed the "Feat/new compiler" PR. I started by exploring the codebase, examining package.json files for dependency changes, and then delved into the compiler's structure, error handling, and test files. I also paid close attention to the Socket security report and the Cargo.toml files for Rust dependencies.

Here's a summary of my findings:

This PR introduces a substantial new ZK-Regex compiler with a well-structured architecture, robust error handling, and support for multiple proving frameworks (Circom and Noir). Key strengths include:

  • Architecture & Design: Modular structure, clear API, and support for multiple frameworks.
  • Error Handling: Comprehensive error taxonomy with user-friendly messages.
  • Developer Experience: Excellent CLI tool, automatic template regeneration, and good documentation.
  • Code Quality: Follows Rust best practices with clean abstractions.

However, I identified several areas for improvement and potential issues:

  • Security Concerns: The Socket report highlighted the addition of @zk-email/[email protected] and an upgrade of circom_tester. Using alpha versions in production is a risk.
  • Test Coverage: I found no unit tests in the Rust compiler code, and integration tests appear limited.
  • Dependencies: There's a dependency on a specific GitHub branch (comptime from feat/comptime-codegen), which could lead to instability.
  • Documentation: Some inline documentation is missing, and there's no architecture diagram.

My main recommendations are to:

  • Add comprehensive unit tests for core compiler functionality.
  • Replace alpha dependencies with stable versions.
  • Address the external GitHub branch dependency.
  • Add performance benchmarks.

Overall, this is a high-quality PR, but the critical issues regarding tests and dependencies should be addressed before merging.


5 tasks completed

  • Review PR changes for new compiler feature
  • Analyze new dependencies added (@types/bun, @zk-email/circuits)
  • Review Socket security report findings
  • Verify new compiler implementation and integration
  • Provide comprehensive review feedback

…oups inside of public patterns into non capture
…group conversion

The convert_capturing_to_non_capturing function was naively converting
all '(' to '(?:' without considering:
- Escaped parentheses like \( which match literal parens
- Parentheses inside character classes like [()] which are literals

This caused regex semantics to break. For example:
- \(foo\) would become \(?:foo\) (wrong)
- [()] would become [(?:)] (matches extra chars)

Fixed by tracking two states during string scan:
- in_char_class: are we inside [...]?
- escaped: was previous char a backslash?

Only convert ( to (?: when it's not escaped, not in a char class,
and not already followed by ?.

Added 5 new tests covering escaped parens, character classes, and
mixed cases. All 23 tests pass.
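
A minimal sketch of that scan, reconstructed from this description (the actual function body in compiler/src/utils.rs may differ in details):

```rust
/// Sketch only: rewrite bare capturing '(' as non-capturing '(?:',
/// leaving escaped parens, character classes, and existing '(?...' groups alone.
fn convert_capturing_to_non_capturing(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut result = String::new();
    let mut in_char_class = false; // are we inside [...]?
    let mut escaped = false;       // was the previous char a backslash?

    for (i, &ch) in chars.iter().enumerate() {
        if escaped {
            // A character following a backslash is a literal; copy it through.
            result.push(ch);
            escaped = false;
            continue;
        }
        match ch {
            '\\' => {
                result.push(ch);
                escaped = true;
            }
            '[' => {
                result.push(ch);
                in_char_class = true;
            }
            ']' => {
                result.push(ch);
                in_char_class = false;
            }
            '(' if !in_char_class && chars.get(i + 1) != Some(&'?') => {
                // Bare capturing group: rewrite as non-capturing.
                result.push_str("(?:");
            }
            _ => result.push(ch),
        }
    }
    result
}
```

With this scan, \(foo\) and [()] pass through unchanged while (foo) becomes (?:foo).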

Addresses PR review feedback: #109 (comment)
Extended convert_capturing_to_non_capturing to handle both PCRE-style
(?<name>...) and Rust-style (?P<name>...) named capture groups, not
just bare capturing groups. This prevents named captures in private
patterns from interfering with public capture group numbering.

Key changes:
- Detect and convert both named capture syntaxes to (?:...)
- Distinguish lookbehind assertions (?<=, ?<!) from named captures
- Updated test_preserve_named_groups to test_convert_pcre_named_groups
  with correct expected behavior
- Added 7 new tests covering both named capture styles and edge cases

Fixes code review finding that named captures were being preserved
instead of converted, which would cause capture group numbering
mismatches in generated circuits.
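
Illustrative input/output pairs for the extended behavior described here (assuming the same function name and &str -> String signature as the sketch above, which by itself only covers bare groups):

```rust
#[test]
fn named_captures_are_converted_but_lookbehinds_are_kept() {
    // PCRE-style and Rust-style named captures become non-capturing groups...
    assert_eq!(convert_capturing_to_non_capturing("(?<name>a+)b"), "(?:a+)b");
    assert_eq!(convert_capturing_to_non_capturing("(?P<name>a+)b"), "(?:a+)b");
    // ...while lookbehind assertions are preserved as-is.
    assert_eq!(convert_capturing_to_non_capturing("(?<=x)y"), "(?<=x)y");
    assert_eq!(convert_capturing_to_non_capturing("(?<!x)y"), "(?<!x)y");
}
```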
…ure-groups-inside-of-private-patterns

Rutefig/reg 574 fix prevent capture groups inside of private patterns
@coderabbitai

coderabbitai bot commented Oct 31, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a Rust-based zk-regex compiler, backend generators for Circom and Noir, IR and pass pipelines, wasm bindings, Circom/Noir artifacts and tests, developer tooling (scripts, hooks, CI), many generated Circom circuits/graphs/regexes, and project configuration/file removals and reorganizations.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Removed configs**<br>`.cargo/config`, `.eslintrc.json`, `.eslintignore`, `babel.config.js`, `.prettierrc` | Deleted legacy tooling/config files (cargo, ESLint, Babel, Prettier) |
| **Repo config & ignores**<br>`Cargo.toml`, `.gitignore`, `bunfig.toml` | Reworked workspace Cargo manifest, updated .gitignore entries, added Bun test timeout config |
| **CI workflows**<br>`.github/workflows/test.yaml`, `.github/workflows/test.yml` | Removed old workflow and added new Rust/Bun-based GitHub Actions workflow (Circom install, Bun runner, tests) |
| **Root docs & metadata**<br>`README.md`, `LICENSE` | Large README rewrite and minor license text tweak |
| **Compiler crate & packaging**<br>`compiler/Cargo.toml`, `compiler/package.json`, `compiler/README.md` | New compiler crate manifest, npm package metadata, and compiler README |
| **Compiler library & API**<br>`compiler/src/lib.rs`, `compiler/src/driver.rs`, `compiler/src/error.rs`, `compiler/src/types.rs`, `compiler/src/wasm.rs`, `compiler/src/utils.rs` | New public compiler API, structured errors, wasm bindings, utilities for composing/generating outputs |
| **IR: NFA & intermediate**<br>`compiler/src/ir/*` (graph.rs, intermediate.rs, nfa.rs, mod.rs) | New IR: NFANode/NFAGraph, IntermediateNFA, epsilon-elimination, path-finding, verification and serialization |
| **Passes & errors**<br>`compiler/src/passes/*` (builder.rs, error.rs, mod.rs) | New builder pass converting Thompson NFA → Intermediate → NFAGraph and NFA error types mapped to CompilerError |
| **Backend generators**<br>`compiler/src/backend/*` (mod.rs, circom.rs, noir.rs, shared.rs) | Circom and Noir code generators and shared extraction logic for transitions/captures; Circom code emitter API |
| **CLI binary**<br>`compiler/src/bin/zk-regex.rs` | New CLI with subcommands: Decomposed, Raw, GenerateCircuitInput |
| **Gen / scripting**<br>`circom/scripts/gen-regex.ts`, `git-hooks/pre-push` | Script to generate Circom circuits from JSON using the compiler; pre-push hook enforcing template regeneration |
| **Circom package & configs**<br>`circom/package.json`, `circom/tsconfig.json`, `circom/bunfig.toml`, `circom/index.ts`, `circom/.gitignore` | New Circom package manifest, TS config, Bun config, entry log, and Circom-specific .gitignore |
| **Circom regex helpers & templates**<br>`circom/circuits/regex_helpers.circom`, `circom/circuits/common/*.circom` | Added MultiOR, transition checks, CaptureSubstring, and many generated Circom templates implementing DFA-based regexes (email, subject, body_hash, message_id, to/from, simple, reversed_bracket, etc.) |
| **Circom DFA graphs & regexes**<br>`circom/circuits/common/*_graph.json`, `circom/regexes/*.json` | Added generated DFA JSON graphs and source regex JSON descriptors for each pattern |
| **Circom tests**<br>`circom/circuits/tests/*`, `circom/circuits/tests/*.test.ts` | Added Circom test circuits and TypeScript test suites exercising generated circuits (body_hash, email_addr, subject_all, etc.) |
| **Noir integration & samples**<br>`noir/Nargo.toml`, `noir/README.md`, `noir/common/*.json`, `noir/common/sample_haystacks/**` | Added Noir project manifest, docs, pattern JSONs, and sample haystacks / circuit input traces |
| **Repo tooling & docs**<br>`CLAUDE.md`, `circom/README.md` | Added developer documentation and Circom-specific README |

Sequence Diagram(s)

%%{init: { "themeVariables": { "actorBkg": "#E8F6FF", "actorBorder": "#8CCBEA", "noteBkg": "#F6F8FA" }}}%%
sequenceDiagram
    participant User
    participant CLI as zk-regex CLI
    participant Driver
    participant Builder as NFABuilder
    participant Backend as CodeGen
    participant Filesystem

    User->>CLI: gen-from-raw(pattern, template, framework)
    CLI->>Driver: Driver::compile(pattern, config)
    Driver->>Builder: NFAGraph::build(pattern)
    Builder->>Builder: Thompson NFA → IntermediateNFA → NFAGraph
    Builder->>Driver: NFAGraph (graph JSON)
    Driver->>Backend: generate_code(nfa, template, pattern, max_bytes)
    Backend->>Filesystem: emit Circom/Noir code & graph
    Filesystem-->>User: (graph.json, template.circom / template.nr)

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Areas requiring extra attention:

  • compiler/src/ir/* (IntermediateNFA, epsilon elimination, path-finding)
  • compiler/src/backend/* (Circom/Noir emission correctness, capture handling)
  • compiler/src/wasm.rs (error translation & JS bindings)
  • circom/circuits/common/*_regex.circom (per-byte transition logic and captures)
  • git-hooks/pre-push (stashing, build-and-regenerate flow)

Possibly related PRs

Suggested reviewers

  • BornPsych

Poem

🐇 In burrows of bytes I nimbly dig,

Patterns to circuits, neat and big.
States hop, captures snug and warm,
Proofs bloom from regex into storm —
Hooray! A compiled trail, a rabbit's jig. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ❓ Inconclusive | The title 'Feat/new compiler' is vague and does not clearly convey the main changes in the PR. While it mentions a compiler feature, it fails to specify that this replaces the old compiler, introduces Bun/TypeScript scripts, and significantly restructures the tooling. | Consider a more descriptive title such as 'Rewrite compiler in Rust with Bun/TypeScript tooling' or 'Replace legacy compiler with new Rust-based architecture' to better communicate the scope and nature of the changes. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |



@cursor cursor bot left a comment


This PR is being reviewed by Cursor Bugbot


eprintln!("Supported frameworks: circom, noir");
std::process::exit(1);
}
};


Bug

The proving_framework argument is parsed inconsistently across commands. While Decomposed and Raw use ProvingFramework::from_str, GenerateCircuitInput manually parses it with a match statement. This duplicates logic and creates maintenance challenges, potentially leading to behavioral inconsistencies.
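
One way to remove the duplication is to funnel every subcommand through ProvingFramework::from_str; a sketch, assuming from_str returns a Result as defined in compiler/src/types.rs:

```rust
// Hypothetical helper for the CLI (compiler/src/bin/zk-regex.rs); the exact
// error type returned by ProvingFramework::from_str is an assumption here.
fn parse_framework_or_exit(raw: &str) -> ProvingFramework {
    match ProvingFramework::from_str(raw) {
        Ok(framework) => framework,
        Err(_) => {
            eprintln!("Unsupported proving framework: {}", raw);
            eprintln!("Supported frameworks: circom, noir");
            std::process::exit(1);
        }
    }
}
```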


}

// Run the script if this file is executed directly
if (import.meta.url === `file://${process.argv[1]}`) {


Bug

The direct execution check for the main function in gen-inputs.ts and gen-regex.ts uses an incorrect comparison between import.meta.url (a URL string) and process.argv[1] (a file path). This mismatch means the scripts' main functions never run when executed directly.

Additional Locations (1)



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 25

🧹 Nitpick comments (12)
circom/regexes/reversed_bracket.json (1)

5-5: Consider whether greedy matching is intentional.

The pattern <.* uses greedy matching (.*) which will match everything after the < character. If the intent is to match content only up to the next delimiter or end of string, you might want to use a more constrained pattern.

However, if the greedy behavior is intentional for your use case, this is fine as-is.
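
For comparison, a small regex-crate example (illustration only; `<[^>]*` is just one hypothetical way to constrain the pattern, and the input string is made up):

```rust
use regex::Regex;

fn main() {
    let haystack = "<a@b.c> and then <d>";

    // Greedy: '.*' runs to the end of the haystack.
    let greedy = Regex::new("<.*").unwrap();
    assert_eq!(greedy.find(haystack).unwrap().as_str(), haystack);

    // Constrained: stop before the closing delimiter.
    let constrained = Regex::new("<[^>]*").unwrap();
    assert_eq!(constrained.find(haystack).unwrap().as_str(), "<a@b.c");
}
```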

circom/circuits/common/to_all_graph.json (1)

1-50: Clarify generation and maintenance process for DFA graph file.

This 2256-line JSON file encodes a deterministic finite automaton (DFA) for the regex pattern (?:\r\n|^)to:([^\r\n]+)\r\n. The file structure suggests automated generation rather than manual authoring.

Key concerns:

  1. Is this auto-generated? If yes, add build artifacts to .gitignore or document the generation process.
  2. Maintenance model: If manually maintained, this is unmaintainable. If generated, provide:
    • Documentation on the generator tool
    • Instructions for regeneration when regex patterns change
    • Validation that generated states are correct and complete
  3. Reachability verification: Confirm all states are reachable and accept_states are terminal.

Without clarity on ownership, future maintainers won't know how to update this when regex patterns change.

I can help document the generation process or create a verification script. Would you like me to generate a script to validate DFA reachability and accept state correctness?
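
As a starting point, such a check could look roughly like this; the Graph field names below are assumptions about the JSON layout, not taken from to_all_graph.json:

```rust
use std::collections::{HashMap, HashSet, VecDeque};

// Assumed schema: adjust field names/types to the real graph JSON.
#[derive(serde::Deserialize)]
struct Graph {
    start_states: Vec<usize>,
    accept_states: Vec<usize>,
    transitions: Vec<(usize, u8, usize)>, // (from, byte, to)
}

/// Returns every state that appears in the graph (including accept states)
/// but is not reachable from any start state.
fn unreachable_states(g: &Graph) -> Vec<usize> {
    let mut adj: HashMap<usize, Vec<usize>> = HashMap::new();
    let mut all_states: HashSet<usize> = g.accept_states.iter().copied().collect();
    for &(from, _byte, to) in &g.transitions {
        adj.entry(from).or_default().push(to);
        all_states.insert(from);
        all_states.insert(to);
    }

    // Breadth-first search from the start states.
    let mut seen: HashSet<usize> = g.start_states.iter().copied().collect();
    let mut queue: VecDeque<usize> = g.start_states.iter().copied().collect();
    while let Some(state) = queue.pop_front() {
        for &next in adj.get(&state).into_iter().flatten() {
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }

    all_states.iter().copied().filter(|s| !seen.contains(s)).collect()
}
```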

circom/regexes/message_id.json (1)

1-7: Consider expanding the message-id character class to align with RFC 5322.

The PublicPattern on line 4 uses <[A-Za-z0-9=@\\.\\+_-]+>, which may be too restrictive. RFC 5322 allows additional characters in message-id local and domain parts, such as colons (:) and slashes (/). Modern message-ids may include these characters.

If stricter validation is intended, this is acceptable. Otherwise, consider expanding the pattern:

-        { "PublicPattern": ["<[A-Za-z0-9=@\\.\\+_-]+>", 128] },
+        { "PublicPattern": ["<[A-Za-z0-9=@\\.\\+_\\-:/]+>", 128] },
circom/README.md (1)

1-62: LGTM - Comprehensive documentation with minor style improvements available.

The README provides clear installation instructions, prerequisites, and troubleshooting guidance. The content is accurate and aligns well with the Circom integration introduced in this PR.

For improved markdown consistency, consider fixing the indentation issues flagged by markdownlint. Nested list items should use 2-space indentation instead of 4-space:

  • Lines 13-14, 19-23, 27-31, 35-40 have 4-space indentation
  • Line 51 uses emphasis instead of a proper heading

These are purely stylistic and can be addressed in a future cleanup if desired.

noir/common/message_id.json (1)

1-7: Pattern duplication with circom/regexes/message_id.json.

This file is identical to circom/regexes/message_id.json. While having separate regex definitions for Circom and Noir toolchains is reasonable, consider whether these could be maintained in a single source to prevent drift.

Additionally, the same character class limitation applies here: the pattern may be too restrictive for modern message-ids per RFC 5322. See the review comment on circom/regexes/message_id.json for details.

compiler/README.md (2)

13-40: Fix markdown list indentation to comply with MD007.

Lines use 4-space indentation for nested list items; markdown style guide requires 2 spaces. Update the list indentation throughout (lines 13–40, 51, 65, 81, 97) for consistency.

-    -   **`compile(pattern: &str) -> Result<NFAGraph, CompilerError>`**
-        Parses the input regex `pattern` string.
+  -   **`compile(pattern: &str) -> Result<NFAGraph, CompilerError>`**
+      Parses the input regex `pattern` string.

51-51: Replace bold text with heading syntax for "Examples" sections (MD036).

Lines 51, 65, 81, and 97 use bold text (e.g., **Example 1:**) instead of proper heading syntax. Convert to heading level (e.g., ### Example 1:) for better accessibility and document structure.

-**Example 1: Compile a simple regex to NFA**
+### Example 1: Compile a simple regex to NFA

Also applies to: 65-65, 81-81, 97-97

compiler/src/utils.rs (2)

168-196: Consider validating the file_extension parameter.

The function accepts file_extension without validation. Invalid extensions could create confusion (e.g., extensions with path separators, empty strings, or non-standard characters).

Add basic validation:

 pub fn save_outputs(
     nfa: &NFAGraph,
     circom_code: String,
     output_dir: &PathBuf,
     template_name: &str,
     file_extension: &str,
 ) -> Result<(), Box<dyn std::error::Error>> {
     validate_cli_template_name(template_name)?;
+    
+    // Validate file extension
+    if file_extension.is_empty() || file_extension.contains('/') || file_extension.contains('\\') {
+        return Err("Invalid file extension".into());
+    }

     // Create output directory if it doesn't exist

198-464: Strong test coverage! Consider adding edge case tests.

The test suite is comprehensive and well-organized. To complement the issues raised above, consider adding tests for the following cases (a rough sketch follows the list):

  • Escaped brackets in character classes: r"[\[]" and r"[\]]"
  • Character class starting with ]: r"[]a]"
  • Malformed named captures: r"(?<name...)" (missing >)
  • Empty template name validation
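
A hedged sketch of two of those cases; the exact expected strings depend on how the real helpers behave, and the empty-name assertion assumes the empty-string check suggested later in this review is added:

```rust
#[cfg(test)]
mod suggested_edge_cases {
    use super::*;

    #[test]
    fn escaped_brackets_inside_character_classes() {
        // Expected: the class is left intact and only the bare group is rewritten.
        assert_eq!(convert_capturing_to_non_capturing(r"[\[](a)"), r"[\[](?:a)");
        assert_eq!(convert_capturing_to_non_capturing(r"[\]](a)"), r"[\]](?:a)");
    }

    #[test]
    fn empty_template_name_is_rejected() {
        // Assumes the suggested empty-name validation has been applied.
        assert!(validate_cli_template_name("").is_err());
    }
}
```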
compiler/src/wasm.rs (1)

155-189: Minor optimization: avoid cloning max_substring_bytes twice.

The code clones max_substring_bytes on line 161 (and 180 for Noir), but then the original is also referenced in the closure. Since it's only Option<Vec<usize>>, the clone is cheap, but the pattern is slightly redundant.

Consider restructuring to avoid the clone:

+    let num_capture_groups = nfa.num_capture_groups;
+    let max_bytes_len = max_substring_bytes.as_ref().map(|v| v.len()).unwrap_or(0);
+
     let code = match proving_framework {
         ProvingFramework::Circom => {
-            let max_bytes_clone = config.max_bytes.clone();
             crate::backend::generate_circom_code(
                 &nfa,
                 &template_name.0,
                 &raw_regex.0,
-                max_substring_bytes.clone(),
+                max_substring_bytes,
             )
             .map_err(|nfa_err| {
                 match nfa_err {
                     crate::passes::NFAError::InvalidCapture(_) => {
                         CompilerError::invalid_capture_config(
-                            nfa.num_capture_groups,
-                            max_substring_bytes.as_ref().map(|v| v.len()).unwrap_or(0),
+                            num_capture_groups,
+                            max_bytes_len,
                         )
                     }
compiler/src/lib.rs (1)

73-88: Consider exposing the optimize parameter.

The optimize flag is hardcoded to true in the CompilationConfig. Depending on use cases, users might want to disable optimization for debugging or to compare output.

Consider adding an optional parameter:

pub fn gen_from_raw(
    pattern: &str,
    max_bytes: Option<Vec<usize>>,
    template_name: &str,
    proving_framework: ProvingFramework,
    optimize: Option<bool>,
) -> CompilerResult<(NFAGraph, String)> {
    let config = CompilationConfig {
        template_name: template_name.to_string(),
        proving_framework,
        max_bytes,
        optimize: optimize.unwrap_or(true),
    };
    // ...
}
compiler/src/ir/intermediate.rs (1)

340-396: State compaction logic is correct but clones nodes.

The compact_states method correctly remaps all state indices and transitions. The node cloning on line 353 is safe but could be optimized if this becomes a performance bottleneck.

If performance is critical, consider using std::mem::take:

let mut node = std::mem::take(&mut self.nodes[old_idx]);

Though this would require making self.nodes temporarily inconsistent, so the current clone approach is safer.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fb1bda and 3d37c31.

⛔ Files ignored due to path filters (5)
  • .DS_Store is excluded by !**/.DS_Store
  • Cargo.lock is excluded by !**/*.lock
  • bun.lock is excluded by !**/*.lock
  • circom/bun.lock is excluded by !**/*.lock
  • packages/circom/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (78)
  • .cargo/config (0 hunks)
  • .eslintignore (0 hunks)
  • .eslintrc.json (0 hunks)
  • .github/workflows/test.yaml (0 hunks)
  • .github/workflows/test.yml (1 hunks)
  • .gitignore (1 hunks)
  • .prettierrc (0 hunks)
  • Cargo.toml (1 hunks)
  • LICENSE (1 hunks)
  • README.md (2 hunks)
  • babel.config.js (0 hunks)
  • bunfig.toml (1 hunks)
  • circom/.gitignore (1 hunks)
  • circom/README.md (1 hunks)
  • circom/bunfig.toml (1 hunks)
  • circom/circuits/common/body_hash_regex.circom (1 hunks)
  • circom/circuits/common/email_addr_graph.json (1 hunks)
  • circom/circuits/common/email_addr_regex.circom (1 hunks)
  • circom/circuits/common/email_domain_graph.json (1 hunks)
  • circom/circuits/common/email_domain_regex.circom (1 hunks)
  • circom/circuits/common/from_all_graph.json (1 hunks)
  • circom/circuits/common/from_all_regex.circom (1 hunks)
  • circom/circuits/common/message_id_graph.json (1 hunks)
  • circom/circuits/common/message_id_regex.circom (1 hunks)
  • circom/circuits/common/reversed_bracket_regex.circom (1 hunks)
  • circom/circuits/common/simple_graph.json (1 hunks)
  • circom/circuits/common/simple_regex.circom (1 hunks)
  • circom/circuits/common/subject_all_graph.json (1 hunks)
  • circom/circuits/common/subject_all_regex.circom (1 hunks)
  • circom/circuits/common/to_all_graph.json (1 hunks)
  • circom/circuits/common/to_all_regex.circom (1 hunks)
  • circom/circuits/regex_helpers.circom (1 hunks)
  • circom/circuits/tests/body_hash.test.ts (1 hunks)
  • circom/circuits/tests/circuits/test_body_hash_regex.circom (1 hunks)
  • circom/index.ts (1 hunks)
  • circom/package.json (1 hunks)
  • circom/regexes/body_hash.json (1 hunks)
  • circom/regexes/email_addr.json (1 hunks)
  • circom/regexes/email_domain.json (1 hunks)
  • circom/regexes/from_all.json (1 hunks)
  • circom/regexes/message_id.json (1 hunks)
  • circom/regexes/reversed_bracket.json (1 hunks)
  • circom/regexes/simple.json (1 hunks)
  • circom/regexes/subject_all.json (1 hunks)
  • circom/regexes/to_all.json (1 hunks)
  • circom/scripts/gen-regex.ts (1 hunks)
  • circom/tsconfig.json (1 hunks)
  • compiler/Cargo.toml (1 hunks)
  • compiler/README.md (1 hunks)
  • compiler/package.json (1 hunks)
  • compiler/src/backend/circom.rs (1 hunks)
  • compiler/src/backend/mod.rs (1 hunks)
  • compiler/src/backend/noir.rs (1 hunks)
  • compiler/src/backend/shared.rs (1 hunks)
  • compiler/src/bin/zk-regex.rs (1 hunks)
  • compiler/src/driver.rs (1 hunks)
  • compiler/src/error.rs (1 hunks)
  • compiler/src/ir/graph.rs (1 hunks)
  • compiler/src/ir/intermediate.rs (1 hunks)
  • compiler/src/ir/mod.rs (1 hunks)
  • compiler/src/ir/nfa.rs (1 hunks)
  • compiler/src/lib.rs (1 hunks)
  • compiler/src/passes/builder.rs (1 hunks)
  • compiler/src/passes/error.rs (1 hunks)
  • compiler/src/passes/mod.rs (1 hunks)
  • compiler/src/types.rs (1 hunks)
  • compiler/src/utils.rs (1 hunks)
  • compiler/src/wasm.rs (1 hunks)
  • git-hooks/pre-push (1 hunks)
  • noir/Nargo.toml (1 hunks)
  • noir/README.md (1 hunks)
  • noir/common/body_hash.json (1 hunks)
  • noir/common/email_addr.json (1 hunks)
  • noir/common/email_domain.json (1 hunks)
  • noir/common/from_all.json (1 hunks)
  • noir/common/message_id.json (1 hunks)
  • noir/common/sample_haystacks/body_hash.json (1 hunks)
  • noir/common/sample_haystacks/circuit_inputs/body_hash_pass_0.json (1 hunks)
💤 Files with no reviewable changes (6)
  • .github/workflows/test.yaml
  • .prettierrc
  • .eslintignore
  • .eslintrc.json
  • babel.config.js
  • .cargo/config
🧰 Additional context used
🧬 Code graph analysis (16)
compiler/src/passes/builder.rs (3)
compiler/src/backend/circom.rs (2)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/backend/noir.rs (2)
  • nfa (191-193)
  • nfa (195-197)
compiler/src/ir/intermediate.rs (1)
  • new (53-61)
compiler/src/ir/nfa.rs (1)
compiler/src/ir/intermediate.rs (1)
  • new (53-61)
compiler/src/ir/mod.rs (2)
compiler/src/backend/circom.rs (2)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/backend/noir.rs (2)
  • nfa (191-193)
  • nfa (195-197)
compiler/src/bin/zk-regex.rs (6)
compiler/src/lib.rs (3)
  • gen_circuit_inputs (160-297)
  • gen_from_decomposed (114-126)
  • gen_from_raw (73-88)
compiler/src/utils.rs (2)
  • save_outputs (168-196)
  • validate_cli_template_name (156-166)
compiler/src/types.rs (1)
  • from_str (51-59)
compiler/src/error.rs (2)
  • code (139-148)
  • user_message (151-224)
compiler/src/ir/nfa.rs (1)
  • from_json (80-84)
compiler/src/passes/error.rs (1)
  • from (53-166)
compiler/src/backend/shared.rs (5)
compiler/src/passes/error.rs (1)
  • from (53-166)
compiler/src/backend/noir.rs (8)
  • inputs (272-276)
  • inputs (283-287)
  • inputs (291-295)
  • nfa (191-193)
  • nfa (195-197)
  • start_states (350-353)
  • accept_states (379-382)
  • start (430-430)
compiler/src/backend/circom.rs (2)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/ir/intermediate.rs (1)
  • new (53-61)
compiler/src/ir/nfa.rs (1)
  • new (63-71)
circom/circuits/tests/body_hash.test.ts (1)
compiler/src/wasm.rs (1)
  • genCircuitInputs (198-217)
compiler/src/driver.rs (6)
compiler/src/lib.rs (1)
  • compile (39-41)
compiler/src/backend/circom.rs (3)
  • nfa (229-231)
  • nfa (233-235)
  • generate_circom_code (62-432)
compiler/src/backend/noir.rs (3)
  • nfa (191-193)
  • nfa (195-197)
  • generate_noir_code (30-265)
compiler/src/passes/builder.rs (1)
  • build (21-37)
compiler/src/error.rs (3)
  • code (139-148)
  • invalid_capture_config (278-290)
  • circuit_generation_failed (268-276)
compiler/src/passes/error.rs (1)
  • from (53-166)
circom/scripts/gen-regex.ts (6)
compiler/src/bin/zk-regex.rs (1)
  • main (83-225)
scripts/utils/logger.ts (2)
  • logger (41-41)
  • error (36-38)
scripts/utils/file-operations.ts (3)
  • fileExists (75-82)
  • ensureDirectory (24-30)
  • listFilesWithExtension (99-104)
scripts/utils/types.ts (1)
  • ScriptError (69-74)
scripts/utils/string-utils.ts (1)
  • toPascalCase (15-19)
scripts/utils/subprocess.ts (1)
  • executeCargo (196-267)
compiler/src/lib.rs (6)
compiler/src/driver.rs (1)
  • compile (59-130)
compiler/src/passes/builder.rs (1)
  • build (21-37)
compiler/src/backend/shared.rs (3)
  • from (33-44)
  • from (48-59)
  • generate_circuit_inputs (127-228)
compiler/src/passes/error.rs (1)
  • from (53-166)
compiler/src/utils.rs (1)
  • decomposed_to_composed_regex (131-154)
compiler/src/error.rs (2)
  • input_too_long (292-300)
  • no_match_found (302-310)
compiler/src/utils.rs (3)
compiler/src/ir/intermediate.rs (1)
  • new (53-61)
compiler/src/ir/nfa.rs (1)
  • new (63-71)
compiler/src/lib.rs (1)
  • gen_from_decomposed (114-126)
compiler/src/ir/graph.rs (5)
compiler/src/backend/circom.rs (2)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/backend/noir.rs (3)
  • nfa (191-193)
  • nfa (195-197)
  • start (430-430)
compiler/src/ir/intermediate.rs (1)
  • new (53-61)
compiler/src/ir/nfa.rs (2)
  • new (63-71)
  • node (96-99)
compiler/src/backend/shared.rs (2)
  • path (163-163)
  • path (164-164)
compiler/src/passes/error.rs (2)
compiler/src/wasm.rs (5)
  • from (21-26)
  • from (30-42)
  • from (54-56)
  • from (60-62)
  • from (66-68)
compiler/src/error.rs (2)
  • nfa_construction_failed (259-266)
  • serialization_error (312-318)
compiler/src/backend/noir.rs (1)
compiler/src/backend/shared.rs (4)
  • escape_regex_for_display (230-243)
  • generate_circuit_data (62-125)
  • from (33-44)
  • from (48-59)
compiler/src/backend/circom.rs (1)
compiler/src/backend/shared.rs (2)
  • escape_regex_for_display (230-243)
  • generate_circuit_data (62-125)
compiler/src/ir/intermediate.rs (3)
compiler/src/backend/circom.rs (2)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/backend/noir.rs (3)
  • nfa (191-193)
  • nfa (195-197)
  • start (430-430)
compiler/src/ir/nfa.rs (2)
  • new (63-71)
  • node (96-99)
compiler/src/wasm.rs (6)
compiler/src/backend/circom.rs (3)
  • generate_circom_code (62-432)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/backend/noir.rs (6)
  • generate_noir_code (30-265)
  • nfa (191-193)
  • nfa (195-197)
  • inputs (272-276)
  • inputs (283-287)
  • inputs (291-295)
compiler/src/lib.rs (2)
  • compile (39-41)
  • gen_circuit_inputs (160-297)
compiler/src/utils.rs (1)
  • decomposed_to_composed_regex (131-154)
compiler/src/passes/error.rs (1)
  • from (53-166)
compiler/src/error.rs (3)
  • code (139-148)
  • serialization_error (312-318)
  • invalid_capture_config (278-290)
🪛 actionlint (1.7.8)
.github/workflows/test.yml

23-23: the runner of "actions/cache@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 markdownlint-cli2 (0.18.1)
README.md

  • 14-15: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)

compiler/README.md

  • 13-15, 19-23, 27-31, 35-40: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
  • 51, 65, 81, 97: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)

circom/README.md

  • 13-15, 19-23, 27-31, 35-40: Unordered list indentation. Expected: 2; Actual: 4 (MD007, ul-indent)
  • 51: Emphasis used instead of a heading (MD036, no-emphasis-as-heading)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Socket Security: Pull Request Alerts
  • GitHub Check: Cursor Bugbot
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (34)
LICENSE (1)

674-674: URL standardization looks correct.

The reference to https://www.gnu.org/licenses/why-not-lgpl.html is the canonical FSF link for GPL v3 licensing guidance. This is a standard maintenance update with no license compliance implications.

noir/Nargo.toml (2)

1-5: Populate authors and review version constraints.

The authors field (line 4) is empty, which is a metadata quality issue. For published or team projects, author information should be populated. Additionally, the compiler_version constraint uses only a lower bound (>=1.0.0) with no upper limit, which could lead to compatibility issues if future Noir releases introduce breaking changes.

  • Please populate the authors field with actual author name(s), e.g., authors = ["Your Name"]
  • Consider adding an upper bound to compiler_version to maintain compatibility, e.g., compiler_version = ">=1.0.0, <2.0.0" (adjust based on stability guarantees)

7-8: Verify the dependency specification and git tag availability.

The sort dependency is sourced from a GitHub repository via a git tag (v0.2.3) rather than a version-pinned crates.io release. While this approach is valid in Noir, please confirm:

  1. The git repository URL and tag v0.2.3 are accessible and stable.
  2. This dependency sourcing approach aligns with the project's practices for managing external dependencies.
  3. The tag points to a compatible and stable version of the library.

If the dependency is well-maintained and the tag is stable, consider adding a comment explaining why a git-sourced dependency is preferred over a registry release (if available).

git-hooks/pre-push (1)

17-30: comment

circom/regexes/simple.json (1)

1-3: LGTM!

Valid JSON structure defining a simple regex pattern for testing the compiler pipeline.

noir/common/email_addr.json (1)

1-10: LGTM!

Email validation pattern with appropriate maximum length (320 characters per RFC 3696). The pattern includes standard email characters and properly escaped special characters.

noir/README.md (1)

1-20: LGTM!

Clear and helpful documentation with an important warning about test placement. The guidance prevents potential issues with automated test generation scripts.

circom/regexes/to_all.json (1)

1-7: LGTM!

Well-structured multi-part pattern for parsing email "To:" headers. Correctly handles CRLF line endings and constrains the captured content to a reasonable maximum length.

circom/circuits/common/simple_graph.json (1)

1-71: LGTM with observation.

The NFA graph correctly represents the regex "a*b" with proper state transitions. The multiple start states (0, 2, 3) are unusual but may be intentional for the compiler's graph representation or optimization strategy.

Note: Ensure the graph consumer correctly handles multiple start states if this is a generated artifact from the compiler pipeline.

circom/regexes/from_all.json (1)

1-7: LGTM — new regex pattern data asset.

This JSON file adds a regex definition for parsing email "from:" headers. The structure and patterns are consistent with the regex data assets introduced elsewhere in the PR.

noir/common/from_all.json (1)

1-7: LGTM — matching Noir regex pattern.

This mirrors the Circom version for cross-framework consistency. Both frameworks now share the same "from:" header parsing patterns.

compiler/Cargo.toml (1)

13-23: Verify workspace dependency versions in root Cargo.toml.

All dependencies are correctly delegated to the workspace. However, ensure the versions specified in the root workspace manifest are appropriate and not using unstable sources (except where necessary).

circom/regexes/body_hash.json (1)

1-8: LGTM — DKIM body hash regex pattern.

Well-structured regex for extracting DKIM body hash values. The base64 pattern and 44-character limit align with SHA256-based DKIM signatures.

circom/tsconfig.json (1)

1-27: LGTM — well-configured TypeScript settings.

The configuration enforces strict type checking while maintaining development flexibility. Modern settings for bundler-based workflows are appropriate.

noir/common/sample_haystacks/body_hash.json (1)

1-9: LGTM - Well-structured test fixture.

The test cases appropriately cover:

  • Pass case 1: body hash pattern without leading CRLF (line start anchor)
  • Pass case 2: body hash pattern with leading CRLF
  • Fail case: pattern preceded by another header field ("to:"), ensuring the regex doesn't match mid-line
circom/circuits/tests/circuits/test_body_hash_regex.circom (1)

5-5: No issues identified—parameter values are correct.

The BodyHashRegex template is instantiated with maxHaystackBytes=300 and maxMatchBytes=299. These parameters are appropriately configured: the haystack (input data to search) is 300 bytes, and the maximum match length is 299 bytes. Having the match length slightly less than the haystack size is a standard and intentional configuration.

compiler/src/passes/mod.rs (1)

6-9: Dismiss this review comment — the builder module is correctly kept private.

The builder module contains no public type definitions, only impl blocks that add methods to public types (NFAGraph, IntermediateNFA) defined in the ir/ module. External code already has full access to builder functionality through public methods like NFAGraph::build(). Keeping the builder module private is appropriate; it's an internal implementation detail, not a public API surface requiring exposure.

Likely an incorrect or invalid review comment.

circom/circuits/common/subject_all_regex.circom (1)

43-148: Solid DFA wiring

The transition matrix, capture propagation, and path-length/linkage checks line up cleanly with (?:\r\n|^)subject:([^\r\n]+)\r\n; I don’t see correctness gaps here.

circom/circuits/common/message_id_regex.circom (1)

43-153: Regex automaton looks sound

State transitions and capture-group handling match the message-id pattern, and the accept-state accounting mirrors the other proven templates—looks good.

circom/circuits/common/subject_all_graph.json (1)

1-2321: Graph data matches the circuit

The DFA states, transitions, and capture annotations mirror the Circom template’s expectations, so the data asset is consistent.

circom/circuits/common/from_all_regex.circom (1)

43-167: From-header automaton looks consistent

The transitions and capture plumbing reflect the intended (from: …) header pattern, with the same path-length safeguards as the other templates. No issues spotted.

compiler/src/wasm.rs (2)

45-69: Good use of newtype pattern for type safety at WASM boundaries.

The wrapper types provide clear type distinctions at the WASM interface, which helps prevent parameter ordering mistakes.


194-242: Clean error handling in circuit input generation.

The function properly deserializes the NFA graph, generates inputs, and serializes the result with appropriate error handling at each step.

compiler/src/lib.rs (3)

24-41: Clean and well-documented API entry point.

The compile function provides a clear, simple interface for NFA construction with proper error type conversion.


90-126: Clean decomposition and delegation pattern.

The function appropriately delegates to gen_from_raw after composing the regex, avoiding code duplication.


267-294: Thorough error context enrichment for debugging.

The code carefully adds input context to errors that lack it, which will significantly help users debug issues. The pattern of augmenting base errors with additional context is a good practice.

compiler/src/ir/intermediate.rs (3)

13-61: Well-designed intermediate representation for epsilon elimination.

The structures cleanly separate epsilon and byte transitions, and the new method properly initializes all fields.


111-145: Clean conversion pipeline from intermediate to final NFA.

The finalize method correctly orchestrates epsilon elimination and final validation, with clear separation of concerns.


184-201: Verify capture group merging logic during epsilon elimination—no automated tests found.

The concern in the original review is valid: no tests were found that specifically cover epsilon transitions combined with capture groups. The capture merging logic (adding start captures from source closure and end captures from target closure) appears algorithmically sound, but is complex and lacks dedicated test coverage.

Tests that should exist but were not found:

  • Epsilon transitions with capture groups
  • Nested capture groups across epsilon transitions
  • Multiple chained epsilon transitions with captures
  • Start/end captures at different positions
compiler/src/error.rs (4)

9-74: Well-organized error code system with clear categorization.

The error codes are logically grouped into ranges, making it easy to identify error categories at a glance. The Display implementation is complete.


76-135: Excellent error design with rich context and actionable suggestions.

The error variants provide comprehensive context for debugging while keeping related information grouped logically. The consistent inclusion of suggestion fields is particularly user-friendly.


137-235: Clean implementation of error utility methods.

The methods provide clear access to error codes and well-formatted user messages. The is_recoverable() classification is useful for error handling strategies.


237-319: Comprehensive and consistent convenience constructors.

The factory methods provide easy-to-use interfaces for creating common errors with appropriate codes and helpful suggestions. All constructors properly populate their respective error variants.

compiler/src/utils.rs (1)

131-154: No validation needed in decomposed_to_composed_regex—downstream backend code already validates max_len values.

The backend generators validate that max_substring_bytes contains no zero values and return InvalidCapture errors if validation fails. The decomposed_to_composed_regex function correctly collects max_len values; validation happens in the appropriate downstream layer.

Likely an incorrect or invalid review comment.

toolchain: stable

- name: Cache Cargo
uses: actions/cache@v3


⚠️ Potential issue | 🟠 Major

Update actions/cache to v4 for GitHub Actions compatibility.

The actions/cache@v3 action is outdated per actionlint. Update to v4 to ensure compatibility with current GitHub Actions standards and to receive security updates.

-      uses: actions/cache@v3
+      uses: actions/cache@v4

🤖 Prompt for AI Agents
In .github/workflows/test.yml around line 23, the workflow references
actions/cache@v3 which is outdated; update the action reference to
actions/cache@v4 to satisfy actionlint and gain compatibility/security updates.
Modify the uses entry to point to actions/cache@v4 (no other changes required),
commit the workflow, and verify the workflow lints and runs successfully in CI.

# mail-parser = { version = "0.8", git = "https://github.com/stalwartlabs//mail-parser", tag = "0.8.0" }
[workspace.dependencies]
clap = "4.2.1"
comptime = { git = "https://github.com/jp4g/sparse_array", branch = "feat/comptime-codegen" }


⚠️ Potential issue | 🔴 Critical

CRITICAL: Replace feature-branch git dependency with stable crate version.

The comptime dependency uses an unstable feature branch from a fork, which creates a production risk. The official comptime crate version 1.0.0 is available on crates.io. Unless the fork contains essential unreleased features, migrate to the stable version:

-comptime = { git = "https://github.com/jp4g/sparse_array", branch = "feat/comptime-codegen" }
+comptime = "1.0.0"

If fork-specific functionality is required, open an issue with the upstream maintainers or document the dependency override rationale in comments.

Also applies to: 9-9

🤖 Prompt for AI Agents
In Cargo.toml around lines 9 to 9, the comptime dependency is pointing at a
feature branch of a forked repo; replace that git dependency with the stable
crates.io release by changing the dependency to use version = "1.0.0" (remove
the git/branch fields), run cargo update / cargo build to refresh Cargo.lock,
and search the repo for any other occurrences of the git override to update or
document; if fork-specific behavior is truly required, add a short comment
explaining the rationale and open an upstream issue instead of keeping the git
dependency.


# logs
logs
_.log


⚠️ Potential issue | 🟡 Minor

Fix typo in log file pattern.

Line 15 has _.log which will only match a file literally named "_.log". This should be *.log to match all log files with the .log extension.

Apply this diff:

-_.log
+*.log
🤖 Prompt for AI Agents
In circom/.gitignore around line 15, the entry currently reads "_.log" which
only matches a literal file named "_.log"; change this to "*.log" so the
gitignore will match all files ending with the .log extension; update the line
to use an asterisk prefix (no other changes needed).

@@ -0,0 +1,943 @@
{
"regex": "([A-Za-z0-9!#$%&'*+=?\\-\\^_`{|}~./@]+@[A-Za-z0-9.\\-]+)",


⚠️ Potential issue | 🟠 Major

Remove extra ‘@’ from the local-part class.

The generated regex on Line 2 still includes @ in the local-part character class, so this graph will accept inputs like foo@[email protected]: the first @ keeps you in the local-part loop, and the second @ transitions to the domain. That violates the single-@ requirement for mailboxes and will yield false positives. Please drop @ from the first character class (and regen this graph) so only the delimiter @ switches to the domain portion.
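
To make the failure mode concrete (regex crate used purely for illustration; the patterns are abridged from the generated one, and the double-@ input is hypothetical):

```rust
use regex::Regex;

fn main() {
    // Abridged local-part class that still contains '@'.
    let with_at = Regex::new(r"([A-Za-z0-9._@]+@[A-Za-z0-9.\-]+)").unwrap();
    // Same class with '@' removed.
    let without_at = Regex::new(r"([A-Za-z0-9._]+@[A-Za-z0-9.\-]+)").unwrap();

    let input = "a@b@c.com"; // hypothetical input with two '@'
    // The '@'-in-local-part version swallows both '@' characters...
    assert_eq!(with_at.find(input).unwrap().as_str(), "a@b@c.com");
    // ...while the corrected class stops at the first delimiter.
    assert_eq!(without_at.find(input).unwrap().as_str(), "a@b");
}
```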

🤖 Prompt for AI Agents
In circom/circuits/common/email_addr_graph.json around line 2 the local-part
character class incorrectly includes '@', allowing multiple '@' characters
(e.g., foo@[email protected]); remove the '@' from the first character class so
only the single delimiter '@' separates local-part and domain, update the regex
accordingly, and regenerate the graph artifact so the change is reflected in the
generated file.

Comment on lines 34 to 57
    signal isWithinPathLengthMinusOne[maxMatchBytes-2];
    signal isTransitionLinked[maxMatchBytes];

    signal isValidStartState;

    signal reachedAcceptState[maxMatchBytes];

    component isValidTraversal[maxMatchBytes];

    // Select the haystack from the input
    signal haystack[maxMatchBytes] <== SelectSubArray(maxHaystackBytes, maxMatchBytes)(inHaystack, matchStart, matchLength);

    // Check if the first state in the haystack is a valid start state
    isValidStartState <== IsEqual()([startStates[0], currStates[0]]);

    for (var i = 0; i < maxMatchBytes; i++) {
        isWithinPathLength[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength]);

        // Check if the traversal is a valid path
        if (i < maxMatchBytes-2) {
            isWithinPathLengthMinusOne[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength-1]);
            isTransitionLinked[i] <== IsEqual()([nextStates[i], currStates[i+1]]);
            isTransitionLinked[i] * isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i];
        }


⚠️ Potential issue | 🔴 Critical

Enforce transition linkage for the final step

When matchLength == maxMatchBytes, the constraint block at Line 53 never executes for i = maxMatchBytes - 2, so nextStates[maxMatchBytes-2] is never forced to equal currStates[maxMatchBytes-1]. An attacker can supply an inconsistent final transition and still satisfy all constraints, breaking soundness for full-length matches. Please extend the guard/value array so the last link is checked as well. Example fix:

-    signal isWithinPathLengthMinusOne[maxMatchBytes-2];
+    signal isWithinPathLengthMinusOne[maxMatchBytes-1];
@@
-        if (i < maxMatchBytes-2) {
+        if (i < maxMatchBytes-1) {
             isWithinPathLengthMinusOne[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength-1]);
             isTransitionLinked[i] <== IsEqual()([nextStates[i], currStates[i+1]]);
             isTransitionLinked[i] * isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i];
         }
🤖 Prompt for AI Agents
In circom/circuits/common/email_addr_regex.circom around lines 34 to 57, the
arrays and loop guard for checking transition linkage stop at maxMatchBytes-2 so
the final link (between nextStates[maxMatchBytes-2] and
currStates[maxMatchBytes-1]) is not enforced when matchLength == maxMatchBytes;
extend the arrays and adjust the loop condition so the last link is validated:
declare isWithinPathLengthMinusOne and isTransitionLinked with size
maxMatchBytes-1 (instead of maxMatchBytes-2 and maxMatchBytes), change the
in-loop guard to if (i < maxMatchBytes-1) and compute
isWithinPathLengthMinusOne[i] with LessThan(log2Ceil(maxMatchBytes))([i,
matchLength-1]) then set isTransitionLinked[i] <== IsEqual()([nextStates[i],
currStates[i+1]]) and constrain isTransitionLinked[i] *
isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i] so the final
transition is enforced for full-length matches.

Comment on lines +68 to +98
        } else if i + 2 < chars.len() && chars[i + 2] == '<' {
            // Could be: (?<=...) positive lookbehind, or (?<!...) negative lookbehind, or (?<name>...) PCRE named capture
            if i + 3 < chars.len() && (chars[i + 3] == '=' || chars[i + 3] == '!') {
                // Lookbehind assertion: (?<=...) or (?<!...)
                // These are special groups, preserve as-is
                result.push(ch);
            } else {
                // PCRE named capture: (?<name>...)
                // Convert to non-capturing and skip the name
                result.push_str("(?:");
                i += 2; // Skip '?' and '<'
                // Skip until we find the closing '>'
                while i + 1 < chars.len() && chars[i + 1] != '>' {
                    i += 1;
                }
                if i + 1 < chars.len() && chars[i + 1] == '>' {
                    i += 1; // Skip the '>'
                }
            }
        } else if i + 3 < chars.len() && chars[i + 2] == 'P' && chars[i + 3] == '<' {
            // Rust named capture: (?P<name>...)
            // Convert to non-capturing and skip the name
            result.push_str("(?:");
            i += 3; // Skip '?', 'P', and '<'
            // Skip until we find the closing '>'
            while i + 1 < chars.len() && chars[i + 1] != '>' {
                i += 1;
            }
            if i + 1 < chars.len() && chars[i + 1] == '>' {
                i += 1; // Skip the '>'
            }


⚠️ Potential issue | 🟠 Major

Validate named capture group syntax to prevent silent pattern corruption.

The code skips to the closing > for named captures but doesn't validate that it exists. Malformed patterns like (?<name... without a closing > will scan to the end of the pattern, potentially corrupting the output.

Consider adding validation:

                     // Skip until we find the closing '>'
                     while i + 1 < chars.len() && chars[i + 1] != '>' {
                         i += 1;
                     }
                     if i + 1 < chars.len() && chars[i + 1] == '>' {
                         i += 1; // Skip the '>'
+                    } else {
+                        // Malformed named capture - no closing '>'
+                        return format!("(?:MALFORMED_CAPTURE)");
                     }

Or better yet, document that this function expects well-formed regex patterns and should be called after regex validation.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In compiler/src/utils.rs around lines 68 to 98, the named-capture skipping logic
does not verify a closing '>' and will run to the end of the input on malformed
patterns; update the logic to detect whether the loop found a '>' and handle the
failure explicitly: after the loop check if the terminating '>' was not found
and either (a) return an error/Result indicating a malformed regex so callers
can validate before calling, or (b) restore/emit the original characters (do not
drop them) and stop trying to skip the name (to avoid silently corrupting
output); implement one of these behaviors consistently for both PCRE (?<name>)
and Rust (?P<name>) branches and add a brief comment explaining the chosen
failure mode.

Comment on lines +156 to +166
pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
    // Convert to PascalCase to normalize
    let pascal_name = name.to_pascal_case();

    // Verify it's valid PascalCase
    if pascal_name != name {
        return Err("Template name must be in PascalCase (e.g., ThisIsATemplate)".into());
    }

    Ok(name.to_string())
}


⚠️ Potential issue | 🟡 Minor

Validate against empty template names.

The function doesn't check for empty strings. An empty template name would pass validation since "".to_pascal_case() == "".

Add an early check:

 pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
+    if name.is_empty() {
+        return Err("Template name cannot be empty".into());
+    }
+
     // Convert to PascalCase to normalize
     let pascal_name = name.to_pascal_case();
📝 Committable suggestion


Suggested change
-pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
-    // Convert to PascalCase to normalize
-    let pascal_name = name.to_pascal_case();
-    // Verify it's valid PascalCase
-    if pascal_name != name {
-        return Err("Template name must be in PascalCase (e.g., ThisIsATemplate)".into());
-    }
-    Ok(name.to_string())
-}
+pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
+    if name.is_empty() {
+        return Err("Template name cannot be empty".into());
+    }
+    // Convert to PascalCase to normalize
+    let pascal_name = name.to_pascal_case();
+    // Verify it's valid PascalCase
+    if pascal_name != name {
+        return Err("Template name must be in PascalCase (e.g., ThisIsATemplate)".into());
+    }
+    Ok(name.to_string())
+}
🤖 Prompt for AI Agents
In compiler/src/utils.rs around lines 156 to 166, the validate_cli_template_name
function does not reject empty strings (""), which currently pass since
"".to_pascal_case() == ""; add an early check at the top to return Err("Template
name cannot be empty" or similar) if name.trim().is_empty(), then proceed with
the existing PascalCase normalization and comparison; ensure the error message
is consistent with other validation errors.

Comment on lines +29 to +42
impl From<WasmError> for JsValue {
    fn from(err: WasmError) -> Self {
        // Return a structured error message that JavaScript can parse
        match err {
            WasmError::CompilerError { code, message } => {
                let error_json = format!(
                    r#"{{"type": "CompilerError", "code": "{}", "message": "{}"}}"#,
                    code,
                    message.replace('"', "\\\"").replace('\n', "\\n")
                );
                JsValue::from_str(&error_json)
            }
        }
    }
⚠️ Potential issue | 🟠 Major

Use proper JSON serialization instead of manual string escaping.

The manual JSON construction with string replacement is fragile and doesn't handle all special characters (e.g., \r, \t, backslashes, and control characters). This could lead to invalid JSON or injection issues.

Use serde_json for safe serialization:

+use serde_json::json;
+
 impl From<WasmError> for JsValue {
     fn from(err: WasmError) -> Self {
-        // Return a structured error message that JavaScript can parse
         match err {
             WasmError::CompilerError { code, message } => {
-                let error_json = format!(
-                    r#"{{"type": "CompilerError", "code": "{}", "message": "{}"}}"#,
-                    code,
-                    message.replace('"', "\\\"").replace('\n', "\\n")
-                );
-                JsValue::from_str(&error_json)
+                let error_json = json!({
+                    "type": "CompilerError",
+                    "code": code,
+                    "message": message
+                });
+                JsValue::from_str(&error_json.to_string())
             }
         }
     }

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In compiler/src/wasm.rs around lines 29 to 42, the From<WasmError> impl manually
constructs JSON with string replacements which is unsafe for many characters;
replace the manual formatting with proper serde_json serialization: derive or
build a small serializable struct or use serde_json::json! to produce a valid
JSON string, call serde_json::to_string (or to_string_pretty) and pass the
result to JsValue::from_str, and handle serialization errors (e.g., log and
return a generic error JSON) and add/import serde_json in Cargo.toml and at the
top of the file.
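A minimal standalone sketch of the serde_json approach follows. compiler_error_to_json is a hypothetical helper used only for illustration; in the wasm module the resulting string would still be passed to JsValue::from_str as in the diff above.

```rust
use serde_json::json;

fn compiler_error_to_json(code: &str, message: &str) -> String {
    // json! builds a serde_json::Value; serializing it escapes quotes,
    // newlines, tabs, backslashes and control characters correctly.
    json!({
        "type": "CompilerError",
        "code": code,
        "message": message
    })
    .to_string()
}

fn main() {
    let s = compiler_error_to_json("E1001", "bad \"regex\"\nline 2\tend");
    println!("{s}");
    // {"code":"E1001","message":"bad \"regex\"\nline 2\tend","type":"CompilerError"}
    assert!(serde_json::from_str::<serde_json::Value>(&s).is_ok()); // always valid JSON
}
```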

{
    "parts": [
        { "Pattern": "[A-Za-z0-9!#$%&'*+=?\\-\\^_`{|}~./]+@" },
        { "PublicPattern": ["[A-Za-z0-9.\\-@]+", 128] }
⚠️ Potential issue | 🟠 Major

Remove '@' from domain character class.

The PublicPattern includes @ in the character class [A-Za-z0-9.\\-@]+, but email domains should not contain @ symbols. Since Part 1 already ends with @, allowing it in the domain part would permit malformed emails like user@[email protected].

Apply this fix:

-        { "PublicPattern": ["[A-Za-z0-9.\\-@]+", 128] }
+        { "PublicPattern": ["[A-Za-z0-9.\\-]+", 128] }
📝 Committable suggestion


Suggested change
-        { "PublicPattern": ["[A-Za-z0-9.\\-@]+", 128] }
+        { "PublicPattern": ["[A-Za-z0-9.\\-]+", 128] }
🤖 Prompt for AI Agents
In noir/common/email_domain.json around line 4, the PublicPattern character
class wrongly includes '@' allowing malformed domains; remove the '@' from the
class so the domain regex does not accept '@' characters, and ensure the hyphen
is either escaped or placed last in the class (e.g. [A-Za-z0-9.\-]+ or
[A-Za-z0-9.-]+) while keeping the same length limit (128).

Comment on lines 181 to +232

- [ ] support character class regex grammar (so as to simplify the regular expressions)
- [ ] better compatibility with regex grammar (will need a set of different tests to assert the compatibility between circuit and regex in languages) -->
## License

## Assumptions
Some email providers put not only the sender's email address but also their username to the From field.
ALthough its concrete formats differ among email providers, our FromAddrRegex template assumes that the email address appears at the end of the From field.
If this assumption does not hold, i.e., the username appears after the email address, an adversary can output an arbitrary email address from that template by including a dummy email address in the username.
This project is licensed under the [Specify License Here - e.g., MIT License or Apache 2.0].
⚠️ Potential issue | 🟡 Minor

Fix license placeholder.

The README now ends with “Specify License Here…”, which contradicts the actual MIT license declared elsewhere. Please replace the placeholder with the real license text or link so downstream users get the correct compliance information.

🤖 Prompt for AI Agents
In README.md around lines 229 to 232, the license section still contains a
placeholder; replace "Specify License Here - e.g., MIT License or Apache 2.0"
with the actual MIT license reference by updating the heading to "License" and
either inserting the short MIT license blurb plus year/owner or linking to the
full text (e.g., "This project is licensed under the MIT License - see LICENSE
file or https://opensource.org/licenses/MIT"), ensuring it matches the project's
LICENSE file.

…roup and rerun regex, input and test generation
… and tests - fails input that enters in conditional path
The epsilon closure computation was incorrectly tracking capture groups globally,
causing captures from unrelated conditional branches to leak into other paths.
This resulted in incorrect capture behavior when conditionals preceded capture
groups in regex patterns.

Changes:
- Modified EpsilonClosure to use per_state_captures (BTreeMap) instead of global captures set
- Updated capture tracking to maintain separate capture sets for each epsilon path
- Fixed start state logic to check for start captures across all epsilon paths
- Regenerated Circom and Noir templates with corrected logic
- Updated test inputs to reflect correct capture behavior
- Added email_addr test case demonstrating conditional before capture group
…eneration

Rutefig/reg 579 fix noir proof generation
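The per-path capture tracking described in the epsilon-closure commit above can be pictured roughly as below. This is a minimal sketch; all type and field names besides per_state_captures are assumptions, not the actual EpsilonClosure in compiler/src/ir/intermediate.rs.

```rust
use std::collections::{BTreeMap, BTreeSet};

type StateId = usize;
/// (capture_group_index, is_start_marker)
type Capture = (usize, bool);

/// Before the fix (per the commit message), a single global BTreeSet<Capture>
/// meant a capture seen on *any* epsilon path leaked into every path. After
/// the fix, captures are keyed by the state each epsilon path reaches, so
/// they stay local to that path.
struct PerPathClosure {
    per_state_captures: BTreeMap<StateId, BTreeSet<Capture>>,
}

impl PerPathClosure {
    fn record(&mut self, target: StateId, capture: Capture) {
        self.per_state_captures
            .entry(target)
            .or_default()
            .insert(capture);
    }
}

fn main() {
    let mut closure = PerPathClosure { per_state_captures: BTreeMap::new() };
    // The path through the optional branch reaches state 3 and opens capture group 1...
    closure.record(3, (1, true));
    // ...while the path that skips the branch reaches state 5 and records nothing,
    // so the capture no longer leaks into it.
    assert!(closure.per_state_captures.get(&5).is_none());
    assert!(closure.per_state_captures[&3].contains(&(1, true)));
}
```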
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
circom/circuits/common/email_addr_regex.circom (1)

34-61: Enforce the final transition link for full-length matches

When matchLength == maxMatchBytes, the loop never enters the guard with i = maxMatchBytes - 2, so nextStates[maxMatchBytes-2] is never constrained to equal currStates[maxMatchBytes-1]. That lets an inconsistent final hop satisfy all constraints, breaking soundness. Please extend the helper arrays and guard so the last link is also enforced.

-    signal isWithinPathLengthMinusOne[maxMatchBytes-2];
-    signal isTransitionLinked[maxMatchBytes];
+    signal isWithinPathLengthMinusOne[maxMatchBytes-1];
+    signal isTransitionLinked[maxMatchBytes-1];
@@
-        if (i < maxMatchBytes-2) {
+        if (i < maxMatchBytes-1) {
             isWithinPathLengthMinusOne[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength-1]);
             isTransitionLinked[i] <== IsEqual()([nextStates[i], currStates[i+1]]);
             isTransitionLinked[i] * isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i];
         }
compiler/src/ir/intermediate.rs (1)

63-109: Accept states validation is still missing.

The past review comment correctly identified that accept_states indices are not validated to be within bounds. While start_states, byte transition targets, and epsilon transition targets are validated, an out-of-bounds accept state could cause issues later.

Apply the suggested fix from the previous review:

         // Validate transition targets are within bounds
         for (state_idx, node) in self.nodes.iter().enumerate() {
             // Check byte transitions
             for destinations in node.byte_transitions.values() {
                 for &dest in destinations {
                     if dest >= self.nodes.len() {
                         return Err(NFAError::InvalidTransition(format!(
                             "Byte transition from state {state_idx} to invalid state {dest}"
                         )));
                     }
                 }
             }

             // Check epsilon transitions
             for &dest in &node.epsilon_transitions {
                 if dest >= self.nodes.len() {
                     return Err(NFAError::InvalidTransition(format!(
                         "Epsilon transition from state {state_idx} to invalid state {dest}"
                     )));
                 }
             }
         }
+
+        // Validate accept states are within bounds
+        for &accept in &self.accept_states {
+            if accept >= self.nodes.len() {
+                return Err(NFAError::InvalidStateId(format!(
+                    "Accept state {accept} is out of bounds"
+                )));
+            }
+        }

         Ok(())
🧹 Nitpick comments (1)
compiler/src/ir/intermediate.rs (1)

432-438: Remove unused PathInfo struct (lines 432-438).

Verification confirms this struct is never referenced elsewhere in the codebase. Remove the definition to eliminate dead code:

-/// Information about captures encountered on a specific epsilon path
-#[derive(Debug, Clone)]
-struct PathInfo {
-    /// The destination state of this path
-    target_state: usize,
-    /// Captures encountered on epsilon transitions along this path
-    captures: BTreeSet<(usize, bool)>,
-}
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d37c31 and 8f9f916.

📒 Files selected for processing (11)
  • .gitignore (1 hunks)
  • CLAUDE.md (1 hunks)
  • circom/circuits/common/email_addr_regex.circom (1 hunks)
  • circom/circuits/tests/circuits/test_email_addr_regex.circom (1 hunks)
  • circom/circuits/tests/circuits/test_subject_all_regex.circom (1 hunks)
  • circom/circuits/tests/email_addr.test.ts (1 hunks)
  • circom/circuits/tests/subject_all.test.ts (1 hunks)
  • circom/regexes/email_addr.json (1 hunks)
  • compiler/src/ir/intermediate.rs (1 hunks)
  • noir/common/email_addr.json (1 hunks)
  • noir/common/sample_haystacks/circuit_inputs/body_hash_pass_0.json (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • circom/circuits/tests/circuits/test_subject_all_regex.circom
🚧 Files skipped from review as they are similar to previous changes (3)
  • noir/common/email_addr.json
  • noir/common/sample_haystacks/circuit_inputs/body_hash_pass_0.json
  • circom/regexes/email_addr.json
🧰 Additional context used
🧬 Code graph analysis (3)
circom/circuits/tests/subject_all.test.ts (1)
compiler/src/wasm.rs (1)
  • genCircuitInputs (198-217)
circom/circuits/tests/email_addr.test.ts (1)
compiler/src/wasm.rs (1)
  • genCircuitInputs (198-217)
compiler/src/ir/intermediate.rs (5)
compiler/src/backend/circom.rs (2)
  • nfa (229-231)
  • nfa (233-235)
compiler/src/backend/noir.rs (3)
  • nfa (191-193)
  • nfa (195-197)
  • start (430-430)
compiler/src/ir/nfa.rs (2)
  • new (63-71)
  • node (96-99)
compiler/src/lib.rs (1)
  • compile (39-41)
compiler/src/backend/shared.rs (2)
  • path (163-163)
  • path (164-164)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test
  • GitHub Check: test
🔇 Additional comments (11)
.gitignore (2)

1-19: Verify that inputs.txt and nfa.json are always generated artifacts and never committed source files.

Search of the codebase found no references to either inputs.txt or nfa.json in code, scripts, or documentation, and neither file exists in the repository. While this is consistent with both being generated artifacts, it is inconclusive—these entries may be generated by external tools, planned for future use, or no longer needed. Confirm these files are artifacts generated during normal development/build workflows before finalizing the gitignore changes.


11-16: Verify nested directory handling; the pattern scope may be too narrow.

The patterns **/scripts/*.js and **/scripts/*.d.ts only match files directly in scripts/ directories, not in nested subdirectories like scripts/utils/ or scripts/__tests__/. When there is a separator at the beginning or middle of the pattern, then the pattern is relative to the directory level of the .gitignore file itself. Since compiled output from .ts files typically goes to dist/ or similar separate directories rather than inline with source, the current patterns are appropriate for the legacy/temporary compiled artifacts. However, if the build process ever emits .js files within nested scripts/ subdirectories, those would not be caught. Confirm that your TypeScript build configuration outputs to a directory outside the scripts/ tree (e.g., dist/), or extend the patterns to **/scripts/**/*.js if needed.

CLAUDE.md (6)

210-239: API signatures verified—no issues found.

All four documented Rust API entry points (compile, gen_from_raw, gen_from_decomposed, gen_circuit_inputs) have been verified against compiler/src/lib.rs. Each function signature, including parameter types and return types, matches the documentation in CLAUDE.md exactly. The CompilerResult<T> type alias is also correctly defined. Documentation is accurate.


39-51: All documented build commands align with configuration—no issues found.

Verification confirms that all build commands referenced in CLAUDE.md are properly configured:

  • bun run build and bun run build-release exist in package.json and correctly delegate to the compiler directory
  • cargo build and cargo build --release are executed within the build scripts
  • wasm-pack build --target nodejs is configured in compiler/package.json
  • wasm-pack build --target web is also configured as an additional build target
  • Cargo workspace properly includes compiler as a member

The documentation accurately reflects the build system configuration.


127-151: Pre-push hook implementation verified—no issues found.

The documentation in CLAUDE.md is accurate. Verification confirms:

  • Hook file exists at git-hooks/pre-push with full implementation
  • Postinstall script correctly installs via install-hooks command
  • Template regeneration commands exist and are properly integrated
  • Hook scans commits for compiler/src/ changes, regenerates templates, validates synchronization, and blocks push when needed
  • All described error recovery steps and emergency bypass are correct

153-161: The original review comment is incorrect—circom implementation of this workflow does not exist.

Verification confirms:

  • gen-inputs:noir script exists in scripts/package.json ✓
  • gen-inputs:circom script does not exist (only gen-regex:circom is defined)
  • noir/common/sample_haystacks/ directory exists with JSON source files ✓
  • circom/common/ directory does not exist
  • circom/scripts/ contains only gen-regex.ts, not gen-inputs.ts

The documentation references a test generation workflow for circom that is not implemented. Only the noir implementation has the complete input generation pipeline. Either remove the circom references from the documentation or implement the missing circom workflow.

Likely an incorrect or invalid review comment.


56-67: Disregard this review comment—the premise is incorrect.

The PR does not remove legacy Circom tests. The documentation in CLAUDE.md accurately reflects the current state: test:circom runs the existing Circom circuit tests (body_hash, email_addr, subject_all) that continue to exist in circom/circuits/tests/. No circom test files were modified or removed in this PR.

Likely an incorrect or invalid review comment.


72-80: No issues found; the original review comment's premise is incorrect.

The documentation is accurate. gen-inputs:circom does not exist in package.json because input generation for Circom is not part of the new TypeScript/Bun architecture. The Noir framework has separate template generation (gen-regex:noir) and input generation (gen-inputs:noir) scripts, while Circom only has template generation (gen-regex:circom). CLAUDE.md correctly reflects this architectural choice by documenting only gen-inputs:noir. Listing a non-existent command would be inaccurate.

Likely an incorrect or invalid review comment.

compiler/src/ir/intermediate.rs (3)

148-264: Well-designed epsilon elimination algorithm.

The epsilon elimination implementation is sophisticated and carefully preserves capture group semantics. Notable strengths:

  • Lines 189-196: The "CRITICAL FIX" correctly handles per-path captures for epsilon transitions
  • Lines 221-243: Thoughtful handling of start states to avoid bypassing start captures
  • Lines 199-207: Proper accumulation of end captures from epsilon closures

The algorithm correctly handles the complex interaction between epsilon transitions and capture groups, which is essential for regex patterns with optional groups.


440-496: Excellent regression test coverage.

The test comprehensively validates the epsilon elimination fix for optional patterns before capture groups. It correctly:

  • Tests both paths through the regex (with and without the optional prefix)
  • Validates rejection of non-matching inputs
  • Specifically checks for exactly one capture start event, preventing the duplicate marker bug

This is a high-quality regression test that will prevent future breakage of the epsilon elimination logic.


298-303: No issues found. The review comment is incorrect.

The capture_groups map only contains entries for epsilon transition targets by design. This is evident from add_capture_transition (builder.rs lines 209-230), which is the only place where capture_groups entries are created. In that method, a target is inserted into both epsilon_transitions and capture_groups together—there is no separate storage of byte transition captures in capture_groups.

Therefore, the DFS function at lines 298-303 correctly iterates over all capture_groups entries because all such entries correspond to epsilon targets. The comment "Add captures from this state's epsilon transitions" is accurate.

Comment on lines +245 to +252
## Error Handling

Structured error system with error codes:
- **E1001**: Invalid regex syntax
- **E1002**: Unsupported regex features
- **E3002**: Invalid capture group configuration
- **E5001**: Invalid configuration parameters

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify error codes and CompilerError enum structure.

The documented error codes (E1001, E1002, E3002, E5001) should be verified against the actual CompilerError enum implementation. The gap between E3002 and E5001 suggests either:

  • Additional error codes are missing from the documentation
  • There is intentional error code grouping/categorization

Ensure documentation is complete and matches implementation.


🏁 Script executed:

#!/bin/bash
# Description: Extract error codes from CompilerError implementation

echo "=== CompilerError enum and error codes ==="
rg 'enum CompilerError|pub enum CompilerError' compiler/src/ -A 30 -U

echo -e "\n=== Error code constants or variants ==="
rg 'E\d{4}|"E[0-9]{4}"' compiler/src/error.rs 2>/dev/null || \
rg 'E\d{4}|"E[0-9]{4}"' compiler/src/ --type rust

Length of output: 3767


Update documentation to reflect all implemented error codes and their categories.

The documentation lists only 4 error codes (E1001, E1002, E3002, E5001), but the actual ErrorCode enum in compiler/src/error.rs contains 23 codes organized into 6 categories:

  • E1xxx (Regex validation): E1001–E1004
  • E2xxx (NFA construction): E2001–E2004
  • E3xxx (Circuit generation): E3001–E3004
  • E4xxx (Input processing): E4001–E4004
  • E5xxx (Configuration): E5001–E5003
  • E9xxx (Serialization/Internal): E9001–E9003

Update the CLAUDE.md documentation to include all codes and their intended purposes, matching the implementation.

🤖 Prompt for AI Agents
In CLAUDE.md around lines 245 to 252, the Error Handling section currently lists
only four error codes; update this section to enumerate all 23 implemented codes
and their categories to match compiler/src/error.rs. Replace the short list with
the six categories (E1xxx Regex validation: E1001–E1004 with brief purpose for
each, E2xxx NFA construction: E2001–E2004, E3xxx Circuit generation:
E3001–E3004, E4xxx Input processing: E4001–E4004, E5xxx Configuration:
E5001–E5003, E9xxx Serialization/Internal: E9001–E9003) and add a one-line
description for each code or consolidated description per category reflecting
the enum comments/intent.
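For orientation, the grouping could be expressed along these lines. Variant names below are invented for the sketch; the real enum with all 23 codes across the six E-ranges lives in compiler/src/error.rs.

```rust
// Illustrative only: maps the four codes documented in CLAUDE.md onto variants.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy)]
enum ErrorCode {
    InvalidRegexSyntax,        // E1001: regex validation
    UnsupportedRegexFeature,   // E1002: regex validation
    InvalidCaptureGroupConfig, // E3002: circuit generation
    InvalidConfigParameter,    // E5001: configuration
}

impl ErrorCode {
    fn as_str(self) -> &'static str {
        match self {
            ErrorCode::InvalidRegexSyntax => "E1001",
            ErrorCode::UnsupportedRegexFeature => "E1002",
            ErrorCode::InvalidCaptureGroupConfig => "E3002",
            ErrorCode::InvalidConfigParameter => "E5001",
        }
    }
}

fn main() {
    assert_eq!(ErrorCode::InvalidRegexSyntax.as_str(), "E1001");
}
```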
