Feat/new compiler #92
base: main
Conversation
Review the following changes in direct dependencies. (Socket for GitHub)

@cursor please review this
I've thoroughly reviewed the "Feat/new compiler" PR, starting with an exploration of the codebase. Here's a summary of my findings: This PR introduces a substantial new ZK-Regex compiler with a well-structured architecture, robust error handling, and support for multiple proving frameworks (Circom and Noir). Key strengths include:
However, I identified several areas for improvement and potential issues:
My main recommendations are to:
Overall, this is a high-quality PR, but the critical issues regarding tests and dependencies should be addressed before merging.

5 tasks completed
…re, preserves special groups
…oups inside of public patterns into non capture
…group conversion
The convert_capturing_to_non_capturing function was naively converting
all '(' to '(?:' without considering:
- Escaped parentheses like \( which match literal parens
- Parentheses inside character classes like [()] which are literals
This caused regex semantics to break. For example:
- \(foo\) would become \(?:foo\) (wrong)
- [()] would become [(?:)] (matches extra chars)
Fixed by tracking two states during string scan:
- in_char_class: are we inside [...]?
- escaped: was previous char a backslash?
Only convert ( to (?: when it's not escaped, not in a char class,
and not already followed by ?.
Added 5 new tests covering escaped parens, character classes, and
mixed cases. All 23 tests pass.
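The two-flag scan described in the commit message can be sketched roughly like this (an illustrative reimplementation, not the PR's actual code; the real function may handle further edge cases):

```rust
/// Convert bare capturing groups `(` to non-capturing `(?:`, skipping
/// escaped parens, character classes, and groups already starting with `?`.
/// Simplified sketch of the approach described in the commit message.
fn convert_capturing_to_non_capturing(pattern: &str) -> String {
    let chars: Vec<char> = pattern.chars().collect();
    let mut out = String::with_capacity(pattern.len());
    let mut in_char_class = false; // are we inside [...]?
    let mut escaped = false;       // was the previous char a backslash?
    for (i, &c) in chars.iter().enumerate() {
        if escaped {
            // This char is literal regardless of what it is.
            out.push(c);
            escaped = false;
            continue;
        }
        match c {
            '\\' => {
                out.push(c);
                escaped = true;
            }
            '[' => {
                out.push(c);
                in_char_class = true;
            }
            ']' => {
                out.push(c);
                in_char_class = false;
            }
            // Only a bare, unescaped '(' outside a char class, not already
            // followed by '?', becomes non-capturing.
            '(' if !in_char_class && chars.get(i + 1) != Some(&'?') => {
                out.push_str("(?:");
            }
            _ => out.push(c),
        }
    }
    out
}
```

With this state tracking, `\(foo\)` and `[()]` pass through unchanged while `(foo)` becomes `(?:foo)`.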
Addresses PR review feedback: #109 (comment)
Extended convert_capturing_to_non_capturing to handle both PCRE-style (?<name>...) and Rust-style (?P<name>...) named capture groups, not just bare capturing groups. This prevents named captures in private patterns from interfering with public capture group numbering.

Key changes:
- Detect and convert both named capture syntaxes to (?:...)
- Distinguish lookbehind assertions (?<=, ?<!) from named captures
- Updated test_preserve_named_groups to test_convert_pcre_named_groups with correct expected behavior
- Added 7 new tests covering both named capture styles and edge cases

Fixes code review finding that named captures were being preserved instead of converted, which would cause capture group numbering mismatches in generated circuits.
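The lookbehind-vs-named-capture distinction described here comes down to inspecting what follows `(?<`. A simplified, ASCII-only sketch of that conversion (a hypothetical helper, not the PR's implementation; it ignores escapes and character classes for brevity):

```rust
/// Rewrite PCRE-style `(?<name>...)` and Rust-style `(?P<name>...)` named
/// captures as non-capturing `(?:...)`, while leaving lookbehind assertions
/// `(?<=...)` and `(?<!...)` untouched. ASCII-only illustrative sketch.
fn convert_named_groups(pattern: &str) -> String {
    let bytes = pattern.as_bytes();
    let mut out = String::new();
    let mut i = 0;
    while i < bytes.len() {
        let rest = &pattern[i..];
        // PCRE-style (?<name>...), but not lookbehind (?<= or (?<!
        let pcre_named = rest.starts_with("(?<")
            && !rest.starts_with("(?<=")
            && !rest.starts_with("(?<!");
        // Rust-style (?P<name>...)
        let rust_named = rest.starts_with("(?P<");
        if pcre_named || rust_named {
            if let Some(end) = rest.find('>') {
                // Replace the whole `(?<name>` / `(?P<name>` prefix.
                out.push_str("(?:");
                i += end + 1;
                continue;
            }
        }
        out.push(bytes[i] as char); // assumes ASCII input
        i += 1;
    }
    out
}
```

The key branch is the exclusion of `(?<=` and `(?<!`: only a `(?<` followed by a name, not by `=` or `!`, is a named capture.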
…ure-groups-inside-of-private-patterns Rutefig/reg 574 fix prevent capture groups inside of private patterns
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a Rust-based zk-regex compiler, backend generators for Circom and Noir, IR and pass pipelines, wasm bindings, Circom/Noir artifacts and tests, developer tooling (scripts, hooks, CI), many generated Circom circuits/graphs/regexes, and project configuration/file removals and reorganizations.

Changes
Sequence Diagram(s)

%%{init: { "themeVariables": { "actorBkg": "#E8F6FF", "actorBorder": "#8CCBEA", "noteBkg": "#F6F8FA" }}}%%
sequenceDiagram
participant User
participant CLI as zk-regex CLI
participant Driver
participant Builder as NFABuilder
participant Backend as CodeGen
participant Filesystem
User->>CLI: gen-from-raw(pattern, template, framework)
CLI->>Driver: Driver::compile(pattern, config)
Driver->>Builder: NFAGraph::build(pattern)
Builder->>Builder: Thompson NFA → IntermediateNFA → NFAGraph
Builder->>Driver: NFAGraph (graph JSON)
Driver->>Backend: generate_code(nfa, template, pattern, max_bytes)
Backend->>Filesystem: emit Circom/Noir code & graph
Filesystem-->>User: (graph.json, template.circom / template.nr)
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

Areas requiring extra attention:
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
This PR is being reviewed by Cursor Bugbot.
        eprintln!("Supported frameworks: circom, noir");
        std::process::exit(1);
    }
};
Bug:
The proving_framework argument is parsed inconsistently across commands. While Decomposed and Raw use ProvingFramework::from_str, GenerateCircuitInput manually parses it with a match statement. This duplicates logic and creates maintenance challenges, potentially leading to behavioral inconsistencies.
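One way to unify the parsing paths is to route every subcommand through a single `FromStr` implementation, invoked via `str::parse`. A minimal standalone sketch (the real `ProvingFramework` enum and its error type live in the PR's `compiler/src/types.rs`; this version is illustrative):

```rust
use std::str::FromStr;

/// Illustrative stand-in for the compiler's ProvingFramework enum.
#[derive(Debug, PartialEq)]
enum ProvingFramework {
    Circom,
    Noir,
}

impl FromStr for ProvingFramework {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // One canonical parse used by every subcommand, so the error
        // message and accepted spellings cannot drift apart.
        match s.to_ascii_lowercase().as_str() {
            "circom" => Ok(ProvingFramework::Circom),
            "noir" => Ok(ProvingFramework::Noir),
            other => Err(format!(
                "Unknown framework '{}'. Supported frameworks: circom, noir",
                other
            )),
        }
    }
}
```

With this in place, the `GenerateCircuitInput` arm can call `arg.parse::<ProvingFramework>()` like the other commands instead of duplicating the match.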
// Run the script if this file is executed directly
if (import.meta.url === `file://${process.argv[1]}`) {
Bug:
The direct execution check for the main function in gen-inputs.ts and gen-regex.ts uses an incorrect comparison between import.meta.url (a URL string) and process.argv[1] (a file path). This mismatch means the scripts' main functions never run when executed directly.
Additional Locations (1)
Actionable comments posted: 25
🧹 Nitpick comments (12)
circom/regexes/reversed_bracket.json (1)
5-5: Consider whether greedy matching is intentional.

The pattern `<.*` uses greedy matching (`.*`), which will match everything after the `<` character. If the intent is to match content only up to the next delimiter or end of string, you might want to use a more constrained pattern. However, if the greedy behavior is intentional for your use case, this is fine as-is.
circom/circuits/common/to_all_graph.json (1)
1-50: Clarify generation and maintenance process for DFA graph file.

This 2256-line JSON file encodes a deterministic finite automaton (DFA) for the regex pattern `(?:\r\n|^)to:([^\r\n]+)\r\n`. The file structure suggests automated generation rather than manual authoring.

Key concerns:
- Is this auto-generated? If yes, add build artifacts to `.gitignore` or document the generation process.
- Maintenance model: If manually maintained, this is unmaintainable. If generated, provide:
  - Documentation on the generator tool
  - Instructions for regeneration when regex patterns change
  - Validation that generated states are correct and complete
- Reachability verification: Confirm all states are reachable and `accept_states` are terminal.

Without clarity on ownership, future maintainers won't know how to update this when regex patterns change.
I can help document the generation process or create a verification script. Would you like me to generate a script to validate DFA reachability and accept state correctness?
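As a starting point, the reachability half of that check is a plain BFS over the transition table. A hedged sketch, assuming a map-of-adjacency-lists shape rather than the graph file's actual JSON layout:

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Breadth-first search over a DFA's transition table, returning every
/// state reachable from `start`. The adjacency-list shape here is a
/// hypothetical simplification of the graph JSON, not its real schema.
fn reachable_states(
    start: usize,
    transitions: &BTreeMap<usize, Vec<usize>>,
) -> BTreeSet<usize> {
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::from([start]);
    seen.insert(start);
    while let Some(state) = queue.pop_front() {
        // States with no outgoing edges simply contribute nothing.
        for &next in transitions.get(&state).into_iter().flatten() {
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    seen
}
```

A validation script would then assert that every state listed in the file appears in the reachable set, and that each entry in `accept_states` refers to an existing state.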
circom/regexes/message_id.json (1)
1-7: Consider expanding the message-id character class to align with RFC 5322.

The PublicPattern on line 4 uses `<[A-Za-z0-9=@\\.\\+_-]+>`, which may be too restrictive. RFC 5322 allows additional characters in message-id local and domain parts, such as colons (:) and slashes (/). Modern message-ids may include these characters.

If stricter validation is intended, this is acceptable. Otherwise, consider expanding the pattern:

- { "PublicPattern": ["<[A-Za-z0-9=@\\.\\+_-]+>", 128] },
+ { "PublicPattern": ["<[A-Za-z0-9=@\\.\\+_\\-:/]+>", 128] },

circom/README.md (1)
1-62: LGTM - Comprehensive documentation with minor style improvements available.

The README provides clear installation instructions, prerequisites, and troubleshooting guidance. The content is accurate and aligns well with the Circom integration introduced in this PR.

For improved markdown consistency, consider fixing the indentation issues flagged by markdownlint. Nested list items should use 2-space indentation instead of 4-space:
- Lines 13-14, 19-23, 27-31, 35-40 have 4-space indentation
- Line 51 uses emphasis instead of a proper heading

These are purely stylistic and can be addressed in a future cleanup if desired.
noir/common/message_id.json (1)
1-7: Pattern duplication with circom/regexes/message_id.json.

This file is identical to `circom/regexes/message_id.json`. While having separate regex definitions for Circom and Noir toolchains is reasonable, consider whether these could be maintained in a single source to prevent drift.

Additionally, the same character class limitation applies here: the pattern may be too restrictive for modern message-ids per RFC 5322. See the review comment on `circom/regexes/message_id.json` for details.

compiler/README.md (2)
13-40: Fix markdown list indentation to comply with MD007.

Lines use 4-space indentation for nested list items; the markdown style guide requires 2 spaces. Update the list indentation throughout (lines 13–40, 51, 65, 81, 97) for consistency.

- - **`compile(pattern: &str) -> Result<NFAGraph, CompilerError>`** - Parses the input regex `pattern` string.
+ - **`compile(pattern: &str) -> Result<NFAGraph, CompilerError>`**
+   Parses the input regex `pattern` string.

51-51: Replace bold text with heading syntax for "Examples" sections (MD036).

Lines 51, 65, 81, and 97 use bold text (e.g., **Example 1:**) instead of proper heading syntax. Convert to a heading level (e.g., ### Example 1:) for better accessibility and document structure.

-**Example 1: Compile a simple regex to NFA**
+### Example 1: Compile a simple regex to NFA

Also applies to: 65-65, 81-81, 97-97
compiler/src/utils.rs (2)
168-196: Consider validating the file_extension parameter.

The function accepts `file_extension` without validation. Invalid extensions could create confusion (e.g., extensions with path separators, empty strings, or non-standard characters).

Add basic validation:

 pub fn save_outputs(
     nfa: &NFAGraph,
     circom_code: String,
     output_dir: &PathBuf,
     template_name: &str,
     file_extension: &str,
 ) -> Result<(), Box<dyn std::error::Error>> {
     validate_cli_template_name(template_name)?;
+
+    // Validate file extension
+    if file_extension.is_empty() || file_extension.contains('/') || file_extension.contains('\\') {
+        return Err("Invalid file extension".into());
+    }

     // Create output directory if it doesn't exist
198-464: Strong test coverage! Consider adding edge case tests.

The test suite is comprehensive and well-organized. To complement the issues raised above, consider adding tests for:
- Escaped brackets in character classes: `r"[\[]"` and `r"[\]]"`
- Character class starting with `]`: `r"[]a]"`
- Malformed named captures: `r"(?<name...)"` (missing `>`)
- Empty template name validation
compiler/src/wasm.rs (1)
155-189: Minor optimization: avoid cloning max_substring_bytes twice.

The code clones `max_substring_bytes` on line 161 (and 180 for Noir), but the original is also referenced in the closure. Since it's only an `Option<Vec<usize>>`, the clone is cheap, but the pattern is slightly redundant.

Consider restructuring to avoid the clone:

+ let num_capture_groups = nfa.num_capture_groups;
+ let max_bytes_len = max_substring_bytes.as_ref().map(|v| v.len()).unwrap_or(0);
+
  let code = match proving_framework {
      ProvingFramework::Circom => {
-         let max_bytes_clone = config.max_bytes.clone();
          crate::backend::generate_circom_code(
              &nfa,
              &template_name.0,
              &raw_regex.0,
-             max_substring_bytes.clone(),
+             max_substring_bytes,
          )
          .map_err(|nfa_err| {
              match nfa_err {
                  crate::passes::NFAError::InvalidCapture(_) => {
                      CompilerError::invalid_capture_config(
-                         nfa.num_capture_groups,
-                         max_substring_bytes.as_ref().map(|v| v.len()).unwrap_or(0),
+                         num_capture_groups,
+                         max_bytes_len,
                      )
                  }

compiler/src/lib.rs (1)
73-88: Consider exposing the optimize parameter.

The `optimize` flag is hardcoded to `true` in the `CompilationConfig`. Depending on use cases, users might want to disable optimization for debugging or to compare output.

Consider adding an optional parameter:

pub fn gen_from_raw(
    pattern: &str,
    max_bytes: Option<Vec<usize>>,
    template_name: &str,
    proving_framework: ProvingFramework,
    optimize: Option<bool>,
) -> CompilerResult<(NFAGraph, String)> {
    let config = CompilationConfig {
        template_name: template_name.to_string(),
        proving_framework,
        max_bytes,
        optimize: optimize.unwrap_or(true),
    };
    // ...
}

compiler/src/ir/intermediate.rs (1)
340-396: State compaction logic is correct but clones nodes.

The `compact_states` method correctly remaps all state indices and transitions. The node cloning on line 353 is safe but could be optimized if this becomes a performance bottleneck.

If performance is critical, consider using `std::mem::take`:

    let mut node = std::mem::take(&mut self.nodes[old_idx]);

Though this would require making `self.nodes` temporarily inconsistent, so the current clone approach is safer.
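For context, `std::mem::take` replaces the value with `Default::default()` and returns the original, which is exactly why the container would be temporarily inconsistent:

```rust
/// Move the first node out of the vector without cloning; `mem::take`
/// leaves `String::default()` (an empty string) in its place, so the
/// vector keeps its length but holds a placeholder until rebuilt.
fn take_first_node(nodes: &mut Vec<String>) -> String {
    std::mem::take(&mut nodes[0])
}
```

Whether the avoided clone is worth that transient default state is the tradeoff the review comment weighs.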
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (5)
- .DS_Store is excluded by !**/.DS_Store
- Cargo.lock is excluded by !**/*.lock
- bun.lock is excluded by !**/*.lock
- circom/bun.lock is excluded by !**/*.lock
- packages/circom/pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (78)
.cargo/config(0 hunks).eslintignore(0 hunks).eslintrc.json(0 hunks).github/workflows/test.yaml(0 hunks).github/workflows/test.yml(1 hunks).gitignore(1 hunks).prettierrc(0 hunks)Cargo.toml(1 hunks)LICENSE(1 hunks)README.md(2 hunks)babel.config.js(0 hunks)bunfig.toml(1 hunks)circom/.gitignore(1 hunks)circom/README.md(1 hunks)circom/bunfig.toml(1 hunks)circom/circuits/common/body_hash_regex.circom(1 hunks)circom/circuits/common/email_addr_graph.json(1 hunks)circom/circuits/common/email_addr_regex.circom(1 hunks)circom/circuits/common/email_domain_graph.json(1 hunks)circom/circuits/common/email_domain_regex.circom(1 hunks)circom/circuits/common/from_all_graph.json(1 hunks)circom/circuits/common/from_all_regex.circom(1 hunks)circom/circuits/common/message_id_graph.json(1 hunks)circom/circuits/common/message_id_regex.circom(1 hunks)circom/circuits/common/reversed_bracket_regex.circom(1 hunks)circom/circuits/common/simple_graph.json(1 hunks)circom/circuits/common/simple_regex.circom(1 hunks)circom/circuits/common/subject_all_graph.json(1 hunks)circom/circuits/common/subject_all_regex.circom(1 hunks)circom/circuits/common/to_all_graph.json(1 hunks)circom/circuits/common/to_all_regex.circom(1 hunks)circom/circuits/regex_helpers.circom(1 hunks)circom/circuits/tests/body_hash.test.ts(1 hunks)circom/circuits/tests/circuits/test_body_hash_regex.circom(1 hunks)circom/index.ts(1 hunks)circom/package.json(1 hunks)circom/regexes/body_hash.json(1 hunks)circom/regexes/email_addr.json(1 hunks)circom/regexes/email_domain.json(1 hunks)circom/regexes/from_all.json(1 hunks)circom/regexes/message_id.json(1 hunks)circom/regexes/reversed_bracket.json(1 hunks)circom/regexes/simple.json(1 hunks)circom/regexes/subject_all.json(1 hunks)circom/regexes/to_all.json(1 hunks)circom/scripts/gen-regex.ts(1 hunks)circom/tsconfig.json(1 hunks)compiler/Cargo.toml(1 hunks)compiler/README.md(1 hunks)compiler/package.json(1 hunks)compiler/src/backend/circom.rs(1 hunks)compiler/src/backend/mod.rs(1 
hunks)compiler/src/backend/noir.rs(1 hunks)compiler/src/backend/shared.rs(1 hunks)compiler/src/bin/zk-regex.rs(1 hunks)compiler/src/driver.rs(1 hunks)compiler/src/error.rs(1 hunks)compiler/src/ir/graph.rs(1 hunks)compiler/src/ir/intermediate.rs(1 hunks)compiler/src/ir/mod.rs(1 hunks)compiler/src/ir/nfa.rs(1 hunks)compiler/src/lib.rs(1 hunks)compiler/src/passes/builder.rs(1 hunks)compiler/src/passes/error.rs(1 hunks)compiler/src/passes/mod.rs(1 hunks)compiler/src/types.rs(1 hunks)compiler/src/utils.rs(1 hunks)compiler/src/wasm.rs(1 hunks)git-hooks/pre-push(1 hunks)noir/Nargo.toml(1 hunks)noir/README.md(1 hunks)noir/common/body_hash.json(1 hunks)noir/common/email_addr.json(1 hunks)noir/common/email_domain.json(1 hunks)noir/common/from_all.json(1 hunks)noir/common/message_id.json(1 hunks)noir/common/sample_haystacks/body_hash.json(1 hunks)noir/common/sample_haystacks/circuit_inputs/body_hash_pass_0.json(1 hunks)
💤 Files with no reviewable changes (6)
- .github/workflows/test.yaml
- .prettierrc
- .eslintignore
- .eslintrc.json
- babel.config.js
- .cargo/config
🧰 Additional context used
🧬 Code graph analysis (16)
compiler/src/passes/builder.rs (3)
compiler/src/backend/circom.rs (2)
nfa(229-231)nfa(233-235)compiler/src/backend/noir.rs (2)
nfa(191-193)nfa(195-197)compiler/src/ir/intermediate.rs (1)
new(53-61)
compiler/src/ir/nfa.rs (1)
compiler/src/ir/intermediate.rs (1)
new(53-61)
compiler/src/ir/mod.rs (2)
compiler/src/backend/circom.rs (2)
nfa(229-231)nfa(233-235)compiler/src/backend/noir.rs (2)
nfa(191-193)nfa(195-197)
compiler/src/bin/zk-regex.rs (6)
compiler/src/lib.rs (3)
gen_circuit_inputs(160-297)gen_from_decomposed(114-126)gen_from_raw(73-88)compiler/src/utils.rs (2)
save_outputs(168-196)validate_cli_template_name(156-166)compiler/src/types.rs (1)
from_str(51-59)compiler/src/error.rs (2)
code(139-148)user_message(151-224)compiler/src/ir/nfa.rs (1)
from_json(80-84)compiler/src/passes/error.rs (1)
from(53-166)
compiler/src/backend/shared.rs (5)
compiler/src/passes/error.rs (1)
from(53-166)compiler/src/backend/noir.rs (8)
inputs(272-276)inputs(283-287)inputs(291-295)nfa(191-193)nfa(195-197)start_states(350-353)accept_states(379-382)start(430-430)compiler/src/backend/circom.rs (2)
nfa(229-231)nfa(233-235)compiler/src/ir/intermediate.rs (1)
new(53-61)compiler/src/ir/nfa.rs (1)
new(63-71)
circom/circuits/tests/body_hash.test.ts (1)
compiler/src/wasm.rs (1)
genCircuitInputs(198-217)
compiler/src/driver.rs (6)
compiler/src/lib.rs (1)
compile(39-41)compiler/src/backend/circom.rs (3)
nfa(229-231)nfa(233-235)generate_circom_code(62-432)compiler/src/backend/noir.rs (3)
nfa(191-193)nfa(195-197)generate_noir_code(30-265)compiler/src/passes/builder.rs (1)
build(21-37)compiler/src/error.rs (3)
code(139-148)invalid_capture_config(278-290)circuit_generation_failed(268-276)compiler/src/passes/error.rs (1)
from(53-166)
circom/scripts/gen-regex.ts (6)
compiler/src/bin/zk-regex.rs (1)
main(83-225)scripts/utils/logger.ts (2)
logger(41-41)error(36-38)scripts/utils/file-operations.ts (3)
fileExists(75-82)ensureDirectory(24-30)listFilesWithExtension(99-104)scripts/utils/types.ts (1)
ScriptError(69-74)scripts/utils/string-utils.ts (1)
toPascalCase(15-19)scripts/utils/subprocess.ts (1)
executeCargo(196-267)
compiler/src/lib.rs (6)
compiler/src/driver.rs (1)
compile(59-130)compiler/src/passes/builder.rs (1)
build(21-37)compiler/src/backend/shared.rs (3)
from(33-44)from(48-59)generate_circuit_inputs(127-228)compiler/src/passes/error.rs (1)
from(53-166)compiler/src/utils.rs (1)
decomposed_to_composed_regex(131-154)compiler/src/error.rs (2)
input_too_long(292-300)no_match_found(302-310)
compiler/src/utils.rs (3)
compiler/src/ir/intermediate.rs (1)
new(53-61)compiler/src/ir/nfa.rs (1)
new(63-71)compiler/src/lib.rs (1)
gen_from_decomposed(114-126)
compiler/src/ir/graph.rs (5)
compiler/src/backend/circom.rs (2)
nfa(229-231)nfa(233-235)compiler/src/backend/noir.rs (3)
nfa(191-193)nfa(195-197)start(430-430)compiler/src/ir/intermediate.rs (1)
new(53-61)compiler/src/ir/nfa.rs (2)
new(63-71)node(96-99)compiler/src/backend/shared.rs (2)
path(163-163)path(164-164)
compiler/src/passes/error.rs (2)
compiler/src/wasm.rs (5)
from(21-26)from(30-42)from(54-56)from(60-62)from(66-68)compiler/src/error.rs (2)
nfa_construction_failed(259-266)serialization_error(312-318)
compiler/src/backend/noir.rs (1)
compiler/src/backend/shared.rs (4)
escape_regex_for_display(230-243)generate_circuit_data(62-125)from(33-44)from(48-59)
compiler/src/backend/circom.rs (1)
compiler/src/backend/shared.rs (2)
escape_regex_for_display(230-243)generate_circuit_data(62-125)
compiler/src/ir/intermediate.rs (3)
compiler/src/backend/circom.rs (2)
nfa(229-231)nfa(233-235)compiler/src/backend/noir.rs (3)
nfa(191-193)nfa(195-197)start(430-430)compiler/src/ir/nfa.rs (2)
new(63-71)node(96-99)
compiler/src/wasm.rs (6)
compiler/src/backend/circom.rs (3)
generate_circom_code(62-432)nfa(229-231)nfa(233-235)compiler/src/backend/noir.rs (6)
generate_noir_code(30-265)nfa(191-193)nfa(195-197)inputs(272-276)inputs(283-287)inputs(291-295)compiler/src/lib.rs (2)
compile(39-41)gen_circuit_inputs(160-297)compiler/src/utils.rs (1)
decomposed_to_composed_regex(131-154)compiler/src/passes/error.rs (1)
from(53-166)compiler/src/error.rs (3)
code(139-148)serialization_error(312-318)invalid_capture_config(278-290)
🪛 actionlint (1.7.8)
.github/workflows/test.yml
23-23: the runner of "actions/cache@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 markdownlint-cli2 (0.18.1)
README.md
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
15-15: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
compiler/README.md
13-13: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
15-15: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
19-19: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
20-20: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
23-23: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
27-27: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
28-28: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
29-29: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
30-30: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
31-31: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
35-35: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
36-36: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
37-37: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
38-38: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
39-39: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
40-40: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
51-51: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
65-65: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
81-81: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
97-97: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
circom/README.md
13-13: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
14-14: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
15-15: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
19-19: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
20-20: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
23-23: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
27-27: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
28-28: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
29-29: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
30-30: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
31-31: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
35-35: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
36-36: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
37-37: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
38-38: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
39-39: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
40-40: Unordered list indentation
Expected: 2; Actual: 4
(MD007, ul-indent)
51-51: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Socket Security: Pull Request Alerts
- GitHub Check: Cursor Bugbot
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (34)
LICENSE (1)
674-674: URL standardization looks correct.

The reference to https://www.gnu.org/licenses/why-not-lgpl.html is the canonical FSF link for GPL v3 licensing guidance. This is a standard maintenance update with no license compliance implications.

noir/Nargo.toml (2)
1-5: Populate authors and review version constraints.

The `authors` field (line 4) is empty, which is a metadata quality issue. For published or team projects, author information should be populated. Additionally, the `compiler_version` constraint uses only a lower bound (`>=1.0.0`) with no upper limit, which could lead to compatibility issues if future Noir releases introduce breaking changes.

- Please populate the `authors` field with actual author name(s), e.g., `authors = ["Your Name"]`
- Consider adding an upper bound to `compiler_version` to maintain compatibility, e.g., `compiler_version = ">=1.0.0, <2.0.0"` (adjust based on stability guarantees)
7-8: Verify the dependency specification and git tag availability.

The `sort` dependency is sourced from a GitHub repository via a git tag (`v0.2.3`) rather than a version-pinned crates.io release. While this approach is valid in Noir, please confirm:

- The git repository URL and tag `v0.2.3` are accessible and stable.
- This dependency sourcing approach aligns with the project's practices for managing external dependencies.
- The tag points to a compatible and stable version of the library.

If the dependency is well-maintained and the tag is stable, consider adding a comment explaining why a git-sourced dependency is preferred over a registry release (if available).
git-hooks/pre-push (1)
17-30: comment

circom/regexes/simple.json (1)

1-3: LGTM! Valid JSON structure defining a simple regex pattern for testing the compiler pipeline.
noir/common/email_addr.json (1)
1-10: LGTM! Email validation pattern with appropriate maximum length (320 characters per RFC 3696). The pattern includes standard email characters and properly escaped special characters.
noir/README.md (1)
1-20: LGTM! Clear and helpful documentation with an important warning about test placement. The guidance prevents potential issues with automated test generation scripts.
circom/regexes/to_all.json (1)
1-7: LGTM! Well-structured multi-part pattern for parsing email "To:" headers. Correctly handles CRLF line endings and constrains the captured content to a reasonable maximum length.
circom/circuits/common/simple_graph.json (1)
1-71: LGTM with observation.

The NFA graph correctly represents the regex "a*b" with proper state transitions. The multiple start states (0, 2, 3) are unusual but may be intentional for the compiler's graph representation or optimization strategy.
Note: Ensure the graph consumer correctly handles multiple start states if this is a generated artifact from the compiler pipeline.
circom/regexes/from_all.json (1)
1-7: LGTM — new regex pattern data asset.

This JSON file adds a regex definition for parsing email "from:" headers. The structure and patterns are consistent with the regex data assets introduced elsewhere in the PR.
noir/common/from_all.json (1)
1-7: LGTM — matching Noir regex pattern.

This mirrors the Circom version for cross-framework consistency. Both frameworks now share the same "from:" header parsing patterns.
compiler/Cargo.toml (1)
13-23: Verify workspace dependency versions in root Cargo.toml.

All dependencies are correctly delegated to the workspace. However, ensure the versions specified in the root workspace manifest are appropriate and not using unstable sources (except where necessary).
circom/regexes/body_hash.json (1)
1-8: LGTM — DKIM body hash regex pattern.

Well-structured regex for extracting DKIM body hash values. The base64 pattern and 44-character limit align with SHA256-based DKIM signatures.
circom/tsconfig.json (1)
1-27: LGTM — well-configured TypeScript settings.

The configuration enforces strict type checking while maintaining development flexibility. Modern settings for bundler-based workflows are appropriate.
noir/common/sample_haystacks/body_hash.json (1)
1-9: LGTM - Well-structured test fixture.

The test cases appropriately cover:
- Pass case 1: body hash pattern without leading CRLF (line start anchor)
- Pass case 2: body hash pattern with leading CRLF
- Fail case: pattern preceded by another header field ("to:"), ensuring the regex doesn't match mid-line
circom/circuits/tests/circuits/test_body_hash_regex.circom (1)
5-5: No issues identified — parameter values are correct.

The BodyHashRegex template is instantiated with `maxHaystackBytes=300` and `maxMatchBytes=299`. These parameters are appropriately configured: the haystack (input data to search) is 300 bytes, and the maximum match length is 299 bytes. Having the match length slightly less than the haystack size is a standard and intentional configuration.

compiler/src/passes/mod.rs (1)
6-9: Dismiss this review comment — the builder module is correctly kept private.

The builder module contains no public type definitions, only impl blocks that add methods to public types (NFAGraph, IntermediateNFA) defined in the `ir/` module. External code already has full access to builder functionality through public methods like `NFAGraph::build()`. Keeping the builder module private is appropriate; it's an internal implementation detail, not a public API surface requiring exposure.

Likely an incorrect or invalid review comment.
circom/circuits/common/subject_all_regex.circom (1)
43-148: Solid DFA wiring

The transition matrix, capture propagation, and path-length/linkage checks line up cleanly with `(?:\r\n|^)subject:([^\r\n]+)\r\n`; I don't see correctness gaps here.

circom/circuits/common/message_id_regex.circom (1)
43-153: Regex automaton looks sound

State transitions and capture-group handling match the `message-id` pattern, and the accept-state accounting mirrors the other proven templates; looks good.

circom/circuits/common/subject_all_graph.json (1)
1-2321: Graph data matches the circuit

The DFA states, transitions, and capture annotations mirror the Circom template's expectations, so the data asset is consistent.
circom/circuits/common/from_all_regex.circom (1)
43-167: From-header automaton looks consistent

The transitions and capture plumbing reflect the intended `(from: …)` header pattern, with the same path-length safeguards as the other templates. No issues spotted.

compiler/src/wasm.rs (2)
45-69: Good use of newtype pattern for type safety at WASM boundaries.

The wrapper types provide clear type distinctions at the WASM interface, which helps prevent parameter ordering mistakes.
194-242: Clean error handling in circuit input generation.

The function properly deserializes the NFA graph, generates inputs, and serializes the result with appropriate error handling at each step.
compiler/src/lib.rs (3)
24-41: Clean and well-documented API entry point. The `compile` function provides a clear, simple interface for NFA construction with proper error type conversion.
90-126: Clean decomposition and delegation pattern. The function appropriately delegates to `gen_from_raw` after composing the regex, avoiding code duplication.
267-294: Thorough error context enrichment for debugging. The code carefully adds input context to errors that lack it, which will significantly help users debug issues. The pattern of augmenting base errors with additional context is a good practice.
compiler/src/ir/intermediate.rs (3)
13-61: Well-designed intermediate representation for epsilon elimination. The structures cleanly separate epsilon and byte transitions, and the `new` method properly initializes all fields.
111-145: Clean conversion pipeline from intermediate to final NFA. The `finalize` method correctly orchestrates epsilon elimination and final validation, with clear separation of concerns.
184-201: Verify capture group merging logic during epsilon elimination — no automated tests found. The concern in the original review is valid: no tests were found that specifically cover epsilon transitions combined with capture groups. The capture merging logic (adding start captures from source closure and end captures from target closure) appears algorithmically sound, but is complex and lacks dedicated test coverage.
Tests that should exist but were not found:
- Epsilon transitions with capture groups
- Nested capture groups across epsilon transitions
- Multiple chained epsilon transitions with captures
- Start/end captures at different positions
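To make the requested coverage concrete, here is a standalone sketch of the property such tests should pin down. This is not the crate's actual `EpsilonClosure` API — every name here is illustrative — but it shows why a global union of captures is insufficient: a capture event carried on one epsilon edge leaks into a path that never crossed it.

```rust
use std::collections::{BTreeSet, HashMap};

// (group id, true = start marker). Illustrative only; the real types
// live in compiler/src/ir/intermediate.rs.
type Capture = (usize, bool);
// state -> list of (target state, optional capture event on the edge)
type EpsEdges = HashMap<usize, Vec<(usize, Option<Capture>)>>;

/// Union-style epsilon closure: for each reachable state, collect every
/// capture seen on any epsilon path that leads to it.
fn epsilon_closure(eps: &EpsEdges, start: usize) -> HashMap<usize, BTreeSet<Capture>> {
    let mut reached: HashMap<usize, BTreeSet<Capture>> = HashMap::new();
    let mut stack = vec![(start, BTreeSet::new())];
    while let Some((state, caps)) = stack.pop() {
        let first_visit = !reached.contains_key(&state);
        let entry = reached.entry(state).or_default();
        if !first_visit && caps.is_subset(entry) {
            continue; // nothing new along this path
        }
        entry.extend(caps.iter().cloned());
        for &(target, cap) in eps.get(&state).into_iter().flatten() {
            let mut next = caps.clone();
            if let Some(c) = cap {
                next.insert(c); // carry the edge's capture event forward
            }
            stack.push((target, next));
        }
    }
    reached
}
```

With edges `0 -ε(start g0)-> 1`, `1 -ε-> 2`, and a capture-free shortcut `0 -ε-> 2`, this union closure reports the group-0 start marker at state 2 even though one path never crossed it — exactly the leak that per-path (or per-state) capture tracking must rule out, and what the missing tests should assert.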
compiler/src/error.rs (4)
9-74: Well-organized error code system with clear categorization. The error codes are logically grouped into ranges, making it easy to identify error categories at a glance. The Display implementation is complete.
76-135: Excellent error design with rich context and actionable suggestions. The error variants provide comprehensive context for debugging while keeping related information grouped logically. The consistent inclusion of `suggestion` fields is particularly user-friendly.
137-235: Clean implementation of error utility methods. The methods provide clear access to error codes and well-formatted user messages. The `is_recoverable()` classification is useful for error handling strategies.
237-319: Comprehensive and consistent convenience constructors. The factory methods provide easy-to-use interfaces for creating common errors with appropriate codes and helpful suggestions. All constructors properly populate their respective error variants.
compiler/src/utils.rs (1)
131-154: No validation needed in `decomposed_to_composed_regex` — downstream backend code already validates `max_len` values. The backend generators validate that `max_substring_bytes` contains no zero values and return `InvalidCapture` errors if validation fails. The `decomposed_to_composed_regex` function correctly collects `max_len` values; validation happens in the appropriate downstream layer.

Likely an incorrect or invalid review comment.
```yaml
toolchain: stable

- name: Cache Cargo
  uses: actions/cache@v3
```
Update actions/cache to v4 for GitHub Actions compatibility.
The actions/cache@v3 action is outdated per actionlint. Update to v4 to ensure compatibility with current GitHub Actions standards and to receive security updates.
```diff
- uses: actions/cache@v3
+ uses: actions/cache@v4
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
uses: actions/cache@v4
```
🧰 Tools
🪛 actionlint (1.7.8)
23-23: the runner of "actions/cache@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🤖 Prompt for AI Agents
In .github/workflows/test.yml around line 23, the workflow references
actions/cache@v3 which is outdated; update the action reference to
actions/cache@v4 to satisfy actionlint and gain compatibility/security updates.
Modify the uses entry to point to actions/cache@v4 (no other changes required),
commit the workflow, and verify the workflow lints and runs successfully in CI.
```toml
# mail-parser = { version = "0.8", git = "https://github.com/stalwartlabs//mail-parser", tag = "0.8.0" }
[workspace.dependencies]
clap = "4.2.1"
comptime = { git = "https://github.com/jp4g/sparse_array", branch = "feat/comptime-codegen" }
```
CRITICAL: Replace feature-branch git dependency with stable crate version.
The comptime dependency uses an unstable feature branch from a fork, which creates a production risk. The official comptime crate version 1.0.0 is available on crates.io. Unless the fork contains essential unreleased features, migrate to the stable version:
```diff
-comptime = { git = "https://github.com/jp4g/sparse_array", branch = "feat/comptime-codegen" }
+comptime = "1.0.0"
```

If fork-specific functionality is required, open an issue with the upstream maintainers or document the dependency override rationale in comments.
Also applies to: 9-9
🤖 Prompt for AI Agents
In Cargo.toml around lines 9 to 9, the comptime dependency is pointing at a
feature branch of a forked repo; replace that git dependency with the stable
crates.io release by changing the dependency to use version = "1.0.0" (remove
the git/branch fields), run cargo update / cargo build to refresh Cargo.lock,
and search the repo for any other occurrences of the git override to update or
document; if fork-specific behavior is truly required, add a short comment
explaining the rationale and open an upstream issue instead of keeping the git
dependency.
```text
# logs
logs
_.log
```
Fix typo in log file pattern.
Line 15 has `_.log`, which will only match a file literally named "_.log". This should be `*.log` to match all log files with the .log extension.
Apply this diff:
```diff
-_.log
+*.log
```

📝 Committable suggestion
```text
*.log
```
🤖 Prompt for AI Agents
In circom/.gitignore around line 15, the entry currently reads "_.log" which
only matches a literal file named "_.log"; change this to "*.log" so the
gitignore will match all files ending with the .log extension; update the line
to use an asterisk prefix (no other changes needed).
```diff
@@ -0,0 +1,943 @@
{
  "regex": "([A-Za-z0-9!#$%&'*+=?\\-\\^_`{|}~./@]+@[A-Za-z0-9.\\-]+)",
```
Remove extra ‘@’ from the local-part class.
The generated regex on Line 2 still includes @ in the local-part character class, so this graph will accept inputs like foo@[email protected]: the first @ keeps you in the local-part loop, and the second @ transitions to the domain. That violates the single-@ requirement for mailboxes and will yield false positives. Please drop @ from the first character class (and regen this graph) so only the delimiter @ switches to the domain portion.
🤖 Prompt for AI Agents
In circom/circuits/common/email_addr_graph.json around line 2 the local-part
character class incorrectly includes '@', allowing multiple '@' characters
(e.g., foo@[email protected]); remove the '@' from the first character class so
only the single delimiter '@' separates local-part and domain, update the regex
accordingly, and regenerate the graph artifact so the change is reflected in the
generated file.
```circom
signal isWithinPathLengthMinusOne[maxMatchBytes-2];
signal isTransitionLinked[maxMatchBytes];

signal isValidStartState;

signal reachedAcceptState[maxMatchBytes];

component isValidTraversal[maxMatchBytes];

// Select the haystack from the input
signal haystack[maxMatchBytes] <== SelectSubArray(maxHaystackBytes, maxMatchBytes)(inHaystack, matchStart, matchLength);

// Check if the first state in the haystack is a valid start state
isValidStartState <== IsEqual()([startStates[0], currStates[0]]);

for (var i = 0; i < maxMatchBytes; i++) {
    isWithinPathLength[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength]);

    // Check if the traversal is a valid path
    if (i < maxMatchBytes-2) {
        isWithinPathLengthMinusOne[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength-1]);
        isTransitionLinked[i] <== IsEqual()([nextStates[i], currStates[i+1]]);
        isTransitionLinked[i] * isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i];
    }
```
Enforce transition linkage for the final step
When matchLength == maxMatchBytes, the constraint block at Line 53 never executes for i = maxMatchBytes - 2, so nextStates[maxMatchBytes-2] is never forced to equal currStates[maxMatchBytes-1]. An attacker can supply an inconsistent final transition and still satisfy all constraints, breaking soundness for full-length matches. Please extend the guard/value array so the last link is checked as well. Example fix:
```diff
- signal isWithinPathLengthMinusOne[maxMatchBytes-2];
+ signal isWithinPathLengthMinusOne[maxMatchBytes-1];
@@
- if (i < maxMatchBytes-2) {
+ if (i < maxMatchBytes-1) {
      isWithinPathLengthMinusOne[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength-1]);
      isTransitionLinked[i] <== IsEqual()([nextStates[i], currStates[i+1]]);
      isTransitionLinked[i] * isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i];
  }
```

🤖 Prompt for AI Agents
In circom/circuits/common/email_addr_regex.circom around lines 34 to 57, the
arrays and loop guard for checking transition linkage stop at maxMatchBytes-2 so
the final link (between nextStates[maxMatchBytes-2] and
currStates[maxMatchBytes-1]) is not enforced when matchLength == maxMatchBytes;
extend the arrays and adjust the loop condition so the last link is validated:
declare isWithinPathLengthMinusOne and isTransitionLinked with size
maxMatchBytes-1 (instead of maxMatchBytes-2 and maxMatchBytes), change the
in-loop guard to if (i < maxMatchBytes-1) and compute
isWithinPathLengthMinusOne[i] with LessThan(log2Ceil(maxMatchBytes))([i,
matchLength-1]) then set isTransitionLinked[i] <== IsEqual()([nextStates[i],
currStates[i+1]]) and constrain isTransitionLinked[i] *
isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i] so the final
transition is enforced for full-length matches.
```rust
} else if i + 2 < chars.len() && chars[i + 2] == '<' {
    // Could be: (?<=...) positive lookbehind, or (?<!...) negative lookbehind, or (?<name>...) PCRE named capture
    if i + 3 < chars.len() && (chars[i + 3] == '=' || chars[i + 3] == '!') {
        // Lookbehind assertion: (?<=...) or (?<!...)
        // These are special groups, preserve as-is
        result.push(ch);
    } else {
        // PCRE named capture: (?<name>...)
        // Convert to non-capturing and skip the name
        result.push_str("(?:");
        i += 2; // Skip '?' and '<'
        // Skip until we find the closing '>'
        while i + 1 < chars.len() && chars[i + 1] != '>' {
            i += 1;
        }
        if i + 1 < chars.len() && chars[i + 1] == '>' {
            i += 1; // Skip the '>'
        }
    }
} else if i + 3 < chars.len() && chars[i + 2] == 'P' && chars[i + 3] == '<' {
    // Rust named capture: (?P<name>...)
    // Convert to non-capturing and skip the name
    result.push_str("(?:");
    i += 3; // Skip '?', 'P', and '<'
    // Skip until we find the closing '>'
    while i + 1 < chars.len() && chars[i + 1] != '>' {
        i += 1;
    }
    if i + 1 < chars.len() && chars[i + 1] == '>' {
        i += 1; // Skip the '>'
    }
```
Validate named capture group syntax to prevent silent pattern corruption.
The code skips to the closing > for named captures but doesn't validate that it exists. Malformed patterns like (?<name... without a closing > will scan to the end of the pattern, potentially corrupting the output.
Consider adding validation:
```diff
 // Skip until we find the closing '>'
 while i + 1 < chars.len() && chars[i + 1] != '>' {
     i += 1;
 }
 if i + 1 < chars.len() && chars[i + 1] == '>' {
     i += 1; // Skip the '>'
+} else {
+    // Malformed named capture - no closing '>'
+    return format!("(?:MALFORMED_CAPTURE)");
+}
```

Or better yet, document that this function expects well-formed regex patterns and should be called after regex validation.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In compiler/src/utils.rs around lines 68 to 98, the named-capture skipping logic
does not verify a closing '>' and will run to the end of the input on malformed
patterns; update the logic to detect whether the loop found a '>' and handle the
failure explicitly: after the loop check if the terminating '>' was not found
and either (a) return an error/Result indicating a malformed regex so callers
can validate before calling, or (b) restore/emit the original characters (do not
drop them) and stop trying to skip the name (to avoid silently corrupting
output); implement one of these behaviors consistently for both PCRE (?<name>)
and Rust (?P<name>) branches and add a brief comment explaining the chosen
failure mode.
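As a concrete shape for that failure mode, here is a hedged standalone sketch of option (a): scanning the group name returns a `Result`, so an unterminated `(?<name` surfaces as an error instead of silently consuming the rest of the pattern. The helper name and signature are assumptions for illustration, not the actual `convert_capturing_to_non_capturing` internals.

```rust
/// Illustrative helper (names assumed, not from the crate): given the
/// index of the '<' that opens a named-capture header, return the index
/// just past the closing '>', or an error if the pattern ends first.
fn skip_group_name(chars: &[char], lt_pos: usize) -> Result<usize, String> {
    debug_assert_eq!(chars.get(lt_pos), Some(&'<'));
    let mut i = lt_pos + 1;
    // Scan forward looking for the terminating '>'.
    while i < chars.len() && chars[i] != '>' {
        i += 1;
    }
    if i < chars.len() {
        Ok(i + 1) // position of the first char after '>'
    } else {
        Err(format!(
            "malformed named capture: no closing '>' for group name starting at {lt_pos}"
        ))
    }
}
```

Both the PCRE `(?<name>...)` and Rust `(?P<name>...)` branches could call one helper like this, keeping the chosen failure mode in a single place.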
```rust
pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
    // Convert to PascalCase to normalize
    let pascal_name = name.to_pascal_case();

    // Verify it's valid PascalCase
    if pascal_name != name {
        return Err("Template name must be in PascalCase (e.g., ThisIsATemplate)".into());
    }

    Ok(name.to_string())
}
```
Validate against empty template names.
The function doesn't check for empty strings. An empty template name would pass validation since "".to_pascal_case() == "".
Add an early check:
```diff
 pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
+    if name.is_empty() {
+        return Err("Template name cannot be empty".into());
+    }
+
     // Convert to PascalCase to normalize
     let pascal_name = name.to_pascal_case();
```

📝 Committable suggestion
```rust
pub fn validate_cli_template_name(name: &str) -> Result<String, String> {
    if name.is_empty() {
        return Err("Template name cannot be empty".into());
    }

    // Convert to PascalCase to normalize
    let pascal_name = name.to_pascal_case();

    // Verify it's valid PascalCase
    if pascal_name != name {
        return Err("Template name must be in PascalCase (e.g., ThisIsATemplate)".into());
    }

    Ok(name.to_string())
}
```
🤖 Prompt for AI Agents
In compiler/src/utils.rs around lines 156 to 166, the validate_cli_template_name
function does not reject empty strings (""), which currently pass since
"".to_pascal_case() == ""; add an early check at the top to return Err("Template
name cannot be empty" or similar) if name.trim().is_empty(), then proceed with
the existing PascalCase normalization and comparison; ensure the error message
is consistent with other validation errors.
```rust
impl From<WasmError> for JsValue {
    fn from(err: WasmError) -> Self {
        // Return a structured error message that JavaScript can parse
        match err {
            WasmError::CompilerError { code, message } => {
                let error_json = format!(
                    r#"{{"type": "CompilerError", "code": "{}", "message": "{}"}}"#,
                    code,
                    message.replace('"', "\\\"").replace('\n', "\\n")
                );
                JsValue::from_str(&error_json)
            }
        }
    }
```
Use proper JSON serialization instead of manual string escaping.
The manual JSON construction with string replacement is fragile and doesn't handle all special characters (e.g., \r, \t, \, control characters). This could lead to invalid JSON or injection issues.
Use serde_json for safe serialization:
```diff
+use serde_json::json;
+
 impl From<WasmError> for JsValue {
     fn from(err: WasmError) -> Self {
-        // Return a structured error message that JavaScript can parse
         match err {
             WasmError::CompilerError { code, message } => {
-                let error_json = format!(
-                    r#"{{"type": "CompilerError", "code": "{}", "message": "{}"}}"#,
-                    code,
-                    message.replace('"', "\\\"").replace('\n', "\\n")
-                );
-                JsValue::from_str(&error_json)
+                let error_json = json!({
+                    "type": "CompilerError",
+                    "code": code,
+                    "message": message
+                });
+                JsValue::from_str(&error_json.to_string())
             }
         }
     }
 }
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In compiler/src/wasm.rs around lines 29 to 42, the From<WasmError> impl manually
constructs JSON with string replacements which is unsafe for many characters;
replace the manual formatting with proper serde_json serialization: derive or
build a small serializable struct or use serde_json::json! to produce a valid
JSON string, call serde_json::to_string (or to_string_pretty) and pass the
result to JsValue::from_str, and handle serialization errors (e.g., log and
return a generic error JSON) and add/import serde_json in Cargo.toml and at the
top of the file.
```json
{
  "parts": [
    { "Pattern": "[A-Za-z0-9!#$%&'*+=?\\-\\^_`{|}~./]+@" },
    { "PublicPattern": ["[A-Za-z0-9.\\-@]+", 128] }
```
Remove '@' from domain character class.
The PublicPattern includes @ in the character class [A-Za-z0-9.\\-@]+, but email domains should not contain @ symbols. Since Part 1 already ends with @, allowing it in the domain part would permit malformed emails like user@[email protected].
Apply this fix:
```diff
- { "PublicPattern": ["[A-Za-z0-9.\\-@]+", 128] }
+ { "PublicPattern": ["[A-Za-z0-9.\\-]+", 128] }
```

📝 Committable suggestion
```json
{ "PublicPattern": ["[A-Za-z0-9.\\-]+", 128] }
```
🤖 Prompt for AI Agents
In noir/common/email_domain.json around line 4, the PublicPattern character
class wrongly includes '@' allowing malformed domains; remove the '@' from the
class so the domain regex does not accept '@' characters, and ensure the hyphen
is either escaped or placed last in the class (e.g. [A-Za-z0-9.\-]+ or
[A-Za-z0-9.-]+) while keeping the same length limit (128).
```markdown
- [ ] support character class regex grammar (so as to simplify the regular expressions)
- [ ] better compatibility with regex grammar (will need a set of different tests to assert the compatibility between circuit and regex in languages) -->

## License

## Assumptions
Some email providers put not only the sender's email address but also their username to the From field.
ALthough its concrete formats differ among email providers, our FromAddrRegex template assumes that the email address appears at the end of the From field.
If this assumption does not hold, i.e., the username appears after the email address, an adversary can output an arbitrary email address from that template by including a dummy email address in the username.
This project is licensed under the [Specify License Here - e.g., MIT License or Apache 2.0].
```
Fix license placeholder.
The README now ends with “Specify License Here…”, which contradicts the actual MIT license declared elsewhere. Please replace the placeholder with the real license text or link so downstream users get the correct compliance information.
🤖 Prompt for AI Agents
In README.md around lines 229 to 232, the license section still contains a
placeholder; replace "Specify License Here - e.g., MIT License or Apache 2.0"
with the actual MIT license reference by updating the heading to "License" and
either inserting the short MIT license blurb plus year/owner or linking to the
full text (e.g., "This project is licensed under the MIT License - see LICENSE
file or https://opensource.org/licenses/MIT"), ensuring it matches the project's
LICENSE file.
…aystack and regex definition json
…roup and rerun regex, input and test generation
… and tests - fails input that enters in conditional path
The epsilon closure computation was incorrectly tracking capture groups globally, causing captures from unrelated conditional branches to leak into other paths. This resulted in incorrect capture behavior when conditionals preceded capture groups in regex patterns. Changes:
- Modified EpsilonClosure to use per_state_captures (BTreeMap) instead of global captures set
- Updated capture tracking to maintain separate capture sets for each epsilon path
- Fixed start state logic to check for start captures across all epsilon paths
- Regenerated Circom and Noir templates with corrected logic
- Updated test inputs to reflect correct capture behavior
- Added email_addr test case demonstrating conditional before capture group
…eneration Rutefig/reg 579 fix noir proof generation
Actionable comments posted: 1
♻️ Duplicate comments (2)
circom/circuits/common/email_addr_regex.circom (1)
34-61: Enforce the final transition link for full-length matches. When `matchLength == maxMatchBytes`, the loop never enters the guard with `i = maxMatchBytes - 2`, so `nextStates[maxMatchBytes-2]` is never constrained to equal `currStates[maxMatchBytes-1]`. That lets an inconsistent final hop satisfy all constraints, breaking soundness. Please extend the helper arrays and guard so the last link is also enforced.

```diff
- signal isWithinPathLengthMinusOne[maxMatchBytes-2];
- signal isTransitionLinked[maxMatchBytes];
+ signal isWithinPathLengthMinusOne[maxMatchBytes-1];
+ signal isTransitionLinked[maxMatchBytes-1];
@@
- if (i < maxMatchBytes-2) {
+ if (i < maxMatchBytes-1) {
      isWithinPathLengthMinusOne[i] <== LessThan(log2Ceil(maxMatchBytes))([i, matchLength-1]);
      isTransitionLinked[i] <== IsEqual()([nextStates[i], currStates[i+1]]);
      isTransitionLinked[i] * isWithinPathLengthMinusOne[i] === isWithinPathLengthMinusOne[i];
  }
```

compiler/src/ir/intermediate.rs (1)
63-109: Accept states validation is still missing.

The past review comment correctly identified that `accept_states` indices are not validated to be within bounds. While `start_states`, byte transition targets, and epsilon transition targets are validated, an out-of-bounds accept state could cause issues later.

Apply the suggested fix from the previous review:

```diff
     // Validate transition targets are within bounds
     for (state_idx, node) in self.nodes.iter().enumerate() {
         // Check byte transitions
         for destinations in node.byte_transitions.values() {
             for &dest in destinations {
                 if dest >= self.nodes.len() {
                     return Err(NFAError::InvalidTransition(format!(
                         "Byte transition from state {state_idx} to invalid state {dest}"
                     )));
                 }
             }
         }

         // Check epsilon transitions
         for &dest in &node.epsilon_transitions {
             if dest >= self.nodes.len() {
                 return Err(NFAError::InvalidTransition(format!(
                     "Epsilon transition from state {state_idx} to invalid state {dest}"
                 )));
             }
         }
     }
+
+    // Validate accept states are within bounds
+    for &accept in &self.accept_states {
+        if accept >= self.nodes.len() {
+            return Err(NFAError::InvalidStateId(format!(
+                "Accept state {accept} is out of bounds"
+            )));
+        }
+    }

     Ok(())
```
🧹 Nitpick comments (1)
compiler/src/ir/intermediate.rs (1)
432-438: Remove unused `PathInfo` struct (lines 432-438). Verification confirms this struct is never referenced elsewhere in the codebase. Remove the definition to eliminate dead code:

```diff
-/// Information about captures encountered on a specific epsilon path
-#[derive(Debug, Clone)]
-struct PathInfo {
-    /// The destination state of this path
-    target_state: usize,
-    /// Captures encountered on epsilon transitions along this path
-    captures: BTreeSet<(usize, bool)>,
-}
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
- .gitignore (1 hunks)
- CLAUDE.md (1 hunks)
- circom/circuits/common/email_addr_regex.circom (1 hunks)
- circom/circuits/tests/circuits/test_email_addr_regex.circom (1 hunks)
- circom/circuits/tests/circuits/test_subject_all_regex.circom (1 hunks)
- circom/circuits/tests/email_addr.test.ts (1 hunks)
- circom/circuits/tests/subject_all.test.ts (1 hunks)
- circom/regexes/email_addr.json (1 hunks)
- compiler/src/ir/intermediate.rs (1 hunks)
- noir/common/email_addr.json (1 hunks)
- noir/common/sample_haystacks/circuit_inputs/body_hash_pass_0.json (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- circom/circuits/tests/circuits/test_subject_all_regex.circom
🚧 Files skipped from review as they are similar to previous changes (3)
- noir/common/email_addr.json
- noir/common/sample_haystacks/circuit_inputs/body_hash_pass_0.json
- circom/regexes/email_addr.json
🧰 Additional context used
🧬 Code graph analysis (3)
circom/circuits/tests/subject_all.test.ts (1)
compiler/src/wasm.rs (1)
genCircuitInputs(198-217)
circom/circuits/tests/email_addr.test.ts (1)
compiler/src/wasm.rs (1)
genCircuitInputs(198-217)
compiler/src/ir/intermediate.rs (5)
compiler/src/backend/circom.rs (2)
nfa (229-231), nfa (233-235)

compiler/src/backend/noir.rs (3)
nfa (191-193), nfa (195-197), start (430-430)

compiler/src/ir/nfa.rs (2)
new (63-71), node (96-99)

compiler/src/lib.rs (1)
compile (39-41)

compiler/src/backend/shared.rs (2)
path (163-163), path (164-164)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test
- GitHub Check: test
🔇 Additional comments (11)
.gitignore (2)
1-19: Verify that `inputs.txt` and `nfa.json` are always generated artifacts and never committed source files. Search of the codebase found no references to either `inputs.txt` or `nfa.json` in code, scripts, or documentation, and neither file exists in the repository. While this is consistent with both being generated artifacts, it is inconclusive—these entries may be generated by external tools, planned for future use, or no longer needed. Confirm these files are artifacts generated during normal development/build workflows before finalizing the gitignore changes.
11-16: I need to check whether any `.js` files in the `scripts/` directories would actually be affected by these patterns, and examine the build configuration.
Based on my verification, I can now provide the rewritten review comment. The analysis reveals that while the patterns are appropriate for ignoring compiled TypeScript artifacts, there's a subtle scope issue with how the patterns work.
Verify nested directory handling—pattern scope may be too narrow.
The patterns `**/scripts/*.js` and `**/scripts/*.d.ts` only match files directly in `scripts/` directories, not in nested subdirectories like `scripts/utils/` or `scripts/__tests__/`. When there is a separator at the beginning or middle of the pattern, the pattern is relative to the directory level of the .gitignore file itself. Since compiled output from .ts files typically goes to `dist/` or similar separate directories rather than inline with source, the current patterns are appropriate for the legacy/temporary compiled artifacts. However, if the build process ever emits `.js` files within nested `scripts/` subdirectories, those would not be caught. Confirm that your TypeScript build configuration outputs to a directory outside the `scripts/` tree (e.g., `dist/`), or extend the patterns to `**/scripts/**/*.js` if needed.

CLAUDE.md (6)
210-239: API signatures verified—no issues found. All four documented Rust API entry points (`compile`, `gen_from_raw`, `gen_from_decomposed`, `gen_circuit_inputs`) have been verified against `compiler/src/lib.rs`. Each function signature, including parameter types and return types, matches the documentation in CLAUDE.md exactly. The `CompilerResult<T>` type alias is also correctly defined. Documentation is accurate.
39-51: All documented build commands align with configuration—no issues found. Verification confirms that all build commands referenced in CLAUDE.md are properly configured:

- `bun run build` and `bun run build-release` exist in package.json and correctly delegate to the compiler directory
- `cargo build` and `cargo build --release` are executed within the build scripts
- `wasm-pack build --target nodejs` is configured in compiler/package.json
- `wasm-pack build --target web` is also configured as an additional build target
- Cargo workspace properly includes compiler as a member
The documentation accurately reflects the build system configuration.
127-151: Pre-push hook implementation verified—no issues found. The documentation in CLAUDE.md is accurate. Verification confirms:

- Hook file exists at `git-hooks/pre-push` with full implementation
- Postinstall script correctly installs via the `install-hooks` command
- Template regeneration commands exist and are properly integrated
- Hook scans commits for `compiler/src/` changes, regenerates templates, validates synchronization, and blocks push when needed
- All described error recovery steps and emergency bypass are correct
153-161: The original review comment is incorrect—circom implementation of this workflow does not exist. Verification confirms:

- `gen-inputs:noir` script exists in scripts/package.json ✓
- `gen-inputs:circom` script does not exist (only `gen-regex:circom` is defined)
- `noir/common/sample_haystacks/` directory exists with JSON source files ✓
- `circom/common/` directory does not exist
- circom/scripts/ contains only `gen-regex.ts`, not `gen-inputs.ts`

The documentation references a test generation workflow for circom that is not implemented. Only the noir implementation has the complete input generation pipeline. Either remove the circom references from the documentation or implement the missing circom workflow.
Likely an incorrect or invalid review comment.
**56-67**: Disregard this review comment; the premise is incorrect.

The PR does not remove legacy Circom tests. The documentation in CLAUDE.md accurately reflects the current state: `test:circom` runs the existing Circom circuit tests (body_hash, email_addr, subject_all), which continue to exist in `circom/circuits/tests/`. No circom test files were modified or removed in this PR.

Likely an incorrect or invalid review comment.
**72-80**: The documentation is accurate.

`gen-inputs:circom` does not exist in package.json because input generation for Circom is not part of the new TypeScript/Bun architecture. The Noir framework has separate template generation (`gen-regex:noir`) and input generation (`gen-inputs:noir`) scripts, while Circom only has template generation (`gen-regex:circom`). CLAUDE.md correctly reflects this architectural choice by documenting only `gen-inputs:noir`. Listing a non-existent command would be inaccurate.

Likely an incorrect or invalid review comment.
compiler/src/ir/intermediate.rs (3)
**148-264**: Well-designed epsilon elimination algorithm.

The epsilon elimination implementation is sophisticated and carefully preserves capture group semantics. Notable strengths:
- Lines 189-196: The "CRITICAL FIX" correctly handles per-path captures for epsilon transitions
- Lines 221-243: Thoughtful handling of start states to avoid bypassing start captures
- Lines 199-207: Proper accumulation of end captures from epsilon closures
The algorithm correctly handles the complex interaction between epsilon transitions and capture groups, which is essential for regex patterns with optional groups.
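The closure step at the heart of this algorithm can be sketched as follows. This is a minimal model that assumes states are `usize` indices and epsilon edges live in a plain map; the PR's actual NFA types and capture bookkeeping are richer than this.

```rust
use std::collections::{BTreeSet, HashMap};

/// Iterative epsilon-closure: every state reachable from `start`
/// through epsilon edges alone (including `start` itself).
fn epsilon_closure(start: usize, epsilon: &HashMap<usize, Vec<usize>>) -> BTreeSet<usize> {
    let mut closure = BTreeSet::new();
    let mut stack = vec![start];
    while let Some(s) = stack.pop() {
        // `insert` returns false when `s` was already visited,
        // which guarantees termination on cyclic epsilon graphs.
        if closure.insert(s) {
            if let Some(targets) = epsilon.get(&s) {
                stack.extend(targets.iter().copied());
            }
        }
    }
    closure
}
```

Elimination then rewires each byte transition `p --b--> q` to target every member of `epsilon_closure(q)`, merging the capture events collected along each epsilon path — which is exactly where the per-path capture handling noted above matters.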
**440-496**: Excellent regression test coverage.

The test comprehensively validates the epsilon elimination fix for optional patterns before capture groups. It correctly:
- Tests both paths through the regex (with and without the optional prefix)
- Validates rejection of non-matching inputs
- Specifically checks for exactly one capture start event, preventing the duplicate marker bug
This is a high-quality regression test that will prevent future breakage of the epsilon elimination logic.
298-303: No issues found. The review comment is incorrect.The
capture_groupsmap only contains entries for epsilon transition targets by design. This is evident fromadd_capture_transition(builder.rs lines 209-230), which is the only place wherecapture_groupsentries are created. In that method, a target is inserted into bothepsilon_transitionsandcapture_groupstogether—there is no separate storage of byte transition captures incapture_groups.Therefore, the DFS function at lines 298-303 correctly iterates over all
capture_groupsentries because all such entries correspond to epsilon targets. The comment "Add captures from this state's epsilon transitions" is accurate.
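The invariant described here can be illustrated with a small builder sketch. Names mirror the ones quoted from `builder.rs`, but the types are simplified placeholders invented for illustration, not the PR's actual API.

```rust
use std::collections::{HashMap, HashSet};

/// (group id, true = capture start, false = capture end)
type CaptureEvent = (usize, bool);

#[derive(Default)]
struct NfaBuilder {
    epsilon_transitions: HashMap<usize, HashSet<usize>>,
    capture_groups: HashMap<usize, Vec<CaptureEvent>>,
}

impl NfaBuilder {
    /// A capture marker is always an epsilon transition: the target state
    /// is recorded in both maps together, so every `capture_groups` key
    /// is by construction an epsilon target.
    fn add_capture_transition(&mut self, from: usize, to: usize, group: usize, is_start: bool) {
        self.epsilon_transitions.entry(from).or_default().insert(to);
        self.capture_groups.entry(to).or_default().push((group, is_start));
    }
}
```

Because no other code path writes to `capture_groups` in this model, a DFS that iterates over `capture_groups` entries is exactly a walk over capture-bearing epsilon targets.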
> ## Error Handling
>
> Structured error system with error codes:
>
> - **E1001**: Invalid regex syntax
> - **E1002**: Unsupported regex features
> - **E3002**: Invalid capture group configuration
> - **E5001**: Invalid configuration parameters
🧩 Analysis chain
Verify error codes and CompilerError enum structure.
The documented error codes (E1001, E1002, E3002, E5001) should be verified against the actual CompilerError enum implementation. The gap between E3002 and E5001 suggests either:
- Additional error codes are missing from the documentation
- There is intentional error code grouping/categorization
Ensure documentation is complete and matches implementation.
🏁 Script executed:
#!/bin/bash
# Description: Extract error codes from CompilerError implementation
echo "=== CompilerError enum and error codes ==="
rg 'enum CompilerError|pub enum CompilerError' compiler/src/ -A 30 -U
echo -e "\n=== Error code constants or variants ==="
rg 'E\d{4}|"E[0-9]{4}"' compiler/src/error.rs 2>/dev/null || \
rg 'E\d{4}|"E[0-9]{4}"' compiler/src/ --type rust

Length of output: 3767
Update documentation to reflect all implemented error codes and their categories.

The documentation lists only 4 error codes (E1001, E1002, E3002, E5001), but the actual `ErrorCode` enum in `compiler/src/error.rs` defines 22 codes organized into 6 categories:
The documentation lists only 4 error codes (E1001, E1002, E3002, E5001), but the actual ErrorCode enum in compiler/src/error.rs contains 23 codes organized into 6 categories:
- E1xxx (Regex validation): E1001–E1004
- E2xxx (NFA construction): E2001–E2004
- E3xxx (Circuit generation): E3001–E3004
- E4xxx (Input processing): E4001–E4004
- E5xxx (Configuration): E5001–E5003
- E9xxx (Serialization/Internal): E9001–E9003
Update the CLAUDE.md documentation to include all codes and their intended purposes, matching the implementation.
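A categorized scheme like this is easy to keep self-consistent by deriving the category from the code string itself. The sketch below is purely illustrative — the variant names are invented and only the four codes quoted from the docs are shown; it is not the crate's actual `ErrorCode`:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorCode {
    InvalidRegexSyntax,   // E1001 (regex validation)
    UnsupportedFeature,   // E1002 (regex validation)
    InvalidCaptureConfig, // E3002 (circuit generation)
    InvalidConfigParams,  // E5001 (configuration)
}

impl ErrorCode {
    fn as_str(self) -> &'static str {
        match self {
            ErrorCode::InvalidRegexSyntax => "E1001",
            ErrorCode::UnsupportedFeature => "E1002",
            ErrorCode::InvalidCaptureConfig => "E3002",
            ErrorCode::InvalidConfigParams => "E5001",
        }
    }

    /// Category is the first digit: 1 = regex validation, 2 = NFA construction,
    /// 3 = circuit generation, 4 = input processing, 5 = configuration, 9 = internal.
    fn category(self) -> u8 {
        self.as_str().as_bytes()[1] - b'0'
    }
}
```

Deriving `category()` from `as_str()` means the documentation table and the enum can be cross-checked mechanically, which would prevent the docs/implementation drift flagged in this comment.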
🤖 Prompt for AI Agents
In CLAUDE.md around lines 245 to 252, the Error Handling section currently lists
only four error codes; update this section to enumerate all 23 implemented codes
and their categories to match compiler/src/error.rs. Replace the short list with
the six categories (E1xxx Regex validation: E1001–E1004 with brief purpose for
each, E2xxx NFA construction: E2001–E2004, E3xxx Circuit generation:
E3001–E3004, E4xxx Input processing: E4001–E4004, E5xxx Configuration:
E5001–E5003, E9xxx Serialization/Internal: E9001–E9003) and add a one-line
description for each code or consolidated description per category reflecting
the enum comments/intent.
Note
Replaces the old Rust compiler and extensive Circom tests with a new Bun/TypeScript scripts suite (gen-regex, gen-inputs), shared utilities, and Jest-based tests.
- New generator scripts: `circom/scripts/gen-regex.ts`, `noir/scripts/gen-regex.ts`, `noir/scripts/gen-inputs.ts`.
- Shared utilities (`utils/*`) for logging, file ops, subprocess, and types.
- Tooling configuration: `tsconfig.json`, `jest.config.ts`, `jest.setup.ts`, `bun.lock`, and `package.json`.
- Jest tests under `scripts/__tests__` for script workflows and utilities.
- Removes the old Rust compiler (`packages/compiler/*`).
- Removes the legacy Circom tests (`packages/circom/tests/*`).

Written by Cursor Bugbot for commit 3d37c31.
Summary by CodeRabbit