
Commit dbd1811

Update CLAUDE.md to reflect Rust 2024 edition and test improvements
1 parent b98a448 commit dbd1811

5 files changed (+208 additions, -96 deletions)

CLAUDE.md

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
# Claude Development Notes

## Project Overview

This is a Rust implementation of IDNA (Internationalized Domain Names in Applications) based on the C++ implementation from [ada-url/ada](https://github.com/ada-url/ada). The project aims to provide a zero-dependency, high-performance IDNA library for Rust.

## Implementation Details

### Core Components

- **`src/domain.rs`** - Main IDNA conversion functions (`to_ascii`, `to_unicode`)
- **`src/punycode.rs`** - RFC 3492 Punycode encoding/decoding (1:1 match with C++ implementation)
- **`src/mapping.rs`** - Character mapping and case folding
- **`src/normalization.rs`** - Unicode NFC normalization with composition tables
- **`src/validation.rs`** - Character and domain validation
- **`src/unicode.rs`** - UTF-8 ↔ UTF-32 conversion utilities
- **`src/unicode_tables.rs`** - Unicode lookup tables extracted from C++ implementation
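The notes only say that `src/unicode.rs` converts between UTF-8 and UTF-32. As a minimal sketch of what such helpers can look like using only the standard library (so it stays within the zero-dependency rule), assuming hypothetical names `utf8_to_utf32` and `utf32_to_utf8` that may not match the crate:

```rust
// Hypothetical helpers; names and signatures are illustrative, not the crate's API.

/// Decodes UTF-8 text into UTF-32 code points (one `u32` per Unicode scalar value).
fn utf8_to_utf32(input: &str) -> Vec<u32> {
    input.chars().map(u32::from).collect()
}

/// Encodes UTF-32 code points back into UTF-8.
/// Returns `None` if any value is not a Unicode scalar value
/// (a surrogate in U+D800..=U+DFFF, or anything above U+10FFFF).
fn utf32_to_utf8(input: &[u32]) -> Option<String> {
    input.iter().map(|&cp| char::from_u32(cp)).collect()
}
```

Collecting an iterator of `Option<char>` into `Option<String>` stops at the first invalid value, which gives a simple all-or-nothing conversion.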
### Key Implementation Notes

1. **Zero Dependencies**: No external crates are used. All Unicode processing is implemented manually.

2. **Punycode Implementation**: Exact 1:1 match with C++ [ada_idna.cpp](https://raw.githubusercontent.com/ada-url/ada/refs/heads/main/src/ada_idna.cpp) implementation:
   - Uses same constants (BASE=36, TMIN=1, TMAX=26, SKEW=38, DAMP=700, INITIAL_BIAS=72, INITIAL_N=128)
   - Identical algorithm flow and bias adaptation

3. **Unicode Tables**: Extracted from C++ implementation ([ada_idna.cpp](https://raw.githubusercontent.com/ada-url/ada/refs/heads/main/src/ada_idna.cpp)) with proper dimensions:
   - `DECOMPOSITION_BLOCK`: 67×257 elements
   - `CANONICAL_COMBINING_CLASS_BLOCK`: 67×257 elements
   - `COMPOSITION_BLOCK`: 67×257 elements (17,219 total)

4. **Unicode Normalization**: Complete NFC implementation matching C++ behavior:
   - Canonical decomposition
   - Canonical combining class ordering
   - Canonical composition using two-level lookup tables
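The Punycode constants and bias adaptation listed above are the RFC 3492 parameters. A minimal sketch of label encoding over UTF-32 code points follows; it is a generic RFC 3492 encoder, not the crate's `src/punycode.rs`, and the names `punycode_encode`, `adapt`, and `encode_digit` are hypothetical.

```rust
// Minimal RFC 3492 encoder sketch; hypothetical names, not the crate's actual API.
const BASE: u32 = 36;
const TMIN: u32 = 1;
const TMAX: u32 = 26;
const SKEW: u32 = 38;
const DAMP: u32 = 700;
const INITIAL_BIAS: u32 = 72;
const INITIAL_N: u32 = 128;

/// Bias adaptation (RFC 3492, section 6.1).
fn adapt(delta: u32, num_points: u32, first_time: bool) -> u32 {
    let mut delta = if first_time { delta / DAMP } else { delta / 2 };
    delta += delta / num_points;
    let mut k = 0;
    while delta > ((BASE - TMIN) * TMAX) / 2 {
        delta /= BASE - TMIN;
        k += BASE;
    }
    k + ((BASE - TMIN + 1) * delta) / (delta + SKEW)
}

/// Maps a digit value 0..=35 to 'a'..='z' then '0'..='9'.
fn encode_digit(d: u32) -> char {
    if d < 26 {
        (b'a' + d as u8) as char
    } else {
        (b'0' + (d - 26) as u8) as char
    }
}

/// Encodes one label given as UTF-32 code points; returns `None` on overflow.
fn punycode_encode(input: &[u32]) -> Option<String> {
    // Copy the basic (ASCII) code points first, then a '-' delimiter if any were copied.
    let mut output: String = input
        .iter()
        .filter(|&&c| c < 0x80)
        .map(|&c| c as u8 as char)
        .collect();
    let basic = output.len() as u32;
    if basic > 0 {
        output.push('-');
    }
    let (mut n, mut delta, mut bias, mut h) = (INITIAL_N, 0u32, INITIAL_BIAS, basic);
    while (h as usize) < input.len() {
        // Smallest code point that has not been handled yet.
        let m = *input.iter().filter(|&&c| c >= n).min()?;
        delta = delta.checked_add((m - n).checked_mul(h + 1)?)?;
        n = m;
        for &c in input {
            if c < n {
                delta = delta.checked_add(1)?;
            }
            if c == n {
                // Emit delta as a generalized variable-length integer.
                let mut q = delta;
                let mut k = BASE;
                loop {
                    let t = if k <= bias {
                        TMIN
                    } else if k >= bias + TMAX {
                        TMAX
                    } else {
                        k - bias
                    };
                    if q < t {
                        break;
                    }
                    output.push(encode_digit(t + (q - t) % (BASE - t)));
                    q = (q - t) / (BASE - t);
                    k += BASE;
                }
                output.push(encode_digit(q));
                bias = adapt(delta, h + 1, h == basic);
                delta = 0;
                h += 1;
            }
        }
        delta = delta.checked_add(1)?;
        n += 1;
    }
    Some(output)
}
```

For example, encoding the code points of `bücher` yields `bcher-kva`, which matches the `xn--bcher-kva.example` expectation in `tests/to_ascii_tests.rs` once the `xn--` prefix is added.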
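The canonical combining class ordering step of NFC can be sketched as follows: within each run of non-starters (combining class other than 0), marks are stably sorted by class. `combining_class` is a hypothetical stub standing in for a lookup into `CANONICAL_COMBINING_CLASS_BLOCK`, and its handful of entries only illustrate the technique; this is not the crate's code.

```rust
/// Hypothetical stand-in for the real combining-class table lookup;
/// it knows only a few combining marks, enough for illustration.
fn combining_class(cp: u32) -> u8 {
    match cp {
        0x0301 => 230, // COMBINING ACUTE ACCENT
        0x0308 => 230, // COMBINING DIAERESIS
        0x0323 => 220, // COMBINING DOT BELOW
        0x0327 => 202, // COMBINING CEDILLA
        _ => 0,        // treat everything else as a starter
    }
}

/// Canonical ordering: within each maximal run of non-starters (class != 0),
/// sort by combining class. The sort must be stable so that marks with
/// equal class keep their relative order.
fn canonical_order(cps: &mut [u32]) {
    let mut i = 0;
    while i < cps.len() {
        if combining_class(cps[i]) == 0 {
            i += 1;
            continue;
        }
        let start = i;
        while i < cps.len() && combining_class(cps[i]) != 0 {
            i += 1;
        }
        // `sort_by_key` on slices is a stable sort.
        cps[start..i].sort_by_key(|&cp| combining_class(cp));
    }
}
```

Starters act as barriers: marks are never moved across a code point with class 0.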
## Test Coverage

Comprehensive test suite covering:
- Basic IDNA conversion (`to_ascii_tests.rs`, `to_unicode_tests.rs`)
- Unicode identifier validation (`identifier_tests.rs`)
- Punycode encoding/decoding (`punycode_tests.rs`)
- Mapping and normalization (`mapping_tests.rs`, `normalization_tests.rs`)
- Web Platform Tests compatibility (`wpt_tests.rs`)

## Development Commands

**⚠️ IMPORTANT: Always run tests, formatter, and clippy before committing changes ⚠️**

```bash
# Build
cargo build

# Run tests (ALWAYS run before committing)
cargo test

# Lint (ALWAYS run before committing)
cargo clippy

# Format (ALWAYS run before committing)
cargo fmt
```

### Pre-commit Checklist
1. `cargo test` - All tests must pass
2. `cargo clippy` - No clippy warnings allowed
3. `cargo fmt` - Code must be properly formatted

## Current Status

**⚠️ INCOMPLETE IMPLEMENTATION ⚠️**

Known limitations:
- Some test cases may fail due to expected value discrepancies
- Unicode table data may need refinement
- Error handling needs improvement
- API subject to change

## Source References

- Original C++ header: https://raw.githubusercontent.com/ada-url/ada/refs/heads/main/include/ada/ada_idna.h
- Original C++ implementation: https://raw.githubusercontent.com/ada-url/ada/refs/heads/main/src/ada_idna.cpp
- Test cases adapted from: https://github.com/ada-url/idna/tree/main/tests

## Architecture Decisions

1. **Static vs Const Arrays**: Large Unicode tables use `static` instead of `const` to avoid stack overflow during compilation.

2. **UTF-32 Processing**: All Unicode processing is done in UTF-32 code points for simplicity and correctness.

3. **Error Handling**: Custom `IdnaError` enum for specific IDNA-related errors.

4. **Performance**: Optimized for common ASCII cases while maintaining full Unicode support.
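The "static tables" and "two-level lookup" ideas above can be illustrated with a toy combining-class table. This is a sketch under stated assumptions: the real `*_BLOCK` rows have 257 entries and their exact layout is not described in these notes, so the 256-entry rows, the `BLOCK_INDEX`/`BLOCKS`/`lookup_ccc` names, and the few values below are hypothetical and do not reproduce the crate's tables.

```rust
// Hypothetical two-level table for canonical combining classes.
// Level 1 (BLOCK_INDEX) maps the high bits of a code point (cp >> 8) to a row;
// level 2 (BLOCKS) holds 256 entries per row, selected by the low 8 bits.
// Rows can be shared, so ranges with no data all point at row 0.
// Both tables are `static`: each exists once in the binary and lookups read it
// in place, whereas a `const` item is inlined at every use site.

/// Covers only U+0000..=U+03FF in this toy version.
static BLOCK_INDEX: [u8; 4] = [0, 0, 0, 1];

static BLOCKS: [[u8; 256]; 2] = {
    let mut blocks = [[0u8; 256]; 2];
    // Row 1 covers U+0300..=U+03FF.
    blocks[1][0x01] = 230; // U+0301 COMBINING ACUTE ACCENT
    blocks[1][0x23] = 220; // U+0323 COMBINING DOT BELOW
    blocks[1][0x27] = 202; // U+0327 COMBINING CEDILLA
    blocks
};

fn lookup_ccc(cp: u32) -> u8 {
    match BLOCK_INDEX.get((cp >> 8) as usize) {
        Some(&row) => BLOCKS[row as usize][(cp & 0xFF) as usize],
        None => 0, // outside the range this toy table covers
    }
}
```

Sharing an all-zero row is what keeps such tables compact: most 256-code-point ranges have no combining marks at all.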
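"Optimized for common ASCII cases" usually means a fast path that skips mapping, normalization, and Punycode when they cannot change the input. A minimal sketch, assuming a hypothetical `ascii_fast_path` helper that is not part of the crate's documented API; a full UTS #46 implementation would still apply its validation rules to ASCII labels.

```rust
/// Hypothetical fast path for `to_ascii`-style processing.
/// Returns `Some(lowercased)` when the domain is plain ASCII with no Punycode
/// labels; returns `None` when the caller must run the full Unicode pipeline
/// (mapping, NFC normalization, Punycode, validation).
fn ascii_fast_path(domain: &str) -> Option<String> {
    // Any non-ASCII byte means mapping and Punycode encoding are needed.
    if !domain.is_ascii() {
        return None;
    }
    let lower = domain.to_ascii_lowercase();
    // Labels that are already Punycode ("xn--") must still be decoded and
    // validated, so they also go down the slow path.
    if lower.split('.').any(|label| label.starts_with("xn--")) {
        return None;
    }
    Some(lower)
}
```

`str::is_ascii` scans bytes without decoding UTF-8, so the check itself is cheap for the common case.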
## Build Configuration

- **Target**: Rust 2024 edition
- **Dependencies**: None (zero-dependency implementation)
- **Features**: No optional features
- **Minimum Rust Version**: 1.85+ (for Rust 2024 edition)

tests/mapping_tests.rs

Lines changed: 22 additions & 15 deletions
@@ -55,10 +55,9 @@ fn test_map_unicode_normalization() {
         ("ø", "ø"),
     ];

-    for (input, _expected) in test_cases {
+    for (input, expected) in test_cases {
         let result = mapping::map(input);
-        println!("Unicode mapping: '{}' -> '{}'", input, result);
-        // Note: Exact expected values may need adjustment based on normalization rules
+        assert_eq!(result, expected, "Unicode mapping mismatch for '{}'", input);
     }
 }

@@ -78,10 +77,9 @@ fn test_map_case_folding() {
         ("Ελληνικά", "ελληνικά"),
     ];

-    for (input, _expected) in test_cases {
+    for (input, expected) in test_cases {
         let result = mapping::map(input);
-        println!("Case folding: '{}' -> '{}'", input, result);
-        // Note: Some Unicode case folding rules are complex
+        assert_eq!(result, expected, "Case folding mismatch for '{}'", input);
     }
 }

@@ -103,10 +101,13 @@ fn test_map_special_characters() {
         ), // Multiple handling
     ];

-    for (input, _expected) in test_cases {
+    for (input, expected) in test_cases {
         let result = mapping::map(input);
-        println!("Special chars: '{}' -> '{}'", input, result);
-        // Note: Expected behavior may vary based on IDNA mapping rules
+        assert_eq!(
+            result, expected,
+            "Special chars mapping mismatch for '{}'",
+            input
+        );
     }
 }

@@ -168,10 +169,13 @@ fn test_map_international_scripts() {
         ("ไทย", "ไทย"),
     ];

-    for (input, _expected) in test_cases {
+    for (input, expected) in test_cases {
         let result = mapping::map(input);
-        println!("International script: '{}' -> '{}'", input, result);
-        // Note: Scripts without case distinctions should remain unchanged
+        assert_eq!(
+            result, expected,
+            "International script mapping mismatch for '{}'",
+            input
+        );
     }
 }

@@ -189,9 +193,12 @@ fn test_map_bidirectional_characters() {
         ("test\u{202c}example", "testexample"),
     ];

-    for (input, _expected) in test_cases {
+    for (input, expected) in test_cases {
         let result = mapping::map(input);
-        println!("Bidirectional: '{}' -> '{}'", input, result);
-        // Note: Expected behavior depends on IDNA mapping rules for bidi chars
+        assert_eq!(
+            result, expected,
+            "Bidirectional mapping mismatch for '{}'",
+            input
+        );
     }
 }

tests/to_ascii_tests.rs

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ fn test_to_ascii_basic() {
         ("bücher.example", "xn--bcher-kva.example"),
     ];

-    for (input, _expected) in test_cases {
+    for (input, expected) in test_cases {
         let result = to_ascii(input);
         assert!(
             result.is_ok(),
@@ -82,7 +82,7 @@ fn test_to_ascii_edge_cases() {
         ("simple.café.com", "simple.xn--caf-dma.com"),
     ];

-    for (input, _expected) in test_cases {
+    for (input, expected) in test_cases {
         let result = to_ascii(input);
         assert!(
             result.is_ok(),
