|
| 1 | +# Claude Development Notes |
| 2 | + |
| 3 | +## Project Overview |
| 4 | + |
| 5 | +This is a Rust implementation of IDNA (Internationalized Domain Names in Applications) based on the C++ implementation from [ada-url/ada](https://github.com/ada-url/ada). The project aims to provide a zero-dependency, high-performance IDNA library for Rust. |
| 6 | + |
| 7 | +## Implementation Details |
| 8 | + |
| 9 | +### Core Components |
| 10 | + |
| 11 | +- **`src/domain.rs`** - Main IDNA conversion functions (`to_ascii`, `to_unicode`) |
| 12 | +- **`src/punycode.rs`** - RFC 3492 Punycode encoding/decoding (1:1 match with C++ implementation) |
| 13 | +- **`src/mapping.rs`** - Character mapping and case folding |
| 14 | +- **`src/normalization.rs`** - Unicode NFC normalization with composition tables |
| 15 | +- **`src/validation.rs`** - Character and domain validation |
| 16 | +- **`src/unicode.rs`** - UTF-8 ↔ UTF-32 conversion utilities |
| 17 | +- **`src/unicode_tables.rs`** - Unicode lookup tables extracted from the C++ implementation |
| 18 | + |
| 19 | +### Key Implementation Notes |
| 20 | + |
| 21 | +1. **Zero Dependencies**: No external crates are used. All Unicode processing is implemented manually. |
| 22 | + |
| 23 | +2. **Punycode Implementation**: Exact 1:1 match with the C++ [ada_idna.cpp](https://raw.githubusercontent.com/ada-url/ada/refs/heads/main/src/ada_idna.cpp) implementation: |
| 24 | + - Uses the same constants (BASE=36, TMIN=1, TMAX=26, SKEW=38, DAMP=700, INITIAL_BIAS=72, INITIAL_N=128) |
| 25 | + - Identical algorithm flow and bias adaptation |
| 26 | + |
| 27 | +3. **Unicode Tables**: Extracted from the C++ implementation ([ada_idna.cpp](https://raw.githubusercontent.com/ada-url/ada/refs/heads/main/src/ada_idna.cpp)) with matching dimensions: |
| 28 | + - `DECOMPOSITION_BLOCK`: 67×257 elements |
| 29 | + - `CANONICAL_COMBINING_CLASS_BLOCK`: 67×257 elements |
| 30 | + - `COMPOSITION_BLOCK`: 67×257 elements (17,219 total) |
| 31 | + |
| 32 | +4. **Unicode Normalization**: Complete NFC implementation matching C++ behavior: |
| 33 | + - Canonical decomposition |
| 34 | + - Canonical combining class ordering |
| 35 | + - Canonical composition using two-level lookup tables |
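
The two-level lookup mentioned in items 3 and 4 can be illustrated with a toy table. This is a minimal sketch, not the layout used in `src/unicode_tables.rs`: the names `INDEX`, `BLOCKS`, `combining_marks_block`, and `combining_class` are hypothetical, the blocks here have 256 entries rather than 257, and only four combining marks are populated. The idea is that the high bits of a code point select a block and the low 8 bits select an entry inside it, so identical blocks can be shared.

```rust
/// Builds a toy 256-entry block for U+0300..U+03FF with a few
/// canonical combining class values filled in (all others are 0).
const fn combining_marks_block() -> [u8; 256] {
    let mut block = [0u8; 256];
    block[0x00] = 230; // U+0300 COMBINING GRAVE ACCENT
    block[0x01] = 230; // U+0301 COMBINING ACUTE ACCENT
    block[0x23] = 220; // U+0323 COMBINING DOT BELOW
    block[0x27] = 202; // U+0327 COMBINING CEDILLA
    block
}

/// Second stage: block 0 is the shared all-zero block, block 1 holds the marks.
/// Declared `static` so the data lives at a single address.
static BLOCKS: [[u8; 256]; 2] = [[0; 256], combining_marks_block()];

/// First stage: indexed by `cp >> 8`; covers U+0000..U+03FF in this toy.
static INDEX: [u8; 4] = [0, 0, 0, 1];

/// Looks up the combining class of `cp`; anything outside the toy range is 0.
fn combining_class(cp: u32) -> u8 {
    let high = (cp >> 8) as usize;
    if high >= INDEX.len() {
        return 0;
    }
    BLOCKS[INDEX[high] as usize][(cp & 0xFF) as usize]
}
```

Because three of the four first-stage entries point at the same zero block, the table stores two blocks instead of four; the real tables rely on the same sharing at a larger scale.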
| 36 | + |
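
The bias adaptation referred to in item 2 is defined in RFC 3492, section 6.1. The sketch below is written from the RFC using the constants listed above; it is not a copy of `src/punycode.rs`, and the function name `adapt` and its signature are assumptions.

```rust
const BASE: u32 = 36;
const TMIN: u32 = 1;
const TMAX: u32 = 26;
const SKEW: u32 = 38;
const DAMP: u32 = 700;

/// Bias adaptation from RFC 3492, section 6.1.
/// `num_points` is the number of code points handled so far and must be non-zero.
/// `first_time` is true only for the first delta of a label.
fn adapt(mut delta: u32, num_points: u32, first_time: bool) -> u32 {
    // Scale the delta down: heavily the first time, by half afterwards.
    delta = if first_time { delta / DAMP } else { delta / 2 };
    // Compensate for the string growing by one code point.
    delta += delta / num_points;

    // Divide out whole digits until delta falls within the threshold.
    let mut k = 0;
    while delta > ((BASE - TMIN) * TMAX) / 2 {
        delta /= BASE - TMIN;
        k += BASE;
    }
    k + (BASE - TMIN + 1) * delta / (delta + SKEW)
}
```

An encoder or decoder would start with a bias of 72 (`INITIAL_BIAS`) and replace it with the result of `adapt` after each delta is processed.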
| 37 | +## Test Coverage |
| 38 | + |
| 39 | +The test suite covers: |
| 40 | +- Basic IDNA conversion (`to_ascii_tests.rs`, `to_unicode_tests.rs`) |
| 41 | +- Unicode identifier validation (`identifier_tests.rs`) |
| 42 | +- Punycode encoding/decoding (`punycode_tests.rs`) |
| 43 | +- Mapping and normalization (`mapping_tests.rs`, `normalization_tests.rs`) |
| 44 | +- Web Platform Tests compatibility (`wpt_tests.rs`) |
| 45 | + |
| 46 | +## Development Commands |
| 47 | + |
| 48 | +**⚠️ IMPORTANT: Always run tests, formatter, and clippy before committing changes ⚠️** |
| 49 | + |
| 50 | +```bash |
| 51 | +# Build |
| 52 | +cargo build |
| 53 | + |
| 54 | +# Run tests (ALWAYS run before committing) |
| 55 | +cargo test |
| 56 | + |
| 57 | +# Lint (ALWAYS run before committing) |
| 58 | +cargo clippy |
| 59 | + |
| 60 | +# Format (ALWAYS run before committing) |
| 61 | +cargo fmt |
| 62 | +``` |
| 63 | + |
| 64 | +### Pre-commit Checklist |
| 65 | +1. `cargo test` - All tests must pass |
| 66 | +2. `cargo clippy` - No clippy warnings allowed |
| 67 | +3. `cargo fmt` - Code must be properly formatted |
| 68 | + |
| 69 | +## Current Status |
| 70 | + |
| 71 | +**⚠️ INCOMPLETE IMPLEMENTATION ⚠️** |
| 72 | + |
| 73 | +Known limitations: |
| 74 | +- Some test cases may fail due to expected value discrepancies |
| 75 | +- Unicode table data may need refinement |
| 76 | +- Error handling needs improvement |
| 77 | +- API subject to change |
| 78 | + |
| 79 | +## Source References |
| 80 | + |
| 81 | +- Original C++ header: https://raw.githubusercontent.com/ada-url/ada/refs/heads/main/include/ada/ada_idna.h |
| 82 | +- Original C++ implementation: https://raw.githubusercontent.com/ada-url/ada/refs/heads/main/src/ada_idna.cpp |
| 83 | +- Test cases adapted from: https://github.com/ada-url/idna/tree/main/tests |
| 84 | + |
| 85 | +## Architecture Decisions |
| 86 | + |
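| 87 | +1. **Static vs Const Arrays**: Large Unicode tables are declared `static` rather than `const`. A `static` has a single memory location, whereas a `const` is inlined at every use site; using `static` avoids stack overflow during compilation. |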
| 88 | + |
| 89 | +2. **UTF-32 Processing**: All Unicode processing is done in UTF-32 code points for simplicity and correctness. |
| 90 | + |
| 91 | +3. **Error Handling**: Custom `IdnaError` enum for specific IDNA-related errors. |
| 92 | + |
| 93 | +4. **Performance**: Optimized for common ASCII cases while maintaining full Unicode support. |
| 94 | + |
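
Decisions 2 through 4 can be sketched together. Everything named here is hypothetical: the variants of `IdnaError`, and the helpers `to_code_points`, `from_code_points`, and `normalize_label`, are illustrative assumptions rather than the crate's actual API. The sketch shows a custom error enum, conversion between UTF-8 and UTF-32 code points, and a fast path that skips the Unicode pipeline for all-ASCII input.

```rust
/// Hypothetical error type; the real `IdnaError` variants may differ.
#[derive(Debug, PartialEq, Eq)]
pub enum IdnaError {
    EmptyLabel,
    InvalidCodePoint(u32),
}

/// Decodes a UTF-8 string into UTF-32 code points.
pub fn to_code_points(input: &str) -> Vec<u32> {
    input.chars().map(u32::from).collect()
}

/// Encodes UTF-32 code points back to UTF-8, rejecting surrogates
/// and values above U+10FFFF.
pub fn from_code_points(code_points: &[u32]) -> Result<String, IdnaError> {
    code_points
        .iter()
        .map(|&cp| char::from_u32(cp).ok_or(IdnaError::InvalidCodePoint(cp)))
        .collect()
}

/// Illustrative label handling with an ASCII fast path.
pub fn normalize_label(label: &str) -> Result<String, IdnaError> {
    if label.is_empty() {
        return Err(IdnaError::EmptyLabel);
    }
    // Fast path: pure ASCII only needs lowercasing in this sketch.
    // A real implementation would still have to validate `xn--` labels.
    if label.is_ascii() {
        return Ok(label.to_ascii_lowercase());
    }
    // Slow path: work on code points; mapping and NFC would run here.
    let code_points = to_code_points(label);
    from_code_points(&code_points)
}
```

Checking `is_ascii()` first means the common case never allocates a code point buffer, while non-ASCII labels still go through the full UTF-32 path.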
| 95 | +## Build Configuration |
| 96 | + |
| 97 | +- **Target**: Rust 2024 edition |
| 98 | +- **Dependencies**: None (zero-dependency implementation) |
| 99 | +- **Features**: No optional features |
| 100 | +- **Minimum Rust Version**: 1.85 (required for the Rust 2024 edition) |