Commit 77d4d20

v3.2: streaming, severity, obs-route support, CFWS tolerance (#51)
All v3.2 roadmap items. Fully additive — no breaking changes for v3.1 callers; two notable tolerance expansions (inputs previously rejected as invalid now parse) that existing "strict whitespace" validators may want to know about.

Streaming batch parsing:
- Parse::parseStream(iterable, string): Generator<ParsedEmailAddress> — lazy parsing that yields one typed address at a time. Each input item may itself contain multiple separator-delimited addresses. Use for large batches where holding every parsed result in memory is undesirable.

Severity classification:
- ValidationSeverity backed enum (Critical / Warning / Info) with stable string backing values.
- ParseErrorCode::severity() classifies every code. 13 codes are Warning (UTF-8 gating, C0/C1 controls, empty-quoted, FQDN, IP global-range, length limits, punycode conversion) — all structural/unparseable failures are Critical.
- ParsedEmailAddress::invalidSeverity() returns the derived severity or null when the address is valid.
- Rationale: callers can now distinguish "unparseable input" from "well-formed but policy-rejected" and choose to accept Warning-level failures in non-SMTP contexts.

RFC 5322 §4.4 obs-route support:
- ParseOptions::$allowObsRoute rule property (readonly, default false; enabled by default in rfc5322() and rfc2822() presets).
- withAllowObsRoute() fluent builder.
- New STATE_OBS_ROUTE absorbs `@host1,@host2:` source-route prefixes inside angle-addr, then resumes normal addr-spec parsing on the ':' terminator.
- Captured route is exposed as ParsedEmailAddress::$obsRoute (null when no route was consumed). Also present as the `obs_route` key on the legacy array output.
- Incomplete obs-route (`<@host>` with no colon before `>`) is flagged invalid with ParseErrorCode::IncompleteAddress.

CFWS tolerance (RFC 5322 §3.2.2):
- Folding whitespace is now absorbed via look-ahead in the whitespace handler at four positions that were previously rejected:
  * Trailing CFWS on local-part dot-atom: "local @domain"
  * Leading CFWS on domain dot-atom: "local@ domain"
  * Leading CFWS inside angle-addr: "< local@domain>"
  * Trailing CFWS inside angle-addr before '>': "<local@domain >"
- Folded whitespace (LF + WSP) is handled as part of the same run.
- Comments in these positions were already supported in v3.0.
- The look-ahead is positional: only whitespace directly preceding an '@', '>', or the first atext inside an angle-addr is absorbed. Whitespace in other positions (e.g. between atext tokens in a dot-atom) still errors per existing behavior.

Tests: 42 tests / 445 assertions (up from 36 / 426 in v3.1). Project coverage: 89.61% lines (up from 88.78%).

Docs: CHANGELOG v3.2.0, UPGRADE v3.1 → v3.2 with migration notes for the CFWS and obs-route tolerance changes, ROADMAP items flipped to [x], README updated with parseStream example and allowObsRoute rule property.

Tooling: composer stan now runs with --memory-limit=512M to accommodate the larger codebase.
1 parent e3236cd commit 77d4d20

11 files changed

Lines changed: 567 additions & 28 deletions

CHANGELOG.md

Lines changed: 21 additions & 0 deletions
@@ -6,6 +6,27 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 
 ## [Unreleased]
 
+## [3.2.0]
+
+Streaming batch parsing, severity classification for validation errors, RFC 5322 §4.4 obs-route support, and broader CFWS tolerance around addr-spec boundaries. All additions are non-breaking for v3.1 callers.
+
+### Added
+- `Parse::parseStream(iterable, string): Generator<ParsedEmailAddress>` — lazy batch parsing that yields one typed address at a time, reducing memory footprint for large inputs (CSV rows, pipelines, etc.). Each input item may itself contain multiple separator-delimited addresses.
+- `ValidationSeverity` backed enum with `Critical`, `Warning`, `Info` cases. Callers can distinguish structural parse failures (Critical) from policy violations where the address is syntactically well-formed (Warning) to accept soft failures in non-SMTP contexts.
+- `ParseErrorCode::severity(): ValidationSeverity` — every error code is now classified. 13 codes are Warning (UTF-8 rejection, C0/C1 controls, empty-quoted, FQDN requirement, IP global-range, length limits, punycode conversion); all others are Critical.
+- `ParsedEmailAddress::invalidSeverity(): ?ValidationSeverity` — derived from `invalidReasonCode`; returns `null` when the address is valid.
+- RFC 5322 §4.4 obs-route support: `<@host1,@host2:user@host3>` source-route prefixes are recognized and stripped; the real addr-spec becomes the parsed address. The route string is captured on `ParsedEmailAddress::$obsRoute`. Gated by `ParseOptions::$allowObsRoute` (default `false`; enabled in `rfc5322()` and `rfc2822()`).
+- `ParseOptions::$allowObsRoute` property and `withAllowObsRoute()` fluent builder.
+- `obs_route` field on the array output of `Parse::parse()` (populated when an obs-route is consumed; `null` otherwise).
+
+### Changed
+- RFC 5322 §3.2.2 CFWS: folding whitespace is now absorbed at dot-atom boundaries and around angle-addr delimiters via look-ahead in the whitespace handler. Previously-rejected inputs like `local @domain.com`, `local@ domain.com`, `< local@domain.com >`, `<local @ domain.com>`, and multi-line folded whitespace now parse successfully.
+- Parser internal: added `STATE_OBS_ROUTE` state for absorbing obs-route prefixes; added `in_angle_addr` and `obs_route` tracking fields to the internal email-address accumulator.
+- `composer stan` now runs with `--memory-limit=512M` to accommodate the larger codebase.
+
+### Fixed
+- None — no behavior regressions; only additions and tolerance expansions.
+
 ## [3.1.0]
 
 Immutable `ParseOptions`, typed value-object output, structured error codes, and two new validation rules. All additions are non-breaking for v3.0 callers; readonly rule properties are a hard cutover for code that was mutating them directly (the factory methods and deprecated setters continue to work).
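
The severity classification described in the 3.2.0 entry can be pictured with a small standalone sketch. The names `DemoSeverity`, `DemoErrorCode`, its cases (other than the idea of an incomplete-address code), and the string backing values are hypothetical stand-ins chosen for illustration; they are not the library's `ValidationSeverity` or `ParseErrorCode`. Only the split between Warning (policy rejection of well-formed input) and Critical (structural failure) follows the changelog text.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a severity enum with string backing values.
enum DemoSeverity: string
{
    case Critical = 'critical';
    case Warning = 'warning';
    case Info = 'info';
}

// Hypothetical error codes; the real library's case names may differ.
enum DemoErrorCode: string
{
    case IncompleteAddress = 'incomplete_address';
    case LengthLimitExceeded = 'length_limit_exceeded';
    case FqdnRequired = 'fqdn_required';

    public function severity(): DemoSeverity
    {
        return match ($this) {
            // Policy-style rejections of otherwise well-formed input.
            self::LengthLimitExceeded, self::FqdnRequired => DemoSeverity::Warning,
            // Everything else is treated as a structural failure.
            default => DemoSeverity::Critical,
        };
    }
}

// Accept valid input (null code) and Warning-level failures; reject Critical ones.
function isAcceptable(?DemoErrorCode $code): bool
{
    return $code === null || $code->severity() === DemoSeverity::Warning;
}
```

Keeping the classification on the error-code enum means a caller only needs the code to decide, which matches the description of `invalidSeverity()` as being derived from `invalidReasonCode`.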

README.md

Lines changed: 7 additions & 0 deletions
@@ -45,6 +45,12 @@ if ($address->invalid) {
 
 $result = Parse::getInstance()->parseMultiple('a@a.com, b@b.com');
 foreach ($result->emailAddresses as $addr) { /* ... */ }
+
+// Streaming for large batches (v3.2+) — yields one address at a time.
+foreach (Parse::getInstance()->parseStream($csvRows) as $addr) {
+    if ($addr->invalid) continue;
+    // ...
+}
 ```
 
 ### Advanced Usage with ParseOptions
@@ -166,6 +172,7 @@ $parser = new Parse(null, $options);
 | `applyNfcNormalization` | `false` | Apply NFC Unicode normalization (RFC 6532 §3.1) |
 | `validateDisplayNamePhrase` | `false` | Enforce RFC 5322 §3.2.5 phrase syntax on unquoted display names |
 | `strictIdna` | `false` | Apply full IDNA2008 conformance on U-label domains (RFC 5891/5892/5893) |
+| `allowObsRoute` | `false` | Accept RFC 5322 §4.4 obs-route source-routes like `<@host1,@host2:user@host3>` |
 | **Length & Output** | | |
 | `enforceLengthLimits` | `true` | Enforce RFC 5321 length limits (64/254/63) |
 | `includeDomainAscii` | `false` | Include punycode `domain_ascii` in output |
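
The lazy, one-at-a-time behavior that the README example relies on can be sketched with a plain PHP generator. This is not the library's `parseStream()`: `streamCandidates()`, `demoRows()`, and the `$separators` parameter are hypothetical (the diff does not say what the second `string` argument of `parseStream` means), and the naive split below ignores quoted strings and display names that contain a separator.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: lazily splits each input item on separator characters
 * and yields one non-empty candidate address string at a time.
 *
 * @param iterable<string> $items
 * @return Generator<int, string>
 */
function streamCandidates(iterable $items, string $separators = ',;'): Generator
{
    $pattern = '/[' . preg_quote($separators, '/') . ']/';
    foreach ($items as $item) {
        foreach (preg_split($pattern, $item) as $piece) {
            $piece = trim($piece);
            if ($piece !== '') {
                yield $piece; // nothing is buffered beyond the current item
            }
        }
    }
}

// Usage: the rows come from a generator too, so neither the input nor the
// output is ever held in memory as a whole.
function demoRows(): Generator
{
    yield 'a@a.com, b@b.com';
    yield 'c@c.com';
}

foreach (streamCandidates(demoRows()) as $candidate) {
    echo $candidate, PHP_EOL;
}
```

Because both sides are generators, memory use is bounded by the largest single input item rather than by the size of the batch.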

ROADMAP.md

Lines changed: 10 additions & 8 deletions
@@ -38,22 +38,24 @@ Future plans by version. Items here are intent, not commitment — priority and
 - [x] `strictIdna: bool` — apply full IDNA2008 conformance (`IDNA_USE_STD3_RULES | IDNA_CHECK_BIDI | IDNA_CHECK_CONTEXTJ | IDNA_NONTRANSITIONAL_TO_ASCII`) per RFC 5891/5892/5893. Enabled by default in `rfc6531()`.
 - [x] Extended test coverage: 265 assertions (target: 250+).
 
-## v3.2 — Streaming, Severity Levels, Obsolete Syntax
+## v3.2 — Streaming, Severity Levels, Obsolete Syntax — shipped
 
 **Batch streaming:**
-- [ ] `parseStream(iterable): Generator`: yield `ParsedEmailAddress` one at a time for large email lists, reducing memory footprint.
+- [x] `Parse::parseStream(iterable, string): Generator<ParsedEmailAddress>`: yields one typed address at a time; each input item may itself contain multiple separator-delimited addresses.
 
 **Validation severity levels:**
-- [ ] Add a `ValidationSeverity` enum (`Critical`, `Warning`, `Info`) attached to each parsed address — allows callers to accept "soft" failures while rejecting hard ones.
+- [x] `ValidationSeverity` enum with `Critical`, `Warning`, `Info` cases.
+- [x] `ParseErrorCode::severity()` method classifying every code (13 Warning, rest Critical).
+- [x] `ParsedEmailAddress::invalidSeverity()` accessor returning the derived severity (or `null` when valid).
 
 **Obsolete syntax extensions (RFC 5322 §4):**
 
-> Note: `obs-local-part` is already supported via `allowObsLocalPart` in v3.0. The items below cover the remaining obsolete forms.
+> Note: `obs-local-part` was already supported via `allowObsLocalPart` in v3.0.
 
-- [ ] `obs-route` handling for the `rfc5322()` preset.
-- [ ] CFWS (comments / folding whitespace) improvements.
-- [ ] `obs-angle-addr` support.
-- [ ] `obs-domain-list` syntax for the `rfc2822()` preset.
+- [x] `obs-route` handling: `ParseOptions::$allowObsRoute` gates acceptance of `<@host1,@host2:user@host3>` source-route prefixes; the route is captured on `ParsedEmailAddress::$obsRoute`. Enabled by default in `rfc5322()` and `rfc2822()`.
+- [x] `obs-angle-addr` — implied by obs-route support (it is the outer `[CFWS] "<" obs-route addr-spec ">" [CFWS]` form).
+- [x] `obs-domain-list` — the `*("," [CFWS] ["@" domain])` shape is consumed inside `STATE_OBS_ROUTE`.
+- [x] CFWS (comments / folding whitespace) improvements — look-ahead in the whitespace handler now absorbs CFWS at dot-atom boundaries (`local @domain`, `local@ domain`, `local @ domain`) and around angle-addr delimiters (`< local@domain >`, `<local @ domain>`), including folded whitespace (LF + WSP). Comments in these positions were already supported in v3.0.
 
 ## v4.0 — Breaking Modernization

UPGRADE.md

Lines changed: 39 additions & 0 deletions
@@ -1,5 +1,44 @@
 # Upgrade Guide
 
+## v3.1 → v3.2
+
+v3.2 is fully additive — no breaking changes. Two behavior changes are worth noting for callers who depended on them:
+
+### Behavior Changes (Tolerance Expansions)
+
+**CFWS around `@` and inside `<…>` is now accepted.** The v3.1 parser rejected these inputs as "Email address contains whitespace"; v3.2 treats them as RFC 5322 §3.2.2 folding whitespace:
+
+```php
+// All of these now parse successfully (v3.2+):
+'local @domain.com' // trailing CFWS on local-part
+'local@ domain.com' // leading CFWS on domain
+'local @ domain.com' // both
+'< local@domain.com >' // inside angle-addr
+'<local @ domain.com>' // both, inside angle-addr
+"local\n\t@domain.com" // folded whitespace
+```
+
+If your code validated that addresses are "tight" (no whitespace), re-check with the v3.2 definition — these now register as `invalid=false`.
+
+**Obs-route `<@host:addr>` is accepted in `rfc5322()` and `rfc2822()` presets.** Previously rejected as "Invalid character in domain"; now recognized, stripped, and the real addr-spec is exposed. The captured route is available as `$parsed->obsRoute`. Disabled in `rfc5321()` and legacy defaults — no change there. To opt out, call `->withAllowObsRoute(false)` on the preset.
+
+### Additions (Non-Breaking)
+
+- **`Parse::parseStream(iterable, string): Generator`** — lazy batch parsing. Use it for large inputs where holding every `ParsedEmailAddress` in memory is undesirable.
+- **`ValidationSeverity` enum**: `Critical` / `Warning` / `Info`. Access via `$parsed->invalidSeverity()` or `$errorCode->severity()`. Use it to distinguish "unparseable" from "policy-rejected but well-formed":
+  ```php
+  if ($parsed->invalid && $parsed->invalidSeverity() === ValidationSeverity::Warning) {
+      // Well-formed address rejected by a configured rule (UTF-8, FQDN, IP range, length).
+      // Safe to accept in non-SMTP contexts if desired.
+  }
+  ```
+- **`ParsedEmailAddress::$obsRoute`** — captured obs-route prefix (e.g. `@hostA,@hostB`) when one was stripped. `null` for normal addresses.
+- **`ParseOptions::$allowObsRoute`** (readonly) + `withAllowObsRoute()` builder.
+
+### Minimum Requirements (Unchanged)
+
+PHP `^8.1`, `ext-mbstring`, `ext-intl`.
+
 ## v3.0 → v3.1
 
 v3.1 is additive with one hard cutover: the 15 `ParseOptions` rule properties are now `readonly`. Factory presets and the deprecated setters still work. Everything else is new and non-breaking.
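
To make the obs-route migration note concrete, here is a minimal standalone sketch of what "recognized, stripped, and the real addr-spec is exposed" means for an input such as `<@host1,@host2:user@host3>`. It is a simplified regex, not the library's `STATE_OBS_ROUTE` state machine: `splitObsRoute()` and its return shape are hypothetical, and it does not handle CFWS or the empty list elements that the full RFC 5322 §4.4 grammar permits.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: splits "<@host1,@host2:user@host3>" into its route
 * prefix and the remaining addr-spec. Returns null when no complete
 * obs-route (terminated by ':') is present.
 *
 * @return array{route: string, addrSpec: string}|null
 */
function splitObsRoute(string $angleAddr): ?array
{
    // Route = one or more "@domain" entries separated by commas, then ':'.
    $pattern = '/^<\s*(@[^,:<>@\s]+(?:,@[^,:<>@\s]+)*):([^<>]+)>$/';
    if (preg_match($pattern, $angleAddr, $m) !== 1) {
        return null;
    }
    return ['route' => $m[1], 'addrSpec' => trim($m[2])];
}
```

An input like `<@host>` has no `:` terminator, so the sketch returns `null`; the commit message describes the parser flagging that same shape as an incomplete address.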

composer.json

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@
         "test:coverage": "phpunit --coverage-html coverage",
         "cs:check": "php-cs-fixer fix --dry-run --diff",
         "cs:fix": "php-cs-fixer fix",
-        "stan": "phpstan analyse",
+        "stan": "phpstan analyse --memory-limit=512M",
         "ci": [
             "@cs:check",
             "@stan",
