Replace hand-rolled Base64Decoder in System.Private.Xml with System.Buffers.Text.Base64

> [!NOTE]
> This issue was drafted with Copilot assistance.

Per [this comment](https://github.com/dotnet/runtime/pull/125930#issuecomment-4113519119) from @stephentoub, `System.Xml.Base64Decoder` has its own hand-rolled base64 decoding implementation that should be replaced with the modern `System.Buffers.Text.Base64` APIs.

## Current state

[`Base64Decoder.cs`](https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.Xml/src/System/Xml/Base64Decoder.cs) contains:
- A 123-byte lookup table mapping ASCII chars to base64 digit values
- Manual bit-shifting to assemble decoded bytes (`(b << 6) | digit`, extract when `bFilled >= 8`)
- Hand-rolled `=` padding consumption and post-padding whitespace validation
- Streaming state carried across calls via `_bits` / `_bitsFilled` fields
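
For readers unfamiliar with the technique, the accumulate-and-extract loop described above can be sketched roughly as follows. This is a simplified illustration, not the actual `Base64Decoder` source: the name `DecodeSketch` is hypothetical, `Alphabet.IndexOf` stands in for the lookup table, and padding validation and cross-call streaming state are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical, simplified sketch of the accumulate-and-extract technique.
// It is NOT the real Base64Decoder: padding validation and streaming are omitted.
static byte[] DecodeSketch(ReadOnlySpan<char> chars)
{
    const string Alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    var output = new List<byte>();
    int bits = 0;        // accumulator, plays the role of _bits
    int bitsFilled = 0;  // number of valid bits in the accumulator, like _bitsFilled

    foreach (char ch in chars)
    {
        if (ch is ' ' or '\t' or '\r' or '\n')
            continue; // whitespace is skipped inline
        if (ch == '=')
            break;    // padding ends the data (the real code validates what follows)

        int digit = Alphabet.IndexOf(ch); // stands in for the lookup table
        if (digit < 0)
            throw new FormatException($"Invalid base64 character '{ch}'.");

        bits = (bits << 6) | digit; // append 6 new bits
        bitsFilled += 6;
        if (bitsFilled >= 8)
        {
            bitsFilled -= 8;
            output.Add((byte)((bits >> bitsFilled) & 0xFF)); // extract one full byte
        }
    }
    return output.ToArray();
}

byte[] decoded = DecodeSketch("SGVs bG8=");
Console.WriteLine(Encoding.ASCII.GetString(decoded)); // Hello
```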

The encoder side (`Base64Encoder.cs`) already delegates to `Convert.ToBase64CharArray` — the decoder is the outlier.

## Proposed change

Replace the core `Decode(ReadOnlySpan<char>, Span<byte>, ...)` logic with `System.Buffers.Text.Base64` APIs. The `OperationStatus`-based overloads (e.g., `Base64.DecodeFromUtf8`) support partial input/output natively, which maps well to the streaming `IncrementalReadDecoder` pattern, where the caller may supply more chars than fit in the output buffer.

Key considerations:
- The XML decoder skips whitespace inline; the BCL base64 APIs may need a pre-pass or use of an overload that handles whitespace.
- Error messages should identify the offending character (as fixed in #125930), not dump the whole buffer.
- The `IncrementalReadDecoder` contract requires tracking how many chars were consumed and bytes produced, which `OperationStatus` provides.
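
As a rough illustration of how `OperationStatus` reports consumed input and produced output when the destination is smaller than the decoded data, here is a minimal sketch built on `Base64.DecodeFromUtf8`. The helper name `DecodeInChunks` and the chunking loop are assumptions for illustration only; the real change would operate on chars and live inside the `IncrementalReadDecoder` contract rather than collecting into a list.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: decodes base64 (as UTF-8 bytes) through a deliberately small
// output buffer to show how consumed/written counts drive a streaming loop.
static byte[] DecodeInChunks(ReadOnlySpan<byte> utf8Base64, int outputChunkSize)
{
    if (outputChunkSize < 3)
        throw new ArgumentOutOfRangeException(nameof(outputChunkSize), "Need room for one 3-byte group.");

    var result = new List<byte>();
    Span<byte> chunk = new byte[outputChunkSize];

    while (true)
    {
        OperationStatus status = Base64.DecodeFromUtf8(
            utf8Base64, chunk, out int consumed, out int written, isFinalBlock: true);

        result.AddRange(chunk.Slice(0, written).ToArray()); // keep what was produced
        utf8Base64 = utf8Base64.Slice(consumed);            // skip what was consumed

        if (status == OperationStatus.Done)
            return result.ToArray();
        if (status == OperationStatus.DestinationTooSmall && consumed > 0)
            continue; // output buffer filled up; call again with the remaining input

        throw new FormatException($"Base64 decode stopped with status {status}.");
    }
}

byte[] input = Encoding.ASCII.GetBytes("SGVsbG8sIFdvcmxkIQ==");
Console.WriteLine(Encoding.ASCII.GetString(DecodeInChunks(input, outputChunkSize: 6))); // Hello, World!
```

In this sketch, `DestinationTooSmall` plays the role of the decoder reporting that its output buffer is full, which is the situation the `IncrementalReadDecoder` pattern has to handle.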

## Aside: other XML codec opportunities

While auditing the XML encoder/decoder surface, two other areas surfaced that are less clear-cut but worth noting:

**`XmlUtf8RawTextWriter.EncodeMultibyteUTF8` / `EncodeSurrogate`**: Hand-rolled UTF-8 byte assembly using `0xC0`/`0xE0`/`0xF0` masks and bit-shifting in unsafe pointer code. These could potentially use `Rune.EncodeToUtf8`, but this is a hot path with unsafe pointer arithmetic where the manual approach may be faster, so any change would need benchmarking first.

**`BinHexDecoder.Decode(ReadOnlySpan<char>, bool)` (static overload)**: Allocates a `byte[]`, decodes char-by-char, then `Array.Resize`s. The instance path already uses `HexConverter.FromChar` (modern), but the static convenience method could potentially use `Convert.FromHexString` with whitespace pre-stripping. The complication is XML-specific behavior: inline whitespace skipping and optional odd-count tolerance. The `BinHexEncoder` side is already fully modernized (`HexConverter.EncodeToUtf16`, `Convert.ToHexString`).
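
To make the BinHex idea concrete, a whitespace pre-strip followed by `Convert.FromHexString` might look something like the sketch below. The helper name `DecodeBinHexSketch` is hypothetical, the whitespace set is assumed to be space/tab/CR/LF, and the optional odd-count tolerance is deliberately not handled (`Convert.FromHexString` throws `FormatException` for odd-length input).

```csharp
using System;
using System.Text;

// Hypothetical sketch: strip whitespace, then let the BCL decode the hex digits.
// Odd-length input is NOT tolerated here, unlike the XML-specific behavior noted above.
static byte[] DecodeBinHexSketch(ReadOnlySpan<char> chars)
{
    var stripped = new StringBuilder(chars.Length);
    foreach (char c in chars)
    {
        if (c is not (' ' or '\t' or '\r' or '\n'))
            stripped.Append(c);
    }
    return Convert.FromHexString(stripped.ToString());
}

byte[] hello = DecodeBinHexSketch("48 65\n6C6c 6F");
Console.WriteLine(Encoding.ASCII.GetString(hello)); // Hello
```

The extra `StringBuilder` allocation is one reason this may not be a clear win over the existing code.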

## Feasibility / effort assessment

> [!NOTE]
> This assessment was generated with Copilot assistance.

### Code churn: **Low**
- **1 file changed**: `Base64Decoder.cs` (~170 lines total, ~80-line core `Decode` method)
- Deletes the 123-byte lookup table, manual bit-shifting loop, and `_bits`/`_bitsFilled` state fields
- Replaces with calls to `Base64.DecodeFromChars(ReadOnlySpan<char>, Span<byte>, out int charsConsumed, out int bytesWritten, bool isFinalBlock)` which already handles inline whitespace
- The `IncrementalReadDecoder` wrapper and its `SetNextOutputBuffer`/`DecodedCount`/`IsFull` contract stay unchanged
- Integration surface is well-encapsulated: `ReadContentAsBinaryHelper.cs` and `XmlTextReaderImpl.cs` create/use the decoder, but their code doesn't change

### Difficulty: **Low–Medium**
- **Easy part**: The BCL API is almost a drop-in — `DecodeFromChars` accepts `ReadOnlySpan<char>`, returns `OperationStatus` with `charsConsumed`/`bytesWritten`, and skips whitespace natively
- **Tricky parts**:
  - **Whitespace definition**: The XML decoder uses `XmlCharType.IsWhiteSpace`; the BCL uses space/tab/CR/LF. Need to verify these match for base64 content
  - **Error reporting**: Current code throws `XmlException(SR.Xml_InvalidBase64Value, ...)` — the BCL returns `OperationStatus.InvalidData` without identifying the offending character. The error message quality from #125930 needs preserving, so some post-failure inspection logic may be needed
  - **Streaming state**: Current code carries `_bits`/`_bitsFilled` across calls. The BCL's `isFinalBlock=false` handles this, but the mapping needs care
  - **Padding behavior**: Current code has custom `=` consumption and post-padding whitespace-only validation. Need to verify BCL matches
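
One possible shape for the post-failure inspection mentioned under error reporting is a scan that runs only after the BCL reports `OperationStatus.InvalidData`. The sketch below is an assumption, not a proposed final design: `IndexOfFirstInvalidBase64Char` is a hypothetical name, it treats space/tab/CR/LF as whitespace, and it only detects characters outside the alphabet, not misplaced padding.

```csharp
using System;

// Hypothetical helper, intended to run only on the failure path so the happy path
// stays on the BCL fast path. Returns the index of the first character that is not
// a base64 digit, '=' padding, or whitespace; returns -1 if there is none.
static int IndexOfFirstInvalidBase64Char(ReadOnlySpan<char> chars)
{
    for (int i = 0; i < chars.Length; i++)
    {
        char c = chars[i];
        bool allowed = c is (>= 'A' and <= 'Z') or (>= 'a' and <= 'z') or (>= '0' and <= '9')
            or '+' or '/' or '=' or ' ' or '\t' or '\r' or '\n';
        if (!allowed)
            return i;
    }
    return -1; // no alphabet violation; the failure is likely padding- or length-related
}

Console.WriteLine(IndexOfFirstInvalidBase64Char("SGVs*bG8=")); // 4
```

A caller could use the returned index to build the `XmlException` message around the single offending character rather than the whole buffer.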

### Test coverage: **Good (indirect)**
- More than 35 test methods in `ReadBase64.cs` cover: valid decoding, chunked/streaming reads, whitespace in middle, `=` padding, invalid chars, argument validation, overflow regression, reader state after reads
- Tests run across 5+ reader implementations (factory, subtree, wrapped, char-checking, custom) via inherited test classes
- **Gap**: No direct unit tests for `Base64Decoder` — all coverage is through `XmlReader` APIs. Sufficient to catch regressions, but a contributor could optionally add targeted edge-case tests
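
Since `Base64Decoder` is only reachable through `XmlReader`, a targeted edge-case check would likely go through the public reader API. The following is a minimal sketch of such a check, assuming a hypothetical helper `ReadBase64Element`; it reads an element in small chunks with whitespace inside the content, which exercises both streaming and whitespace skipping.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: reads the base64 content of the root element through a small
// buffer so that decoding has to resume across several calls.
static byte[] ReadBase64Element(string xml, int bufferSize)
{
    using XmlReader reader = XmlReader.Create(new StringReader(xml));
    reader.MoveToContent();

    var result = new MemoryStream();
    byte[] buffer = new byte[bufferSize];
    int read;
    while ((read = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
    {
        result.Write(buffer, 0, read);
    }
    return result.ToArray();
}

// Whitespace (newline, spaces, tab) is embedded in the middle of the content.
byte[] bytes = ReadBase64Element("<data>SGVs\n  bG8s IFdv\tcmxkIQ==</data>", bufferSize: 4);
Console.WriteLine(Encoding.ASCII.GetString(bytes)); // Hello, World!
```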

### Perf risk: **Low**
- The BCL `DecodeFromChars` is SIMD-optimized and should be faster than the scalar char-by-char lookup table loop
- This is not a hot path in typical XML workloads (base64 in XML is relatively niche: MTOM, embedded binary)

### Perf test coverage: **None** ⚠️
- No XML base64 benchmarks exist in `dotnet/performance` (searched for `ReadContentAsBase64`, `ReadElementContentAsBase64`, `XmlReader` benchmarks)
- BCL-level `Base64` benchmarks exist but don't exercise the XML reader path
- Not a blocker given the low perf risk

### Reviewability: **High**
- Small, self-contained change in one file with a clear "delete hand-rolled, call BCL" narrative

### What we get
- Eliminates a 123-byte magic lookup table and manual bit-shifting
- Consistency: encoder side already delegates to `Convert.ToBase64CharArray`; decoder becomes symmetric
- Future BCL base64 optimizations flow through automatically
- Net deletion of ~50–60 lines of tricky decoding logic

