@sri-adarsh-kumar sri-adarsh-kumar commented Oct 25, 2025

PR: Implement error collection for deserialization (#1196)

AI Disclosure

This is an AI-generated solution. The requirements were specified by a human developer based on a 9-year-old feature request from @odrotbohm, and the implementation was fully built by Claude Code (an AI coding agent).

Overview

Adds opt-in error collection capability to Jackson deserialization, allowing applications to collect multiple deserialization errors instead of failing fast on the first error. This addresses issue #1196 requested by Oliver Drotbohm in 2016.

Target branch: 3.x (for Jackson 3.1.0)

Motivation

Currently, Jackson fails fast on the first deserialization error, requiring a fix-retry cycle to discover all problems in a payload. This is inefficient for:

  • API validation: Return all validation errors to clients in one response
  • Batch processing: Log all errors but continue processing
  • Development tooling: Show developers all issues at once
  • Data migration: Identify all problematic records upfront

Key Features

1. Opt-in Per-Call Design

  • No global configuration - must explicitly call collectErrors() on ObjectReader
  • Thread-safe - each call gets its own isolated error bucket
  • Backward compatible - default behavior unchanged

2. DoS Protection

  • Default limit: 100 errors per call (configurable)
  • Hard failure when limit exceeded, with collected errors as suppressed exceptions
  • Prevents memory exhaustion from malicious payloads
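
The limit behaviour described above can be sketched without Jackson. This is a minimal illustration, not the PR's code: `BoundedProblemBucket`, its use of plain strings for problems, and the choice of `IllegalStateException` are all assumptions made for the example. It assumes the hard failure fires when a problem arrives after the cap has already been reached.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bounded collector; names and exception types are illustrative only. */
final class BoundedProblemBucket {
    private final int maxProblems;
    private final List<String> problems = new ArrayList<>();

    BoundedProblemBucket(int maxProblems) {
        if (maxProblems <= 0) {
            throw new IllegalArgumentException("maxProblems must be > 0");
        }
        this.maxProblems = maxProblems;
    }

    /** Records a problem, or fails hard if the bucket is already full. */
    void add(String problem) {
        if (problems.size() >= maxProblems) {
            IllegalStateException hard =
                    new IllegalStateException("Problem limit reached: " + maxProblems);
            // Keep what was collected so far visible to the caller.
            for (String collected : problems) {
                hard.addSuppressed(new RuntimeException(collected));
            }
            throw hard;
        }
        problems.add(problem);
    }

    /** Returns an immutable snapshot of the problems recorded so far. */
    List<String> problems() {
        return List.copyOf(problems);
    }

    public static void main(String[] args) {
        BoundedProblemBucket bucket = new BoundedProblemBucket(2);
        bucket.add("/items/0/price: not a number");
        bucket.add("/items/1/qty: not an int");
        try {
            bucket.add("/items/2/qty: not an int"); // third problem exceeds the cap of 2
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + ", suppressed=" + e.getSuppressed().length);
        }
    }
}
```

Because the bucket never grows past the cap, memory use per call stays bounded regardless of payload size.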

3. Best-Effort Collection

  • Collects recoverable errors (type mismatches, unknown properties, etc.)
  • Non-recoverable errors (malformed JSON, structural issues) still fail immediately
  • Failed fields get default values (0 for primitives, null for objects)

4. JSON Pointer Paths (RFC 6901)

  • Each error includes path to problematic field: /items/0/price
  • Proper escaping: ~ → ~0, / → ~1
  • Makes it easy to locate issues in complex nested structures
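
To make the escaping rule concrete, here is a small JDK-only sketch of RFC 6901 token escaping. It is illustrative only: `JsonPointerSketch`, `escape` and `pointer` are hypothetical names, and the review thread further down notes that the PR ultimately delegates this to jackson-core's `JsonPointer` rather than hand-rolling it.

```java
import java.util.List;

/** Illustrative RFC 6901 path builder; not the PR's actual code. */
final class JsonPointerSketch {

    /** Escapes one reference token: '~' becomes "~0", then '/' becomes "~1". */
    static String escape(String segment) {
        // Order matters: escaping '/' first would turn the resulting "~1" into "~01".
        return segment.replace("~", "~0").replace("/", "~1");
    }

    /** Joins property names and array indices into a pointer such as /items/0/price. */
    static String pointer(List<Object> segments) {
        StringBuilder sb = new StringBuilder();
        for (Object segment : segments) {
            sb.append('/').append(escape(String.valueOf(segment)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(pointer(List.of("items", 0, "price"))); // /items/0/price
        System.out.println(pointer(List.of("a/b", "c~d")));        // /a~1b/c~0d
    }
}
```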

5. Thread Safety

  • Per-call error buckets - safe to reuse ObjectReader across threads
  • No shared state between calls
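
The isolation argument can be shown with a toy reader that holds no mutable fields and allocates its problem list inside each call. This is a sketch under stated assumptions: `SharedReader` and its comma-separated "format" are invented for the example and stand in for an `ObjectReader` shared across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy demonstration of per-call buckets; unrelated to Jackson's real classes. */
final class PerCallBucketSketch {

    /** Immutable stand-in for a shared reader: it holds no mutable state. */
    static final class SharedReader {
        /** "Parses" comma-separated ints and returns the problems of this call only. */
        List<String> read(String input) {
            List<String> bucket = new ArrayList<>(); // fresh per call, never stored in a field
            String[] tokens = input.split(",");
            for (int i = 0; i < tokens.length; i++) {
                String token = tokens[i].trim();
                try {
                    Integer.parseInt(token);
                } catch (NumberFormatException e) {
                    bucket.add("/" + i + ": not an int: " + token);
                }
            }
            return bucket;
        }
    }

    /** Submits every input to a 10-thread pool that shares one reader instance. */
    static List<List<String>> readConcurrently(SharedReader reader, List<String> inputs)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (String input : inputs) {
                futures.add(pool.submit(() -> reader.read(input)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> results =
                readConcurrently(new SharedReader(), List.of("1,x,3", "a,b", "7,8"));
        results.forEach(System.out::println);
    }
}
```

No locking is needed because no two calls ever see the same list.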

API Usage

Basic Usage

ObjectMapper mapper = new JsonMapper();
ObjectReader reader = mapper.readerFor(Order.class).collectErrors();

try {
    Order result = reader.readValueCollecting(json);
    // Success - all fields valid
} catch (DeferredBindingException ex) {
    System.out.println("Found " + ex.getProblems().size() + " problems:");
    for (CollectedProblem problem : ex.getProblems()) {
        System.out.println(problem.getPath() + ": " + problem.getMessage());
        // Can also access problem.getRawValue() to see bad input
    }
}

Custom Error Limit

// Collect up to 10 errors before hard failure
ObjectReader reader = mapper.readerFor(Order.class).collectErrors(10);

Multiple Input Types

// Supports all standard input types
Order order1 = reader.readValueCollecting(jsonString);
Order order2 = reader.readValueCollecting(jsonBytes);
Order order3 = reader.readValueCollecting(file);
Order order4 = reader.readValueCollecting(inputStream);
Order order5 = reader.readValueCollecting(charReader); // charReader: a java.io.Reader, distinct from the ObjectReader named reader

Implementation Details

New Classes

1. CollectedProblem

Immutable value object representing a single deserialization error:

  • path: JSON Pointer to problematic field (e.g., /items/0/price)
  • message: Human-readable error message
  • rawValue: The invalid input value that caused the error
  • targetType: Expected Java type

2. DeferredBindingException

Aggregate exception thrown when errors are collected:

  • problems: List of all collected errors
  • partialResult: The partially deserialized object (with default values for failed fields)
  • limitReached: Flag indicating if error limit was hit
  • Extends DatabindException for compatibility
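
The two value types above can be pictured with a JDK-only sketch. The field names follow the bullets, but everything else is an assumption: `Problem` and `DeferredProblems` are hypothetical stand-ins, the field types are guessed, and the sketch extends `RuntimeException` only so that it compiles without Jackson on the classpath.

```java
import java.util.List;

/** Illustrative shapes for a collected problem and its aggregate exception. */
final class AggregateExceptionSketch {

    /** One recorded problem; field names mirror the bullets above, types are assumed. */
    record Problem(String path, String message, Object rawValue, Class<?> targetType) {}

    /** Unchecked exception carrying every recorded problem plus the partial result. */
    static final class DeferredProblems extends RuntimeException {
        private final List<Problem> problems;
        private final Object partialResult;
        private final boolean limitReached;

        DeferredProblems(List<Problem> problems, Object partialResult, boolean limitReached) {
            super(problems.size() + " deserialization problem(s), first at "
                    + (problems.isEmpty() ? "<none>" : problems.get(0).path()));
            this.problems = List.copyOf(problems); // defensive, immutable copy
            this.partialResult = partialResult;
            this.limitReached = limitReached;
        }

        List<Problem> getProblems() { return problems; }
        Object getPartialResult() { return partialResult; }
        boolean isLimitReached() { return limitReached; }
    }

    public static void main(String[] args) {
        List<Problem> found = List.of(
                new Problem("/items/0/price", "not a number", "abc", double.class),
                new Problem("/customer/age", "not an int", "old", int.class));
        DeferredProblems ex = new DeferredProblems(found, null, false);
        System.out.println(ex.getMessage());
    }
}
```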

3. CollectingProblemHandler

Stateless problem handler that collects errors during deserialization:

  • Implements DeserializationProblemHandler interface
  • No shared state - all state passed via DeserializationContext
  • Handles type mismatches, unknown properties, and other recoverable errors
  • Returns default values to allow deserialization to continue
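
A stateless handler of this kind might look roughly as follows. This is a simplified sketch, not the PR's implementation: `CallContext`, `BucketKey` and `Handler` are hypothetical, and a plain `HashMap` stands in for the `DeserializationContext` attributes. The point it illustrates is that the handler has no fields, so one shared instance can serve any number of calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a stateless handler whose state lives in a per-call context. */
final class StatelessHandlerSketch {

    /** Private key type, so the attribute cannot collide with user-defined keys. */
    private static final class BucketKey {}

    /** Minimal stand-in for a per-call context: just an attribute map. */
    static final class CallContext {
        private final Map<Object, Object> attributes = new HashMap<>();

        CallContext() {
            attributes.put(BucketKey.class, new ArrayList<String>());
        }

        @SuppressWarnings("unchecked")
        List<String> bucket() {
            return (List<String>) attributes.get(BucketKey.class);
        }
    }

    /** Handler with no fields: safe to share because all state is in the context. */
    static final class Handler {
        static final Handler INSTANCE = new Handler();

        private Handler() {}

        /** Records the problem and returns a placeholder so processing can continue. */
        int handleBadInt(CallContext ctx, String path, String raw) {
            ctx.bucket().add(path + ": cannot convert \"" + raw + "\" to int");
            return 0;
        }
    }

    /** Parses one value, delegating failures to the shared handler. */
    static int parseInt(CallContext ctx, String path, String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            return Handler.INSTANCE.handleBadInt(ctx, path, raw);
        }
    }

    public static void main(String[] args) {
        CallContext first = new CallContext();
        CallContext second = new CallContext();
        System.out.println(parseInt(first, "/qty", "many")); // 0
        System.out.println(first.bucket().size() + " vs " + second.bucket().size()); // 1 vs 0
    }
}
```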

4. ObjectReader Extensions

New methods:

  • collectErrors(): Enable collection with default limit (100)
  • collectErrors(int maxProblems): Enable collection with custom limit
  • readValueCollecting(...): Read with error collection (5 overloads for different input types)

Design Decisions

  1. Per-call buckets: Each readValueCollecting() call gets a fresh error list stored in DeserializationContext attributes
  2. Default values: Failed fields use Java defaults (0, false, null) to allow deserialization to continue
  3. Suppressed exceptions: When hard failure occurs (limit reached, non-recoverable error), collected problems are attached as suppressed exceptions
  4. No recursion: Skips children of unknown properties to avoid cascading errors
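
As an illustration of the default-value decision (point 2), the placeholder for a failed field can be derived from its declared type. This is a hedged sketch: `DefaultValueSketch.defaultFor` is a hypothetical helper, not a method from the PR, and it simply returns what the Java language defines as the default for each primitive type.

```java
/** Maps a declared field type to the Java default used as a placeholder. */
final class DefaultValueSketch {

    /** Returns the boxed default for primitives and null for every reference type. */
    static Object defaultFor(Class<?> type) {
        if (!type.isPrimitive()) {
            return null; // objects, arrays and boxed types all default to null
        }
        if (type == boolean.class) return false;
        if (type == char.class) return '\0';
        if (type == byte.class) return (byte) 0;
        if (type == short.class) return (short) 0;
        if (type == int.class) return 0;
        if (type == long.class) return 0L;
        if (type == float.class) return 0f;
        if (type == double.class) return 0d;
        return null; // void.class has no value
    }

    public static void main(String[] args) {
        System.out.println(defaultFor(int.class));     // 0
        System.out.println(defaultFor(boolean.class)); // false
        System.out.println(defaultFor(Integer.class)); // null
    }
}
```

Note the difference between `int` and `Integer`: a failed primitive field is indistinguishable from a genuine 0 unless the caller also consults the collected problems.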

Testing

Comprehensive test suite with 31 tests covering:

Core Functionality

  • Default fail-fast behavior (2 tests): Unchanged when not using error collection
  • Per-call bucket isolation (2 tests): Successive and concurrent calls don't share errors
  • Thread safety (1 test): 10 threads processing in parallel with isolated buckets

Edge Cases

  • JSON Pointer escaping (4 tests): Tilde, slash, combined, array indices
  • Limit behavior (3 tests): Default limit, custom limit, under limit
  • Unknown properties (2 tests): Collection and child skipping
  • Type coercions (6 tests): int, long, double, float, boolean, boxed types
  • Root-level errors (2 tests): Non-recoverable, path formatting
  • Hard failures (1 test): Suppressed exception attachment
  • Message formatting (2 tests): Single error, multiple errors
  • Input types (6 tests): byte[], File, InputStream, Reader, null handling, empty JSON

Backward Compatibility

  • 100% backward compatible - no changes to existing behavior
  • Feature is completely opt-in via explicit method calls
  • New exception type extends existing DatabindException
  • No performance impact when not using the feature
  • No breaking changes to any existing APIs

Performance Considerations

  • Negligible overhead when disabled: Feature only activates when explicitly requested
  • Per-call allocation: Each call allocates a small ArrayList for errors (default capacity: 10)
  • Early termination: Stops collecting at configured limit to prevent DoS
  • No synchronization: Thread-safe through isolation, not locking

Use Cases

API Validation

@PostMapping("/orders")
public ResponseEntity<?> createOrder(@RequestBody String json) {
    try {
        Order order = orderReader.readValueCollecting(json);
        return ResponseEntity.ok(orderService.save(order));
    } catch (DeferredBindingException ex) {
        List<ValidationError> errors = ex.getProblems().stream()
            .map(p -> new ValidationError(p.getPath().toString(), p.getMessage()))
            .toList();
        return ResponseEntity.badRequest().body(errors);
    }
}

Batch Processing

for (String record : records) {
    try {
        Data data = reader.readValueCollecting(record);
        process(data);
    } catch (DeferredBindingException ex) {
        logger.error("Record validation failed with {} errors:",
                     ex.getProblems().size());
        ex.getProblems().forEach(p ->
            logger.error("  {}: {}", p.getPath(), p.getMessage()));
        // Continue with next record
    }
}

Limitations

  1. Best-effort: Not all errors can be collected

    • Malformed JSON (missing braces) fails immediately
    • Structural problems fail immediately
    • Type conversion errors ARE collected
    • Unknown property errors ARE collected (if enabled)
  2. Default values: Failed fields get defaults

    • Primitives: 0, false
    • Objects: null
    • May affect downstream validation logic
  3. Error limit: Default 100 errors

    • Prevents DoS but may not catch all issues in very large payloads
    • Configurable per call

Future Enhancements

Potential future improvements (not in this PR):

  • Structured error types (beyond just messages)
  • Integration with Bean Validation (JSR-380)
  • Custom error formatters
  • Error severity levels
  • Configurable default value strategies

Generated by: Claude Code (Anthropic)
Specified by: @sri-adarsh-kumar
Original feature request: @odrotbohm (2016)

*
* @param maxProblems Maximum number of problems to collect (must be > 0)
* @return A new ObjectReader configured for error collection
* @since 3.1
Member

the throws should be included in the javadoc

Author

Added now. Thanks for the feedback.

_assertNotNull("p", p);

// CRITICAL: Allocate a FRESH bucket for THIS call (thread-safety)
List<tools.jackson.databind.exc.CollectedProblem> bucket = new ArrayList<>();
Member

no need to use fully qualified class name here

Author

Changed now. Thanks.


// Create per-call attributes with the fresh bucket
ContextAttributes perCallAttrs = _config.getAttributes()
.withPerCallAttribute(tools.jackson.databind.deser.CollectingProblemHandler.class, bucket);
Member

again, avoid using fully qualified class names - use imports so the code can be kept concise

* @return A new ObjectReader configured for error collection
* @since 3.1
*/
public ObjectReader collectErrors() {
Member

I don't really like the names of the new APIs. I don't especially like 'collect' but I think 'Error' is plain wrong.
'Error' has a very explicit meaning in Java - examples include OutOfMemoryError. Exception is more correct than Error here.

Member

Agreed.

Member

Something like problemCollectingReader() etc would make more sense

Author

Changed to problemCollectingReader and readValueCollectingProblems method names

@JooHyukKim
Member

-1 on this PR itself due to the sheer amount of line changes.

AI code assistants should really get trained on splitting tasks into safer (or at least safer-feeling) bits of progression

* @param segment The raw segment (property name or array index)
* @return Escaped segment safe for JSON Pointer
*/
private String escapeJsonPointerSegment(String segment) {
Member

This should not duplicate code here; use src/main/java/com/fasterxml/jackson/core/JsonPointer.java (from jackson-core) instead.

Author

Thanks for the pointer 😉 . Changed now.

} catch (tools.jackson.databind.exc.DeferredBindingException e) {
throw e; // Already properly formatted

} catch (DatabindException e) {
Member

Is the idea to only catch DatabindExceptions and not other JacksonExceptions?

Author

Yes, that is intentional. DatabindException represents deserialization/binding errors that we want to collect (missing properties, type mismatches, etc.).

Other Jackson exceptions (like JsonParseException, StreamReadException) might pertain more to malformed JSON structure that would fail fast and not be collected.
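
The catch ordering discussed in this thread can be sketched with stand-in exception types. This is an illustration only: `BindingFailure`, `AggregateFailure` and `ParseFailure` are hypothetical classes playing the roles of `DatabindException`, `DeferredBindingException` and a streaming exception, so the example runs on the JDK alone.

```java
import java.util.List;

/** Toy model of "catch binding failures, let parse failures fail fast". */
final class CatchOrderSketch {

    /** Stand-in for a binding-level failure. */
    static class BindingFailure extends RuntimeException {
        BindingFailure(String message) { super(message); }
    }

    /** Stand-in for the aggregate exception; it is already fully formed when thrown. */
    static final class AggregateFailure extends BindingFailure {
        AggregateFailure(String message) { super(message); }
    }

    /** Stand-in for a streaming or parse failure, which is never collected. */
    static final class ParseFailure extends RuntimeException {
        ParseFailure(String message) { super(message); }
    }

    /** Runs a read, attaching already-collected problems to hard binding failures. */
    static void read(Runnable body, List<String> collected) {
        try {
            body.run();
        } catch (AggregateFailure e) {
            throw e; // most specific type first: nothing to add
        } catch (BindingFailure e) {
            for (String problem : collected) {
                e.addSuppressed(new RuntimeException(problem));
            }
            throw e;
        }
        // ParseFailure is intentionally not caught: it propagates unchanged.
    }

    public static void main(String[] args) {
        List<String> collected = List.of("/a: bad value", "/b: bad value");
        try {
            read(() -> { throw new BindingFailure("hard failure"); }, collected);
        } catch (BindingFailure e) {
            System.out.println("suppressed=" + e.getSuppressed().length); // suppressed=2
        }
    }
}
```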

* Unique private key object for the maximum problem limit attribute.
* Using a dedicated object prevents collisions with user attributes.
*/
private static final class MaxProblemsKey {
Member

Why would we store configuration as context attribute, instead of within ProblemHandler itself, as config setting?

Member

.... or maybe as typed collector state added to context, I guess.

Author

@cowtowncoder Can you clarify if there is any change expected for this?
From my understanding, the configuration is currently part of the context as a typed collector state.

/**
* Enables error collection mode with a custom problem limit.
*
* <p><b>Thread-safety</b>: The returned reader is immutable and thread-safe.
Member

Needs to explain that this is implemented as a problem handler, replacing any handler that might have been formerly configured.

Author

Added in the Javadoc for problemCollectingReader method.
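
The two properties raised in this thread (the returned reader is immutable, and enabling collection replaces any previously configured handler) can be pictured with a small "wither" sketch. Everything here is hypothetical: `ImmutableReaderSketch` and its string-valued handler are placeholders, not the `ObjectReader` API.

```java
/** Toy immutable reader: every configuration call returns a new instance. */
final class ImmutableReaderSketch {
    private final String handlerName; // stand-in for a configured problem handler
    private final int maxProblems;    // 0 means collection is disabled

    private ImmutableReaderSketch(String handlerName, int maxProblems) {
        this.handlerName = handlerName;
        this.maxProblems = maxProblems;
    }

    static ImmutableReaderSketch plain() {
        return new ImmutableReaderSketch("none", 0);
    }

    /** Returns a copy using the given handler; this instance is left unchanged. */
    ImmutableReaderSketch withHandler(String name) {
        return new ImmutableReaderSketch(name, maxProblems);
    }

    /** Returns a copy in collecting mode; any earlier handler is replaced, not chained. */
    ImmutableReaderSketch problemCollecting(int maxProblems) {
        if (maxProblems <= 0) {
            throw new IllegalArgumentException("maxProblems must be > 0");
        }
        return new ImmutableReaderSketch("collecting", maxProblems);
    }

    String handlerName() { return handlerName; }
    int maxProblems() { return maxProblems; }

    public static void main(String[] args) {
        ImmutableReaderSketch custom = plain().withHandler("custom");
        ImmutableReaderSketch collecting = custom.problemCollecting(10);
        System.out.println(custom.handlerName() + " -> " + collecting.handlerName());
    }
}
```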

sri-adarsh-kumar and others added 7 commits October 26, 2025 12:57
- Add CollectedProblem: immutable value object for error details
- Add DeferredBindingException: aggregate exception for multiple errors
- Add CollectingProblemHandler: stateless handler collecting recoverable errors
- Add ObjectReader.collectErrors() and readValueCollecting() methods
- Add comprehensive test suite (27 tests) covering all scenarios

Features:
- Opt-in per-call error collection (no global config)
- Thread-safe with per-call bucket isolation
- RFC 6901 compliant JSON Pointer paths
- DoS protection with configurable limit (default 100)
- Primitive vs reference type default value policy
- Suppressed exception support for hard failures

Tests verify:
- Per-call bucket isolation (concurrent + successive)
- JSON Pointer escaping (tilde, slash, combined)
- Limit reached behavior
- Unknown property handling
- Default value policy
- Message formatting
- Edge cases

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Address review feedback from specs/issue-1196-collecting-deserialization-errors-claude-review-gpt-5.md:

API Surface & Delegation:
- Fix ObjectReader.readValueCollecting() to use public API `this.with(perCallAttrs).readValue(p)` instead of protected `_new(...)` factory
- Maintains consistency with Jackson's builder pattern and public surface

Limit Resolution:
- Read max-problem cap from per-call reader config instead of base _config
- Properly honors per-call attribute overrides
- Affects both normal completion and hard failure paths

Javadoc Enhancements:
- Add comprehensive class-level Javadoc to DeferredBindingException with usage examples
- Enhance CollectedProblem Javadoc explaining all fields, truncation, and immutability
- Expand CollectingProblemHandler Javadoc detailing design, recoverable errors, and DoS protection
- Improve ObjectReader.readValueCollecting() Javadoc noting behavior without collectErrors() and parser filtering differences

Testing:
- All 27 CollectingErrorsTest tests pass
- Full suite: 4,662 tests pass, 0 failures, 0 errors
- No regressions introduced

- Add helper method overloads for all input types (File, InputStream, Reader)
  to eliminate inline catchThrowableOfType duplication
- Extract buildInvalidOrderJson() helper to avoid duplicated JSON building
  logic across 4 tests
- Use StandardCharsets.UTF_8 instead of checked exception "UTF-8" string
- Improve executor shutdown safety with shutdownNow() fallback and proper
  InterruptedException handling
- Enhance concurrent test to verify exact unique error values, catching any
  bucket-sharing regressions
- Use Files.deleteIfExists() for robust temp file cleanup
- Streamline hard failure test to focus on suppressed exception mechanics
- Replace hand-rolled try/catch blocks with expectDeferredBinding() helper
  in 4 additional tests

All 31 tests pass. Code is more maintainable with consistent patterns and
reduced duplication.

- Add release notes entry in VERSION for 3.1.0
- Add tutorial section in README.md matching existing tone
- Explain problem: fix-retry cycle vs collecting all errors at once
- Show basic usage with code examples
- Document DoS protection with 100 error default limit
- Clarify best-effort nature and what can/cannot be collected
- Document JSON Pointer paths (RFC 6901) for error locations
- Explain thread-safe per-call error buckets
- List practical use cases: API validation, batch processing, tooling

Addresses 9-year-old feature request from Oliver Drotbohm covering
key concerns from discussion: DoS protection, thread safety,
recoverable errors, and error reporting.

…ception handling

- Replace deprecated fail() calls with AssertJ assertions
- Fix deprecated catchThrowableOfType() parameter order (Class, lambda)
- Remove unnecessary throws Exception declarations from 27 test methods
- Keep throws Exception only for tests using try-with-resources

All 31 tests pass successfully.

Restore src/test/java/module-info.java that was accidentally removed
in commit b7fbb39. This file is required for the test module system
configuration.

…umentation

This commit addresses all technical feedback from reviewers
@pjfanning, @cowtowncoder, and @JooHyukKim on PR FasterXML#5364.

API Changes:
- Rename collectErrors() → problemCollectingReader()
- Rename readValueCollecting() → readValueCollectingProblems()
- Rationale: "Error" conflicts with Java's Error class (OutOfMemoryError).
  "Problem" aligns with existing DeserializationProblemHandler terminology.

Code Quality:
- Add proper imports for CollectingProblemHandler, CollectedProblem,
  DeferredBindingException
- Replace 6 fully qualified class names with short names
- Add missing @throws javadoc tags

Remove Code Duplication:
- Refactor buildJsonPointer() to use jackson-core's JsonPointer API
  (appendProperty/appendIndex methods)
- Delete custom escapeJsonPointerSegment() method (~8 lines)
- Leverage tested jackson-core RFC 6901 escaping implementation

Enhanced Documentation:
- Add architecture rationale to CollectingProblemHandler explaining
  why context attributes are used (thread-safety, call isolation,
  immutability, config vs state separation)
- Add exception handling strategy to readValueCollectingProblems()
  explaining why only DatabindException is caught (not all
  JacksonExceptions - streaming errors should fail fast)
- Add handler replacement warning to problemCollectingReader()
- Update DeferredBindingException javadoc with new API names

Files changed: 5
- README.md (tutorial examples)
- ObjectReader.java (main API)
- CollectingProblemHandler.java (implementation)
- DeferredBindingException.java (exception javadoc)
- CollectingErrorsTest.java (test updates)

All tests pass. No behavioral changes.

@sri-adarsh-kumar sri-adarsh-kumar force-pushed the issue-1196-error-collection-attempt-1 branch from 200557a to f0c4342 Compare October 26, 2025 12:02

sri-adarsh-kumar commented Oct 26, 2025

@JooHyukKim

Thanks for the feedback.

-1 on this PR itself due to the sheer amount of line changes.

If it's any consolation, the actual logic change in ObjectReader and CollectingProblemHandler is only about 1/3 of the total changes. The rest comes from domain classes, documentation and tests. As this is not a small change (at least from my point of view), this seemed reasonable to me.

AI code assistants should really get trained on splitting tasks into safer (or at least safer-feeling) bits of progression

Regarding coding assistants and splitting tasks: this is indeed true. Once their context window starts filling up, their answers become less reliable.

Let me explain the process I use to work around this (something I also do in my job).

  1. Specify the requirements to Claude Sonnet 4.5 (in Claude Code) along with any context (in this case the Github issue and the underlying discussion) and ask it to make a technical spec.

  2. Review the spec with GPT-5 and iterate on it between Sonnet 4.5, GPT-5 and myself. The spec itself has sub-sections and a checklist that the coding agent can build piece by piece.

  3. Build the finalised spec using the Coding Agent (Sonnet 4.5).

  4. Self-review manually as well as with GPT-5 until the solution is in a reviewable state.

  5. Push the solution for peer review

If you see any code blocks that could be improved or simplified, or if you have any feedback on my development process (described above), I would be happy to learn from it and improve in future.
