
fix: resolve batch inference crash by handling non-serializable objects in GCS cache#352

Open
SalimMessaad1 wants to merge 3 commits into google:main from SalimMessaad1:fix-serialization-bug

Conversation


@SalimMessaad1 SalimMessaad1 commented Feb 10, 2026

Problem: The batch inference process was crashing when computing a cache hash for requests containing non-serializable objects, such as `Enum` members (used in `SafetySettings`) or dataclass instances. This triggered a `TypeError: Object of type ... is not JSON serializable` during `json.dumps`.

Solution:

  1. Added a private helper function _json_serializer to handle dataclasses and Enum types during JSON serialization.

  2. Updated GCSBatchCache._compute_hash to use this serializer, ensuring stable and reliable hashing even with complex request configurations.

Impact: Prevents runtime crashes during batch processing and ensures caching works correctly with all Gemini model configurations.

Resolves #353

…erializer for dataclasses and enums

Added a JSON serializer to handle non-serializable objects in the `_compute_hash` method.

google-cla bot commented Feb 10, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@github-actions

No linked issues found. Please link an issue in your pull request description or title.

Per our Contributing Guidelines, all PRs must:

  • Reference an issue with one of:
    • Closing keywords: Fixes #123, Closes #123, Resolves #123 (auto-closes on merge in the same repository)
    • Reference keywords: Related to #123, Refs #123, Part of #123, See #123 (links without closing)
  • The linked issue should have 5+ 👍 reactions from unique users (excluding bots and the PR author)
  • Include discussion demonstrating the importance of the change

You can also use cross-repo references like owner/repo#123 or full URLs.

@github-actions github-actions bot added the size/XS Pull request with less than 50 lines changed label Feb 10, 2026
@SalimMessaad1
Author

I've enabled maintainer edits. Please feel free to adjust the indentation or formatting to match the project's standards if needed. I'm also waiting for community support on issue #353, as per the guidelines.


@Muitamax Muitamax left a comment


Solution: use a `default` serializer.
You can pass a `default` function to `json.dumps` that converts non-serializable objects to strings or dicts:

```python
import json
from dataclasses import asdict, is_dataclass
from enum import Enum

def custom_serializer(obj):
    if isinstance(obj, Enum):
        return obj.value  # or str(obj)
    if is_dataclass(obj):
        return asdict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

hash_string = json.dumps(key_data, sort_keys=True, default=custom_serializer)
print(hash_string)
```

✅ Notes:

  • `Enum` → convert to `.value` or `str(obj)`
  • dataclass → convert to a dictionary using `asdict`
  • `sort_keys=True` ensures a stable key order for hashing

This way, `GCSBatchCache._compute_hash` will no longer crash.

@SalimMessaad1
Author

Hi @Muitamax, thank you for the review and for confirming the solution. I really appreciate the feedback and the approval.

Hi @aksg87, as mentioned above, the logic has been verified and approved by the community. This PR fixes a deterministic TypeError crash in the Batch API hashing process, which is critical for users with complex SafetySettings.

I would appreciate your final review as a maintainer to help move this towards a merge. I've enabled maintainer edits to facilitate any minor formatting adjustments needed for the CI. Thanks for your time and support.

@Muitamax

Muitamax commented Feb 11, 2026 via email

IgnatG added a commit to IgnatG/langextract that referenced this pull request Feb 17, 2026
Add custom JSON serializer for dataclasses and enums in _compute_hash()
to prevent crash when cache key contains non-serializable objects.

Upstream: google#352
IgnatG added a commit to IgnatG/langextract that referenced this pull request Feb 17, 2026
…sh test

Remove duplicate _json_default from _upload() in favor of module-level
_json_serializer. Add test for deterministic hashing with enum/dataclass
values.

Upstream: google#359 (merged with google#352)
IgnatG added a commit to IgnatG/langextract that referenced this pull request Feb 17, 2026
Applied PRs: google#356, google#369, google#329, google#267, google#305, google#326, google#352, google#359, google#317, google#362,
google#242, google#310, google#241. Moved from Low Priority to Applied section. Updated
cherry-pick log with all commit hashes.
@github-actions

⚠️ Branch Update Required

Your branch is 1 commit behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.


Labels

size/XS Pull request with less than 50 lines changed


Development

Successfully merging this pull request may close these issues.

TypeError in GCSBatchCache hashing when using Enum-based settings

2 participants