fix: resolve batch inference crash by handling non-serializable objects in GCS cache #352
SalimMessaad1 wants to merge 3 commits into google:main
…erializer for dataclasses and enums. Added a JSON serializer to handle non-serializable objects in the `_compute_hash` method.
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
No linked issues found. Please link an issue in your pull request description or title. Per our Contributing Guidelines, all PRs must:
You can also use cross-repo references like
I've enabled maintainer edits. Please feel free to adjust the indentation or formatting to match the project's standards if needed. I'm also waiting for community support on issue #353, per the guidelines.
Muitamax left a comment:
Solution: Use a default serializer
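You can pass a `default` function to `json.dumps`. The encoder calls it for any object it cannot serialize on its own, and it should return a JSON-serializable substitute, such as a string or a dict (or raise `TypeError`).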
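```python
import json
from dataclasses import asdict, is_dataclass
from enum import Enum


def custom_serializer(obj):
    # json.dumps calls this only for objects it cannot encode natively.
    if isinstance(obj, Enum):
        return obj.value  # Or str(obj)
    if is_dataclass(obj) and not isinstance(obj, type):
        return asdict(obj)  # Dataclass instance -> plain dict
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Placeholder: in _compute_hash, key_data is the cache-key dict being hashed.
key_data = {"model_id": "example-model", "temperature": 0.0}
hash_string = json.dumps(key_data, sort_keys=True, default=custom_serializer)
print(hash_string)
```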
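✅ Notes:
- Enum → convert to `.value` or `str(obj)`
- dataclass → convert to a dictionary using `asdict`
- `sort_keys=True` emits dictionary keys in a fixed order, so identical data always produces the same string and therefore the same hash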
This way, `GCSBatchCache._compute_hash` no longer crashes on Enum or dataclass values; any other unsupported type still raises a `TypeError`.
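The notes leave open whether an Enum should become `.value` or `str(obj)`. The sketch below compares the two, using a hypothetical `Threshold` enum nested inside a hypothetical `Setting` dataclass (neither is the real SafetySettings type). It also shows that the `default` hook is applied recursively: `asdict` leaves the Enum member in place, and `json.dumps` calls the hook again for it.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass
from enum import Enum


class Threshold(Enum):  # hypothetical plain Enum, for illustration only
    BLOCK_NONE = "BLOCK_NONE"


@dataclass
class Setting:  # hypothetical dataclass holding an Enum field
    category: str
    threshold: Threshold


def make_serializer(enum_to_json):
    """Build a json.dumps `default` hook; enum_to_json decides how Enums appear."""
    def serializer(obj):
        if isinstance(obj, Enum):
            return enum_to_json(obj)
        if is_dataclass(obj) and not isinstance(obj, type):
            # asdict keeps the Enum member as-is; json.dumps then calls this
            # hook again for it, so nested Enums are handled too.
            return asdict(obj)
        raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")
    return serializer


data = {"safety": [Setting("harassment", Threshold.BLOCK_NONE)]}
by_value = json.dumps(data, sort_keys=True, default=make_serializer(lambda e: e.value))
by_str = json.dumps(data, sort_keys=True, default=make_serializer(str))
print(by_value)  # {"safety": [{"category": "harassment", "threshold": "BLOCK_NONE"}]}
print(by_str)    # {"safety": [{"category": "harassment", "threshold": "Threshold.BLOCK_NONE"}]}
```

Both choices are deterministic, but they produce different strings and therefore different hashes. A cache would need to settle on one; switching later would likely cause misses for entries hashed the old way.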
Hi @Muitamax, thank you so much for the review and for confirming the solution; I really appreciate the feedback and the approval.

Hi @aksg87, as mentioned above, the logic has been verified and approved by the community. This PR fixes a deterministic TypeError crash in the Batch API hashing process, which is critical for users with complex SafetySettings.

I would appreciate your final review as a maintainer to help move this towards a merge. I've enabled maintainer edits to facilitate any minor formatting adjustments needed for the CI. Thanks for your time and support.
I am honoured to be of help.
(Sent by email in reply to Salim Messaad's comment of Wed, Feb 11, 2026 at 2:17 PM.)
Add custom JSON serializer for dataclasses and enums in _compute_hash() to prevent crash when cache key contains non-serializable objects. Upstream: google#352
…sh test. Remove duplicate _json_default from _upload() in favor of module-level _json_serializer. Add test for deterministic hashing with enum/dataclass values. Upstream: google#359 (merged with google#352)
Applied PRs: google#356, google#369, google#329, google#267, google#305, google#326, google#352, google#359, google#317, google#362, google#242, google#310, google#241. Moved from Low Priority to Applied section. Updated cherry-pick log with all commit hashes.
Your branch is 1 commit behind. To update it, run:
git fetch origin main
git merge origin/main
git push
Note: Enable "Allow edits by maintainers" to allow automatic updates.
Problem: The batch inference process crashed when computing a cache hash for requests containing non-serializable objects, such as `Enum` members (used in SafetySettings) or dataclass instances. `json.dumps` then raised `TypeError: Object of type ... is not JSON serializable`.
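A minimal reproduction of the failure with the standard library alone, assuming hypothetical `Threshold` and `Setting` types in place of the real request configuration. A plain `Enum` member and a dataclass instance both make `json.dumps` raise, whereas an enum that also subclasses `int` (such as `IntEnum`) or `str` is encoded natively as its underlying value, which is why only some configurations hit the crash.

```python
import json
from dataclasses import dataclass
from enum import Enum, IntEnum


class Threshold(Enum):  # hypothetical plain Enum, like a safety threshold
    BLOCK_NONE = "BLOCK_NONE"


class Level(IntEnum):  # IntEnum subclasses int, so json encodes it natively
    HIGH = 3


@dataclass
class Setting:  # hypothetical dataclass config object
    category: str


def try_dumps(value):
    """Return the JSON string, or the TypeError message if encoding fails."""
    try:
        return json.dumps({"key": value}, sort_keys=True)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(try_dumps(Threshold.BLOCK_NONE))  # TypeError: Object of type Threshold is not JSON serializable
print(try_dumps(Setting("harassment")))  # TypeError: Object of type Setting is not JSON serializable
print(try_dumps(Level.HIGH))             # {"key": 3}
```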
Solution:
- Added a private helper function, `_json_serializer`, that converts dataclass instances and `Enum` members into JSON-serializable values.
- Updated `GCSBatchCache._compute_hash` to use this helper when calling `json.dumps`, so cache keys containing these types hash deterministically instead of raising a `TypeError`.
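The two solution steps can be sketched as follows. This is a minimal sketch, not the merged code: the standalone `compute_hash` function, the choice of SHA-256, and the `BlockThreshold`/`SafetySetting` types are assumptions standing in for `GCSBatchCache._compute_hash` and the real request configuration; only the name `_json_serializer` comes from the PR.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, is_dataclass
from enum import Enum
from typing import Any


def _json_serializer(obj: Any) -> Any:
    """json.dumps `default` hook for dataclass instances and Enum members."""
    if isinstance(obj, Enum):
        return obj.value
    if is_dataclass(obj) and not isinstance(obj, type):
        return asdict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def compute_hash(key_data: dict) -> str:
    """Hypothetical stand-in for GCSBatchCache._compute_hash (SHA-256 assumed)."""
    payload = json.dumps(key_data, sort_keys=True, default=_json_serializer)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class BlockThreshold(Enum):  # hypothetical, for illustration only
    BLOCK_NONE = "BLOCK_NONE"
    BLOCK_ONLY_HIGH = "BLOCK_ONLY_HIGH"


@dataclass
class SafetySetting:  # hypothetical, not the Gemini SDK type
    category: str
    threshold: BlockThreshold


setting = SafetySetting("harassment", BlockThreshold.BLOCK_NONE)
key_a = {"model_id": "example-model", "safety_settings": [setting]}
key_b = {"safety_settings": [setting], "model_id": "example-model"}  # same data, other order
print(compute_hash(key_a) == compute_hash(key_b))  # True
```

Because `default` is only consulted for objects the encoder cannot handle, plain strings, numbers, lists, and dicts in a key serialize exactly as they would without the hook; assuming the other `json.dumps` arguments are unchanged, keys that hashed successfully before keep the same hash.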
Impact: Prevents runtime crashes during batch processing and lets caching work with Gemini model configurations that contain Enum or dataclass values, such as safety settings.
Resolves #353