Conversation

@runarmod
Contributor

This PR adds logic to retry prompting OpenAI and Claude if their JSON response is not valid. See #829

Note: The JSON validation for Claude is implemented differently from the other services, so I had to use an alternative solution. It looks like it is possible to instruct Claude to produce JSON based on a schema, as is done for the other services. I haven't used marker with the Claude service, so I don't know how common JSON mistakes are when no schema is used. It might be worth looking into for someone using Claude with marker. See the documentation.
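For illustration, the retry logic roughly amounts to something like the following minimal sketch (function names are hypothetical, not marker's actual API; `call_service` stands in for the OpenAI/Claude request):

```python
import json


def parse_with_retries(call_service, max_retries=2):
    """Call the service and retry if the response is not valid JSON.

    `call_service` is a hypothetical zero-argument callable that returns
    the raw response text from the LLM service.
    """
    for attempt in range(max_retries + 1):
        raw = call_service()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Out of retries: surface the parse error to the caller.
            if attempt == max_retries:
                raise
```

A malformed first response is simply discarded and the prompt is sent again, up to `max_retries` extra attempts.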

@runarmod
Contributor Author

Latest commit reverts sleeping after catching JSON parsing error, as was done with Gemini in #848.

@VikParuchuri
Member

Thank you! I updated the gemini service to also adjust the temperature if json failed to be parsed, so we avoid getting the same result again - https://github.com/datalab-to/marker/blob/master/marker/services/gemini.py . Is it possible to do something similar here?

Re: claude, I think when I implemented this originally, claude didn't have support for structured output. Will refactor at some point, thanks for the pointer.
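The temperature-adjustment idea can be sketched as a retry loop that raises the temperature on each failed parse, so a deterministic model doesn't just reproduce the same malformed output (a minimal illustration; the callable and temperature schedule are assumptions, not gemini.py's actual code):

```python
import json


def call_with_rising_temperature(call, temps=(0.0, 0.2, 0.4)):
    """Retry with increasing temperature after each JSON parse failure.

    `call` is a hypothetical stand-in for the service request; it accepts
    a `temperature` keyword and returns the raw response text.
    """
    last_err = None
    for t in temps:
        raw = call(temperature=t)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_err = err  # remember the failure and retry hotter
    raise last_err
```

At temperature 0.0 a retry with an unchanged prompt is likely to return the same bad output; bumping the temperature makes the next sample genuinely different.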

@runarmod
Contributor Author

runarmod commented Sep 5, 2025

> Thank you! I updated the gemini service to also adjust the temperature if json failed to be parsed, so we avoid getting the same result again - https://github.com/datalab-to/marker/blob/master/marker/services/gemini.py . Is it possible to do something similar here?

I see that temperature is easily adjustable via the parse/create calls for these services too, so I can implement the same thing here.

What do you think is a good baseline temperature for the other services? Marker sets Gemini's default temperature to 0.0 (Google's own default depends on the model, but the possible range is 0.0-2.0) (source). Marker does not specify a temperature for the other services, so they fall back to the defaults defined by Anthropic and OpenAI.

I see that Anthropic defaults to 1.0 (possible range 0.0-1.0) (source), and OpenAI also defaults to 1.0 (possible range 0.0-2.0) (source).

Should we hardcode a default of 0.0 instead, and then increase to 0.2 if the JSON is malformed? Or 0.1 for Anthropic since they use a smaller range?
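One way to express that proposal uniformly is to bump by a fixed fraction of each provider's temperature range, which yields exactly 0.2 for OpenAI/Gemini and 0.1 for Anthropic (a sketch; the table values come from the docs cited above, the function name is hypothetical):

```python
# Temperature ranges per provider, per the documentation cited above.
TEMP_RANGE = {
    "gemini": (0.0, 2.0),
    "openai": (0.0, 2.0),
    "anthropic": (0.0, 1.0),
}


def retry_temperature(provider, base=0.0, fraction=0.1):
    """Bump the temperature by 10% of the provider's range, clamped to its max."""
    lo, hi = TEMP_RANGE[provider]
    return min(base + fraction * (hi - lo), hi)
```

This keeps the retry behavior proportionally consistent across services instead of hardcoding a separate bump per provider.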
