Conversation

@runarmod
Contributor

This PR adds logic to retry prompting OpenAI and Claude if their JSON response is not valid. See #829

Note: The JSON validation for Claude is implemented differently from the other services, so I had to use an alternative solution. It looks like it is possible to instruct Claude to produce JSON based on a schema, as is done for the other services. I haven't used marker with the Claude service, so I don't know how common JSON mistakes are when no schema is used. It might be worth looking into for someone using Claude with marker. See the documentation.
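For illustration, the retry logic roughly amounts to something like the following minimal sketch (function names are hypothetical, not marker's actual API; `call_service` stands in for the OpenAI/Claude request):

```python
import json


def parse_with_retries(call_service, max_retries=2):
    """Call the service and retry if the response is not valid JSON.

    `call_service` is a hypothetical zero-argument callable that returns
    the raw response text from the LLM service.
    """
    for attempt in range(max_retries + 1):
        raw = call_service()
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Out of retries: surface the parse error to the caller.
            if attempt == max_retries:
                raise
```

A malformed first response is simply discarded and the prompt is sent again, up to `max_retries` extra attempts.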

@runarmod
Contributor Author

Latest commit reverts sleeping after catching JSON parsing error, as was done with Gemini in #848.

@VikParuchuri
Member

Thank you! I updated the gemini service to also adjust the temperature if json failed to be parsed, so we avoid getting the same result again - https://github.com/datalab-to/marker/blob/master/marker/services/gemini.py . Is it possible to do something similar here?

Re: claude, I think when I implemented this originally, claude didn't have support for structured output. Will refactor at some point, thanks for the pointer.
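The temperature-adjustment idea can be sketched as a retry loop that raises the temperature on each failed parse, so a deterministic model doesn't just reproduce the same malformed output (a minimal illustration; the callable and temperature schedule are assumptions, not gemini.py's actual code):

```python
import json


def call_with_rising_temperature(call, temps=(0.0, 0.2, 0.4)):
    """Retry with increasing temperature after each JSON parse failure.

    `call` is a hypothetical stand-in for the service request; it accepts
    a `temperature` keyword and returns the raw response text.
    """
    last_err = None
    for t in temps:
        raw = call(temperature=t)
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_err = err  # remember the failure and retry hotter
    raise last_err
```

At temperature 0.0 a retry with an unchanged prompt is likely to return the same bad output; bumping the temperature makes the next sample genuinely different.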

@runarmod
Contributor Author

runarmod commented Sep 5, 2025

> Thank you! I updated the gemini service to also adjust the temperature if json failed to be parsed, so we avoid getting the same result again - https://github.com/datalab-to/marker/blob/master/marker/services/gemini.py . Is it possible to do something similar here?

I see that temperature is easily adjustable via the parse/create calls for these services too, so I can implement the same thing here.

What do you think is a good baseline temperature for the other services? Marker sets Gemini's default temperature to 0.0 (Google's own default depends on the model, but the possible range is 0.0-2.0) (source). Marker does not specify a temperature for the other services, so they fall back to the defaults defined by Anthropic and OpenAI.

I see that Anthropic defaults to 1.0 (possible range 0.0-1.0) (source), and OpenAI also defaults to 1.0 (possible range 0.0-2.0) (source).

Should we hardcode a default of 0.0 instead, and then increase to 0.2 if the JSON is malformed? Or 0.1 for Anthropic since they use a smaller range?
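One way to express that proposal uniformly is to bump by a fixed fraction of each provider's temperature range, which yields exactly 0.2 for OpenAI/Gemini and 0.1 for Anthropic (a sketch; the table values come from the docs cited above, the function name is hypothetical):

```python
# Temperature ranges per provider, per the documentation cited above.
TEMP_RANGE = {
    "gemini": (0.0, 2.0),
    "openai": (0.0, 2.0),
    "anthropic": (0.0, 1.0),
}


def retry_temperature(provider, base=0.0, fraction=0.1):
    """Bump the temperature by 10% of the provider's range, clamped to its max."""
    lo, hi = TEMP_RANGE[provider]
    return min(base + fraction * (hi - lo), hi)
```

This keeps the retry behavior proportionally consistent across services instead of hardcoding a separate bump per provider.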
