Use additional context in LLM prompt, update OpenAI dep #4040

Merged
flodolo merged 6 commits into mozilla:main from flodolo:improve_llm_prompt
Apr 7, 2026

Conversation

@flodolo
Collaborator

@flodolo flodolo commented Mar 26, 2026

Fixes #4030

  • Update OpenAI package to latest version (2.29.0)
  • Improve prompt formulation
  • Pass string ID, comment, terminology matches when available
  • Move OpenAI GPT version to settings
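
The "pass string ID, comment, terminology matches when available" item can be illustrated with a minimal sketch. This is not the code from the PR: the function name `build_user_prompt`, its parameters, and the `STRING ID`, `COMMENT`, and `TERMINOLOGY` labels are assumptions made for illustration. Only the `ENGLISH SOURCE` and `MACHINE TRANSLATION TO REFINE` labels appear in the PR's diff.

```python
def build_user_prompt(
    english_text,
    translated_text,
    string_id=None,
    comment=None,
    term_matches=None,
):
    """Assemble the user prompt, adding context sections only when available.

    All names here are hypothetical; `term_matches` is assumed to map an
    English term to its translation in the target locale.
    """
    sections = []
    if string_id:
        sections.append(f"STRING ID:\n{string_id}")
    if comment:
        sections.append(f"COMMENT:\n{comment}")
    if term_matches:
        lines = "\n".join(f"- {src} -> {tgt}" for src, tgt in term_matches.items())
        sections.append(f"TERMINOLOGY:\n{lines}")
    # These two sections are always present.
    sections.append(f"ENGLISH SOURCE:\n{english_text}")
    sections.append(f"MACHINE TRANSLATION TO REFINE:\n{translated_text}")
    return "\n\n".join(sections)
```

Skipping empty sections keeps the prompt identical to the context-free version when no extra data exists, so strings without comments or term matches are unaffected.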

f"ENGLISH SOURCE:\n{english_text}\n\n"
f"MACHINE TRANSLATION TO REFINE:\n{translated_text}"
)
# TODO: remove before merge.
Collaborator Author

This is here to help with testing, needs to be removed before merge.


 class Command(BaseCommand):
-    help = "Refines machine translations using OpenAI's GPT-4 API with specified characteristics"
+    help = "Refines machine translations using OpenAI's GPT API with specified characteristics"
Collaborator Author

At some point we're going to move away from GPT-4, so I'm not sure there is a benefit in having references to GPT-4 in the code.

@flodolo flodolo force-pushed the improve_llm_prompt branch 2 times, most recently from 5a56d9c to ada5db9 Compare March 26, 2026 13:07
@flodolo flodolo force-pushed the improve_llm_prompt branch from ada5db9 to aa2327a Compare March 26, 2026 13:14
Collaborator

@mathjazz mathjazz left a comment

Nice work!

Why are we passing all this data from frontend to backend, instead of just passing entity ID and then retrieving all the data from the DB?

@flodolo
Collaborator Author

flodolo commented Apr 2, 2026

> Why are we passing all this data from frontend to backend, instead of just passing entity ID and then retrieving all the data from the DB?

The front-end already has all the data. Are we OK with extra queries to do this on the backend? I believe we'd need:

  1. A couple of queries to get comments (1 for entity+section+comment, 1 for pinned comment).
  2. Scan terminology for matches.
  3. Get term translations (1 query per matched term).

Is this off?
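
Step 2 above (scanning terminology for matches) might look like the following sketch. It works on a plain dict instead of database models, so `find_term_matches` and the shape of `terms` are assumptions for illustration; a real backend version would read terms and their translations from the DB.

```python
import re


def find_term_matches(source_text, terms):
    """Return {term: translation} for terms that occur in source_text.

    `terms` is a hypothetical mapping of English term -> translation for one
    locale; a value of None means the term has no translation yet.
    """
    matches = {}
    for term, translation in terms.items():
        if not translation:
            # Untranslated terms give the model nothing to work with.
            continue
        # Whole-word, case-insensitive match so "tab" does not match "table".
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, source_text, flags=re.IGNORECASE):
            matches[term] = translation
    return matches


print(find_term_matches(
    "Open a new tab in the browser",
    {"tab": "scheda", "bookmark": "segnalibro", "browser": None},
))
```

In this sketch the scan runs in memory over terms that were already loaded, so the query cost is concentrated in fetching the terms and translations, not in the matching itself.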

@mathjazz
Collaborator

mathjazz commented Apr 2, 2026

> The front-end already has all the data. Are we OK with extra queries to do this on the backend?

The frontend already having the data is not a strong reason to make the client the source of truth.

Also:

  1. Server-owned data shouldn’t come from the client.
  2. Duplication of logic: if the frontend needs to know how to assemble all the context for GPT, that logic is now split across client and server.
  3. This is a GET endpoint, and it sends potentially large text blobs plus JSON payloads in query params. That is awkward for URL length limits, logging exposure, and caching behavior. This endpoint probably wants POST even aside from the DB question.
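
The third point can be made concrete with a small, standard-library-only sketch. The endpoint path and field names are hypothetical and not taken from the PR; the point is only that a GET URL grows with the text being sent, while a POST keeps the URL fixed and carries the data in the body.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint path, used only for illustration.
ENDPOINT = "/refine-translation/"


def as_get_url(params):
    """Encode every field into the query string, as a GET request would."""
    return f"{ENDPOINT}?{urlencode(params)}"


def as_post_request(params):
    """Keep the URL fixed and move the fields into a JSON body."""
    return ENDPOINT, json.dumps(params).encode("utf-8")


params = {
    "english_text": "A long source string. " * 200,
    "translated_text": "Una lunga stringa di origine. " * 200,
    "locale": "it",
}

url = as_get_url(params)
path, body = as_post_request(params)

print(len(url))   # grows with the text being sent
print(len(path))  # stays constant regardless of payload size
```

Because access logs and intermediaries typically record the request URL but not the body, the POST form also keeps the source and translated text out of those logs.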

@flodolo
Collaborator Author

flodolo commented Apr 2, 2026

Makes sense. I'll look into moving this to the back-end (next week at this point).

@flodolo flodolo force-pushed the improve_llm_prompt branch from fab834f to cc5f4fc Compare April 2, 2026 14:39
@flodolo flodolo requested a review from mathjazz April 7, 2026 06:37
Collaborator

@mathjazz mathjazz left a comment

Nice job! Deployed to DEV. Works fine!

Left one more note.

translated_text,
characteristic,
locale,
entity_id=None,
Collaborator

Why not entity_key?

Collaborator Author

No reason, in my head it's always ID 🤷🏼

I'll update and remove the debug print.

@flodolo flodolo merged commit 8b2b7a4 into mozilla:main Apr 7, 2026
8 checks passed
@flodolo flodolo deleted the improve_llm_prompt branch April 10, 2026 08:43

Development

Successfully merging this pull request may close these issues.

Improve LLM prompt by providing clearer instructions and additional context

2 participants