add/remove language codes and refuse to publish books with invalid codes#318

Open: elfkuzco wants to merge 6 commits into main from custom-language-codes

Contributor

@elfkuzco elfkuzco commented Jun 4, 2026

Rationale

This PR enhances the mill to ensure that a book's language metadata is a valid ISO 639-3 language code. It also provides the ability to extend the list of language codes and to disallow some codes that would otherwise be valid.

Changes

  • add env vars DISALLOWED_LANGUAGE_CODES and CUSTOM_LANGUAGE_CODES to support removing and adding language codes, respectively
  • update the language code database at mill startup
  • mark books with unknown language codes as errored and refuse to add them to a title for further processing
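
As a rough illustration of the first two bullets, here is a minimal sketch of how the two env vars could be folded into the set of accepted codes at startup. It is not the PR's implementation: the comma-separated format, the helper names `parse_codes` and `build_allowed_codes`, the tiny `BASE_LANGUAGE_CODES` set (standing in for the full ISO 639-3 list), and the rule that a disallowed code wins over a custom one are all assumptions.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Hypothetical stand-in for the full ISO 639-3 list the mill would load.
BASE_LANGUAGE_CODES = frozenset({"eng", "fra", "deu", "spa"})


def parse_codes(raw: str | None) -> set[str]:
    """Split a comma-separated value into lowercase codes (format is assumed)."""
    if not raw:
        return set()
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


def build_allowed_codes(env: Mapping[str, str] | None = None) -> frozenset[str]:
    """Return base codes plus custom codes, minus disallowed codes."""
    env = os.environ if env is None else env
    custom = parse_codes(env.get("CUSTOM_LANGUAGE_CODES"))
    disallowed = parse_codes(env.get("DISALLOWED_LANGUAGE_CODES"))
    # Assumption: removal is applied last, so a disallowed code always loses.
    return frozenset((BASE_LANGUAGE_CODES | custom) - disallowed)
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to test.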

This closes #101

@elfkuzco elfkuzco requested a review from benoit74 June 4, 2026 10:05
Contributor Author

elfkuzco commented Jun 4, 2026

fixing issues...


codecov Bot commented Jun 4, 2026

Codecov Report

❌ Patch coverage is 63.63636% with 20 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.68%. Comparing base (2359a7a) to head (e26d165).
⚠️ Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| backend/src/cms_backend/context.py | 47.05% | 7 Missing and 2 partials ⚠️ |
| backend/src/cms_backend/api/routes/fields.py | 52.94% | 4 Missing and 4 partials ⚠️ |
| backend/src/cms_backend/__init__.py | 80.00% | 2 Missing ⚠️ |
| backend/src/cms_backend/db/book.py | 88.88% | 0 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #318      +/-   ##
==========================================
- Coverage   83.17%   82.68%   -0.50%     
==========================================
  Files          58       58              
  Lines        2912     2968      +56     
  Branches      276      293      +17     
==========================================
+ Hits         2422     2454      +32     
- Misses        415      431      +16     
- Partials       75       83       +8     

Contributor

@benoit74 benoit74 left a comment

I'm sorry, I realize I've not been specific enough in the issue.

This language metadata issue should not totally block books but simply force them into staging with a new book issue kind, just like the other book issues we already have (metadata mismatch, flavour mismatch). Since language metadata in catalog endpoint now anyway comes from the title, this will be OK.
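
To make the requested behaviour concrete, here is a minimal sketch in which an invalid language is just one more issue kind that routes a book to staging instead of blocking it. The enum, its member names, the `compute_issues` and `target_location` helpers, and the `"prod"` label for the non-staging destination are all hypothetical; the project's real issue model may look quite different.

```python
from __future__ import annotations

from enum import Enum


class BookIssue(str, Enum):
    """Issue kinds named in the review; identifiers here are hypothetical."""

    METADATA_MISMATCH = "metadata_mismatch"
    FLAVOUR_MISMATCH = "flavour_mismatch"
    INVALID_LANGUAGE = "invalid_language"  # the new kind being requested


def compute_issues(
    *, metadata_matches: bool, flavour_matches: bool, language_is_valid: bool
) -> list[BookIssue]:
    """Collect every issue that applies; none of them rejects the book."""
    issues: list[BookIssue] = []
    if not metadata_matches:
        issues.append(BookIssue.METADATA_MISMATCH)
    if not flavour_matches:
        issues.append(BookIssue.FLAVOUR_MISMATCH)
    if not language_is_valid:
        issues.append(BookIssue.INVALID_LANGUAGE)
    return issues


def target_location(issues: list[BookIssue]) -> str:
    """Any issue forces staging; "prod" is an assumed name for the alternative."""
    return "staging" if issues else "prod"
```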

Contributor Author

elfkuzco commented Jun 4, 2026

> Since language metadata in catalog endpoint now anyway comes from the title, this will be OK.

But the issues list won't be cleared because we don't update language individually for books which would trigger update_book_issues. Should we support updating languages for books?

Contributor

benoit74 commented Jun 4, 2026

> But the issues list won't be cleared because we don't update language individually for books which would trigger update_book_issues. Should we support updating languages for books?

Nope, either we set a correct language value in title (and book still has a bad language) or we update pycountry / env vars (and reprocess book to clear the issue).
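
The second option (update the accepted codes, then reprocess) can be sketched as recomputing the language issue from scratch on each pass. This is only an illustration under assumptions: `recompute_language_issue` and the `"invalid_language"` issue name are hypothetical and are not the project's `update_book_issues`.

```python
from __future__ import annotations

INVALID_LANGUAGE = "invalid_language"  # hypothetical issue name


def recompute_language_issue(
    issues: set[str], codes: list[str], allowed: frozenset[str]
) -> set[str]:
    """Return a new issue set with the language issue re-evaluated.

    The stale language issue is dropped first, so a book whose code became
    valid (for example after a custom code was added) comes out clean.
    """
    updated = set(issues) - {INVALID_LANGUAGE}
    if not codes or any(code not in allowed for code in codes):
        updated.add(INVALID_LANGUAGE)
    return updated
```

Reprocessing the same book after extending `allowed` clears the issue without touching the book's own language value, and other issues are left untouched.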

@elfkuzco elfkuzco force-pushed the custom-language-codes branch from 399e49f to f25f393 Compare June 4, 2026 12:43
Contributor Author

elfkuzco commented Jun 4, 2026

Had to rebase so I could get the update_book_issues features.

@elfkuzco elfkuzco requested a review from benoit74 June 4, 2026 12:43
Contributor

@benoit74 benoit74 left a comment

We are getting closer, but the Language ZIM metadata can be a comma-separated list of languages, and it looks like this PR is not ready for that (each language in the list must be valid).
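
A minimal sketch of the list-aware check described above, where every entry must be valid for the whole value to pass. The function names, the small `ALLOWED` set, and the decisions to trim whitespace and to reject empty entries are assumptions, not the behaviour of the PR's `validate_language_code`.

```python
from __future__ import annotations

# Hypothetical accepted codes; the real set would come from the language database.
ALLOWED = frozenset({"eng", "fra", "deu"})


def split_language_value(language: str) -> list[str]:
    """Split a comma-separated Language value, trimming surrounding whitespace."""
    return [code.strip() for code in language.split(",")]


def is_valid_language(language: str, allowed: frozenset[str] = ALLOWED) -> bool:
    """True only if every entry is non-empty and present in the accepted set."""
    codes = split_language_value(language)
    return all(code and code in allowed for code in codes)


def invalid_language_codes(language: str, allowed: frozenset[str] = ALLOWED) -> list[str]:
    """List the offending entries, which is useful for an issue message."""
    return [code for code in split_language_value(language) if code not in allowed]
```

A single code is just a list of length one, so the same function covers both cases.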

@elfkuzco elfkuzco self-assigned this Jun 5, 2026
Contributor

@benoit74 benoit74 left a comment

validate_language_code must support CSV of languages as well

Contributor Author

elfkuzco commented Jun 5, 2026

> validate_language_code must support CSV of languages as well

Currently, it's only used for the Title's language. Will a title's language be a CSV of languages? I don't use it in book-related schemas, as we don't update book languages yet.

Contributor

benoit74 commented Jun 5, 2026

Under normal conditions, the title language is strictly identical to the book language, just like any title metadata coming from the book. The title language can hence be a CSV as well.

@elfkuzco elfkuzco requested a review from benoit74 June 5, 2026 14:15

Development

Successfully merging this pull request may close these issues.

Refuse to publish books with unknown ISO639-3 codes in Language metadata

2 participants