---
title: "DMP '25 Final Report by Aman Chadha"
excerpt: "Final Report for the project Modernizing Music Blocks’ i18n with AI-Assisted Translation"
category: "DEVELOPER NEWS"
date: "2025-09-17"
slug: "2025-09-17-dmp-25-aman-chadha-final-report"
author: "@/constants/MarkdownFiles/authors/aman-chadha.md"
tags: "dmp25,sugarlabs,finalreport,aman-chadha"
image: "assets/Images/c4gt_DMP.png"
---

<!-- markdownlint-disable -->

# DMP '25 Final Report by Aman Chadha

## Contributor Details

**Name:** Aman Chadha

**GitHub:** [AmanChadha](https://github.com/ac-mmi)  
**Organization:** [Sugar Labs](https://www.sugarlabs.org/)  
**Project:** [JS Internationalization with AI Translation Support](https://github.com/sugarlabs/musicblocks/pull/4731)  
**Mentors:** [Walter Bender](https://github.com/walterbender), [Devin Ulibarri](https://github.com/pikurasa)

---

## Project Overview

Music Blocks is a learning platform used by children worldwide. It currently supports primarily **English, Japanese, and Spanish**, so many learners have to use the platform in a language other than their own. The goal of this project was to **modernize Music Blocks’ i18n system** and introduce an **AI-assisted translation workflow** to improve accessibility and engagement globally.

Key problems addressed:
- The legacy `webL10n.js` system lacked **modern i18n features**, including fallback strategies and JSON-based translation support.
- UI strings often lacked context, leading to ambiguous or inaccurate translations.
- Translators had difficulty with terms that have multiple meanings, like "duck" (pitch vs. volume).

---

## Project Objectives

- Migrate from **webL10n.js to i18next** for modern, modular, and maintainable i18n.
- Automate translation of missing strings using **AI with contextual awareness**.
- Ensure a **contributor-friendly workflow**, where human translators can review AI suggestions.
- Expand accessibility for new languages and improve adoption by educators worldwide.

---

## Technical Approach

### Framework Migration

- **Why migration was needed:**
  - `webL10n.js` was outdated and lacked support for modern i18n features.
  - i18next supports **language-specific formatting**, flexible fallbacks, and **JSON-based translation files**.

- **Process:**
  - Replaced `webL10n.js` references in the codebase with i18next API calls.
  - Added **fallback strategies** for key lookup: cleaned text, lowercase, title case, and hyphenated strings.
  - Tested the migration incrementally to ensure the existing UI remained functional.

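The fallback strategies listed above can be illustrated with a small sketch. The actual migration lives in the JavaScript codebase around i18next; the Python below only shows the idea of trying several key variants in turn. The function names, the order of the variants, and the definition of "cleaned" (punctuation stripped) are assumptions made for illustration, not the project's implementation.

```python
import re


def candidate_keys(text):
    """Yield lookup keys in an assumed fallback order, skipping duplicates."""
    # "Cleaned" is assumed here to mean: punctuation removed, whitespace trimmed.
    cleaned = re.sub(r"[^\w\s-]", "", text).strip()
    variants = [
        text,                                # the string exactly as written
        cleaned,                             # cleaned text
        cleaned.lower(),                     # lowercase
        cleaned.title(),                     # title case
        cleaned.lower().replace(" ", "-"),   # hyphenated
    ]
    seen = set()
    for key in variants:
        if key and key not in seen:
            seen.add(key)
            yield key


def translate(text, catalog):
    """Return the first catalog hit among the variants, else the source text."""
    for key in candidate_keys(text):
        if key in catalog:
            return catalog[key]
    return text


# Hypothetical catalog entry, used only to exercise the lookup.
catalog = {"set pitch": "ピッチを設定"}
result = translate("Set Pitch!", catalog)
untranslated = translate("unknown string", catalog)
```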
### Context-Aware Translation (RAG Model)

- Extracted **code context** for each `msgid` by capturing the **5 lines above and below** each occurrence, along with any developer comments.
- Stored the context snippets in `context_ui_full.json` with metadata: source file, line numbers, and snippet.
- Indexed the JSON in **ChromaDB**, a vector database optimized for semantic search.
- Built a **RAG model** that retrieves the relevant context and generates a clear explanation of each string.

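A minimal sketch of the context-extraction step described above, assuming translatable strings are wrapped in `_("...")` calls. The record field names, the sample source, and the file path are hypothetical; the real `context_ui_full.json` may be structured differently. The ChromaDB indexing step is not shown.

```python
import json
import re

WINDOW = 5  # lines of context kept on each side of the string

# Matches _("text") or _('text'); assumed to be how UI strings are marked.
MSGID_CALL = re.compile(r"""\b_\(\s*(["'])(.+?)\1\s*\)""")


def extract_context(source_text, file_name, window=WINDOW):
    """Return one context record per translatable string found in source_text."""
    lines = source_text.splitlines()
    records = []
    for idx, line in enumerate(lines):
        for match in MSGID_CALL.finditer(line):
            start = max(0, idx - window)
            end = min(len(lines), idx + window + 1)
            records.append({
                "msgid": match.group(2),
                "file": file_name,
                "line_start": start + 1,  # 1-based, inclusive
                "line_end": end,          # 1-based, inclusive
                "snippet": "\n".join(lines[start:end]),
            })
    return records


# Hypothetical source file content, for illustration only.
sample = "\n".join([
    "// Block that sets the pitch of a note",
    "class PitchBlock {",
    "    constructor() {",
    '        this.label = _("pitch");',
    "    }",
    "}",
])
records = extract_context(sample, "js/blocks/example.js")
context_json = json.dumps(records, indent=2)
```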
---

## AI-Assisted Translation Workflow

### Workflow Steps

1. **`.po` to JSON Automation:**
   - Converted `.po` files to JSON using a Python script, enabling AI integration.

2. **Translation with Context:**
   - Retrieved context using the RAG model.
   - Sent `msgid + context` to the translation API for more accurate translations.

3. **Google Translate API Integration:**
   - Chose Google Translate for its **robustness, contextual translation quality, and reliability**.
   - Open-source alternatives like LibreTranslate tended to produce **word-by-word translations** that did not make use of the surrounding context.

4. **Automated QA:**
   - Developed a **Selenium + GPT script** to validate translations automatically.
   - Detected inaccuracies and flagged strings for manual review by a human translator.

5. **PO File Generation:**
   - Generated complete Arabic, Japanese, and Hindi `.po` files using the automated pipeline.

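The first step of the workflow above, converting `.po` entries to JSON, can be sketched with the standard library alone. This is a simplified parser written for illustration, not the project's conversion script: it handles plain `msgid`/`msgstr` pairs and continuation lines but ignores plural forms and `msgctxt`. The sample entries are invented.

```python
import json


def po_to_dict(po_text):
    """Parse simple msgid/msgstr pairs from .po text into a dict."""
    entries = {}
    msgid, msgstr, field = None, "", None
    # The extra blank line at the end flushes the final entry.
    for raw in po_text.splitlines() + [""]:
        line = raw.strip()
        if line.startswith("msgid "):
            if msgid:  # flush the previous entry if no blank line separated them
                entries[msgid] = msgstr
            msgid, msgstr, field = json.loads(line[len("msgid "):]), "", "msgid"
        elif line.startswith("msgstr "):
            msgstr, field = json.loads(line[len("msgstr "):]), "msgstr"
        elif line.startswith('"') and field == "msgid":
            msgid += json.loads(line)   # continuation of a multi-line msgid
        elif line.startswith('"') and field == "msgstr":
            msgstr += json.loads(line)  # continuation of a multi-line msgstr
        elif not line:
            if msgid:  # the header entry has an empty msgid and is skipped
                entries[msgid] = msgstr
            msgid, msgstr, field = None, "", None
    return entries


# Invented sample content; comment lines starting with "#" are ignored.
sample_po = '''
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: js/example.js:10
msgid "pitch"
msgstr ""

msgid "Save project"
msgstr "Guardar proyecto"
'''
catalog = po_to_dict(sample_po)
# Entries with an empty msgstr are the ones that still need translation.
missing = [key for key, value in catalog.items() if not value]
catalog_json = json.dumps(catalog, ensure_ascii=False, indent=2)
```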
---

### Key Python Translation Script

```python
import html

from google.cloud import translate_v2 as translate

# Requires Google Cloud credentials (e.g. GOOGLE_APPLICATION_CREDENTIALS).
translate_client = translate.Client()


def translate_prompt(msgid, context, target_lang="ar"):
    """Translate msgid together with its context and return only the msgid part."""
    # Prepend the msgid to its context so the API sees the surrounding meaning.
    prompt = f"{msgid}: {context}"
    result = translate_client.translate(prompt, target_language=target_lang)
    # The API returns HTML-escaped text (e.g. &#39;), so unescape it first.
    translated = html.unescape(result["translatedText"]).strip()
    # Keep the text before the first colon, which corresponds to the msgid.
    return translated.split(":")[0].strip() if ":" in translated else translated
```

---

## Challenges & Solutions

| Challenge | Solution |
|-----------|---------|
| Ambiguous UI strings (e.g., "duck", "pitch", "minor") | Implemented **RAG model** for context-aware translations |
| Legacy i18n system | Migrated from `webL10n.js` to i18next with JSON support |
| Automated translation validation | Built **Selenium + GPT-based QA system** to mark errors for review |
| Open-source translation drawbacks | Used **Google Translate API** for higher quality and context handling |

---

## Achievements

- Successfully **migrated Music Blocks to i18next**.
- Developed a **context-aware AI translation workflow**.
- Generated **Arabic, Japanese, and Hindi `.po` files**.
- Built an **automation pipeline** for the `.po → JSON → AI translation → validation → .po` cycle.
- Created QA tooling to **check translation accuracy** before human review.

---

## Key Learnings

- Extracting and using **context** drastically improves translation accuracy.
- Clean migration and testing are crucial when replacing legacy infrastructure.
- Combining **AI automation with human review** ensures high-quality localization.
- Open-source translation tools can be limited; commercial APIs may be necessary for production quality.

---

## Future Work

- Add support for more AI translation models (e.g., DeepL, OpenAI).
- Extend automated QA to **more languages**.
- Build a **web-based UI** for translators to review flagged translations.
- Integrate GitHub Actions for automatic updates of `.po` files on new/modified strings.

---

## Resources & References

- **Music Blocks Repository:** [github.com/sugarlabs/musicblocks](https://github.com/sugarlabs/musicblocks)
- **Migration PR:** [#4731](https://github.com/sugarlabs/musicblocks/pull/4731)
- **i18next Documentation:** [i18next.com](https://www.i18next.com/)
- **ChromaDB:** [chromadb.com](https://www.chromadb.com/)

---

## Conclusion

This project modernized Music Blocks’ localization infrastructure, introduced **AI-assisted, context-aware translations**, and enabled **scalable multilingual support**. By combining **framework migration, RAG-based context generation, automated translation, and QA tooling**, Music Blocks is now better equipped to serve children worldwide in their **native languages**, improving engagement, accessibility, and global adoption.

I am deeply grateful to my mentors, the Sugar Labs community, and C4GT for their guidance and support throughout this journey.