Commit 2a81ccf

Add DMP ’25 Final Report by Aman Chadha
1 parent f0459e0 commit 2a81ccf

1 file changed: 162 additions, 0 deletions

---
title: "DMP '25 Final Report by Aman Chadha"
excerpt: "Final Report for the project Modernizing Music Blocks’ i18n with AI-Assisted Translation"
category: "DEVELOPER NEWS"
date: "2025-09-17"
slug: "2025-09-17-dmp-25-aman-chadha-final-report"
author: "@/constants/MarkdownFiles/authors/aman-chadha.md"
tags: "dmp25,sugarlabs,finalreport,aman-chadha"
image: "assets/Images/c4gt_DMP.png"
---

<!-- markdownlint-disable -->

# DMP '25 Final Report by Aman Chadha

## Contributor Details

**Name:** Aman Chadha

**GitHub:** [AmanChadha](https://github.com/ac-mmi)

**Organization:** [Sugar Labs](https://www.sugarlabs.org/)

**Project:** [JS Internationalization with AI Translation Support](https://github.com/sugarlabs/musicblocks/pull/4731)

**Mentors:** [Walter Bender](https://github.com/walterbender), [Devin Ulibarri](https://github.com/pikurasa)

---

## Project Overview

Music Blocks is a learning platform for children worldwide. Currently, it primarily supports **English, Japanese, and Spanish**, so many learners have to use an interface that is not in their native language. The goal of this project was to **modernize Music Blocks’ i18n system** and introduce an **AI-assisted translation workflow** to improve accessibility and engagement globally.

Key problems addressed:

- The legacy `webL10n.js` system lacked **modern i18n features**, such as fallback strategies and support for JSON-based translation files.
- UI strings often lacked context, which led to ambiguous or inaccurate translations.
- Translators had difficulty with terms that have multiple meanings, such as "duck" (pitch vs. volume).

---

## Project Objectives

- Migrate from **webL10n.js to i18next** for modern, modular, and maintainable i18n.
- Automate the translation of missing strings using **AI with contextual awareness**.
- Keep the workflow **contributor-friendly**, so that human translators can review AI suggestions.
- Extend accessibility to new languages and improve adoption by educators worldwide.

---

## Technical Approach

### Framework Migration

- **Why migration was needed:**
  - `webL10n.js` was outdated and lacked support for modern i18n features.
  - i18next supports **language-specific formatting**, flexible fallbacks, and **JSON-based translation files**.

- **Process:**
  - Replaced `webL10n.js` references in the codebase with i18next API calls.
  - Added **fallback strategies** that try alternative forms of a string when it has no exact match: cleaned text, lowercase, title case, and hyphenated variants.
  - Tested the migration incrementally to make sure the existing UI kept working.

### Context-Aware Translation (RAG Model)

- Extracted **code context** for each `msgid` by taking **5 lines above and below** its occurrence, along with any developer comments.
- Stored the context snippets in **context_ui_full.json** with metadata: source file, line numbers, and snippet.
- Indexed the JSON in **ChromaDB**, a vector database designed for semantic search.
- Built a **RAG model** that retrieves and analyzes this context to generate a clear explanation for each string.
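
A minimal sketch of the context-extraction step, assuming each `msgid` occurrence is already known as a (file, line) pair, for example from the `#:` reference comments of a `.po` file. The record fields mirror the metadata listed above (source file, line numbers, snippet), but their exact names, the helper functions, and the sample JavaScript are hypothetical. Retrieval here uses a plain word-overlap score as a dependency-free stand-in; ChromaDB instead ranks documents by the similarity of their vector embeddings.

```python
import json
import re

WINDOW = 5  # lines of context kept above and below each occurrence


def extract_context(source_lines, line_no, source_file, msgid):
    """Build a context record for `msgid` found at 1-based `line_no`."""
    start = max(1, line_no - WINDOW)
    end = min(len(source_lines), line_no + WINDOW)
    window = source_lines[start - 1:end]
    # Naive detection of JavaScript "//" comments; it would also match
    # "//" inside string literals such as URLs.
    comments = [line.split("//", 1)[1].strip() for line in window if "//" in line]
    return {
        "msgid": msgid,
        "source_file": source_file,
        "start_line": start,
        "end_line": end,
        "snippet": "\n".join(window),
        "comments": comments,
    }


def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(records, query, k=1):
    """Rank records by words shared with `query` (stand-in for a vector search)."""
    q = _tokens(query)
    return sorted(
        records,
        key=lambda r: len(q & _tokens(r["msgid"] + " " + r["snippet"])),
        reverse=True,
    )[:k]


# Hypothetical JavaScript source; the file names below are made up.
js_lines = [
    "// Lower the volume of other voices while this one plays",
    "function setDucking(turtle) {",
    '    const label = _("duck");',
    "    return label;",
    "}",
]
records = [
    extract_context(js_lines, 3, "example/VolumeBlocks.js", "duck"),
    extract_context(["let x = 1;", '_("minor");', "// scale mode"], 2, "example/Modes.js", "minor"),
]
# Serialized form, analogous to what would be saved in context_ui_full.json.
serialized = json.dumps(records, indent=2)
print(retrieve(records, "duck volume")[0]["msgid"])  # duck
```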

---

## AI-Assisted Translation Workflow

### Workflow Steps

1. **`.po` to JSON Automation:**
   - Converted `.po` files to JSON using a Python script so they could be fed into the AI pipeline.

2. **Translation with Context:**
   - Retrieved each string's context using the RAG model.
   - Sent the `msgid` together with its context to the translation API, so the surrounding text could disambiguate the term.

3. **Google Translate API Integration:**
   - Chose Google Translate for its **robustness, contextual translation quality, and reliability**.
   - Open-source alternatives such as LibreTranslate produced **word-by-word translations** and did not make use of the surrounding context.

4. **Automated QA:**
   - Developed a **Selenium + GPT script** to validate translations automatically.
   - The script detected inaccuracies and flagged those strings for manual review by a human translator.

5. **PO File Generation:**
   - Generated complete Arabic, Japanese, and Hindi `.po` files using the automated pipeline.
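
The `.po` → JSON → `.po` round trip from steps 1 and 5 can be sketched without third-party dependencies. This is a simplified illustration, not the project's Python script: it handles only singular `msgid`/`msgstr` entries (including multi-line strings), ignores plural forms and `msgctxt`, keeps escape sequences verbatim, and omits the header on output. The `needs_review` field is a hypothetical name for strings rejected by the QA step; such entries get gettext's standard `#, fuzzy` flag so translators can find them. A production pipeline would more likely rely on a maintained library such as `polib`.

```python
import json


def _unquote(line):
    """Strip the surrounding double quotes from a PO string fragment."""
    line = line.strip()
    return line[1:-1] if len(line) >= 2 and line[0] == line[-1] == '"' else line


def po_to_entries(po_text):
    """Parse simple msgid/msgstr pairs; the header entry (empty msgid) is skipped."""
    entries, current, field = [], {}, None
    for raw in po_text.splitlines() + [""]:
        line = raw.strip()
        if not line:  # a blank line ends the current entry
            if current.get("msgid"):
                entries.append(current)
            current, field = {}, None
        elif line.startswith("#"):
            continue  # references, comments and flags are ignored here
        elif line.startswith("msgid "):
            field = "msgid"
            current[field] = _unquote(line[len("msgid "):])
        elif line.startswith("msgstr "):
            field = "msgstr"
            current[field] = _unquote(line[len("msgstr "):])
        elif line.startswith('"') and field:
            current[field] += _unquote(line)  # continuation of a multi-line string
    return entries


def entries_to_po(entries):
    """Serialize entries back to PO text (no header), marking flagged ones fuzzy."""
    blocks = []
    for entry in entries:
        lines = ["#, fuzzy"] if entry.get("needs_review") else []
        lines.append(f'msgid "{entry["msgid"]}"')
        lines.append(f'msgstr "{entry.get("msgstr", "")}"')
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


# Hypothetical input; the header entry and the "#:" reference line are skipped.
sample = '''msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: example.js:12
msgid "pitch"
msgstr ""

msgid "duck"
msgstr ""
'''
entries = po_to_entries(sample)
print(json.dumps(entries))
# [{"msgid": "pitch", "msgstr": ""}, {"msgid": "duck", "msgstr": ""}]

entries[0]["msgstr"] = "tono"      # filled in by the translation step
entries[1]["needs_review"] = True  # e.g. flagged by the QA step
print(entries_to_po(entries))
```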
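
---

### Key Python Translation Script

```python
import html

from google.cloud import translate_v2 as translate

# The client picks up Google Cloud credentials from the environment
# (e.g. GOOGLE_APPLICATION_CREDENTIALS).
translate_client = translate.Client()


def translate_prompt(msgid, context, target_lang="ar"):
    """Translate `msgid`, sending its code context along to guide the API."""
    # Send "msgid: context" so the surrounding text can disambiguate the term.
    prompt = f"{msgid}: {context}"
    result = translate_client.translate(prompt, target_language=target_lang)
    # The v2 API treats input as HTML by default, so undo entities like &#39;.
    translated = html.unescape(result["translatedText"]).strip()
    # Keep only the text before the first colon: the translated msgid.
    return translated.split(":")[0].strip() if ":" in translated else translated
```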
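
Because `translate_prompt` sends `msgid: context` as one string, the translated `msgid` has to be cut back out of the response. The sketch below isolates that post-processing as a pure function so it can be tested without API credentials. Splitting at the first separator matches the script's `split(':')[0]`; also accepting the full-width colon `：` is an assumption added in case the separator itself is converted during translation. The sample inputs are made up, not real API output.

```python
import html
import re

# Assumption: accept the full-width colon "：" as well as ":" in case the
# separator itself gets converted during translation.
_SEPARATOR = re.compile(r"[:：]")


def extract_translation(translated_text):
    """Recover the translated msgid from a "msgid: context" translation."""
    text = html.unescape(translated_text).strip()
    # Everything before the first separator is the translated msgid;
    # the rest is the translated context, which is discarded.
    return _SEPARATOR.split(text, maxsplit=1)[0].strip()


print(extract_translation("l&#39;octave : contexte"))  # l'octave
print(extract_translation("ピッチ：音の高さ"))          # ピッチ
print(extract_translation("sin separador"))            # sin separador
```

One limitation is shared with the original script: a `msgid` that itself contains a colon would be truncated at that colon, so such strings may need a different delimiter.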

---

## Challenges & Solutions

| Challenge | Solution |
|-----------|----------|
| Ambiguous UI strings (e.g., "duck", "pitch", "minor") | Implemented a **RAG model** for context-aware translations |
| Legacy i18n system | Migrated from `webL10n.js` → i18next with JSON support |
| Validating automated translations | Built a **Selenium + GPT-based QA system** that flags errors for review |
| Drawbacks of open-source translation tools | Used the **Google Translate API** for higher quality and better context handling |

---

## Achievements

- Successfully **migrated Music Blocks to i18next**.
- Developed a **context-aware AI translation workflow**.
- Generated **Arabic, Japanese, and Hindi `.po` files**.
- Built an **automation pipeline** for the `.po` → JSON → AI translation → validation → `.po` cycle.
- Created QA tooling to **check translation accuracy** before human review.

---

## Key Learnings

- Extracting and supplying **code context** greatly improves translation accuracy, especially for ambiguous strings.
- Careful, incremental migration and testing are crucial when replacing legacy infrastructure.
- Combining **AI automation with human review** helps ensure high-quality localization.
- Open-source translation tools can be limited; commercial APIs may be necessary for production-quality results.

---

## Future Work

- Add support for more AI translation models and services (e.g., DeepL, OpenAI).
- Extend automated QA to **more languages**.
- Build a **web-based UI** where translators can review flagged translations.
- Integrate GitHub Actions to update `.po` files automatically when strings are added or modified.

---

## Resources & References

- **Music Blocks Repository:** [github.com/sugarlabs/musicblocks](https://github.com/sugarlabs/musicblocks)
- **Migration PR:** [#4731](https://github.com/sugarlabs/musicblocks/pull/4731)
- **i18next Documentation:** [i18next.com](https://www.i18next.com/)
- **ChromaDB:** [chromadb.com](https://www.chromadb.com/)

---

## Conclusion

This project modernized Music Blocks’ localization infrastructure, introduced **AI-assisted, context-aware translations**, and enabled **scalable multilingual support**. By combining **framework migration, RAG-based context generation, automated translation, and QA tooling**, Music Blocks is now better equipped to serve children worldwide in their **native languages**, improving engagement, accessibility, and global adoption.

I am deeply grateful to my mentors, the Sugar Labs community, and C4GT for their guidance and support throughout this journey.
