
@mic-code mic-code commented Oct 2, 2025

Traceback (most recent call last):
  File "C:\repository\Test\graphrag-test\createKG.py", line 24, in <module>
    rows = [row for row in reader]
                           ^^^^^^
  File "C:\Users\michael\AppData\Roaming\uv\python\cpython-3.12.9-windows-x86_64-none\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 43: character maps to <undefined>

Loading a CSV file that contains Chinese characters with CSVLoader raises the error above.
This PR fixes it by opening the file with an explicit UTF-8 encoding.
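A minimal reproduction sketch, assuming a hypothetical temporary file sample.csv saved as UTF-8. Passing encoding="cp1252" explicitly mimics the Windows locale default seen in the traceback. The character 丁 is used because its UTF-8 bytes (E4 B8 81) include 0x81, which cp1252 leaves undefined, just like the 0x8f byte in the traceback. The read_rows helper is hypothetical and not part of graphrag_sdk.

```python
import csv
import os
import tempfile


def read_rows(path: str, encoding: str) -> list[list[str]]:
    # Hypothetical helper; newline="" is what the csv module docs recommend for file objects
    with open(path, newline="", encoding=encoding) as f:
        return [row for row in csv.reader(f)]


# Hypothetical sample data written as UTF-8.
# "丁" encodes to b"\xe4\xb8\x81"; 0x81 has no mapping in cp1252.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([["name", "city"], ["丁", "北京"]])

try:
    read_rows(path, encoding="cp1252")  # mimics the Windows locale default
except UnicodeDecodeError as exc:
    print("cp1252 failed:", exc.reason)  # character maps to <undefined>

# The same file decodes cleanly once UTF-8 is requested explicitly
print(read_rows(path, encoding="utf-8"))  # [['name', 'city'], ['丁', '北京']]
```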

Summary by CodeRabbit

  • Bug Fixes
    • Ensured CSV imports use UTF-8 encoding by default, preventing character decoding errors and improving support for international characters across environments.

coderabbitai bot commented Oct 2, 2025

Walkthrough

Added an explicit UTF-8 encoding to the file open call in CSVLoader.load; other logic remains unchanged.

Changes

CSV Loader (graphrag_sdk/document_loaders/csv.py): Specify UTF-8 encoding in open(..., encoding='utf-8') within CSVLoader.load; no other logic changes.
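To show the pattern the change applies, here is a simplified sketch of a CSV loader that opens its file with an explicit encoding. SimpleCSVLoader, its encoding parameter, and the "column: value" output format are all hypothetical; the real CSVLoader in graphrag_sdk/document_loaders/csv.py may be structured differently, and the PR itself simply hard-codes 'utf-8' rather than exposing a parameter.

```python
import csv
import os
import tempfile
from typing import Iterator


class SimpleCSVLoader:
    """Hypothetical, simplified CSV loader; not the graphrag_sdk CSVLoader API."""

    def __init__(self, path: str, encoding: str = "utf-8") -> None:
        self.path = path
        # Explicit default so decoding no longer depends on the OS locale
        self.encoding = encoding

    def load(self) -> Iterator[str]:
        """Yield one text line per data row, formatted as 'column: value' pairs."""
        with open(self.path, newline="", encoding=self.encoding) as f:
            reader = csv.reader(f)
            header = next(reader, [])  # first row is treated as column names
            for row in reader:
                yield ", ".join(f"{col}: {val}" for col, val in zip(header, row))


# Usage with a hypothetical UTF-8 file containing Chinese text
sample = os.path.join(tempfile.mkdtemp(), "people.csv")
with open(sample, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([["姓名", "城市"], ["丁", "北京"]])

for doc in SimpleCSVLoader(sample).load():
    print(doc)  # 姓名: 丁, 城市: 北京
```

If a file was saved with a byte-order mark (as some spreadsheet tools do), reading it as 'utf-8' keeps a '\ufeff' at the start of the first header cell; 'utf-8-sig' strips it.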

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

I nibble bytes in tidy rows,
A CSV breeze where data flows.
With UTF-8 my whiskers twitch—
No garbled glyphs, no codec glitch!
Hop, parse, yield—so clean, so bright—
Carrots aligned, the text reads right.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check (✅ Passed): Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The title clearly summarizes the main purpose of the changeset by indicating that the pull request enables UTF-8 support for CSV processing, which aligns directly with the addition of the encoding='utf-8' parameter in CSVLoader.load. It is concise, specific, and free of extraneous details.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ce1c8c and ded68fe.

📒 Files selected for processing (1)
  • graphrag_sdk/document_loaders/csv.py (1 hunk)
🔇 Additional comments (1)
graphrag_sdk/document_loaders/csv.py (1)

Line 17: LGTM! Proper fix for Unicode support.

Adding an explicit UTF-8 encoding resolves the UnicodeDecodeError and makes behavior consistent across platforms. Without an encoding argument, open() falls back to the locale encoding, which on this Windows setup is cp1252 and cannot represent Chinese characters. UTF-8 is the right choice because it is backward-compatible with ASCII and can encode every Unicode character.
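A small sketch for checking these claims on a given machine. locale.getpreferredencoding(False) reports the locale encoding that open() falls back to when no encoding is passed (Python's UTF-8 mode can override it); the sample strings are illustrative only.

```python
import locale

# Encoding open() uses when none is given: locale-dependent, e.g. 'cp1252'
# on many Western-locale Windows setups, often 'UTF-8' on Linux/macOS
print(locale.getpreferredencoding(False))

# UTF-8 is backward-compatible with ASCII: ASCII text yields identical bytes
assert "name,city".encode("utf-8") == "name,city".encode("ascii")

# UTF-8 can encode CJK characters; cp1252 has no code points for them
text = "中文"
print(text.encode("utf-8"))  # b'\xe4\xb8\xad\xe6\x96\x87'
try:
    text.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 cannot encode:", exc.object[exc.start:exc.end])
```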


