Support UTF8 csv #134

mic-code · 2025-10-02T07:49:19Z

Traceback (most recent call last):
  File "C:\repository\Test\graphrag-test\createKG.py", line 24, in <module>
    rows = [row for row in reader]
                           ^^^^^^
  File "C:\Users\michael\AppData\Roaming\uv\python\cpython-3.12.9-windows-x86_64-none\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 43: character maps to <undefined>

Loading a CSV file that contains Chinese characters with CSVLoader raises the error above.
This PR fixes it.
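
For illustration, here is a minimal sketch of the kind of change involved, assuming the loader reads the file through `csv.reader`. The helper name `read_rows` and the sample file are hypothetical and are not the actual `CSVLoader` code.

```python
import csv
import os
import tempfile


def read_rows(path: str) -> list[list[str]]:
    """Hypothetical helper; not the actual CSVLoader.load implementation."""
    # Without encoding=..., open() normally falls back to the locale's
    # preferred encoding (cp1252 in the traceback above), which can fail
    # on UTF-8 encoded text.
    with open(path, newline="", encoding="utf-8") as f:
        return [row for row in csv.reader(f)]


# Write a small UTF-8 CSV containing Chinese text, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.csv")
    with open(sample_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows([["name", "city"], ["张伟", "北京"]])
    rows = read_rows(sample_path)
```

Passing the encoding explicitly makes the read independent of the machine's locale settings.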

Summary by CodeRabbit

Bug Fixes
- Ensured CSV imports use UTF-8 encoding by default, preventing character decoding errors and improving support for international characters across environments.

coderabbitai · 2025-10-02T07:49:30Z

Walkthrough

Added an explicit UTF-8 encoding to the file open call in CSVLoader.load; other logic remains unchanged.

Changes

Cohort / File(s)	Summary
CSV Loader `graphrag_sdk/document_loaders/csv.py`	Specify UTF-8 encoding in `open(..., encoding='utf-8')` within `CSVLoader.load`; no other logic changes.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

I nibble bytes in tidy rows,
A CSV breeze where data flows.
With UTF-8 my whiskers twitch—
No garbled glyphs, no codec glitch!
Hop, parse, yield—so clean, so bright—
Carrots aligned, the text reads right.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	You can run `@coderabbitai generate docstrings` to improve docstring coverage.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title clearly summarizes the main purpose of the changeset by indicating that the pull request enables UTF-8 support for CSV processing, which aligns directly with the addition of the encoding='utf-8' parameter in CSVLoader.load. It is concise, specific, and free of extraneous details.


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ce1c8c and ded68fe.

📒 Files selected for processing (1)

graphrag_sdk/document_loaders/csv.py (1 hunks)

🔇 Additional comments (1)

graphrag_sdk/document_loaders/csv.py (1)

17-17: LGTM! Proper fix for Unicode support.

Adding explicit UTF-8 encoding resolves the UnicodeDecodeError and ensures consistent behavior across platforms. Without it, Python uses the locale's default encoding, which on this Windows machine is cp1252 and cannot decode UTF-8 encoded Chinese characters. UTF-8 is the correct choice as it is backward-compatible with ASCII and supports all Unicode characters.
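
The decoding failure can be reproduced without any file at all. The sketch below is an illustration only: the character U+4E4F is chosen as an assumption because its UTF-8 encoding happens to contain the byte 0x8f reported in the traceback, and it is not taken from the author's CSV.

```python
# U+4E4F is a CJK character whose UTF-8 encoding is the bytes e4 b9 8f.
data = "\u4e4f".encode("utf-8")

# cp1252 has no mapping for byte 0x8f, so decoding raises UnicodeDecodeError.
try:
    data.decode("cp1252")
    cp1252_failed = False
    failing_byte = None
except UnicodeDecodeError as exc:
    cp1252_failed = True
    failing_byte = data[exc.start]  # the byte the codec could not map

# Decoding the same bytes as UTF-8 round-trips to the original character.
decoded = data.decode("utf-8")

# Pure ASCII text encodes to identical bytes in ASCII and UTF-8, so
# ASCII-only CSV files are unaffected by the change.
ascii_same = "id,name".encode("ascii") == "id,name".encode("utf-8")
```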


Support UTF8 csv

ded68fe
