- [x] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
Run python build_index.py on a Windows machine with German locale settings (which use cp1252 as the default code page).
The program fails with a UnicodeDecodeError, as shown in the logs below.
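To see which code page Python falls back to on a given machine, a small check like the one below can help. It is a sketch, assuming Python 3.11 as in the log: when no encoding is passed, text-mode subprocess pipes in practice use the same encoding that locale.getpreferredencoding(False) reports, which on the German Windows setup described here would be expected to be cp1252. The function name describe_default_encodings is hypothetical, chosen for illustration.

```python
import locale
import sys


def describe_default_encodings() -> dict[str, str]:
    """Collect the encodings Python uses when none is passed explicitly (hypothetical helper)."""
    # sys.stdout can be None (e.g. under pythonw), so fall back gracefully
    stdout_encoding = getattr(sys.stdout, "encoding", None) or "unknown"
    return {
        # Used in practice for text-mode files and subprocess pipes without encoding=...
        "preferred": locale.getpreferredencoding(False),
        # Encoding of this interpreter's own stdout (can differ from the above on Windows)
        "stdout": stdout_encoding,
        # "1" when Python's UTF-8 mode (PYTHONUTF8=1 or -X utf8) is active, else "0"
        "utf8_mode": str(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    for name, value in describe_default_encodings().items():
        print(f"{name}: {value}")
```

If "preferred" prints cp1252, any subprocess output containing bytes that cp1252 cannot map will fail to decode unless the caller passes an explicit encoding.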
The problem can easily be fixed by setting the code page to UTF-8 in the terminal/shell, e.g. in PowerShell:
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
[Console]::InputEncoding = [System.Text.Encoding]::UTF8
We should add this information to the docs.
I can create a PR for this if you consider the information useful (I do :-) ).
Any log messages given by the failure
Failed Build
(.venv) PS C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial> python build_index.py
Data directory 'C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial\data/product-info/' exists and contains 20 files.
Crack and chunk files from local path: C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial\data/product-info/
Start embedding using connection with id = ...
Start creating index from embeddings.
Successfully created index at C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial\tutorial-index-mlindex
Method indexes: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class Index: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Exception in thread Thread-19 (_readerthread):
Traceback (most recent call last):
  File "c:\Program Files\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "c:\Program Files\Python311\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "c:\Program Files\Python311\Lib\subprocess.py", line 1599, in _readerthread
    buffer.append(fh.read())
                  ^^^^^^^^^
  File "c:\Program Files\Python311\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 271: character maps to <undefined>
Uploading tutorial-index-mlindex (0.0 MBs): 100%|#####################################################################################################################################| 1296/1296 [00:00<00:00, 1996.11it/s]
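The last frames of the traceback show the subprocess reader thread decoding a child process's output with the cp1252 codec, and byte 0x81 has no mapping in cp1252. A minimal sketch of the same failure, assuming (this is not confirmed by the log) that the child wrote UTF-8: the UTF-8 encoding of "Á" (U+00C1) is b"\xc3\x81", which decodes fine as UTF-8 but fails under cp1252. The helper try_decode is hypothetical.

```python
def try_decode(data: bytes, encoding: str) -> str:
    """Decode data, returning the text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc}"


# Hypothetical child output: "Á" (U+00C1) encoded as UTF-8 is b"\xc3\x81"
sample = "\u00c1".encode("utf-8")

print(try_decode(sample, "utf-8"))   # Á
print(try_decode(sample, "cp1252"))  # UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1: character maps to <undefined>
```

Under cp1252 the first byte 0xC3 still decodes (as "Ã"); it is the continuation byte 0x81 that has no mapping, which matches the error reported in the log.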
Fix, e.g. for PowerShell:
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
[Console]::InputEncoding = [System.Text.Encoding]::UTF8
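The PowerShell setting works around the problem from the shell. For code that spawns subprocesses itself, the decoding can instead be pinned explicitly. This is a minimal sketch of that general pattern, not the code path used by build_index.py (the log does not show which library starts the failing subprocess): passing encoding="utf-8" to subprocess.run makes the pipe readers decode with UTF-8 regardless of the system code page. The child here is a throwaway Python one-liner, and the names CHILD_CODE and run_child_utf8 are hypothetical.

```python
import subprocess
import sys

# Hypothetical child process: writes "Á" (U+00C1) to stdout as raw UTF-8 bytes (b"\xc3\x81").
CHILD_CODE = (
    "import sys; "
    "sys.stdout.buffer.write('\\u00c1'.encode('utf-8')); "
    "sys.stdout.buffer.flush()"
)


def run_child_utf8() -> str:
    """Run the child and decode its output as UTF-8, independent of the locale code page."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,  # pipe stdout and stderr (read by helper threads on Windows)
        encoding="utf-8",     # explicit: do not fall back to the locale code page
        check=True,           # raise CalledProcessError if the child exits non-zero
    )
    return result.stdout


if __name__ == "__main__":
    print(run_child_utf8())
```

Without encoding="utf-8" (and with text=True), the same call on a cp1252 system would hit the 0x81 byte and fail in the reader thread, as in the traceback above.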
Expected/desired behavior
With the fix above, it runs fine, e.g.:
(.venv) PS C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial> python build_index.py
Data directory 'C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial\data/product-info/' exists and contains 20 files.
Crack and chunk files from local path: C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial\data/product-info/
Start embedding using connection with id = ...
Start creating index from embeddings.
Successfully created index at C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial\tutorial-index-mlindex
Method indexes: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class Index: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
OS and Version?
not OS specific
Versions
not version specific
Mention any other details that might be useful