Support encoding option for ftpfs #497

Open
@frafra

I am fetching data from a Windows FTP server whose directory listing contains non-ASCII characters. Opening the filesystem fails while the listing is being decoded:

Traceback (most recent call last):
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/errors.py", line 125, in new_func
    return func(*args, **kwargs)
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/opener/ftpfs.py", line 56, in open_fs
    return ftp_fs.opendir(dir_path, factory=ClosingSubFS)
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/base.py", line 1247, in opendir
    if not self.getinfo(path).is_dir:
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/ftpfs.py", line 682, in getinfo
    directory = self._read_dir(dir_name)
  File "/home/frafra/.cache/pypoetry/virtualenvs/pyfilesystem-sync-qQEmY_5I-py3.8/lib/python3.8/site-packages/fs/ftpfs.py", line 559, in _read_dir
    self.ftp.retrlines(
  File "/usr/lib64/python3.8/ftplib.py", line 461, in retrlines
    line = fp.readline(self.maxline + 1)
  File "/usr/lib64/python3.8/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 49: invalid continuation byte

ipdb session:

ipdb> data
b'10-01-2021  11:00PM       <DIR>          Bilder V\xe4stra G\xf6taland\r\n10-06-2021  10:03AM       <DIR>          SeNorge\r\n'
ipdb> data.decode('utf8')
*** UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe4 in position 49: invalid continuation byte
ipdb> data.decode('windows-1252')
'10-01-2021  11:00PM       <DIR>          Bilder Västra Götaland\r\n10-06-2021  10:03AM       <DIR>          SeNorge\r\n'

Python's built-in ftplib can use a different encoding (the "encoding" argument of the constructor was added in Python 3.9): https://docs.python.org/3/library/ftplib.html#ftplib.FTP

class ftplib.FTP(host='', user='', passwd='', acct='', timeout=None, source_address=None, *, encoding='utf-8')
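As a rough illustration of what that parameter does, here is a minimal sketch that builds an unconnected ftplib client with a non-default encoding. The helper name make_ftp is hypothetical and not part of ftplib or pyfilesystem2. The fallback branch relies on the "encoding" attribute that older Python versions expose on FTP objects instead of the keyword argument.

```python
import sys
from ftplib import FTP


def make_ftp(encoding="utf-8"):
    """Return an unconnected FTP client that decodes server text with `encoding`.

    Hypothetical helper for illustration only.
    """
    if sys.version_info >= (3, 9):
        # The keyword-only `encoding` argument exists from Python 3.9 onwards.
        return FTP(encoding=encoding)
    # Older versions: override the `encoding` attribute before connecting.
    ftp = FTP()
    ftp.encoding = encoding
    return ftp


# No host is given, so no network connection is attempted here.
ftp = make_ftp("windows-1252")
print(ftp.encoding)
```

With a client configured this way, listings such as the one in the ipdb session above should decode without raising UnicodeDecodeError, assuming the server really sends windows-1252.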

ftpfs does not take "encoding" as a parameter, neither in the opener nor in the FTPFS constructor:

ftp_fs = FTPFS(
    ftp_host,
    port=ftp_port,
    user=parse_result.username,
    passwd=parse_result.password,
    proxy=parse_result.params.get("proxy"),
    timeout=int(parse_result.params.get("timeout", "10")),
    tls=bool(parse_result.protocol == "ftps"),
)

pyfilesystem2/fs/ftpfs.py

Lines 399 to 409 in baa0560

def __init__(
    self,
    host,  # type: Text
    user="anonymous",  # type: Text
    passwd="",  # type: Text
    acct="",  # type: Text
    timeout=10,  # type: int
    port=21,  # type: int
    proxy=None,  # type: Optional[Text]
    tls=False,  # type: bool
):

I propose accepting "encoding" as an optional parameter, which would then be passed to the FTP constructor.

It would then be possible to connect to resources with a URL like: ftp://user:password@ftpserver/path?encoding=windows-1252
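To make the URL idea concrete, here is a minimal sketch of how an "encoding" query parameter could be read from such a URL and validated before being handed on. It uses only the standard library. The function name encoding_from_url and the "utf-8" default are assumptions for illustration, not existing pyfilesystem2 API. In the real opener the value would presumably come from parse_result.params, like "proxy" and "timeout" do.

```python
import codecs
from urllib.parse import parse_qs, urlsplit


def encoding_from_url(url, default="utf-8"):
    """Return the `encoding` query parameter of an FTP URL, or `default`.

    Hypothetical helper for illustration only.
    """
    query = parse_qs(urlsplit(url).query)
    # parse_qs maps each key to a list of values; take the first one.
    name = query.get("encoding", [default])[0]
    # Fail early with LookupError if Python does not know this codec.
    codecs.lookup(name)
    return name


url = "ftp://user:password@ftpserver/path?encoding=windows-1252"
print(encoding_from_url(url))
```

Validating the codec name up front means a typo in the URL would surface when the filesystem is opened rather than later, in the middle of a directory listing.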
