Hello,
I understand that this is potentially out of scope for the project, but considering the existence of OfficeFile.is_encrypted(), I feel this would tie its usage up nicely.
I'll explain a use case via example:
I am using this to load a set of usually-encrypted Excel files into pandas. This works great, except that a handful of these Excel files have, seemingly at random, not been password protected. I don't actually care whether or not they have a password; I just want to put them all into dataframes.
Right now, the argument I pass to pandas.read_excel() is either a non-protected Excel file's Path, or a BytesIO object retrieved using this library.
This is fine but it has resulted in this messy function:
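    import io
    from pathlib import Path
    from typing import Optional, Union

    import msoffcrypto
    import pandas as pd

    def decrypt_office_file(file: Path, password: Optional[str] = None) -> Union[io.BytesIO, Path]:
        decrypted_file = io.BytesIO()
        with open(file, 'rb') as f:
            office_file = msoffcrypto.OfficeFile(f)
            if office_file.is_encrypted():
                # Encrypted: decrypt into the in-memory buffer
                office_file.load_key(password=password)
                office_file.decrypt(decrypted_file)
            else:
                # Not encrypted: hand back the original path instead
                decrypted_file = file
        return decrypted_file

    excel_file = decrypt_office_file(Path("my_file.xlsx"))
    df = pd.read_excel(excel_file, ...)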
And then I just have to hope everything downstream is cool with taking either a BytesIO or a str/Path, which is okay for pandas but I imagine is less okay for other libraries/use cases.
I'm not sure how it would be best to insert the functionality, but something like OfficeFile.to_bytes() (I'm sure there are better ideas for function names available) would be great, then we can have consistent return types.
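To make the request a bit more concrete, here is a rough sketch of the behaviour I have in mind, written as a standalone helper rather than a real method. The name to_bytes and its parameters are hypothetical and not part of msoffcrypto; the helper only relies on is_encrypted(), load_key() and decrypt() as used in my function above, and it assumes the caller still has the underlying file handle open.

```python
import io
from typing import BinaryIO, Optional


def to_bytes(office_file, raw: BinaryIO, password: Optional[str] = None) -> io.BytesIO:
    """Hypothetical helper: always return a BytesIO, encrypted or not.

    office_file: an object exposing is_encrypted(), load_key() and decrypt(),
                 e.g. msoffcrypto.OfficeFile(raw)
    raw:         the open binary handle that office_file was built from
    """
    buffer = io.BytesIO()
    if office_file.is_encrypted():
        # Encrypted: decrypt into the in-memory buffer
        office_file.load_key(password=password)
        office_file.decrypt(buffer)
    else:
        # Not encrypted: copy the original bytes through unchanged
        raw.seek(0)
        buffer.write(raw.read())
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer


# Intended usage (assumes msoffcrypto and pandas are installed):
# with open("my_file.xlsx", "rb") as f:
#     data = to_bytes(msoffcrypto.OfficeFile(f), f, password="...")
# df = pd.read_excel(data)
```

With something like this, the caller always gets the same type back and never needs to branch on whether the file was protected.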
I also find it really odd that .decrypt() takes the object you want to inject the file into as an argument, rather than returning a BytesIO object? It makes following the code flow feel awkward to me, but that's an issue for another day!