Skip to content

Add checks for unsupported characters within image path and output path#4390

Open
Pfleiderer-Adrian wants to merge 1 commit intoInsightSoftwareConsortium:mainfrom
Pfleiderer-Adrian:master
Open

Add checks for unsupported characters within image path and output path#4390
Pfleiderer-Adrian wants to merge 1 commit intoInsightSoftwareConsortium:mainfrom
Pfleiderer-Adrian:master

Conversation

@Pfleiderer-Adrian
Copy link

Description

I encountered an issue while attempting to load images using the ITK / SimpleITK framework in my Python script. The traceback indicates that the problem arises when attempting to create an IO object for reading an image file that contains Umlauts (ä, ü, ö) in its path. See: #4388

Providing a more descriptive error message indicating that the path cannot be read due to special characters, such as Umlauts, would be helpful for users to understand and address the issue.

Changes

I have added a simple filepath check for illegal characters in the _NewImageReader and imwrite routines. Additionally, for the imwrite routine, a check was added to verify if the output directory exists.

PR Checklist

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for contributing a pull request! 🙏

Welcome to the ITK community! 🤗👋☀️

We are glad you are here and appreciate your contribution. Please keep in mind our community participation guidelines. 📜
More support and guidance on the contribution process can be found in our contributing guide. 📖

This is an automatic message. Allow for time for the ITK community to be able to read the pull request and comment
on it.

Copy link
Member

@dzenanz dzenanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this check in makes sense via Python interface. Mostly looks good.

@dzenanz dzenanz requested review from blowekamp and thewtex January 10, 2024 16:04
@blowekamp
Copy link
Member

There may be something in the SWIG layer that can be down with the conversion. There is some information here:
https://www.swig.org/Doc3.0/Python.html#Python_nn77

That needs to be gone through.

@jhlegarreta
Copy link
Member

jhlegarreta commented Jan 10, 2024

@Pfleiderer-Adrian thanks for having a look at this. A few comments:

@dzenanz
Copy link
Member

dzenanz commented Jan 10, 2024

The second commit should be squashed into the first one.

Copy link
Member

@blowekamp blowekamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple suggestions.

try:
filename.encode('ascii')
except UnicodeEncodeError:
msg += "\nThe output path contains not supported special characters. \n" + f"Filename = {filename}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be useful to place this check into a function.

@Pfleiderer-Adrian
Copy link
Author

Thank you for your feedback! I'll post an update this weekend. :)

@N-Dekker
Copy link
Contributor

Thank you for diving into this issue, @Pfleiderer-Adrian Do I understand correctly that this issue (#4388) is only a problem for Windows users, not for Linux users? (Specifically, only those Windows users that do not have UTF-8 support enabled and selected a locale that does not support the filename?) I'm asking, because it feels too strict to only accept ASCII characters from now, for all users.

So is it really the intention to reject any non-ASCII character in file names, for any user? Aren't we then throwing the baby out with the bath water?

Would it be an idea to only do those checks after trying to use imageIO, and only when imageIO has really failed to do its job?

- In `imwrite` function, add validation to ensure the output directory exists,
  raising an error if not.
- Check for unsupported special characters in file paths, raising an error if
  present.
- Apply similar checks to `itkTemplate` to validate input file paths for existence and encoding.
Copy link
Member

@hjmjohnson hjmjohnson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be addressed before this is finalized.

#4390 (comment)

@N-Dekker
Copy link
Contributor

Nowadays, UTF-8 support appears possible on any supported Windows machine, using a manifest file (XML) and a SetConsoleOutputCP(CP_UTF8) function call, as implemented for elastix by Bernhard Fröhler (@codeling): SuperElastix/elastix#1401 To be considered for ITK as well...?

Copy link
Member

@dzenanz dzenanz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes the Python wrapping behave differently from C++.


# Check if output path exists
msg = ""
if not os.path.isdir(os.path.dirname(filename)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will most likely interfere with Slicer's MRML ID ImageIO. Scratch that - Slicer uses C++ version of ITK for "writing" temporary files.

msg = ""
if not os.path.isfile(inputFileName):
msg += "\nThe file doesn't exist. \n" + f"Filename = {inputFileName}"
# Check if image path contains not supported special characters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not supported -> unsupported

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:Python wrapping Python bindings for a class

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants