Modify is_remote_filesystem to return True for FUSE-mounted paths #5885

Closed
maddiedawson wants to merge 3 commits from the dbfs branch

maddiedawson
Contributor

No description provided.

@maddiedawson maddiedawson changed the title [WIP] Modify is_local_filesystem to return True for FUSE-mounted paths [WIP] Modify is_remote_filesystem to return True for FUSE-mounted paths May 23, 2023
@maddiedawson maddiedawson changed the title [WIP] Modify is_remote_filesystem to return True for FUSE-mounted paths Modify is_remote_filesystem to return True for FUSE-mounted paths May 23, 2023
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@maddiedawson maddiedawson force-pushed the dbfs branch 3 times, most recently from 4a4d3d3 to 8fd672b Compare May 23, 2023 16:17
Review comment on these diff lines:

```python
return False
if isinstance(fs, FuseFileSystem) or fs.protocol != "file":
```
Contributor

Is it possible for a FUSE-mounted file system to not be a remote file system?

Contributor Author

Discussed offline, but I'm not sure anyone would use FUSE to mount a local file system.

@maddiedawson
Contributor Author

@lhoestq would you or another maintainer be able to review please? :)

@lhoestq
Member

lhoestq commented May 24, 2023

Why do you need to support FUSE-mounted paths?

datasets uses data that lives on disk for fast lookups; FUSE-mounted disks would lead to poor performance, and I wouldn't recommend using them.

@maddiedawson
Contributor Author

FUSE is commonly used to mount remote file systems (e.g. S3, DBFS) as local directories. Since access through such a mount is slower than an actual local device, it's better to treat these paths as remote to reduce latency.
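A minimal sketch of how a FUSE-mounted path could be detected, assuming Linux, where the kernel lists every mount in /proc/mounts together with its file system type, and FUSE mounts report types such as `fuse` or `fuse.<helper>` (e.g. `fuse.s3fs`). The helpers `parse_mounts`, `mount_fs_type` and `is_fuse_path` are hypothetical and not part of `datasets` or fsspec; they take the mount table as text so they can be exercised without a real mount, and the `/mnt/s3` mount point is illustrative. One caveat relevant to the review question above: `fuseblk` is commonly used for local block devices (for example NTFS via ntfs-3g), so a FUSE mount is not automatically remote.

```python
from __future__ import annotations

import posixpath
import re

# /proc/mounts escapes spaces, tabs, newlines and backslashes as 3-digit octal, e.g. "\040".
_OCTAL_ESCAPE = re.compile(r"\\([0-7]{3})")


def _unescape(field: str) -> str:
    """Decode octal escapes such as "\\040" back into the original characters."""
    return _OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), field)


def parse_mounts(text: str) -> list[tuple[str, str]]:
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 3:
            entries.append((_unescape(fields[1]), fields[2]))
    return entries


def mount_fs_type(path: str, entries: list[tuple[str, str]]) -> str | None:
    """Return the fs type of the most specific mount point containing `path`.

    `path` is assumed to be absolute; later entries win ties, because a
    mount stacked on the same directory hides the earlier one.
    """
    path = posixpath.normpath(path)
    best, best_type = "", None
    for mount_point, fs_type in entries:
        inside = (
            mount_point == "/"
            or path == mount_point
            or path.startswith(mount_point.rstrip("/") + "/")
        )
        if inside and len(mount_point) >= len(best):
            best, best_type = mount_point, fs_type
    return best_type


def is_fuse_path(path: str, mounts_text: str) -> bool:
    """True if `path` lives on a FUSE mount ("fuse", "fuseblk" or "fuse.<helper>")."""
    fs_type = mount_fs_type(path, parse_mounts(mounts_text))
    if fs_type is None:
        return False
    return fs_type in ("fuse", "fuseblk") or fs_type.startswith("fuse.")


SAMPLE_MOUNTS = """\
/dev/sda1 / ext4 rw,relatime 0 0
s3fs /mnt/s3 fuse.s3fs rw,nosuid,nodev 0 0
"""
print(is_fuse_path("/mnt/s3/data/train.arrow", SAMPLE_MOUNTS))  # True
print(is_fuse_path("/mnt/s3x/data", SAMPLE_MOUNTS))  # False: shares only a name prefix
```

On a real Linux host one would pass the contents of /proc/mounts and an absolute path resolved with `os.path.realpath`; the Linux-only mount table and the longest-prefix matching are both assumptions of this sketch.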

@maddiedawson maddiedawson requested a review from es94129 May 24, 2023 20:09
@lhoestq
Member

lhoestq commented May 25, 2023

I think people would be confused if the same dataset behaved differently depending on the disk type.

If they want to use a remote bucket, they should use the remote URI instead (e.g. s3://...). Progress on this is tracked in #5281.
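This suggestion can be illustrated with a simplified, hypothetical check that mirrors the `fs.protocol != "file"` condition from the diff: a path reached through a FUSE mount point carries no protocol and therefore counts as local, while the bucket's own URI is recognized as remote. `guess_protocol` and `is_remote_urlpath` are illustrative stand-ins, not fsspec's actual protocol inference (which handles many more cases), and the bucket name and `/mnt/s3` mount point are made up.

```python
from __future__ import annotations


def guess_protocol(urlpath: str) -> tuple[str, str]:
    """Split "proto://rest" into ("proto", "rest"); plain paths default to "file"."""
    if "://" in urlpath:
        protocol, path = urlpath.split("://", 1)
        return protocol, path
    return "file", urlpath


def is_remote_urlpath(urlpath: str) -> bool:
    """Mirror the protocol-based rule: anything not served by "file" is remote."""
    protocol, _ = guess_protocol(urlpath)
    return protocol != "file"


for urlpath in (
    "s3://my-bucket/data/train.parquet",  # remote URI: True
    "/mnt/s3/data/train.parquet",  # same data via a hypothetical FUSE mount: False
    "file:///tmp/train.parquet",  # explicit local URI: False
):
    print(urlpath, is_remote_urlpath(urlpath))
```

The design point this shows is that the decision depends on how the path is written rather than on the disk it resolves to, which keeps dataset behavior the same regardless of disk type.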

4 participants