-
@Yohahaha Thanks for opening this issue. The FileSystem API needs an overhaul. There has been a lot of confusion, and a number of bugs, around the getFileSystem API and its optional config. We need a cleaner API that fixes this and also allows per-session config.
-
In Velox, we have the following filesystems:
The current FileSystem APIs are
In the current implementation, it is up to each filesystem implementation to create and cache its instances. GCS, ABFS, and LocalFS each cache a single instance, though this can change in the future. S3 and HDFS cache multiple instances. I propose the following changes, inspired by PrestoFileSystemCache:
@Yohahaha The proposed API will allow you to create a file system per session if desired. You can define a new factory with a custom key generator.
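To make the "factory with a custom key generator" idea concrete, here is a minimal sketch. It is not the Velox API: `Config`, `FileSystem`, `CachingFileSystemFactory`, `sessionKey`, and the `session.id` config key are all hypothetical stand-ins chosen for illustration. The factory caches one instance per key, and the key generator decides how instances are shared.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for illustration; these are not the Velox types.
using Config = std::map<std::string, std::string>;

class FileSystem {
 public:
  explicit FileSystem(Config config) : config_(std::move(config)) {}
  const Config& config() const { return config_; }

 private:
  Config config_;
};

// Hypothetical factory that caches FileSystem instances under a key
// computed by a user-supplied key generator.
class CachingFileSystemFactory {
 public:
  using KeyGenerator =
      std::function<std::string(const std::string& path, const Config&)>;

  explicit CachingFileSystemFactory(KeyGenerator keyGenerator)
      : keyGenerator_(std::move(keyGenerator)) {}

  // Returns the cached instance for the generated key, creating it on a miss.
  std::shared_ptr<FileSystem> get(const std::string& path, const Config& config) {
    const std::string key = keyGenerator_(path, config);
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = cache_.find(key);
    if (it != cache_.end()) {
      return it->second;
    }
    auto fs = std::make_shared<FileSystem>(config);
    cache_.emplace(key, fs);
    return fs;
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return cache_.size();
  }

 private:
  KeyGenerator keyGenerator_;
  mutable std::mutex mutex_;
  std::unordered_map<std::string, std::shared_ptr<FileSystem>> cache_;
};

// Example key generator: one instance per session, ignoring the path.
// "session.id" is an assumed config key, not a real Velox property.
std::string sessionKey(const std::string& /*path*/, const Config& config) {
  auto it = config.find("session.id");
  return it == config.end() ? std::string("default") : it->second;
}
```

With `sessionKey`, two requests from the same session share an instance and a different session gets its own. A key generator that returns a constant would reproduce the single-instance behavior.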
-
@majetideepak Deepak, thank you for putting together a proposal. Overall it looks good. I have a question about the refresh API.
The FileSystem object is shared, so it seems a bit strange that a single user can modify it by calling refresh. Won't this surprise other users? Also, having to pass the old config seems inconvenient. Is it needed? Can we remove this parameter?
-
Description
FileSystem accepts a config and is created as a singleton on the first call to FileHandleGenerator. FileHandleGenerator is held by HiveConnector, which initializes it in its constructor.
HiveConnector accepts static configs in its constructor and, since PR #7659, also accepts dynamic (session) configs at runtime. So the question is: which configs should we pass to FileSystem? If FileSystem stays a singleton, it needs to accept HiveConnector's static config for the necessary initialization. If FileSystem becomes a per-session instance, we may need to move FileHandleGenerator into QueryCtx and use the session config to initialize it.
I prefer making FileSystem a per-session instance. That makes it possible to change a derived class's behavior at runtime, e.g. accessing S3 with a different IAM identity or applying an optimized config without a restart.
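One way to picture what a per-session FileSystem would be initialized with is an overlay of the session config on top of the connector's static config. The sketch below is only an illustration under that assumption: `Config`, `effectiveConfig`, and the key names in the usage note are hypothetical and not taken from Velox.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a config object; not the Velox Config class.
using Config = std::map<std::string, std::string>;

// Builds the config a per-session FileSystem would see: session values
// take precedence, and static connector values fill in everything else.
Config effectiveConfig(const Config& connectorConfig, const Config& sessionConfig) {
  Config merged = sessionConfig;
  // std::map::insert leaves existing keys untouched, so session values win.
  merged.insert(connectorConfig.begin(), connectorConfig.end());
  return merged;
}
```

For example, if the static config sets an assumed key `s3.iam-role` to `roleA` and the session sets it to `roleB`, the merged config carries `roleB` while keeping every other static value.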
cc @mbasmanova @majetideepak @zhli1142015