[FEATURE REQUEST] Add Support for S3A prefix #214
Comments
Do other DML commands (e.g., insert, delete) work? Can you share the stack trace of the error? |
Can you remove this config and try again?
|
Yes, the other DML commands work as expected, and I also removed the config above, but it still results in an error. This is the code I ran
|
@TheerachotTle I think the issue is this config:
If you refer to the quickstart guide, it gives an example of Spark configs that can be used to connect to an Iceberg REST catalog. Having said that, I think |
if this is specific to |
I removed the config and it still doesn't work. From my understanding, the remove_orphan_files operation lists files to determine which ones should be removed, and the Spark procedure uses Hadoop FS to perform the listing.
I have tried this procedure with other Iceberg catalogs, and it has the same problem when using the s3:// prefix. I'm not sure if the title should be changed to be about this procedure? |
Yup, I'm guessing the failure is triggered because the procedure uses the Spark Hadoop FS, while other DML commands use the FileIO from the Iceberg catalog. It's more likely a config issue than a bug, but I need to take a closer look. Would you share a way to reproduce it? For example, the Spark version and config, and the command used to call the procedure. |
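To make the diagnosis above concrete, here is a toy Python model of how a Hadoop-style filesystem lookup resolves a URI scheme. It is a simplified sketch, not Hadoop's actual code: the function name `resolve_filesystem` and the `BUILTIN` table are hypothetical. It assumes the lookup consults a `fs.<scheme>.impl` configuration key first, then falls back to the filesystems registered on the classpath, and fails with the error from this issue when neither knows the scheme.

```python
from urllib.parse import urlparse

# Hypothetical stand-in for filesystems registered on the classpath.
# Assumption: "s3a" is registered (via hadoop-aws) but plain "s3" is not.
BUILTIN = {
    "s3a": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "file": "org.apache.hadoop.fs.LocalFileSystem",
}


def resolve_filesystem(location, conf):
    """Return the filesystem class name that would handle `location`."""
    scheme = urlparse(location).scheme
    # An explicit fs.<scheme>.impl setting wins over the registered defaults.
    impl = conf.get(f"fs.{scheme}.impl") or BUILTIN.get(scheme)
    if impl is None:
        raise LookupError(f'No FileSystem for scheme "{scheme}"')
    return impl
```

In this model, a table location under s3:// fails the lookup with an empty configuration, while the same location resolves once `fs.s3.impl` is set. DML commands that go through the catalog's FileIO never reach this lookup, which would explain why only the procedure fails.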
I'm using Spark 3.5.0
config of spark
code to reproduce
|
This is an Iceberg issue instead of a Polaris one. To summarize, DML commands and procedures usually use
|
Here is another old thread on Iceberg slack about this issue
Since listPrefix is now available, maybe we can update the procedure to use FileIO. I will create an issue in Iceberg. |
Oh great! There is already a PR for this. |
Thanks @anuragmantri for chiming in. It'd be ideal to use Iceberg We will still need a workaround for now though, as the Iceberg change and release will take a while. You can customize your Iceberg lib of course, but not every user is able to do that. @dennishuo mentioned a workaround here. It doesn't work for me locally, but it's worth a try. cc @TheerachotTle
|
How about replacing |
Polaris doesn't allow me to create a catalog with this prefix.
With this config, I can use remove_orphan_files without any error. |
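For readers who land here with the same error, a commonly used workaround is to map the s3 scheme to the S3A filesystem in the Spark Hadoop configuration. The sketch below only builds that setting as a plain dictionary. It assumes hadoop-aws is on the classpath, and it is not necessarily the exact config referred to in the comment above. The helper name `s3_scheme_workaround` is hypothetical.

```python
def s3_scheme_workaround(extra=None):
    """Build Spark settings that route s3:// paths to the S3A filesystem.

    Assumption: org.apache.hadoop.fs.s3a.S3AFileSystem (from hadoop-aws)
    is available to the Spark driver and executors.
    """
    conf = {
        "spark.hadoop.fs.s3.impl": "org.apache.hadoop.fs.s3a.S3AFileSystem",
    }
    # Let callers add their own settings (credentials, endpoint, and so on).
    conf.update(extra or {})
    return conf


if __name__ == "__main__":
    # Print the settings in the form accepted by spark-submit / spark-sql.
    for key, value in s3_scheme_workaround().items():
        print(f"--conf {key}={value}")
```

Each pair can also be applied in code with `SparkSession.builder.config(key, value)` before the session is created. The catalog location keeps the s3:// prefix that Polaris accepts; only the Hadoop-side listing is redirected to S3A.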
Let's document the workaround until it is fixed on the Iceberg side. Ideally, it should be documented on the Iceberg side as well. |
Is your feature request related to a problem? Please describe.
I have set the allowed location of the created catalog to S3 storage type using the s3:// prefix. When I run the remove_orphan_files procedure in Spark, it results in an error message: No FileSystem for scheme "s3". To solve this problem, I attempted to create the catalog with the s3a:// prefix, but I received a 400 Bad Request error with the message: Location prefix not allowed.
Here's my spark configuration
Describe the solution you'd like
Probably add the s3a:// prefix as an alternative for the S3 storage type.
Describe alternatives you've considered
No response
Additional context
No response