
Require explicit user-consent to enable HadoopFileIO #1532


Open
snazy wants to merge 8 commits into main from explicit-hadoop-fio

Conversation

Member
@snazy snazy commented May 6, 2025

Using HadoopFileIO in Polaris can enable "hidden features" that users are likely not aware of. This change requires users to manually update the configuration to be able to use HadoopFileIO, in a way that highlights the consequences of enabling it.

This PR updates Polaris in multiple ways:

  • The default of SUPPORTED_CATALOG_STORAGE_TYPES is changed to not include the FILE storage type.
  • Respect the ALLOW_SPECIFYING_FILE_IO_IMPL configuration on namespaces, tables and views to prevent setting an io-impl value for anything but one of the configured, supported storage-types.
  • Unify validation code in a new class IcebergPropertiesValidation.
  • Using FILE or HadoopFileIO now also requires the explicit configuration ALLOW_INSECURE_STORAGE_TYPES_ACCEPTING_SECURITY_RISKS=true.
  • Added production readiness checks that trigger when ALLOW_INSECURE_STORAGE_TYPES_ACCEPTING_SECURITY_RISKS is true or SUPPORTED_CATALOG_STORAGE_TYPES contains FILE (defaults and per-realm).
  • The two new readiness checks are considered severe. Severe readiness errors prevent the server from starting up unless the user explicitly configures polaris.readiness.ignore-security-issues=true.

Log messages and configuration options explicitly use "clear" phrases highlighting the consequences.

With these changes it is intentionally extremely difficult to start Polaris with HadoopFileIO. People who work around all these safety nets must have realized what they are doing.

A lot of the test code relies on FILE/HadoopFileIO; those tests received the necessary configuration overrides so they continue to work as they are, bypassing the added security safeguards.
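For illustration, this is roughly the configuration an operator would now have to set to keep using FILE/HadoopFileIO under these safeguards (a sketch assembled from the option names above, written in the application.properties style used elsewhere in this PR; it is not a copy of the PR's test setup):

polaris.features.defaults."SUPPORTED_CATALOG_STORAGE_TYPES"=["FILE","S3"]
polaris.features.defaults."ALLOW_INSECURE_STORAGE_TYPES_ACCEPTING_SECURITY_RISKS"=true
# Needed because the new readiness checks classify the two settings above as severe security issues:
polaris.readiness.ignore-security-issues=true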

"polaris.features.defaults.\"INITIALIZE_DEFAULT_CATALOG_FILEIO_FOR_TEST\"",
"true",
"polaris.features.defaults.\"SUPPORTED_CATALOG_STORAGE_TYPES\"",
"[\"FILE\"]");
"[\"FILE\",\"S3\"]",
"polaris.readiness.ignore-security-issues",
Contributor

WDYT about putting this into a global quarkus test config profile?

Member Author

Yea - but I guess all Quarkus profiles should either be special per-test or unified/composable or moved to a central place.

Contributor
@dimas-b dimas-b May 6, 2025

composition works by default... one can have application.properties and profile classes at the same time and their properties are merged, right?

Member Author

Yea - but let's defer to a separate PR to clean all those up.

@snazy snazy force-pushed the explicit-hadoop-fio branch from 0a748ba to 3f65378 Compare May 6, 2025 13:42
@snazy snazy added the bug (Something isn't working) and 1.0-blocker labels on May 6, 2025
@eric-maynard eric-maynard removed the bug (Something isn't working) label on May 6, 2025
@eric-maynard
Contributor

This is not a bug.

Contributor
@eric-maynard eric-maynard left a comment

There's a lot going on with this PR and I think it might be good to separate out some of these changes. For example, just removing FILE as a default supported storage type would be valuable.

.defaultValue(false)
.buildFeatureConfiguration();

public static final FeatureConfiguration<Boolean>
Contributor

I don't see how this isn't completely redundant with SUPPORTED_CATALOG_STORAGE_TYPES

Member Author

It's there to make the fact mentioned in the description explicit.

Contributor

If you put FILE in SUPPORTED_CATALOG_STORAGE_TYPES that seems quite explicit no?

Contributor

@eric-maynard You are correct in that this flag is redundant. However, I believe the intent here is to make the consequences 100% explicit.

I can see how some newcomers may overlook the SUPPORTED_CATALOG_STORAGE_TYPES=[..., "FILE"] during a configuration review and not realize the consequences it could have. Whereas an environment variable that contains "insecure" and "security risks" will definitely raise eyebrows.

Think about deleting a GitHub repository: you have to go to the Danger Zone, click "Delete", confirm, then confirm a second time, then type the repository name and confirm a third time. Here, given the consequences, I think it is worth having a double confirmation of the FILE storage type.

Contributor

I have a hard time believing we can't break this change into increments and evaluate the necessity of its individual components. We're going from this being in the default config for multiple releases to requiring multiple flags to enable it.

Contributor

AFAICT, because HadoopFileIO is considered insecure, we want to make it hard for users to do the wrong thing. Hence the addition of multiple flags. With the current way, it is trivial to accidentally overlook the activation of HadoopFileIO.

There could be other proposals for this implementation. But then they will be evaluated against this one goal: make it hard for users to do the wrong thing.

Contributor

I think you are (and maybe this PR is) conflating two things -- FILE storage type and HadoopFileIO. They are not always the same.

SUPPORTED_CATALOG_STORAGE_TYPES is probably sufficient to disallow the FILE storage type. FILE shouldn't be in the default, but I think it's too great a leap to go from having it in the default to requiring multiple flags to enable it.

For disallowing HadoopFileIO, we need a different approach than what's done in this PR.

Member Author

Concrete suggestions for improvements are always welcome.

Member Author

However, this PR addresses an issue that should really be mitigated quickly.

@@ -55,26 +62,53 @@ public class ProductionReadinessChecks {
*/
private static final String WARNING_SIGN_UTF_8 = "\u0000\u26A0\uFE0F";

private static final String SEVERE_SIGN_UTF_8 = "\u0000\uD83D\uDED1";
Contributor

These changes seem pretty unnecessary

Member Author

I do think that there are nuances in what's a warning and what's severe.

Contributor

That may be so, but I don't think this change is necessary to "Require explicit user-consent to enable HadoopFileIO".

Contributor

@eric-maynard I am not sure what the ask is here. Would you rather have the UTF-8 icon and/or the log message implemented in a separate PR? Or do you mean that the PR title should be a bit more general?

Contributor

Sure, I think a separate PR for these changes could be a good idea. If this PR really is a bugfix, let's not couple unnecessary changes with it.

Contributor

Let's be mindful of the overhead that overly fine-grained PRs would cause, though...

The changes in this PR are cohesive with one another, and the PR is quite small, so it makes sense to submit the changes in a single PR. Splitting it into many tiny PRs would prevent reviewers from understanding the big picture and would not have any functional merit.

Member

As I think the changes are "related", I think it's fair to do the changes in one PR (even if it's just "display/rendering" messages).

Contributor
@eric-maynard eric-maynard May 13, 2025

It's definitely fair, it's only a question of necessity. If there really is an urgent 1.0-blocking bugfix to be made, must we include these changes here? Are these logging changes really 1.0-blocking?

Comment on lines +72 to +74
&& !configStore
.getConfiguration(ctx, SUPPORTED_CATALOG_STORAGE_TYPES)
.contains(storageType.name())) {
Contributor

Storage types and File IO implementations are not necessarily the same thing; you could have any number of FileIO implementations that can talk to S3

Member Author

Yes? That's what fromFileIoImplementation() figures out, no?

Contributor

For the hard-coded FileIO implementations (assuming the implementation you have in mind is actually the one on the classpath), yes. For other implementations, no.

Contributor

@eric-maynard could you provide an example of what you mean?

Contributor

There's one in the repo here. It uses Azure, but it's not represented in StorageTypeFileIO. How is that meant to behave?

&& !configStore.getConfiguration(
ctx, ALLOW_INSECURE_STORAGE_TYPES_ACCEPTING_SECURITY_RISKS)) {
throw new ValidationException(
"File IO implementation '%s' (storage type '%s') is considered insecure and must not be used",
Contributor

What makes a custom File IO insecure?

Member Author

What makes it secure?

Contributor

Nothing. That's why there's nothing in our current documentation or code that promises that it is secure.

@snazy
Member Author

snazy commented May 7, 2025

This is not a bug.

It is. And it is very severe.

@snazy snazy added the bug (Something isn't working) label on May 7, 2025
@eric-maynard
Contributor

How exactly is this a bug?

@flyrain
Contributor

flyrain commented May 7, 2025

  S3("org.apache.iceberg.aws.s3.S3FileIO", true),
  GCS("org.apache.iceberg.gcp.gcs.GCSFileIO", true),
  AZURE("org.apache.iceberg.azure.adlsv2.ADLSFileIO", true),
  FILE("org.apache.iceberg.hadoop.HadoopFileIO", false),

Thanks for bringing this up. It's a good point to clarify whether HadoopFileIO is suitable for production. However, there is no guarantee that any of the above strings ensures a secure FileIO; they can easily be injected by any customized build, or even by completely different implementations with the same name. I believe it is ultimately the Polaris admin's responsibility to make sure their Polaris deployment is as secure as possible. We (the Polaris developers/community) can certainly provide suggestions to them. With that, I'm +1 on well-documented behavior for HadoopFileIO, but I don't think these checks and new configurations are necessary. SUPPORTED_CATALOG_STORAGE_TYPES should be good enough to let the Polaris admin make the decision.

@snazy
Member Author

snazy commented May 7, 2025

whether the HadoopFileIO is suitable for production

@flyrain you should really know why it's not
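(For context: a minimal sketch of why HadoopFileIO is treated as security-sensitive in this discussion. With a plain Hadoop Configuration it will serve paths on the server's local filesystem. This is illustrative only, not code from this PR, and assumes iceberg-core and hadoop-common are on the classpath; the path used is hypothetical.)

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.hadoop.HadoopFileIO;
import org.apache.iceberg.io.InputFile;

public class HadoopFileIoLocalAccess {
  public static void main(String[] args) {
    // HadoopFileIO resolves "file://" URIs through the local Hadoop FileSystem,
    // i.e. whatever the Polaris server process itself can read.
    try (HadoopFileIO io = new HadoopFileIO(new Configuration())) {
      InputFile f = io.newInputFile("file:///tmp/any-table-data.parquet"); // hypothetical local path
      System.out.println(f.location() + " exists on the server: " + f.exists());
    }
  }
}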

@snazy snazy force-pushed the explicit-hadoop-fio branch from 3f65378 to c149a7f Compare May 8, 2025 14:11
Comment on lines +27 to +33
S3("org.apache.iceberg.aws.s3.S3FileIO", true),

GCS("org.apache.iceberg.gcp.gcs.GCSFileIO", true),

AZURE("org.apache.iceberg.azure.adlsv2.ADLSFileIO", true),

FILE("org.apache.iceberg.hadoop.HadoopFileIO", false),
Contributor

We can't actually guarantee that the implementation found at this location by the classloader is the one you have in mind

Contributor

That is true, but again, I'd suggest improving the "precision" of checks later, after merging this PR.

Contributor
@eric-maynard eric-maynard May 10, 2025

Feature configs are forever -- if we won't ever be able to make the check precise, we shouldn't ever add a config that implies a precise check. I don't see how the check can be made precise, so I am opposed to adding the config now.

Contributor

Also, HadoopFileIO is used for more than just FILE -- it's the fallback defined here. In the past we've seen it used for AZURE paths.

static StorageTypeFileIO fromFileIoImplementation(String fileIoImplementation) {
var type = FILE_TO_TO_STORAGE_TYPE.get(fileIoImplementation);
if (type == null) {
throw new IllegalArgumentException("Unknown FileIO implementation: " + fileIoImplementation);
Contributor

Is this expected to throw for a custom fileIO?

Contributor

Good point. I guess this goes beyond the original intent behind this PR 🤔

@snazy : WDYT about deferring the enforcement of the compile-time list of FileIO implementations?
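(To make the concern concrete, a minimal, self-contained reconstruction of the reverse lookup under discussion. The enum constants are the ones quoted in this review; the map and field names are adapted for the sketch and are not the actual Polaris sources, and the custom class name is hypothetical.)

import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public enum StorageTypeFileIO {
  S3("org.apache.iceberg.aws.s3.S3FileIO", true),
  GCS("org.apache.iceberg.gcp.gcs.GCSFileIO", true),
  AZURE("org.apache.iceberg.azure.adlsv2.ADLSFileIO", true),
  FILE("org.apache.iceberg.hadoop.HadoopFileIO", false);

  private static final Map<String, StorageTypeFileIO> FILE_IO_TO_STORAGE_TYPE =
      Stream.of(values()).collect(Collectors.toMap(t -> t.fileIoImpl, t -> t));

  private final String fileIoImpl;
  private final boolean safe;

  StorageTypeFileIO(String fileIoImpl, boolean safe) {
    this.fileIoImpl = fileIoImpl;
    this.safe = safe;
  }

  static StorageTypeFileIO fromFileIoImplementation(String fileIoImplementation) {
    StorageTypeFileIO type = FILE_IO_TO_STORAGE_TYPE.get(fileIoImplementation);
    if (type == null) {
      // This is the branch any custom or third-party FileIO implementation hits today.
      throw new IllegalArgumentException("Unknown FileIO implementation: " + fileIoImplementation);
    }
    return type;
  }

  public static void main(String[] args) {
    System.out.println(fromFileIoImplementation("org.apache.iceberg.hadoop.HadoopFileIO")); // FILE
    fromFileIoImplementation("com.example.CustomAzureFileIO"); // throws IllegalArgumentException
  }
}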

@dimas-b
Contributor

dimas-b commented May 9, 2025

INITIALIZE_DEFAULT_CATALOG_FILEIO_FOR_TEST

It was present before this PR, in IcebergCatalog

@flyrain
Contributor

flyrain commented May 10, 2025

INITIALIZE_DEFAULT_CATALOG_FILEIO_FOR_TEST was already present, but I don't think that justifies promoting it to a first-class config flag. In fact, we shouldn't be exposing this at all; it's a test-only utility and should be removed.
That said, I don't think this PR needs to handle the removal. A more reasonable approach would be to leave it as-is for now and clean it up in a follow-up PR.
Let's not legitimize it by formalizing it in the public config surface.

@dimas-b
Contributor

dimas-b commented May 10, 2025

INITIALIZE_DEFAULT_CATALOG_FILEIO_FOR_TEST was already present, but I don’t think that justifies promoting it to a first-class config flag. In fact, we shouldn’t be exposing this at all, it’s a test-only utility and should be removed.

Good points.

@dennishuo : is it removable now? IIRC it had some usecase before, but I'm not sure whether it is still needed 🤔

@eric-maynard
Contributor

INITIALIZE_DEFAULT_CATALOG_FILEIO_FOR_TEST was already present, but I don’t think that justifies promoting it to a first-class config flag.

+1, it was left out of the original PolarisConfiguration change for a reason and this is also what I am saying here

As for removing it, I don't really have an opinion. It's certainly not a change that seems necessary to fix any kind of bug.

* risks.
*/
@WithDefault("false")
boolean ignoreSevereIssues();
Contributor

Besides what's already been pointed out about the validity of the FileIO check done here, this is not a good config name. What is a "severe issue"? Does it mean that we don't ignore mild issues? Are there super-severe issues?

Contributor

@eric-maynard "Severe" is a standard term to denote the most important information. See java.util.logging.Level for an example of how it is used in the context of logs.

Contributor
@eric-maynard eric-maynard May 12, 2025

Having a log level of SEVERE and a config called "ignore severe issues" are completely different things. Is the implication that this config would disable failure in all cases where we use the SEVERE logging level?

Even if that is the case, my questions stand. Does this config not apply to less-severe issues? What guidelines will qualify an issue as severe (necessitating a SEVERE log)? Have we gone back and audited all "issues" to verify none should be considered SEVERE? What even is an "issue"?

Contributor

@eric-maynard I don't know how to answer your questions. They seem related to a much bigger scope than this PR, especially the audit part. We could extend that logic in subsequent PRs so that more cases are covered, if that is what you have in mind.

I guess my question to you is: how can we move this PR forward?

Contributor

I think my comment lays out a path:

it might be good to separate out some of these changes

In the interest of moving things forward I linked some examples of what more incremental improvements might look like below

Contributor

I am not sure it lays out a path; that's the thing. What value would be created by splitting this PR into smaller PRs?

It could also be a topic for a ML discussion if there are PR guidelines that are not explicitly written in CONTRIBUTING.md. Maybe I am missing something...?

Member
@RussellSpitzer RussellSpitzer May 12, 2025

We should aim for small, independent code changes whenever possible. Just because two things work well together doesn't mean they should be in the same pull request. As a generic guide: one "feature" per PR, small is better than big, and independent changes should stay independent. IMHO, if the title can't cover the scope of the changes, then the PR should be broken up. I think we have several independent things here, as noted in the PR description. Ideally, for me, even refactors go in separate pull requests. We've been a bit lax on this recently as we have adopted code from other projects, but I don't think we should get into the habit.

The general benefits of this approach are for downstream integrators, who can cherry-pick changes more easily, and for reviewers, who will have an easier time since small changes are much easier to understand and usually easier to come to agreement on. (Bisecting also becomes much easier.) If PRs can only be understood in a larger context, then grouping them in a project is usually the right thing to do, or explaining the context in a design doc.

I think this PR itself does a great job of mapping out the independent changes in a bulleted list. IMHO, each of those points could probably be approached independently, since I think we already have consensus on at least some of the items.

Contributor

It's not about PR guidelines -- there are a bunch of issues with the code changes proposed in this PR that I've highlighted in various comments. Even the questions asked in this thread are not resolved. I still don't know what a "severe issue" is.

Breaking up the change could allow us to move forward with some of the less controversial aspects of the change, but if you're opposed to that I'm also happy to continue hashing out these issues here.

Member

While I generally agree with the last comments, in this PR specifically I consider the changes "related": the severity/rendering of messages is related to clearly informing users about allowing insecure file types.
So, it's fair to "group" related changes in one PR in this case.
Another option would have been to just focus on "warning messages" (not rendered) when allowing insecure file types. As I'm sure that 90% of users won't use insecure file types (and will use the default configuration), a warning (not seen 90% of the time) would be sufficient.

@snazy snazy force-pushed the explicit-hadoop-fio branch 2 times, most recently from e610053 to 23f49d2 Compare May 12, 2025 07:56
@eric-maynard
Contributor

eric-maynard commented May 12, 2025

See #1566 for a simple implementation removing FILE from the storage types supported by default.

See this branch for an example of a simple way to prohibit the default file IO factory from letting the service use a HadoopFileIO.
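(For comparison, a rough sketch of the kind of guard that branch describes: reject HadoopFileIO at the point where a FileIO would be constructed. The class, method, and property names here are hypothetical, not the actual Polaris factory API.)

import java.util.Map;

final class FileIoGuard {
  private static final String HADOOP_FILE_IO = "org.apache.iceberg.hadoop.HadoopFileIO";

  // Hypothetical hook: call this before instantiating a FileIO from catalog/table properties.
  static void checkAllowed(Map<String, String> properties, boolean allowInsecureStorageTypes) {
    String ioImpl = properties.get("io-impl");
    if (HADOOP_FILE_IO.equals(ioImpl) && !allowInsecureStorageTypes) {
      throw new IllegalStateException(
          "HadoopFileIO is disabled; set ALLOW_INSECURE_STORAGE_TYPES_ACCEPTING_SECURITY_RISKS=true to accept the risk");
    }
  }

  private FileIoGuard() {}
}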

errors.add(
Error.ofSevere(
format(
"The storage type '%s' is considered insecure and to expose the service to severe security ricks!",
Contributor

Suggested change
"The storage type '%s' is considered insecure and to expose the service to severe security ricks!",
"The storage type '%s' is considered insecure and exposes the service to severe security risks!",

errors.add(
Error.ofSevere(
format(
"The storage type '%s' is considered insecure and to expose the service to severe security ricks!",
Contributor

Suggested change
"The storage type '%s' is considered insecure and to expose the service to severe security ricks!",
"The storage type '%s' is considered insecure and exposes the service to severe security risks!",

@jbonofre jbonofre changed the title Require explict user-consent to enable HadoopFileIO Require explicit user-consent to enable HadoopFileIO May 13, 2025
jbonofre previously approved these changes May 13, 2025
.defaultValue(false)
.buildFeatureConfiguration();

public static final FeatureConfiguration<Boolean> ALLOW_INSECURE_STORAGE_TYPES =
Member

As this is a user-facing property, we need to be consistent (it's hard to change later).
I think this property is good enough as it's generic to any storage type (not only FILE).

@@ -109,7 +109,7 @@ polaris.realm-context.header-name=Polaris-Realm
polaris.realm-context.require-header=false

polaris.features.defaults."ENFORCE_PRINCIPAL_CREDENTIAL_ROTATION_REQUIRED_CHECKING"=false
-polaris.features.defaults."SUPPORTED_CATALOG_STORAGE_TYPES"=["S3","GCS","AZURE","FILE"]
+polaris.features.defaults."SUPPORTED_CATALOG_STORAGE_TYPES"=["S3","GCS","AZURE"]
Member

Nit: For clarity, I wonder if we should not add polaris.features.defaults."ALLOW_INSECURE_STORAGE_TYPES"=false here.


@flyrain
Contributor

flyrain commented May 17, 2025

As I said in the email thread, I don't think ALLOW_INSECURE_STORAGE_TYPES is necessary, but I'm OK with it. With that, +0 on the PR.

dimas-b and others added 7 commits May 19, 2025 11:02
@snazy snazy force-pushed the explicit-hadoop-fio branch from 23f49d2 to 6403276 Compare May 19, 2025 09:21
@snazy snazy requested a review from singhpk234 as a code owner May 19, 2025 09:21
@snazy snazy dismissed flyrain’s stale review May 19, 2025 09:25

Dismissing @flyrain's RFC due to his comment above and this mail.

Labels: 1.0-blocker, bug (Something isn't working)