Skip to content

Conversation

@vchag
Copy link

@vchag vchag commented Nov 3, 2025

These changes were created to address concerns raised in Issues #2630

What changes were proposed in this pull request?

  1. Streamlining Event-to-JSON Serialization:
    The previous architecture involved a redundant, two-step conversion process: events were transformed into intermediate Maps by an abstract class, and only then was the resulting map serialized into JSON (as seen in AwsCloudWatchEventListener).
    Since Jackson JSON is our chosen serialization format, we've removed this unnecessary intermediate mapping step. We introduced Jackson Mixins to provide native JSON serialization support directly on the event objects. This refactoring simplifies the serialization pipeline from:
    Event -> Map -> JSON
    to
    Event -> JSON

  2. Introduces AllEventsForwardingListener, a generic event listener designed to automatically receive and forward all Polaris events without requiring explicit event-type configuration.

  3. Introduces polaris.event-listener.aws-cloudwatch.event-types property: This property introduces a configurable event filter for the AWS CloudWatch event listener.
    It defines which Polaris event types should be serialized and forwarded to CloudWatch Logs.

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

CHANGELOG.md

Copy link
Contributor

@adutra adutra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for re-opening a PR @vchag , this is now looking much better.

@vchag vchag force-pushed the feature-aws_cloudwatchlistener-for-all_events branch from 342b6fd to 7910075 Compare November 5, 2025 05:04
vchag added 2 commits November 4, 2025 21:26
…r customizer from AwsCloudWatchEventListener and AwsCloudWatchEventListenerTest

@Override
protected boolean shouldHandle(PolarisEvent event) {
if (event == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: event will never be false here, this test is redundant.


private ExecutorService executorService;
private AutoCloseable mockitoContext;
private static final ObjectMapper objectMapper = new ObjectMapper();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private static final ObjectMapper objectMapper = new ObjectMapper();
private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper();
static {
new PolarisIcebergObjectMapperCustomizer(new MemorySize(BigInteger.valueOf(1024 * 1024)))
.customize(OBJECT_MAPPER);
}

}

@Test
void ensureObjectMapperCustomizerIsApplied() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary test. You are testing a Quarkus feature.

Copy link
Contributor

@adnanhemani adnanhemani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for your efforts and for contributing this PR. The MixIn idea is indeed quite interesting and I'm excited if it can work for this use case - it could help make things much cleaner.

However, I don't think you fully considered all angles of why the PropertyMapEventListener was introduced. There are many event fields that likely should not ever be pushed to an external system - for security reasons (such as table credentials that could be exposed in a LoadTableResponse) or for functional reasons (such as when a TableMetadata object is larger than 1MB, the maximum size for a record in CloudWatch Logs). I'm sure there are many other reasons here that are not coming top of mind. PropertyMapEventListener is supposed to be the transformation layer where only relevant information can be parsed out of the event and then sent onwards.

Maybe I could've done a better job to add a comment to the class stating this - but I can't approve this PR without all these considerations addressed. This is the exact reason why it has been taking me a while to complete the functionality through as well.

I am still quite interesting in seeing if MixIns can make things better - but I'm not familiar with what other options MixIns/Jackson provide to do automated parsing and/or redacting of the sensitive/larger/etc objects. I am hoping you can help direct this conversation. I would also recommend you to post these findings not only here on the PR but on the Dev Mailing list, as it may help gain more eyes who can help with this issue.

objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
objectMapper.setPropertyNamingStrategy(new PropertyNamingStrategies.KebabCaseStrategy());
objectMapper.addMixIn(PolarisEvent.class, PolarisEventBaseMixin.class);
objectMapper.addMixIn(TableIdentifier.class, IcebergMixins.TableIdentifierMixin.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought these are taken care of by the RESTSerializers below?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, RESTSerializers have serializers for TableIdentifier and Namespace already.

boolean synchronousMode();

@WithName("event-types")
Optional<Set<Class<? extends PolarisEvent>>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this possible to do? I don't see the ability to do this in the Quarkus public documentation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is possible. Here is an example:

polaris.event-listener.aws-cloudwatch.event-types=\
  org.apache.polaris.service.events.IcebergRestCatalogEvents$BeforeCreateNamespaceEvent,\
  org.apache.polaris.service.events.IcebergRestCatalogEvents$AfterCreateNamespaceEvent

See default converters here:

https://smallrye.io/smallrye-config/Main/config/getting-started/#converters

when(config.awsCloudWatchRegion()).thenReturn("us-east-1");
when(config.synchronousMode()).thenReturn(false); // Default to async mode
when(config.synchronousMode()).thenReturn(false); // default async
when(config.eventTypes()).thenReturn(java.util.Optional.empty()); // handle all events
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
when(config.eventTypes()).thenReturn(java.util.Optional.empty()); // handle all events
when(config.eventTypes()).thenReturn(Optional.empty()); // handle all events

logEvent -> {
String message = logEvent.message();
JsonNode root = objectMapper.readTree(message);
JsonNode event = root.path("event").isMissingNode() ? root : root.path("event");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: shouldn't the @JsonUnwrapped make sure that the event information is not under an "event" object?

}

private void verifyLogGroupAndStreamExist(CloudWatchLogsAsyncClient client) {
// Verify log group exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary change

@Test
void shouldListenToAllEventTypesWhenConfigNotProvided() {
// given: config.eventTypes() is empty → listen to all events
when(config.eventTypes()).thenReturn(java.util.Optional.empty());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
when(config.eventTypes()).thenReturn(java.util.Optional.empty());
when(config.eventTypes()).thenReturn(Optional.empty());

new AwsCloudWatchEventListener(config, clock, objectMapper);

// This is any random PolarisEvent — if the listener listens to all types,
// shouldHandle(event) should return true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the thought behind this test, but technically this is not the right way to test a "random" event. Maybe you create a new "RandomEvent" Java record that extends "PolarisEvent" (making it clear that it definitely cannot be referred to outside the scope of this test class and show that shouldHandle still returns true.

objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
objectMapper.setPropertyNamingStrategy(new PropertyNamingStrategies.KebabCaseStrategy());
objectMapper.addMixIn(PolarisEvent.class, PolarisEventBaseMixin.class);
objectMapper.addMixIn(TableIdentifier.class, IcebergMixins.TableIdentifierMixin.class);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, RESTSerializers have serializers for TableIdentifier and Namespace already.

boolean synchronousMode();

@WithName("event-types")
Optional<Set<Class<? extends PolarisEvent>>>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is possible. Here is an example:

polaris.event-listener.aws-cloudwatch.event-types=\
  org.apache.polaris.service.events.IcebergRestCatalogEvents$BeforeCreateNamespaceEvent,\
  org.apache.polaris.service.events.IcebergRestCatalogEvents$AfterCreateNamespaceEvent

See default converters here:

https://smallrye.io/smallrye-config/Main/config/getting-started/#converters

# polaris.event-listener.aws-cloudwatch.log-stream=polaris-cloudwatch-default-stream
# polaris.event-listener.aws-cloudwatch.region=us-east-1
# polaris.event-listener.aws-cloudwatch.synchronous-mode=false
# polaris.event-listener.aws-cloudwatch.event-types= // the absence of this property would result in processing all Polaris event types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to add an example of how to actually specify events here:

Suggested change
# polaris.event-listener.aws-cloudwatch.event-types= // the absence of this property would result in processing all Polaris event types.
# The absence of the property below would result in processing all Polaris event types (default).
# polaris.event-listener.aws-cloudwatch.event-types=\
# org.apache.polaris.service.events.IcebergRestCatalogEvents$BeforeCreateNamespaceEvent,\
# org.apache.polaris.service.events.IcebergRestCatalogEvents$AfterCreateNamespaceEvent

@adutra
Copy link
Contributor

adutra commented Nov 6, 2025

However, I don't think you fully considered all angles of why the PropertyMapEventListener was introduced. There are many event fields that likely should not ever be pushed to an external system - for security reasons (such as table credentials that could be exposed in a LoadTableResponse) or for functional reasons (such as when a TableMetadata object is larger than 1MB, the maximum size for a record in CloudWatch Logs). I'm sure there are many other reasons here that are not coming top of mind. PropertyMapEventListener is supposed to be the transformation layer where only relevant information can be parsed out of the event and then sent onwards.

I have some reservations regarding a common transformation layer for all data sinks. While the idea is sound in principle, I'm skeptical about reaching a consensus on the specific transformations to apply and information to redact.

We'll likely agree on redacting sensitive data. However, other transformations could lead to endless debate. For instance, CloudWatch rejects payloads over 1MB, but a Kafka cluster might accept up to 100MB. Similarly, if a use case doesn't require a specific field, another use case might. And finally, transformations should be vendor-agnostic, of course.

This leads me to believe that a "transformation layer," despite good intentions, will never be generic enough to serve all purposes. While we could introduce parameters to influence transformations, this would be a significant and time-consuming effort.

Instead, I propose applying the least possible logic to events, ideally only security-related logic. Events should be passed as-is to the sinks, allowing each sink to perform further transformations and redactions on the event payload. I think we should absolutely avoid specific logic per event type, as this would imply having to special-case each one of the 150+ event types.

JSON mixins should also be as generic as possible. If a sink cannot use them due to them not producing the desired JSON, it should use a different object mapper. Alternatively, a sink could use mixins and the default object mapper to produce not a String, but a JsonNode instance, apply further transformations to it, and then send it to the target remote system.

Finally, I don't believe an intermediary Map representation offers much benefit.

Anyways, I think the transformation layer topic deserves a discussion on the ML.

Maybe I could've done a better job to add a comment to the class stating this - but I can't approve this PR without all these considerations addressed. This is the exact reason why it has been taking me a while to complete the functionality through as well.
I am still quite interesting in seeing if MixIns can make things better - but I'm not familiar with what other options MixIns/Jackson provide to do automated parsing and/or redacting of the sensitive/larger/etc objects. I am hoping you can help direct this conversation. I would also recommend you to post these findings not only here on the PR but on the Dev Mailing list, as it may help gain more eyes who can help with this issue.

I understand and share the security concerns, but we can't delay this PR indefinitely. Manually addressing each event for transformations is incredibly time-consuming, as you've noted.

Asking the original contributor to implement logic for sensitive or larger objects falls outside the PR's original scope and will take too long (we have 150+ events!). I propose merging this PR now and tackling the transformation layer as a follow-up.

Let's not forget: the events feature is still in beta for a good reason.

@adnanhemani
Copy link
Contributor

I have some reservations regarding a common transformation layer for all data sinks. While the idea is sound in principle, I'm skeptical about reaching a consensus on the specific transformations to apply and information to redact.

I completely agree with this actually - a common transformation layer for all data sinks is likely not possible. But there can (and should) be groupings of Event Listener implementations that follow similar transformations. i.e. AWS CloudWatch Logs, Google Cloud Logging, and Azure Monitor Logs (I've not begun work on the latter two but are future enhancements I am planning for) all have similar maximum record sizes, and we should safely remove the TableMetadata objects wherever they exist in an event for these sorts of listeners as a result.

Finally, I don't believe an intermediary Map representation offers much benefit.

I don't think having an intermediary Map representation is a requirement by any means, but it seemed like the most straight-forward way to grab only the relevant information from each event - even if it is quite verbose. If there is a way to make this much cleaner using MixIns solely, given the concerns that exist, I am all for it.

I understand and share the security concerns, but we can't delay this PR indefinitely.

While I understand the thought here, I cannot advocate/approve for this PR if it does not resolve the security concerns, at a minimum. We should not be introducing security concerns into our code base (regardless of "beta" status of a feature) in the name of simply moving things forward. Polaris has not done so in the past, and we should continue to apply the same bar for the future. cc @snazy who has pointed out many security concerns in the past.

If you're still in support of merging the PR through without resolving the security concerns, let's start a thread on the ML to ensure the community is aware of this security gap, supports this decision, and acknowledges the precedence that this may inadvertently set for the future.

Manually addressing each event for transformations is incredibly time-consuming, as you've noted.

That’s precisely why only one event was initially wired through in the original PRs. I’m not against adding more events — a few at a time — as long as they’re properly reviewed to ensure their payloads don’t introduce any issues. Even in that case, we should first outline a clear approach for handling events that may require additional filtering. Committing fully to a method without understanding how it will accommodate all scenarios wouldn’t be a prudent path forward.

@adutra
Copy link
Contributor

adutra commented Nov 7, 2025

I’m not against adding more events — a few at a time — as long as they’re properly reviewed to ensure their payloads don’t introduce any issues.

Let's assume we take 1 week of work per event. We have 150+ events. This effort would take almost 3 years to complete.

I don't think this is sustainable. We must take a step back and question the current design.

While I understand the thought here, I cannot advocate/approve for this PR if it does not resolve the security concerns, at a minimum.

The author (@vchag) will have to spend weeks, maybe more, chasing all the possible security issues. What will happen is that they will lose motivation, close the PR, and move on to something else.

@adnanhemani
Copy link
Contributor

Let's assume we take 1 week of work per event. We have 150+ events. This effort would take almost 3 years to complete.

I don't think a vast majority of events will require more than a few minutes of verification. The ones that will take a longer time can be temporarily not supported while the community works on them. Sure, it's more than a day's worth of effort to get this to work, but still worth it IMO in comparison to inadvertently introducing known security concerns.

Here's what I can suggest to continue moving things forward:

  • A quick screening of payloads to verify there is no security concerns with the payloads being sent. If an event is flagged for potential issues, remove it from being emitted with a TODO for later fix.
  • A known-to-work approach (approved by the community) for how do we redact (and/or strategically omit) information that may be nested within the Events payloads, as that will be necessary for the above point.

WDYT?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants