-
Notifications
You must be signed in to change notification settings - Fork 328
Introducing changes to enable forwarding of Polaris events to AWS Cloudwatch. #2962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Introducing changes to enable forwarding of Polaris events to AWS Cloudwatch. #2962
Conversation
adutra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for re-opening a PR @vchag , this is now looking much better.
...g/apache/polaris/service/events/listeners/aws/cloudwatch/AwsCloudWatchEventListenerTest.java
Outdated
Show resolved
Hide resolved
...a/org/apache/polaris/service/events/listeners/aws/cloudwatch/AwsCloudWatchEventListener.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/apache/polaris/service/events/listeners/AllEventsForwardingListener.java
Outdated
Show resolved
Hide resolved
...a/org/apache/polaris/service/events/listeners/aws/cloudwatch/AwsCloudWatchEventListener.java
Outdated
Show resolved
Hide resolved
...a/org/apache/polaris/service/events/listeners/aws/cloudwatch/AwsCloudWatchEventListener.java
Outdated
Show resolved
Hide resolved
...a/org/apache/polaris/service/events/listeners/aws/cloudwatch/AwsCloudWatchEventListener.java
Outdated
Show resolved
Hide resolved
...e/src/main/java/org/apache/polaris/service/events/listeners/AllEventsForwardingListener.java
Outdated
Show resolved
Hide resolved
342b6fd to
7910075
Compare
…r customizer from AwsCloudWatchEventListener and AwsCloudWatchEventListenerTest
|
|
||
| @Override | ||
| protected boolean shouldHandle(PolarisEvent event) { | ||
| if (event == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: event will never be false here, this test is redundant.
|
|
||
| private ExecutorService executorService; | ||
| private AutoCloseable mockitoContext; | ||
| private static final ObjectMapper objectMapper = new ObjectMapper(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| private static final ObjectMapper objectMapper = new ObjectMapper(); | |
| private static final ObjectMapper OBJECT_MAPPER = new ObjectMapper(); | |
| static { | |
| new PolarisIcebergObjectMapperCustomizer(new MemorySize(BigInteger.valueOf(1024 * 1024))) | |
| .customize(OBJECT_MAPPER); | |
| } |
| } | ||
|
|
||
| @Test | ||
| void ensureObjectMapperCustomizerIsApplied() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary test. You are testing a Quarkus feature.
adnanhemani
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for your efforts and for contributing this PR. The MixIn idea is indeed quite interesting and I'm excited if it can work for this use case - it could help make things much cleaner.
However, I don't think you fully considered all angles of why the PropertyMapEventListener was introduced. There are many event fields that likely should not ever be pushed to an external system - for security reasons (such as table credentials that could be exposed in a LoadTableResponse) or for functional reasons (such as when a TableMetadata object is larger than 1MB, the maximum size for a record in CloudWatch Logs). I'm sure there are many other reasons here that are not coming top of mind. PropertyMapEventListener is supposed to be the transformation layer where only relevant information can be parsed out of the event and then sent onwards.
Maybe I could've done a better job to add a comment to the class stating this - but I can't approve this PR without all these considerations addressed. This is the exact reason why it has been taking me a while to complete the functionality through as well.
I am still quite interesting in seeing if MixIns can make things better - but I'm not familiar with what other options MixIns/Jackson provide to do automated parsing and/or redacting of the sensitive/larger/etc objects. I am hoping you can help direct this conversation. I would also recommend you to post these findings not only here on the PR but on the Dev Mailing list, as it may help gain more eyes who can help with this issue.
| objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false); | ||
| objectMapper.setPropertyNamingStrategy(new PropertyNamingStrategies.KebabCaseStrategy()); | ||
| objectMapper.addMixIn(PolarisEvent.class, PolarisEventBaseMixin.class); | ||
| objectMapper.addMixIn(TableIdentifier.class, IcebergMixins.TableIdentifierMixin.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought these are taken care of by the RESTSerializers below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, RESTSerializers have serializers for TableIdentifier and Namespace already.
| boolean synchronousMode(); | ||
|
|
||
| @WithName("event-types") | ||
| Optional<Set<Class<? extends PolarisEvent>>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this possible to do? I don't see the ability to do this in the Quarkus public documentation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is possible. Here is an example:
polaris.event-listener.aws-cloudwatch.event-types=\
org.apache.polaris.service.events.IcebergRestCatalogEvents$BeforeCreateNamespaceEvent,\
org.apache.polaris.service.events.IcebergRestCatalogEvents$AfterCreateNamespaceEventSee default converters here:
https://smallrye.io/smallrye-config/Main/config/getting-started/#converters
| when(config.awsCloudWatchRegion()).thenReturn("us-east-1"); | ||
| when(config.synchronousMode()).thenReturn(false); // Default to async mode | ||
| when(config.synchronousMode()).thenReturn(false); // default async | ||
| when(config.eventTypes()).thenReturn(java.util.Optional.empty()); // handle all events |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| when(config.eventTypes()).thenReturn(java.util.Optional.empty()); // handle all events | |
| when(config.eventTypes()).thenReturn(Optional.empty()); // handle all events |
| logEvent -> { | ||
| String message = logEvent.message(); | ||
| JsonNode root = objectMapper.readTree(message); | ||
| JsonNode event = root.path("event").isMissingNode() ? root : root.path("event"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: shouldn't the @JsonUnwrapped make sure that the event information is not under an "event" object?
| } | ||
|
|
||
| private void verifyLogGroupAndStreamExist(CloudWatchLogsAsyncClient client) { | ||
| // Verify log group exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary change
| @Test | ||
| void shouldListenToAllEventTypesWhenConfigNotProvided() { | ||
| // given: config.eventTypes() is empty → listen to all events | ||
| when(config.eventTypes()).thenReturn(java.util.Optional.empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| when(config.eventTypes()).thenReturn(java.util.Optional.empty()); | |
| when(config.eventTypes()).thenReturn(Optional.empty()); |
| new AwsCloudWatchEventListener(config, clock, objectMapper); | ||
|
|
||
| // This is any random PolarisEvent — if the listener listens to all types, | ||
| // shouldHandle(event) should return true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the thought behind this test, but technically this is not the right way to test a "random" event. Maybe you create a new "RandomEvent" Java record that extends "PolarisEvent" (making it clear that it definitely cannot be referred to outside the scope of this test class and show that shouldHandle still returns true.
| objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false); | ||
| objectMapper.setPropertyNamingStrategy(new PropertyNamingStrategies.KebabCaseStrategy()); | ||
| objectMapper.addMixIn(PolarisEvent.class, PolarisEventBaseMixin.class); | ||
| objectMapper.addMixIn(TableIdentifier.class, IcebergMixins.TableIdentifierMixin.class); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, RESTSerializers have serializers for TableIdentifier and Namespace already.
| boolean synchronousMode(); | ||
|
|
||
| @WithName("event-types") | ||
| Optional<Set<Class<? extends PolarisEvent>>> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes this is possible. Here is an example:
polaris.event-listener.aws-cloudwatch.event-types=\
org.apache.polaris.service.events.IcebergRestCatalogEvents$BeforeCreateNamespaceEvent,\
org.apache.polaris.service.events.IcebergRestCatalogEvents$AfterCreateNamespaceEventSee default converters here:
https://smallrye.io/smallrye-config/Main/config/getting-started/#converters
| # polaris.event-listener.aws-cloudwatch.log-stream=polaris-cloudwatch-default-stream | ||
| # polaris.event-listener.aws-cloudwatch.region=us-east-1 | ||
| # polaris.event-listener.aws-cloudwatch.synchronous-mode=false | ||
| # polaris.event-listener.aws-cloudwatch.event-types= // the absence of this property would result in processing all Polaris event types. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to add an example of how to actually specify events here:
| # polaris.event-listener.aws-cloudwatch.event-types= // the absence of this property would result in processing all Polaris event types. | |
| # The absence of the property below would result in processing all Polaris event types (default). | |
| # polaris.event-listener.aws-cloudwatch.event-types=\ | |
| # org.apache.polaris.service.events.IcebergRestCatalogEvents$BeforeCreateNamespaceEvent,\ | |
| # org.apache.polaris.service.events.IcebergRestCatalogEvents$AfterCreateNamespaceEvent |
I have some reservations regarding a common transformation layer for all data sinks. While the idea is sound in principle, I'm skeptical about reaching a consensus on the specific transformations to apply and information to redact. We'll likely agree on redacting sensitive data. However, other transformations could lead to endless debate. For instance, CloudWatch rejects payloads over 1MB, but a Kafka cluster might accept up to 100MB. Similarly, if a use case doesn't require a specific field, another use case might. And finally, transformations should be vendor-agnostic, of course. This leads me to believe that a "transformation layer," despite good intentions, will never be generic enough to serve all purposes. While we could introduce parameters to influence transformations, this would be a significant and time-consuming effort. Instead, I propose applying the least possible logic to events, ideally only security-related logic. Events should be passed as-is to the sinks, allowing each sink to perform further transformations and redactions on the event payload. I think we should absolutely avoid specific logic per event type, as this would imply having to special-case each one of the 150+ event types. JSON mixins should also be as generic as possible. If a sink cannot use them due to them not producing the desired JSON, it should use a different object mapper. Alternatively, a sink could use mixins and the default object mapper to produce not a String, but a JsonNode instance, apply further transformations to it, and then send it to the target remote system. Finally, I don't believe an intermediary Map representation offers much benefit. Anyways, I think the transformation layer topic deserves a discussion on the ML.
I understand and share the security concerns, but we can't delay this PR indefinitely. Manually addressing each event for transformations is incredibly time-consuming, as you've noted. Asking the original contributor to implement logic for sensitive or larger objects falls outside the PR's original scope and will take too long (we have 150+ events!). I propose merging this PR now and tackling the transformation layer as a follow-up. Let's not forget: the events feature is still in beta for a good reason. |
I completely agree with this actually - a common transformation layer for all data sinks is likely not possible. But there can (and should) be groupings of Event Listener implementations that follow similar transformations. i.e. AWS CloudWatch Logs, Google Cloud Logging, and Azure Monitor Logs (I've not begun work on the latter two but are future enhancements I am planning for) all have similar maximum record sizes, and we should safely remove the
I don't think having an intermediary Map representation is a requirement by any means, but it seemed like the most straight-forward way to grab only the relevant information from each event - even if it is quite verbose. If there is a way to make this much cleaner using MixIns solely, given the concerns that exist, I am all for it.
While I understand the thought here, I cannot advocate/approve for this PR if it does not resolve the security concerns, at a minimum. We should not be introducing security concerns into our code base (regardless of "beta" status of a feature) in the name of simply moving things forward. Polaris has not done so in the past, and we should continue to apply the same bar for the future. cc @snazy who has pointed out many security concerns in the past. If you're still in support of merging the PR through without resolving the security concerns, let's start a thread on the ML to ensure the community is aware of this security gap, supports this decision, and acknowledges the precedence that this may inadvertently set for the future.
That’s precisely why only one event was initially wired through in the original PRs. I’m not against adding more events — a few at a time — as long as they’re properly reviewed to ensure their payloads don’t introduce any issues. Even in that case, we should first outline a clear approach for handling events that may require additional filtering. Committing fully to a method without understanding how it will accommodate all scenarios wouldn’t be a prudent path forward. |
Let's assume we take 1 week of work per event. We have 150+ events. This effort would take almost 3 years to complete. I don't think this is sustainable. We must take a step back and question the current design.
The author (@vchag) will have to spend weeks, maybe more, chasing all the possible security issues. What will happen is that they will lose motivation, close the PR, and move on to something else. |
I don't think a vast majority of events will require more than a few minutes of verification. The ones that will take a longer time can be temporarily not supported while the community works on them. Sure, it's more than a day's worth of effort to get this to work, but still worth it IMO in comparison to inadvertently introducing known security concerns. Here's what I can suggest to continue moving things forward:
WDYT? |
These changes were created to address concerns raised in Issues #2630
What changes were proposed in this pull request?
Streamlining Event-to-JSON Serialization:
The previous architecture involved a redundant, two-step conversion process: events were transformed into intermediate Maps by an abstract class, and only then was the resulting map serialized into JSON (as seen in AwsCloudWatchEventListener).
Since Jackson JSON is our chosen serialization format, we've removed this unnecessary intermediate mapping step. We introduced Jackson Mixins to provide native JSON serialization support directly on the event objects. This refactoring simplifies the serialization pipeline from:
Event -> Map -> JSON
to
Event -> JSON
Introduces
AllEventsForwardingListener, a generic event listener designed to automatically receive and forward all Polaris events without requiring explicit event-type configuration.Introduces
polaris.event-listener.aws-cloudwatch.event-typesproperty: This property introduces a configurable event filter for the AWS CloudWatch event listener.It defines which Polaris event types should be serialized and forwarded to CloudWatch Logs.
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?
CHANGELOG.md