| 1 | +# PIP-445: Add Builder Methods to Create Message-based TableView |
| 2 | + |
| 3 | +# Background knowledge |
| 4 | + |
| 5 | +* **TableView**: In Pulsar, a `TableView` is a client-side abstraction that provides a key-value map interface over a Pulsar topic. It consumes messages from the topic (typically a compacted one) and maintains an in-memory view of the latest value for each key. This allows applications to easily query the current state of a key without managing a consumer manually. |
| 6 | + |
| 7 | +* **Pulsar `Message<T>`**: A Pulsar message is not just its data payload. The `Message<T>` object is a container that includes the deserialized **payload** (`T`) as well as important **metadata**, such as a message key, user-defined properties (a key-value map), event time, publish time, and more. |
| 8 | + |
| 9 | +# Motivation |
| 10 | + |
| 11 | +The current `TableView` API provides a `get(String key)` method that only returns the deserialized **value** (`T`) of the latest message for a given key. This limits its usefulness for applications that need access to the message's metadata. |
| 12 | + |
| 13 | +For instance, a user might need to inspect the message **properties** to get a trace-id or check the **event time** to determine if the data is recent. Currently, the only way to access this metadata is to create a separate, redundant `Consumer` on the same topic, which is inefficient and undermines the convenience of using a `TableView`. |
| 14 | + |
| 15 | +This proposal aims to solve this problem by providing a way to create a `TableView` that exposes the entire `Message<T>` object. |
| 16 | + |
| 17 | +# Goals |
| 18 | + |
| 19 | +## In Scope |
| 20 | + |
| 21 | +* Add new methods, `createForMessages()` and `createForMessagesAsync()`, to the `TableViewBuilder<T>` interface. |
| 22 | +* Allow users to create a `TableView<Message<T>>` instance, which provides access to the complete `Message<T>` object for each key, including its payload, properties, and all other metadata. |
| 23 | +* Ensure the change is fully backward-compatible and does not impact the performance of existing `TableView` users. |
| 24 | + |
| 25 | +## Out of Scope |
| 26 | + |
| 27 | +* Modifying the behavior of the existing `create()` and `createAsync()` methods in the builder. |
| 28 | +* Changing the underlying topic compaction logic or any broker-side functionality. |
| 29 | + |
| 30 | +# High Level Design |
| 31 | + |
| 32 | +The proposed solution is a simple and non-breaking addition to the public client API. Instead of modifying the existing `TableView` implementation, we will introduce new methods to the `TableViewBuilder`. |
| 33 | + |
| 34 | +1. New methods, `TableView<Message<T>> createForMessages()` and `CompletableFuture<TableView<Message<T>>> createForMessagesAsync()`, will be added to the `TableViewBuilder<T>` interface. |
| 35 | +2. These methods will create a new, specialized `TableView` implementation (`MessageTableViewImpl`) that stores the entire `Message<T>` object for each key. |
| 36 | +3. The existing `create()` and `createAsync()` methods will continue to create the standard `TableView` that stores only the message value (`T`). |
| 37 | + |
| 38 | +This opt-in design provides the new functionality efficiently without impacting the performance or behavior of existing `TableView` use cases. |
| 39 | + |
| 40 | +# Detailed Design |
| 41 | + |
| 42 | +## Design & Implementation Details |
| 43 | + |
| 44 | +The changes will be confined to the Pulsar client library. |
| 45 | + |
| 46 | +* **Interface `org.apache.pulsar.client.api.TableViewBuilder<T>`**: |
| 47 | + New methods will be added to this interface to create a `TableView` for messages. |
| 48 | + |
| 49 | +* **Class `org.apache.pulsar.client.impl.TableViewBuilderImpl<T>`**: |
| 50 | + The new `createForMessages` methods will be implemented to instantiate a new `MessageTableViewImpl`. |
| 51 | + |
| 52 | +* **New Class `org.apache.pulsar.client.impl.MessageTableViewImpl<T>`**: |
| 53 | +  A new class will be created that implements `TableView<Message<T>>`. It will be based on the existing `TableViewImpl`, but its internal map will store `Message<T>` objects instead of only `T` values, so its `get(key)` method will return the full `Message<T>` object. |
| 54 | + |
| 55 | +* **Class `org.apache.pulsar.client.impl.TableViewImpl<T>`**: |
| 56 | + This class will remain unchanged, ensuring no impact on existing users. |
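
The storage difference described for `MessageTableViewImpl` can be sketched as follows. This is a minimal, self-contained illustration rather than the proposed implementation: `SketchMessage` and `SketchMessageTableView` are hypothetical stand-ins for Pulsar's `Message<T>` and the new class, and the treatment of keyless messages and null payloads is an assumption made for the sketch.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for org.apache.pulsar.client.api.Message<T>; not the real interface.
final class SketchMessage<T> {
    final String key;
    final T value;                          // null models a delete marker in this sketch
    final Map<String, String> properties;
    final long eventTime;

    SketchMessage(String key, T value, Map<String, String> properties, long eventTime) {
        this.key = key;
        this.value = value;
        this.properties = properties;
        this.eventTime = eventTime;
    }
}

// Hypothetical sketch of a message-based view: the map holds the whole
// message for each key instead of only its payload.
final class SketchMessageTableView<T> {
    private final Map<String, SketchMessage<T>> data = new ConcurrentHashMap<>();

    // Called once per consumed message; the latest message for a key wins.
    void handleMessage(SketchMessage<T> msg) {
        if (msg.key == null) {
            return;                         // assumption: keyless messages are skipped
        }
        if (msg.value == null) {
            data.remove(msg.key);           // assumption: a null payload removes the key
        } else {
            data.put(msg.key, msg);
        }
    }

    // Returns the full message stored for the key, or null if the key is absent.
    SketchMessage<T> get(String key) {
        return data.get(key);
    }

    int size() {
        return data.size();
    }
}
```

A value-based view would differ only in the map's value type (`T` instead of the message), which is why the proposal can keep `TableViewImpl` untouched and add the message-based variant alongside it.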
| 57 | + |
| 58 | +## Public-facing Changes |
| 59 | + |
| 60 | +### Public API |
| 61 | + |
| 62 | +New methods will be added to the `org.apache.pulsar.client.api.TableViewBuilder<T>` interface. |
| 63 | + |
| 64 | +* **Method Signatures**: |
| 65 | + ```java |
| 66 | + TableView<Message<T>> createForMessages() throws PulsarClientException; |
| 67 | + |
| 68 | + CompletableFuture<TableView<Message<T>>> createForMessagesAsync(); |
| 69 | + ``` |
| 70 | +* **Description**: Creates a `TableView` instance where the values in the map are the full `Message<T>` objects, including payload and metadata. This allows access to message properties, event time, etc. |
| 71 | +* **Return Value**: A `TableView<Message<T>>` instance. |
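
From the caller's side, the metadata access this enables might look like the sketch below. It is self-contained and does not depend on the Pulsar client: `MessageLike` and `MessageLookup` are hypothetical stand-ins that mirror a few accessors of `Message<T>` and the `get(key)` lookup of a `TableView<Message<T>>`, and the `trace-id` property name and the helper methods are illustrative assumptions.

```java
import java.util.Map;

// Hypothetical stand-in exposing a few accessors named like those on Pulsar's Message<T>.
interface MessageLike<T> {
    T getValue();
    Map<String, String> getProperties();
    long getEventTime();
}

// Hypothetical stand-in for the lookup side of a TableView<Message<T>>.
interface MessageLookup<T> {
    MessageLike<T> get(String key);
}

final class MetadataAccessSketch {
    // Returns the "trace-id" property of the latest message for the key, or null if unavailable.
    static <T> String traceId(MessageLookup<T> view, String key) {
        MessageLike<T> msg = view.get(key);
        return msg == null ? null : msg.getProperties().get("trace-id");
    }

    // True if the key has a message whose event time is within maxAgeMillis of nowMillis.
    static <T> boolean isRecent(MessageLookup<T> view, String key, long nowMillis, long maxAgeMillis) {
        MessageLike<T> msg = view.get(key);
        return msg != null && nowMillis - msg.getEventTime() <= maxAgeMillis;
    }

    // Builds a fixed in-memory message, for demonstration only.
    static <T> MessageLike<T> message(final T value, final Map<String, String> props, final long eventTime) {
        return new MessageLike<T>() {
            public T getValue() { return value; }
            public Map<String, String> getProperties() { return props; }
            public long getEventTime() { return eventTime; }
        };
    }
}
```

With the proposed API, the `view` argument would be the object returned by `createForMessages()`, so both checks could be served from the same in-memory view without opening a second consumer on the topic.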
| 72 | + |
| 73 | +### Binary protocol |
| 74 | + |
| 75 | +No changes. |
| 76 | + |
| 77 | +### Configuration |
| 78 | + |
| 79 | +No changes. |
| 80 | + |
| 81 | +### CLI |
| 82 | + |
| 83 | +No changes. |
| 84 | + |
| 85 | +### Metrics |
| 86 | + |
| 87 | +No changes. |
| 88 | + |
| 89 | +# Monitoring |
| 90 | + |
| 91 | +No new metrics are introduced by this change. Existing client-side metrics are unaffected. |
| 92 | + |
| 93 | +# Security Considerations |
| 94 | + |
| 95 | +This proposal has no security implications. The new methods expose only message metadata that the client is already authorized to receive by consuming the topic. They do not alter any authentication or authorization mechanisms. |
| 96 | + |
| 97 | +# Backward & Forward Compatibility |
| 98 | + |
| 99 | +This change is fully backward-compatible. |
| 100 | + |
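| 101 | +* Adding new methods to the builder interface does not break existing callers. Code that uses `create()` or `createAsync()` will continue to function as before, with no performance or behavioral changes. Only code that itself implements `TableViewBuilder<T>`, if any exists outside the client library, would need to add the new methods. |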
| 102 | + |
| 103 | +## Upgrade |
| 104 | + |
| 105 | +The upgrade process is seamless. Applications can update their client dependency to a version containing this feature and start using the new builder methods without any other changes. |
| 106 | + |
| 107 | +## Downgrade / Rollback |
| 108 | + |
| 109 | +Applications that do not use the new methods can be rolled back to an older client version without any issues. An application that calls the new `createForMessages` methods cannot be rolled back as-is: it will fail to compile against the older client, so those calls must be removed or replaced before downgrading. |
| 110 | + |
| 111 | +## Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations |
| 112 | + |
| 113 | +This is a client-side change and has no impact on geo-replication. |
| 114 | + |
| 115 | +# Alternatives |
| 116 | + |
| 117 | +## Add `getRawMessage(String key)` to `TableView` |
| 118 | + |
| 119 | +An alternative considered was to add a `getRawMessage(String key)` method directly to the `TableView` interface. This would have required modifying the existing `TableViewImpl` to store the entire `Message<T>` object for all users. |
| 120 | + |
| 121 | +This approach was rejected because it would cause a **performance regression for existing users**. Storing the full `Message<T>` for every key would increase memory and CPU consumption for all `TableView` users, even those who never need the raw message. The proposed builder-based approach makes this an opt-in feature and preserves the performance characteristics of the existing `TableView`. |
| 122 | + |
| 123 | +# Links |
| 124 | + |
| 125 | +* Mailing List discussion thread: TBD |
| 126 | +* Mailing List voting thread: TBD |