138 changes: 138 additions & 0 deletions pip/pip-445.md
# PIP-445: Add Builder Methods to Create Message-based TableView

# Background knowledge

* **TableView**: In Pulsar, a `TableView` is a client-side abstraction that provides a key-value map interface over a Pulsar topic. It consumes messages from the topic (typically a compacted one) and maintains an in-memory view of the latest value for each key. This allows applications to easily query the current state of a key without managing a consumer manually.

* **Pulsar `Message<T>`**: A Pulsar message is not just its data payload. The `Message<T>` object is a container that includes the deserialized **payload** (`T`) as well as important **metadata**, such as a message key, user-defined properties (a key-value map), event time, publish time, and more.

# Motivation

The current `TableView` API provides a `get(String key)` method that only returns the deserialized **value** (`T`) of the latest message for a given key. This limits its usefulness for applications that need access to the message's metadata.

For instance, a user might need to inspect the message **properties** to get a trace-id or check the **event time** to determine if the data is recent. Currently, the only way to access this metadata is to create a separate, redundant `Consumer` on the same topic, which is inefficient and undermines the convenience of using a `TableView`.

This proposal aims to solve this problem by providing a way to create a `TableView` that exposes the entire `Message<T>` object.

# Goals

## In Scope

* Add new generic methods, `createMapped()` and `createMappedAsync()`, to the `TableViewBuilder<T>` interface, which accept a mapping function.
* Allow users to create a `TableView<V>` instance by providing a function that transforms a `Message<T>` into a custom object `V`. This includes the ability to create a `TableView<Message<T>>` by passing an identity function.
* Ensure the change is fully backward-compatible and does not impact the performance of existing `TableView` users.

## Out of Scope

* Modifying the behavior of the existing `create()` and `createAsync()` methods in the builder.
* Changing the underlying topic compaction logic or any broker-side functionality.
* Handling exceptions thrown by the user-provided mapper function within the `TableView` (e.g., "poison pill" message handling).

# High Level Design

The proposed solution is a simple and non-breaking addition to the public client API. Instead of adding a specific method for retrieving messages, we will introduce a more flexible, generic mapping mechanism.

1. New generic methods, `<V> TableView<V> createMapped(...)` and `<V> CompletableFuture<TableView<V>> createMappedAsync(...)`, will be added to the `TableViewBuilder<T>` interface.
2. These methods will accept a `java.util.function.Function<Message<T>, V>` as a parameter. This `mapper` function defines how to transform an incoming raw `Message<T>` into a value `V` to be stored in the `TableView`.
3. This approach provides maximum flexibility. Users who need the entire `Message<T>` object can simply pass `Function.identity()` as the mapper. Other users can create custom, memory-efficient objects containing only the necessary data from the message payload and metadata.
4. The existing `create()` and `createAsync()` methods will remain unchanged, preserving behavior for all existing use cases.
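
The two mapper styles from point 3 can be sketched in isolation. The block below is a minimal illustration, not Pulsar code: `Msg<T>` is a hypothetical stand-in for `org.apache.pulsar.client.api.Message<T>`, and `TracedValue` and the `trace-id` property name are assumptions made up for the example.

```java
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;

final class MapperSketch {

    // Hypothetical stand-in for org.apache.pulsar.client.api.Message<T>; not the real interface.
    static final class Msg<T> {
        final String key;
        final T value;
        final Map<String, String> properties;
        final long eventTime;

        Msg(String key, T value, Map<String, String> properties, long eventTime) {
            this.key = key;
            this.value = value;
            this.properties = properties;
            this.eventTime = eventTime;
        }
    }

    // Hypothetical compact value: the payload plus only the metadata the application needs.
    static final class TracedValue {
        final String payload;
        final String traceId;
        final long eventTime;

        TracedValue(String payload, String traceId, long eventTime) {
            this.payload = payload;
            this.traceId = traceId;
            this.eventTime = eventTime;
        }
    }

    // Custom mapper: keeps the payload, one property, and the event time; drops everything else.
    static final Function<Msg<String>, TracedValue> TO_TRACED =
            msg -> new TracedValue(msg.value, msg.properties.get("trace-id"), msg.eventTime);

    public static void main(String[] args) {
        Msg<String> msg = new Msg<>("user-1", "online",
                Collections.singletonMap("trace-id", "abc123"), 1_700_000_000_000L);

        // Memory-efficient view value built from payload and metadata.
        TracedValue compact = TO_TRACED.apply(msg);
        System.out.println(compact.traceId + " @ " + compact.eventTime);

        // Identity mapper: the stored value is the whole message object.
        Function<Msg<String>, Msg<String>> whole = Function.identity();
        System.out.println(whole.apply(msg).properties);
    }
}
```

With the proposed API, a function shaped like `TO_TRACED` would presumably be passed to `createMapped(...)`, and `Function.identity()` would yield a `TableView<Message<T>>`.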

# Detailed Design

## Design & Implementation Details

The changes will be confined to the Pulsar client library.

* **Refactoring of `org.apache.pulsar.client.impl.TableViewImpl<T>`**:
    * The existing `TableViewImpl<T>` class is refactored into a new inheritance structure.
    * The core logic is moved to a new abstract base class, `org.apache.pulsar.client.impl.AbstractTableView<T, V>`.
    * To preserve the original functionality, a new simple class `org.apache.pulsar.client.impl.TableView<T>` is introduced, which extends `AbstractTableView<T, T>`.

* **New `MessageMapperTableView` implementation**:
    * `org.apache.pulsar.client.impl.MessageMapperTableView<T, V>`: A new class that extends `AbstractTableView<T, V>`. It implements the logic for the new `createMapped` methods, using a supplied `Function<Message<T>, V>` to transform messages into the values stored in the `TableView`.

* **Class `org.apache.pulsar.client.impl.TableViewBuilderImpl<T>`**:
    * The builder's implementation will be updated to use the new classes.
    * `create()` and `createAsync()` will now instantiate the new `org.apache.pulsar.client.impl.TableView<T>` to provide the classic `TableView` behavior.
    * The new `createMapped()` and `createMappedAsync()` methods will instantiate `MessageMapperTableView<T, V>` with the user-provided mapper function.
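
The inheritance structure described above could look roughly like the following. This is a simplified sketch under stated assumptions, not the actual implementation: `AbstractView`, `PlainView`, and `MappedView` only mirror the roles of `AbstractTableView<T, V>`, the classic view, and `MessageMapperTableView<T, V>`; `Msg<T>` is a hypothetical stand-in for `Message<T>`; and the hook name `toStoredValue` is invented, since the proposal does not name the extension point.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class TableViewHierarchySketch {

    // Hypothetical stand-in for Pulsar's Message<T>; only the fields this sketch needs.
    static final class Msg<T> {
        final String key;
        final T value;

        Msg(String key, T value) {
            this.key = key;
            this.value = value;
        }
    }

    // Plays the role of the proposed AbstractTableView<T, V>: shared storage and update logic.
    abstract static class AbstractView<T, V> {
        private final Map<String, V> data = new ConcurrentHashMap<>();

        // Assumed hook: subclasses decide what is stored for each incoming message.
        protected abstract V toStoredValue(Msg<T> msg);

        final void handleMessage(Msg<T> msg) {
            V stored = toStoredValue(msg);
            if (stored == null) {
                data.remove(msg.key); // a null result acts as a tombstone
            } else {
                data.put(msg.key, stored);
            }
        }

        final V get(String key) {
            return data.get(key);
        }

        final int size() {
            return data.size();
        }
    }

    // Classic behavior: store only the deserialized payload.
    static final class PlainView<T> extends AbstractView<T, T> {
        @Override
        protected T toStoredValue(Msg<T> msg) {
            return msg.value;
        }
    }

    // Mapped behavior: store whatever the user-supplied function returns.
    static final class MappedView<T, V> extends AbstractView<T, V> {
        private final Function<Msg<T>, V> mapper;

        MappedView(Function<Msg<T>, V> mapper) {
            this.mapper = mapper;
        }

        @Override
        protected V toStoredValue(Msg<T> msg) {
            return mapper.apply(msg);
        }
    }

    public static void main(String[] args) {
        PlainView<String> plain = new PlainView<>();
        plain.handleMessage(new Msg<>("a", "hello"));
        System.out.println(plain.get("a"));

        MappedView<String, Integer> lengths =
                new MappedView<String, Integer>(m -> m.value == null ? null : m.value.length());
        lengths.handleMessage(new Msg<>("a", "hello"));
        System.out.println(lengths.get("a"));
    }
}
```

Keeping the storage and update path in one base class means the classic view pays no extra cost: it stores exactly what it stored before, and only the mapped variant holds a function reference.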

# Public-facing Changes

## Public API

New generic methods will be added to the `org.apache.pulsar.client.api.TableViewBuilder<T>` interface.

* **Method Signatures**:
```java
<V> TableView<V> createMapped(Function<Message<T>, V> mapper) throws PulsarClientException;

<V> CompletableFuture<TableView<V>> createMappedAsync(Function<Message<T>, V> mapper);
```
* **Description**: Creates a `TableView` instance where the values are the result of applying a user-defined `mapper` function to each message. This provides a flexible way to create a key-value view over a topic, allowing users to extract data from the message payload, properties, and other metadata into a custom object `V`. To get a view of the full `Message<T>` objects, `java.util.function.Function.identity()` can be used as the mapper.
* **Parameters**:
    * `mapper`: A function that takes a `Message<T>` and returns a custom object of type `V`.
* **Return Value**: A `TableView<V>` instance.
* **Behavior Notes**:
    * If the `mapper` function returns `null`, it is treated as a tombstone message, and the corresponding key will be removed from the `TableView`.
    * Exceptions thrown by the `mapper` function are not handled by the `TableView` itself. This may cause the consumer to get stuck and attempt to process the "poison pill" message repeatedly. Handling such failures is considered out of scope for this proposal.
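
Because mapper exceptions are left to the caller, an application may choose to guard its own mapper. The sketch below shows one possible pattern and is not part of the proposal: `guarded` and `update` are hypothetical helpers, and plain `String` inputs stand in for messages. Note how the two behavior notes interact: a `null` fallback removes the key, just as any other `null` mapping result would.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class GuardedMapperSketch {

    // Hypothetical helper: wraps a mapper so a runtime failure yields a fallback value
    // instead of propagating into the code that feeds the view.
    static <M, V> Function<M, V> guarded(Function<M, V> mapper, V fallback) {
        return msg -> {
            try {
                return mapper.apply(msg);
            } catch (RuntimeException e) {
                return fallback;
            }
        };
    }

    // Mimics the documented update rule: a null mapping result removes the key.
    static <M, V> void update(Map<String, V> view, String key, M msg, Function<M, V> mapper) {
        V mapped = mapper.apply(msg);
        if (mapped == null) {
            view.remove(key);
        } else {
            view.put(key, mapped);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> view = new HashMap<>();
        Function<String, Integer> parse = Integer::parseInt; // throws NumberFormatException on bad input

        // A non-null fallback keeps the key present with a sentinel value.
        update(view, "a", "42", guarded(parse, -1));
        update(view, "b", "not-a-number", guarded(parse, -1));
        System.out.println(view.get("a") + " " + view.get("b"));

        // A null fallback turns a failing message into a tombstone for its key.
        update(view, "a", "oops", guarded(parse, null));
        System.out.println(view.containsKey("a"));
    }
}
```

Swallowing exceptions this way hides bad input, so a real application would likely log or count the failure inside the `catch` block before returning the fallback.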

## Binary protocol

No changes.

## Configuration

No changes.

## CLI

No changes.

## Metrics

No changes.

# Monitoring

No new metrics are introduced by this change. Existing client-side metrics are unaffected.

# Security Considerations

This proposal has no security implications. The new methods expose only message metadata that the client is already authorized to receive by consuming the topic. They do not alter any authentication or authorization mechanisms.

# Backward & Forward Compatibility

This change is fully backward-compatible.

* The addition of new methods to the builder interface is a non-breaking change. Existing code that uses `create()` or `createAsync()` will continue to function as before with no performance or behavioral changes.

## Upgrade

The upgrade process is seamless. Applications can update their client dependency to a version containing this feature and start using the new builder methods without any other changes.

## Downgrade / Rollback

Downgrade is seamless for applications that do not use the new methods. An application that uses `createMapped()` or `createMappedAsync()` will fail to compile against an older client version, so those calls must be removed before rolling back.

## Pulsar Geo-Replication Upgrade & Downgrade/Rollback Considerations

This is a client-side change and has no impact on geo-replication.

# Alternatives

## Add `getRawMessage(String key)` to `TableView`

An alternative considered was to add a `getRawMessage(String key)` method directly to the `TableView` interface. This would have required modifying the existing `TableViewImpl` to store the entire `Message<T>` object for all users.

This approach was rejected because it would be a **performance regression for existing users**. Storing the full `Message<T>` for every key would increase memory and CPU consumption for all `TableView` users, even those who do not need access to the raw message. The proposed builder-based approach makes this an opt-in feature and preserves the performance characteristics of the existing `TableView`.

## Add specific `createForMessages()` methods

Another alternative was to add specific, non-generic methods like `createForMessages()` that would always return a `TableView<Message<T>>`.

This approach was rejected because it is less flexible than the mapper-based solution. The `createMapped` approach covers the `createForMessages` use case (via `Function.identity()`) while also empowering users to perform custom transformations, making it a more powerful and future-proof API.

# Links

* Mailing List discussion thread: TBD
* Mailing List voting thread: TBD