@namest504 namest504 commented Oct 14, 2025

This PR introduces PIP-445, which proposes adding a getRawMessage() method to the TableView API.

This allows users to access the full message metadata (e.g., properties, event time), not just the payload.


Implementation PR: #24809

Mailing list discussion thread: https://lists.apache.org/thread/12jxo0n3njjpm17w2zdo2qvx1bs3y2dk


  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@github-actions

@namest504 Please add the following content to your PR description and select a checkbox:

- [ ] `doc` <!-- Your PR contains doc changes -->
- [ ] `doc-required` <!-- Your PR changes impact docs and you will update later -->
- [ ] `doc-not-needed` <!-- Your PR changes do not impact docs -->
- [ ] `doc-complete` <!-- Docs have been already added -->

@github-actions github-actions bot added the doc label ("Your PR contains doc changes, no matter whether the changes are in markdown or code files.") and removed the doc-label-missing label Oct 14, 2025
@namest504 namest504 changed the title from "[improve][pip] PIP-444: Expose Raw Message Access in TableView API" to "[improve][pip] PIP-445: Expose Raw Message Access in TableView API" Oct 14, 2025
Member

@lhotari lhotari left a comment

A different approach should be taken. Please check the comment #24809 (review).

The current getRawMessage solution would be a breaking change for existing TableView API users since it would unnecessarily consume more memory and CPU. That's why a different approach must be taken. I've suggested resolving the issue by adding new methods to the API that return a TableView with Message as the type parameter.

TableView<Message<T>> createForMessages() throws PulsarClientException;
CompletableFuture<TableView<Message<T>>> createForMessagesAsync() throws PulsarClientException;

Would that approach solve your use case?

@lhotari
Member

lhotari commented Oct 14, 2025

@namest504 Please subscribe to the dev mailing list using the instructions available at https://pulsar.apache.org/contact/#mailing-lists unless you have already completed the steps. (I released your previous email from the moderation queue; emails from out-of-list addresses usually end up there.)

Another detail is that you are currently anonymous. It's encouraged to work with a real name in Apache projects such as Apache Pulsar since larger contributions are made under contributor agreements and if you'd ever like to become a committer, you'd have to have a real identity. We don't ask contributors to sign a CLA for minor contributions so it's not a blocker in this case.

@namest504 namest504 changed the title from "[improve][pip] PIP-445: Expose Raw Message Access in TableView API" to "[improve][pip] PIP-445: Add Builder Methods to Create Message-based TableView" Oct 14, 2025
@namest504
Author

> A different approach should be taken. Please check the comment #24809 (review) .
>
> The current getRawMessage solution would be a breaking change for existing TableView API users since it would unnecessarily consume more memory and CPU. That's why a different approach must be taken. I've suggested to resolve the issue by adding new methods to the API for returning a TableView with Message as the type parameter.
>
> TableView<Message<T>> createForMessages() throws PulsarClientException;
> CompletableFuture<TableView<Message<T>>> createForMessagesAsync() throws PulsarClientException;
>
> Would that approach solve your use case?

@lhotari
Thank you for your detailed feedback and guidance.

I've revised the PR title.

I've updated the PR to adopt the builder-based approach you suggested. This new commit implements the createForMessages() and createForMessagesAsync() methods to avoid the performance impact on existing TableView users.

Please let me know if this new direction looks good. Thanks again!

@namest504
Author

Also, I've subscribed to the developer mailing list as you recommended.

@lhotari
Member

lhotari commented Oct 15, 2025

When I was working on the experiment, lhotari@c5a4991, one idea that came to mind was to allow creating a view based on a Function that maps the message to some other object. This would allow using a custom object that takes whatever is needed from the message properties, and there wouldn't be a need to keep a reference to the complete message instance.
If someone would like to use a message-based TableView, passing java.util.function.Function.identity() as the function would be a way to use the Message as the value.
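To make the mapper idea concrete, here is a minimal, self-contained sketch. It does not use the Pulsar client: `ToyMessage`, `PriceWithTime`, and `ToyMappedView` are hypothetical stand-ins invented for illustration, and the real implementation may differ. It only shows that a view can store the mapper's output per key, and that `Function.identity()` turns the same view into one that stores whole messages.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a message; NOT org.apache.pulsar.client.api.Message.
record ToyMessage<T>(String key, T value, long eventTime, Map<String, String> properties) {}

// Hypothetical value type that keeps only the fields the application needs.
record PriceWithTime(double price, long eventTime) {}

// Toy view: stores the mapper's output per key instead of the full message.
// Null results are not handled in this sketch.
final class ToyMappedView<T, V> {
    private final Map<String, V> data = new HashMap<>();
    private final Function<ToyMessage<T>, V> mapper;

    ToyMappedView(Function<ToyMessage<T>, V> mapper) {
        this.mapper = mapper;
    }

    // Called once per incoming message; only the mapped value is retained.
    void handle(ToyMessage<T> message) {
        data.put(message.key(), mapper.apply(message));
    }

    V get(String key) {
        return data.get(key);
    }
}

final class MappedViewDemo {
    public static void main(String[] args) {
        ToyMessage<Double> msg = new ToyMessage<>("btc", 1.5, 42L, Map.of("source", "feed-a"));

        // Projection: keep the payload plus one piece of metadata.
        ToyMappedView<Double, PriceWithTime> projected =
                new ToyMappedView<Double, PriceWithTime>(
                        m -> new PriceWithTime(m.value(), m.eventTime()));
        projected.handle(msg);
        System.out.println(projected.get("btc"));

        // Identity: the view's value type becomes the message itself.
        ToyMappedView<Double, ToyMessage<Double>> whole =
                new ToyMappedView<Double, ToyMessage<Double>>(Function.identity());
        whole.handle(msg);
        System.out.println(whole.get("btc").properties());
    }
}
```

In the projected case the view never holds a reference to the message after `handle` returns, which is the memory argument made above; in the identity case it holds the full message by design.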

@lhotari
Member

lhotari commented Oct 15, 2025

Here's the MessageMapperTableViewImpl experiment:

https://github.com/lhotari/pulsar/blob/lh-tableview-refactoring-to-support-message-values/pulsar-client/src/main/java/org/apache/pulsar/client/impl/MessageMapperTableViewImpl.java

For the builder, there would be these methods (instead of createForMessages):

<V> TableView<V> createMapped(Function<Message<T>, V> mapper, boolean releasePooledMessage) throws PulsarClientException;
<V> CompletableFuture<TableView<V>> createMappedAsync(Function<Message<T>, V> mapper, boolean releasePooledMessage) throws PulsarClientException;

With this, you could create a TableView<Message<T>> by calling createMapped/createMappedAsync with the arguments java.util.function.Function.identity() and false.

@namest504 I think that this mapper-function-based approach would provide more flexibility for different use cases where message properties need to be taken into account. Do you see any problems with this approach?

@namest504
Author

namest504 commented Oct 16, 2025

@lhotari

Thank you for the continued guidance. I want to confirm my understanding and share my thoughts before proceeding with the implementation.

My understanding is that the goal here is to provide maximum flexibility—evolving beyond a binary choice of payload vs raw message and instead empowering users to transform a Message into any custom object V that fits their specific needs. Is this correct?

I agree this is a very appropriate and powerful extension for this feature.

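For example:

public class OriginalData {
    String a;
    double b;
}

public class Foo {
    String a;
    double b;
    long eventMetaA;   // from message
    String eventMetaB; // from message

    Foo(String a, double b, long eventMetaA, String eventMetaB) {
        this.a = a;
        this.b = b;
        this.eventMetaA = eventMetaA;
        this.eventMetaB = eventMetaB;
    }
}

// createMapped is the builder method proposed in this thread;
// the second argument is releasePooledMessage.
TableView<Foo> view = builder
    .createMapped(message -> {
        OriginalData original = message.getValue();
        return new Foo(
            original.a,
            original.b,
            message.getEventTime(),            // metadata: event time
            message.getProperty("eventMetaB")  // metadata: property (illustrative key)
        );
    }, false);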

I do have two main concerns regarding potential edge cases, and I would appreciate your guidance on them.

  1. If a mapper function returns null: What would be the most appropriate way for the TableView to handle this?
  2. If the mapper throws an exception: What is the desired behavior in this scenario?

Are these concerns that I should address in the implementation, or are they handled by other parts of the system and not something I need to consider?


If my understanding of the direction is correct, I'd like to update the PIP document and the implementation PR accordingly. Please let me know if this is the right way to proceed.

@lhotari
Member

lhotari commented Oct 17, 2025

> I do have two main concerns regarding potential edge cases, and I would appreciate your guidance on them.
>
>   1. If a mapper function returns null: What would be the most appropriate way for the TableView to handle this?

In that case, the key is removed from the map since null is considered to be a tombstone value.

>   2. If the mapper throws an exception: What is the desired behavior in this scenario?

Good question. I don't think that this PIP impacts this. If the mapper throws an exception, the same message will get retried, so the table view will get stuck. The table view doesn't have any way to handle "poison pill" messages which fail every time.
I think addressing this is out of scope for this PIP.
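The two answers above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `TombstoneViewSketch` is a hypothetical class, not part of the Pulsar client, and it models just the update step. It treats a null mapper result as a tombstone that removes the key, and it lets a mapper exception propagate without touching the stored state. It does not model the retry of the failing message described above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of the per-message update step; hypothetical, not a Pulsar class.
final class TombstoneViewSketch<M, V> {
    private final Map<String, V> data = new HashMap<>();
    private final Function<M, V> mapper;

    TombstoneViewSketch(Function<M, V> mapper) {
        this.mapper = mapper;
    }

    // Applies the mapper to the message.
    // A null result is treated as a tombstone: the key is removed.
    // If the mapper throws, the exception propagates and the map is unchanged.
    void handle(String key, M message) {
        V mapped = mapper.apply(message);
        if (mapped == null) {
            data.remove(key);
        } else {
            data.put(key, mapped);
        }
    }

    V get(String key) {
        return data.get(key);
    }

    int size() {
        return data.size();
    }
}
```

With a mapper such as `s -> s.isEmpty() ? null : s.length()`, handling `"abc"` and then `""` for the same key first stores a value and then removes the key. A mapper that always throws for a given message would fail on every attempt, which is the "poison pill" situation mentioned above.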

> Are these concerns that I should address in the implementation, or are they handled by other parts of the system and not something I need to consider?
>
> If my understanding of the direction is correct, I'd like to update the PIP document and the implementation PR accordingly. Please let me know if this is the right way to proceed.

Makes sense, please go ahead.

@namest504
Author

@lhotari

I've updated the PIP based on your latest feedback. The document now reflects the createMapped function approach, replacing the previous createForMessages methods.

The corresponding implementation in #24809 has been updated as well. Ready for your review when you are. Thanks!

@namest504 namest504 closed this Oct 20, 2025
@namest504 namest504 reopened this Oct 20, 2025
@namest504 namest504 requested a review from lhotari October 20, 2025 07:08
Member

@lhotari lhotari left a comment

LGTM, please go ahead with the voting for PIP-445

@lhotari lhotari dismissed their stale review October 29, 2025 07:05

Review comments were addressed. Approval will be made after a successful vote on the dev mailing list.

@lhotari
Member

lhotari commented Oct 29, 2025

> This PR introduces PIP-445, which proposes adding a getRawMessage() method to the TableView API.
>
> This allows users to access the full message metadata (e.g., properties, event time), not just the payload.

Please update the summary in the description of the PR.
