@NehanPathan
Contributor

@paulirwin @NightOwl888

Fixes:
Fixes #1181

Summary of the changes:
Implemented IAsyncReplicator and async methods in ReplicationClient for non-blocking replication operations.


Description

This PR introduces an async Task-based API for the replication client, allowing operations such as checking for updates, obtaining files, and releasing sessions to be executed asynchronously.

Key changes:

  • Added IAsyncReplicator interface.
  • Updated HttpReplicator to implement async versions of the operations (CheckForUpdateAsync, ObtainFileAsync, ReleaseAsync, PublishAsync).
  • Added async helper methods in HttpReplicator to wrap HTTP requests using HttpClient.
  • ReplicationClient now exposes async methods that call into IAsyncReplicator.

This avoids synchronous HTTP calls that could deadlock or cause performance issues, while keeping IReplicationHandler synchronous, as Lucene.NET APIs currently do not have async equivalents.
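For readers skimming the thread, here is a minimal sketch of what an interface like IAsyncReplicator could look like. Only the four method names come from the description above; the parameters, return types, CancellationToken arguments, the stand-in SessionToken and IRevision types, and the InMemoryAsyncReplicator fake are all assumptions made for illustration, not the code in this PR.

```csharp
#nullable enable
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins so the sketch compiles on its own.
public sealed class SessionToken
{
    public SessionToken(string id, string version) { Id = id; Version = version; }
    public string Id { get; }
    public string Version { get; }
}

public interface IRevision
{
    string Version { get; }
}

// Assumed shape: the method names are from the PR description,
// the signatures are guesses.
public interface IAsyncReplicator
{
    Task<SessionToken?> CheckForUpdateAsync(string? currentVersion, CancellationToken cancellationToken = default);
    Task<Stream> ObtainFileAsync(string sessionId, string source, string fileName, CancellationToken cancellationToken = default);
    Task ReleaseAsync(string sessionId, CancellationToken cancellationToken = default);
    Task PublishAsync(IRevision revision, CancellationToken cancellationToken = default);
}

// Hypothetical in-memory fake, useful only for trying the interface out.
public sealed class InMemoryAsyncReplicator : IAsyncReplicator
{
    private readonly string latestVersion;

    public InMemoryAsyncReplicator(string latestVersion) => this.latestVersion = latestVersion;

    public Task<SessionToken?> CheckForUpdateAsync(string? currentVersion, CancellationToken cancellationToken = default)
    {
        // No session is opened when the caller is already up to date.
        SessionToken? token = currentVersion == latestVersion
            ? null
            : new SessionToken(Guid.NewGuid().ToString(), latestVersion);
        return Task.FromResult(token);
    }

    public Task<Stream> ObtainFileAsync(string sessionId, string source, string fileName, CancellationToken cancellationToken = default)
        => Task.FromResult<Stream>(new MemoryStream());

    public Task ReleaseAsync(string sessionId, CancellationToken cancellationToken = default)
        => Task.CompletedTask;

    public Task PublishAsync(IRevision revision, CancellationToken cancellationToken = default)
        => Task.CompletedTask;
}
```

A caller would typically write `var token = await replicator.CheckForUpdateAsync(currentVersion);` and treat a null result as "nothing to update".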

Additional context:
This implementation has been tested and works successfully when using UpdateNowAsync in the GSoC project by referencing the Lucene.NET repository in the GSoC extensions project.


@paulirwin paulirwin self-requested a review September 10, 2025 12:41
Contributor

@paulirwin paulirwin left a comment

Thanks for the PR! I have some changes that need to be made, and a larger one about a refactoring that is up for discussion. Please share your thoughts on the refactoring before embarking on it!

/// <param name="handler"></param>
/// <param name="factory"></param>
/// <exception cref="ArgumentNullException"></exception>
public ReplicationClient(IAsyncReplicator asyncReplicator, IReplicationHandler handler, ISourceDirectoryFactory factory)
Contributor

So, this is a brittle design. It means that you have to know which constructor was used in order to know which methods to call and not get an exception at runtime. I think we should break this type apart.

Let me know your thoughts on this:

  1. Refactor out a common ReplicationClientBase abstract class with any common methods/fields
  2. Make this type inherit from the base class. Leave // LUCENENET: comments where methods were refactored into the base class, to aid future porting efforts. Move the async methods into item 3 below.
  3. Introduce a new AsyncReplicationClient that inherits from ReplicationClientBase, and only has the async methods. This way you can know that it will have an async replicator field.

Thoughts?
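The three-step refactoring proposed above could be sketched roughly as follows. This is a simplified illustration, not the actual Lucene.NET code: the one-method interfaces, the string-based version tracking, the Apply helper, and the FixedVersionReplicator fake are hypothetical, and only the class names ReplicationClientBase, ReplicationClient, and AsyncReplicationClient and the UpdateNowAsync method name come from this thread.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical one-method interfaces standing in for the real contracts.
public interface IReplicator
{
    string? CheckForUpdate(string? currentVersion);
}

public interface IAsyncReplicator
{
    Task<string?> CheckForUpdateAsync(string? currentVersion, CancellationToken cancellationToken = default);
}

// Item 1: state and helpers shared by both clients.
public abstract class ReplicationClientBase
{
    public string? CurrentVersion { get; private set; }

    // Records the new version, if the replicator reported one.
    protected void Apply(string? latestVersion)
    {
        if (latestVersion != null) CurrentVersion = latestVersion;
    }
}

// Item 2: the synchronous client keeps only synchronous members.
public class ReplicationClient : ReplicationClientBase
{
    private readonly IReplicator replicator;

    public ReplicationClient(IReplicator replicator)
        => this.replicator = replicator ?? throw new ArgumentNullException(nameof(replicator));

    public void UpdateNow() => Apply(replicator.CheckForUpdate(CurrentVersion));
}

// Item 3: the async client is guaranteed to hold an async replicator.
public class AsyncReplicationClient : ReplicationClientBase
{
    private readonly IAsyncReplicator replicator;

    public AsyncReplicationClient(IAsyncReplicator replicator)
        => this.replicator = replicator ?? throw new ArgumentNullException(nameof(replicator));

    public async Task UpdateNowAsync(CancellationToken cancellationToken = default)
        => Apply(await replicator.CheckForUpdateAsync(CurrentVersion, cancellationToken).ConfigureAwait(false));
}

// Hypothetical fake that always reports the same version.
public sealed class FixedVersionReplicator : IReplicator, IAsyncReplicator
{
    private readonly string version;

    public FixedVersionReplicator(string version) => this.version = version;

    public string? CheckForUpdate(string? currentVersion)
        => currentVersion == version ? null : version;

    public Task<string?> CheckForUpdateAsync(string? currentVersion, CancellationToken cancellationToken = default)
        => Task.FromResult(CheckForUpdate(currentVersion));
}
```

With this split, each client has exactly one constructor, so there is no runtime question about which set of methods is safe to call.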

Contributor

I suppose it is unlikely a user will require both async and synchronous methods at the same time. And it would clarify the StartAsyncUpdateLoop vs StartUpdateThread() logic. Although, I think both of those could be potentially combined.

I am hesitant to get on board with having completely separate implementations, though. That is not typically how concrete components in the BCL evolve. Usually, the synchronous and asynchronous methods exist side by side.

Contributor Author

I agree with @NightOwl888's suggestion.
I think we should stick with one ReplicationClient that implements both IReplicator and IAsyncReplicator.

Maintaining and testing it may be tough at first, but I guess our long-term target is async only.
So maybe what we can do right now is:
Mark sync APIs with [Obsolete("Prefer async methods to avoid deadlocks.")] to gently guide users toward async.
Long term: async becomes the default; sync kept only for backward compatibility.

Contributor Author

That said, I’m also flexible — if you both feel strongly about splitting into separate sync/async clients right now to minimize maintenance and testing overhead, I can adapt to that approach too.

Contributor

> Mark sync APIs with [Obsolete("Prefer async methods to avoid deadlocks.")] to gently guide users toward async.
> Long term: async becomes the default; sync kept only for backward compatibility.

No, synchronous programming isn't going away any time soon. It is only in the space of HTTP servers that it is no longer considered good practice. Also, keep in mind that when debugging we will be comparing our implementation against the synchronous Java code. So, we should try to minimize changes to these methods.


ReplicationClient is not sealed. It is designed to be inherited to provide additional functionality. So, in that regard, it is the abstraction and can provide the base implementation of both the synchronous and asynchronous sides. The question is: since ReplicationClient can already be mocked by inheritance, do we need to add interfaces?

Contributor Author

Thanks for clarifying! You’re right — marking sync methods [Obsolete] was premature. Sync APIs are still important for backward compatibility and debugging against Java. Since ReplicationClient can already be extended and mocked via inheritance, we don’t need separate interfaces for now.

Contributor

There is another problem with the current approach of having two constructors: the constructors are ambiguous if you try to pass in a HttpReplicator without casting it to either interface first:

[Screenshot: screenshot-2025-11-21-at-20-19-15]

Our options include:

  1. Accept that you have to cast this before passing it in. I think this is a poor design choice for other reasons mentioned above, but also having to impose this on users is not ideal. Note that it's also technically a breaking change as-is, since we'd require people to cast HttpReplicator.
  2. Remove the IAsyncReplicator interface and move its async methods into IReplicator (so that it would have both async and sync methods), making any custom implementations either implement the async methods or throw NotImplementedException. This would be a breaking change because you'd make custom implementations add these async methods.
  3. A slight alternative to option 2 but with the same effect: make IAsyncReplicator inherit from IReplicator, and have ReplicationClient take only an IAsyncReplicator (which would now provide both methods). This is a breaking change since users of ReplicationClient with custom IReplicators would have to make their custom replicator implement IAsyncReplicator as well.
  4. Remove synchronous replication support and lean in to async only. This would of course be a breaking change, but it would simplify the API.
  5. Split out an AsyncReplicationClient as discussed above. AsyncReplicationClient would take an IAsyncReplicator; ReplicationClient would take an IReplicator. This is the only non-breaking way to do this.

@NightOwl888 thoughts? I could go for any of these except for option 1. I think if we're going to have any breaking changes we might as well do option 4 and just remove synchronous replication support to keep it simple, but I'm inclined to go for option 5 and just not have the breaking change.
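The ambiguity described above can be reproduced with a few stand-in types. This is a minimal sketch: the empty interfaces and the DualReplicator and Client classes are hypothetical simplifications of HttpReplicator and ReplicationClient, kept only to show the overload-resolution problem and the cast from option 1.

```csharp
using System;

// Hypothetical empty interfaces; the real ones declare replication methods.
public interface IReplicator { }
public interface IAsyncReplicator { }

// Like HttpReplicator in this PR, one type implements both interfaces.
public sealed class DualReplicator : IReplicator, IAsyncReplicator { }

// Stand-in for a client with one constructor per interface.
public class Client
{
    public string Mode { get; }

    public Client(IReplicator replicator) { Mode = "sync"; }

    public Client(IAsyncReplicator replicator) { Mode = "async"; }
}

public static class AmbiguityDemo
{
    public static string Run()
    {
        var replicator = new DualReplicator();

        // The next line does not compile: both constructors are equally good
        // matches, so the compiler reports error CS0121 (ambiguous call).
        // var client = new Client(replicator);

        // Option 1 from the list above: the caller must cast to pick one.
        var syncClient = new Client((IReplicator)replicator);
        var asyncClient = new Client((IAsyncReplicator)replicator);

        return syncClient.Mode + "," + asyncClient.Mode;
    }
}
```

Code that compiled against the single-constructor client with a plain `new Client(replicator)` call would stop compiling once the second constructor is added, which is why the cast requirement counts as a breaking change.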

@NehanPathan
Contributor Author

@paulirwin @NightOwl888
I’ve tried to cover most of the suggested changes, and the nullable warnings have also been resolved.
Please let me know if I made any mistakes — I apologize in advance, as there were quite a few nullable warnings to address.

@NehanPathan NehanPathan force-pushed the feature/async-replicator-client branch from 25f65ff to 8135524 Compare October 6, 2025 15:19
@NehanPathan
Contributor Author

NehanPathan commented Oct 6, 2025

@NightOwl888
2 weeks ago, I accidentally merged the master branch into feature/async-replicator-client.
I’ve performed a force push on this branch to remove the merge commit from apache:master.

Reason: The merge brought in changes from master that we do not want in this feature branch. All intended feature commits are still intact.

This keeps the branch history clean and focused on the feature work.

If any changes from master are important to include, please let me know, and we can merge them properly.

@NehanPathan
Contributor Author

I’m seeing the check-editorconfig CI fail due to a “final newline expected” error. Locally, git diff --check doesn’t show any issues, so I suspect it might be related to line endings (CRLF vs LF) or some subtle trailing whitespace.

Could you please guide me on the safest way to fix this, such as how to identify files with extra trailing whitespace or missing newlines, so the CI passes without affecting other files?

@NightOwl888
Contributor

NightOwl888 commented Oct 6, 2025

@NehanPathan

That error means that after all of the content in the file, there is no newline character. So, it just needs to be added.

I know it is a bit confusing: remove all trailing whitespace, but at the end of the file, add a line break.
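One way to locate the affected files is a small throwaway helper that checks the last byte of each file. The sketch below is only an illustration, not part of the repository tooling: the FinalNewlineCheck class, its method names, and the default "*.cs" pattern are hypothetical, and it assumes the files use a single-byte-compatible encoding such as UTF-8.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FinalNewlineCheck
{
    // Returns true when a non-empty file does not end with '\n'.
    // A CRLF ending also passes, because its last byte is '\n'.
    public static bool IsMissingFinalNewline(string path)
    {
        using var stream = File.OpenRead(path);
        if (stream.Length == 0) return false; // assumption: empty files are fine
        stream.Seek(-1, SeekOrigin.End);
        return stream.ReadByte() != '\n';
    }

    // Lists every matching file under root that lacks a final newline.
    public static IEnumerable<string> FindOffenders(string root, string pattern = "*.cs")
    {
        foreach (var file in Directory.EnumerateFiles(root, pattern, SearchOption.AllDirectories))
        {
            if (IsMissingFinalNewline(file)) yield return file;
        }
    }
}
```

Calling `FinalNewlineCheck.FindOffenders(".")` from the repository root and printing each result gives the list of files to fix, without touching any other file.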

@paulirwin paulirwin force-pushed the feature/async-replicator-client branch from 2773b42 to c660b52 Compare November 21, 2025 23:09
@paulirwin
Contributor

I have rebased this PR on latest master. Had to do #1221 to fix the build error due to an editorconfig-checker update. I'll look at resolving the remaining comments.

Labels

notes:improvement An enhancement to an existing feature

Development

Successfully merging this pull request may close these issues.

Async version of IReplicator (IAsyncReplicator) and ReplicationClient

3 participants