
DNM #1381


Draft: wants to merge 30 commits into base: master


wk989898
Collaborator

What problem does this PR solve?

Issue Number: close #xxx

What is changed and how it works?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.


ti-chi-bot bot commented May 28, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels May 28, 2025

ti-chi-bot bot commented May 28, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign flowbehappy for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label May 28, 2025

@gemini-code-assist gemini-code-assist bot left a comment


Hello @wk989898, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team,

Gemini here, providing a summary for this pull request titled "DNM". Based on the title and the empty description, this appears to be a work-in-progress or draft PR, so the full intent and context might not be completely captured yet. However, by inspecting the code changes, I can see that this PR introduces significant refactoring and new components related to the redo log functionality within the downstream adapter.

The core idea seems to be the introduction of a dedicated RedoDispatcher and integrating the redo log handling as a distinct sink type. This involves changes across the dispatcher, dispatcher manager, event collector, and the redo log writing components themselves, along with updates to the communication protocols.

Highlights

  • New Redo Dispatcher: A new RedoDispatcher type is added, implementing the EventDispatcher interface. This dispatcher is specifically designed to handle events and write them to the redo log sink.
  • Redo Sink Integration: The redo log writing logic is encapsulated in a new redo.Sink type, which now acts as a standard sink implementation. This allows the system to treat the redo log as another destination for events.
  • Dispatcher Manager Enhancements: The EventDispatcherManager is updated to manage both the standard dispatchers and the new RedoDispatcher instances. It includes logic for creating corresponding redo dispatchers, managing their lifecycle, and allocating separate memory quotas for the main sink and the redo sink. It also introduces a mechanism to collect and report redo checkpoint timestamps.
  • Event Routing Logic: The EventCollector is modified to differentiate between events intended for standard dispatchers and those for redo dispatchers. It uses separate internal dynamic streams and adds a Redo flag to messages to ensure events are routed correctly.
  • Protocol Updates: New protobuf messages (RedoTsMessage) and fields (redo in DispatcherRequest) are added to facilitate communication related to redo log progress and dispatcher types between components.
  • Redo Writer Refactoring: The underlying redo log writer implementation, particularly the memory-based one, has been refactored. The encoding worker group is removed, and encoding seems to be handled directly within the file worker group. The file writer is split to handle DDL and DML logs separately, and the FlushLog method is removed from the writer interface, implying flushing is now managed internally by the writer.
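The routing described in the Event Routing Logic bullet can be sketched as a minimal Go model. Only the Redo flag, the separate redoDs stream, and the getDynamicStream helper name come from the summary; TargetMessage, stream, route, the ds field name, and the helper's signature are hypothetical stand-ins, not the real messaging.TargetMessage or dynamic-stream API.

```go
package main

import "fmt"

// TargetMessage is a hypothetical, trimmed-down stand-in for messaging.TargetMessage;
// only the Redo flag described in the summary is modeled.
type TargetMessage struct {
	DispatcherID string
	Redo         bool
	Payload      string
}

// stream is a hypothetical stand-in for a dynamic stream: it just buffers messages.
type stream struct {
	name string
	msgs []TargetMessage
}

func (s *stream) Push(m TargetMessage) { s.msgs = append(s.msgs, m) }

// EventCollector keeps one stream for normal dispatchers and one for redo dispatchers.
type EventCollector struct {
	ds     *stream
	redoDs *stream
}

// getDynamicStream picks the stream by dispatcher kind (signature assumed).
func (c *EventCollector) getDynamicStream(isRedo bool) *stream {
	if isRedo {
		return c.redoDs
	}
	return c.ds
}

// route forwards each incoming message to the stream selected by its Redo flag.
func (c *EventCollector) route(msgs []TargetMessage) {
	for _, m := range msgs {
		c.getDynamicStream(m.Redo).Push(m)
	}
}

func main() {
	c := &EventCollector{ds: &stream{name: "ds"}, redoDs: &stream{name: "redoDs"}}
	c.route([]TargetMessage{
		{DispatcherID: "d1", Redo: false, Payload: "dml"},
		{DispatcherID: "d1-redo", Redo: true, Payload: "dml"},
	})
	fmt.Println(len(c.ds.msgs), len(c.redoDs.msgs)) // prints "1 1"
}
```

Keeping the flag on the message itself, rather than looking the dispatcher up in a shared map, lets the collector pick a stream without knowing which dispatcher types exist.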

Changelog

  • downstreamadapter/dispatcher/dispatcher.go
    • Added GetBlockStatusesChan, HandleDispatcherStatus, GetType methods to EventDispatcher interface.
    • Added redoGlobalTs and cacheEvents fields to Dispatcher struct.
    • Modified NewDispatcher to accept redoGlobalTs.
    • Modified HandleEvents to include a redo check and cache events if necessary.
    • Added HandleCacheEvents method.
    • Closed cacheEvents channel in Remove.
  • downstreamadapter/dispatcher/dispatcher_info.go
    • Implemented GetBlockStatusesChan and GetType for Dispatcher.
    • Added TypeDispatcherCommon constant.
  • downstreamadapter/dispatcher/dispatcher_test.go
    • Added math import.
    • Modified newDispatcherForTest to initialize and pass redoTs.
  • downstreamadapter/dispatcher/helper.go
    • Changed dispatcher type in ResendTask and related functions/handlers to EventDispatcher interface.
    • Updated ResendTask.Execute to use GetBlockStatusesChan().
    • Added TypeDispatcherRedo, IsRedoDispatcher, cacheEvents struct, and newCacheEvents function.
  • downstreamadapter/dispatcher/redo_dispatcher.go
    • Added new file redo_dispatcher.go defining RedoDispatcher implementing EventDispatcher.
    • Implemented methods for handling events, dispatcher status, and managing redo log writing via a redoSink.
  • downstreamadapter/dispatchermanager/event_dispatcher_manager.go
    • Added imports for redo and messaging.
    • Added fields for managing redo dispatchers (redoTableTriggerEventDispatcher, redoDispatcherMap, redoMap) and redo sink/meta (redoSink, redoGlobalTs, redoMeta).
    • Added fields for memory quotas (sinkQuota, redoQuota).
    • Modified NewEventDispatcherManager to initialize redo components and calculate quotas.
    • Modified close to close the redoSink.
    • Modified InitalizeTableTriggerEventDispatcher to register the redo table trigger dispatcher.
    • Modified newDispatchers to create and register RedoDispatcher instances alongside normal ones.
    • Modified removeDispatcher to also remove the corresponding RedoDispatcher.
    • Added collectRedoTs goroutine to report redo checkpoint TS.
    • Added SetGlobalRedoTs method.
  • downstreamadapter/dispatchermanager/heartbeat_collector.go
    • Added import for messaging.
    • Changed generic type for dispatcherStatusDynamicStream to EventDispatcher.
    • Added redoTsMessageDynamicStream and related registration/removal methods.
    • Modified RecvMessages to handle TypeRedoTsMessage.
  • downstreamadapter/dispatchermanager/helper.go
    • Made DispatcherMap generic (DispatcherMap[T dispatcher.EventDispatcher]).
    • Changed generic types in newHeartBeatResponseDynamicStream and newHeartBeatResponseHandler to EventDispatcher.
    • Added RedoTsMessage struct and RedoTsMessageHandler to process redo TS messages.
  • downstreamadapter/eventcollector/dispatcher_stat.go
    • Added import for messaging.
    • Added isRedo field to dispatcherStat.
    • Updated methods to use messaging.EventServiceTopic constant.
  • downstreamadapter/eventcollector/event_collector.go
    • Added Redo field to DispatcherRequest.
    • Added redoDs dynamic stream.
    • Modified New and Close to handle redoDs.
    • Modified AddDispatcher and RemoveDispatcher to use the correct dynamic stream based on dispatcher type.
    • Modified WakeDispatcher to accept isRedo.
    • Modified mustSendDispatcherRequest to set the Redo flag in the protobuf message.
    • Modified runProcessMessage to route events based on the Redo flag in the target message.
    • Added getDynamicStream helper function.
  • downstreamadapter/eventcollector/event_collector_test.go
    • Added methods to mockEventDispatcher to satisfy the updated EventDispatcher interface, including GetType and GetRedo.
  • downstreamadapter/eventcollector/helper.go
    • Modified EventsHandler.Handle to pass stat.isRedo to WakeDispatcher.
  • downstreamadapter/sink/redo/helper.go
    • Added new file helper.go with statefulRts struct and methods (likely refactored from old redo/manager.go).
  • downstreamadapter/sink/redo/meta.go
    • Moved and renamed from redo/meta_manager.go.
    • Renamed metaManager to RedoMeta and NewMetaManager to NewRedoMeta.
    • Updated method signatures and receivers.
  • downstreamadapter/sink/redo/sink.go
    • Added new file sink.go defining Sink implementing sink.Sink.
    • Implemented methods for managing redo log writing via a writer.RedoLogWriter.
  • downstreamadapter/sink/redo/sink_test.go
    • Moved and renamed from redo/manager_test.go.
    • Updated tests to use the new redo.Sink type and its methods.
  • eventpb/event.pb.go
    • Added Redo field to DispatcherRequest protobuf message and updated generated code.
  • eventpb/event.proto
    • Added bool redo = 16; to DispatcherRequest message definition.
  • heartbeatpb/heartbeat.pb.go
    • Added RedoTsMessage protobuf message and updated generated code.
  • heartbeatpb/heartbeat.proto
    • Added RedoTsMessage message definition.
  • maintainer/maintainer.go
    • Added redoTs field.
    • Modified NewMaintainer to initialize redoTs.
    • Added onRedoTsPersisted method to handle RedoTsMessage.
  • maintainer/maintainer_manager.go
    • Modified recvMessages to handle TypeRedoTsMessage.
  • pkg/common/types.go
    • Added RedoSinkType constant.
  • pkg/config/changefeed.go
    • Added Consistent field to ChangefeedConfig.
    • Modified ToChangefeedConfig to copy the Consistent config.
  • pkg/eventservice/dispatcher_stat.go
    • Added redo field to wrapEvent.
    • Updated methods to use messaging.EventServiceTopic constant.
  • pkg/eventservice/event_broker.go
    • Modified various send functions to pass the redo status.
    • Modified message processing and sending logic to handle the redo status and set the Redo flag in TargetMessage.
  • pkg/eventservice/event_service.go
    • Added GetRedo method to DispatcherInfo interface.
  • pkg/eventservice/event_service_test.go
    • Added redo field and GetRedo method to mockDispatcherInfo.
  • pkg/messaging/message.go
    • Added TypeRedoTsMessage constant.
    • Added Redo field to TargetMessage.
    • Updated String, decodeIOType, and NewSingleTargetMessage to handle redo messages and the Redo field.
  • redo/manager.go
    • Deleted file (functionality moved to new redo sink and meta components).
  • redo/writer/blackhole/blackhole_log_writer.go
    • Removed FlushLog method.
    • Modified WriteEvents to call event.PostFlush().
  • redo/writer/file/file_log_writer.go
    • Added imports.
    • Split logWriter into ddlWriter and dmlWriter.
    • Modified NewLogWriter to create separate writers for DDL and DML.
    • Modified WriteEvents to route events to the correct writer, flush, and call event.PostFlush().
    • Removed FlushLog.
    • Modified Close and isStopped to handle both writers.
  • redo/writer/memory/encoding_worker.go
    • Deleted file (encoding logic moved).
  • redo/writer/memory/file_worker.go
    • Added imports.
    • Removed polymorphicRedoEvent and related logic.
    • Modified fileWorkerGroup to directly receive writer.RedoEvent.
    • Updated Run and bgWriteLogs to process RedoEvent directly, including encoding, writing, flushing, and calling event.PostFlush().
    • Modified newFileCache and writeToCache to handle raw data and commit TS.
  • redo/writer/memory/mem_log_writer.go
    • Added import.
    • Replaced encodeWorkers and fileWorkers with ddlFileWorkers and dmlFileWorkers.
    • Modified NewLogWriter to create separate file worker groups for DDL and DML.
    • Modified WriteEvents to send events to the appropriate file worker group.
  • redo/writer/writer.go
    • Added PostFlush() and GetType() methods to RedoEvent interface.
    • Removed FlushLog() from RedoLogWriter interface.
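The writer-side changes in the changelog (GetType and PostFlush on RedoEvent, FlushLog removed from RedoLogWriter, separate DDL and DML writers in the file writer) can be illustrated with a toy Go model. Apart from those interface and method names, everything here is hypothetical: the EventType enum, the Payload accessor, the LogWriterConfig fields, and the in-memory fileWriter are assumptions made to keep the sketch self-contained. It also shows one possible answer to the LogType-mutation concern from the code review: take the config by value and give each writer its own copy.

```go
package main

import "fmt"

// EventType tags a redo event as DDL or DML (hypothetical enum for this sketch).
type EventType int

const (
	TypeDML EventType = iota
	TypeDDL
)

// RedoEvent keeps the two methods the changelog adds (GetType, PostFlush);
// Payload is a hypothetical accessor so the toy writer has something to store.
type RedoEvent interface {
	GetType() EventType
	Payload() string
	PostFlush()
}

// RedoLogWriter has no FlushLog: callers rely on WriteEvents to flush.
type RedoLogWriter interface {
	WriteEvents(events ...RedoEvent) error
}

// LogWriterConfig is a hypothetical, simplified config.
type LogWriterConfig struct {
	LogType string
	Dir     string
}

// fileWriter is an in-memory stand-in for a file-backed writer.
type fileWriter struct {
	cfg     LogWriterConfig
	buf     []string
	flushed []string
}

func (w *fileWriter) write(e RedoEvent) { w.buf = append(w.buf, e.Payload()) }

func (w *fileWriter) flush() {
	w.flushed = append(w.flushed, w.buf...)
	w.buf = w.buf[:0]
}

// logWriter splits events between a DDL writer and a DML writer.
type logWriter struct {
	ddlWriter, dmlWriter *fileWriter
}

var _ RedoLogWriter = (*logWriter)(nil)

// NewLogWriter receives cfg by value and derives one copy per writer,
// so setting LogType never mutates the caller's config.
func NewLogWriter(cfg LogWriterConfig) *logWriter {
	ddlCfg, dmlCfg := cfg, cfg
	ddlCfg.LogType, dmlCfg.LogType = "ddl", "dml"
	return &logWriter{ddlWriter: &fileWriter{cfg: ddlCfg}, dmlWriter: &fileWriter{cfg: dmlCfg}}
}

// WriteEvents routes each event, flushes both writers, and only then
// acknowledges every event via PostFlush.
func (l *logWriter) WriteEvents(events ...RedoEvent) error {
	for _, e := range events {
		if e.GetType() == TypeDDL {
			l.ddlWriter.write(e)
		} else {
			l.dmlWriter.write(e)
		}
	}
	l.ddlWriter.flush()
	l.dmlWriter.flush()
	for _, e := range events {
		e.PostFlush()
	}
	return nil
}

// testEvent is a hypothetical RedoEvent implementation for the demo.
type testEvent struct {
	typ     EventType
	data    string
	flushed bool
}

func (e *testEvent) GetType() EventType { return e.typ }
func (e *testEvent) Payload() string    { return e.data }
func (e *testEvent) PostFlush()         { e.flushed = true }

func main() {
	cfg := LogWriterConfig{Dir: "/tmp/redo"}
	w := NewLogWriter(cfg)
	ddl := &testEvent{typ: TypeDDL, data: "CREATE TABLE t"}
	dml := &testEvent{typ: TypeDML, data: "INSERT INTO t"}
	if err := w.WriteEvents(ddl, dml); err != nil {
		panic(err)
	}
	fmt.Println(len(w.ddlWriter.flushed), len(w.dmlWriter.flushed), ddl.flushed && dml.flushed, cfg.LogType == "")
	// prints "1 1 true true"
}
```

Flushing inside every WriteEvents call, as this sketch does, keeps acknowledgement simple but costs throughput on small, frequent batches, which is the trade-off the review raises for file_log_writer.go.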


A DNM stands, a work in flight,
Refactoring code, both day and night.
Redo logs flow, a separate stream,
A complex system, a software dream.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a significant new feature related to redo logging, including a RedoDispatcher, a redoSink, and updates across various components to support this. The changes are extensive and touch upon core dispatcher and sink logic, event handling, and configuration.

Given the "DNM" (Do Not Merge) title, I assume this is a work-in-progress. The review focuses on correctness, potential issues, and areas for clarification. Several critical and high-severity issues have been identified, particularly concerning test adaptation, configuration management, and potential race conditions or deadlocks that need careful consideration before this can be merged.

Summary of Findings

  • Test Correctness: The test file downstreamadapter/sink/redo/sink_test.go (renamed from redo/manager_test.go) is critically outdated. It attempts to use methods and fields from the old redoManager implementation on the new redo.Sink type, which will cause test failures. This test suite needs a complete overhaul to align with the new Sink API and internal structure.
  • Configuration Management in Redo Sink: The LogType in writer.LogWriterConfig is hardcoded during redoSink.New initialization and subsequently mutated by file.NewLogWriter. This could lead to unexpected behavior if the config object is shared or its LogType is inspected elsewhere. A clearer separation or copying of configs for DDL and DML writers might be safer.
  • Dispatcher Caching Logic: The Dispatcher.HandleEvents caching mechanism based on redoGlobalTs needs careful review to ensure it doesn't lead to indefinite blocking under certain conditions related to the update frequency of redoGlobalTs and the processing speed of HandleCacheEvents.
  • Resource Closure Race Condition: In Dispatcher.Remove(), closing d.cacheEvents could panic if HandleEvents is concurrently blocked sending to this channel. Synchronization or a different signaling mechanism might be needed.
  • Flushing Strategy in File Writer: redo/writer/file/file_log_writer.go now flushes within every WriteEvents call. This change should be evaluated for performance implications, especially with frequent, small event batches.
  • Incomplete Redo Dispatcher Logic: The dealWithBlockEvent method in RedoDispatcher is commented out, and the TypeSyncPointEvent case in its HandleEvents is empty. This suggests some DDL/SyncPoint handling logic might be incomplete or intentionally simplified for the redo path.
  • TODO/FIXME Comments: Several TODO and FIXME comments exist in the code (e.g., regarding redoTableTriggerEventDispatcher bootstrap, metrics for redoDs, checkpoint calculation in RedoDispatcher, and LogType hardcoding in redoSink.New). These should be addressed.

Merge Readiness

This pull request introduces a substantial redo logging feature and is currently marked as "DNM" (Do Not Merge), which is appropriate given its work-in-progress nature. There are several critical and high-severity issues identified, particularly concerning the correctness of tests for the new redo sink, potential race conditions, and configuration management clarity. Additionally, some medium-severity concerns around dispatcher logic, flushing strategies, and incomplete functionality need to be addressed.

I recommend that these issues, especially the critical ones related to testing and potential deadlocks/races, be resolved before this PR is considered for merging. As a reviewer, I am unable to approve pull requests. The author should ensure further review and approval from other maintainers after addressing the feedback.


ti-chi-bot bot commented May 28, 2025

@gemini-code-assist[bot]: adding LGTM is restricted to approvers and reviewers in OWNERS files.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@wk989898
Collaborator Author

/test all

@wk989898
Collaborator Author

/test all

@wk989898
Collaborator Author

/test all


ti-chi-bot bot commented May 30, 2025

@wk989898: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-unit-test | 22b2662 | link | true | /test pull-unit-test |
| pull-cdc-pulsar-integration-light | 22b2662 | link | false | /test pull-cdc-pulsar-integration-light |
| pull-cdc-mysql-integration-heavy | 22b2662 | link | true | /test pull-cdc-mysql-integration-heavy |

Full PR test history. Your PR dashboard.



ti-chi-bot bot commented Jun 3, 2025

[FORMAT CHECKER NOTIFICATION]

Notice: To remove the do-not-merge/needs-linked-issue label, please provide the linked issue number on one line in the PR body, for example: Issue Number: close #123 or Issue Number: ref #456.

📖 For more info, you can check the "Contribute Code" section in the development guide.

Labels

  • do-not-merge/needs-linked-issue
  • do-not-merge/work-in-progress: Indicates that a PR should not merge because it is a work in progress.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.