

@lidezhu
Collaborator

@lidezhu lidezhu commented Oct 17, 2025

What problem does this PR solve?

Issue Number: close #2694

What is changed and how it works?

  • Refactored Changefeed Metrics Reporting: The responsibility for collecting and reporting changefeed resolved timestamps has been moved from the eventStore to the eventService's metricsCollector.
  • Simplified Event Store: The eventStore no longer maintains changefeedStat structures or directly collects/reports changefeed-level metrics, simplifying its internal logic.
  • Enhanced Event Service Metrics Collection: The eventService now aggregates minimum resolved timestamps per changefeed and reports them to the log coordinator more frequently (every 1 second).

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

Please refer to [Release Notes Language Style Guide](https://pingcap.github.io/tidb-dev-guide/contribute-to-tidb/release-notes-style-guide.html) to write a quality release note.

If you don't think this PR needs a release note then fill it with `None`.

@ti-chi-bot ti-chi-bot bot added the do-not-merge/needs-linked-issue, do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress), release-note (denotes a PR that will be considered when it comes time to generate release notes), and size/L (denotes a PR that changes 100-499 lines, ignoring generated files) labels Oct 17, 2025
@gemini-code-assist

Summary of Changes

Hello @lidezhu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the log service's ability to track and manage schema store states, particularly concerning resolved timestamps. By introducing per-changefeed tracking within the schema store and enabling it to report its progress to the log coordinator, the system achieves improved accuracy and granularity in monitoring changefeed replication. The coordinator's logic is updated to seamlessly incorporate these new schema store metrics, ensuring a more robust and precise calculation of the overall resolved timestamp for each changefeed.

Highlights

  • Schema Store Tracking per Changefeed: The schema store now maintains a record of registered tables on a per-changefeed basis, enabling more granular management and association of schema information with specific data replication tasks.
  • Schema Store Resolved Timestamp Reporting: Schema stores periodically report their resolved timestamps for each changefeed to the log coordinator, providing crucial progress updates for data consistency.
  • Log Coordinator Integration: The log coordinator has been enhanced to receive and integrate these resolved timestamps from schema stores, combining them with other node states to accurately calculate the overall minimum resolved timestamp for each changefeed.
  • API Updates for Changefeed ID: The RegisterTable and UnregisterTable methods within the SchemaStore interface and its implementations now require a changefeedID argument, formalizing the per-changefeed tracking.
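To make the per-changefeed bookkeeping concrete, here is a minimal sketch of a registry keyed by changefeed ID. The types and the `RegisterTable`/`UnregisterTable` signatures are hypothetical; the summary only states that the real methods gained a `changefeedID` argument.

```go
package main

import "sync"

// Hypothetical stand-ins for the real ID types.
type changefeedID string
type tableID int64

// tableRegistry records which tables each changefeed has registered.
type tableRegistry struct {
	mu     sync.Mutex
	tables map[changefeedID]map[tableID]struct{}
}

func newTableRegistry() *tableRegistry {
	return &tableRegistry{tables: make(map[changefeedID]map[tableID]struct{})}
}

// RegisterTable records that cf depends on table t.
func (r *tableRegistry) RegisterTable(cf changefeedID, t tableID) {
	r.mu.Lock()
	defer r.mu.Unlock()
	set, ok := r.tables[cf]
	if !ok {
		set = make(map[tableID]struct{})
		r.tables[cf] = set
	}
	set[t] = struct{}{}
}

// UnregisterTable removes table t from cf and drops the changefeed entry
// once it has no tables left.
func (r *tableRegistry) UnregisterTable(cf changefeedID, t tableID) {
	r.mu.Lock()
	defer r.mu.Unlock()
	set, ok := r.tables[cf]
	if !ok {
		return
	}
	delete(set, t)
	if len(set) == 0 {
		delete(r.tables, cf)
	}
}

// Changefeeds returns how many changefeeds currently have registered tables.
func (r *tableRegistry) Changefeeds() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.tables)
}
```

Dropping a changefeed's entry when its last table is unregistered keeps the map small, but it also means such a changefeed would stop appearing in any report derived from this map. That is the "no tables in the changefeed" situation the linked issue describes, so a real implementation would need to account for it.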

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a mechanism for schema stores to report their state to the log coordinator, which allows for more accurate changefeed metrics. The changes include updating Protobuf definitions, modifying the log coordinator to handle these new states, and implementing the reporting logic in the schema store. The overall approach is sound. I've identified a minor performance issue related to slice allocation in the coordinator and a more significant correctness bug in how schema store metrics are aggregated when a changefeed spans multiple keyspaces. Addressing these points will improve the robustness and performance of the new functionality.
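The review does not quote the offending code, so the following is only an illustration of one common way such an aggregation bug arises, assuming per-keyspace reports are merged into a single per-changefeed map. `keyspaceReport`, `mergeOverwrite`, and `mergeMin` are hypothetical names. Overwriting lets the last keyspace win; taking the minimum keeps the changefeed's value correct.

```go
package main

// keyspaceReport is a hypothetical per-keyspace map from changefeed ID to
// the resolved ts reported for that keyspace.
type keyspaceReport map[string]uint64

// mergeOverwrite shows the pitfall: whichever keyspace is processed last
// silently replaces the values from earlier keyspaces.
func mergeOverwrite(reports []keyspaceReport) map[string]uint64 {
	out := make(map[string]uint64)
	for _, r := range reports {
		for cf, ts := range r {
			out[cf] = ts
		}
	}
	return out
}

// mergeMin keeps the smallest resolved ts seen for each changefeed, so a
// lagging keyspace holds the changefeed's value back as it should.
func mergeMin(reports []keyspaceReport) map[string]uint64 {
	out := make(map[string]uint64)
	for _, r := range reports {
		for cf, ts := range r {
			if cur, ok := out[cf]; !ok || ts < cur {
				out[cf] = ts
			}
		}
	}
	return out
}
```

With reports `{"cf": 80}` and `{"cf": 100}` in that order, the overwrite version reports 100 even though one keyspace is still at 80.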

@lidezhu lidezhu force-pushed the ldz/fix-resolved-ts-metric1017 branch from 65a0bd5 to f5a7bb7 Compare October 17, 2025 11:46
@ti-chi-bot ti-chi-bot bot added the size/XL label (denotes a PR that changes 500-999 lines, ignoring generated files) and removed the size/L label (denotes a PR that changes 100-499 lines, ignoring generated files) Oct 17, 2025
@lidezhu lidezhu changed the title [WIP] metrics: fix changefeed resolved ts Oct 17, 2025
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress) and do-not-merge/needs-linked-issue labels Oct 17, 2025
@lidezhu
Collaborator Author

lidezhu commented Oct 17, 2025

/test all

@lidezhu
Collaborator Author

lidezhu commented Oct 17, 2025

/gemini summary

@gemini-code-assist

Summary of Changes

This pull request refactors the mechanism for tracking and reporting changefeed resolved timestamps within the system. It shifts the responsibility from the eventStore to the eventService's metricsCollector, centralizing this logic and simplifying the eventStore. The eventService now actively collects and reports aggregated changefeed resolved timestamps to the log coordinator, which has been updated to communicate with the eventService for these metrics. This change aims to improve the accuracy and efficiency of changefeed resolved timestamp reporting.

Highlights

  • Refactored Changefeed Metrics Reporting: The responsibility for collecting and reporting changefeed resolved timestamps has been moved from the eventStore to the eventService's metricsCollector.
  • Simplified Event Store: The eventStore no longer maintains changefeedStat structures or directly collects/reports changefeed-level metrics, simplifying its internal logic.
  • Enhanced Event Service Metrics Collection: The eventService now aggregates minimum resolved timestamps per changefeed and reports them to the log coordinator more frequently (every 1 second).
  • Coordinator Communication Update: The log coordinator now broadcasts requests to the eventServiceTopic to discover and receive changefeed resolved timestamp updates.
Changelog
  • logservice/coordinator/coordinator.go
    • Added eventServiceTopic constant.
    • Increased the pre-allocated capacity for broadcast messages to accommodate messages for the new eventServiceTopic.
  • logservice/eventstore/event_store.go
    • Removed changefeedStat struct and its associated changefeedMeta field, eliminating direct changefeed-level statistics tracking within the event store.
    • Removed the removeDispatcherFromChangefeedStat function and related cleanup logic.
    • Eliminated the collectAndReportChangefeedMetrics function and its dedicated ticker, delegating this responsibility elsewhere.
  • logservice/eventstore/event_store_test.go
    • Removed unit tests specifically designed for changefeedStat management, reflecting the removal of this functionality.
  • pkg/eventservice/dispatcher_stat.go
    • Removed the dispatcherStatMap field from the changefeedStatus struct, streamlining its data structure.
  • pkg/eventservice/event_broker.go
    • Introduced coordinatorInfo (an atomic.Value) to store the ID of the log coordinator.
    • Added setCoordinatorInfo and getCoordinatorInfo methods for managing the coordinator's ID.
  • pkg/eventservice/event_service.go
    • Implemented a handler for TypeLogCoordinatorBroadcastRequest messages, allowing the event service to receive and update the coordinator's ID in its brokers.
  • pkg/eventservice/metrics_collector.go
    • Adjusted the metrics collection interval from 5 seconds to 1 second for more frequent updates.
    • Added changefeedReceivedResolvedTs to metricsSnapshot to store the minimum resolved timestamp for each changefeed.
    • Modified collectDispatcherMetrics to calculate and populate changefeedReceivedResolvedTs.
    • Introduced reportChangefeedStates to send these aggregated changefeed resolved timestamps to the log coordinator.
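The coordinator bookkeeping in the changelog can be sketched with `sync/atomic`'s `atomic.Value`, assuming the coordinator ID is a simple string-like type. `nodeID` and the `send` callback are hypothetical, and placing `reportChangefeedStates` on the broker (rather than in the metrics collector, as the changelog says) is a simplification for this sketch.

```go
package main

import "sync/atomic"

// nodeID is a hypothetical stand-in for the real node ID type.
type nodeID string

// eventBroker keeps only the field relevant here.
type eventBroker struct {
	// coordinatorInfo holds the nodeID of the current log coordinator.
	// atomic.Value lets the message handler update it while the metrics
	// goroutine reads it, without a mutex.
	coordinatorInfo atomic.Value
}

func (b *eventBroker) setCoordinatorInfo(id nodeID) {
	b.coordinatorInfo.Store(id)
}

// getCoordinatorInfo returns "" until the first broadcast request arrives.
func (b *eventBroker) getCoordinatorInfo() nodeID {
	v := b.coordinatorInfo.Load()
	if v == nil {
		return ""
	}
	return v.(nodeID)
}

// reportChangefeedStates sends states to the coordinator if one is known.
// The send callback is hypothetical; it stands in for the message center.
func (b *eventBroker) reportChangefeedStates(
	states map[string]uint64,
	send func(to nodeID, states map[string]uint64) error,
) error {
	target := b.getCoordinatorInfo()
	if target == "" {
		// No coordinator discovered yet; skip this round.
		return nil
	}
	return send(target, states)
}
```

Because the coordinator is discovered through its TypeLogCoordinatorBroadcastRequest messages, reports made before the first broadcast arrives are simply skipped in this sketch rather than sent to an unknown target.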
Activity
  • ti-chi-bot[bot] indicated the PR is not yet approved and requires 3aceshowhand's approval.
  • gemini-code-assist[bot] provided an initial summary and highlights.
  • lidezhu requested /test all and /gemini summary.
  • gemini-code-assist[bot] provided a high-priority review comment suggesting aggregation of resolved timestamps per changefeed to avoid incorrect metrics.
  • gemini-code-assist[bot] provided a medium-priority review comment suggesting correcting the slice capacity allocation in logservice/coordinator/coordinator.go to 3*len(c.nodes.m) to prevent reallocations.
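The capacity suggestion follows from building one broadcast message per (node, topic) pair: with three topics per node, the slice needs 3*len(c.nodes.m) slots. A minimal sketch, assuming a flat message struct; only eventServiceTopic is named in the changelog, so the other topic names in the test are placeholders.

```go
package main

// message is a hypothetical, minimal broadcast message.
type message struct {
	to    string
	topic string
}

// buildBroadcast creates one message per (node, topic) pair. Sizing the
// slice as len(topics)*len(nodes) up front means append never reallocates.
func buildBroadcast(nodes, topics []string) []message {
	msgs := make([]message, 0, len(topics)*len(nodes))
	for _, n := range nodes {
		for _, t := range topics {
			msgs = append(msgs, message{to: n, topic: t})
		}
	}
	return msgs
}
```

If the capacity were computed for fewer topics than are actually sent, `append` would grow the backing array partway through, which is the reallocation the review comment points out.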

@lidezhu
Collaborator Author

lidezhu commented Oct 17, 2025

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request refactors the collection and reporting of changefeed resolved timestamps, moving the responsibility from eventStore to eventService. This is a sensible architectural change that simplifies the eventStore component. The implementation correctly relocates the logic and adjusts the necessary communication paths between the logCoordinator and the eventService.

I have identified a critical issue in logservice/schemastore/persist_storage.go where changing a panic to a silent return during initialization could lead to subsequent panics due to a partially initialized object. Additionally, I've offered a couple of medium-severity suggestions for the new logic in pkg/eventservice/metrics_collector.go to enhance code clarity and performance.

@lidezhu
Collaborator Author

lidezhu commented Oct 18, 2025

/test all

@lidezhu
Collaborator Author

lidezhu commented Oct 20, 2025

/test pull-cdc-mysql-integration-light

@lidezhu
Collaborator Author

lidezhu commented Oct 21, 2025

/retest

@hongyunyan
Collaborator

/test pull-cdc-mysql-integration-light

@lidezhu
Collaborator Author

lidezhu commented Oct 22, 2025

/test all

@hongyunyan
Collaborator

/test all

@lidezhu
Collaborator Author

lidezhu commented Oct 22, 2025

/test all

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm (indicates a PR needs 1 more LGTM) and approved labels Oct 22, 2025
@ti-chi-bot ti-chi-bot bot added the lgtm label Oct 22, 2025
@ti-chi-bot

ti-chi-bot bot commented Oct 22, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: asddongmen, hongyunyan


Needs approval from an approver in each of these files:
  • OWNERS [asddongmen,hongyunyan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm label (indicates a PR needs 1 more LGTM) Oct 22, 2025
@ti-chi-bot

ti-chi-bot bot commented Oct 22, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-10-22 05:06:31.172128037 +0000 UTC m=+848297.249380596: ☑️ agreed by asddongmen.
  • 2025-10-22 06:40:16.298026562 +0000 UTC m=+853922.375279122: ☑️ agreed by hongyunyan.

@ti-chi-bot

ti-chi-bot bot commented Oct 22, 2025

@lidezhu: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| pull-cdc-mysql-integration-light-next-gen | 2bb84ca | link | unknown | /test pull-cdc-mysql-integration-light-next-gen |
| pull-cdc-storage-integration-light-next-gen | 2bb84ca | link | unknown | /test pull-cdc-storage-integration-light-next-gen |
| pull-cdc-kafka-integration-light-next-gen | 2bb84ca | link | unknown | /test pull-cdc-kafka-integration-light-next-gen |
| pull-cdc-mysql-integration-heavy-next-gen | 2bb84ca | link | unknown | /test pull-cdc-mysql-integration-heavy-next-gen |
| pull-cdc-storage-integration-heavy-next-gen | 2bb84ca | link | unknown | /test pull-cdc-storage-integration-heavy-next-gen |
| pull-cdc-pulsar-integration-light-next-gen | 2bb84ca | link | unknown | /test pull-cdc-pulsar-integration-light-next-gen |
| pull-cdc-kafka-integration-heavy-next-gen | 2bb84ca | link | unknown | /test pull-cdc-kafka-integration-heavy-next-gen |



@lidezhu
Collaborator Author

lidezhu commented Oct 22, 2025

/retest

@ti-chi-bot ti-chi-bot bot merged commit 6c770fd into master Oct 23, 2025
16 of 23 checks passed
@ti-chi-bot ti-chi-bot bot deleted the ldz/fix-resolved-ts-metric1017 branch October 23, 2025 02:14
@tenfyzhong tenfyzhong added the needs-cherry-pick-release-8.5 label (should cherry pick this PR to release-8.5 branch) Oct 28, 2025
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #2855.


Labels

approved, lgtm, needs-cherry-pick-release-8.5 (should cherry pick this PR to release-8.5 branch), release-note (denotes a PR that will be considered when it comes time to generate release notes), size/XL (denotes a PR that changes 500-999 lines, ignoring generated files)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

changefeed resolved ts have no data when there are no tables in the changefeed

5 participants