
add(redis-slowlog): add Redis slow query log collection and display #3165

Open
RankRao wants to merge 9 commits into hhyo:master from RankRao:add-redis-slowlog
RankRao commented May 8, 2026

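Background:
Following the approach of pt-query-digest, this PR collects Redis slow log entries into Archery's MySQL database and displays them on the slowquery page. The display follows the existing MySQL layout.

The current Redis engine has a known bug whose fix has not been merged yet, #2748.
Because this PR uses Redis Cluster functionality, the related code is merged in here first.

New features:
1. New Redis log tables: src\init_sql\redis_slow_query_review.sql
2. New Redis collector script: src\script\collect_redis_slow_log.py
3. New Redis slow query log view: much the same as the MySQL slow query log view, except that the 8-hour timezone issue is removed.

Note: In MySQL, similar statements rarely occur across different instances, so a checksum is essentially unique. Once Redis commands are parameterized, however, many commands look alike and the same checksum often repeats across instances. Unlike the MySQL slow query lookup, the Redis queries therefore add a `where hostname in {hostnamelist}` constraint.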

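Usage:

1. Configure the Redis slow log:
Note: In a Redis Cluster, every node keeps its own slowlog, so the settings below must be run on each node. Adjust the values to the actual workload.

// Keep at most 5000 entries in the slow log
config set slowlog-max-len 5000
// Log Redis commands that take longer than 200 microseconds
config set slowlog-log-slower-than 200
// Show the number of entries in the slow log
slowlog len
// Fetch the most recent 5000 slow log entries
slowlog get 5000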
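
The same settings could also be applied from Python. This is a minimal sketch, assuming a redis-py style client that exposes `config_set` and `slowlog_len`; the helper `configure_slowlog` and the `FakeClient` stand-in are hypothetical and not part of this PR. For a cluster it would be called once per node.

```python
def configure_slowlog(client, max_len=5000, threshold_us=200):
    """Apply the slow log settings to one node and return its slow log length."""
    client.config_set("slowlog-max-len", max_len)
    client.config_set("slowlog-log-slower-than", threshold_us)
    return client.slowlog_len()


class FakeClient:
    """In-memory stand-in so the sketch runs without a Redis server."""

    def __init__(self):
        self.config = {}

    def config_set(self, name, value):
        self.config[name] = value

    def slowlog_len(self):
        return 0


client = FakeClient()
entries = configure_slowlog(client)
```

With a real server, a `redis.Redis` client for each node would take the place of `FakeClient`.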

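2. Add the Redis collection tables to the Archery database:
In Archery's MySQL database, create the three new tables defined in src\init_sql\redis_slow_query_review.sql

3. Collect the Redis logs:
Edit REDIS_LIST in src\script\collect_redis_slow_log.py so that it lists the Redis nodes to collect from.
Deploy the script to a server and run it on a schedule via crontab, every 10 minutes or every hour depending on the workload.
Note: Duplicates are handled. Deduplication relies on the auto-incrementing ID of the slowlog itself, and the last processed ID is recorded in the redis_slowlog_cursor table.

4. View in the UI:
Redis slow query logs can be queried on the 【SQL优化】->【慢查日志】 (SQL Optimization -> Slow Query Log) page.
If the selected Redis instance runs in cluster mode, the slow query logs of all master nodes in the cluster are shown.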
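
The cursor-based deduplication in step 3 can be sketched roughly as below. It assumes each slowlog entry is a dict with an integer `id` field (as redis-py returns from `slowlog_get`); the function name `select_new_entries` is hypothetical and this is not the collector's actual code.

```python
def select_new_entries(entries, last_processed_id):
    """Return entries newer than the cursor (oldest first) and the new cursor value."""
    fresh = sorted(
        (e for e in entries if e["id"] > last_processed_id),
        key=lambda e: e["id"],
    )
    # If nothing is new, the cursor stays where it was.
    new_cursor = fresh[-1]["id"] if fresh else last_processed_id
    return fresh, new_cursor


# SLOWLOG GET returns the newest entries first.
entries = [{"id": 12}, {"id": 11}, {"id": 10}]
fresh, cursor = select_new_entries(entries, last_processed_id=10)
```

Only IDs 11 and 12 are kept here, and the cursor stored in redis_slowlog_cursor would advance to 12.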

Architectures and versions:
Tested:
1. Standalone Redis, version 4.0.9.
2. Redis Cluster, versions 4.0.9 and 6.2.13.
Not tested:
1. Redis Sentinel.

codecov Bot commented May 8, 2026

Codecov Report

❌ Patch coverage is 98.91697% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.31%. Comparing base (fa83fc6) to head (c827f1b).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| common/utils/chart_dao.py | 22.22% | 7 Missing ⚠️ |
| sql/slowlog.py | 97.33% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3165      +/-   ##
==========================================
+ Coverage   82.25%   83.31%   +1.06%     
==========================================
  Files         136      138       +2     
  Lines       21759    22577     +818     
==========================================
+ Hits        17897    18810     +913     
+ Misses       3862     3767      -95     


RankRao commented May 8, 2026

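Shortcomings:
Redis command parameterization.

A Redis slow query command has the form [command, arg1, arg2, ...]. The goal of parameterization is to replace concrete values with wildcards so that operations of the same kind (on different keys) fall under the same fingerprint. Examples:

SET user:1000 "hello" → set user:*

GET order:20220101 → get order:*

DEL key1 key2 key3 → del key*

HMSET user:1000 name "Tom" age 30 → hmset user:*

Currently only the following is done: the first element (normally the Redis command) is lowercased, runs of consecutive digits in the second element (normally the key) are replaced with *, and all remaining arguments are discarded.

This handles most commands, but some special commands may not be covered.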
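
The rule described above can be sketched as follows. This is an illustration written from the description, not the PR's code: the function names `fingerprint` and `checksum` are hypothetical, and the command is assumed to arrive as a single whitespace-separated string. The MD5 step mirrors the `hashlib.md5(fingerprint.encode()).hexdigest()` call quoted in the review further down.

```python
import hashlib
import re


def fingerprint(command):
    """Lowercase the command name, replace digit runs in the key with *, drop the rest."""
    parts = command.split()
    if not parts:
        return ""
    name = parts[0].lower()
    if len(parts) == 1:
        return name
    key = re.sub(r"\d+", "*", parts[1])
    return f"{name} {key}"


def checksum(command):
    """MD5 hex digest of the fingerprint, used as the grouping key."""
    return hashlib.md5(fingerprint(command).encode()).hexdigest()


examples = [
    'SET user:1000 "hello"',
    "GET order:20220101",
    "DEL key1 key2 key3",
    'HMSET user:1000 name "Tom" age 30',
]
fingerprints = [fingerprint(c) for c in examples]
```

Commands whose second element is a subcommand rather than a key (for example `CONFIG SET ...`) are one case this simple rule does not treat specially.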


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c6363fcc49

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread src/script/collect_redis_slow_log.py
Comment thread src/script/collect_redis_slow_log.py Outdated
Comment thread sql/slowlog.py Outdated
Comment thread common/utils/chart_dao.py Outdated
Comment thread src/script/collect_redis_slow_log.py Outdated
Comment thread sql/engines/redis.py Outdated
Comment thread sql/engines/redis.py Outdated
Comment thread sql/slowlog.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9629c8ae83


Comment thread sql/engines/redis.py
Comment thread sql/engines/redis.py
Comment thread sql/engines/redis.py Outdated
Comment thread sql/engines/redis.py
Comment thread sql/engines/redis.py Outdated
Comment thread sql/templates/slowquery.html

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5af19e09e7


Comment thread sql/slowlog.py
Comment thread src/script/collect_redis_slow_log.py
Comment thread sql/slowlog.py
Comment thread sql/engines/redis.py
Comment thread src/script/collect_redis_slow_log.py
Comment thread sql/engines/redis.py
Comment thread src/init_sql/redis_slow_query_review.sql

RankRao commented May 9, 2026

This Codex bot is a bit much. I fixed the four bugs it reported and pushed, and eight more bug reports popped up. There is no end to this.


@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9b90b4ec18


Comment on lines +146 to +148
key = checksum
if key not in groups:
    groups[key] = {"durations": [], "timestamps": [], "sample": sample}

P2: Bucket Redis slowlogs by time window before persisting

This aggregation key uses only checksum, so all newly fetched entries for the same fingerprint are merged into one history row regardless of when they occurred. If a run catches logs spanning multiple days (for example after collector downtime), ts_min becomes the earliest timestamp and downstream trend queries grouped by date(ts_min) attribute all counts/durations to that first day, producing incorrect daily charts.
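
One way to picture the suggested bucketing is to key the groups on both the checksum and the calendar day of each entry. The sketch below is hypothetical: `group_by_checksum_and_day` is not in the PR, the `checksum` field on each entry is assumed to have been computed already, and UTC is used only to keep the example deterministic.

```python
from collections import defaultdict
from datetime import date, datetime, timezone


def group_by_checksum_and_day(entries):
    """Group slowlog entries by (checksum, day) instead of checksum alone."""
    groups = defaultdict(lambda: {"durations": [], "timestamps": []})
    for entry in entries:
        ts = datetime.fromtimestamp(entry["start_time"], tz=timezone.utc)
        key = (entry["checksum"], ts.date())
        groups[key]["durations"].append(entry["duration"])
        groups[key]["timestamps"].append(ts)
    return dict(groups)


entries = [
    {"checksum": "abc", "start_time": 0, "duration": 300},
    {"checksum": "abc", "start_time": 86400, "duration": 500},
    {"checksum": "abc", "start_time": 86401, "duration": 700},
]
groups = group_by_checksum_and_day(entries)
second_day = groups[("abc", date(1970, 1, 2))]
```

Each (checksum, day) pair would then become its own history row, so `ts_min` never falls on an earlier day than the entries it summarizes.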


$.each(row, function (key, value) {
    if (key === 'SQLText') {
        if (is_redis) {
            html.push('<pre>' + value + '</pre>');

P1: Escape Redis command text in history detail rendering

The Redis history detail formatter injects SQLText directly into a <pre> block without escaping. If a slowlog entry contains HTML/script-like content (e.g. crafted key names/arguments), opening the row detail in the “慢日志明细” tab will execute attacker-controlled markup in the browser, creating a stored XSS path for users viewing slowlog data.
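
The reviewed code is JavaScript in the template, where the escaping would need to happen in the browser. The same principle is shown here in Python for the case where markup is assembled server-side. It is a sketch only: `render_command` is a hypothetical helper, and `html.escape` is the standard-library function.

```python
import html


def render_command(sql_text):
    """Wrap command text in <pre> after escaping HTML metacharacters."""
    return "<pre>" + html.escape(sql_text) + "</pre>"


rendered = render_command("<script>alert(1)</script>")
```

The crafted key is then displayed as text rather than interpreted as markup.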


Comment on lines +207 to +218
db.commit()

# 5. Update the cursor
cursor.execute(
    """
    INSERT INTO redis_slowlog_cursor (hostname, last_processed_id)
    VALUES (%s, %s)
    ON DUPLICATE KEY UPDATE last_processed_id = VALUES(last_processed_id)
    """,
    (node_name, max_new_id),
)
db.commit()

P1: Make history writes and cursor advance a single transaction

This code commits redis_slow_query_review_history rows before advancing redis_slowlog_cursor, so a crash/error between these steps leaves partial aggregates persisted with an old cursor. The next run reprocesses the same slowlog IDs and can insert another overlapping history row (different ts_max), which inflates cnt/duration totals in slowlog list and trend queries.
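
The single-transaction idea can be sketched as follows. To keep it runnable anywhere, the example uses `sqlite3` as a stand-in for MySQL. The simplified table layout and the names `persist_batch` and `make_db` are hypothetical, and the real collector would keep its `%s` placeholders and `ON DUPLICATE KEY UPDATE` statement.

```python
import sqlite3


def make_db():
    """Create simplified stand-ins for the history and cursor tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE history (hostname TEXT, checksum TEXT, cnt INTEGER)")
    conn.execute(
        "CREATE TABLE slowlog_cursor ("
        "hostname TEXT PRIMARY KEY, last_processed_id INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def persist_batch(conn, history_rows, node_name, max_new_id):
    """Write history rows and advance the cursor with a single commit."""
    try:
        cur = conn.cursor()
        cur.executemany(
            "INSERT INTO history (hostname, checksum, cnt) VALUES (?, ?, ?)",
            history_rows,
        )
        cur.execute(
            "INSERT OR REPLACE INTO slowlog_cursor (hostname, last_processed_id) "
            "VALUES (?, ?)",
            (node_name, max_new_id),
        )
        conn.commit()  # both writes become visible together
    except Exception:
        conn.rollback()  # neither write survives a failure
        raise


conn = make_db()
persist_batch(conn, [("n1:6379", "abc", 3)], "n1:6379", 42)
```

If the cursor update fails, the history rows from the same batch are rolled back too, so the next run does not count them twice.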


Comment on lines +228 to +236
try:
    collect_slowlog_for_node(
        redis_host=node["host"],
        redis_port=node["port"],
        redis_username=node.get("username"),
        redis_password=node.get("password"),
    )
except Exception as e:
    print(f"Collection from {node['host']}:{node['port']} failed: {e}")

P2: Ensure DB resources are closed on collection exceptions

If collect_slowlog_for_node raises after opening MySQL resources, collect_slowlog() catches and logs the exception but those connections/cursors are never closed because cleanup only happens on the success/early-return paths. Under recurring node errors this leaks MySQL sessions across runs and can eventually block ingestion with connection-limit failures.
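
A common way to guarantee cleanup on every path is `contextlib.closing`, sketched below. `FakeConnection`, `run_with_connection`, `connect`, and `failing_work` are hypothetical stand-ins. In the collector, `connect` would open the real MySQL connection and `work` would be the per-node collection logic.

```python
from contextlib import closing


class FakeConnection:
    """Stand-in for a MySQL connection that records whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_with_connection(connect, work):
    """Open a connection, run work(conn), and close the connection on every path."""
    with closing(connect()) as conn:
        return work(conn)


def failing_work(conn):
    raise RuntimeError("node unreachable")


opened = []


def connect():
    conn = FakeConnection()
    opened.append(conn)
    return conn


try:
    run_with_connection(connect, failing_work)
except RuntimeError as exc:
    print(f"collection failed: {exc}")
```

The exception still reaches the caller's `except` block, but the connection has already been closed by then.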


$.each(row, function (key, value) {
    if (key === 'SQLText') {
        if (is_redis) {
            html.push('<pre>' + value + '</pre>');

P1: Escape Redis SQLText in history detail formatter

The Redis branch of the history detail view injects SQLText directly into a <pre> element without escaping. A crafted command/key containing HTML can execute script when a user expands a row in the “慢日志明细” tab, creating a stored XSS path from slowlog data.


log["command"].decode("utf-8", errors="replace")
)
checksum = hashlib.md5(fingerprint.encode()).hexdigest()
timestamp = datetime.fromtimestamp(log["start_time"])

P2: Convert Redis timestamps using a fixed timezone

The collector uses datetime.fromtimestamp() without a timezone, which converts Redis epoch seconds to the host machine’s local timezone. If the collector host timezone differs from the Archery/MySQL timezone, stored ts_min/ts_max/first_seen/last_seen are shifted, so Redis slowlog rows and charts are shown under incorrect times/dates.
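
A fixed-timezone conversion might look like the sketch below. The UTC+8 offset is an assumption made for illustration (the storage timezone is not stated in this thread), and `STORAGE_TZ` and `to_storage_time` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: timestamps are stored as naive datetimes in UTC+8.
STORAGE_TZ = timezone(timedelta(hours=8))


def to_storage_time(epoch_seconds, tz=STORAGE_TZ):
    """Convert Redis epoch seconds to a naive datetime in a fixed timezone."""
    return datetime.fromtimestamp(epoch_seconds, tz=tz).replace(tzinfo=None)


epoch_start = to_storage_time(0)
```

Because the timezone is passed explicitly, the result no longer depends on the local timezone of the host running the collector.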


Comment thread sql/slowlog.py
Comment on lines +71 to +72
redisslowqueryhistory__hostname__in=hostnames,
redisslowqueryhistory__ts_min__range=(start_time, end_time),

P2: Avoid strict hostname matching between UI and collector

Redis slowlog queries are filtered by exact hostname strings from live cluster discovery, but ingestion stores whatever host:port literal is configured in REDIS_LIST. If one side uses DNS names and the other uses IPs (or announced hostnames differ), all rows are filtered out and Redis slowlog pages/trends appear empty despite existing data.
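
One possible mitigation is to normalize node names to a canonical `host:port` form on both the collector side and the query side. The sketch below is hypothetical throughout: `canonical_node` and the alias table are not in the PR, and how aliases would be maintained (static config, DNS lookup) is left open.

```python
def canonical_node(host, port, aliases=None):
    """Map a host (DNS name or IP) and port to one canonical 'host:port' string."""
    aliases = aliases or {}
    host = host.strip().lower()
    host = aliases.get(host, host)  # fall back to the host itself if no alias is known
    return f"{host}:{port}"


# Hypothetical alias table mapping a DNS name to the IP used in REDIS_LIST.
aliases = {"redis-a.internal": "10.0.0.5"}
from_ui = canonical_node("Redis-A.internal", 6379, aliases)
from_collector = canonical_node("10.0.0.5", 6379, aliases)
```

With both sides normalized the same way, the `hostname__in` filter compares like with like.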

