add(redis-slowlog): add Redis slow query log collection and display #3165
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3165      +/-   ##
==========================================
+ Coverage   82.25%   83.31%   +1.06%
==========================================
  Files         136      138       +2
  Lines       21759    22577     +818
==========================================
+ Hits        17897    18810     +913
+ Misses       3862     3767      -95
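Limitations: a Redis slow query command has the form [command, arg1, arg2, ...]. The goal of parameterization is to replace concrete values with wildcards, so that the same kind of operation on different keys maps to the same fingerprint. Examples:
SET user:1000 "hello" → set user:*
GET order:20220101 → get order:*
DEL key1 key2 key3 → del key*
HMSET user:1000 name "Tom" age 30 → hmset user:*
Currently only the following is handled: the first element (normally the Redis command) is lowercased, the second element (normally the key) has every run of consecutive digits replaced with *, and all remaining arguments are discarded. This covers most commands, but some special commands may not be parameterized correctly.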
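The parameterization rule described above can be sketched in a few lines. This is a minimal illustration, not the code from collect_redis_slow_log.py; the function name `fingerprint` and its list-of-strings input are assumptions made for the example.

```python
import re


def fingerprint(args):
    """Build a fingerprint from a slowlog command given as [command, arg1, ...].

    Hypothetical helper: lowercases the command, replaces each run of digits
    in the first argument (normally the key) with "*", drops everything else.
    """
    if not args:
        return ""
    command = args[0].lower()
    if len(args) < 2:
        # Commands without arguments (e.g. PING) fingerprint to the command itself.
        return command
    key_pattern = re.sub(r"\d+", "*", args[1])
    return f"{command} {key_pattern}"


print(fingerprint(["SET", "user:1000", "hello"]))
print(fingerprint(["DEL", "key1", "key2", "key3"]))
```

With this rule, keys that differ only in non-numeric parts (for example random alphabetic session tokens) would still produce distinct fingerprints, which is one of the cases the limitation note refers to.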
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c6363fcc49
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9629c8ae83
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5af19e09e7
This Codex bot is a bit much: I fixed the four bugs it reported and pushed, and it immediately came back with eight more. There is no end to this.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9b90b4ec18
key = checksum
if key not in groups:
    groups[key] = {"durations": [], "timestamps": [], "sample": sample}
Bucket Redis slowlogs by time window before persisting
This aggregation key uses only checksum, so all newly fetched entries for the same fingerprint are merged into one history row regardless of when they occurred. If a run catches logs spanning multiple days (for example after collector downtime), ts_min becomes the earliest timestamp and downstream trend queries grouped by date(ts_min) attribute all counts/durations to that first day, producing incorrect daily charts.
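One way to address this comment is to include the calendar day in the aggregation key, so entries for one fingerprint that span several days end up in separate history rows. The sketch below is an assumption about how that could look, not the PR's code; the tuple layout of `entries` and the function name `group_slowlogs` are hypothetical.

```python
from datetime import datetime


def group_slowlogs(entries):
    """Group slowlog entries by (checksum, calendar day).

    entries: iterable of (checksum, timestamp, duration_us, sample) tuples,
    a hypothetical shape chosen for this example.
    """
    groups = {}
    for checksum, timestamp, duration, sample in entries:
        # Bucket per day so date(ts_min) always matches every entry in the row.
        key = (checksum, timestamp.date())
        if key not in groups:
            groups[key] = {"durations": [], "timestamps": [], "sample": sample}
        groups[key]["durations"].append(duration)
        groups[key]["timestamps"].append(timestamp)
    return groups


entries = [
    ("abc", datetime(2024, 1, 1, 23, 59), 300, "get user:*"),
    ("abc", datetime(2024, 1, 2, 0, 1), 500, "get user:*"),
]
print(len(group_slowlogs(entries)))
```

A finer window (for example per hour) would work the same way; a daily bucket is used here only because the trend queries mentioned in the comment group by date(ts_min).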
$.each(row, function (key, value) {
    if (key === 'SQLText') {
        if (is_redis) {
            html.push('<pre>' + value + '</pre>');
Escape Redis command text in history detail rendering
The Redis history detail formatter injects SQLText directly into a <pre> block without escaping. If a slowlog entry contains HTML/script-like content (e.g. crafted key names/arguments), opening the row detail in the “慢日志明细” tab will execute attacker-controlled markup in the browser, creating a stored XSS path for users viewing slowlog data.
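The reviewed snippet is client-side JavaScript, where the direct fix would be to escape the value before concatenating it. As a complementary sketch in Python, the command text could also be escaped on the server before it is sent to the page. The function name `render_command_detail` is hypothetical, and whether Archery should escape on the server or in the browser is a design choice this example does not settle.

```python
import html


def render_command_detail(sql_text):
    """Wrap slowlog command text in <pre>, escaping HTML metacharacters first."""
    # html.escape converts &, <, > and (by default) both quote characters.
    return "<pre>" + html.escape(sql_text) + "</pre>"


print(render_command_detail('SET k "<script>alert(1)</script>"'))
```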
db.commit()

# 5. Update the cursor
cursor.execute(
    """
    INSERT INTO redis_slowlog_cursor (hostname, last_processed_id)
    VALUES (%s, %s)
    ON DUPLICATE KEY UPDATE last_processed_id = VALUES(last_processed_id)
    """,
    (node_name, max_new_id),
)
db.commit()
Make history writes and cursor advance a single transaction
This code commits redis_slow_query_review_history rows before advancing redis_slowlog_cursor, so a crash/error between these steps leaves partial aggregates persisted with an old cursor. The next run reprocesses the same slowlog IDs and can insert another overlapping history row (different ts_max), which inflates cnt/duration totals in slowlog list and trend queries.
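The idea in this comment, writing the history rows and the cursor in one transaction, can be illustrated as follows. To keep the example runnable without a MySQL server it uses the standard-library sqlite3 module as a stand-in; the table names `history` and `slowlog_cursor`, the helper names, and the `fail_before_cursor` switch are all assumptions made for the demonstration.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE history (hostname TEXT, checksum TEXT, cnt INTEGER)")
    conn.execute(
        "CREATE TABLE slowlog_cursor (hostname TEXT PRIMARY KEY, last_processed_id INTEGER)"
    )
    conn.commit()
    return conn


def persist_batch(conn, node_name, rows, max_new_id, fail_before_cursor=False):
    """Write aggregates and advance the cursor with a single commit."""
    try:
        conn.executemany(
            "INSERT INTO history (hostname, checksum, cnt) VALUES (?, ?, ?)",
            [(node_name, checksum, cnt) for checksum, cnt in rows],
        )
        if fail_before_cursor:
            # Simulates a crash between the two writes.
            raise RuntimeError("simulated failure")
        conn.execute(
            "INSERT OR REPLACE INTO slowlog_cursor (hostname, last_processed_id) VALUES (?, ?)",
            (node_name, max_new_id),
        )
        conn.commit()  # the only commit: both writes become visible together
    except Exception:
        conn.rollback()  # neither the history rows nor the cursor are kept
        raise
```

With a MySQL driver the same shape applies: drop the intermediate db.commit() after the history inserts and commit once after the cursor update, rolling back on any exception.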
try:
    collect_slowlog_for_node(
        redis_host=node["host"],
        redis_port=node["port"],
        redis_username=node.get("username"),
        redis_password=node.get("password"),
    )
except Exception as e:
    print(f"Failed to collect {node['host']}:{node['port']}: {e}")
Ensure DB resources are closed on collection exceptions
If collect_slowlog_for_node raises after opening MySQL resources, collect_slowlog() catches and logs the exception but those connections/cursors are never closed because cleanup only happens on the success/early-return paths. Under recurring node errors this leaks MySQL sessions across runs and can eventually block ingestion with connection-limit failures.
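A common way to guarantee cleanup on every path is to wrap the connection and cursor in context managers. The sketch below assumes DB-API style objects that expose close(); `collect_with_cleanup`, `connect`, and `process` are hypothetical names, not functions from the PR.

```python
from contextlib import closing


def collect_with_cleanup(connect, process):
    """Run process(db, cursor) and always close both, even if it raises.

    connect: zero-argument callable returning a DB-API style connection.
    process: callable doing the actual collection work.
    """
    with closing(connect()) as db:
        with closing(db.cursor()) as cursor:
            return process(db, cursor)
```

The caller can still catch and log the exception per node, as the collector loop does; the difference is that the connection has already been closed by the time the exception reaches it.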
$.each(row, function (key, value) {
    if (key === 'SQLText') {
        if (is_redis) {
            html.push('<pre>' + value + '</pre>');
Escape Redis SQLText in history detail formatter
The Redis branch of the history detail view injects SQLText directly into a <pre> element without escaping. A crafted command/key containing HTML can execute script when a user expands a row in the “慢日志明细” tab, creating a stored XSS path from slowlog data.
    log["command"].decode("utf-8", errors="replace")
)
checksum = hashlib.md5(fingerprint.encode()).hexdigest()
timestamp = datetime.fromtimestamp(log["start_time"])
Convert Redis timestamps using a fixed timezone
The collector uses datetime.fromtimestamp() without a timezone, which converts Redis epoch seconds to the host machine’s local timezone. If the collector host timezone differs from the Archery/MySQL timezone, stored ts_min/ts_max/first_seen/last_seen are shifted, so Redis slowlog rows and charts are shown under incorrect times/dates.
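To make the conversion independent of the collector host's local settings, the epoch value can be converted with an explicit time zone. The sketch below assumes the Archery/MySQL side uses UTC+8; that offset, the constant `ARCHERY_TZ`, and the function name `slowlog_time` are assumptions for illustration and should be replaced by the deployment's actual time zone.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the database stores naive datetimes in UTC+8.
ARCHERY_TZ = timezone(timedelta(hours=8))


def slowlog_time(start_time, tz=ARCHERY_TZ):
    """Convert Redis epoch seconds to a naive datetime in a fixed time zone."""
    aware = datetime.fromtimestamp(start_time, tz=tz)
    # Drop tzinfo so the value can be stored in a plain DATETIME column.
    return aware.replace(tzinfo=None)


print(slowlog_time(0))
```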
redisslowqueryhistory__hostname__in=hostnames,
redisslowqueryhistory__ts_min__range=(start_time, end_time),
Avoid strict hostname matching between UI and collector
Redis slowlog queries are filtered by exact hostname strings from live cluster discovery, but ingestion stores whatever host:port literal is configured in REDIS_LIST. If one side uses DNS names and the other uses IPs (or announced hostnames differ), all rows are filtered out and Redis slowlog pages/trends appear empty despite existing data.
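One possible mitigation is to normalize node names to a single canonical form on both the collector side and the query side. The sketch below uses an explicit alias table rather than live DNS lookups so that the result is deterministic; the function `canonical_node` and the alias mapping are hypothetical and only illustrate the idea.

```python
def canonical_node(hostname, aliases):
    """Normalize a "host:port" string using an explicit alias table.

    aliases: dict mapping lowercase host names to the canonical host
    (hypothetical configuration, e.g. DNS name -> IP address).
    """
    text = hostname.strip().lower()
    host, sep, port = text.rpartition(":")
    if not sep:
        # No port present: treat the whole string as the host.
        return aliases.get(text, text)
    return f"{aliases.get(host, host)}:{port}"


aliases = {"redis-a.internal": "10.0.0.1"}
print(canonical_node("Redis-A.internal:6379", aliases))
```

Applying the same function when writing REDIS_LIST entries and when building the hostname filter would keep the two sides comparable, at the cost of maintaining the alias table.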
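Background:
Following the approach of pt-query-digest, Redis slow log entries are collected into Archery's MySQL database and then shown on the slowquery page. The display follows the existing MySQL layout.
The existing Redis engine has a known bug whose fix has not been merged yet: #2748
Because this change relies on Redis Cluster functionality, the related code was merged in first.
New features:
1. New Redis slow log tables: src\init_sql\redis_slow_query_review.sql
2. New Redis collection script: src\script\collect_redis_slow_log.py
3. New Redis slow query log display: much the same as the MySQL slow query log, except that the 8-hour time zone adjustment has been removed.
Note: in MySQL, different instances rarely run similar statements, so a checksum is essentially unique. Once Redis commands are parameterized, however, many commands look alike and the same checksum often appears on different instances. Unlike the MySQL slow query lookup, the Redis lookup therefore adds a where hostname in {hostnamelist} constraint.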
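Usage:
1. Configure the Redis slow log:
Note: in a Redis Cluster every node keeps its own slowlog, so the settings below must be applied on each node. Tune the values to the actual workload.
// Keep at most 5000 entries in the slow log
config set slowlog-max-len 5000
// Record Redis commands that take longer than 200 microseconds in the slow log
config set slowlog-log-slower-than 200
// Show the number of slow log entries
slowlog len
// Fetch the most recent 5000 slow log entries
slowlog get 5000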
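The same settings can be applied to every node from a script. The sketch below assumes a redis-py style client, whose `config_set`, `slowlog_len`, and `slowlog_get` methods correspond to the commands above; the function name `configure_slowlog` and the way clients are constructed are assumptions for the example.

```python
def configure_slowlog(clients, max_len=5000, slower_than_us=200):
    """Apply slowlog settings to each node and return the current entry counts.

    clients: iterable of redis-py style clients, one per node, e.g.
    redis.Redis(host="10.0.0.1", port=6379) (hypothetical address).
    """
    counts = []
    for client in clients:
        # Equivalent to: config set slowlog-max-len <max_len>
        client.config_set("slowlog-max-len", max_len)
        # Equivalent to: config set slowlog-log-slower-than <slower_than_us>
        client.config_set("slowlog-log-slower-than", slower_than_us)
        # Equivalent to: slowlog len
        counts.append(client.slowlog_len())
    return counts
```

Settings changed with config set are not written to the configuration file automatically, so they may need to be persisted separately if they should survive a restart.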
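2. Add the Redis collection tables to the Archery database:
In Archery's MySQL database, create the three new tables: src\init_sql\redis_slow_query_review.sql
3. Collect the Redis slow log:
Edit REDIS_LIST in src\script\collect_redis_slow_log.py to list the Redis nodes to collect from.
Deploy the script on a server and run it on a schedule via crontab, every 10 minutes or every hour depending on the workload.
Note: duplicate entries are handled. Deduplication relies on the auto-incrementing ID of slowlog entries, and the last processed ID is recorded in the redis_slowlog_cursor table.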
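The ID-based deduplication can be sketched as a filter over the fetched entries. This is an illustration of the idea rather than the script's code; it assumes entries shaped like the dictionaries returned by redis-py's `slowlog_get` (with an "id" key), and the function name `select_new_entries` is hypothetical.

```python
def select_new_entries(logs, last_processed_id):
    """Return entries newer than the stored cursor, plus the new cursor value.

    logs: list of dicts with at least an "id" key.
    last_processed_id: value previously stored in redis_slowlog_cursor.
    """
    new_logs = sorted(
        (log for log in logs if log["id"] > last_processed_id),
        key=lambda log: log["id"],
    )
    # Keep the old cursor when nothing new was found.
    max_new_id = new_logs[-1]["id"] if new_logs else last_processed_id
    return new_logs, max_new_id


logs = [{"id": 12}, {"id": 11}, {"id": 10}]
print(select_new_entries(logs, 10))
```

Because slowlog IDs start over when a Redis server restarts, a collector built this way may also need to detect an ID sequence that has fallen below the stored cursor; that case is not handled in this sketch.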
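4. View in the UI:
Redis slow query logs can be viewed on the SQL Optimization (SQL优化) -> Slow Query Log (慢查日志) page.
If the selected Redis instance runs in cluster mode, the slow query logs of all master nodes in the cluster are shown.
Architectures and versions:
Tested:
1. Standalone Redis, version 4.0.9.
2. Redis Cluster, versions 4.0.9 and 6.2.13.
Not tested:
1. Redis Sentinel.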