add(tablespace): Top tablespace: add support for MongoDB and ClickHouse databases #3169

Open
RankRao wants to merge 3 commits into hhyo:master from RankRao:add-tablespace

@RankRao RankRao commented May 11, 2026

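1. Add a tablespace query for MongoDB.
It uses: db.follow.aggregate( [ { $collStats: { storageStats: { } } } ] )

The collStats command is deprecated, so the $collStats aggregation stage is used instead.
https://www.mongodb.com/zh-cn/docs/manual/reference/command/collStats/

https://www.mongodb.com/zh-cn/docs/manual/reference/operator/aggregation/collStats/#mongodb-pipeline-pipe.-collStats

System databases ("admin", "config", "local") are excluded. System collections inside regular databases (for example the slow-query collection system.profile) are not excluded.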
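
As a rough illustration of the MongoDB side, the sketch below turns one $collStats result document into a tablespace row. The pipeline matches the command quoted above; the helper name `stats_to_row`, the output keys, and the sample document are hypothetical and are not taken from the PR's code.

```python
# The aggregation stage quoted in the PR description.
COLLSTATS_PIPELINE = [{"$collStats": {"storageStats": {}}}]

# System databases that the PR description says are excluded.
SYSTEM_DATABASES = {"admin", "config", "local"}


def stats_to_row(stats):
    """Flatten one $collStats document into a row (hypothetical layout)."""
    storage = stats.get("storageStats", {})
    # "ns" has the form "<database>.<collection>"; split on the first dot only,
    # so names such as "system.profile" stay intact.
    db_name, _, coll_name = stats.get("ns", "").partition(".")
    return {
        "database": db_name,
        "collection": coll_name,
        "documents": storage.get("count", 0),
        "data_bytes": storage.get("size", 0),
        "storage_bytes": storage.get("storageSize", 0),
        "index_bytes": storage.get("totalIndexSize", 0),
    }


# Sample document standing in for what a driver cursor would yield.
sample = {
    "ns": "app.follow",
    "storageStats": {"count": 10, "size": 2048, "storageSize": 4096, "totalIndexSize": 1024},
}
row = stats_to_row(sample)
```

With pymongo, `db.follow.aggregate(COLLSTATS_PIPELINE)` would be expected to yield documents of this shape, one per collection on an unsharded deployment.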

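2. Add a tablespace query for ClickHouse.

Space usage is read from the system table system.parts.

Some older versions do not have the primary_key_size column (primary key index size), so it is not fetched for now.

System databases ('system', 'INFORMATION_SCHEMA', 'information_schema') are excluded.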

RankRao changed the title from "Add tablespace" to "add(tablespace): Top tablespace: add support for MongoDB and ClickHouse databases." May 11, 2026
RankRao changed the title from "add(tablespace): Top tablespace: add support for MongoDB and ClickHouse databases." to "add(tablespace): Top tablespace: add support for MongoDB and ClickHouse databases" May 11, 2026

codecov Bot commented May 11, 2026

Codecov Report

❌ Patch coverage is 99.22780% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.45%. Comparing base (fa83fc6) to head (391c503).

Files with missing lines | Patch % | Lines
sql/engines/test_mongo.py | 98.74% | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3169      +/-   ##
==========================================
+ Coverage   82.25%   82.45%   +0.19%     
==========================================
  Files         136      136              
  Lines       21759    22018     +259     
==========================================
+ Hits        17897    18154     +257     
- Misses       3862     3864       +2     

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c227e54823

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread sql/templates/dbdiagnostic.html Outdated
$(function () {
// Session management: supported database types
- supportedDbType=['mysql','mongo', 'oracle','redis','pgsql','doris']
+ supportedDbType=['mysql','mongo', 'oracle','redis','pgsql','doris','clickhouse']

P1: Remove unsupported ClickHouse from diagnostic db list

Adding clickhouse to supportedDbType makes ClickHouse instances selectable, but there is still no clickhouse entry in processListTableInfos (from common/static/dbdiagnostic/js/db_info.js). In that case command_types stays NaN, and the instance-change handler later evaluates command_types.length, which throws a JS TypeError and prevents the diagnostic page from functioning for ClickHouse instances.

Comment thread sql/engines/mongo.py
if db_name in self.forbidden_databases:
    continue
db = conn[db_name]
count += len(db.list_collection_names())

P2: Count only collections that can return storage stats

tablespace() intentionally skips collections whose $collStats call fails, but tablespace_count() unconditionally counts every name from list_collection_names(). In MongoDB, this includes objects like views where $collStats with storageStats errors, so total can exceed the number of rows actually retrievable, causing incorrect pagination and empty pages.
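
One possible way to keep the total in line with the rows is to count only entries whose listCollections type is "collection". The sketch below works on info documents of the kind listCollections returns (each carrying "name" and "type"); the helper name `count_stat_capable` and the sample data are hypothetical, and treating every non-"collection" type as uncountable is an assumption.

```python
def count_stat_capable(collection_infos):
    """Count regular collections, skipping views and any other entry types."""
    return sum(1 for info in collection_infos if info.get("type") == "collection")


# Sample listCollections-style entries; the view would be skipped.
infos = [
    {"name": "follow", "type": "collection"},
    {"name": "follow_view", "type": "view"},
    {"name": "system.profile", "type": "collection"},
]
total = count_stat_capable(infos)
```

With pymongo, a similar filter can be pushed to the server via `db.list_collection_names(filter={"type": "collection"})`. The count can still diverge if $collStats fails for reasons other than the object being a view.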

Comment thread sql/engines/mongo.py
Comment on lines +1774 to +1775
for stats in stats_cursor:
    storage = stats.get("storageStats", {})

P2: Aggregate per-shard $collStats before emitting rows

On sharded MongoDB deployments, $collStats can return multiple documents per collection (one per shard). This loop appends every document directly to rows, so a single namespace appears multiple times and its sizes are not combined, which produces incorrect top-space ranking and inconsistent pagination for sharded collections.
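
A minimal sketch of the merge step, assuming each $collStats document carries an "ns" field and numeric values under "storageStats". The helper name `merge_shard_stats`, the chosen field list, and the sample documents are hypothetical.

```python
from collections import defaultdict

# Numeric storageStats fields that are summed across shards in this sketch.
SIZE_FIELDS = ("count", "size", "storageSize", "totalIndexSize")


def merge_shard_stats(stats_docs):
    """Sum the numeric storage fields of all documents sharing a namespace."""
    merged = defaultdict(lambda: dict.fromkeys(SIZE_FIELDS, 0))
    for stats in stats_docs:
        storage = stats.get("storageStats", {})
        totals = merged[stats["ns"]]
        for field in SIZE_FIELDS:
            totals[field] += storage.get(field, 0)
    return dict(merged)


# Two per-shard documents for the same namespace.
docs = [
    {"ns": "app.follow", "shard": "shard01",
     "storageStats": {"count": 3, "size": 300, "storageSize": 400, "totalIndexSize": 50}},
    {"ns": "app.follow", "shard": "shard02",
     "storageStats": {"count": 2, "size": 200, "storageSize": 100, "totalIndexSize": 25}},
]
merged = merge_shard_stats(docs)
```

After merging there is one entry per namespace, so ranking and pagination operate on combined sizes rather than on per-shard fragments.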

Comment thread sql/engines/clickhouse.py
WHERE active = 1
AND database NOT IN ('system', 'INFORMATION_SCHEMA', 'information_schema')
GROUP BY database, table, engine
ORDER BY table_rows DESC

P2: Sort ClickHouse tablespace by size instead of row count

This query powers the "Top tablespace" (Top表空间) view, but it orders by table_rows rather than disk usage, so very large tables with fewer rows can be pushed below smaller-but-denser tables. That makes the ranking inaccurate for space diagnostics and inconsistent with the other engines' size-based ordering.
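
A hedged sketch of a size-ordered query follows. It is not the PR's query: the function name `build_tablespace_sql`, the `total_bytes` alias, and the paging parameters are hypothetical, and it assumes the `rows` and `bytes_on_disk` columns of system.parts are available on the target ClickHouse version.

```python
# System databases excluded in the PR description.
EXCLUDED_DATABASES = ("system", "INFORMATION_SCHEMA", "information_schema")


def build_tablespace_sql(limit=10, offset=0):
    """Build a system.parts query ranked by on-disk size (illustrative only)."""
    excluded = ", ".join(f"'{name}'" for name in EXCLUDED_DATABASES)
    return (
        "SELECT database, table, engine, "
        "sum(rows) AS table_rows, "
        "sum(bytes_on_disk) AS total_bytes "
        "FROM system.parts "
        "WHERE active = 1 "
        f"AND database NOT IN ({excluded}) "
        "GROUP BY database, table, engine "
        # Rank by disk usage rather than by row count.
        "ORDER BY total_bytes DESC "
        f"LIMIT {int(limit)} OFFSET {int(offset)}"
    )


sql = build_tablespace_sql(limit=10)
```

Sorting on the raw byte sum, and formatting sizes for display only afterwards, avoids ordering by human-readable strings.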

Comment thread sql/engines/mongo.py
if db_name in self.forbidden_databases:
    continue
db = conn[db_name]
collection_names = db.list_collection_names()

P2: Handle listCollections permissions when enumerating Mongo data

The code calls list_collection_names() with default options, which requires listCollections privileges; users with only collection-level privileges can trigger OperationFailure here. Because this call is outside a per-database guard, one such database can fail the whole tablespace request instead of returning partial results.
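
The per-database guard could look roughly like the sketch below. The function name `collect_collection_names`, the `errors` parameter, and the `_FakeDb` stand-in are hypothetical; with pymongo one would presumably pass `pymongo.errors.OperationFailure` as `errors`.

```python
def collect_collection_names(conn, db_names, forbidden, errors=(Exception,)):
    """Enumerate collections per database, skipping databases that raise."""
    names, skipped = {}, []
    for db_name in db_names:
        if db_name in forbidden:
            continue
        try:
            names[db_name] = conn[db_name].list_collection_names()
        except errors:
            # Record the failure and continue with the remaining databases.
            skipped.append(db_name)
    return names, skipped


class _FakeDb:
    """Stand-in for a driver database object, used only for this example."""

    def __init__(self, collections, fail=False):
        self._collections, self._fail = collections, fail

    def list_collection_names(self):
        if self._fail:
            raise PermissionError("not authorized to run listCollections")
        return list(self._collections)


fake_conn = {
    "app": _FakeDb(["follow"]),
    "locked": _FakeDb([], fail=True),
    "admin": _FakeDb(["system.users"]),
}
names, skipped = collect_collection_names(
    fake_conn, ["app", "locked", "admin"], forbidden={"admin"}, errors=(PermissionError,)
)
```

Returning the skipped databases alongside the results lets the caller show partial data and report which databases could not be read.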

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 391c503a49

Comment thread sql/engines/clickhouse.py
formatReadableSize(sum(marks_bytes)) AS marks_bytes,
formatReadableSize(sum(data_uncompressed_bytes)) AS data_uncompressed,
formatReadableSize(sum(data_compressed_bytes)) AS data_compressed,
round((sum(data_compressed_bytes) / sum(data_uncompressed_bytes)) * 100, 2) AS compress_ratio

P1: Guard ClickHouse compression ratio divide-by-zero

When sum(data_uncompressed_bytes) is 0 (common for empty/new tables), this expression can yield NaN/Infinity for compress_ratio. The tablespace API serializes results with Python json.dumps, which emits these as bare NaN/Infinity tokens, and jQuery's JSON parser treats that as invalid JSON, causing the Top tablespace request to fail instead of rendering rows. Please return a finite value (e.g., 0 or NULL via if/nullIf) before serializing.
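
The serialization problem and one possible Python-side guard can be sketched as follows. The helper names `safe_ratio` and `finite_or_none` are hypothetical and not part of the PR.

```python
import json
import math


def safe_ratio(compressed, uncompressed):
    """Return the compression ratio in percent, or None when it is undefined."""
    if not uncompressed:
        return None
    return round(compressed / uncompressed * 100, 2)


def finite_or_none(value):
    """Replace NaN/Infinity floats with None so they serialize as JSON null."""
    if isinstance(value, float) and not math.isfinite(value):
        return None
    return value


# json.dumps emits a bare NaN token by default, which strict parsers reject.
bad = json.dumps({"compress_ratio": float("nan")})
# Mapping non-finite values to None produces valid JSON instead.
good = json.dumps({"compress_ratio": finite_or_none(float("nan"))})
```

On the SQL side, dividing by `nullIf(sum(data_uncompressed_bytes), 0)` should give NULL for empty tables, which would remove the need for a Python-side guard.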
