Skip to content

fix(clickhouse): 改用HTTP协议连接以兼容新版ClickHouse服务端#3168

Open
zhuonixian wants to merge 2 commits into
hhyo:masterfrom
zhuonixian:fix/clickhouse-http-protocol
Open

fix(clickhouse): 改用HTTP协议连接以兼容新版ClickHouse服务端#3168
zhuonixian wants to merge 2 commits into
hhyo:masterfrom
zhuonixian:fix/clickhouse-http-protocol

Conversation

@zhuonixian
Copy link
Copy Markdown

Summary

  • ClickHouse引擎从native TCP协议(clickhouse-driver)改为HTTP协议(端口8123)连接,解决与ClickHouse 23.x及以上版本通信时"Code: 100. Unknown packet"错误
  • clickhouse-driver 0.2.10使用native TCP协议,仅支持ClickHouse ~v20.10(revision 54468),该driver已停止更新无法升级解决
  • HTTP接口在不同CK版本间保持兼容,是更稳定的连接方式

主要改动

  • 用urllib.request HTTP POST替代clickhouse_driver.connect
  • 参数占位符从%(xxx)s/:xxx改为CK HTTP API的{xxx:String}格式
  • 结果解析使用FORMAT JSONCompact获取结构化数据
  • HTTP端口通过系统配置ck_http_port设置(默认8123)
  • 多host配置取第一个节点连接
  • get_table_engine的split(".")改为split(".",1)修复含.表名

Test plan

  • 验证ClickHouse 23.x版本连接正常,查询、数据字典、执行等功能正常
  • 验证旧版ClickHouse(20.x) HTTP连接兼容性
  • 验证ck_http_port系统配置项生效
  • 验证多host场景取第一个节点连接

clickhouse-driver 0.2.10使用native TCP协议,仅支持ClickHouse
~v20.10(revision 54468),与ClickHouse 23.x及以上版本通信时
报"Code: 100. Unknown packet"错误。该driver已停止更新无法升
级解决。

将ClickHouse引擎从native TCP协议改为HTTP协议(端口8123)连接,
HTTP接口在不同CK版本间保持兼容。主要改动:
- 用urllib.request HTTP POST替代clickhouse_driver.connect
- 参数占位符从%(xxx)s/:xxx改为CK HTTP API的{xxx:String}格式
- 结果解析使用FORMAT JSONCompact获取结构化数据
- HTTP端口通过系统配置ck_http_port设置(默认8123)
- 多host配置取第一个节点连接
- get_table_engine的split(".")改为split(".",1)修复含.表名
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 19c95a3820

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread sql/engines/clickhouse.py
host = self.host
if "," in host:
host = host.split(",")[0].strip()
return f"http://{host}:{self.http_port}"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use the tunnel-forwarded port in HTTP URLs

When a ClickHouse instance is configured with an SSH tunnel, EngineBase rewrites self.host and self.port to the local forwarded endpoint, but this URL discards that effective port and always connects to the global/default HTTP port. In that configuration requests go to forwarded_host:8123 instead of the tunnel's forwarded port, so all ClickHouse queries fail unless an unrelated service happens to listen there. Build the URL from the effective instance/tunnel port, or otherwise forward the configured HTTP port per instance.

Useful? React with 👍 / 👎.

Comment thread sql/engines/clickhouse.py
cursor.execute(sql, parameters)
if int(limit_num) > 0:
rows = cursor.fetchmany(size=int(limit_num))
raw = self._http_execute(sql, db_name=db_name, params=parameters)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Respect the configured query timeout

In the SQL query view, sql/query.py passes the configured max_execution_time into query(), but this call drops **kwargs and _http_execute() falls back to its hard-coded 30 second urllib timeout. Any ClickHouse query that is valid under the default 60 second Archery limit but takes more than 30 seconds will now fail client-side before the configured limit is reached; pass the timeout through or derive it from the existing max_execution_time argument.

Useful? React with 👍 / 👎.

Comment thread sql/engines/clickhouse.py

if want_json and is_query:
# Don't add FORMAT if SQL already has one
if re.search(r"\bFORMAT\b", sql, re.I):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Only skip JSONCompact for a real FORMAT clause

This regex treats any occurrence of the word FORMAT as an output format clause, including ordinary string literals or identifiers such as SELECT 'FORMAT'. In those cases the HTTP request is sent without FORMAT JSONCompact, ClickHouse returns its default tab-separated text, and query() reports Unexpected response format instead of returning the row; check for a trailing/output FORMAT clause rather than a word anywhere in the SQL.

Useful? React with 👍 / 👎.

Comment thread sql/engines/clickhouse.py
Comment on lines +132 to +134
except Exception as e:
logger.warning(f"ClickHouse获取版本失败: {e}")
self._server_version = (0, 0, 0)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Report version lookup failures instead of downgrading

If the version probe fails because the new HTTP endpoint is misconfigured, temporarily unavailable, or lacks permission, this caches (0, 0, 0) and the later query_check()/explain_check() paths silently skip ClickHouse EXPLAIN validation. In that scenario malformed SELECTs or otherwise unchecked DDL can be marked as audit-passed instead of surfacing the connection/configuration problem; propagate the failure or mark the check result as errored rather than pretending the server is an old version.

Useful? React with 👍 / 👎.

Comment thread sql/engines/clickhouse.py
Comment on lines 481 to +482
for statement in sqlparse.split(sql):
cursor.execute(statement, parameters)
cursor.close()
self._http_execute(statement.strip(), db_name=db_name, params=parameters, want_json=False)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Preserve a ClickHouse session for split execution

When execute() receives multiple statements that rely on session-scoped state, such as CREATE TEMPORARY TABLE ...; INSERT INTO temp ... or SET ...; INSERT ..., this loop sends every split statement as an independent HTTP request without a shared session_id. The previous native cursor executed the split statements on one connection, so those workflows now fail because the second request cannot see the temporary table or setting from the first request.

Useful? React with 👍 / 👎.

- clickhouse.py 文件末尾添加换行符满足black格式要求
- 单元测试mock目标从query改为_http_execute以适配HTTP协议
- mock返回值从ResultSet改为JSONCompact格式dict
- 移除clickhouse_driver.connect的mock(不再使用该driver)
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7ec5b3695

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread sql/engines/clickhouse.py
Comment on lines +12 to +15
import json
import urllib.request
import urllib.parse
import urllib.error
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restore the connect symbol for existing tests

When the pytest suite collects sql/engines/test_clickhouse.py, its @patch("sql.engines.clickhouse.connect") decorators now fail because this commit removes connect from the module namespace. unittest.mock.patch requires the target attribute to exist unless create=True, so these tests raise AttributeError before exercising the new HTTP behavior; either update that test module to patch _http_execute/URL handling too, or keep a compatible symbol until the tests are migrated.

Useful? React with 👍 / 👎.

Comment thread sql/engines/clickhouse.py
cursor.execute(sql, parameters)
if int(limit_num) > 0:
rows = cursor.fetchmany(size=int(limit_num))
raw = self._http_execute(sql, db_name=db_name, params=parameters)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Apply query limits before issuing the HTTP request

When callers use the engine API directly with limit_num > 0 (for example ClickHouseEngine.query(sql="select * from large_table", limit_num=100)), this now sends the unrestricted SQL over HTTP and only slices result_set.rows after urlopen(...).read() has loaded the entire JSON response into memory. The previous cursor path used fetchmany(size=limit_num), so this regression can make limited queries download and materialize millions of rows before returning the first page; push the limit into the SQL or stream/stop reading once the requested rows are collected.

Useful? React with 👍 / 👎.

Comment thread sql/engines/clickhouse.py
if re.search(r"\bFORMAT\b", sql, re.I):
sql_to_send = sql.rstrip(";").rstrip()
else:
sql_to_send = sql.rstrip(";").rstrip() + " FORMAT JSONCompact"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Strip whitespace before removing the semicolon

When a raw query string ends with a semicolon followed by whitespace, e.g. SELECT 1;\n, rstrip(";") does not remove the semicolon because the final character is whitespace; after the next rstrip() this sends SELECT 1; FORMAT JSONCompact. ClickHouse HTTP treats that as an invalid/multi-statement request instead of returning JSON, so direct query() callers that include the common trailing newline now fail; strip whitespace before stripping the semicolon.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator

@LeoQuote LeoQuote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

不行, 我们不能在这里重造 clickhouse 客户端, 如果你想, 请发布一个新的, 以 http 协议为基础的 clickhouse 客户端, 并在此引用, 谢谢

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants