Support HTTP compression #700
@proddata: Thanks for your feedback. The requirements sound reasonable if we really need to be able to turn the feature off on demand.

Naming things

I know this driver uses a standard protocol, while most other low-level database drivers use different protocols, mostly binary or otherwise proprietary to their needs, where compression might be handled differently or is enabled from the start. In order to adhere to relevant (naming) conventions as well as possible when making the feature configurable like we are proposing, can we look into how others are doing it, most prominently to learn about interfaces and parameter names?

a) How other Python DBAPI libraries handle this situation, i.e. how they make the relevant parameters configurable (names, values, units). Of course, this only matters if compression is also a concern there. I think it might be, but I might also be too naive.

Rationale
In particular, I am not exclusively concerned about the DBAPI driver here, because it rarely has direct exposure other than being used from applications. However, SQLAlchemy is a different animal, because its connection string is exposed to wider audiences of people and machines, using it in downstream applications of many kinds: standalone or cloud-based, library-shaped or ephemerally hosted, or not. You name it. In this spirit, we aim to standardize on naming conventions, so I am asking to do the same here and look at how others are naming their parameters, also for the compression feature.
In the Elasticsearch Python client, HTTP compression is controlled by a simple on/off switch. However, it’s important to distinguish between the two aspects of compression support in the client: sending and receiving compressed data. Only the content itself is compressed, not the overall request structure. For example, a simple query like SELECT 1 results in approximately 250 bytes transmitted in the request body. Enabling compression in such cases provides little benefit in terms of data reduction but, in initial tests, introduced a slight increase in latency (a few milliseconds per request).
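For illustration, a minimal sketch of that on/off switch, using the `http_compress` flag documented for the elasticsearch-py client:

```python
from elasticsearch import Elasticsearch

# http_compress=True gzip-compresses request bodies; depending on the
# client version it also advertises gzip in Accept-Encoding for responses.
client = Elasticsearch(
    "http://localhost:9200",
    http_compress=True,
)
```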
PostgreSQL does not support request compression in the same way. Neither the PostgreSQL wire protocol nor common client implementations (in Python or other languages) provide native compression mechanisms similar to Elasticsearch’s HTTP-based approach.
Thanks, I've added the information about Elasticsearch to the table below. With the PostgreSQL wire protocol, compression can be enabled, if your OpenSSL library supports zlib, by toggling the corresponding connection parameter. I have not been able to spot any threshold parameters other than with Oracle, up until now. Of course, this enumeration is neither exhaustive nor deep, and just tries to tap a little bit into the topic of proper "naming things" and "exploring the landscape".
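As a hedged sketch: in libpq-based clients such as psycopg2, the parameter in question is `sslcompression`, which only takes effect when OpenSSL was built with zlib support (and which recent libpq versions ignore, since TLS-level compression is considered insecure):

```python
import psycopg2

# `sslcompression=1` requests TLS-level compression via libpq; it is
# silently ignored when OpenSSL lacks zlib support or when the libpq
# version has disabled the option.
conn = psycopg2.connect(
    "host=localhost dbname=test user=postgres sslmode=require sslcompression=1"
)
```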
TLS compression has been removed in TLS 1.3 due to CRIME. HTTP compression is somewhat vulnerable to BREACH, so it’s important to differentiate between request and response encoding. Ideally, a client should support both, but they should be managed with separate settings, as request compression generally doesn’t present the same risks. That is also partially why I initially only talked about request encoding.
I see. Thank you very much. So, in order to be able to use other implementations and their parameterizations as blueprints on "naming things", we eventually need to focus on databases that use not just traditional OpenSSL, but specifically HTTP as a communication protocol, because those details (e.g. request vs. response compression parameters) will only be present and of concern in such environments. In this case, selecting Elasticsearch is a perfect choice. However, relevant parameter sets seem pretty thin in this regard.
OpenSearch Python Client

The OpenSearch Python Low-Level Client supports HTTP compression for request bodies:

```python
http_compress = True  # Enables gzip compression for request bodies
```

Contrary to ES, they mention only request bodies (I haven't checked the actual implementation).

ClickHouse Python Client

The ClickHouse Python Client (with limited SQLAlchemy support) also provides compression settings:

ClickHouse Java Client

The ClickHouse Java Client uses the HTTP interface and provides three compression-related settings:

ClickHouse JavaScript Client

The ClickHouse JavaScript Client differentiates between request and response compression:
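For completeness, a minimal usage sketch with the opensearch-py client, based on its documented `http_compress` constructor flag:

```python
from opensearchpy import OpenSearch

# Request bodies are gzip-compressed when http_compress=True.
client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_compress=True,
)
```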
Thank you. What do you think about those parameter names, for both DB APIs?
Since these settings primarily deal with content encoding, we might consider naming them accordingly and aligning with HTTP conventions, such as:
Semantically, this would also remove the need for a dedicated on/off parameter. WDYT?
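To make the idea concrete, a purely hypothetical sketch of what HTTP-convention parameter names could look like (none of these keyword arguments exist in crate-python today):

```python
from crate import client

# Hypothetical parameters mirroring the HTTP Content-Encoding /
# Accept-Encoding header semantics; an unset value would mean
# "disabled", so no dedicated on/off switch is needed.
connection = client.connect(
    "https://localhost:4200",
    content_encoding="gzip",  # compress request bodies
    accept_encoding="gzip",   # advertise compressed responses
)
```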
Hi. I would like to use a naming scheme that is very much independent from the protocol in use / not necessarily tied to it when possible, focusing on the semantic meaning. Maybe let's get rid of
Then maybe:
Or alternatively:
Slightly OT: discovered per a Dependabot update, it looks like Prometheus also introduced zstd compression support recently.
About
CrateDB’s HTTP interface supports gzip and deflate compressed requests, but the crate-python client currently does not utilize this capability. Adding request compression would reduce bandwidth usage, improve performance for large queries and bulk inserts, and align crate-python with best practices seen in other database clients.
As a user, I want the option to send compressed requests to CrateDB to improve performance on congested networks.
Requirements:
Context: Compressing every request regardless of size adds unnecessary overhead, so compression should only be applied when the request body exceeds a configurable threshold (e.g., 1 KB, 2 KB, or 4 KB, similar to other libraries).
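An illustrative sketch of such threshold-based request compression, using only the standard library; `COMPRESSION_THRESHOLD` and `prepare_body` are hypothetical names, not part of the actual crate-python API:

```python
import gzip

COMPRESSION_THRESHOLD = 1024  # bytes; e.g. 1 KB, could be made configurable

def prepare_body(payload: bytes, headers: dict) -> bytes:
    """Gzip the payload and set Content-Encoding only above the threshold."""
    if len(payload) >= COMPRESSION_THRESHOLD:
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(payload)
    return payload
```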
Warning
This is primarily about request encoding / compression. HTTP response encoding is vulnerable to BREACH and therefore requires additional safety measures.
@proddata said:
@surister said:
References