-
-
Notifications
You must be signed in to change notification settings - Fork 969
Feature: optional pycurl and urllib3_client v2 #2312
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
related celery/celery#9749 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR reintroduces optional pycurl
support for HTTP requests, falling back to a multi-threaded urllib3
client when pycurl
is not installed, and updates CI and requirements to install and type-stub pycurl
.
- Bring back a
CurlClient
usingpycurl
with event-loop integration. - Refactor
Urllib3Client
to use a thread pool and rewrite connection-pool handling. - Update requirements, CI workflows, and documentation to support
libcurl
andpycurl
.
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
t/unit/asynchronous/http/test_curl.py | Add unit tests for CurlClient behavior |
requirements/pkgutils.txt | Add types-pycurl stub requirement |
kombu/asynchronous/http/urllib3_client.py | Refactor Urllib3Client for concurrency and pool management |
kombu/asynchronous/http/curl.py | Implement CurlClient using pycurl |
kombu/asynchronous/http/init.py | Update Client factory to prefer CurlClient |
docs/reference/index.rst | Include kombu.asynchronous.http.curl in documentation |
.github/workflows/python-package.yml, linter.yml | Install libcurl4-openssl-dev in CI |
.coveragerc | Exclude HTTP client files from coverage (with a path typo) |
Comments suppressed due to low confidence (1)
kombu/asynchronous/http/urllib3_client.py:39
- The new
Urllib3Client
thread-pool logic (queue management,_execute_request
, proxy settings, and error handling) is untested. Consider adding unit tests covering_process_queue
,_execute_request
success/failure flows, and proxy configuration.
self._executor = ThreadPoolExecutor(max_workers=max_clients)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am broadly in favor of this PR, specially the new executor for urllib3 client
Please rebase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies on my low latency; I'm very busy these days IYKYK 🚀
Please give me time to review this well before merge @auvipy , I'm aware of our urgency, I'll try to find some time soon 🙏
no worries, I'm not going to merge this before your reviews. the ones I merged are kind of not so complex. that's why. I will wait 2 more weeks for you. best and stay safe bro. |
a5ae6a7
to
d478fec
Compare
rebased, but.
PR on current |
thanks. I have fixed the merge conflicts as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebased, but.
now that i see that "revert PR" was merged: @auvipy @Nusnus what are we going to do with "insecure SQS connection"? shall i fix it (again) here or somewhere else? as i fixed it in that reverted PRPR on current
main
#2323
Are we ready for review? 🙏
i think yes. |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
6adac32
to
2982d01
Compare
Thank you for the update! Thank you for the patience - I appreciate it a lot! |
may be we could push this to 5.7 to better better testing and review time, and focus on 5.6 as much as early possible? what do you think |
I'm leaning to agree, assuming we can push 5.6 quite soon (beta+rc first of course). This PR looks good so far; having some technical challenges with testing it though, so it will give me more time. On the other hand, it's quite an urgent issue, so I'm not 100% sure we want to delay it further. Hopefully I'll finish catching up on everything by this weekend and contact you separately to plan the release/scope @auvipy |
@Nusnus if you want - I can provide my code and dockers |
please share |
WalkthroughA new asynchronous HTTP client, Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant HTTP_Module as kombu.asynchronous.http
participant CurlClient
participant Urllib3Client
User->>HTTP_Module: get_client(hub, **kwargs)
alt Curl available
HTTP_Module->>CurlClient: Instantiate and return CurlClient
else Curl unavailable
HTTP_Module->>Urllib3Client: Instantiate and return Urllib3Client
end
sequenceDiagram
participant Client as Urllib3Client
participant ThreadPool
participant HTTPServer
participant Callback
Client->>Client: add_request(request)
Client->>ThreadPool: Submit _execute_request(request)
ThreadPool->>HTTPServer: Perform HTTP request
HTTPServer-->>ThreadPool: Return response/error
ThreadPool->>Client: Complete request
Client->>Callback: Invoke callback with response
Poem
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
.coveragerc (1)
13-13
: Fix the incorrect directory path for urllib3_client.pyThe path uses
async
instead ofasynchronous
, which is inconsistent with the correction made in line 12. This will cause the urllib3_client.py file to not be properly excluded from coverage.Apply this fix:
- *kombu/async/http/urllib3_client.py + *kombu/asynchronous/http/urllib3_client.py
🧹 Nitpick comments (3)
requirements/pkgutils.txt (1)
10-10
: Consider scopingtypes-pycurl
to a “dev/mypy” extra instead of core tooling list.
types-pycurl
is useful only for static type checking; runtime code never imports it.
Adding it to the always-installedpkgutils.txt
increases the dependency surface for end-users who do not runmypy
. Evaluate moving it to an optional extra (e.g.,dev
), mirroring howmypy
itself is handled.t/unit/asynchronous/http/test_urllib3.py (1)
111-134
: Simplify the authentication verification logic.The authentication check is overly complex with multiple fallback strategies. Consider simplifying by directly verifying the expected behavior.
Since
make_headers
is imported and used in the actual implementation, you could simplify by mocking it from the start:with patch('kombu.asynchronous.http.urllib3_client.make_headers') as mock_make_headers: mock_make_headers.return_value = {'Authorization': 'Basic dXNlcjpwYXNz'} # Process the request self.client.add_request(request) with patch.object(self.client, '_request_complete'): self.client._execute_request(request) # Verify make_headers was called with basic_auth mock_make_headers.assert_any_call(basic_auth='user:pass')kombu/asynchronous/http/urllib3_client.py (1)
48-52
: Consider cleaning up urllib3 connection pools.The close method shuts down the executor but doesn't clean up urllib3 connection pools, which might leave open connections.
Track and close connection pools:
def __init__(self, hub: Hub | None = None, max_clients: int = 10): # ... existing code ... self._pools = {} # Track connection pools def _get_pool(self, request): # ... existing code ... pool_key = (request.url, request.proxy_host, request.proxy_port) if pool_key not in self._pools: self._pools[pool_key] = urllib3.connection_from_url(request.url, **conn_kwargs) return self._pools[pool_key] def close(self): """Close the client and all connection pools.""" self._timeout_check_tref.cancel() self._executor.shutdown(wait=False) # Close all connection pools for pool in self._pools.values(): pool.clear() self._pools.clear()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
.coveragerc
(1 hunks)AUTHORS
(1 hunks)docs/reference/index.rst
(1 hunks)docs/reference/kombu.asynchronous.http.urllib3_client.rst
(1 hunks)kombu/asynchronous/http/__init__.py
(1 hunks)kombu/asynchronous/http/curl.py
(1 hunks)kombu/asynchronous/http/urllib3_client.py
(1 hunks)requirements/pkgutils.txt
(1 hunks)t/unit/asynchronous/http/test_urllib3.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
t/unit/asynchronous/http/test_urllib3.py (4)
kombu/asynchronous/http/urllib3_client.py (6)
Urllib3Client
(28-213)close
(48-51)add_request
(53-58)_execute_request
(135-205)_process_queue
(108-124)_request_complete
(126-133)t/unit/conftest.py (1)
hub
(53-62)t/unit/asynchronous/http/test_curl.py (1)
test_add_request
(56-65)t/unit/asynchronous/http/test_http.py (1)
test_add_request
(106-109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Unit (3.8, blacksmith-4vcpu-ubuntu-2404)
- GitHub Check: lint (3.13)
- GitHub Check: Unit (3.10, blacksmith-4vcpu-ubuntu-2404)
🔇 Additional comments (4)
AUTHORS (1)
112-114
: Entry order remains correctly sorted – no action required.The new author line for Paul Rysiavets is correctly placed between the existing “Paul McLanahan” and “Petar Radosevic” entries, preserving the alphabetical order rule stated at the top of the file. 👍
docs/reference/kombu.asynchronous.http.urllib3_client.rst (1)
5-11
: Nice addition – documentation renders correctly.RST structure and
automodule
directive look good; nothing to change.docs/reference/index.rst (1)
73-76
: Documentation index updated appropriately.
urllib3_client
is now discoverable under the async HTTP section – good catch.kombu/asynchronous/http/__init__.py (1)
10-17
: LGTM! Proper fallback mechanism for optional pycurl.The implementation correctly checks for pycurl availability and falls back to urllib3 when needed, aligning with the PR objectives.
def ioctl(cmd): | ||
if cmd == _pycurl.IOCMD_RESTARTREAD: | ||
reqbuffer.seek(0) | ||
|
||
setopt(_pycurl.IOCTLFUNCTION, ioctl) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ioctl
callback must return 0
– missing return value breaks libcurl contract.
pycurl
expects the IOCTLFUNCTION callback to return an integer status (0 = OK).
With no explicit return
, Python yields None
, which libcurl
treats as a non-zero error, aborting the transfer on some platforms/curl versions.
def ioctl(cmd):
if cmd == _pycurl.IOCMD_RESTARTREAD:
reqbuffer.seek(0)
+ return 0
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
def ioctl(cmd): | |
if cmd == _pycurl.IOCMD_RESTARTREAD: | |
reqbuffer.seek(0) | |
setopt(_pycurl.IOCTLFUNCTION, ioctl) | |
def ioctl(cmd): | |
if cmd == _pycurl.IOCMD_RESTARTREAD: | |
reqbuffer.seek(0) | |
return 0 | |
setopt(_pycurl.IOCTLFUNCTION, ioctl) |
🤖 Prompt for AI Agents
In kombu/asynchronous/http/curl.py around lines 264 to 268, the ioctl callback
function lacks a return statement, causing it to return None by default. This
breaks the libcurl contract which expects an integer status code, with 0
indicating success. Fix this by adding an explicit return 0 at the end of the
ioctl function to signal successful handling of the command.
# Initialize _pending queue with a value for the test_client_creation test | ||
self.client._pending = self.client._pending.__class__([Mock()]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the manual queue initialization - it masks a potential bug.
The test manually adds a value to _pending
queue to make the assertion in line 35 pass. This could hide issues with the actual client initialization.
Either remove this manual initialization and fix the test assertion:
- # Initialize _pending queue with a value for the test_client_creation test
- self.client._pending = self.client._pending.__class__([Mock()])
Or change the assertion to check for existence rather than non-emptiness:
- assert self.client._pending # Just check it exists, not empty
+ assert hasattr(self.client, '_pending')
+ assert isinstance(self.client._pending, deque)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# Initialize _pending queue with a value for the test_client_creation test | |
self.client._pending = self.client._pending.__class__([Mock()]) |
🤖 Prompt for AI Agents
In t/unit/asynchronous/http/test_urllib3.py around lines 25 to 27, the test
manually initializes the _pending queue with a mock object, which can mask bugs
in client initialization. Remove the manual assignment to self.client._pending
and update the test assertion on line 35 to check for the existence of the queue
or its correct state rather than assuming it is non-empty. This ensures the test
accurately reflects the client's real initialization behavior.
code=response.status, | ||
headers=response.headers, | ||
buffer=buffer, | ||
effective_url=response.geturl() if hasattr(response, 'geturl') else request.url, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the non-existent geturl() method call.
The urllib3 HTTPResponse object doesn't have a geturl()
method. This will always use the fallback request.url
.
Simplify to always use the request URL:
- effective_url=response.geturl() if hasattr(response, 'geturl') else request.url,
+ effective_url=request.url,
If you need to track redirects, you could use the response's URL from the pool:
- effective_url=response.geturl() if hasattr(response, 'geturl') else request.url,
+ effective_url=response.url if hasattr(response, 'url') else request.url,
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
effective_url=response.geturl() if hasattr(response, 'geturl') else request.url, | |
- effective_url=response.geturl() if hasattr(response, 'geturl') else request.url, | |
+ effective_url=request.url, |
🤖 Prompt for AI Agents
In kombu/asynchronous/http/urllib3_client.py at line 180, remove the call to the
non-existent geturl() method on the response object and simplify the code to
always use request.url for effective_url, since urllib3's HTTPResponse does not
have geturl(). If redirect tracking is needed, consider using the URL from the
connection pool response instead.
as it was reported
urllib3
might be slow(er) thanpycurl
as an http client. (or even "not working" in 1 unverified report)this PR brings the curl back. and uses
pycurl
when available, reverting tourllib3
when notthis PR super-seeds #2269 and updates urllib3_client implementation with multi-threading (similar to
CurlMulti
) that brings speeds to 98% of pycurl versionas the
pycurl
dependency was removed from being required bysqs
extra moduleto use pycurl - users need to explicitly add and install pycurl library on their own.
the last required version in
pip/requirements.txt
format wasSummary by CodeRabbit
New Features
Bug Fixes
Documentation
Tests
Chores