
spawn-guy
Contributor

@spawn-guy spawn-guy commented Jun 5, 2025

As reported, urllib3 may be slower than pycurl as an HTTP client (one unverified report even says it is "not working").

This PR brings curl back: it uses pycurl when available and falls back to urllib3 when it is not.
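A minimal sketch of the "pycurl if importable, otherwise urllib3" selection described above. `CurlBackedClient`, `Urllib3BackedClient`, and `pick_client_class` are hypothetical stand-ins, not kombu's actual classes or its `get_client` code; the module is passed in as a parameter so both branches can be exercised without uninstalling anything.

```python
try:
    import pycurl  # optional: present only if the user installed it
except ImportError:  # pycurl (or its libcurl binding) is unavailable
    pycurl = None


class CurlBackedClient:
    """Hypothetical stand-in for a pycurl-based client."""
    backend = "curl"


class Urllib3BackedClient:
    """Hypothetical stand-in for a urllib3-based client."""
    backend = "urllib3"


def pick_client_class(curl_module=pycurl):
    """Return the curl client when pycurl imported, otherwise fall back to urllib3."""
    if curl_module is not None:
        return CurlBackedClient
    return Urllib3BackedClient
```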

This PR supersedes #2269 and updates the urllib3_client implementation with multi-threading (similar to CurlMulti), which brings its speed to 98% of the pycurl version.
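A rough sketch of the thread-pool idea, assuming a blocking fetch function (a urllib3 request in a real client) is run on a bounded `ThreadPoolExecutor`, with `max_clients` playing the role of CurlMulti's concurrency limit. `ThreadedDispatcher` and its method names are hypothetical; unlike a hub-based client, the completion callback here runs on the worker thread instead of being handed back to the event loop.

```python
from __future__ import annotations

from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional

# Callback signature: (url, body or None, exception or None)
Callback = Callable[[str, Optional[bytes], Optional[BaseException]], None]


class ThreadedDispatcher:
    """Hypothetical sketch: run blocking fetches on a bounded thread pool."""

    def __init__(self, fetch: Callable[[str], bytes], max_clients: int = 10):
        self._fetch = fetch  # blocking call, e.g. a urllib3 request in practice
        # max_workers caps in-flight requests, similar to a CurlMulti limit.
        self._executor = ThreadPoolExecutor(max_workers=max_clients)

    def add_request(self, url: str, on_done: Callback) -> Future:
        future = self._executor.submit(self._fetch, url)

        def _complete(f: Future) -> None:
            # Runs on the worker thread (or inline if already finished).
            # A hub-based client would schedule this on its event loop instead.
            exc = f.exception()
            on_done(url, None if exc is not None else f.result(), exc)

        future.add_done_callback(_complete)
        return future

    def close(self) -> None:
        # wait=True blocks until queued requests and their callbacks have finished.
        self._executor.shutdown(wait=True)
```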

Since pycurl is no longer a required dependency of the sqs extra, users who want to use pycurl need to add and install the pycurl library themselves.

The last required version, in pip/requirements.txt format, was:

pycurl>=7.43.0.5; sys_platform != 'win32' and platform_python_implementation=="CPython"
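The environment markers on that line limit the requirement to non-Windows CPython. pip evaluates markers itself; the stdlib sketch below only mirrors the same condition to make it concrete (`pycurl_marker_matches` is a hypothetical helper, with optional overrides so other platforms can be checked).

```python
from __future__ import annotations

import platform
import sys


def pycurl_marker_matches(sys_platform: str | None = None,
                          implementation: str | None = None) -> bool:
    """Mirror: sys_platform != 'win32' and platform_python_implementation == "CPython"."""
    # Default to the running interpreter, as pip does when evaluating markers.
    sys_platform = sys.platform if sys_platform is None else sys_platform
    implementation = (platform.python_implementation()
                      if implementation is None else implementation)
    return sys_platform != "win32" and implementation == "CPython"
```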

Summary by CodeRabbit

  • New Features

    • Introduced an asynchronous HTTP client based on urllib3, supporting connection pooling, SSL, proxies, authentication, and concurrency control.
    • Added new documentation and API reference for the urllib3-based asynchronous HTTP client.
  • Bug Fixes

    • Corrected and expanded the list of files excluded from coverage reports.
  • Documentation

    • Updated and extended documentation to include the new urllib3-based HTTP client and its API reference.
  • Tests

    • Added comprehensive unit tests for the new urllib3-based asynchronous HTTP client.
  • Chores

    • Added a new author to the authors list.
    • Updated package requirements to include a new dependency for type checking.

@auvipy
Member

auvipy commented Jun 9, 2025

related celery/celery#9749

@auvipy auvipy requested review from auvipy and Copilot June 10, 2025 16:13
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR reintroduces optional pycurl support for HTTP requests, falling back to a multi-threaded urllib3 client when pycurl is not installed, and updates CI and requirements to install and type-stub pycurl.

  • Bring back a CurlClient using pycurl with event-loop integration.
  • Refactor Urllib3Client to use a thread pool and rewrite connection-pool handling.
  • Update requirements, CI workflows, and documentation to support libcurl and pycurl.

Reviewed Changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.

Summary per file:

  • t/unit/asynchronous/http/test_curl.py: Add unit tests for CurlClient behavior
  • requirements/pkgutils.txt: Add types-pycurl stub requirement
  • kombu/asynchronous/http/urllib3_client.py: Refactor Urllib3Client for concurrency and pool management
  • kombu/asynchronous/http/curl.py: Implement CurlClient using pycurl
  • kombu/asynchronous/http/__init__.py: Update Client factory to prefer CurlClient
  • docs/reference/index.rst: Include kombu.asynchronous.http.curl in documentation
  • .github/workflows/python-package.yml, linter.yml: Install libcurl4-openssl-dev in CI
  • .coveragerc: Exclude HTTP client files from coverage (with a path typo)
Comments suppressed due to low confidence (1)

kombu/asynchronous/http/urllib3_client.py:39

  • The new Urllib3Client thread-pool logic (queue management, _execute_request, proxy settings, and error handling) is untested. Consider adding unit tests covering _process_queue, _execute_request success/failure flows, and proxy configuration.
self._executor = ThreadPoolExecutor(max_workers=max_clients)

Member

@auvipy auvipy left a comment


I am broadly in favor of this PR, especially the new executor for the urllib3 client.

@Nusnus Nusnus self-requested a review June 14, 2025 14:18
@auvipy
Member

auvipy commented Jun 15, 2025

Please rebase

Member

@Nusnus Nusnus left a comment


Apologies for my high latency; I'm very busy these days IYKYK 🚀

Please give me time to review this properly before merging, @auvipy. I'm aware of our urgency; I'll try to find some time soon 🙏

@auvipy
Member

auvipy commented Jun 17, 2025

Apologies for my high latency; I'm very busy these days IYKYK 🚀

Please give me time to review this properly before merging, @auvipy. I'm aware of our urgency; I'll try to find some time soon 🙏

No worries, I'm not going to merge this before your review. The ones I merged were not so complex; that's why. I will wait two more weeks for you. Best, and stay safe, bro.

@spawn-guy spawn-guy force-pushed the feature_optional_pycurl_u3speed branch from a5ae6a7 to d478fec Compare June 23, 2025 12:58
@spawn-guy
Contributor Author

spawn-guy commented Jun 23, 2025

Rebased, but:

Now that I see the "revert PR" was merged: @auvipy @Nusnus, what are we going to do about the "insecure SQS connection"? Shall I fix it (again) here or somewhere else? I had fixed it in that reverted PR.

PR on current main #2323

@auvipy
Member

auvipy commented Jun 23, 2025

thanks. I have fixed the merge conflicts as well.

@auvipy auvipy added this to the 5.6.0 milestone Jun 24, 2025
Member

@Nusnus Nusnus left a comment


@spawn-guy

Rebased, but:

Now that I see the "revert PR" was merged: @auvipy @Nusnus, what are we going to do about the "insecure SQS connection"? Shall I fix it (again) here or somewhere else? I had fixed it in that reverted PR.

PR on current main #2323

Are we ready for review? 🙏

@auvipy
Member

auvipy commented Jun 24, 2025

I think so.

@spawn-guy
Contributor Author

spawn-guy commented Jun 24, 2025

@Nusnus @auvipy the only thing left is that pycurl is still required by default in requirements/extras/sqs.txt.
This only affects extras="sqs".
The test suite should run normally.

@Nusnus Nusnus force-pushed the feature_optional_pycurl_u3speed branch from 6adac32 to 2982d01 Compare July 1, 2025 09:19
@Nusnus
Member

Nusnus commented Jul 1, 2025

Thank you for the update!
I’ve been off for some time, finally getting back.
Catching up on everything and coming back to this once I get the full picture of everything I’ve missed.
I’ll also run some tests on this PR to ensure it runs well with all of the other changes in main / celery main.
I’ll post my results here as part of my upcoming review.

Thank you for the patience - I appreciate it a lot!

@auvipy
Member

auvipy commented Jul 8, 2025

Maybe we could push this to 5.7 to allow better testing and more review time, and focus on releasing 5.6 as early as possible? What do you think?

@Nusnus
Member

Nusnus commented Jul 9, 2025

Maybe we could push this to 5.7 to allow better testing and more review time, and focus on releasing 5.6 as early as possible? What do you think?

I'm leaning to agree, assuming we can push 5.6 quite soon (beta+rc first of course).

This PR looks good so far; having some technical challenges with testing it though, so it will give me more time. On the other hand, it's quite an urgent issue, so I'm not 100% sure we want to delay it further.

Hopefully I'll finish catching up on everything by this weekend and contact you separately to plan the release/scope @auvipy

@spawn-guy
Contributor Author

@Nusnus if you want, I can provide my code and Docker setup.

@auvipy
Member

auvipy commented Jul 11, 2025

please share

@spawn-guy
Contributor Author

spawn-guy commented Jul 14, 2025

@auvipy @Nusnus https://github.com/spawn-guy/celery-sqs-python


coderabbitai bot commented Jul 14, 2025

Walkthrough

A new asynchronous HTTP client, Urllib3Client, was added to the codebase, along with its documentation and comprehensive unit tests. The module selection logic in the HTTP package was updated to support both Curl and Urllib3-based clients. Documentation and coverage configurations were updated to reflect the new module, and a new author was credited.

Changes

  • kombu/asynchronous/http/urllib3_client.py: New module. Implements Urllib3Client, an async HTTP client using urllib3 and thread pools.
  • t/unit/asynchronous/http/test_urllib3.py: New test suite. Unit tests for Urllib3Client, covering request handling, auth, proxies, errors, and concurrency.
  • kombu/asynchronous/http/__init__.py: Updated. Added BaseClient import, changed client selection logic, updated type hints, exported get_client.
  • docs/reference/kombu.asynchronous.http.urllib3_client.rst: New doc. API reference for kombu.asynchronous.http.urllib3_client.
  • docs/reference/index.rst: Updated. Added urllib3_client to async HTTP module documentation index.
  • .coveragerc: Updated. Corrected/expanded omit patterns for coverage, including new urllib3 client.
  • requirements/pkgutils.txt: Updated. Added conditional dependency for types-pycurl for non-Windows CPython.
  • AUTHORS: Updated. Added "Paul Rysiavets [email protected]" to the author list.
  • kombu/asynchronous/http/curl.py: Formatting. Added a blank line in the ioctl function for readability.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant HTTP_Module as kombu.asynchronous.http
    participant CurlClient
    participant Urllib3Client

    User->>HTTP_Module: get_client(hub, **kwargs)
    alt Curl available
        HTTP_Module->>CurlClient: Instantiate and return CurlClient
    else Curl unavailable
        HTTP_Module->>Urllib3Client: Instantiate and return Urllib3Client
    end
sequenceDiagram
    participant Client as Urllib3Client
    participant ThreadPool
    participant HTTPServer
    participant Callback

    Client->>Client: add_request(request)
    Client->>ThreadPool: Submit _execute_request(request)
    ThreadPool->>HTTPServer: Perform HTTP request
    HTTPServer-->>ThreadPool: Return response/error
    ThreadPool->>Client: Complete request
    Client->>Callback: Invoke callback with response

Poem

In Kombu’s warren, new code appears,
A client with threads, for modern frontiers.
Urllib3 hops in, with tests by its side,
Docs and coverage, all neatly supplied.
With every request, a rabbit’s delight—
Async and swift, through the network’s night!
🐇✨



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
.coveragerc (1)

13-13: Fix the incorrect directory path for urllib3_client.py

The path uses async instead of asynchronous, which is inconsistent with the correction made in line 12. This will cause the urllib3_client.py file to not be properly excluded from coverage.

Apply this fix:

-    *kombu/async/http/urllib3_client.py
+    *kombu/asynchronous/http/urllib3_client.py
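Why the typo matters: an omit pattern has to match the recorded file path, and "async/" never occurs in ".../asynchronous/...". The sketch uses Python's `fnmatch` only as an approximation of coverage.py's own glob matching (which may differ in details such as how `*` treats directory separators); the checkout path is made up.

```python
from __future__ import annotations

from fnmatch import fnmatch

# Hypothetical absolute path of the module as coverage might record it.
MODULE_PATH = "/home/ci/kombu/kombu/asynchronous/http/urllib3_client.py"

OLD_PATTERN = "*kombu/async/http/urllib3_client.py"
FIXED_PATTERN = "*kombu/asynchronous/http/urllib3_client.py"


def is_omitted(path: str, patterns: list[str]) -> bool:
    """Return True if any omit pattern matches the path (fnmatch semantics)."""
    return any(fnmatch(path, pattern) for pattern in patterns)


print(is_omitted(MODULE_PATH, [OLD_PATTERN]))    # False: "async/" is not in the path
print(is_omitted(MODULE_PATH, [FIXED_PATTERN]))  # True
```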
🧹 Nitpick comments (3)
requirements/pkgutils.txt (1)

10-10: Consider scoping types-pycurl to a “dev/mypy” extra instead of core tooling list.

types-pycurl is useful only for static type checking; runtime code never imports it.
Adding it to the always-installed pkgutils.txt increases the dependency surface for end-users who do not run mypy. Evaluate moving it to an optional extra (e.g., dev), mirroring how mypy itself is handled.

t/unit/asynchronous/http/test_urllib3.py (1)

111-134: Simplify the authentication verification logic.

The authentication check is overly complex with multiple fallback strategies. Consider simplifying by directly verifying the expected behavior.

Since make_headers is imported and used in the actual implementation, you could simplify by mocking it from the start:

with patch('kombu.asynchronous.http.urllib3_client.make_headers') as mock_make_headers:
    mock_make_headers.return_value = {'Authorization': 'Basic dXNlcjpwYXNz'}
    
    # Process the request
    self.client.add_request(request)
    with patch.object(self.client, '_request_complete'):
        self.client._execute_request(request)
    
    # Verify make_headers was called with basic_auth
    mock_make_headers.assert_any_call(basic_auth='user:pass')
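The mocked header value above is simply HTTP Basic encoding of `user:pass`. A stdlib-only sketch showing where `dXNlcjpwYXNz` comes from (`basic_auth_header` is a hypothetical helper; urllib3's `make_headers(basic_auth=...)` builds the same Base64 token).

```python
from __future__ import annotations

import base64


def basic_auth_header(userpass: str) -> dict[str, str]:
    """Build an HTTP Basic Authorization header from a 'user:pass' string."""
    token = base64.b64encode(userpass.encode("latin-1")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


print(basic_auth_header("user:pass"))  # {'Authorization': 'Basic dXNlcjpwYXNz'}
```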
kombu/asynchronous/http/urllib3_client.py (1)

48-52: Consider cleaning up urllib3 connection pools.

The close method shuts down the executor but doesn't clean up urllib3 connection pools, which might leave open connections.

Track and close connection pools:

def __init__(self, hub: Hub | None = None, max_clients: int = 10):
    # ... existing code ...
    self._pools = {}  # Track connection pools

def _get_pool(self, request):
    # ... existing code ...
    pool_key = (request.url, request.proxy_host, request.proxy_port)
    if pool_key not in self._pools:
        self._pools[pool_key] = urllib3.connection_from_url(request.url, **conn_kwargs)
    return self._pools[pool_key]

def close(self):
    """Close the client and all connection pools."""
    self._timeout_check_tref.cancel()
    self._executor.shutdown(wait=False)
    # Close all connection pools
    for pool in self._pools.values():
        pool.close()  # HTTPConnectionPool has close(); clear() exists on PoolManager, not on pools
    self._pools.clear()
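The suggested `pool_key` uses the full `request.url`, so requests to different paths on the same host would each get their own pool. A sketch of an alternative that keys on origin plus proxy, assuming a caller-supplied `make_pool` factory (`PoolCache` and `make_pool` are hypothetical). urllib3's own `PoolManager` already does this per-host bookkeeping, and its `clear()` closes every pool, which may be the simpler route.

```python
from __future__ import annotations

from typing import Any, Callable
from urllib.parse import urlsplit

_DEFAULT_PORTS = {"http": 80, "https": 443}


class PoolCache:
    """Hypothetical sketch: one pool per origin (scheme, host, port) plus proxy."""

    def __init__(self, make_pool: Callable[[str], Any]):
        # In a urllib3-based client, make_pool could be urllib3.connection_from_url.
        self._make_pool = make_pool
        self._pools: dict[tuple, Any] = {}

    @staticmethod
    def key_for(url: str, proxy_host: str | None = None,
                proxy_port: int | None = None) -> tuple:
        parts = urlsplit(url)
        port = parts.port or _DEFAULT_PORTS.get(parts.scheme)
        # Path and query are deliberately ignored: they don't affect the connection.
        return (parts.scheme, parts.hostname, port, proxy_host, proxy_port)

    def get(self, url: str, proxy_host: str | None = None,
            proxy_port: int | None = None) -> Any:
        key = self.key_for(url, proxy_host, proxy_port)
        if key not in self._pools:
            self._pools[key] = self._make_pool(url)
        return self._pools[key]

    def close(self) -> None:
        # Close each cached pool, then forget them.
        for pool in self._pools.values():
            pool.close()
        self._pools.clear()
```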
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 40ab91a and a64217a.

📒 Files selected for processing (9)
  • .coveragerc (1 hunks)
  • AUTHORS (1 hunks)
  • docs/reference/index.rst (1 hunks)
  • docs/reference/kombu.asynchronous.http.urllib3_client.rst (1 hunks)
  • kombu/asynchronous/http/__init__.py (1 hunks)
  • kombu/asynchronous/http/curl.py (1 hunks)
  • kombu/asynchronous/http/urllib3_client.py (1 hunks)
  • requirements/pkgutils.txt (1 hunks)
  • t/unit/asynchronous/http/test_urllib3.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
t/unit/asynchronous/http/test_urllib3.py (4)
kombu/asynchronous/http/urllib3_client.py (6)
  • Urllib3Client (28-213)
  • close (48-51)
  • add_request (53-58)
  • _execute_request (135-205)
  • _process_queue (108-124)
  • _request_complete (126-133)
t/unit/conftest.py (1)
  • hub (53-62)
t/unit/asynchronous/http/test_curl.py (1)
  • test_add_request (56-65)
t/unit/asynchronous/http/test_http.py (1)
  • test_add_request (106-109)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Unit (3.8, blacksmith-4vcpu-ubuntu-2404)
  • GitHub Check: lint (3.13)
  • GitHub Check: Unit (3.10, blacksmith-4vcpu-ubuntu-2404)
🔇 Additional comments (4)
AUTHORS (1)

112-114: Entry order remains correctly sorted – no action required.

The new author line for Paul Rysiavets is correctly placed between the existing “Paul McLanahan” and “Petar Radosevic” entries, preserving the alphabetical order rule stated at the top of the file. 👍

docs/reference/kombu.asynchronous.http.urllib3_client.rst (1)

5-11: Nice addition – documentation renders correctly.

RST structure and automodule directive look good; nothing to change.

docs/reference/index.rst (1)

73-76: Documentation index updated appropriately.

urllib3_client is now discoverable under the async HTTP section – good catch.

kombu/asynchronous/http/__init__.py (1)

10-17: LGTM! Proper fallback mechanism for optional pycurl.

The implementation correctly checks for pycurl availability and falls back to urllib3 when needed, aligning with the PR objectives.

Comment on lines 264 to 268
def ioctl(cmd):
    if cmd == _pycurl.IOCMD_RESTARTREAD:
        reqbuffer.seek(0)

setopt(_pycurl.IOCTLFUNCTION, ioctl)

⚠️ Potential issue

ioctl callback must return 0 – missing return value breaks libcurl contract.

pycurl expects the IOCTLFUNCTION callback to return an integer status (0 = OK).
With no explicit return, Python yields None, which libcurl treats as a non-zero error, aborting the transfer on some platforms/curl versions.

                 def ioctl(cmd):
                     if cmd == _pycurl.IOCMD_RESTARTREAD:
                         reqbuffer.seek(0)
+                    return 0
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 def ioctl(cmd):
     if cmd == _pycurl.IOCMD_RESTARTREAD:
         reqbuffer.seek(0)
+    return 0

 setopt(_pycurl.IOCTLFUNCTION, ioctl)
🤖 Prompt for AI Agents
In kombu/asynchronous/http/curl.py around lines 264 to 268, the ioctl callback
function lacks a return statement, causing it to return None by default. This
breaks the libcurl contract which expects an integer status code, with 0
indicating success. Fix this by adding an explicit return 0 at the end of the
ioctl function to signal successful handling of the command.
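To see what the callback does, here is a standalone sketch with a plain `io.BytesIO` and hard-coded stand-ins for the pycurl constants; the values are assumed to match libcurl's `CURLIOCMD_RESTARTREAD` (1) and `CURLIOE_OK` (0), and `make_ioctl` is a hypothetical helper. It does not call pycurl.

```python
import io

# Stand-ins for libcurl's CURLIOCMD_RESTARTREAD and CURLIOE_OK (assumed values).
IOCMD_RESTARTREAD = 1
IOE_OK = 0


def make_ioctl(reqbuffer: io.BytesIO):
    """Build an ioctl-style callback that rewinds the request body buffer."""
    def ioctl(cmd: int) -> int:
        if cmd == IOCMD_RESTARTREAD:
            # libcurl asks to resend the body (e.g. on a retry),
            # so reading must restart from the first byte.
            reqbuffer.seek(0)
        return IOE_OK  # explicit status, as the review suggests
    return ioctl
```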

Comment on lines +25 to +27
# Initialize _pending queue with a value for the test_client_creation test
self.client._pending = self.client._pending.__class__([Mock()])


⚠️ Potential issue

Remove the manual queue initialization - it masks a potential bug.

The test manually adds a value to _pending queue to make the assertion in line 35 pass. This could hide issues with the actual client initialization.

Either remove this manual initialization and fix the test assertion:

-        # Initialize _pending queue with a value for the test_client_creation test
-        self.client._pending = self.client._pending.__class__([Mock()])

Or change the assertion to check for existence rather than non-emptiness:

-        assert self.client._pending  # Just check it exists, not empty
+        assert hasattr(self.client, '_pending')
+        assert isinstance(self.client._pending, deque)
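The reason the original test needed the manual seeding: an empty `deque` is falsy, so `assert self.client._pending` fails on a freshly created client, while an `isinstance` check passes either way. A small sketch comparing the two assertion styles (`describe_pending` is a hypothetical helper).

```python
from collections import deque


def describe_pending(pending) -> dict:
    """Compare the two assertion styles on a queue object."""
    return {
        "truthy": bool(pending),                 # what `assert client._pending` tests
        "is_deque": isinstance(pending, deque),  # what the suggested check tests
    }


print(describe_pending(deque()))         # {'truthy': False, 'is_deque': True}
print(describe_pending(deque(["req"])))  # {'truthy': True, 'is_deque': True}
```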
📝 Committable suggestion


Suggested change
-        # Initialize _pending queue with a value for the test_client_creation test
-        self.client._pending = self.client._pending.__class__([Mock()])
🤖 Prompt for AI Agents
In t/unit/asynchronous/http/test_urllib3.py around lines 25 to 27, the test
manually initializes the _pending queue with a mock object, which can mask bugs
in client initialization. Remove the manual assignment to self.client._pending
and update the test assertion on line 35 to check for the existence of the queue
or its correct state rather than assuming it is non-empty. This ensures the test
accurately reflects the client's real initialization behavior.

code=response.status,
headers=response.headers,
buffer=buffer,
effective_url=response.geturl() if hasattr(response, 'geturl') else request.url,

⚠️ Potential issue

Remove the non-existent geturl() method call.

The urllib3 HTTPResponse object doesn't have a geturl() method. This will always use the fallback request.url.

Simplify to always use the request URL:

-                effective_url=response.geturl() if hasattr(response, 'geturl') else request.url,
+                effective_url=request.url,

If you need to track redirects, you could use the response's URL from the pool:

-                effective_url=response.geturl() if hasattr(response, 'geturl') else request.url,
+                effective_url=response.url if hasattr(response, 'url') else request.url,
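The second suggestion can also be written as a small duck-typed helper (`effective_url` is a hypothetical name; `SimpleNamespace` objects stand in for responses). Using `getattr(..., None) or ...` additionally falls back when the attribute exists but is `None`.

```python
from types import SimpleNamespace


def effective_url(response: object, request_url: str) -> str:
    """Use the response's final URL if it exposes a non-empty one, else the request URL."""
    # getattr(..., None) covers objects without .url; `or` also covers url=None.
    return getattr(response, "url", None) or request_url


print(effective_url(SimpleNamespace(url="https://example.com/final"),
                    "https://example.com/start"))  # https://example.com/final
print(effective_url(SimpleNamespace(), "https://example.com/start"))  # https://example.com/start
```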
📝 Committable suggestion


Suggested change
-                effective_url=response.geturl() if hasattr(response, 'geturl') else request.url,
+                effective_url=request.url,
🤖 Prompt for AI Agents
In kombu/asynchronous/http/urllib3_client.py at line 180, remove the call to the
non-existent geturl() method on the response object and simplify the code to
always use request.url for effective_url, since urllib3's HTTPResponse does not
have geturl(). If redirect tracking is needed, consider using the URL from the
connection pool response instead.

@auvipy auvipy modified the milestones: 5.6.0, 5.7.0 Jul 30, 2025

Labels

None yet

Projects

None yet


3 participants