Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Not possible to make an heuristic depend on the request. #131

Open
hartym opened this issue Oct 18, 2016 · 4 comments
Open

Not possible to make an heuristic depend on the request. #131

hartym opened this issue Oct 18, 2016 · 4 comments

Comments

@hartym
Copy link

hartym commented Oct 18, 2016

We're having a bit of trouble with an heuristic. The fact is, Heuristic classes get the raw response, (a.k.a the requests.packages.urllib3.response.HTTPResponse), which does not contain any information about the request (for example, URL ...).

If we look at the parent's build_response(...) method, we see it has both raw request and response (cache_response method takes both).

    def build_response(self, request, response, from_cache=False):
        """
        Build a response by making a request or using the cache.

        This will end up calling send and returning a potentially
        cached response
        """
        if not from_cache and request.method == 'GET':

            # apply any expiration heuristics
            if response.status == 304:
                # We must have sent an ETag request. This could mean
                # that we've been expired already or that we simply
                # have an etag. In either case, we want to try and
                # update the cache if that is the case.
                cached_response = self.controller.update_cached_response(
                    request, response
                )

                if cached_response is not response:
                    from_cache = True

                # We are done with the server response, read a
                # possible response body (compliant servers will
                # not return one, but we cannot be 100% sure) and
                # release the connection back to the pool.
                response.read(decode_content=False)
                response.release_conn()

                response = cached_response

            # We always cache the 301 responses
            elif response.status == 301:
                self.controller.cache_response(request, response)
            else:
                # Check for any heuristics that might update headers
                # before trying to cache.
                if self.heuristic:
                    response = self.heuristic.apply(response)

                # Wrap the response file with a wrapper that will cache the
                #   response when the stream has been consumed.
                response._fp = CallbackFileWrapper(
                    response._fp,
                    functools.partial(
                        self.controller.cache_response,
                        request,
                        response,
                    )
                )

Is there any reason (other than BC, obviously) why the heuristic only gets then response, and not the request ?

@SteadBytes
Copy link

I know this is an old issue but I think supporting this could be beneficial. I see no reason why the request couldn't be made available to the heuristic. I'm happy to make a PR for it :)

@ionrock
Copy link
Contributor

ionrock commented Mar 13, 2019

As it has been a while, I could be wrong, but I think there were some difficulties taking the urllib3 response and converting it to the requests response and back again b/c of the file handle for the response body.

When I did look, there wasn't really anything that was missing pers, it just required a different API. That said, I'm happy to look at any PRs to improve the heuristics, including a higher level abstraction that might be helpful. I'd only ask that if anyone does have any radical ideas, lets chat about it in a ticket before spending too much time in the code.

@SteadBytes
Copy link

SteadBytes commented Mar 14, 2019

Thanks for replying @ionrock ! Essentially I'm thinking of passing in the request in addition to the response to the apply method of the heuristic:

class BaseHeuristic(object):

    def warning(self, request, response):
        ...

    def update_headers(self, request, response):
        ...

    def apply(self, request, response):
        ...

That would then allow for heuristics that, for example, take into account query parameters. A trivial example:

class NeverCacheParameterHeuristic(BaseHeuristic):

    def update_headers(self, request, response):
        if "never_cache" in request.params:
            return {}
        return super().update_headers(request, response)

What do you think? 😄 I'd be more than happy to implement and PR for anything that comes out of this discussion 👍

@blueyed
Copy link

blueyed commented Dec 6, 2019

@SteadBytes
I think you should go with creating a PR already. (I came here thinking about the same)
The only problem I can see however, that custom signatures need to be updated (when not taking **kwargs already). For this the signature could be inspected, but it might also be ok to do this in a new major release then?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants