Log a warning/raise error if user tries to cache something non-deterministic (uncacheable) #12229

@N-Demir

First check

  • I added a descriptive title to this issue.
  • I used the GitHub search to find a similar request and didn't find it.
  • I searched the Prefect documentation for this feature.

Prefect Version

2.x

Describe the current behavior

I initially thought this was a Prefect bug, but after a lot of investigation I determined that cloudpickle, which task_input_hash uses, is non-deterministic from run to run in many different scenarios. See the following issues opened on their end:

For me, the issue came from passing a class to a Prefect task with caching enabled via task_input_hash. Strangely, the hash is only deterministic if the class is imported from another module rather than defined in the same file as the script that kicks off the flow (that is, not defined in __main__).
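The difference between imported and __main__-defined classes is consistent with how pickling by reference works. Below is a minimal sketch that uses the standard-library pickle as a stand-in (cloudpickle and Prefect are not used here, and digest is a hypothetical helper). An importable class is serialized as just its module and qualified name, so hashing it is stable. cloudpickle, by contrast, serializes classes defined in __main__ by value, which is where run-to-run variation can enter.

```python
import hashlib
import pickle
from fractions import Fraction  # any class that lives in an importable module


def digest(obj) -> str:
    """Hypothetical helper: hash the pickled bytes of obj."""
    return hashlib.md5(pickle.dumps(obj)).hexdigest()


# Pickling the class object itself stores only a reference
# (module name plus qualified name), not the class body.
payload = pickle.dumps(Fraction)
references_by_name = b"fractions" in payload and b"Fraction" in payload

# Because only the name is stored, hashing the class twice gives the same digest.
stable = digest(Fraction) == digest(Fraction)
```

This only illustrates the by-reference case; it does not reproduce the non-deterministic by-value case reported above.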

This single hard-to-debug issue has led to an incredible amount of frustration with Prefect and confusion among my team members and me, and from surveying the landscape of caching in Python there doesn't seem to be an obviously better out-of-the-box solution.

Describe the proposed behavior

I'm not expert enough to know what the best solution is, or whether any of the other libraries I mentioned are definitively better, but from a user's perspective I want to emphasize one simpler thing that Streamlit does well and that has saved me a ton of pain in the past: the UnhashableParamError.

In Streamlit's caching, if you try to cache a custom object it will simply fail with an UnhashableParamError. Had Prefect done this, it would have alleviated a great deal of confusion about why caching wasn't working (which led me down rabbit holes and left me more confused about the distinction between Prefect caching and results).
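To make the fail-fast idea concrete, here is a rough sketch of a cache-key function that refuses inputs it cannot hash deterministically. It is not Prefect's or Streamlit's implementation: strict_input_hash and UncacheableInputError are hypothetical names, and "JSON-serializable" is assumed as the definition of a safely hashable input.

```python
import hashlib
import json


class UncacheableInputError(TypeError):
    """Hypothetical error raised when inputs cannot be hashed deterministically."""


def strict_input_hash(arguments: dict) -> str:
    """Hash task arguments, accepting only JSON-serializable values."""
    try:
        # sort_keys makes the key independent of argument ordering.
        payload = json.dumps(arguments, sort_keys=True)
    except TypeError as exc:
        raise UncacheableInputError(
            f"Cannot build a deterministic cache key: {exc}"
        ) from exc
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


class Config:  # a custom class, like the one that triggered this issue
    pass


try:
    strict_input_hash({"config": Config})
except UncacheableInputError as err:
    message = str(err)  # explains which kind of object was rejected
```

With this behavior, passing a class fails immediately with a clear message instead of silently producing a different cache key on every run.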

Perhaps an error is too aggressive, but even a warning log would make a huge difference when debugging why Prefect's caching isn't working from run to run. And maybe task_input_hash should be more restrictive, forcing users to pass hashable args instead of blindly using a non-deterministic cloudpickle as the fallback for JSON.
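The softer warning variant could look roughly like the sketch below. It assumes the JSON-first, pickle-fallback order described above, uses the standard-library pickle in place of cloudpickle, and the names input_hash_with_warning and cache_key_sketch are hypothetical rather than part of Prefect.

```python
import hashlib
import json
import logging
import pickle

logger = logging.getLogger("cache_key_sketch")  # hypothetical logger name


def input_hash_with_warning(arguments: dict) -> str:
    """Hash arguments via JSON; warn loudly when falling back to pickling."""
    try:
        payload = json.dumps(arguments, sort_keys=True).encode("utf-8")
    except TypeError as exc:
        # The fallback may not be stable across runs, so tell the user.
        logger.warning(
            "Arguments are not JSON-serializable (%s); falling back to "
            "pickling, so the cache key may differ between runs.",
            exc,
        )
        payload = pickle.dumps(arguments)
    return hashlib.md5(payload).hexdigest()
```

A call such as input_hash_with_warning({"ids": {1, 2}}) still returns a key, but the log now records that the key came from the fallback path, which is the hint that was missing during debugging.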

Example Use

No response

Additional context

No response

Labels

  • enhancement: An improvement of an existing feature
  • upstream dependency: An upstream issue caused by a bug in one of our dependencies
