Description
Currently, CryptoDataCache returns data exclusively as pandas DataFrames. While pandas is a widely used and robust library, Polars has been gaining popularity for its performance in many data processing scenarios, especially with large datasets.
I propose introducing a configurable setting in CryptoDataCache that lets users opt in to receiving returned data as Polars DataFrames instead of pandas DataFrames.
Proposed Implementation:
Add a new parameter (e.g. use_polars: bool = False) to the CryptoDataCache constructor.
Internally, modify data-returning methods (e.g. get_prices, get_ohlcv) to return either a pd.DataFrame or a pl.DataFrame based on the flag.
Keep the Polars dependency optional: raise a helpful error if use_polars=True but the library isn't installed.
Document the new setting and its implications (e.g. type differences, API compatibility).
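A minimal sketch of how the flag and the optional dependency could fit together. The class name CryptoDataCacheSketch, the helper load_polars, the _to_output method, and the injectable import_module argument are all hypothetical and not part of the current library. The sketch also assumes data is still built as pandas internally and converted on the way out with pl.from_pandas, which typically needs pyarrow installed as well.

```python
import importlib
from typing import Any, Callable


def load_polars(import_module: Callable[[str], Any] = importlib.import_module) -> Any:
    """Import Polars lazily so it stays an optional dependency (hypothetical helper)."""
    try:
        return import_module("polars")
    except ImportError as exc:
        raise ImportError(
            "use_polars=True requires the optional 'polars' package. "
            "Install it with: pip install polars"
        ) from exc


class CryptoDataCacheSketch:
    """Hypothetical stand-in for CryptoDataCache; only the flag handling is shown."""

    def __init__(
        self,
        use_polars: bool = False,
        import_module: Callable[[str], Any] = importlib.import_module,
    ) -> None:
        self.use_polars = use_polars
        # Fail fast at construction time rather than on the first data call.
        self._pl = load_polars(import_module) if use_polars else None

    def _to_output(self, pandas_df: Any) -> Any:
        """Return the internally built pandas frame as-is, or convert it to Polars."""
        if self._pl is None:
            return pandas_df
        # Assumption: conversion happens at the boundary via pl.from_pandas.
        return self._pl.from_pandas(pandas_df)
```

Each data-returning method would then end with `return self._to_output(df)`, so the pandas code path stays unchanged for users who never set the flag.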
Benefits:
Improved performance for users who work with large datasets.
More flexibility for users who prefer working with Polars’ lazy evaluation and multithreaded execution model.
Potential Considerations:
Differences in method signatures and return types between pandas and Polars.
Polars DataFrames are not a drop-in replacement for all pandas workflows, so users should be aware of what’s returned.
Example API:
from crypto_data_cache import CryptoDataCache
cdc = CryptoDataCache(use_polars=True)
df = cdc.fetch_data("BTCUSDT", "2025-01-01", "2025-01-31", interval="1h")
print(type(df))  # expected: <class 'polars.dataframe.frame.DataFrame'>