Add Option to Return Polars DataFrames #1

@SithHades

Description

Currently, CryptoDataCache returns data exclusively as pandas DataFrames. While pandas is a widely-used and robust library, Polars has been gaining popularity for its superior performance in many data processing scenarios, especially with large datasets.

I propose adding a configurable setting to CryptoDataCache that lets users opt in to receiving data as Polars DataFrames instead of pandas DataFrames. The default behavior (pandas) would stay unchanged.

Proposed Implementation:

- Add a new parameter (e.g. use_polars: bool = False) to the CryptoDataCache constructor.
- Internally, modify data-returning methods (e.g. get_prices, get_ohlcv, etc.) to return either a pd.DataFrame or a pl.DataFrame based on the flag.
- Keep the Polars dependency optional: raise a helpful error if use_polars=True but the library isn't installed.
- Document the new setting and its implications (e.g. type differences, API compatibility).

Benefits:

- Improved performance for users who work with large datasets.
- More flexibility for users who prefer working with Polars' lazy evaluation and multithreaded execution model.

Potential Considerations:

- pandas and Polars differ in method signatures and return types.
- Polars DataFrames are not a drop-in replacement for all pandas workflows, so users should be aware of which type is returned.
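
To make the proposal concrete, here is a rough sketch of how the flag and the optional dependency could fit together. This is only an illustration: CacheSketch, require_optional and _to_output are hypothetical names, not existing CryptoDataCache code, and it assumes the cache keeps building pandas DataFrames internally and converts at the boundary with polars.from_pandas (which itself needs pyarrow installed).

```python
import importlib
from typing import Any


def require_optional(module_name: str, feature: str) -> Any:
    """Import an optional dependency, or raise an ImportError that says how to fix it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'. "
            f"Install it with: pip install {module_name}"
        ) from exc


class CacheSketch:
    """Hypothetical stand-in for CryptoDataCache; only the flag handling is shown."""

    def __init__(self, use_polars: bool = False) -> None:
        self.use_polars = use_polars
        # Fail early, at construction time, if Polars was requested but is missing.
        self._pl = require_optional("polars", "use_polars=True") if use_polars else None

    def _to_output(self, pandas_df: Any) -> Any:
        """Convert an internally built pandas frame to the configured return type."""
        if self._pl is None:
            return pandas_df  # default: behavior unchanged for existing users
        # polars.from_pandas converts a pandas DataFrame to a Polars DataFrame.
        return self._pl.from_pandas(pandas_df)
```

Each data-returning method would then end with `return self._to_output(df)`, so the conversion lives in one place. Whether to convert at the boundary or read the cache into Polars directly is an open design question; converting is simpler but gives up part of the performance benefit.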

Example API:

from crypto_data_cache import CryptoDataCache
cdc = CryptoDataCache(use_polars=True)
df = cdc.fetch_data("BTCUSDT", "2025-01-01", "2025-01-31", interval="1h")
print(type(df))  # <class 'polars.dataframe.frame.DataFrame'>
