Right now, users can store pandas Series/DataFrames in state and serialize them with a `JsonPlusSerializer` that has `pickle_fallback` enabled.
It'd be great to make these first-class citizens that can be serialized via msgpack, like numpy arrays.
This is a bit of a tricky task, as dataframes have lots of nuanced features like:
- multiindexes (both row and column)
- dtypes by column
- opportunity for arbitrary objects in table cells
We want to preserve df structure during serialization and deserialization.
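To make "preserve df structure" concrete, the sketch below builds a small frame with every feature listed above and checks a pickle round trip with `pd.testing.assert_frame_equal`. It assumes only pandas; `make_nuanced_frame` is a hypothetical helper, not existing LangGraph code. Pickle reproduces this frame exactly, which is the bar a msgpack encoding would need to meet.

```python
import pickle

import pandas as pd


def make_nuanced_frame() -> pd.DataFrame:
    """Hypothetical fixture covering the nuances listed above."""
    df = pd.DataFrame(
        {
            "count": pd.Series([1, 2, 3], dtype="int64"),
            "score": pd.Series([0.5, 1.5, float("nan")], dtype="float64"),
            "label": pd.Categorical(["x", "y", "x"]),
            # Arbitrary Python objects in cells force the object dtype.
            "payload": pd.Series([{"k": 1}, {1, 2}, None], dtype=object),
        }
    )
    # Row MultiIndex with named levels.
    df.index = pd.MultiIndex.from_tuples(
        [("a", 1), ("a", 2), ("b", 1)], names=["group", "step"]
    )
    # Column MultiIndex with named levels; per-column dtypes are kept.
    df.columns = pd.MultiIndex.from_tuples(
        [("metrics", "count"), ("metrics", "score"), ("meta", "label"), ("meta", "payload")],
        names=["section", "field"],
    )
    return df


df = make_nuanced_frame()
restored = pickle.loads(pickle.dumps(df))

# Raises AssertionError if values, dtypes, row index or column levels differ.
pd.testing.assert_frame_equal(restored, df)
```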
There are a few options here, assuming we continue to use msgpack:
- Dump pickled content (this is a bit redundant, since both are serialization protocols). One benefit is that pandas and pickle already handle all of the above nuances well
- Dump bytes directly, though custom logic will have to be written for the above pandas features
- Use Arrow: this is the most storage-efficient option, though there are some type inconsistencies (like with the `object` dtype) that will need to be considered.
A PR addressing this should have thorough testing, perhaps mimicking many of the conditions tested for in #5057.
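One way to organize such tests is a table of edge cases run through a pluggable round trip, using pickle as the reference behavior. The cases below are illustrative assumptions, not a copy of what #5057 tests; `roundtrip`, `CASES` and `check_case` are hypothetical names. A new msgpack-based encoder would be passed in as `dumps`/`loads`.

```python
import pickle

import numpy as np
import pandas as pd


def roundtrip(obj, dumps=pickle.dumps, loads=pickle.loads):
    """Serialize and deserialize obj with the serializer under test."""
    return loads(dumps(obj))


# Hypothetical edge cases; pickle serves as the reference behavior.
CASES = {
    "empty": pd.DataFrame(),
    "nullable_int": pd.DataFrame({"a": pd.array([1, None, 3], dtype="Int64")}),
    "tz_datetime": pd.DataFrame(
        {"t": pd.date_range("2024-01-01", periods=3, tz="UTC")}
    ),
    "ordered_categorical": pd.DataFrame(
        {"c": pd.Categorical(["lo", "hi", "lo"], categories=["lo", "hi"], ordered=True)}
    ),
    "multiindex_columns": pd.DataFrame(
        np.arange(4).reshape(2, 2),
        columns=pd.MultiIndex.from_product([["x"], ["a", "b"]]),
    ),
    "object_cells": pd.DataFrame(
        {"o": pd.Series([{"k": 1}, {1, 2}, None], dtype=object)}
    ),
    "named_series": pd.Series([1.0, 2.0], index=["r1", "r2"], name="s"),
}


def check_case(name, dumps=pickle.dumps, loads=pickle.loads):
    original = CASES[name]
    restored = roundtrip(original, dumps, loads)
    # A DataFrame must not come back as a Series (or vice versa).
    assert type(restored) is type(original), name
    if isinstance(original, pd.DataFrame):
        pd.testing.assert_frame_equal(restored, original)
    else:
        pd.testing.assert_series_equal(restored, original)


for case_name in CASES:
    check_case(case_name)
```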
See #5035 for an example of how to add logic for new types to `JsonPlusSerializer`.