-
Couldn't load subscription status.
- Fork 72
Added functions to support IO for Parquet files. #562
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Added test_parquet_io.py.
Fixed minor issue.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #562 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 32 32
Lines 1786 1856 +70
=========================================
+ Hits 1786 1856 +70 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@niksirbi Can you please review this? |
Thanks for your work on this @ShigrafS. Sorry we didn't have time to get to this earlier, as we are busy with other development priorities (which are more urgent) and, in my case, being away attending conferences. I will give you feedback on this when I manage to review it in detail. |
for more information, see https://pre-commit.ci
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this, @ShigrafS.
This is a solid start, but I propose narrowing the scope of this PR to make it more manageable.
Specifically, let’s first focus solely on the from_tidy_df and to_tidy_df functions. I envisage these functions as providing a one-to-one mapping between our xarray dataset format and a “tidy” pandas DataFrame.
The columns I’d expect in such a tidy DataFrame—closely following animovement’s format—are:
time: derived directly from thetimedimension of thexarraydataset, in whatever units are used there (you’re currently usingframe). This means that in thefrom_tidy_dffunction, thefpsparameter will no longer be used to populate thetimecolumn. Instead, it will only be stored in the dataset’sattrsif provided.individual: this is missing from the animovement docs, but we definitely need it to handle multi-animal data. You currently call thistrack_id, but it would be good to match thexarraydimension name.keypointxyz(only if the data is 3D)confidence
For now, please omit the from_animovement_file and to_animovement_file functions. I’d like to discuss some details with animovement’s developer, who is currently on extended leave. We can easily add those functions later once we have the tidy DataFrame functions sorted.
Additionally, please remove any unrelated changes that have crept into other functions (such as the loaders for SLEAP, Anipose, etc.). Perhaps these were inadvertently introduced when merging from the main branch?
The easiest approach might be to close this PR and open a fresh one—starting from the latest main branch—implementing only the necessary tidy DataFrame functions and their corresponding tests.
Thanks again for your effort on this!
|
Thank you for your review @niksirbi I'll follow up on this with a new PR incorporating your suggestions. |
|
Thanks for the update @ShigrafS. In that case, I'll convert this PR to draft for now. Make sure to reference this when you open your new PR. |



Closes #307
Description
What is this PR
Summary
This PR adds Parquet file support to the
movementpackage, enabling it to read and write pose tracking data in the tidy DataFrame format used by the[animovement](https://github.com/roaldarbol/animovement)R package. It enhances interoperability, supports efficient data storage, and simplifies integration with modern data analysis tools.🧩 Related Issue
#307
✨ What's New
✅ Load Functions (
movement/io/load_poses.py)from_tidy_df: Converts a tidy pandas DataFrame into anxarray.Dataset.from_animovement_file: Reads a.parquetfile and converts it usingfrom_tidy_df.from_fileto supportsource_software="animovement".✅ Save Functions (
movement/io/save_poses.py)to_tidy_df: Converts anxarray.Datasetto a tidy DataFrame with optional confidence values.to_animovement_file: Saves a dataset to a.parquetfile viato_tidy_df.✅ Dependency Update
pyarrowtopyproject.tomlto support Parquet I/O via pandas.✅ Tests (
tests/test_parquet_io.py)💡 Why This Matters
animovementpackage.movementcloser to data science best practices.How has this PR been tested?
Local pytest and CI tests.
Is this a breaking change?
If this PR breaks any existing functionality, please explain how and why.
Does this PR require an update to the documentation?
If any features have changed, or have been added. Please explain how the
documentation has been updated.
Checklist: