Added functions to support IO for Parquet files. #562

ShigrafS · 2025-04-27T11:42:27Z

Closes #307

Description

What is this PR

Bug fix
Addition of a new feature
Other

Summary

This PR adds Parquet file support to the movement package, enabling it to read and write pose tracking data in the tidy DataFrame format used by the [animovement](https://github.com/roaldarbol/animovement) R package. It enhances interoperability, supports efficient data storage, and simplifies integration with modern data analysis tools.

🧩 Related Issue

#307

Support tidy dataframe and Parquet I/O to facilitate data exchange with animovement

✨ What's New

✅ Load Functions (`movement/io/load_poses.py`)

Added from_tidy_df: Converts a tidy pandas DataFrame into an xarray.Dataset.
Added from_animovement_file: Reads a .parquet file and converts it using from_tidy_df.
Updated from_file to support source_software="animovement".

✅ Save Functions (`movement/io/save_poses.py`)

Added to_tidy_df: Converts an xarray.Dataset to a tidy DataFrame with optional confidence values.
Added to_animovement_file: Saves a dataset to a .parquet file via to_tidy_df.

✅ Dependency Update

Added pyarrow to pyproject.toml to support Parquet I/O via pandas.

✅ Tests (`tests/test_parquet_io.py`)

Added a new test suite covering:
- Conversion between tidy DataFrames and datasets
- Round-trip accuracy (DataFrame → dataset → DataFrame, and Parquet file round-trips)
- Edge cases like missing data, no confidence, and invalid inputs

💡 Why This Matters

Interoperability: Enables seamless exchange with the animovement package.
Performance: Parquet provides efficient columnar storage and compression.
Usability: Tidy format is ideal for plotting, statistics, and tabular exploration.
Reliability: Comprehensive test coverage ensures stable, correct behavior.
Modernization: Brings movement closer to data science best practices.

How has this PR been tested?

Local pytest and CI tests.

Is this a breaking change?

If this PR breaks any existing functionality, please explain how and why.

Does this PR require an update to the documentation?

If any features have changed, or have been added. Please explain how the
documentation has been updated.

Checklist:

The code has been tested locally
Tests have been added to cover all new functionality
The documentation has been updated to reflect any changes
The code has been formatted with pre-commit

Added test_parquet_io.py.

…et_io.py

Fixed minor issue.

ShigrafS · 2025-04-28T04:21:22Z

@niksirbi @sfmig This PR is ready to be merged.
Kindly review it.

codecov · 2025-04-28T09:00:41Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (fe28a5f) to head (4c8f835).

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #562   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           32        32           
  Lines         1786      1856   +70     
=========================================
+ Hits          1786      1856   +70

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

ShigrafS · 2025-04-28T11:51:15Z

@niksirbi @sfmig I've added a few tests to increase the coverage to 100%.
This should solve the Codecov issue.

ShigrafS · 2025-05-14T14:19:46Z

@niksirbi Can you please review this?

niksirbi · 2025-05-19T12:42:36Z

@niksirbi Can you please review this?

Thanks for your work on this @ShigrafS. Sorry we didn't have time to get to this earlier, as we are busy with other development priorities (which are more urgent) and, in my case, being away attending conferences. I will give you feedback on this when I manage to review it in detail.

for more information, see https://pre-commit.ci

sonarqubecloud · 2025-06-08T10:09:04Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

niksirbi

Thanks for working on this, @ShigrafS.

This is a solid start, but I propose narrowing the scope of this PR to make it more manageable.

Specifically, let’s first focus solely on the from_tidy_df and to_tidy_df functions. I envisage these functions as providing a one-to-one mapping between our xarray dataset format and a “tidy” pandas DataFrame.

The columns I’d expect in such a tidy DataFrame—closely following animovement’s format—are:

time: derived directly from the time dimension of the xarray dataset, in whatever units are used there (you’re currently using frame). This means that in the from_tidy_df function, the fps parameter will no longer be used to populate the time column. Instead, it will only be stored in the dataset’s attrs if provided.
individual: this is missing from the animovement docs, but we definitely need it to handle multi-animal data. You currently call this track_id, but it would be good to match the xarray dimension name.
keypoint
x
y
z (only if the data is 3D)
confidence

For now, please omit the from_animovement_file and to_animovement_file functions. I’d like to discuss some details with animovement’s developer, who is currently on extended leave. We can easily add those functions later once we have the tidy DataFrame functions sorted.

Additionally, please remove any unrelated changes that have crept into other functions (such as the loaders for SLEAP, Anipose, etc.). Perhaps these were inadvertently introduced when merging from the main branch?

The easiest approach might be to close this PR and open a fresh one—starting from the latest main branch—implementing only the necessary tidy DataFrame functions and their corresponding tests.

Thanks again for your effort on this!

ShigrafS · 2025-07-06T10:13:00Z

Thank you for your review @niksirbi
I had been a little preoccupied with other commitments.
I greatly appreciate your review.

I'll follow up on this with a new PR incorporating your suggestions.
Thanks again.

niksirbi · 2025-07-07T19:18:48Z

Thanks for the update @ShigrafS. In that case, I'll convert this PR to draft for now. Make sure to reference this when you open your new PR.

ShigrafS and others added 5 commits April 27, 2025 17:09

Added functions to support IO for Parquet files.

6acd053

Added test_parquet_io.py.

Fixed failing mypy errors.

c2e396c

Replaced np.where with np.nonzero load_poses.py.

f2ee358

Replaced np.random.rand with a numpy.random.default_rng in test_parqu…

4513df6

…et_io.py

Updated pyproject.toml to include pyarrow.

b3558e6

Fixed minor issue.

ShigrafS marked this pull request as ready for review April 27, 2025 17:40

Added tests to increase coverage to 100%.

a8c6c6c

ShigrafS added 2 commits April 30, 2025 21:21

Merge branch 'main' into parquet

642ad7c

Merge branch 'main' into parquet

f5a75fd

niksirbi self-requested a review May 2, 2025 07:34

ShigrafS added 5 commits May 8, 2025 06:49

Merge branch 'main' into parquet

e20964c

Merge branch 'main' into parquet

d41b46f

Merge branch 'main' into parquet

6d35843

Merge branch 'main' into parquet

cc3a97a

Merge branch 'main' into parquet

88937f5

Merge branch 'main' into parquet

dfe4935

ShigrafS and others added 7 commits May 19, 2025 23:44

Merge branch 'main' into parquet

52c17d0

Merge branch 'main' into parquet

6b8c246

Merge branch 'main' into parquet

6e25a7b

Fixed merge conflicts.

b30e542

[pre-commit.ci] auto fixes from pre-commit.com hooks

c9c775b

for more information, see https://pre-commit.ci

Fixed CI errors.

0a087da

Merge branch 'main' into parquet

4c8f835

niksirbi requested changes Jun 11, 2025

View reviewed changes

niksirbi marked this pull request as draft July 7, 2025 19:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Added functions to support IO for Parquet files. #562

Added functions to support IO for Parquet files. #562

Uh oh!

ShigrafS commented Apr 27, 2025 •

edited

Loading

Uh oh!

ShigrafS commented Apr 28, 2025

Uh oh!

codecov bot commented Apr 28, 2025 •

edited

Loading

Uh oh!

ShigrafS commented Apr 28, 2025

Uh oh!

ShigrafS commented May 14, 2025

Uh oh!

niksirbi commented May 19, 2025

Uh oh!

sonarqubecloud bot commented Jun 8, 2025

Uh oh!

niksirbi left a comment

Uh oh!

ShigrafS commented Jul 6, 2025

Uh oh!

niksirbi commented Jul 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Added functions to support IO for Parquet files. #562

Are you sure you want to change the base?

Added functions to support IO for Parquet files. #562

Uh oh!

Conversation

ShigrafS commented Apr 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Summary

🧩 Related Issue

✨ What's New

✅ Load Functions (movement/io/load_poses.py)

✅ Save Functions (movement/io/save_poses.py)

✅ Dependency Update

✅ Tests (tests/test_parquet_io.py)

💡 Why This Matters

How has this PR been tested?

Is this a breaking change?

Does this PR require an update to the documentation?

Checklist:

Uh oh!

ShigrafS commented Apr 28, 2025

Uh oh!

codecov bot commented Apr 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

ShigrafS commented Apr 28, 2025

Uh oh!

ShigrafS commented May 14, 2025

Uh oh!

niksirbi commented May 19, 2025

Uh oh!

sonarqubecloud bot commented Jun 8, 2025

Quality Gate passed

Uh oh!

niksirbi left a comment

Choose a reason for hiding this comment

Uh oh!

ShigrafS commented Jul 6, 2025

Uh oh!

niksirbi commented Jul 7, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ShigrafS commented Apr 27, 2025 •

edited

Loading

✅ Load Functions (`movement/io/load_poses.py`)

✅ Save Functions (`movement/io/save_poses.py`)

✅ Tests (`tests/test_parquet_io.py`)

codecov bot commented Apr 28, 2025 •

edited

Loading