Accept pandas Series/ExtensionArray in add_scratch; lift pandas<3 cap #2208
Pandas 3.0 makes ArrowStringArray the default backing for DataFrame string columns, so df['col'].values is an ExtensionArray rather than an ndarray. NWBFile.add_scratch previously rejected these at the PyNWB docval layer before HDMF's coercion could run. Widen the add_scratch data type tuple and array-branch isinstance check to accept pd.Series and pandas ExtensionArray. HDMF's Data.__init__ normalizes these to numpy via coerce_pandas_data (hdmf-dev/hdmf#1469, HDMF 6.1.0). Lift the pandas<3 cap and bump the minimum HDMF dependency to 6.1.0 so the coercion is available. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
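To see which backing a given pandas installation uses for string columns, a check along the following lines may help. This is a minimal sketch, not part of the PR: `column_backing` is a hypothetical helper, and the import is guarded only so the snippet still loads where pandas is absent. The result depends on the installed pandas version, so no expected output is shown.

```python
try:
    import pandas as pd
except ImportError:  # pandas is optional for this sketch
    pd = None


def column_backing(values):
    """Return "ExtensionArray" for pandas extension arrays, else the type name.

    Hypothetical helper for illustration; it is not a PyNWB or HDMF function.
    """
    if pd is not None and isinstance(values, pd.api.extensions.ExtensionArray):
        return "ExtensionArray"
    return type(values).__name__


if pd is not None:
    df = pd.DataFrame({"col": ["a", "b", "c"]})
    # Per the commit message above, pandas 3 is expected to report an
    # ExtensionArray here, where earlier versions reported an ndarray.
    print(pd.__version__, column_backing(df["col"].values))
```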
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage diff (dev vs. #2208):
| Metric | dev | #2208 | +/- |
|---|---|---|---|
| Coverage | 95.29% | 95.29% | |
| Files | 30 | 30 | |
| Lines | 3038 | 3039 | +1 |
| Branches | 450 | 450 | |
| Hits | 2895 | 2896 | +1 |
| Misses | 87 | 87 | |
| Partials | 56 | 56 | |
Pull request overview
This PR updates PyNWB to be compatible with pandas 3.x by allowing NWBFile.add_scratch to accept pandas.Series and pandas ExtensionArray inputs (including pandas 3’s PyArrow-backed string arrays), relying on HDMF’s newer coercion behavior. It also lifts the pandas<3 dependency cap and raises the minimum HDMF version accordingly.
Changes:
- Expanded `NWBFile.add_scratch` input validation to accept `pd.Series` and `pandas.api.extensions.ExtensionArray`, and updated related docs/error messages.
- Updated dependencies to require `hdmf>=6.1.0` and removed the `pandas<3` cap.
- Added unit tests for `add_scratch` with `pd.Series`, `StringArray`, and (optionally) PyArrow-backed string arrays; updated changelog.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/unit/test_scratch.py | Adds coverage for add_scratch with Series and extension arrays, plus a pyarrow-guarded Arrow string array case. |
| src/pynwb/file.py | Extends add_scratch docval/type checks to accept Series and ExtensionArray inputs and updates messaging/docs. |
| pyproject.toml | Bumps minimum HDMF to 6.1.0 and removes the pandas<3 constraint. |
| CHANGELOG.md | Documents pandas 3 compatibility and the HDMF minimum version bump rationale. |
The pinned and minimum requirement files still pinned hdmf==6.0.2 while pyproject.toml requires hdmf>=6.1.0, so the pinned and minimum tox environments installed a conflicting hdmf and `pip check` failed. Pin both to 6.1.0. hdmf 6.1.0 keeps the same pandas>=1.4.0 and numpy>=1.22.0 floors, so the minimum environment is unaffected. Install pyarrow in the upgraded test environment so test_add_scratch_arrow_extension_array_string runs against pandas 3's default ArrowStringArray instead of being skipped everywhere in CI. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
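The conflict described in this commit comes down to a pinned version falling below the declared minimum. A minimal sketch of that comparison follows, assuming plain dotted numeric versions only; real resolvers follow PEP 440, which this does not implement. `parse_release` and `meets_minimum` are hypothetical helpers, not pip or tox functions.

```python
def parse_release(version):
    """Split a plain dotted version such as "6.1.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(pinned, minimum):
    """Return True when the pinned version satisfies a `>=minimum` constraint."""
    return parse_release(pinned) >= parse_release(minimum)


# Compare the old and new pins against the minimum declared in pyproject.toml.
for pinned in ("6.0.2", "6.1.0"):
    status = "ok" if meets_minimum(pinned, "6.1.0") else "conflict"
    print(f"hdmf=={pinned} against hdmf>=6.1.0: {status}")
```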
Resolve conflicts from dev's migration to PEP 735 dependency groups:
- requirements.txt / requirements-min.txt were deleted on dev (the pinned and minimum tox envs now resolve from pyproject.toml via uv), so the hdmf-pin sync to those files is obsolete; accept the deletions.
- Re-add pyarrow to the upgraded test environment in tox.ini's new dependency-groups structure so the pandas 3 ArrowStringArray scratch test runs in CI.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rly overall this looks good to me, but Copilot suggested a couple of changes that look important. Ping me when I should re-review to approve.
The scratch test skips unless pyarrow is importable, and the deps factor only covered test-upgraded envs, so it was skipped in test-py314-prerelease, where pandas 3 prereleases land. Extend the factor to test-prerelease. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
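The skip behavior described in this commit can be reproduced with the standard library alone. The sketch below is an assumption about the general pattern, not the PR's actual test code: `HAVE_PYARROW`, `ArrowGuardExample`, and `run_guarded_tests` are hypothetical names, and the real test may use a different guard (for example a pytest helper).

```python
import importlib.util
import unittest

# True only when pyarrow can be found in the current environment.
HAVE_PYARROW = importlib.util.find_spec("pyarrow") is not None


class ArrowGuardExample(unittest.TestCase):
    @unittest.skipUnless(HAVE_PYARROW, "pyarrow is not installed")
    def test_needs_pyarrow(self):
        import pyarrow

        # Placeholder assertion; the real test exercises add_scratch instead.
        self.assertTrue(hasattr(pyarrow, "array"))


def run_guarded_tests():
    """Run the guarded test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArrowGuardExample)
    result = unittest.TestResult()
    suite.run(result)
    return result


result = run_guarded_tests()
print("ran:", result.testsRun, "skipped:", len(result.skipped))
```

In an environment without pyarrow the test is reported as skipped rather than failed, which is why it could go unexercised in CI until pyarrow was added to the relevant tox environments.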
Motivation
Pandas 3.0 makes `ArrowStringArray` the default backing for every DataFrame string column, so `df['col'].values` is now an `ExtensionArray` rather than an `ndarray`. `NWBFile.add_scratch` rejected such values at the PyNWB `docval` layer before HDMF's coercion could run, and PyNWB capped `pandas<3`.
The HDMF side was fixed in hdmf-dev/hdmf#1469 (shipped in HDMF 6.1.0), which centralizes coercion of `pd.Series`/`ExtensionArray` to numpy at the `Data.__init__`/`Data.extend` boundary. Because PyNWB never overrides the `array_data` macro, that fix propagates automatically to `TimeSeries` subclasses, `add_unit`, `add_electrode`, `DynamicTable.from_dataframe`, etc. The only PyNWB-specific surface needing a change is `add_scratch`, which uses its own hand-rolled type tuple.
Context: hdmf-dev/hdmf#1384, hdmf-dev/hdmf#1469.
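The ordering problem in the motivation (an outer type gate rejecting input before an inner coercion step can normalize it) can be illustrated without PyNWB or HDMF. The sketch below uses hypothetical stand-ins throughout: `FakeExtensionArray`, `validate`, `coerce`, and `add_scratch_sketch` are not the real `docval`, `coerce_pandas_data`, or `add_scratch`, and plain lists stand in for numpy arrays.

```python
class FakeExtensionArray:
    """Hypothetical stand-in for a pandas ExtensionArray (not the real class)."""

    def __init__(self, values):
        self._values = list(values)

    def to_list(self):
        return list(self._values)


# Type tuples checked by the outer validation layer.
NARROW_TYPES = (list, tuple)  # before: extension arrays are rejected
WIDENED_TYPES = NARROW_TYPES + (FakeExtensionArray,)  # after: they are admitted


def validate(data, allowed_types):
    """Outer gate: raise before any coercion if the type is not allowed."""
    if not isinstance(data, allowed_types):
        raise TypeError(f"unsupported data type: {type(data).__name__}")
    return data


def coerce(data):
    """Inner layer: normalize every accepted input to a plain list."""
    if isinstance(data, FakeExtensionArray):
        return data.to_list()
    return list(data)


def add_scratch_sketch(data, allowed_types):
    """Validate first, then coerce, mirroring the order described above."""
    return coerce(validate(data, allowed_types))
```

With `NARROW_TYPES` the call raises `TypeError` for a `FakeExtensionArray` even though `coerce` could have handled it; with `WIDENED_TYPES` the same input reaches `coerce` and comes back as a list. Widening the gate while leaving coercion in the inner layer is the shape of the change this PR describes.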
How to test the behavior
Changes
- `NWBFile.add_scratch`: accept `pd.Series` and `pandas.api.extensions.ExtensionArray` in the `data` docval type tuple and the array-branch `isinstance` check; updated docstring and error messages. Coercion to numpy happens in HDMF's `Data.__init__`, not in PyNWB.
- `pyproject.toml`: lifted the `pandas<3` cap and bumped the minimum HDMF dependency to `>=6.1.0`.
- `tests/unit/test_scratch.py`: added tests passing `pd.Series` (numeric and string), `StringArray`, and (pyarrow-guarded) `ArrowStringArray`; updated the widened error-message assertions.
Verified the full unit suite passes under pandas 3.0.2 + pyarrow + HDMF with #1469.
Checklist
`ruff` from the source directory.
🤖 Generated with Claude Code