Accept pandas Series/ExtensionArray in add_scratch; lift pandas<3 cap #2208

Merged
rly merged 7 commits into dev from fix/pandas-3-compat on Jun 27, 2026
Conversation

@rly rly commented Jun 26, 2026

Contributor

Motivation

Pandas 3.0 makes ArrowStringArray the default backing for DataFrame string columns when pyarrow is installed, so df['col'].values is now an ExtensionArray rather than an ndarray. NWBFile.add_scratch rejected such values at the PyNWB docval layer before HDMF's coercion could run, which is why PyNWB capped its dependency at pandas<3.

The HDMF side was fixed in hdmf-dev/hdmf#1469 (shipped in HDMF 6.1.0), which centralizes coercion of pd.Series/ExtensionArray to numpy at the Data.__init__/Data.extend boundary. Because PyNWB never overrides the array_data macro, that fix propagates automatically to TimeSeries subclasses, add_unit, add_electrode, DynamicTable.from_dataframe, etc. The only PyNWB-specific surface needing a change is add_scratch, which uses its own hand-rolled type tuple.
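As a rough illustration of the coercion described above, the following is a minimal sketch of normalizing pandas inputs to numpy at a single boundary. The helper name `coerce_to_numpy` is hypothetical and is not HDMF's actual implementation; it only assumes that `pd.Series` and pandas `ExtensionArray` both expose `to_numpy()`.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


def coerce_to_numpy(data):
    """Hypothetical helper (not HDMF's code): convert pandas containers to numpy.

    Anything that is not a pd.Series or pandas ExtensionArray is returned
    unchanged, so lists, tuples, and ndarrays pass straight through.
    """
    if isinstance(data, (pd.Series, ExtensionArray)):
        return data.to_numpy()
    return data


numeric = coerce_to_numpy(pd.Series([1, 2, 3]))
strings = coerce_to_numpy(pd.array(["cat", "dog"], dtype="string"))
print(type(numeric).__name__, type(strings).__name__)
```

Doing the conversion in one place means callers further up the stack only need to let these types through their own validation.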

Context: hdmf-dev/hdmf#1384, hdmf-dev/hdmf#1469.

How to test the behavior

import pandas as pd
from pynwb import NWBFile
from datetime import datetime, timezone

nwb = NWBFile(session_description='x', identifier='y',
              session_start_time=datetime.now(timezone.utc))
df = pd.DataFrame({'animal': ['cat', 'dog']})
nwb.add_scratch(df['animal'].values, name='a', description='d')  # ArrowStringArray on pandas 3
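To see which kind of object `.values` returns in a given environment without involving PyNWB, a small check like the one below may help. It is only a diagnostic sketch: the result depends on the installed pandas version and on whether pyarrow is available, so no specific output is assumed.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

df = pd.DataFrame({"animal": ["cat", "dog"]})
values = df["animal"].values

# Which branch is taken depends on the pandas version and on whether
# pyarrow is installed, so no particular result is assumed here.
if isinstance(values, np.ndarray):
    kind = "numpy ndarray"
elif isinstance(values, ExtensionArray):
    kind = "pandas ExtensionArray"
else:
    kind = "other"

print(f"pandas {pd.__version__}: {type(values).__name__} ({kind})")
```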

Changes

  • NWBFile.add_scratch: accept pd.Series and pandas.api.extensions.ExtensionArray in the data docval type tuple and the array-branch isinstance check; updated docstring and error messages. Coercion to numpy happens in HDMF's Data.__init__, not in PyNWB.
  • pyproject.toml: lifted the pandas<3 cap and bumped the minimum HDMF dependency to >=6.1.0.
  • tests/unit/test_scratch.py: added tests passing pd.Series (numeric and string), StringArray, and (pyarrow-guarded) ArrowStringArray; updated the widened error-message assertions.
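The effect of widening the type tuple can be sketched in isolation. The tuple names and the `accepted` helper below are hypothetical, and the "before" tuple is an assumption for illustration; the actual tuple in src/pynwb/file.py may contain different members.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

# Hypothetical tuples; the real ones in src/pynwb/file.py may differ.
ARRAY_TYPES_BEFORE = (np.ndarray, list, tuple)
ARRAY_TYPES_AFTER = ARRAY_TYPES_BEFORE + (pd.Series, ExtensionArray)


def accepted(data, allowed):
    """Return True if data passes an isinstance-based type check."""
    return isinstance(data, allowed)


values = pd.array(["cat", "dog"], dtype="string")
print(accepted(values, ARRAY_TYPES_BEFORE))  # False
print(accepted(values, ARRAY_TYPES_AFTER))   # True
```

The check only decides what is let through; under this design the conversion to numpy is still left to the downstream layer.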

Verified the full unit suite passes under pandas 3.0.2 + pyarrow + HDMF with hdmf-dev/hdmf#1469.

Checklist

  • Did you update CHANGELOG.md with your changes?
  • Have you checked our Contributing document?
  • Have you ensured the PR clearly describes the problem and the solution?
  • Is your contribution compliant with our coding style? This can be checked by running ruff from the source directory.
  • Have you checked to ensure that there aren't other open or previously closed Pull Requests for the same change?
  • Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

🤖 Generated with Claude Code

rly and others added 2 commits June 26, 2026 02:51
Pandas 3.0 makes ArrowStringArray the default backing for DataFrame string
columns, so df['col'].values is an ExtensionArray rather than an ndarray.
NWBFile.add_scratch previously rejected these at the PyNWB docval layer
before HDMF's coercion could run.

Widen the add_scratch data type tuple and array-branch isinstance check to
accept pd.Series and pandas ExtensionArray. HDMF's Data.__init__ normalizes
these to numpy via coerce_pandas_data (hdmf-dev/hdmf#1469, HDMF 6.1.0).

Lift the pandas<3 cap and bump the minimum HDMF dependency to 6.1.0 so the
coercion is available.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
codecov Bot commented Jun 26, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.29%. Comparing base (dc3ff5e) to head (ac81125).

Additional details and impacted files
@@           Coverage Diff           @@
##              dev    #2208   +/-   ##
=======================================
  Coverage   95.29%   95.29%           
=======================================
  Files          30       30           
  Lines        3038     3039    +1     
  Branches      450      450           
=======================================
+ Hits         2895     2896    +1     
  Misses         87       87           
  Partials       56       56           
| Flag | Coverage Δ |
|---|---|
| integration | 73.14% <100.00%> (+<0.01%) ⬆️ |
| unit | 85.98% <100.00%> (+<0.01%) ⬆️ |

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@rly rly requested a review from oruebel June 26, 2026 09:56
@rly rly enabled auto-merge (squash) June 26, 2026 09:56
@oruebel oruebel requested a review from Copilot June 26, 2026 18:47

Copilot AI left a comment

Contributor

Pull request overview

This PR updates PyNWB to be compatible with pandas 3.x by allowing NWBFile.add_scratch to accept pandas.Series and pandas ExtensionArray inputs (including pandas 3’s PyArrow-backed string arrays), relying on HDMF’s newer coercion behavior. It also lifts the pandas<3 dependency cap and raises the minimum HDMF version accordingly.

Changes:

  • Expanded NWBFile.add_scratch input validation to accept pd.Series and pandas.api.extensions.ExtensionArray, and updated related docs/error messages.
  • Updated dependencies to require hdmf>=6.1.0 and removed the pandas<3 cap.
  • Added unit tests for add_scratch with pd.Series, StringArray, and (optionally) PyArrow-backed string arrays; updated changelog.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| tests/unit/test_scratch.py | Adds coverage for add_scratch with Series and extension arrays, plus a pyarrow-guarded Arrow string array case. |
| src/pynwb/file.py | Extends add_scratch docval/type checks to accept Series and ExtensionArray inputs and updates messaging/docs. |
| pyproject.toml | Bumps minimum HDMF to 6.1.0 and removes the pandas<3 constraint. |
| CHANGELOG.md | Documents pandas 3 compatibility and the HDMF minimum version bump rationale. |

Comment thread pyproject.toml
Comment thread tests/unit/test_scratch.py
rly and others added 3 commits June 26, 2026 14:19
The pinned and minimum requirement files still pinned hdmf==6.0.2 while
pyproject.toml requires hdmf>=6.1.0, so the pinned and minimum tox
environments installed a conflicting hdmf and `pip check` failed. Pin both
to 6.1.0. hdmf 6.1.0 keeps the same pandas>=1.4.0 and numpy>=1.22.0 floors,
so the minimum environment is unaffected.

Install pyarrow in the upgraded test environment so
test_add_scratch_arrow_extension_array_string runs against pandas 3's
default ArrowStringArray instead of being skipped everywhere in CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Resolve conflicts from dev's migration to PEP 735 dependency groups:
- requirements.txt / requirements-min.txt were deleted on dev (the pinned
  and minimum tox envs now resolve from pyproject.toml via uv), so the
  hdmf-pin sync to those files is obsolete; accept the deletions.
- Re-add pyarrow to the upgraded test environment in tox.ini's new
  dependency-groups structure so the pandas 3 ArrowStringArray scratch
  test runs in CI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
oruebel commented Jun 26, 2026

Contributor

@rly Overall this looks good to me, but Copilot suggested a couple of changes that look important. Ping me when I should re-review and approve.

The scratch test skips unless pyarrow is importable, and the deps factor
only covered test-upgraded envs, so it was skipped in test-py314-prerelease,
where pandas 3 prereleases land. Extend the factor to test-prerelease.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
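The skip-unless-importable behavior described in the commit message above can be sketched with the standard library's `unittest`. The class and test names are hypothetical and this is not the test from tests/unit/test_scratch.py; it only shows why such a test is silently skipped in any environment that lacks pyarrow.

```python
import importlib.util
import unittest

import pandas as pd

# True only when pyarrow can be imported in the current environment.
HAVE_PYARROW = importlib.util.find_spec("pyarrow") is not None


class TestArrowStrings(unittest.TestCase):
    """Hypothetical test case illustrating a pyarrow guard."""

    @unittest.skipUnless(HAVE_PYARROW, "pyarrow is not installed")
    def test_arrow_string_array(self):
        # Explicitly request pyarrow-backed string storage.
        arr = pd.array(["cat", "dog"], dtype="string[pyarrow]")
        self.assertEqual(list(arr.to_numpy()), ["cat", "dog"])


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestArrowStrings)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

A skipped test still counts as a successful run, which is why the guard needs pyarrow installed in at least one CI environment to exercise the code path at all.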
@rly rly merged commit f7274a6 into dev Jun 27, 2026
25 of 26 checks passed
@rly rly deleted the fix/pandas-3-compat branch June 27, 2026 01:12