Accept pandas Series/ExtensionArray for Data; lift pandas<3 cap by rly · Pull Request #1469 · hdmf-dev/hdmf

rly · 2026-05-04T19:28:26Z

Summary

Accept pd.Series and pandas.api.extensions.ExtensionArray (incl. StringArray/ArrowStringArray) as data in Data and its subclasses, normalizing to numpy at the Data.__init__/Data.extend boundary so every subclass (VectorData, VectorIndex, ScratchData, ElementIdentifiers, …) picks up the fix without per-class changes.
Reject pd.NA/NaN-bearing pandas input with an informative TypeError (it would otherwise crash at HDF5 vlen-string write time). Inputs without missing values convert to their natural numpy dtype.
Lift the pandas<3 cap from pyproject.toml.
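
The boundary-normalization idea above can be sketched with toy classes. This is a minimal illustration under stated assumptions, not HDMF's implementation: ToyData, ToyVectorData, and _to_numpy_if_pandas are hypothetical names, and the real Data class does considerably more (docval validation, missing-value checks).

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


def _to_numpy_if_pandas(data):
    """Hypothetical normalizer: pandas containers become numpy arrays."""
    if isinstance(data, (pd.Series, ExtensionArray)):
        return np.asarray(data)
    return data


class ToyData:
    """Toy stand-in for a base container; this is not hdmf.Data."""

    def __init__(self, name, data):
        self.name = name
        # Normalize once, at the construction boundary.
        self.data = _to_numpy_if_pandas(data)

    def extend(self, values):
        # The same normalization guards the second entry point.
        values = _to_numpy_if_pandas(values)
        self.data = np.concatenate([np.asarray(self.data), np.asarray(values)])


class ToyVectorData(ToyData):
    """Subclass inherits the normalization without any code of its own."""


column = ToyVectorData(name="label", data=pd.Series(["a", "b"]))
column.extend(pd.array(["c"], dtype="string"))
print(type(column.data).__name__, column.data.tolist())
```

Because the subclass defines no methods, any fix applied in the base class's two entry points reaches it automatically, which is the property the summary relies on.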

Why

Pandas 3.0 makes a dedicated string dtype the default for DataFrame string columns, backed by PyArrow when PyArrow is installed. In that case df['col'].values is now an ArrowStringArray, so VectorData(name=..., data=df['col'].values) (and any other typical user pattern that hands HDMF a string column) fails docval type validation. Centralizing the fix at the Data construction boundary means VectorData, add_unit, add_electrode, from_dataframe, etc. all keep working with no further changes.
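
The type change described above can be inspected with a few lines. This is a sketch only: the printed class name depends on the installed pandas version and on whether PyArrow is available, so no output is shown.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

df = pd.DataFrame({"col": ["a", "b", "c"]})
values = df["col"].values

# On older pandas this is typically a numpy object array; on pandas 3
# with PyArrow installed it is expected to be an ExtensionArray instead.
is_extension = isinstance(values, ExtensionArray)
print(type(values).__name__, is_extension)

# Either way, the values can be materialized as a numpy object array.
as_numpy = np.asarray(values, dtype=object)
print(as_numpy.tolist())
```

A validator that only lists np.ndarray, list, and tuple as accepted types would reject the ExtensionArray case even though the conversion to numpy is straightforward.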

Behavior

ArrowStringArray, StringArray, pd.Series (any backing dtype), pd.Categorical, and nullable numeric/boolean dtypes (Int64, boolean, Float64) without missing values → converted to np.ndarray at their natural numpy dtype.
pandas input containing pd.NA or NaN → TypeError pointing at the missing-values cause and asking the user to fill with a sentinel. (This covers nullable dtypes with NAs, the only case where .to_numpy() would silently change the dtype.)
Non-pandas inputs pass through unchanged; no behavior change for existing callers.
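
The three behaviors above can be sketched in one small function. This is a hedged approximation, not the PR's coerce_pandas_data: the name coerce_pandas_like and the error wording are hypothetical, and the sketch does not attempt the native-dtype handling for nullable numeric arrays.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


def coerce_pandas_like(data):
    """Hypothetical sketch: numpy for pandas input, everything else untouched."""
    if not isinstance(data, (pd.Series, ExtensionArray)):
        # Non-pandas input passes through unchanged.
        return data
    if bool(pd.isna(data).any()):
        # Missing values are rejected up front instead of failing at write time.
        raise TypeError(
            "pandas input contains missing values (pd.NA/NaN); "
            "fill them with a sentinel value before passing the data"
        )
    return np.asarray(data)


strings = coerce_pandas_like(pd.Series(["a", "b"]))
categories = coerce_pandas_like(pd.Categorical(["x", "y", "x"]))
untouched = coerce_pandas_like([1, 2, 3])

try:
    coerce_pandas_like(pd.Series(["a", None]))
except TypeError as err:
    print("rejected:", err)
```

Checking for missing values before converting keeps the error close to its cause, which is the design choice the Behavior section describes.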

Verification

Reproducer from #1384 (Pandas 3.0 String Type Compatibility Breaking HDMF Data Ingestion) now succeeds.
HDF5 roundtrip on DynamicTable.from_dataframe(df=...) with pandas 3.0.2 default string columns works end-to-end.
Full unit suite on a pandas 3.0.2 environment: 1804 passed, 117 skipped, 1 xfailed, 0 failed.

Test plan

Unit tests for coerce_pandas_data covering StringArray, ArrowStringArray, plain numeric Series, Categorical, no-NA nullable int/bool, and NA-bearing inputs.
End-to-end test through VectorData for both Series and df.values paths.
Manual HDF5 roundtrip with pandas 3.0.
CI passes on Python 3.10–3.13 with pandas 1.4 (lower bound), pandas 2.x, and pandas 3.x.
- https://github.com/hdmf-dev/hdmf/actions/runs/28064516378

🤖 Generated with Claude Code

…pat)

Pandas 3.0 makes PyArrow-backed strings the default for DataFrame string columns, so df['col'].values is now ArrowStringArray and constructing VectorData(data=...) fails type validation. Add pd.Series and pandas.api.extensions.ExtensionArray to the array_data macro and coerce to numpy at the Data construction boundary so every Data subclass picks up the fix without per-class changes. Reject pd.NA/NaN with an informative TypeError (HDF5 vlen-string writes already crash on these) and reject IntegerArray/BooleanArray/FloatingArray to avoid silent dtype widening on .to_numpy(). Lift the pandas<3 cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov · 2026-05-04T19:29:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.23%. Comparing base (af7879a) to head (849c1d6).

Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #1469      +/-   ##
==========================================
+ Coverage   93.22%   93.23%   +0.01%     
==========================================
  Files          41       41              
  Lines       10224    10242      +18     
  Branches     2109     2114       +5     
==========================================
+ Hits         9531     9549      +18     
  Misses        417      417              
  Partials      276      276


# Conflicts: # src/hdmf/utils.py

Drop the IntegerArray/BooleanArray/FloatingArray rejection in coerce_pandas_data. For inputs without missing values these convert losslessly to their natural numpy dtype, and the missing-values guard already rejects the only case where .to_numpy() would change the dtype. This also makes masked-nullable inputs behave like arrow-backed nullable inputs, which already passed through.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The array_data docval macro now lists pandas ExtensionArray, so autodoc-generated signatures reference pandas.ExtensionArray. pandas documents this class as pandas.api.extensions.ExtensionArray, so the short name has no intersphinx target and sphinx-build -W fails. Ignore it the same way as the other unresolved external classes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
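
One common way to silence an unresolved cross-reference like the one described in this commit message is Sphinx's nitpick_ignore setting. The fragment below is a sketch that assumes the project runs Sphinx in nitpicky mode and suppresses references this way; the actual entry and its location in HDMF's conf.py may differ.

```python
# Sphinx conf.py fragment (illustrative; placement in the real file is an assumption)

# Assumption: nitpicky mode is on, so unresolved references produce warnings,
# which `sphinx-build -W` then turns into errors.
nitpicky = True

# Each entry is a (role type, target) pair that Sphinx should not warn about.
nitpick_ignore = [
    ("py:class", "pandas.ExtensionArray"),
]
```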

# Conflicts: # CHANGELOG.md

coerce_pandas_data converted nullable masked arrays (Int64, boolean, Float64) via to_numpy()/np.asarray(), which returns an object array on pandas < 2.2 and the native dtype only on pandas >= 2.2. Convert through the dtype's backing numpy_dtype instead, so the result keeps its native dtype (int64, bool, float64) on the full supported pandas range, including the 1.4 lower bound exercised by the minimum CI env.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
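
The conversion described in this commit message can be sketched as follows. The helper name masked_to_native is hypothetical and the code is an approximation of the idea, not the code in the PR; it assumes the input is a masked nullable array whose dtype exposes numpy_dtype.

```python
import numpy as np
import pandas as pd


def masked_to_native(values):
    """Hypothetical helper: masked nullable array -> numpy array of native dtype."""
    if values.isna().any():
        # The native numpy dtype cannot represent pd.NA, so refuse early.
        raise TypeError("missing values cannot be stored in the native numpy dtype")
    # Request the backing dtype explicitly instead of relying on the
    # version-dependent default of to_numpy().
    return values.to_numpy(dtype=values.dtype.numpy_dtype)


ints = masked_to_native(pd.array([1, 2, 3], dtype="Int64"))
flags = masked_to_native(pd.array([True, False], dtype="boolean"))
floats = masked_to_native(pd.array([0.5, 1.5], dtype="Float64"))
print(ints.dtype, flags.dtype, floats.dtype)
```

Passing the dtype explicitly makes the result independent of the pandas version's default, which is the point of the change.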

rly marked this pull request as draft May 4, 2026 19:29

Merge branch 'dev' into fix/pandas-3-compat

2788270

rly mentioned this pull request May 5, 2026

Pandas 3.0 String Type Compatibility Breaking HDMF Data Ingestion #1384 (Open)

rly and others added 3 commits June 23, 2026 09:19

Merge branch 'dev' into fix/pandas-3-compat

4eb3992

# Conflicts: # src/hdmf/utils.py

rly marked this pull request as ready for review June 23, 2026 22:34

rly and others added 3 commits June 23, 2026 15:36

Merge remote-tracking branch 'origin/dev' into fix/pandas-3-compat

1292935

# Conflicts: # CHANGELOG.md

Merge branch 'dev' into fix/pandas-3-compat

849c1d6

rly requested a review from oruebel June 23, 2026 23:47

rly wants to merge 8 commits into dev from fix/pandas-3-compat


1 participant
