BUG: `pivot_table` drops rows and columns despite values being non-`NaN` #61113

it176131 · 2025-03-12T23:41:34Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np
import pandas as pd
import pandas._testing as tm


def test_pivot_table_index_and_column_with_nan() -> None:
    """Index and columns should exist if any non-null values.

    Input data
    ----------
        row  col  val
    0  NaN  0.0    0
    1  0.0  1.0    1
    2  1.0  2.0    2
    3  2.0  3.0    3
    4  3.0  NaN    4

    Expected output
    ---------------
    col  0.0  1.0  2.0  3.0  NaN
    row                         
    NaN  0.0  NaN  NaN  NaN  NaN
    0.0  NaN  1.0  NaN  NaN  NaN
    1.0  NaN  NaN  2.0  NaN  NaN
    2.0  NaN  NaN  NaN  3.0  NaN
    3.0  NaN  NaN  NaN  NaN  4.0
    """
    data = {
        "row": [None, *range(4)],
        "col": [*range(4), None],
        "val": range(5)
    }
    df = pd.DataFrame(data)
    actual = df.pivot_table(values="val", index="row", columns="col")
    e_index = [None, *range(4)]
    e_columns = [*range(4), None]
    e_data = np.zeros(shape=(5, 5))
    e_data.fill(np.NaN)
    np.fill_diagonal(a=e_data, val=range(5))
    expected = pd.DataFrame(
        data=e_data,
        index=pd.Index(data=e_index, name="row"),
        columns=pd.Index(data=e_columns, name="col")
    )
    tm.assert_frame_equal(left=actual, right=expected)  # fails

Issue Description

Rows and columns are unexpectedly dropped while creating a pivot table with a single NaN index label and a single NaN column label and no NaN values in the input data.

Input data

   row  col  val
0  NaN  0.0    0
1  0.0  1.0    1
2  1.0  2.0    2
3  2.0  3.0    3
4  3.0  NaN    4

Actual output

col  1.0  2.0  3.0
row               
0.0  1.0  NaN  NaN
1.0  NaN  2.0  NaN
2.0  NaN  NaN  3.0

Expected Behavior

The docs for dropna state:

Do not include columns whose entries are all NaN. If True, rows with a NaN value in any column will be omitted before computing margins.

Given that dropna=True by default, I'd expect all columns and rows to be present as each has at least one non-NaN value.

Expected output

col  0.0  1.0  2.0  3.0  NaN
row                         
NaN  0.0  NaN  NaN  NaN  NaN
0.0  NaN  1.0  NaN  NaN  NaN
1.0  NaN  NaN  2.0  NaN  NaN
2.0  NaN  NaN  NaN  3.0  NaN
3.0  NaN  NaN  NaN  NaN  4.0

Installed Versions

INSTALLED VERSIONS

commit : 3c93d06
python : 3.13.2
python-bits : 64
OS : Windows
OS-release : 11
Version : 10.0.22631
machine : AMD64
processor : AMD64 Family 25 Model 116 Stepping 1, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 3.0.0.dev0+2007.g3c93d06d64
numpy : 1.26.4
dateutil : 2.9.0.post0
pip : 25.0.1
Cython : 3.0.12
sphinx : 8.1.3
IPython : 9.0.2
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.13.3
blosc : None
bottleneck : 1.4.2
fastparquet : 2024.11.0
fsspec : 2025.3.0
html5lib : 1.1
hypothesis : 6.129.0
gcsfs : 2025.3.0
jinja2 : 3.1.6
lxml.etree : 5.3.1
matplotlib : 3.10.1
numba : 0.61.0
numexpr : 2.10.2
odfpy : None
openpyxl : 3.1.5
psycopg2 : 2.9.10
pymysql : 1.4.6
pyarrow : 19.0.1
pyreadstat : 1.2.8
pytest : 8.3.5
python-calamine : None
pytz : 2025.1
pyxlsb : 1.0.10
s3fs : 2025.3.0
scipy : 1.15.2
sqlalchemy : 2.0.39
tables : 3.10.2
tabulate : 0.9.0
xarray : 2024.9.0
xlrd : 2.0.1
xlsxwriter : 3.2.2
zstandard : 0.23.0
tzdata : 2025.1
qtpy : None
pyqt5 : None

The text was updated successfully, but these errors were encountered:

it176131 · 2025-03-12T23:46:05Z

Strangely, pivot almost produces the expected output (the columns are sorted with NaN first):

data = {
    "row": [None, *range(4)],
    "col": [*range(4), None],
    "val": range(5)
}
df = pd.DataFrame(data)
out = df.pivot(index="col", columns="row", values="val")
print(out)

col  NaN  0.0  1.0  2.0  3.0
row                         
NaN  NaN  0.0  NaN  NaN  NaN
0.0  NaN  NaN  1.0  NaN  NaN
1.0  NaN  NaN  NaN  2.0  NaN
2.0  NaN  NaN  NaN  NaN  3.0
3.0  4.0  NaN  NaN  NaN  NaN

snitish · 2025-03-13T04:18:48Z

It looks like dropna is used in multiple places inside __internal_pivot_table:

In the groupby() which means that NaN keys (either in index (row) or columns (col)) will be dropped.

grouped = data.groupby(keys, observed=observed, sort=sort, dropna=dropna)

For dropping all-NaN rows and columns

if dropna and ...
    agged = agged.dropna(how="all")

Due to the usage in 1 above, your result df doesn't contain cases where row or col is NaN. It looks like this behavior is not documented well.
Seems this is a known issue: #53521

it176131 · 2025-03-13T14:05:37Z

@snitish

Thanks for the reply! What options should I explore to fix and/or improve this? I will look at #53521 to see if suggestions have already been made there as well.

snitish · 2025-03-13T19:12:03Z

I'll let @rhshadrach chime in

rhshadrach · 2025-03-22T11:49:22Z

Indeed, this seems duplicative of #53521. For now I recommend using dropna=False followed by dropping all NA rows / columns in the output. Does that satisfy your use case?

it176131 · 2025-03-25T16:20:01Z

@rhshadrach I was actually looking to keep all the NA rows and columns in the output as in the expected output above. dropna=False does this, but wondering if I should update the docs in pivot_table so users know that dropna=True does more than what it says in the current docs. Maybe something like the following:

    dropna : bool, default True
         Do not include columns whose entries are all NaN. If True,
-        rows with a NaN value in any column will be omitted before
-        computing margins.
+        - rows with a NaN value in any column will be omitted before computing margins
+        - index/column keys containing NA values will be dropped (see ``dropna`` parameter in ``pandas.DataFrame.groupby``)

Shared groupby docs for dropna

rhshadrach · 2025-03-25T21:24:29Z

I'm +1 on adding to the docstring as suggested.

it176131 · 2025-03-26T00:06:10Z

I'll link a PR

it176131 · 2025-03-27T20:01:19Z

@rhshadrach PR has been linked

it176131 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 12, 2025

rhshadrach added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 22, 2025

rhshadrach added Docs and removed Bug Needs Info Clarification about behavior needed to assess issue labels Mar 25, 2025

it176131 linked a pull request Mar 26, 2025 that will close this issue

Doc pivot table #61184

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: `pivot_table` drops rows and columns despite values being non-`NaN` #61113

BUG: `pivot_table` drops rows and columns despite values being non-`NaN` #61113

it176131 commented Mar 12, 2025 •

edited

Loading

INSTALLED VERSIONS

it176131 commented Mar 12, 2025 •

edited

Loading

snitish commented Mar 13, 2025 •

edited

Loading

it176131 commented Mar 13, 2025

snitish commented Mar 13, 2025

rhshadrach commented Mar 22, 2025

it176131 commented Mar 25, 2025

rhshadrach commented Mar 25, 2025

it176131 commented Mar 26, 2025

it176131 commented Mar 27, 2025

BUG: pivot_table drops rows and columns despite values being non-NaN #61113

BUG: pivot_table drops rows and columns despite values being non-NaN #61113

Comments

it176131 commented Mar 12, 2025 • edited Loading

Pandas version checks

Reproducible Example

Issue Description

Input data

Actual output

Expected Behavior

Expected output

Installed Versions

INSTALLED VERSIONS

it176131 commented Mar 12, 2025 • edited Loading

snitish commented Mar 13, 2025 • edited Loading

it176131 commented Mar 13, 2025

snitish commented Mar 13, 2025

rhshadrach commented Mar 22, 2025

it176131 commented Mar 25, 2025

rhshadrach commented Mar 25, 2025

it176131 commented Mar 26, 2025

it176131 commented Mar 27, 2025

BUG: `pivot_table` drops rows and columns despite values being non-`NaN` #61113

BUG: `pivot_table` drops rows and columns despite values being non-`NaN` #61113

it176131 commented Mar 12, 2025 •

edited

Loading

it176131 commented Mar 12, 2025 •

edited

Loading

snitish commented Mar 13, 2025 •

edited

Loading