BUG: fillna('') does not replace NaT #11953

Open
jreback opened this issue Jan 4, 2016 · 17 comments · May be fixed by #61149
Labels
Bug Datetime Datetime data dtype Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

Comments

@jreback
Contributor

jreback commented Jan 4, 2016

pandas generally tries to coerce values to fit the column dtype, or upcasts the dtype to fit.

For a setting operation this is convenient and, I think, what a user would expect:

In [35]: df = DataFrame({'A' : Series(dtype='M8[ns]'), 'B' : Series([np.nan],dtype='object'), 'C' : ['foo'], 'D' : [1]})

In [36]: df
Out[36]:
    A    B    C  D
0 NaT  NaN  foo  1

In [37]: df.dtypes
Out[37]:
A    datetime64[ns]
B            object
C            object
D             int64
dtype: object

In [38]: df.loc[0,'D'] = 1.0

In [39]: df.dtypes
Out[39]:
A    datetime64[ns]
B            object
C            object
D           float64
dtype: object

However, for a .fillna (or .replace) operation this might be a bit unexpected. Here A is coerced to object dtype, even though it was datetime64[ns]:

In [40]: df.fillna('')
Out[40]:
  A B    C  D
0      foo  1

In [41]: df.fillna('').dtypes
Out[41]:
A     object
B     object
C     object
D    float64
dtype: object

So a possibility is to add a keyword errors='raise'|'coerce'|'ignore'. This last behavior would be the equivalent of errors='coerce', while skipping this column would be done with errors='coerce' (and of course 'raise' would raise).

Ideally would have a default of coerce I think (to skip for non-compat values). Any thoughts on this?

@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate API Design labels Jan 4, 2016
@jreback
Contributor Author

jreback commented Jan 4, 2016

cc @ywang007

@ResidentMario
Contributor

xref. #15533

@jreback I think this keyword would be a 👍. It would be a way of reconciling the arguments for and against strict versus lenient validation that are under discussion in PR #15587. Once that PR is merged, this behavior could presumably be added as a single if errors == 'raise': validate_fill_value(obj, value) call.

I think it's worth considering adding similar behavior to methods implementing fill_value. I'm not sure I like that idea, since it feels like a lot of API overhead, but it is worth considering.

@mroeschke
Member

This behavior no longer coerces to object. I suppose it could use a test, orthogonal to the enhancement request.

In [34]: df = DataFrame({'A' : Series(dtype='M8[ns]'), 'B' : Series([np.nan],dtype='object'), 'C' : ['foo'], 'D' : [1]})

In [35]: df.loc[0,'D'] = 1.0

In [36]: df.dtypes
Out[36]:
A    datetime64[ns]
B            object
C            object
D             int64
dtype: object

In [37]: df.fillna('')
Out[37]:
    A B    C  D
0 NaT    foo  1

In [38]: df.fillna('').dtypes
Out[38]:
A    datetime64[ns]
B            object
C            object
D             int64
dtype: object

In [39]: pd.__version__
Out[39]: '1.3.0.dev0+1383.g855696cde0'

@mroeschke mroeschke added good first issue Needs Tests Unit test(s) needed to prevent regressions and removed API Design Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate labels Apr 21, 2021
@mroeschke
Member

Actually, I think this is a bug and the original behavior was correct. NaT is an "na value" that wasn't replaced by the empty string:

In [1]: df = DataFrame({'A': Series(dtype='M8[ns]'), 'B': Series([np.nan], dtype='object'), 'C': ['foo'], 'D': [1]})

In [2]: df.fillna('')
Out[2]:
    A B    C  D
0 NaT    foo  1

In [3]: df.fillna('').dtypes
Out[3]:
A    datetime64[ns]
B            object
C            object
D             int64
dtype: object

In [4]: df.fillna(2).dtypes
Out[4]:
A     int64
B     int64
C    object
D     int64
dtype: object

In [5]: df.fillna(2)
Out[5]:
   A  B    C  D
0  2  2  foo  1
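
To check whether a given pandas installation shows this behavior, a small helper can report which columns still contain missing values after the fill. This is only a sketch: columns_with_unfilled_nulls is a hypothetical name, not part of pandas, and what it reports depends on the installed version.

```python
import pandas as pd


def columns_with_unfilled_nulls(df, value):
    """Return the names of columns that still hold missing values after df.fillna(value)."""
    filled = df.fillna(value)
    # isna() treats NaN, None and NaT alike, so a surviving NaT is reported here.
    remaining = filled.isna().any()
    return [name for name, has_null in remaining.items() if has_null]


df = pd.DataFrame({"A": [pd.NaT], "C": ["foo"], "D": [1]})
# A non-empty list means some nulls survived the fill; an empty list means all were replaced.
print(columns_with_unfilled_nulls(df, ""))
```

Running the same call with a fill value such as 2 gives a quick comparison between the empty-string case and a value that is known to be filled.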

@mroeschke mroeschke added Bug Datetime Datetime data dtype and removed Needs Tests Unit test(s) needed to prevent regressions good first issue labels May 12, 2021
@mroeschke mroeschke changed the title API: auto-coercing BUG: fillna('') does not replace NaT May 12, 2021
@mroeschke mroeschke added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label May 12, 2021
@eirkkr

eirkkr commented Sep 21, 2021

Hello, just to add to this thread. I have encountered this bug when upgrading pandas from 1.2.5 to 1.3.3 (it looks like this bug was introduced in version 1.3.0).

When using fillna or replace on a datetime series, replacing with the empty string "" does not work. However, with another string, e.g. "hello", it works and coerces the series to object dtype. Interestingly, df.replace({pd.NaT: ""}) also behaves differently from df.replace(pd.NaT, "").

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [pd.NaT]})

In [3]: df.fillna("")
Out[3]:
    A
0 NaT

In [4]: df.fillna("hello")
Out[4]:
       A
0  hello

In [5]: df.replace(pd.NaT, "")
Out[5]:
    A
0 NaT

In [6]: df.replace(pd.NaT, "hello")
Out[6]:
       A
0  hello

In [7]: df.replace({pd.NaT, ""})
Out[7]:
    A
0 NaT

In [8]: df.replace({pd.NaT, "hello"})
Out[8]:
    A
0 NaT

@AvivAvital2

Also reproduced on 1.3.4

@yeyeric

yeyeric commented May 2, 2022

Same here on the latest 1.4.2: df.fillna('') doesn't work with NaT (pd.isnull() gives True though).

df.fillna('something') works...

Very surprising that it has been here since 2016?

@evelynegroen

Same on version 1.4.3: with df = pd.DataFrame({"A": [pd.NaT]}), df.fillna("") does nothing, while df.fillna(" ") replaces NaT with a single space.

@Supertramplee

Same here: NaT still shows after filling with an empty string via df.fillna('').

@mroeschke
Member

The core issue appears to be that the Timestamp constructor interprets the empty string as pd.NaT, and therefore the datetime64 dtype is not upcast to object:

In [8]: pd.Timestamp("")
Out[8]: NaT

In [9]: pd.Timestamp(" ")
ValueError: could not convert string to Timestamp

If the behavior in Out[8] were deprecated so that it no longer returned NaT, this would probably be fixed.
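
Following that explanation, one way to see which fill values are likely to be swallowed is to ask the Timestamp constructor directly. The sketch below is illustrative only: parses_as_nat is a hypothetical helper, not a pandas function, and it assumes that fillna treats a value as "already missing" exactly when pd.Timestamp turns it into NaT.

```python
import pandas as pd


def parses_as_nat(value):
    """Return True if pd.Timestamp(value) yields NaT, False if it raises or gives a real timestamp."""
    try:
        return pd.Timestamp(value) is pd.NaT
    except (ValueError, TypeError):
        # Strings that cannot be parsed as a date at all end up here.
        return False


# Probe a few candidate fill values.
for candidate in ["", " ", "hello"]:
    print(repr(candidate), parses_as_nat(candidate))
```

Values for which this returns True are the ones to be suspicious of when filling a datetime64 column.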

@Masumi-M

Masumi-M commented Dec 8, 2022

This might work as a temporary workaround 👍

import pandas as pd

# 1. Convert the datetime column to strings (missing values stay missing)
df["target"] = df["target"].dt.strftime('%Y-%m-%d %H:%M:%S')

# 2. Fill the missing values with a placeholder date string
replace_datetime_in_str = "2023-01-01 00:00:00"
df["target"] = df["target"].fillna(replace_datetime_in_str)

# 3. Convert the strings back to datetime
df["target"] = pd.to_datetime(df["target"])
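
If the goal is an actual empty string rather than a placeholder date, another possible workaround is to upcast to object explicitly before replacing, so that the fill value never has to be interpreted as a datetime. This is a minimal sketch with a hypothetical helper name (nat_to_empty_string); note that the result is an object column, so the .dt accessor no longer applies to it.

```python
import pandas as pd


def nat_to_empty_string(series):
    """Return an object-dtype copy of `series` with missing values replaced by ''."""
    as_object = series.astype(object)
    # Keep entries where the original is not null; put "" everywhere else.
    return as_object.where(series.notna(), "")


dates = pd.Series(pd.to_datetime(["2023-01-01", None]))
print(nat_to_empty_string(dates).tolist())
```

The original series is left untouched, so the datetime64 version remains available for date arithmetic while the object copy can be used for display or export.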

@ciscorucinski

I'm a novice, but it seems to still be present in 2.0.1

@msingh0101

I'm a novice, but it seems to still be present in 2.0.1

still present

@baptiste-pasquier

There is also a bug when replacing with the string "NAN":

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [pd.NaT]})

In [3]: df.fillna("")
Out[3]: 
    A
0 NaT

In [4]: df.fillna("hello")
Out[4]: 
       A
0  hello

In [5]: df.fillna("NAN")
Out[5]: 
    A
0 NaT

In [6]: df.fillna("NAN_")
Out[6]: 
      A
0  NAN_

@Pesec1

Pesec1 commented Dec 12, 2024

df.fillna(pd.NA) was not filling np.nan and pd.NaT, even though checking those values for missingness came back as True.
This fixed it for me: df.fillna(pd.NA, axis=1).
I have pandas version 2.2.3 and Python version 3.12.7.

@hewliyang

df.fillna(pd.NA) was not filling np.nan and pd.NaT, even though checking those values for missingness came back as True. This fixed it for me: df.fillna(pd.NA, axis=1). I have pandas version 2.2.3 and Python version 3.12.7.

in contrast, filling on axis=0 still does not work (it should) due to the same reason (dtype conflicts)

@Annam679

Annam679 commented Feb 4, 2025

df.fillna(pd.NA) was not filling np.nan and pd.NaT, even though checking those values for missingness came back as True. This fixed it for me: df.fillna(pd.NA, axis=1). I have pandas version 2.2.3 and Python version 3.12.7.

in contrast, filling on axis=0 still does not work (it should) due to the same reason (dtype conflicts)

So for some reason this worked:
df.fillna('')
df.fillna(pd.NaT, axis=1)
df.fillna('')

Literally, without the second fillna it does not work, but with all three it works.

@j-hendricks j-hendricks linked a pull request Mar 20, 2025 that will close this issue