ENH: fillna enhancement with method='nearest' #61124

Open
1 of 3 tasks
mennowitteveen opened this issue Mar 15, 2025 · 1 comment
Labels
Enhancement Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Needs Info Clarification about behavior needed to assess issue

Comments

@mennowitteveen

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

This should be a simple follow-up to #9471, enabling support for alignment with method='nearest'.

Since fillna internally uses interpolate, which already supports method='nearest', this might work right away, though it will require extensive testing.

Feature Description

The new feature could be implemented by extending the current alignment functionality in pandas to support method='nearest'. This would let the user align two Series or DataFrames by their indices, taking the nearest available value where there is no exact match. Here's a basic idea of how it could be implemented:

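def align_nearest(df1, df2):
    # Reindex df1 onto df2's index; where a label has no exact match,
    # take the row at the nearest label of df1's index.
    # Note: reindex(method='nearest') requires df1's index to be monotonic.
    return df1.reindex(df2.index, method='nearest')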

This functionality could be added as a method to the existing pandas.DataFrame and pandas.Series objects, integrating smoothly into the current API.
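To make the intended behaviour concrete, here is a minimal sketch of nearest-label alignment using the existing reindex call. The Series s1, the target index, and the values are made up for illustration only.

```python
import pandas as pd

# Hypothetical data: a Series sampled at labels 0, 5 and 10.
s1 = pd.Series([10.0, 20.0, 30.0], index=[0, 5, 10])

# Hypothetical target labels that do not exist in s1's index.
target = pd.Index([1, 4, 6, 9])

# Each target label picks up the value at the closest label in s1.
aligned = s1.reindex(target, method="nearest")
print(aligned)
# 1 -> value at 0, 4 and 6 -> value at 5, 9 -> value at 10
```

The source index has to be monotonic for this to work; the target index does not.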

Alternative Solutions

An alternative solution would be to use the existing interpolate function with method='nearest', which can be applied to the DataFrame or Series before performing the alignment. Additionally, third-party libraries like fuzzywuzzy or scipy.spatial could be used for more complex nearest matching.

import pandas as pd
from fuzzywuzzy import process

# Example using fuzzywuzzy to find nearest match
df1 = pd.DataFrame([...])
df2 = pd.DataFrame([...])
df1['nearest'] = df1['index_column'].apply(lambda x: process.extractOne(x, df2['index_column'])[0])

However, native support within pandas would likely be more efficient and user-friendly.
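For the fill-missing-values case specifically, one possible workaround that needs no third-party library is to drop the missing entries and reindex back onto the original index with method='nearest'. This is only a sketch under the assumption that the index is monotonic and unique; the Series s and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with a gap in the middle.
s = pd.Series([1.0, np.nan, np.nan, 4.0], index=[0, 1, 2, 3])

# Keep only the known values, then look each original label up again,
# taking the value at the nearest remaining label.
filled = s.dropna().reindex(s.index, method="nearest")
print(filled)
```

Here label 1 is closer to label 0 and label 2 is closer to label 3, so the gap is split between the two neighbours.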

Additional Context

@mennowitteveen mennowitteveen added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 15, 2025
@rhshadrach
Member

rhshadrach commented Mar 19, 2025

Thanks for the request! The method argument of fillna is deprecated, and will be removed in 3.0. Assuming

df1.reindex(df2.index, method='nearest')

gives you the desired operation, why do we need to add to the pandas API at all? This seems to me to be a straightforward way to accomplish it.

@rhshadrach rhshadrach added Needs Info Clarification about behavior needed to assess issue Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 19, 2025