Improve the apply/map APIs #61128

Open
datapythonista opened this issue Mar 15, 2025 · 0 comments
Labels
Apply Apply, Aggregate, Transform, Map

The APIs of the apply and map methods are not ideal. They were created in the very early days of pandas; since then both pandas and Python have changed significantly, we have much more experience, and the environment is different, with type checking and other tooling now common.

A good first example is the na_action parameter of map. I assume it was designed on the assumption that different actions could eventually be applied when dealing with missing values in an elementwise operation. In practice, more than 15 years later, none have been implemented, and the resulting API is, in my opinion, far from ideal:

df.map(func, na_action=None)
df.map(func, na_action="ignore")

This also makes type checking unnecessarily complex. A better API would use just a boolean, skip_na or ignore_na:

df.map(func, skip_na=False)
df.map(func, skip_na=True)
df.map(func, skip_na=action == "ignore")
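Concretely, today's na_action only distinguishes between calling the function on missing values and skipping them, which a boolean would capture just as well. A minimal sketch of the current behavior (skip_na above is the proposed name, not an existing parameter):

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])

# Today: na_action must be None or the string "ignore".
# With "ignore", missing values are propagated without calling func.
doubled = s.map(lambda x: x * 2, na_action="ignore")
print(doubled.tolist())  # [2.0, nan, 6.0]
```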

Another example is the inconsistency around args and kwargs. Some methods have both, some have just kwargs, and we've recently been adding a few missing ones. Also, where it exists, args is a regular parameter, while kwargs is a ** parameter, which is inconsistent in itself and also confusing now that the number of parameters has slowly grown. For example:

df.apply(func, 0, result_type=None, result_format="reduction", engine=numba.njit, engine_params={"val": 0})

I don't think even advanced pandas users could easily tell which of these arguments will be passed to the function. A much clearer API would be:

df.apply(func, args=("reduction",), kwargs={"engine_params": {"val": 0}}, axis=0, result_type=None, engine=numba.njit)

I think with this call it's immediately clear to users which arguments belong to apply and which to func.
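For reference, today's DataFrame.apply already accepts args= and forwards extra keyword arguments via **kwargs; the proposal above just makes the keyword side explicit. A small runnable example with the current API:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

def summarize(col, offset, scale=1):
    # col is one column (a Series) when axis=0
    return (col.sum() + offset) * scale

# args is passed to summarize positionally; scale is forwarded via **kwargs,
# mixed in with apply's own parameters such as axis
result = df.apply(summarize, axis=0, args=(10,), scale=2)
print(result.to_dict())  # {'a': 26, 'b': 34}
```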

Another inconsistency is the arg / func parameter in Series.map and DataFrame.map. While the methods are conceptually the same, just applying the operator elementwise to either a Series or a DataFrame, the signature and behavior differ slightly: Series will accept a dictionary, while DataFrame won't. Given that a dictionary can be converted to an equivalent function by just appending .get to it, I think it'd be better to make func consistently accept Python callables or NumPy ufuncs.
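The dict-to-callable equivalence is straightforward. Assuming a Series s, the two calls below produce the same mapping (missing keys become NaN in the first case and None in the second, both of which pandas treats as missing):

```python
import pandas as pd

s = pd.Series(["cat", "dog", "bird"])
mapping = {"cat": "feline", "dog": "canine"}

via_dict = s.map(mapping)     # dict accepted by Series.map only
via_get = s.map(mapping.get)  # a callable works everywhere
```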

Finally, the methods have their own evolution, including the existence and later removal of applymap, but at this point it's probably also a good idea to deprecate the legacy behavior of Series.apply acting like Series.map, which depends on the by_row parameter and is the default. This is a bit tricky for backward-compatibility reasons, but I think it eventually needs to be done, as it makes the API very counter-intuitive. Making map always elementwise and apply always axis-wise would make users' lives much easier, and the API much easier to learn and explain.
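To illustrate the overlap being discussed: with today's default by_row="compat", Series.apply with a scalar function behaves exactly like Series.map, duplicating the elementwise path:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Both calls do the same elementwise thing today:
applied = s.apply(lambda x: x + 1)
mapped = s.map(lambda x: x + 1)
print(applied.tolist())  # [2, 3, 4]
print(mapped.tolist())   # [2, 3, 4]
```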

We can also discuss result_type and by_row in DataFrame.apply, which are very hard to understand.
