Improve the apply/map APIs #61128

Open
datapythonista opened this issue Mar 15, 2025 · 0 comments
Labels
Apply Apply, Aggregate, Transform, Map

The APIs of the apply and map methods are not ideal. They were created in the very early days of pandas; since then both pandas and Python have changed significantly, we have much more experience, and the environment is different, with type checking and other tooling now common.

A good first example is the na_action parameter of map. I assume it was designed on the assumption that different actions could eventually be applied when dealing with missing values in an elementwise operation. In practice, more than 15 years later, none have been implemented, and the resulting API is, in my opinion, far from ideal:

df.map(func, na_action=None)
df.map(func, na_action="ignore")

This also makes type checking unnecessarily complex. A better API would use just a boolean, skip_na or ignore_na:

df.map(func, skip_na=False)
df.map(func, skip_na=True)
df.map(func, skip_na=action == "ignore")
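Concretely, today's na_action only distinguishes between calling the function on missing values and skipping them, which a boolean would capture just as well. A minimal sketch of the current behavior (skip_na above is the proposed name, not an existing parameter):

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])

# Today: na_action must be None or the string "ignore".
# With "ignore", missing values are propagated without calling func.
doubled = s.map(lambda x: x * 2, na_action="ignore")
print(doubled.tolist())  # [2.0, nan, 6.0]
```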

Another example is the inconsistency around args and kwargs. Some methods have both, some have just kwargs, and we've recently been adding a few missing ones. Also, where it exists, args is a regular parameter, while kwargs is a ** parameter, which is inconsistent in itself and also confusing now that the number of parameters has slowly grown. For example:

df.apply(func, 0, result_type=None, result_format="reduction", engine=numba.njit, engine_params={"val": 0})

I don't think even advanced pandas users could easily tell which of these arguments will be passed to the function. A much clearer API would be:

df.apply(func, args=("reduction",), kwargs={"engine_params": {"val": 0}}, axis=0, result_type=None, engine=numba.njit)

I think with this call it's immediately clear to users which arguments belong to apply and which to func.
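For reference, today's DataFrame.apply already accepts args= and forwards extra keyword arguments via **kwargs; the proposal above just makes the keyword side explicit. A small runnable example with the current API:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

def summarize(col, offset, scale=1):
    # col is one column (a Series) when axis=0
    return (col.sum() + offset) * scale

# args is passed to summarize positionally; scale is forwarded via **kwargs,
# mixed in with apply's own parameters such as axis
result = df.apply(summarize, axis=0, args=(10,), scale=2)
print(result.to_dict())  # {'a': 26, 'b': 34}
```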

Another inconsistency is the arg / func parameter in Series.map and DataFrame.map. While the methods are conceptually the same, just applying the operator elementwise to either a Series or a DataFrame, the signature and behavior differ slightly: Series will accept a dictionary, while DataFrame won't. Given that a dictionary can be converted to an equivalent function by just appending .get to it, I think it'd be better to make func consistently accept Python callables or NumPy ufuncs.
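The dict-to-callable equivalence is straightforward. Assuming a Series s, the two calls below produce the same mapping (missing keys become NaN in the first case and None in the second, both of which pandas treats as missing):

```python
import pandas as pd

s = pd.Series(["cat", "dog", "bird"])
mapping = {"cat": "feline", "dog": "canine"}

via_dict = s.map(mapping)     # dict accepted by Series.map only
via_get = s.map(mapping.get)  # a callable works everywhere
```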

Finally, the methods have their own evolution, including the existence and later removal of applymap, but at this point it's probably also a good idea to deprecate the legacy behavior of Series.apply acting like Series.map, which depends on the by_row parameter and is the default. This is a bit tricky for backward-compatibility reasons, but I think it eventually needs to be done, as it makes the API very counter-intuitive. Making map always elementwise and apply always axis-wise would make users' lives much easier, and the API much easier to learn and explain.
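To illustrate the overlap being discussed: with today's default by_row="compat", Series.apply with a scalar function behaves exactly like Series.map, duplicating the elementwise path:

```python
import pandas as pd

s = pd.Series([1, 2, 3])

# Both calls do the same elementwise thing today:
applied = s.apply(lambda x: x + 1)
mapped = s.map(lambda x: x + 1)
print(applied.tolist())  # [2, 3, 4]
print(mapped.tolist())   # [2, 3, 4]
```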

We can also discuss result_type and by_row in DataFrame.apply, which are very hard to understand.
