The APIs of the `apply` and `map` methods don't seem ideal. They were created in the very early days of pandas. Since then, both pandas and Python have changed a lot, we have much more experience, and the environment is different, with type checking among other things.
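A good first example is the `na_action` parameter of `map`. I assume it was designed with the idea that different actions could be applied to missing values in an elementwise operation. In practice, more than 15 years later, no action other than ignoring them has been implemented: the parameter takes a string, but the only accepted values are `None` and `"ignore"`. The resulting API is, in my opinion, far from ideal.
This also makes type checking unnecessarily complex. A better API would be just a boolean, `skip_na` or `ignore_na`.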
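A minimal sketch of a boolean alternative to `na_action`. `map_elements`, its `skip_na` flag and the `label` helper are hypothetical names used only for illustration, not a pandas API. The wrapper just translates the flag into the two values `Series.map` accepts today, which shows that the string parameter carries no more information than a boolean.

```python
import pandas as pd


def map_elements(s: pd.Series, func, skip_na: bool = False) -> pd.Series:
    """Hypothetical elementwise map with a boolean ``skip_na`` flag.

    Not a pandas API: it only translates the flag into the two values
    that ``Series.map`` accepts for ``na_action`` (None or "ignore").
    """
    return s.map(func, na_action="ignore" if skip_na else None)


def label(x):
    # x != x is True only for NaN, so this distinguishes missing from present values.
    return "missing" if x != x else "present"


s = pd.Series([1.0, None, 3.0])

# skip_na=False: the function is also called on the missing value.
print(map_elements(s, label).tolist())

# skip_na=True: missing values are propagated without calling the function.
print(map_elements(s, label, skip_na=True).tolist())
```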
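Another example is the inconsistency with `args` and `kwargs`. Some methods have both, some have only `kwargs`, and we have recently been adding a few that were missing. Also, where it exists, `args` is a regular parameter while `kwargs` is a `**` parameter, which is inconsistent in itself, and also confusing now that the number of parameters has slightly increased. In a call that mixes `apply`'s own parameters with `args` and extra keyword arguments, I don't think even advanced pandas users would be able to tell easily which values will be passed to the function.
A much clearer API would keep the arguments meant for `func` separate from those of `apply`, so that users can immediately tell which is which.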
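To make the contrast concrete, here is a sketch with a made-up user function, `scaled_sum`. The first call uses the current `DataFrame.apply` signature, where `args` is a regular parameter and `scale` reaches the function through `**kwargs`, next to `apply`'s own keywords. `apply_grouped`, `func_args` and `func_kwargs` are hypothetical names illustrating one possible grouped API; this is an assumption, not an agreed design.

```python
import pandas as pd


def scaled_sum(row, offset, scale=1):
    # User function: receives one row plus its own extra arguments.
    return (row.sum() + offset) * scale


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Current API: `axis` belongs to apply, `args` is forwarded positionally,
# and `scale` is forwarded through **kwargs, so the call itself does not
# say which keywords are apply's and which are scaled_sum's.
current = df.apply(scaled_sum, axis=1, args=(10,), scale=2)


def apply_grouped(frame, func, axis=0, *, func_args=(), func_kwargs=None):
    """Hypothetical API: arguments meant for ``func`` are grouped explicitly."""
    func_kwargs = {} if func_kwargs is None else func_kwargs
    return frame.apply(lambda x: func(x, *func_args, **func_kwargs), axis=axis)


proposed = apply_grouped(df, scaled_sum, axis=1, func_args=(10,), func_kwargs={"scale": 2})
print(current.tolist(), proposed.tolist())
```

Keeping the function's keywords in a dictionary instead of `**kwargs` also means that new method parameters can never collide with keyword names of the user's function.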
Another inconsistency is the `arg`/`func` parameter in `Series.map` and `DataFrame.map`. While the two methods are conceptually the same, simply applying the function to either a `Series` or a `DataFrame`, the signature and the behavior differ slightly: `Series.map` accepts a dictionary, and `DataFrame.map` doesn't. Given that a dictionary can be converted into a function just by appending `.get` to it, I think it would be better to make this parameter consistently accept Python callables or NumPy ufuncs.
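A small sketch of the `.get` conversion, using only `Series.map`; `codes` is an illustrative mapping. One behavioral difference is worth keeping in mind: with a dictionary, values without a matching key become `NaN`, while `codes.get` returns `None` for them, which pandas also treats as missing. Because `codes.get` is an ordinary callable, the same argument could also be passed to `DataFrame.map`.

```python
import pandas as pd

codes = {1: "one", 2: "two"}
s = pd.Series([1, 2, 3])

# Dictionary form (accepted by Series.map only): unmatched values become NaN.
from_dict = s.map(codes)

# Callable form: codes.get is a plain function, so it works for any
# elementwise map. It returns None for unmatched values.
from_get = s.map(codes.get)

# Passing an explicit default makes the callable return NaN, like the dictionary form.
from_get_nan = s.map(lambda x: codes.get(x, float("nan")))

print(from_dict.tolist(), from_get.tolist(), from_get_nan.tolist())
```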
Finally, these methods have evolved over time, including the existence and later removal of `applymap`, and at this point it is probably also a good idea to deprecate the legacy behavior where `Series.apply` behaves like `Series.map` depending on the value of `by_row`, whose default (`"compat"`) triggers exactly that. This is a bit tricky for backward-compatibility reasons, but I think it eventually needs to be done, as it makes the API very counter-intuitive. If `map` were always elementwise and `apply` always axis-wise, users' lives would be much easier, and the methods much easier to learn and explain.
We could also discuss `result_type` and `by_row` in `DataFrame.apply`, which are very hard to understand.
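As an illustration of why `result_type` is hard to reason about, the sketch below is similar to the examples in the `DataFrame.apply` documentation. It applies the same list-returning function, `sum_and_max` (an illustrative name), three ways. The default returns a `Series` of lists, `"expand"` turns the list elements into new columns labeled `0` and `1`, and `"broadcast"` keeps the original shape and column labels.

```python
import pandas as pd

df = pd.DataFrame([[4, 9]] * 3, columns=["A", "B"])


def sum_and_max(row):
    # Returns a plain list, so the output shape depends on result_type.
    return [row.sum(), row.max()]


as_series = df.apply(sum_and_max, axis=1)                           # Series of lists
expanded = df.apply(sum_and_max, axis=1, result_type="expand")      # columns 0 and 1
broadcast = df.apply(sum_and_max, axis=1, result_type="broadcast")  # columns A and B

print(type(as_series).__name__, list(expanded.columns), list(broadcast.columns))
```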