Skip to content

Conversation

@zty-king
Copy link
Contributor

@zty-king zty-king commented Nov 4, 2025

PR Category

User Experience

PR Types

Others

Description

与pytorch参数习惯对齐,当前paddle实现的这些loss无参数size_average、reduce;当用户习惯性地以pytorch参数传参时,需要报错,并告知当前size_average与reduce值的组合,对应的reduction应该是什么。即size_average+reduce->reduction有一一对应关系,对应关系如下:
image

@codecov-commenter
Copy link

codecov-commenter commented Nov 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@5387ef2). Learn more about missing BASE report.

Additional details and impacted files
@@             Coverage Diff             @@
##             develop    #76221   +/-   ##
===========================================
  Coverage           ?   100.00%           
===========================================
  Files              ?         2           
  Lines              ?        78           
  Branches           ?         0           
===========================================
  Hits               ?        78           
  Misses             ?         0           
  Partials           ?         0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@paddle-bot paddle-bot bot added the contributor External developers label Nov 4, 2025
@paddle-bot
Copy link

paddle-bot bot commented Nov 4, 2025

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@zty-king
Copy link
Contributor Author

zty-king commented Nov 5, 2025

/re-run all-failed

1 similar comment
@zty-king
Copy link
Contributor Author

zty-king commented Nov 5, 2025

/re-run all-failed

@zhwesky2010 zhwesky2010 changed the title fix_loss_args_legacy [API Compatibility] optimize size_average & reduce error message Nov 5, 2025
Copy link
Contributor

@zhwesky2010 zhwesky2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

也顺便看看这些size_average/reduce的差异文档写得是否正确吧,若有错误之处辛苦一起更正

https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/model_convert/convert_from_pytorch/api_difference/torch_more_args/torch.nn.CrossEntropyLoss.html

)


@check_deprecated_params
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

装饰器都放到这个目录下吧

python/paddle/utils/decorator_utils.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


@functools.wraps(original_init)
def new_init(self, *args, **kwargs):
reduce_raw = kwargs.pop('reduce', _SENTINEL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_average可能是通过位置参数传入进来的。详见这个单测下面的各种case

https://github.com/PaddlePaddle/PaConvert/blob/master/tests/test_nn_CrossEntropyLoss.py#L163

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done,添加了size_average, reduce在pytorch中对应loss函数的位置映射,同时对特殊情况(正常使用paddle的loss会误触位置参数报错的情况)做了处理

)


@check_deprecated_params
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

装饰器放到__init__函数上吧,这个主要是针对__init__的修改。

同时这些class的__init__函数后面可能还要加别名装饰器,目前要求每个func至多一个装饰器,所以都统一合并为__init__的一个装饰器。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@zty-king
Copy link
Contributor Author

zty-king commented Nov 5, 2025

也顺便看看这些size_average/reduce的差异文档写得是否正确吧,若有错误之处辛苦一起更正

https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/model_convert/convert_from_pytorch/api_difference/torch_more_args/torch.nn.CrossEntropyLoss.html

没有错的,但是格式有错误或者不统一,已经统一更改,pr链接如下:PaddlePaddle/docs#7611

"""

@functools.wraps(init_func)
def wrapper(self, *args, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的代码写法能否优化,先按 args个数来匹配key,统一标准化为kwargs然后统一处理。

最大化减少非必要的判断和装饰器的开销。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@zty-king
Copy link
Contributor Author

zty-king commented Nov 6, 2025

/re-run all-failed

reduce_from_pos = v

# Step 2: Extract legacy params from kwargs (kwargs take priority)
reduce_raw = kwargs.pop('reduce', _SENTINEL)
Copy link
Contributor

@zhwesky2010 zhwesky2010 Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

能否继续简化写法,减少API的调度开销

不用判断pos、idx是否合法,配置装饰器时就必须保证都合法。

只判断用户传入的行为,其他的判断都不加:

size_average_val = None
if 'size_average' in kwargs:
    size_average_val = kwargs.pop('size_average')
elif len(args) > size_average_idx:
    size_average_val = args[size_average_idx]
   
reduce_val = None
if 'reduce' in kwargs:
    reduce_val = kwargs.pop('reduce')
elif len(args) > reduce_idx:
    reduce_val = args[reduce_idx]

suggested = compute_legacy_reduction(reduce_val, size_avg_val)

Copy link
Contributor Author

@zty-king zty-king Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

但是用户如果传位置参数,不加判断的话,没办法确定他此时是想以torch形式的接口传参,还是以paddle形式的接口传参;两者参数形式不一致,并且不同的loss, size_average和reduce的位置也不一样,想问问您这里说只判断用户传入行为指的是什么

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_average_val = None
if 'size_average' in kwargs:            # 对应关键字用法
    size_average_val = kwargs.pop('size_average')
elif len(args) > size_average_idx and type(args[size_average_idx]) is bool:    # 对应位置参数用法,如果有其他条件则往里面加
    size_average_val = args[size_average_idx]
   
reduce_val = None
if 'reduce' in kwargs:                      # 对应关键字用法
    reduce_val = kwargs.pop('reduce')   
elif len(args) > reduce_idx and type(args[reduce_idx]) is bool:                     # 对应位置参数用法,如果有其他条件则往里面加
    reduce_val = args[reduce_idx]

suggested = compute_legacy_reduction(reduce_val, size_avg_val)

这个逻辑可以了吧,如果这个通用逻辑不满足,就单个装饰器来写,一个API一个装饰器。

装饰器本身给API带来了额外开销,这个需要极简来写,减少行数、减少判断

装饰器自身要配置正确,不用去判断装饰器配置错误后的各种情况。比如这里的if pos、if idx都不需要判断,配置时就需要将这些都配对。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的,不过如果直接用size_average_val =None,当用户仅传入reduce=None或size_avergae=None时,无法确认用户是否使用了这些torch里面才有的参数,就不会拦截报错(这个装饰器本意是任何size_average_val 和reduce组合都给出一个reducetion的结果),这个需要跟您讲清楚

if type(v) is bool:
# Special guard: CrossEntropyLoss idx=3 is soft_label (bool) in Paddle;
# treat it as legacy 'reduce' ONLY if arg at idx=2 is NOT a valid reduction string
if cls_name == 'CrossEntropyLoss' and args[2] in {
Copy link
Contributor

@zhwesky2010 zhwesky2010 Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

如果有特殊逻辑,建议可重新复用这份代码写一个 轻量特殊装饰器,都融合到一个大装饰器里判断 会增大开销。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done,您看一下现在是否可行

@zty-king
Copy link
Contributor Author

zty-king commented Nov 8, 2025

/re-run all-failed

@zty-king
Copy link
Contributor Author

zty-king commented Nov 8, 2025

/re-run all-failed


@functools.wraps(init_func)
def wrapper(self, *args, **kwargs):
reduce_val = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

针对用户使用这个情况,

size_average=None,
reduce=None

这种情况极少,当然如果想要把这种情况也纳入到拦截范围里:

reduce_val = ''
size_avg_val = ''

idx = pos.get('size_average')
if 'size_average' in kwargs:
    size_avg_val = kwargs.pop('size_average')
elif len(args) > idx:
    size_avg_val = args[idx]

...

if reduce_val != '' or size_avg_val != '':
    suggested = compute_legacy_reduction(reduce_val, size_avg_val)
    raise ValueError(

这样是不是就能全拦截了,然后上面也不需要去判断bool了,又节省了一次判断。因为torch的size_average只有可能是bool和None,无需去判断torch不存在的情况,在torch代码本身为正确的前提下去做兼容性。

Copy link
Contributor Author

@zty-king zty-king Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里判断bool,是因为,有可能用户是以paddle的api格式传参,此时就可以通过位置参数是否是bool,来判断用户到底是在以pytorch格式传(是bool类型),还是以paddle格式传(非bool类型),这个方法可以统一将这些loss的情况都给覆盖到。因为size_average和reduce_val是bool类型。
如果去掉,就会出现,用户正常使用paddle格式传参,没有用size_average和reduce_val的想法,但是被误判了
只有CrossEntropyLoss、KLDivLoss这两个会出现paddle对应位置参数和torch对应位置参数(size_average或reduce_val)都是bool的情况,因此做了特殊处理

Copy link
Contributor

@zhwesky2010 zhwesky2010 Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里判断bool,是因为,有可能用户是以paddle的api格式传参,此时就可以通过位置参数是否是bool,来判断用户到底是在以pytorch格式传(是bool类型),还是以paddle格式传(非bool类型),这个方法可以统一将这些loss的情况都给覆盖到。因为size_average和reduce_val是bool类型。 如果去掉,就会出现,用户正常使用paddle格式传参,没有用size_average和reduce_val的想法,但是被误判了 只有CrossEntropyLoss、KLDivLoss这两个会出现paddle对应位置参数和torch对应位置参数(size_average或reduce_val)都是bool的情况,因此做了特殊处理

OK,这个地方的判断没问题

}


def check_deprecated_params_on_init(init_func):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个命名太大众化了,可以优化下:

check_deprecated_size_average_reduce

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

if (
cls_name == 'CrossEntropyLoss'
and len(args) > 2
and args[2] in {'mean', 'sum', 'none'}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这两个是有什么特殊吗,直接照着上面的代码来修改配置是否就可以。不用retry

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的,因为想着代码不冗余,所以这里做了代码复用,也可以直接按上面的代码修改,只是会有很多相似的冗余代码

Copy link
Contributor

@zhwesky2010 zhwesky2010 Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的,因为想着代码不冗余,所以这里做了代码复用,也可以直接按上面的代码修改,只是会有很多相似的冗余代码

或者尝试将公共逻辑抽取出来,写成一个单独的函数。来提升代码复用。

但try...except会导致运行预期之外的代码,会增大开销,不建议。

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return wrapper


def check_deprecated_params_with_special_cases(init_func):
Copy link
Contributor

@zhwesky2010 zhwesky2010 Nov 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

命名也优化下,因为这个命名太大众化了

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


set_tests_properties(test_profiler PROPERTIES TIMEOUT 120)
set_tests_properties(test_cross_entropy_loss PROPERTIES TIMEOUT 180)
set_tests_properties(test_legacy_loss_args PROPERTIES TIMEOUT 30)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个单测是否可以精简下case,节省时间

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个单测挺快的,不到1s,这里我是象征性设置了个比较短的时间,我修改一下

@zhwesky2010
Copy link
Contributor

zhwesky2010 commented Nov 10, 2025

@zty-king 按上面的结论重新push吧



def compute_legacy_reduction(reduce_val, size_average_val):
reduce_val = reduce_val if reduce_val != '' else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里是否不需要判断了,即使是 '' 也不影响后续逻辑


def compute_legacy_reduction(reduce_val, size_average_val):
reduce_val = reduce_val if reduce_val != '' else None
size_average_val = size_average_val if size_average_val != '' else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这两行判断是否可移除

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

好的,可以移除

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里确实可以移除,不过报错的地方可能还是需要加这个判断: if reduce_val=='': reduce_val= None if size_avg_val=='': size_avg_val= None

比如:reduce_val='',size_avg_val=True,即用户只传了一个size_avg_val;报错的时候,要把reduce_val转化为None,即告知:reduce_val=None,size_avg_val=True 对应的 reduction是什么

加到报错信息里去吧,没传入就不用报。

)


def legacy_reduction_guard(init_func):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个不是上下文guard,命名为:legacy_reduction_decoratorlegacy_reduction_special_decorator

@zty-king
Copy link
Contributor Author

zty-king commented Nov 11, 2025 via email

zhwesky2010
zhwesky2010 previously approved these changes Nov 11, 2025
Copy link
Contributor

@zhwesky2010 zhwesky2010 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zhwesky2010 zhwesky2010 requested a review from SigureMo November 11, 2025 09:14
)


def legacy_reduction_decorator(init_func):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

参考其他 decorator 补充类型提示,这会导致被装饰的 API 类型提示完全丢掉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

"."
)[
0
] # avoid subclass calling parent class init, causing cls_name to be inaccurate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

注释换个位置,这会导致格式化后很难看

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@functools.wraps(init_func)
def wrapper(self, *args, **kwargs):
cls_name = self.__class__.__name__
def wrapper(self, *args: _InputT.args, **kwargs: _InputT.kwargs) -> _RetT:
Copy link
Member

@SigureMo SigureMo Nov 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这对么?_InputT.args 里包含 self,你要是一定要单独处理 self 就不要给这个函数写类型提示,只给外面的函数加就行

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

抱歉,没注意这个问题,已修正,感谢指正

@zty-king
Copy link
Contributor Author

/re-run all-failed

@zhwesky2010
Copy link
Contributor

zhwesky2010 commented Nov 12, 2025

优化映射文档

https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/model_convert/convert_from_pytorch/api_difference/torch_more_args/torch.nn.BCEWithLogitsLoss.html#zhuanxieshili

映射文档这一块我觉得写得不是很正确,这里的 size_average&reduce 的优先级要高于reducetion,size_average&reduce每个有3种取值可能,穷举一共有9种组合需要转写:

size_average= None &reduce = None
size_average= None &reduce = True
size_average= None &reduce = False
size_average= True &reduce = None
size_average= True &reduce = True
size_average= True &reduce = False
size_average= False &reduce = True
size_average= False &reduce = True
size_average= False &reduce = False

对于第一种size_average= None &reduce = None则直接写明reduction会生效,两者完全一致。

其余8种情况,有的可以合并,有的需要分开写,这里梳理清楚,重写之前的转写示例吧,注意需要完整且正确(不需要按之前的格式),并与以下逻辑一一核对:

infoflow 2025-11-12 17-40-55

@zty-king
Copy link
Contributor Author

zty-king commented Nov 12, 2025 via email

Copy link
Contributor

@jeff41404 jeff41404 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zty-king
Copy link
Contributor Author

zty-king commented Nov 12, 2025

优化映射文档

https://www.paddlepaddle.org.cn/documentation/docs/zh/develop/guides/model_convert/convert_from_pytorch/api_difference/torch_more_args/torch.nn.BCEWithLogitsLoss.html#zhuanxieshili

映射文档这一块我觉得写得不是很正确,这里的 size_average&reduce 的优先级要高于reducetion,size_average&reduce每个有3种取值可能,穷举一共有9种组合需要转写:

size_average= None &reduce = None
size_average= None &reduce = True
size_average= None &reduce = False
size_average= True &reduce = None
size_average= True &reduce = True
size_average= True &reduce = False
size_average= False &reduce = True
size_average= False &reduce = True
size_average= False &reduce = False

对于第一种size_average= None &reduce = None则直接写明reduction会生效,两者完全一致。

其余8种情况,有的可以合并,有的需要分开写,这里梳理清楚,重写之前的转写示例吧,注意需要完整且正确(不需要按之前的格式),并与以下逻辑一一核对:

infoflow 2025-11-12 17-40-55
  • 这个格式您看可以吗?如果想进一步合并的话,就以reducetion的三种情况来合并,您看哪种好一些
image

@zty-king
Copy link
Contributor Author

zty-king commented Nov 13, 2025 via email

@zhwesky2010
Copy link
Contributor

zhwesky2010 commented Nov 13, 2025

好的

---- Replied Message ---- | From | @.> | | Date | 11/13/2025 10:41 | | To | PaddlePaddle/Paddle @.> | | Cc | Tianyu @.>, Mention @.> | | Subject | Re: [PaddlePaddle/Paddle] [API Compatibility] optimize size_average & reduce error message (PR #76221) | zhwesky2010 left a comment (PaddlePaddle/Paddle#76221) @zty-king 就按上面的那个格式来写吧 — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.***>

按reduction的三种情况来写吧,这样清晰点。

例如torch写reduction='sum'的所有情况,paddle统一转成reduction='sum'。

@zhwesky2010 zhwesky2010 merged commit 4424a9f into PaddlePaddle:develop Nov 13, 2025
91 of 96 checks passed
@zty-king
Copy link
Contributor Author

zty-king commented Nov 13, 2025 via email

LittleHeroZZZX pushed a commit to LittleHeroZZZX/Paddle that referenced this pull request Nov 19, 2025
…dlePaddle#76221)

* fix_loss_args_legacy

* adapt positional vars

* optimize code

* optimze code

* optimize code/root/miniconda3/envs/qwen_DCP_test/bin/python /home/paddle/test/legacy_test/test_legacy_loss_args.py

* fix the bug

* fix code

* add type hint

* fix code
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants