-
Notifications
You must be signed in to change notification settings - Fork 15
ARC-V RHX-100 upstream patch series #192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
The Synopsys RMX-100 has a short, three-stage, in-order execution pipeline. The option -mmpy-option was added to control which version of the MPY unit the core has and what the latency of multiply instructions should be similar to ARCv2 cores (see gcc/config/arc/arc.opt:60). Authored-by: Artemiy Volkov <[email protected]> Signed-off-by: Michiel Derhaeg <[email protected]>
The Synopsys RHX-100 has a 10-stage, dual-issue, in-order execution pipeline. It has support for instruction fusion, which will be addressed by subsequent patches.
MichielDerhaeg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to split up the commits in something sensible. Didn't check whether they can be built individually though.
gcc/config/riscv/riscv.cc
Outdated
| return store_data_bypass_p (out_insn, in_insn); | ||
| } | ||
|
|
||
| /* Implement one boolean function for each of the values of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the declarations in the header be moved to arcv.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know I originally placed the declarations in arcv.h, but looking at the existing code structure, it may be a better fit to keep them in riscv-protos.h
EDIT: See d479345
gcc/config/riscv/riscv-protos.h
Outdated
|
|
||
| struct riscv_tune_param; | ||
|
|
||
| extern int riscv_get_tune_param_issue_rate (void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This causes a circular dependency between riscv.cc and arcv.cc. Can we instead just pass in the issue_rate itself as a parameter where this is called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. Yes.
EDIT: See 73219a4
| case SIGN_EXTRACT: | ||
| if (TARGET_XTHEADBB && outer_code == SET | ||
| if ((TARGET_ARCV_RHX100 || TARGET_XTHEADBB) | ||
| && outer_code == SET |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, this was added for the bit-extract fusion.
| (define_insn "*zero_extract_fused" | ||
| [(set (match_operand:SI 0 "register_operand" "=r") | ||
| (zero_extract:SI (match_operand:SI 1 "register_operand" "r") | ||
| (match_operand 2 "const_int_operand") | ||
| (match_operand 3 "const_int_operand")))] | ||
| "TARGET_ARCV_RHX100 && !TARGET_64BIT | ||
| && (INTVAL (operands[2]) > 1 || !TARGET_ZBS)" | ||
| { | ||
| int amount = INTVAL (operands[2]); | ||
| int end = INTVAL (operands[3]) + amount; | ||
| operands[2] = GEN_INT (BITS_PER_WORD - end); | ||
| operands[3] = GEN_INT (BITS_PER_WORD - amount); | ||
| return "slli\t%0,%1,%2\n\tsrli\t%0,%0,%3"; | ||
| } | ||
| [(set_attr "type" "alu_fused")] | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, this fusion was never implemented as a define_insn_and_split. Might not be trivial to force these exact instructions after a split.
| (define_insn "madd_split_fused" | ||
| [(set (match_operand:SI 0 "register_operand" "=&r,r") | ||
| (plus:SI | ||
| (mult:SI (match_operand:SI 1 "register_operand" "r,r") | ||
| (match_operand:SI 2 "register_operand" "r,r")) | ||
| (match_operand:SI 3 "register_operand" "r,?0"))) | ||
| (clobber (match_scratch:SI 4 "=&r,&r"))] | ||
| "TARGET_ARCV_RHX100 | ||
| && !TARGET_64BIT && (TARGET_ZMMUL || TARGET_MUL)" | ||
| { | ||
| if (REGNO (operands[0]) == REGNO (operands[3])) | ||
| { | ||
| return "mul\t%4,%1,%2\n\tadd\t%4,%3,%4\n\tmv\t%0,%4"; | ||
| } | ||
| else | ||
| { | ||
| return "mul\t%0,%1,%2\n\tadd\t%0,%0,%3"; | ||
| } | ||
| } | ||
| [(set_attr "type" "imul_fused")] | ||
| ) | ||
|
|
||
| (define_insn "madd_split_fused_extended" | ||
| [(set (match_operand:DI 0 "register_operand" "=&r,r") | ||
| (sign_extend:DI | ||
| (plus:SI | ||
| (mult:SI (match_operand:SI 1 "register_operand" "r,r") | ||
| (match_operand:SI 2 "register_operand" "r,r")) | ||
| (match_operand:SI 3 "register_operand" "r,?0")))) | ||
| (clobber (match_scratch:SI 4 "=&r,&r"))] | ||
| "TARGET_ARCV_RHX100 | ||
| && (TARGET_ZMMUL || TARGET_MUL)" | ||
| { | ||
| if (REGNO (operands[0]) == REGNO (operands[3])) | ||
| { | ||
| return "mulw\t%4,%1,%2\n\taddw\t%4,%3,%4\n\tmv\t%0,%4"; | ||
| } | ||
| else | ||
| { | ||
| return "mulw\t%0,%1,%2\n\taddw\t%0,%0,%3"; | ||
| } | ||
| } | ||
| [(set_attr "type" "imul_fused")] | ||
| ) | ||
|
|
||
| (define_insn "*zero_extract_fused" | ||
| [(set (match_operand:SI 0 "register_operand" "=r") | ||
| (zero_extract:SI (match_operand:SI 1 "register_operand" "r") | ||
| (match_operand 2 "const_int_operand") | ||
| (match_operand 3 "const_int_operand")))] | ||
| "TARGET_ARCV_RHX100 && !TARGET_64BIT | ||
| && (INTVAL (operands[2]) > 1 || !TARGET_ZBS)" | ||
| { | ||
| int amount = INTVAL (operands[2]); | ||
| int end = INTVAL (operands[3]) + amount; | ||
| operands[2] = GEN_INT (BITS_PER_WORD - end); | ||
| operands[3] = GEN_INT (BITS_PER_WORD - amount); | ||
| return "slli\t%0,%1,%2\n\tsrli\t%0,%0,%3"; | ||
| } | ||
| [(set_attr "type" "alu_fused")] | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these patterns used to be a defined_insn_and_split. In case we can't get rid of them, let's try to go back to define_insn_and_split (check the downstream commit history) and see what the performance looks like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we revert madd_split_fused1 back to a define_insn_and_split we get 0.898% improvement.
Signed-off-by: Luis Silva <[email protected]>
Signed-off-by: Luis Silva <[email protected]>
Before: 10,543,513 cycles After: 10,543,496 cycles Difference: 17 cycles Improvement: 0.000% Signed-off-by: Luis Silva <[email protected]>
No description provided.