Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dump register variables correctly #2

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Dump register variables correctly #2

wants to merge 1 commit into from

Conversation

ghost
Copy link

@ghost ghost commented Sep 17, 2021

No description provided.

@antoyo
Copy link

antoyo commented Sep 25, 2021

I missed this PR somehow.

Is this necessary for you now?

My workflow of submitting gcc patches is cumbersome, so I'm not sure how I'll handle external contribution to those patches yet.

@ghost
Copy link
Author

ghost commented Sep 26, 2021

Not really, just a little thing. It was more of "let's finally take a look at GCC internals, shall we?", and then I got distracted again and forgot about it.

My workflow of submitting gcc patches is cumbersome, so I'm not sure how I'll handle external contribution to those patches yet.

Oh, I was planning to send more while working (for example) on rust-lang/rustc_codegen_gcc#87 . Could that be a problem?

@antoyo
Copy link

antoyo commented Sep 26, 2021

If you send a PR that is completely independent of the other patches that should be relatively easy for me to handle it eventually.
I'll probably not send it for review until all the previous ones are merged if it adds a new API because I don't want to manage the conflicts in the libgccjit.map file.

Please read the guidelines in order to not forget anything that should be done for a gcc patch.
If you have any trouble for stuff like how to check the formatting is right, I can help you.

Also, if you are interested in sending your patches yourself to gcc, note that you don't need the copyright assignment anymore.

@ghost
Copy link
Author

ghost commented Sep 26, 2021

I sent this PR here and not to the mailing list because this patch is just a little extension to your patch. I'd be super happy to transfer these poor 5 LOC to public domain so you could just attribute them to yourself.

If you have any trouble for stuff like how to check the formatting is right, I can help you.

Yeah, please help. I see GCC uses some mix of tabs and spaces that resembles "indent with tabs (width 8), pad with spaces" scheme, but sometimes it's spaces only. I'm lost. Any place this scheme is written down to?

Also, if you are interested in sending your patches yourself to gcc, note that you don't need the copyright assignment anymore.

What's copyright assignment?

@antoyo
Copy link

antoyo commented Sep 26, 2021

Yeah, please help. I see GCC uses some mix of tabs and spaces that resembles "indent with tabs (width 8), pad with spaces" scheme, but sometimes it's spaces only. I'm lost. Any place this scheme is written down to?

There are two scripts that I use:

./contrib/gcc-changelog/git_check_commit.py

and

./contrib/check_GNU_style.sh 0001-patch-name.patch

What's copyright assignment?

GCC used to require that you give the copyright to the FSF and you had to sign a document.
Now, it's not required anymore, so it should be easier for you if you want to contribute directly.

Otherwise, I can always send your patches for you, but I'd appreciate if they pass the above tests.

antoyo pushed a commit that referenced this pull request Apr 13, 2022
DR 2352 changed the definitions of reference-related (so that it uses
"similar type" instead of "same type") and of reference-compatible (use
a standard conversion sequence).  That means that reference-related is
now more broad, which means that we will be binding more things directly.

The original patch for DR 2352 caused some problems, which were fixed in
r276251 by creating a "fake" ck_qual in direct_reference_binding, so
that in

  void f(int *); // #1
  void f(const int * const &); // #2
  int *x;
  int main()
  {
    f(x); // call #1
  }

we call #1.  The extra ck_qual in #2 causes compare_ics to select #1,
which is a better match for "int *" because then we don't have to do
a qualification conversion.

Let's turn to the problem in this PR.  We have

  void f(const int * const &); // #1
  void f(const int *); // #2
  int *x;
  int main()
  {
    f(x);
  }

We arrive in compare_ics to decide which one is better. The ICS for #1
looks like

    ck_ref_bind      <-    ck_qual         <-   ck_identity
  const int *const &     const int *const         int *

and the ICS for #2 is

    ck_qual     <-  ck_rvalue   <-  ck_identity
  const int *          int *           int *

We strip the reference and then comp_cv_qual_signature when comparing two
ck_quals sees that "const int *" is a proper subset of "const int *const"
and we return -1.  But that's wrong; presumably the top-level "const"
should be ignored and the call should be ambiguous.  This patch adjust
the type of the "fake" ck_qual so that this problem doesn't arise.

	PR c++/97296

gcc/cp/ChangeLog:

	* call.cc (direct_reference_binding): strip_top_quals when creating
	a ck_qual.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/ref-bind4.C: Add dg-error.
	* g++.dg/cpp0x/ref-bind8.C: New test.
@antoyo antoyo force-pushed the master branch 3 times, most recently from 0a94141 to 17858b5 Compare December 8, 2022 23:45
@antoyo antoyo force-pushed the master branch 9 times, most recently from 648c851 to 16686cb Compare August 24, 2023 20:31
GuillaumeGomez pushed a commit to GuillaumeGomez/gcc that referenced this pull request Jan 10, 2024
Improve stack protector patterns and peephole2s even more:

a. Use unrelated register clears with integer mode size <= word
   mode size to clear stack protector scratch register.

b. Use unrelated register initializations in front of stack
   protector sequence to clear stack protector scratch register.

c. Use unrelated register initializations using LEA instructions
   to clear stack protector scratch register.

These stack protector improvements reuse 6914 unrelated register
initializations to substitute the clear of stack protector scratch
register in 12034 instances of stack protector sequence in recent linux
defconfig build.

gcc/ChangeLog:

	* config/i386/i386.md (@stack_protect_set_1_<PTR:mode>_<W:mode>):
	Use W mode iterator instead of SWI48.  Output MOV instead of XOR
	for TARGET_USE_MOV0.
	(stack_protect_set_1 peephole2): Use integer modes with
	mode size <= word mode size for operand 3.
	(stack_protect_set_1 peephole2 rust-lang#2): New peephole2 pattern to
	substitute stack protector scratch register clear with unrelated
	register initialization, originally in front of stack
	protector sequence.
	(*stack_protect_set_3_<PTR:mode>_<SWI48:mode>): New insn pattern.
	(stack_protect_set_1 peephole2): New peephole2 pattern to
	substitute stack protector scratch register clear with unrelated
	register initialization involving LEA instruction.
GuillaumeGomez pushed a commit to GuillaumeGomez/gcc that referenced this pull request Jan 10, 2024
Use unrelated register initializations using zero/sign-extend instructions
to clear stack protector scratch register.

Hanlde only SI -> DImode extensions for 64-bit targets, as this is the
only extension that triggers the peephole in a non-negligible number.

Also use explicit check for word_mode instead of mode iterator in peephole2
patterns to avoid pattern explosion.

gcc/ChangeLog:

	* config/i386/i386.md (stack_protect_set_1 peephole2):
	Explicitly check operand 2 for word_mode.
	(stack_protect_set_1 peephole2 rust-lang#2): Ditto.
	(stack_protect_set_2 peephole2): Ditto.
	(stack_protect_set_3 peephole2): Ditto.
	(*stack_protect_set_4z_<mode>_di): New insn patter.
	(*stack_protect_set_4s_<mode>_di): Ditto.
	(stack_protect_set_4 peephole2): New peephole2 pattern to
	substitute stack protector scratch register clear with unrelated
	register initialization involving zero/sign-extend instruction.
@antoyo antoyo force-pushed the master branch 3 times, most recently from 8ecf518 to ed50105 Compare February 3, 2024 17:46
antoyo pushed a commit that referenced this pull request Feb 7, 2024
This patch adjusts the costs so that we treat REG and SUBREG expressions the
same for costing.

This was motivated by bt_skip_func and bt_find_func in xz and results in nearly
a 5% improvement in the dynamic instruction count for input #2 and smaller, but
definitely visible improvements pretty much across the board.  Exceptions would
be perlbench input #1 and exchange2 which showed very small regressions.

In the bt_find_func and bt_skip_func cases we have  something like this:

> (insn 10 7 11 2 (set (reg/v:DI 136 [ x ])
>         (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))) "zz.c":6:21 387 {*zero_extendsidi2_bitmanip}
>      (nil))
> (insn 11 10 12 2 (set (reg:DI 142 [ _1 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 139 [ b ]))) "zz.c":7:23 5 {adddi3}
>      (nil))

[ ... ]> (insn 13 12 14 2 (set (reg:DI 143 [ _2 ])
>         (plus:DI (reg/v:DI 136 [ x ])
>             (reg/v:DI 141 [ c ]))) "zz.c":8:23 5 {adddi3}
>      (nil))

Note the two uses of (reg 136). The best way to handle that in combine might be
a 3->2 split.  But there's a much better approach if we look at fwprop...

(set (reg:DI 142 [ _1 ])
    (plus:DI (zero_extend:DI (subreg/s/u:SI (reg/v:DI 137 [ a ]) 0))
        (reg/v:DI 139 [ b ])))
change not profitable (cost 4 -> cost 8)

So that should be the same cost as a regular DImode addition when the ZBA
extension is enabled.  But it ends up costing more because the clause to cost
this variant isn't prepared to handle a SUBREG.  That results in the RTL above
having too high a cost and fwprop gives up.

One approach would be to replace the REG_P  with REG_P || SUBREG_P in the
costing code.  I ultimately decided against that and instead check if the
operand in question passes register_operand.

By far the most important case to handle is the DImode PLUS.  But for the sake
of consistency, I changed the other instances in riscv_rtx_costs as well.  For
those other cases we're talking about improvements in the .000001% range.

While we are into stage4, this just hits cost modeling which we've generally
agreed is still appropriate (though we were mostly talking about vector).  So
I'm going to extend that general agreement ever so slightly and include scalar
cost modeling :-)

gcc/
	* config/riscv/riscv.cc (riscv_rtx_costs): Handle SUBREG and REG
	similarly.

gcc/testsuite/

	* gcc.target/riscv/reg_subreg_costs.c: New test.

	Co-authored-by: Jivan Hakobyan <[email protected]>
@antoyo antoyo force-pushed the master branch 4 times, most recently from 51fa28a to be000af Compare February 9, 2024 14:12
@antoyo antoyo force-pushed the master branch 4 times, most recently from cdd8978 to ad4ffde Compare February 16, 2024 21:34
@antoyo antoyo force-pushed the master branch 2 times, most recently from cf95541 to b6f163f Compare March 1, 2024 15:35
GuillaumeGomez pushed a commit to GuillaumeGomez/gcc that referenced this pull request Apr 8, 2024
We evaluate constexpr functions on the original, pre-genericization bodies.
That means that the function body we're evaluating will not have gone
through cp_genericize_r's "Map block scope extern declarations to visible
declarations with the same name and type in outer scopes if any".  Here:

  constexpr bool bar() { return true; } // #1
  constexpr bool foo() {
    constexpr bool bar(void); // rust-lang#2
    return bar();
  }

it means that we:
1) register_constexpr_fundef (#1)
2) cp_genericize (#1)
   nothing interesting happens
3) register_constexpr_fundef (foo)
   does copy_fn, so we have two copies of the BIND_EXPR
4) cp_genericize (foo)
   this remaps rust-lang#2 to #1, but only on one copy of the BIND_EXPR
5) retrieve_constexpr_fundef (foo)
   we find it, no problem
6) retrieve_constexpr_fundef (rust-lang#2)
   and here rust-lang#2 isn't found in constexpr_fundef_table, because
   we're working on the BIND_EXPR copy where rust-lang#2 wasn't mapped to #1
   so we fail.  We've only registered #1.

It should work to use DECL_LOCAL_DECL_ALIAS (which used to be
extern_decl_map).  We evaluate constexpr functions on pre-cp_fold
bodies to avoid diagnostic problems, but the remapping I'm proposing
should not interfere with diagnostics.

This is not a problem for a global scope redeclaration; there we go
through duplicate_decls which keeps the DECL_UID:
  DECL_UID (olddecl) = olddecl_uid;
and DECL_UID is what constexpr_fundef_hasher::hash uses.

	PR c++/111132

gcc/cp/ChangeLog:

	* constexpr.cc (get_function_named_in_call): Use
	cp_get_fndecl_from_callee.
	* cvt.cc (cp_get_fndecl_from_callee): If there's a
	DECL_LOCAL_DECL_ALIAS, use it.

gcc/testsuite/ChangeLog:

	* g++.dg/cpp0x/constexpr-redeclaration3.C: New test.
	* g++.dg/cpp0x/constexpr-redeclaration4.C: New test.
@antoyo antoyo force-pushed the master branch 2 times, most recently from e744a94 to bb9fe7d Compare September 28, 2024 13:44
@antoyo antoyo force-pushed the master branch 2 times, most recently from 9cec8ab to c4ee893 Compare October 3, 2024 21:55
@antoyo antoyo force-pushed the master branch 6 times, most recently from 1e817bd to b4002fd Compare October 17, 2024 14:21
@antoyo antoyo force-pushed the master branch 4 times, most recently from 2e49eb4 to 85e56c5 Compare November 14, 2024 18:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant