[RISC-V] Bit counting intrinsics #114337

Merged: 9 commits, Apr 8, 2025

@tomeksowi
Contributor

tomeksowi commented Apr 7, 2025

(Leading|Trailing)ZeroCount, PopCount.

Part of #84834, cc @dotnet/samsung
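
For readers unfamiliar with the three operations named above, here is a minimal Python sketch of their semantics on fixed-width unsigned integers. The function names and the width parameter are illustrative assumptions, not the JIT's or the BCL's API. Returning the full width for a zero input follows the documented behaviour of System.Numerics.BitOperations, which is also what the RISC-V Zbb clz/ctz instructions produce.

```python
def leading_zero_count(value: int, width: int = 32) -> int:
    """Zero bits above the highest set bit; returns width when value is 0."""
    value &= (1 << width) - 1  # treat the input as an unsigned width-bit integer
    return width - value.bit_length()


def trailing_zero_count(value: int, width: int = 32) -> int:
    """Zero bits below the lowest set bit; returns width when value is 0."""
    value &= (1 << width) - 1
    if value == 0:
        return width
    # value & -value isolates the lowest set bit.
    return (value & -value).bit_length() - 1


def pop_count(value: int, width: int = 32) -> int:
    """Number of set bits in the unsigned width-bit representation."""
    value &= (1 << width) - 1
    return bin(value).count("1")


if __name__ == "__main__":
    for sample in (0, 1, 8, 0x80000000, 0xFFFFFFFF):
        print(hex(sample),
              leading_zero_count(sample),
              trailing_zero_count(sample),
              pop_count(sample))
```

The width parameter mirrors the split between the 32-bit forms (such as clzw in the diffs in this thread) and the 64-bit forms (such as cpop on a long).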

@dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Apr 7, 2025
@dotnet-policy-service bot added the community-contribution label (Indicates that the PR has been added by a community member) Apr 7, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@tomeksowi
Contributor Author

No regressions.

Diffs are based on 170,565 contexts (22,709 MinOpts, 147,856 FullOpts).

MISSED contexts: 1,020 (0.59%)

Overall (-14,440 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| linux.riscv64.Checked.mch | 121,944,848 | -14,440 | -0.21% |

MinOpts (-180 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| linux.riscv64.Checked.mch | 47,488,724 | -180 | -0.00% |

FullOpts (-14,260 bytes)

| Collection | Base size (bytes) | Diff size (bytes) | PerfScore in Diffs |
|---|---|---|---|
| linux.riscv64.Checked.mch | 74,456,124 | -14,260 | -0.28% |

Example diffs
linux.riscv64.Checked.mch
-36 (-50.00%) : 12110.dasm - System.Numerics.Tensors.TensorPrimitives+LeadingZeroCountOperator`1[int]:Invoke(int):int (FullOpts)
@@ -5,12 +5,10 @@
 ; fp based frame
 ; partially interruptible
 ; No matching PGO data
-; 0 inlinees with PGO data; 1 single block inlinees; 1 inlinees without PGO data
 ; Final local variable assignments
 ;
-;  V00 arg0         [V00,T00] (  4,  3.50)     int  ->   a0         single-def
+;  V00 arg0         [V00,T00] (  3,  3   )     int  ->   a0         single-def
 ;# V01 OutArgs      [V01    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V02 tmp1         [V02,T01] (  3,  2   )     int  ->   a0         "Inline return value spill temp"
 ;
 ; Lcl frame size = 0
 Frame info. #outsz=0; #framesz=16; lcl=0
@@ -22,28 +20,15 @@ G_M40987_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mv             fp, sp
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M40987_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            sext.w         t6, a0
-            beqz           t6, G_M40987_IG04
-						;; size=8 bbWeight=1 PerfScore 4.00
-G_M40987_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 7
-            ld             a1, 0xD1FFAB1E(a1)
-            jalr           a1		// System.Numerics.BitOperations:Log2SoftwareFallback(uint):int
-            xori           a0, a0, 31
-            j              G_M40987_IG05
-						;; size=28 bbWeight=0.50 PerfScore 4.25
-G_M40987_IG04:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            addi           a0, zero, 0xD1FFAB1E
-						;; size=4 bbWeight=0.50 PerfScore 0.25
-G_M40987_IG05:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref, epilog, nogc
+            clzw           a0, a0
+						;; size=4 bbWeight=1 PerfScore 0.50
+G_M40987_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
             addi           sp, sp, 16
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 
-; Total bytes of code 72, prolog size 16, PerfScore 25.00, instruction count 18, allocated bytes for code 72 (MethodHash=a4435fe4) for method System.Numerics.Tensors.TensorPrimitives+LeadingZeroCountOperator`1[int]:Invoke(int):int (FullOpts)
+; Total bytes of code 36, prolog size 16, PerfScore 17.00, instruction count 9, allocated bytes for code 36 (MethodHash=a4435fe4) for method System.Numerics.Tensors.TensorPrimitives+LeadingZeroCountOperator`1[int]:Invoke(int):int (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -54,7 +39,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 18 (0x00012) Actual length = 72 (0x000048)
+  Function Length   : 9 (0x00009) Actual length = 36 (0x000024)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
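
Reading the removed lines of the diff above: the base code branched on zero, called Log2SoftwareFallback, and XORed the result with 31, where the new code emits a single clzw. A hedged Python sketch of why that sequence yields a leading-zero count follows. The helper names are hypothetical, and log2_floor only stands in for the software fallback rather than reproducing it. The zero-input constant is masked in the diff; 32 is assumed here because that is what BitOperations.LeadingZeroCount returns for zero.

```python
def log2_floor(value: int) -> int:
    """Stand-in for the software log2 fallback: index of the highest set bit."""
    return value.bit_length() - 1


def lzcnt32_fallback(value: int) -> int:
    """Leading-zero count the way the base code computed it."""
    value &= 0xFFFFFFFF
    if value == 0:
        return 32  # assumed value of the masked constant in the diff
    # log2 is in [0, 31], so XOR with 31 is the same as 31 - log2.
    return 31 ^ log2_floor(value)


if __name__ == "__main__":
    for sample in (0, 1, 0x00010000, 0xFFFFFFFF):
        print(hex(sample), lzcnt32_fallback(sample))
```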
-36 (-45.00%) : 32994.dasm - System.Number+BigInteger:CountSignificantBits(uint):uint (FullOpts)
@@ -5,12 +5,10 @@
 ; fp based frame
 ; partially interruptible
 ; No matching PGO data
-; 0 inlinees with PGO data; 0 single block inlinees; 1 inlinees without PGO data
 ; Final local variable assignments
 ;
-;  V00 arg0         [V00,T00] (  4,  3.50)     int  ->   a0         single-def
+;  V00 arg0         [V00,T00] (  3,  3   )     int  ->   a0         single-def
 ;# V01 OutArgs      [V01    ] (  1,  1   )  struct ( 0) [sp+0x00]   do-not-enreg[XS] addr-exposed "OutgoingArgSpace" <Empty>
-;  V02 tmp1         [V02,T01] (  3,  2   )     int  ->   a0         "Inline return value spill temp"
 ;
 ; Lcl frame size = 0
 Frame info. #outsz=0; #framesz=16; lcl=0
@@ -22,32 +20,17 @@ G_M36112_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mv             fp, sp
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M36112_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            sext.w         t6, a0
-            beqz           t6, G_M36112_IG04
-						;; size=8 bbWeight=1 PerfScore 4.00
-G_M36112_IG03:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 7
-            ld             a1, 0xD1FFAB1E(a1)
-            jalr           a1		// System.Numerics.BitOperations:Log2SoftwareFallback(uint):int
-            xori           a0, a0, 31
-            j              G_M36112_IG05
-						;; size=28 bbWeight=0.50 PerfScore 4.25
-G_M36112_IG04:        ; bbWeight=0.50, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            addi           a0, zero, 0xD1FFAB1E
-						;; size=4 bbWeight=0.50 PerfScore 0.25
-G_M36112_IG05:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+            clzw           a0, a0
             subw           a0, zero, a0
             addiw          a0, a0, 0xD1FFAB1E
-						;; size=8 bbWeight=1 PerfScore 1.00
-G_M36112_IG06:        ; bbWeight=1, epilog, nogc, extend
+						;; size=12 bbWeight=1 PerfScore 1.50
+G_M36112_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
             addi           sp, sp, 16
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 
-; Total bytes of code 80, prolog size 16, PerfScore 26.00, instruction count 20, allocated bytes for code 80 (MethodHash=581172ef) for method System.Number+BigInteger:CountSignificantBits(uint):uint (FullOpts)
+; Total bytes of code 44, prolog size 16, PerfScore 18.00, instruction count 11, allocated bytes for code 44 (MethodHash=581172ef) for method System.Number+BigInteger:CountSignificantBits(uint):uint (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -58,7 +41,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 20 (0x00014) Actual length = 80 (0x000050)
+  Function Length   : 11 (0x0000b) Actual length = 44 (0x00002c)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
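
The two instructions that follow clzw in the diff above (subw a0, zero, a0 and an addiw with a masked immediate) are consistent with CountSignificantBits being computed as 32 minus the leading-zero count. A small Python sketch of that relationship, assuming the masked immediate is 32; the function names are illustrative only:

```python
def clzw(value: int) -> int:
    """32-bit leading-zero count; returns 32 for a zero input."""
    value &= 0xFFFFFFFF
    return 32 - value.bit_length()


def count_significant_bits(value: int) -> int:
    """Mirror the three-instruction sequence from the diff, step by step."""
    count = clzw(value)   # clzw  a0, a0
    count = 0 - count     # subw  a0, zero, a0
    return count + 32     # addiw a0, a0, <imm>  (imm assumed to be 32)


if __name__ == "__main__":
    for sample in (0, 1, 0xFF, 0xFFFFFFFF):
        print(hex(sample), count_significant_bits(sample))
```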
-24 (-40.00%) : 12401.dasm - System.Numerics.Tensors.TensorPrimitives+PopCountOperator`1[long]:Invoke(long):long (FullOpts)
@@ -5,7 +5,6 @@
 ; fp based frame
 ; partially interruptible
 ; No matching PGO data
-; 0 inlinees with PGO data; 1 single block inlinees; 1 inlinees without PGO data
 ; Final local variable assignments
 ;
 ;  V00 arg0         [V00,T00] (  3,  3   )    long  ->   a0         single-def
@@ -21,21 +20,15 @@ G_M15459_IG01:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref,
             mv             fp, sp
 						;; size=16 bbWeight=1 PerfScore 9.00
 G_M15459_IG02:        ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
-            lui            a1, 0xD1FFAB1E
-            addiw          a1, a1, 0xD1FFAB1E
-            slli           a1, a1, 31
-            srli           a1, a1, 26
-            ld             a1, 0xD1FFAB1E(a1)
-            jalr           a1		// System.Numerics.BitOperations:<PopCount>g__SoftwareFallback|23_0(ulong):int
-            slliw          a0, a0, 0
-						;; size=28 bbWeight=1 PerfScore 7.50
+            cpop           a0, a0
+						;; size=4 bbWeight=1 PerfScore 0.50
 G_M15459_IG03:        ; bbWeight=1, epilog, nogc, extend
             ld             ra, 8(sp)
             ld             fp, 0(sp)
             addi           sp, sp, 16
             ret						;; size=16 bbWeight=1 PerfScore 7.50
 
-; Total bytes of code 60, prolog size 16, PerfScore 24.00, instruction count 15, allocated bytes for code 60 (MethodHash=1bb0c39c) for method System.Numerics.Tensors.TensorPrimitives+PopCountOperator`1[long]:Invoke(long):long (FullOpts)
+; Total bytes of code 36, prolog size 16, PerfScore 17.00, instruction count 9, allocated bytes for code 36 (MethodHash=1bb0c39c) for method System.Numerics.Tensors.TensorPrimitives+PopCountOperator`1[long]:Invoke(long):long (FullOpts)
 ; ============================================================
 
 Unwind Info:
@@ -46,7 +39,7 @@ Unwind Info:
   E bit             : 0
   X bit             : 0
   Vers              : 0
-  Function Length   : 15 (0x0000f) Actual length = 60 (0x00003c)
+  Function Length   : 9 (0x00009) Actual length = 36 (0x000024)
   ---- Epilog scopes ----
   ---- Scope 0
   Epilog Start Offset        : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e)
+0 (0.00%) : 171568.dasm - ILGEN_0x2193e4b5:Method_0xeffa27b5(ubyte,long,float,float,byte,double,uint,ubyte,short):int (FullOpts)

No diffs found?

+0 (0.00%) : 171536.dasm - Generated442:StructConstrainedInterfaceCallsTest() (FullOpts)

No diffs found?

+0 (0.00%) : 171472.dasm - ValueNumberingCheckedCastsOfConstants:g__ConfirmUInt64OneDecrementUnderUInt64MaxValueCastToUInt32Overflows|97_24() (FullOpts)

No diffs found?

Details

Size improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same size | Improvements (bytes) | Regressions (bytes) |
|---|---|---|---|---|---|---|
| linux.riscv64.Checked.mch | 11,634 | 160 | 0 | 11,474 | -14,440 | +0 |

PerfScore improvements/regressions per collection

| Collection | Contexts with diffs | Improvements | Regressions | Same PerfScore | Improvements (PerfScore) | Regressions (PerfScore) | PerfScore Overall in FullOpts |
|---|---|---|---|---|---|---|---|
| linux.riscv64.Checked.mch | 11,634 | 160 | 0 | 11,474 | -13.86% | 0.00% | -0.0161% |

Context information

| Collection | Diffed contexts | MinOpts | FullOpts | Missed, base | Missed, diff |
|---|---|---|---|---|---|
| linux.riscv64.Checked.mch | 170,565 | 22,709 | 147,856 | 1,020 (0.59%) | 1,020 (0.59%) |

jit-analyze output

@risc-vv

risc-vv commented Apr 7, 2025

RISC-V Release-CLR-VF2: 9531 / 9551 (99.79%)
=======================
      passed: 9531
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9657
 TOTAL tests: 9657
   REAL time: 2h 10min 49s 48ms
=======================

Release-CLR-VF2.md, Release-CLR-VF2.xml, testclr_output.tar.gz

Build information and commands

GIT: 7e34899dd40e671e2d93af7b36e1c44dff737911
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-CLR-QEMU: 9531 / 9551 (99.79%)
=======================
      passed: 9531
      failed: 3
     skipped: 106
      killed: 17
------------------------
  TOTAL libs: 9657
 TOTAL tests: 9657
   REAL time: 2h 47min 38s 826ms
=======================

Release-CLR-QEMU.md, Release-CLR-QEMU.xml, testclr_output.tar.gz

Build information and commands

GIT: 7e34899dd40e671e2d93af7b36e1c44dff737911
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-VF2: 628462 / 660362 (95.17%)
=======================
      passed: 628462
      failed: 1300
     skipped: 1422
      killed: 30600
------------------------
  TOTAL libs: 259
 TOTAL tests: 661784
   REAL time: 3h 3min 15s 688ms
=======================

Release-FX-VF2.md, Release-FX-VF2.xml, testfx_output.tar.gz

Build information and commands

GIT: 7e34899dd40e671e2d93af7b36e1c44dff737911
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release

RISC-V Release-FX-QEMU: 643107 / 667247 (96.38%)
=======================
      passed: 643107
      failed: 1545
     skipped: 1606
      killed: 22595
------------------------
  TOTAL libs: 259
 TOTAL tests: 668853
   REAL time: 2h 33min 44s 306ms
=======================

Release-FX-QEMU.md, Release-FX-QEMU.xml, testfx_output.tar.gz

Build information and commands

GIT: 7e34899dd40e671e2d93af7b36e1c44dff737911
CI: a8426a46d8575dfcb3b5fec0d7d0b7a7c118d690
REPO: dotnet/runtime
BRANCH: main
CONFIG: Release
LIB_CONFIG: Release
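
The headline percentages in the four reports above can be reproduced from the raw counts. The sketch below assumes, based only on the numbers shown, that the denominator is passed + failed + killed (skipped tests excluded) and that TOTAL tests adds the skipped tests back in. The function name is made up for illustration and is not part of the CI tooling.

```python
def summarize(passed: int, failed: int, skipped: int, killed: int):
    """Return the 'passed / executed (rate%)' headline and the total test count."""
    executed = passed + failed + killed   # skipped tests are not counted here
    total = executed + skipped
    rate = 100.0 * passed / executed
    return f"{passed} / {executed} ({rate:.2f}%)", total


# Counts copied from the reports: (passed, failed, skipped, killed).
REPORTS = {
    "Release-CLR-VF2": (9531, 3, 106, 17),
    "Release-CLR-QEMU": (9531, 3, 106, 17),
    "Release-FX-VF2": (628462, 1300, 1422, 30600),
    "Release-FX-QEMU": (643107, 1545, 1606, 22595),
}

if __name__ == "__main__":
    for name, counts in REPORTS.items():
        headline, total = summarize(*counts)
        print(f"RISC-V {name}: {headline}, TOTAL tests: {total}")
```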

@am11
Member

am11 commented Apr 7, 2025

RISC-V Release-CLR-VF2: 9531 / 9551 (99.79%)

From https://gist.githubusercontent.com/risc-vv/49ddc17989113384472374ebc51ded44/raw/dc887c16439f41205bffc36465db5297291669df/Release-CLR-VF2-7e34899dd40e671e2d93af7b36e1c44dff737911.xml

  <testsuite name="readytorun.determinism.crossgen2determinism" time="1.970">
    <testcase name="crossgen2determinism.sh" classname="readytorun.determinism.crossgen2determinism.crossgen2determinism" result="FAILED" time="1.970">
      <failure message="Unknown exit code 101." type="exitcode_101">Could not find file '/var/lib/go-agent/pipelines/Release-CLR-VF2/coreclr.Release/readytorun/determinism/crossgen2determinism/crossgen2smoke1.ildll'.
File name: '/var/lib/go-agent/pipelines/Release-CLR-VF2/coreclr.Release/readytorun/determinism/crossgen2determinism/crossgen2smoke1.ildll'
   at Interop.ThrowExceptionForIoErrno(ErrorInfo errorInfo, String path, Boolean isDirError)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String path, OpenFlags flags, Int32 mode, Boolean failForSymlink, Boolean&amp; wasSymlink, Func`4 createOpenException)
   at Microsoft.Win32.SafeHandles.SafeFileHandle.Open(String fullPath, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize, UnixFileMode openPermissions, Int64&amp; fileLength, UnixFileMode&amp; filePermissions, Boolean failForSymlink, Boolean&amp; wasSymlink, Func`4 createOpenException)
   at System.IO.File.OpenHandle(String path, FileMode mode, FileAccess access, FileShare share, FileOptions options, Int64 preallocationSize)
   at System.IO.File.ReadAllBytes(String path)
   at Program.TestEntryPoint() in /godata/pipelines/Release-build/runtime/src/tests/readytorun/determinism/Program.cs:line 13
   at __GeneratedMainWrapper.Main() in /godata/pipelines/Release-build/runtime/artifacts/tests/coreclr/obj/linux.riscv64.Release/Managed/readytorun/determinism/crossgen2determinism/XUnitWrapperGenerator/XUnitWrapperGenerator.XUnitWrapperGenerator/SimpleRunner.g.cs:line 7

Do we know if that path is wrong on the test runner or the file wasn't generated by the build? The file should be generated at build time:

$CORE_ROOT/crossgen2/crossgen2 --map -r:$CORE_ROOT/*.dll -r:./crossgen2smoke/helperdll.dll -r:./crossgen2smoke/helperildll.dll -o:crossgen2smoke1.ildll ./crossgen2smoke/crossgen2smoke.dll
and placed in the working directory (pwd) when the test runs:
byte[] file1 = File.ReadAllBytes("crossgen2smoke1.ildll");

@tomeksowi
Contributor Author

Do we know if that path is wrong on the test runner or the file wasn't generated by the build? The file should be generated at build time

The file is built, and the test does pass locally. It looks like the file is either not collected or placed at the wrong path for the test run. @sirntar will be working on it.

@sirntar
Member

sirntar commented Apr 8, 2025

@am11 The file is generated at build time but isn't sent to the CI test runners. I am investigating this at the moment. It's probably some packing issue...

@BruceForstall BruceForstall merged commit 44dd838 into dotnet:main Apr 8, 2025
120 checks passed
Labels
arch-riscv: Related to the RISC-V architecture
area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
community-contribution: Indicates that the PR has been added by a community member

5 participants