-
Notifications
You must be signed in to change notification settings - Fork 5.2k
[RISC-V] Introduce C extension for Integer Register-Register Operations #117408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Diffs are based on 13,180 contexts (10,390 MinOpts, 2,790 FullOpts). Overall (-12,754 bytes)
MinOpts (-7,098 bytes)
FullOpts (-5,656 bytes)
Example diffstest.mch-4 (-8.33%) : 7688.dasm - Microsoft.CodeAnalysis.CSharp.BinderFlagsExtensions:Includes(uint,uint):bool (Tier1)@@ -27,14 +27,14 @@ G_M26379_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
sext.w a0, a0
subw a0, a0, a1
sltiu a0, a0, 1
- ;; size=16 bbWeight=1 PerfScore 2.00
+ ;; size=12 bbWeight=1 PerfScore 2.00
G_M26379_IG03: ; bbWeight=1, epilog, nogc, extend
ld ra, 8(sp)
ld fp, 0(sp)
addi sp, sp, 16
ret ;; size=16 bbWeight=1 PerfScore 7.50
-; Total bytes of code 48, prolog size 16, PerfScore 18.50, instruction count 12, allocated bytes for code 48 (MethodHash=815098f4) for method Microsoft.CodeAnalysis.CSharp.BinderFlagsExtensions:Includes(uint,uint):bool (Tier1)
+; Total bytes of code 44, prolog size 16, PerfScore 18.50, instruction count 12, allocated bytes for code 44 (MethodHash=815098f4) for method Microsoft.CodeAnalysis.CSharp.BinderFlagsExtensions:Includes(uint,uint):bool (Tier1)
; ============================================================
Unwind Info:
@@ -45,7 +45,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 12 (0x0000c) Actual length = 48 (0x000030)
+ Function Length : 22 (0x00016) Actual length = 44 (0x00002c)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e) -8 (-8.00%) : 3238.dasm - Microsoft.CodeAnalysis.CachingBase`1[System.__Canon]:AlignSize(int):int (Tier1)@@ -40,14 +40,14 @@ G_M65205_IG02: ; bbWeight=1, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
or a0, a1, a0
sext.w a0, a0
addiw a0, a0, 0xD1FFAB1E
- ;; size=68 bbWeight=1 PerfScore 8.50
+ ;; size=60 bbWeight=1 PerfScore 8.50
G_M65205_IG03: ; bbWeight=1, epilog, nogc, extend
ld ra, 8(sp)
ld fp, 0(sp)
addi sp, sp, 16
ret ;; size=16 bbWeight=1 PerfScore 7.50
-; Total bytes of code 100, prolog size 16, PerfScore 25.00, instruction count 25, allocated bytes for code 100 (MethodHash=8bf1014a) for method Microsoft.CodeAnalysis.CachingBase`1[System.__Canon]:AlignSize(int):int (Tier1)
+; Total bytes of code 92, prolog size 16, PerfScore 25.00, instruction count 25, allocated bytes for code 92 (MethodHash=8bf1014a) for method Microsoft.CodeAnalysis.CachingBase`1[System.__Canon]:AlignSize(int):int (Tier1)
; ============================================================
Unwind Info:
@@ -58,7 +58,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 25 (0x00019) Actual length = 100 (0x000064)
+ Function Length : 46 (0x0002e) Actual length = 92 (0x00005c)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e) -14 (-7.29%) : 4638.dasm - System.Reflection.RuntimeMethodInfo:GetHashCode():int:this (Instrumented Tier1)@@ -83,14 +83,14 @@ G_M36882_IG02: ; bbWeight=1, gcrefRegs=0400 {a0}, byrefRegs=0000 {}, byre
srliw a1, a0, 16
xor a0, a0, a1
sext.w a0, a0
- ;; size=160 bbWeight=1 PerfScore 38.50
+ ;; size=146 bbWeight=1 PerfScore 38.50
G_M36882_IG03: ; bbWeight=1, epilog, nogc, extend
ld ra, 8(sp)
ld fp, 0(sp)
addi sp, sp, 16
ret ;; size=16 bbWeight=1 PerfScore 7.50
-; Total bytes of code 192, prolog size 16, PerfScore 55.00, instruction count 43, allocated bytes for code 192 (MethodHash=8fbf6fed) for method System.Reflection.RuntimeMethodInfo:GetHashCode():int:this (Instrumented Tier1)
+; Total bytes of code 178, prolog size 16, PerfScore 55.00, instruction count 43, allocated bytes for code 178 (MethodHash=8fbf6fed) for method System.Reflection.RuntimeMethodInfo:GetHashCode():int:this (Instrumented Tier1)
; ============================================================
Unwind Info:
@@ -101,7 +101,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 48 (0x00030) Actual length = 192 (0x0000c0)
+ Function Length : 89 (0x00059) Actual length = 178 (0x0000b2)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e) +0 (0.00%) : 13168.dasm - LUDecomp:lusolve(double[][],int,double[]):int (Instrumented Tier0)@@ -114,7 +114,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 71 (0x00047) Actual length = 284 (0x00011c)
+ Function Length : 142 (0x0008e) Actual length = 284 (0x00011c)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e) +0 (0.00%) : 13152.dasm - Neural:DoNNetIteration(long):long (Tier1)@@ -271,7 +271,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 213 (0x000d5) Actual length = 852 (0x000354)
+ Function Length : 426 (0x001aa) Actual length = 852 (0x000354)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e) +0 (0.00%) : 13120.dasm - Neural:do_out_error(int) (Instrumented Tier0)@@ -284,7 +284,7 @@ Unwind Info:
E bit : 0
X bit : 0
Vers : 0
- Function Length : 210 (0x000d2) Actual length = 840 (0x000348)
+ Function Length : 420 (0x001a4) Actual length = 840 (0x000348)
---- Epilog scopes ----
---- Scope 0
Epilog Start Offset : 3523193630 (0xd1ffab1e) Actual offset = 3523193630 (0xd1ffab1e) Offset from main function begin = 3523193630 (0xd1ffab1e) DetailsSize improvements/regressions per collection
PerfScore improvements/regressions per collection
Context information
jit-analyze output |
RISC-V Release-CLR-QEMU: 9041 / 9092 (99.44%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-QEMU: 9146 / 9251 (98.86%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-VF2: 9036 / 9090 (99.41%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-VF2: 4322 / 4437 (97.41%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
You also may want to see if R2RDump works with the new formats, there may be some silent assumptions about instruction width in e.g. |
Co-authored-by: Tomasz Sowiński <[email protected]>
RISC-V Release-FX-QEMU: 8632 / 8737 (98.80%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-VF2: 9035 / 9090 (99.39%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-QEMU: 9042 / 9093 (99.44%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
…neration Co-authored-by: Tomasz Sowiński <[email protected]>
This error seems related. ☝️ |
Thanks for reviewing! @tomeksowi @am11 @sirntar After the reviews, I can see many things aren't quite right yet in this PR:
For now, I'll mark this PR as a draft again, make the necessary changes and I'll try to test the CLR tests locally first before pushing 👍 But feel free to keep reviewing and please let me know if there's anything I might've missed! Thanks everyone 😃 |
Hi @sirntar , somehow the CI is triggered for the same commit multiple times 😅 Can you please help check when you have the time? Thank you very much 🙏 Sorry if it is somehow caused by build failure. In the meantime, I'll push a quick fix patch just in case 🙏 Edit: already pushed temporary "fix" for build error on 270a63d |
…enerate compressed instruction in prolog / epilog
RISC-V Release-CLR-VF2: 9036 / 9089 (99.42%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-QEMU: 9041 / 9094 (99.42%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
RISC-V Release-FX-QEMU: 9046 / 9154 (98.82%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-VF2: 5993 / 6108 (98.12%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
Something to watch out for: hardcoded branch offsets, e.g. here: runtime/src/coreclr/jit/codegenriscv64.cpp Lines 1605 to 1609 in 6d869a8
Is there anything preventing allocating the temps in a way to allow We have them in a couple of places, they may become pain spots as we add support for more compressed instructions, there's nothing checking if the offset matches the intended target. |
I've went through the code to understand the issue raised by @tomeksowi. From what I understand, at some places we use To fix this issue, I propose three alternative solutions:
Alternative 1 should probably be done in a separate PR. While the quick fixes can be easily incorporated to this PR. Any thoughts? Please let me know if I there's anything I misunderstood 🙏 |
Maybe just convert to branches with labels? e.g. here: runtime/src/coreclr/jit/codegenriscv64.cpp Lines 2324 to 2336 in 513ff1a
IMO it's acceptable to sacrifice through-put a little to get a safe "C" implementation going. |
RISC-V Release-CLR-QEMU: 9085 / 9116 (99.66%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-VF2: 9085 / 9116 (99.66%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-QEMU: 283563 / 284649 (99.62%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-VF2: 270226 / 271908 (99.38%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
… c.* prefix in RVC instruction names
Co-authored-by: Tomasz Sowiński <[email protected]>
RISC-V Release-CLR-QEMU: 9087 / 9117 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-CLR-VF2: 9088 / 9118 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-QEMU: 285333 / 286424 (99.62%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V Release-FX-VF2: 250861 / 252567 (99.32%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
Hi @jakobbotsch @MichalStrehovsky can you help review this PR once you're available? Thank you 🙂🙏 |
@jakobbotsch @MichalStrehovsky, pinging again for code review. Thanks! |
I don't see anything I could review. The change in TargetDetails.cs LGTM provided RISC-V requires methods to be aligned at 2 byte boundary. The change in ElfObjectWriter.cs LGTM provided the code that is generated matches the ELF ABI specified there. These are all codegen questions. |
Can you please resolve the conflicts? |
@risc-vv /run |
RISC-V pull_request-CLR-QEMU: 9106 / 9136 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V pull_request-CLR-VF2: 9107 / 9137 (99.67%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V pull_request-FX-QEMU: 0 / 0 (100.00%)
report.xml, report.md, failures.xml, testclr_details.tar.zst RISC-V pull_request-FX-VF2: 0 / 52 (0.00%)
report.xml, report.md, failures.xml, testclr_details.tar.zst Build information and commandsGIT: |
We're introducing RVC instructions to RISC-V JIT in dotnet/runtime: dotnet/runtime#117408 With that PR and this PR, output of `R2RDump` tool for the following C# method: ```cs public static long FunAdd(long a, long b) { return a + b; } ``` is as follows: ```asm long Program.FunAdd(long, long) Handle: 0x06000002 Rid: 2 EntryPointRuntimeFunctionId: 1 Number of RuntimeFunctions: 1 long Program.FunAdd(long, long) Id: 1 StartAddress: 0x00010BD0 Size: 34 bytes UnwindRVA: 0x0001091C Debug Info Bounds: Native Offset: 0x0, Prolog, Source Types: StackEmpty Native Offset: 0x10, IL Offset: 0x0000, Source Types: StackEmpty Native Offset: 0x12, Epilog, Source Types: StackEmpty Variable Locations: Variable Number: 0 Start Offset: 0x0 End Offset: 0x10 Loc Type: VLT_REG Register: A0 Variable Number: 1 Start Offset: 0x0 End Offset: 0x10 Loc Type: VLT_REG Register: A1 10bd0: 13 01 01 ff addi sp, sp, -16 10bd4: 23 30 81 00 sd s0, 0(sp) 10bd8: 23 34 11 00 sd ra, 8(sp) 10bdc: 13 04 01 00 mv s0, sp 10be0: 2e 95 add a0, a0, a1 ;; RVC instruction 10be2: 83 30 81 00 ld ra, 8(sp) 10be6: 03 34 01 00 ld s0, 0(sp) 10bea: 13 01 01 01 addi sp, sp, 16 10bee: 67 80 00 00 ret ```
This is a first attempt at utilizing the C extension for the RISC-V JIT compiler.
Here's what's changed in this PR:
R2RDump
format with other architecture with variable length instructions (Arm32 / XArch)ProbeRiscV64Quirks
inR2RDump
This PR does not fully utilize the C extension yet. Here are some things that must be addressed in a follow up PR:
ProbeRiscV64Quirks
inR2RDump
to detect patterns by forward traversing, similar toArm64
implementation`.R2RDump
sample output:Part of #84834, cc @dotnet/samsung