Skip to content

Conversation

LuckyXu-HF
Copy link
Contributor

@LuckyXu-HF LuckyXu-HF commented Jul 24, 2025

@LuckyXu-HF
Copy link
Contributor Author

Please NOTE:

  • Higher versions of clang for LA64 enabled LSX by default (while LASX is not enabled by default for now).
  • Therefore, when building relevant elf and .a files in the SDK to run on 2K embedded platforms which not support LSX/LASX, the clang compilation flags -mno-lsx -mno-lasx must also be added to disable LSX/LASX support.

@LuckyXu-HF
Copy link
Contributor Author

Hi @driver1998 , does this solve your problem?
cc @shushanhf

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 24, 2025
@driver1998
Copy link
Contributor

driver1998 commented Jul 24, 2025

cc @xen0n for review.
I have a similar patch locally driver1998/dotnet-nolsx@e608ccb and that works. This should just work I guess.

You may also need to test the 2K1000/2000/3000 series, those have LSX but no LASX.

(I didn't post my patch upstream because I am still waiting for my 2K3000 board to arrive lol)

@driver1998
Copy link
Contributor

Although IIRC HWCAP should be preferred over CPUCFG, since there might be situations where the CPU supports LSX/LASX but the kernel disabled them.

see https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6772567 for example.

@LuckyXu-HF
Copy link
Contributor Author

cc @xen0n for review. I have a similar patch locally driver1998/dotnet-nolsx@e608ccb and that works. This should just work I guess.

You may also need to test the 2K1000/2000/3000 series, those have LSX but no LASX.

(I didn't post my patch upstream because I am still waiting for my 2K3000 board to arrive lol)

Yes, this patch can handle no LSX/LASX, only LSX , LSX && LASX cases on the coreclr side.

On the clang side, AFAIK lsx start enabled by default on clang19, so if we are using clang version>=19, we will need to add clang compilation option -mno-lsx for the platform which do not support LSX/LASX.

@LuckyXu-HF
Copy link
Contributor Author

Although IIRC HWCAP should be preferred over CPUCFG, since there might be situations where the CPU supports LSX/LASX but the kernel disabled them.

see https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6772567 for example.

Thanks, then we should consider similar judgment approach.
Does .NET have demand in such environments?

@driver1998
Copy link
Contributor

Does .NET have demand in such environments?

Not sure, but AFAIK a VM running non-LASX kernel on something like a 3A6000 is used for simulating a 2K3000 in distro testing.

@am11
Copy link
Member

am11 commented Jul 24, 2025

Although IIRC HWCAP should be preferred over CPUCFG, since there might be situations where the CPU supports LSX/LASX but the kernel disabled them.

see https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6772567 for example.

Good point. We can use similar approach taken here for riscv64 #113676 (see src/native/minipal/cpufeatures.c changes). You can add both old and new instructions in separate asm macros and at the call-site, use cpufeatures detection.

@LuckyXu-HF
Copy link
Contributor Author

Although IIRC HWCAP should be preferred over CPUCFG, since there might be situations where the CPU supports LSX/LASX but the kernel disabled them.
see https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6772567 for example.

Good point. We can use similar approach taken here for riscv64 #113676 (see src/native/minipal/cpufeatures.c changes). You can add both old and new instructions in separate asm macros and at the call-site, use cpufeatures detection.

Thanks very much. This approach is good to detect cpu features to add atomic/hwintrinsic/simd ISAs instead of inline assembly. I will add this in cpufeatures.c.

@jkotas
Copy link
Member

jkotas commented Jul 24, 2025

The current version of the change looks good to me. Is it ok to merge it?

Thanks very much. This approach is good to detect cpu features to add atomic/hwintrinsic/simd ISAs instead of inline assembly. I will add this in cpufeatures.c.

This can be done in a follow up.

Copy link

@xen0n xen0n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please properly separate logical changes, and say what you did, e.g. "Handle LASX/LSX/FP context depending on ISA availability" in the commit title. "Fix crash on 2K series SoC" should appear in the commit body because it's providing helpful additional context.

Add HWCAP get in minipal_getcpufeatures().

Co-authored-by: driver1998 <[email protected]>
@LuckyXu-HF LuckyXu-HF changed the title [LoongArch64] Fix SDK running on 2K series SoC. [LoongArch64] Handle LASX/LSX/FP context depending on ISA availability. Jul 25, 2025
@LuckyXu-HF
Copy link
Contributor Author

LuckyXu-HF commented Jul 25, 2025

If we had to run on the assumed environment which CPU supports LSX/LASX but the kernel disabled them (#118007 (comment)). Then we should call minipal_getcpufeatures to check instead of the cpucfg instruction, this should affect the performance. getauxval() does not involve syscall.

diff
diff --git a/src/coreclr/pal/src/arch/loongarch64/context2.S b/src/coreclr/pal/src/arch/loongarch64/context2.S
index 4dad08e6006..e77df4eb9fb 100644
--- a/src/coreclr/pal/src/arch/loongarch64/context2.S
+++ b/src/coreclr/pal/src/arch/loongarch64/context2.S
@@ -37,14 +37,19 @@ LOCAL_LABEL(Restore_CONTEXT_FLOATING_POINT):
     andi $t1, $r21, (1 << CONTEXT_FLOATING_POINT_BIT)
     beqz $t1, LOCAL_LABEL(No_Restore_CONTEXT_FLOATING_POINT)
 
-    ori $r21, $zero, 2
-    cpucfg $r21, $r21
-    // CPUCFG.2.LASX[bit7]
-    andi $t1, $r21, 0x80
-    bnez $t1, LOCAL_LABEL(Restore_CONTEXT_LASX)
-    // CPUCFG.2.LSX[bit6]
-    andi $t1, $r21, 0x40
-    bnez $t1, LOCAL_LABEL(Restore_CONTEXT_LSX)
+    PROLOG_SAVE_REG_PAIR_INDEXED  22, 1, 32
+    st.d  $a0, $fp, 16
+    st.d  $a1, $fp, 24
+    bl  C_FUNC(minipal_getcpufeatures)
+    ori  $t1, $a0, 0
+    ld.d  $a0, $fp, 16
+    ld.d  $a1, $fp, 24
+    EPILOG_RESTORE_REG_PAIR_INDEXED  22, 1, 32
+
+    andi $t3, $t1, CONTEXT_HWCAP_LOONGARCH_LASX_BIT
+    bnez $t3, LOCAL_LABEL(Restore_CONTEXT_LASX)
+    andi $t3, $t1, CONTEXT_HWCAP_LOONGARCH_LSX_BIT
+    bnez $t3, LOCAL_LABEL(Restore_CONTEXT_LSX)
 
     // Neither LSX or LASX is supported.
     fld.d  $f0 , $a0, CONTEXT_FPU_OFFSET
@@ -300,13 +305,16 @@ LOCAL_LABEL(Done_CONTEXT_INTEGER):
     andi  $t3, $t1, (1 << CONTEXT_FLOATING_POINT_BIT)
     beqz  $t3, LOCAL_LABEL(Done_CONTEXT_FLOATING_POINT)
 
-    ori $t1, $zero, 2
-    cpucfg $t1, $t1
-    // CPUCFG.2.LASX[bit7]
-    andi $t3, $t1, 0x80
+    PROLOG_SAVE_REG_PAIR_INDEXED  22, 1, 32
+    st.d  $a0, $fp, 16
+    bl  C_FUNC(minipal_getcpufeatures)
+    ori  $t1, $a0, 0
+    ld.d  $a0, $fp, 16
+    EPILOG_RESTORE_REG_PAIR_INDEXED  22, 1, 32
+
+    andi $t3, $t1, CONTEXT_HWCAP_LOONGARCH_LASX_BIT
     bnez $t3, LOCAL_LABEL(Store_CONTEXT_LASX)
-    // CPUCFG.2.LSX[bit6]
-    andi $t3, $t1, 0x40
+    andi $t3, $t1, CONTEXT_HWCAP_LOONGARCH_LSX_BIT
     bnez $t3, LOCAL_LABEL(Store_CONTEXT_LSX)

@driver1998
Copy link
Contributor

driver1998 commented Jul 25, 2025

Maybe the value can be cached somewhere? I don't expect it will be changed at runtime.
Or maybe the performance penalty is not actually that high.

Also this can be delayed for later as well, no need to block this PR.

@LuckyXu-HF
Copy link
Contributor Author

LuckyXu-HF commented Jul 25, 2025

Or maybe the performance penalty is not actually that high.

I just debugged the getauxval() and it does not involve syscall, so I think call minipal_getcpufeatures here is feasible.

Also this can be delayed for later as well, no need to block this PR.

Agree. I will also try to find an environment where hardware supports LSX/LASX but the kernel disabled them.

@risc-vv
Copy link

risc-vv commented Jul 25, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

@am11
Copy link
Member

am11 commented Jul 25, 2025

@LuckyXu-HF, I agree with @jkotas that we can keep cpufeatures changes out until the time they are actually used in C/C++ code. I see that you have CPU feature detection duplicated in asm and new code in cpufeatures.c is never exercised (i.e. minipal_getcpufeatures() not called and LoongArch64IntrinsicConstants_LSX and _LASX are unused). If you are not planing to use them, lets revert src/native/minipal for now.

@LuckyXu-HF
Copy link
Contributor Author

@LuckyXu-HF, I agree with @jkotas that we can keep cpufeatures changes out until the time they are actually used in C/C++ code. I see that you have CPU feature detection duplicated in asm and new code in cpufeatures.c is never exercised (i.e. minipal_getcpufeatures() not called and LoongArch64IntrinsicConstants_LSX and _LASX are unused). If you are not planing to use them, lets revert src/native/minipal for now.

Ok, already reverted src/native/minipal in this PR. minipal_getcpufeatures() may need to be called to solve #118007 (comment) and #118007 (comment)
We will add it in a follow up.

@risc-vv
Copy link

risc-vv commented Jul 29, 2025

@dotnet/samsung Could you please take a look? These changes may be related to riscv64.

@LuckyXu-HF
Copy link
Contributor Author

LuckyXu-HF commented Jul 31, 2025

With d06e11d the test passed:
Release-CoreRoot-SPC-PE32+: Total: 14979 ; Passed: 14924 ; Failed: 0 ; Skipped: 55 ; Time: 2316.147s ; PassRate:100%
Debug-CoreRoot-SPC-PE32+: Total: 14979 ; Passed: 14924 ; Failed: 0 ; Skipped: 55 ; Time: 12141.417s ; PassRate:100%
Release-libs.tests-SPC-PE32+:TotalDir:258 ; Total: 707112; Passed: 705430; Errors: 0 ; Failed: 0 ; Skipped: 1682; Time: 3820.5s ; PassRate:100%

Hi @driver1998 we are trying to build a similar environment #118007 (comment) , could you please help to share more information about the vm and related image resources? Thank you very much.

@jkotas jkotas requested review from xen0n, am11 and driver1998 August 1, 2025 22:37
Copy link
Member

@am11 am11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

@driver1998
Copy link
Contributor

driver1998 commented Aug 2, 2025

could you please help to share more information about the vm and related image resources? 

It is a bit theoretical, but CONFIG_CPU_HAS_LASX=n in the kernel config should work.

LGTM otherwise.

Copy link

@xen0n xen0n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but one minor question: is it okay to have different context layouts depending on runtime ISA extension availability? The logic implemented in this patch seems to favor optimized space consumption, for example $f10/$vr10/$xr10 are placed at offsets of 8*10/16*10/32*10 depending on runtime HW & kernel capability, instead of being fixed at say 32*10. Given that $fNN is the lower half of $vrNN, which is in turn the lower half of $xrNN, the current behavior seems a bit counter-intuitive.

I don't know if this is going to complicate low-level debuggers or anything else that needs to be aware of these details, in the .NET ecosystem; that's why I'm asking. I'd be happy if this isn't a problem in practice!

@driver1998
Copy link
Contributor

driver1998 commented Aug 2, 2025

is it okay to have different context layouts depending on runtime ISA extension availability?

It is now labelled via ContextFlags, so downstream tools like debuggers will be able to tell them apart.
I think it should be fine.

@jkotas jkotas merged commit 6dc15b5 into dotnet:main Aug 2, 2025
98 checks passed
@LuckyXu-HF LuckyXu-HF deleted the main-LA64-1 branch August 4, 2025 01:26
@github-actions github-actions bot locked and limited conversation to collaborators Sep 3, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-loongarch64 area-PAL-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants