-
Notifications
You must be signed in to change notification settings - Fork 5.2k
[LoongArch64] Handle LASX/LSX/FP context depending on ISA availability. #118007
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* do some clean up.
Please NOTE:
|
Hi @driver1998 , does this solve your problem? |
cc @xen0n for review. You may also need to test the 2K1000/2000/3000 series, those have LSX but no LASX. (I didn't post my patch upstream because I am still waiting for my 2K3000 board to arrive lol) |
Although IIRC HWCAP should be preferred over CPUCFG, since there might be situations where the CPU supports LSX/LASX but the kernel disabled them. see https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6772567 for example. |
Yes, this patch can handle On the clang side, AFAIK |
Thanks, then we should consider similar judgment approach. |
Not sure, but AFAIK a VM running non-LASX kernel on something like a 3A6000 is used for simulating a 2K3000 in distro testing. |
Good point. We can use similar approach taken here for riscv64 #113676 (see |
Thanks very much. This approach is good to detect cpu features to add |
The current version of the change looks good to me. Is it ok to merge it?
This can be done in a follow up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please properly separate logical changes, and say what you did, e.g. "Handle LASX/LSX/FP context depending on ISA availability" in the commit title. "Fix crash on 2K series SoC" should appear in the commit body because it's providing helpful additional context.
Add HWCAP get in minipal_getcpufeatures(). Co-authored-by: driver1998 <[email protected]>
If we had to run on the assumed environment which CPU supports diffdiff --git a/src/coreclr/pal/src/arch/loongarch64/context2.S b/src/coreclr/pal/src/arch/loongarch64/context2.S
index 4dad08e6006..e77df4eb9fb 100644
--- a/src/coreclr/pal/src/arch/loongarch64/context2.S
+++ b/src/coreclr/pal/src/arch/loongarch64/context2.S
@@ -37,14 +37,19 @@ LOCAL_LABEL(Restore_CONTEXT_FLOATING_POINT):
andi $t1, $r21, (1 << CONTEXT_FLOATING_POINT_BIT)
beqz $t1, LOCAL_LABEL(No_Restore_CONTEXT_FLOATING_POINT)
- ori $r21, $zero, 2
- cpucfg $r21, $r21
- // CPUCFG.2.LASX[bit7]
- andi $t1, $r21, 0x80
- bnez $t1, LOCAL_LABEL(Restore_CONTEXT_LASX)
- // CPUCFG.2.LSX[bit6]
- andi $t1, $r21, 0x40
- bnez $t1, LOCAL_LABEL(Restore_CONTEXT_LSX)
+ PROLOG_SAVE_REG_PAIR_INDEXED 22, 1, 32
+ st.d $a0, $fp, 16
+ st.d $a1, $fp, 24
+ bl C_FUNC(minipal_getcpufeatures)
+ ori $t1, $a0, 0
+ ld.d $a0, $fp, 16
+ ld.d $a1, $fp, 24
+ EPILOG_RESTORE_REG_PAIR_INDEXED 22, 1, 32
+
+ andi $t3, $t1, CONTEXT_HWCAP_LOONGARCH_LASX_BIT
+ bnez $t3, LOCAL_LABEL(Restore_CONTEXT_LASX)
+ andi $t3, $t1, CONTEXT_HWCAP_LOONGARCH_LSX_BIT
+ bnez $t3, LOCAL_LABEL(Restore_CONTEXT_LSX)
// Neither LSX or LASX is supported.
fld.d $f0 , $a0, CONTEXT_FPU_OFFSET
@@ -300,13 +305,16 @@ LOCAL_LABEL(Done_CONTEXT_INTEGER):
andi $t3, $t1, (1 << CONTEXT_FLOATING_POINT_BIT)
beqz $t3, LOCAL_LABEL(Done_CONTEXT_FLOATING_POINT)
- ori $t1, $zero, 2
- cpucfg $t1, $t1
- // CPUCFG.2.LASX[bit7]
- andi $t3, $t1, 0x80
+ PROLOG_SAVE_REG_PAIR_INDEXED 22, 1, 32
+ st.d $a0, $fp, 16
+ bl C_FUNC(minipal_getcpufeatures)
+ ori $t1, $a0, 0
+ ld.d $a0, $fp, 16
+ EPILOG_RESTORE_REG_PAIR_INDEXED 22, 1, 32
+
+ andi $t3, $t1, CONTEXT_HWCAP_LOONGARCH_LASX_BIT
bnez $t3, LOCAL_LABEL(Store_CONTEXT_LASX)
- // CPUCFG.2.LSX[bit6]
- andi $t3, $t1, 0x40
+ andi $t3, $t1, CONTEXT_HWCAP_LOONGARCH_LSX_BIT
bnez $t3, LOCAL_LABEL(Store_CONTEXT_LSX) |
Maybe the value can be cached somewhere? I don't expect it will be changed at runtime. Also this can be delayed for later as well, no need to block this PR. |
I just debugged the
Agree. I will also try to find an environment where hardware supports LSX/LASX but the kernel disabled them. |
@dotnet/samsung Could you please take a look? These changes may be related to riscv64. |
@LuckyXu-HF, I agree with @jkotas that we can keep cpufeatures changes out until the time they are actually used in C/C++ code. I see that you have CPU feature detection duplicated in asm and new code in cpufeatures.c is never exercised (i.e. |
Ok, already reverted |
@dotnet/samsung Could you please take a look? These changes may be related to riscv64. |
With d06e11d the test passed: Hi @driver1998 we are trying to build a similar environment #118007 (comment) , could you please help to share more information about the vm and related image resources? Thank you very much. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you!
It is a bit theoretical, but CONFIG_CPU_HAS_LASX=n in the kernel config should work. LGTM otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but one minor question: is it okay to have different context layouts depending on runtime ISA extension availability? The logic implemented in this patch seems to favor optimized space consumption, for example $f10
/$vr10
/$xr10
are placed at offsets of 8*10
/16*10
/32*10
depending on runtime HW & kernel capability, instead of being fixed at say 32*10
. Given that $fNN
is the lower half of $vrNN
, which is in turn the lower half of $xrNN
, the current behavior seems a bit counter-intuitive.
I don't know if this is going to complicate low-level debuggers or anything else that needs to be aware of these details, in the .NET ecosystem; that's why I'm asking. I'd be happy if this isn't a problem in practice!
It is now labelled via ContextFlags, so downstream tools like debuggers will be able to tell them apart. |
Uh oh!
There was an error while loading. Please reload this page.