oops in stmmac driver parsing RX packets #2887

@pamolloy

Split headers (SPH) is a DMA feature to split the header and payload in the receive path. See SPH bit in EMAC_DMA[n]_CTL register on page 30-255 of the ADSP-SC598 HRM.

With this feature enabled I ran into an oops fairly quickly on 5.15 and immediately on 6.12. It first appeared when copying data over SSH, and I then reproduced it with iperf3.

https://lore.kernel.org/all/CABnpCuCLN6VNgmoWHwc4_8AT34xqmQnEoUHLncvE2yLqYZBaKg@mail.gmail.com/

This report matches the stack trace below most closely and includes some debugging information.

https://lore.kernel.org/all/[email protected]

This thread includes some debug code from an Nvidia Tegra developer, who states that the code "causes other issues".

That debug code appears to have been used to implement a patch in the ADSP Yocto project to avoid the problem:

https://github.com/analogdevicesinc/lnxdsp-adi-meta/blob/main/meta-adi-adsp-sc5xx/recipes-kernel/linux/linux-adi/0001-SC598-fix-stmmac-dma-split-header-crash.patch

In my testing, passing buf1_len instead causes the following:

Connection to ... closed by remote host.

Note that dwmac-intel.c and dwmac-dwc-qos-eth.c (the latter primarily used by Nvidia Tegra SoCs) have already disabled this feature:

47f753c ("net: stmmac: disable Split Header (SPH) for Intel platforms")
029c1c2 ("net: stmmac: dwc-qos: Disable split header for Tegra194")

In testing, disabling split headers with the following commits resulted in a 200 Mbits/sec decrease in receive performance.

[   88.577353] Unable to handle kernel paging request at virtual address ffff00019799bf80
[   88.585166] Mem abort info:
[   88.587920]   ESR = 0x0000000096000145
[   88.591634]   EC = 0x25: DABT (current EL), IL = 32 bits
[   88.596927]   SET = 0, FnV = 0
[   88.599983]   EA = 0, S1PTW = 0
[   88.603089]   FSC = 0x05: level 1 translation fault
[   88.607968] Data abort info:
[   88.610814]   ISV = 0, ISS = 0x00000145
[   88.614633]   CM = 1, WnR = 1
[   88.617579] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000090df3000
[   88.624268] [ffff00019799bf80] pgd=180000009dff9003, p4d=180000009dff9003, pud=0000000000000000
[   88.632953] Internal error: Oops: 0000000096000145 [#1] PREEMPT SMP
[   88.639195] Modules linked in:
[   88.642234] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.168-yocto-standard #2
[   88.649608] Hardware name: ADI 64-bit SC598 SOM EZ Kit (DT)
[   88.655164] pstate: 00000009 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   88.662107] pc : dcache_inval_poc+0x24/0x54
[   88.666273] lr : arch_sync_dma_for_cpu+0x1c/0x24
[   88.670874] sp : ffff800008003c60
[   88.674172] x29: ffff800008003c60 x28: ffff000092be0880 x27: ffff000092be0880
[   88.681290] x26: ffff000094780000 x25: 00000000000043a8 x24: ffff0000947843a8
[   88.688408] x23: 0000000000000000 x22: 000000009799c000 x21: 0000000000000002
[   88.695525] x20: 00000000ffffffa8 x19: ffff000091dd5c10 x18: 0000000000000000
[   88.702643] x17: 913e3b33e73e0a08 x16: 0101000004c0e801 x15: 10805e90dd27b2db
[   88.709760] x14: b6df1600e8c55903 x13: b701913e3b33e73e x12: 0a080101000004c0
[   88.716878] x11: e80110805e90dd27 x10: b2dbb6df1600e8c5 x9 : 0000000000000600
[   88.723995] x8 : 064000409ad0dc05 x7 : 0000000000000001 x6 : 0000000000000000
[   88.731113] x5 : ffff80000858c33c x4 : 0000000000000000 x3 : 000000000000003f
[   88.738231] x2 : 0000000000000040 x1 : ffff00019799bf80 x0 : ffff00009799c000
[   88.745349] Call trace:
[   88.747780]  dcache_inval_poc+0x24/0x54
[   88.751598]  dma_direct_sync_single_for_cpu+0x3c/0x6c
[   88.756632]  dma_sync_single_for_cpu+0x30/0x3c
[   88.761059]  stmmac_napi_poll_rx+0x860/0xa7c
[   88.765312]  __napi_poll.constprop.0+0x30/0x154
[   88.769826]  net_rx_action+0x118/0x23c
[   88.773558]  handle_softirqs+0x1f8/0x28c
[   88.777464]  __do_softirq+0x10/0x18
[   88.780937]  __irq_exit_rcu+0x70/0xbc
[   88.784582]  irq_exit+0xc/0x18
[   88.787620]  handle_domain_irq+0x48/0x6c
[   88.791526]  gic_handle_irq+0x9c/0xfc
[   88.795172]  call_on_irq_stack+0x20/0x30
[   88.799078]  do_interrupt_handler+0x40/0x58
[   88.803244]  el1_interrupt+0x2c/0x54
[   88.806803]  el1h_64_irq_handler+0x14/0x1c
[   88.810882]  el1h_64_irq+0x74/0x78
[   88.814268]  arch_cpu_idle+0x14/0x20
[   88.817826]  default_idle_call+0x48/0x68
[   88.821732]  do_idle+0x12c/0x1e4
[   88.824944]  cpu_startup_entry+0x20/0x38
[   88.828850]  rest_init+0xe8/0xf4
[   88.832062]  arch_call_rest_init+0xc/0x14
[   88.836054]  start_kernel+0x61c/0x65c
[   88.839700]  __primary_switched+0xa0/0xa8
[   88.843698] Code: d1000443 ea03003f 8a230021 54000040 (d50b7e21) 
[   88.849772] ---[ end trace c5846783b615ddeb ]---
[   88.854371] Kernel panic - not syncing: Oops: Fatal exception in interrupt
[   88.861231] Kernel Offset: disabled
[   88.864698] CPU features: 0x6,00000100,a0300a42
[   88.869212] Memory Limit: none
[   88.872254] ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
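
One possible reading of the register dump above, offered as a sketch rather than a confirmed diagnosis: if x0 is the start of the buffer being synced and x20 is the length handed to dma_sync_single_for_cpu() (both are assumptions, the trace does not label them), then the length looks like a negative value stored in an unsigned 32-bit field. Adding it to the start address and rounding down to the 64-byte cache line (x2 = 0x40) lands exactly on the faulting address. The helper names below are hypothetical and exist only for this check.

```python
MASK64 = (1 << 64) - 1


def sync_end_address(start: int, length: int, line_size: int = 64) -> int:
    """Cache-line-aligned end address of a sync covering `length` bytes from `start`."""
    end = (start + length) & MASK64
    return end & ~(line_size - 1)


def as_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of `value` as a signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Values copied from the oops; their roles are assumptions, not facts from the log.
start = 0xFFFF00009799C000   # x0, assumed start of the synced region
length = 0x00000000FFFFFFA8  # x20, assumed length passed to the sync call
fault = 0xFFFF00019799BF80   # faulting virtual address (also x1)

print(hex(sync_end_address(start, length)))  # 0xffff00019799bf80
print(as_signed32(length))                   # -88
```

If that reading holds, the sync length was -88 interpreted as unsigned, which would be consistent with a buffer length computed by subtraction going negative in the split header path. The trace alone does not show which subtraction that would be.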
