@Liran_Alon
Having said that, I wonder whether in these scenarios it's good enough to just use wmb()+writeX_relaxed() on the write to the doorbell, even though that executes an unnecessary SFENCE on Intel. Because it probably causes the implicit SFENCE on the write to UC to be much faster? This is all very weird... (3/3)
Liran Alon
@Liran_Alon
Dec 29
Encountered a strange x86 cache-coherency inconsistency: Intel guarantees that WCBs are flushed on any read or write to UC mem, but @AMD does so only on reads. If true, should Linux have a new flush_wcb_writeX() util that differs between CPU vendors? (1/3) @fagiolinux @_msw_ @DanielMarcovit3 @_AlexGraf pic.twitter.com/kT3Tw6VBKz
|
||
|
|
||
|
Liran Alon
@Liran_Alon
Dec 29
This applies to some NIC drivers I recently reviewed. They have a feature where the Tx descriptor is written to a PCI BAR mapped as WC (instead of to memory) to avoid one DMA read. Thus, only on AMD do they require wmb() before writing to the doorbell (UC). For example, the mlx4 BlueFlame feature. (2/3)
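
The pattern described in this tweet can be sketched roughly as below. This is a user-space illustration only, not code from mlx4 or any other driver: `struct fake_nic`, `struct tx_desc`, and `post_tx_desc()` are hypothetical names, and `wmb()` and `writel_relaxed()` here are simplified stand-ins for the kernel primitives of the same name (plain memory plays the role of the WC-mapped BAR and the UC doorbell).

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the kernel's wmb(); an assumption for this sketch.
 * On x86-64 it issues SFENCE, which drains write-combining buffers. */
#if defined(__x86_64__)
#define wmb() __asm__ __volatile__("sfence" ::: "memory")
#else
#define wmb() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Stand-in for the kernel's writel_relaxed(): a store with no barrier. */
static inline void writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
    *addr = val;
}

/* Hypothetical 64-byte Tx descriptor. */
struct tx_desc {
    uint32_t words[16];
};

/* Hypothetical device layout; ordinary memory simulates the mappings. */
struct fake_nic {
    struct tx_desc wc_bar;      /* would be a WC-mapped PCI BAR */
    volatile uint32_t doorbell; /* would be a UC-mapped doorbell register */
};

/* Push the descriptor through the WC mapping, then ring the doorbell. */
static void post_tx_desc(struct fake_nic *nic, const struct tx_desc *desc,
                         uint32_t tail)
{
    /* 1. Write the descriptor to the WC region (may sit in a WCB). */
    memcpy(&nic->wc_bar, desc, sizeof(*desc));

    /* 2. Explicit barrier so the descriptor leaves the WCBs before the
     *    doorbell write; per the thread, needed on AMD, redundant on Intel. */
    wmb();

    /* 3. Ring the doorbell with a relaxed write. */
    writel_relaxed(tail, &nic->doorbell);
}
```

In a real driver the descriptor copy would go through an I/O accessor on a write-combining mapping rather than `memcpy()` on normal memory; the sketch only shows where the explicit barrier sits relative to the two writes.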
Liran Alon
@Liran_Alon
Dec 30
Also, on ARM64, using wmb()+writeX_relaxed() instead of writel() will unnecessarily change dma_wmb() to wmb(), as dma_wmb()==DMB(OSHST) is sufficient to flush WCBs. I'm not sure whether a write to the doorbell (UC Device mem) does an implicit wmb()==DSB(ST) anyway, as on x86 Intel. ARM expert here?..
Matthew S. Wilson
@_msw_
Dec 29
ENA driver also has a "low latency queue" mode. Will take a closer look once back in the office...
Liran Alon
@Liran_Alon
Dec 29
I know. I already submitted a patch that fixes this to the relevant AWS team. We can talk privately about this. :)