|
@Liran_Alon | |||||
|
Given that Intel DDIO gives devices direct access to a limited set of LLC ways, I would also expect a non-temporal store instruction that not only writes directly to the LLC, but can also be hinted to write to the DDIO-accessible LLC ways, e.g. to accelerate NIC/NVMe submissions. (3/3)
|
Liran Alon
@Liran_Alon
|
Dec 16 |
|
Q: A producer/consumer ring is a common pattern for high-performance communication between two CPU cores, or between a CPU core and a device. Thus, I expected Intel to have a non-temporal store instruction that writes to the LLC without polluting L1/L2. It would also be useful with device DDIO. But MOVNT* also bypasses the LLC. Why? (1/3)
|
Liran Alon
@Liran_Alon
|
Dec 16 |
|
I.e., the producer isn't expected to read the descriptors it writes to the submission queue. Thus, there is no need to load their cache lines into the producer's L1/L2, which also hurts the consumer's latency when it reads them. Thoughts? (2/3)
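For readers who want the pattern in the question made concrete, here is a minimal sketch of a single-producer/single-consumer ring in C whose producer writes each descriptor with the existing MOVNTDQ non-temporal store. Everything named here (`struct desc`, `struct ring`, `RING_SLOTS`, `ring_push`, `ring_pop`) is hypothetical and not taken from any real driver or library. It only illustrates the status quo the thread describes: the store avoids allocating the line in the producer's L1/L2, but there is no way to ask for the data to land in the LLC. On builds without SSE2 the sketch falls back to ordinary stores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* _mm_stream_si128 (MOVNTDQ), _mm_set_epi64x */
#endif

#define RING_SLOTS 8u /* hypothetical size; must be a power of two */

/* Hypothetical 16-byte submission descriptor. */
struct desc {
    uint64_t addr;
    uint64_t len;
};
static_assert(sizeof(struct desc) == 16, "one descriptor per 128-bit store");

struct ring {
    _Alignas(64) struct desc slots[RING_SLOTS];
    _Alignas(64) _Atomic uint32_t tail; /* advanced only by the producer */
    _Alignas(64) _Atomic uint32_t head; /* advanced only by the consumer */
};

/* Producer side: returns false when the ring is full. */
static bool ring_push(struct ring *r, uint64_t addr, uint64_t len)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SLOTS)
        return false;

    struct desc *slot = &r->slots[tail & (RING_SLOTS - 1)];
#if defined(__SSE2__)
    /* Non-temporal store: the producer never reads the descriptor back,
     * so it does not pull the line into its own caches. The low 64 bits
     * of the vector hold addr, the high 64 bits hold len. */
    _mm_stream_si128((__m128i *)slot,
                     _mm_set_epi64x((long long)len, (long long)addr));
    /* Non-temporal stores are weakly ordered; fence before publishing. */
    _mm_sfence();
#else
    slot->addr = addr; /* fallback: ordinary cached stores */
    slot->len = len;
#endif
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_pop(struct ring *r, struct desc *out)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return false;

    *out = r->slots[head & (RING_SLOTS - 1)];
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}
```

A zero-initialized `struct ring` starts out empty. The consumer's plain load in `ring_pop` is where the latency cost raised in the thread would show up if the descriptor has been written past every cache level.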
|
Matthew S. Wilson
@_msw_
|
Dec 16 |
|
Cc @rsinghal1
|
Elazar Leibovich
@elazarl
|
Dec 16 |
|
How can you be sure the client and server share an L2 cache? That limits the portability of the code.
If you're adding an instruction, make it "send to ACPIC_ID, mem", which would make sure the write is visible to the other CPU with a minimal performance hit.
See also: twitter.com/AviKivity/stat…
|
Liran Alon
@Liran_Alon
|
Dec 16 |
|
Cool, I didn't remember that @AviKivity had brought up a similar question. Writing directly to the target's L1/L2 cache is indeed an even better non-temporal store for core-to-core communication. For devices, you want to write to the DDIO-accessible LLC ways. But yes, the idea is the same.
|