Ian Cutress
Intel released a new optimization manual. To get the best local latency, enable on-die NUMA/clustering. 4 NUMA nodes per CPU. Chapter 8.
Guillermo Lovato Jul 12
Replying to @IanCutress
I wonder about the implications for ESX. My guess is that it should be enabled in all virtualization scenarios.
Stephen Brooks 🦆 Jul 12
Replying to @IanCutress
Is there any significant difference between the 4x4 and 2x2 matrices? The 4x4 basically looks like the 2x2 expanded into blocks, plus some noise.
Kevin Krewell Jul 12
Replying to @IanCutress
That's the same NUMA clustering as EPYC.
Ryan Shrout Jul 12
Replying to @Krewell @IanCutress
Oh the irony.
Ian Cutress Jul 12
Replying to @stephenjbrooks
Dual socket, so cross-socket comms.
Stephen Brooks 🦆 Jul 12
Replying to @IanCutress
So the 2x2 is 2 sockets = 2 nodes, and the 4x4 is 2 sockets = 4 nodes (one per CPU?)
Ken Mitchell Jul 12
Replying to @IanCutress
See Chapter 8: Introducing Sub-NUMA Clustering.
Ian Cutress Jul 12
Replying to @KenMitchellKen
Yup, that's where I got it from ;)
Ian Cutress Jul 12
Replying to @KenMitchellKen
Did you see anything else new or different in this manual version? I've not flicked through it yet.
NerdTech Jul 12
Replying to @IanCutress
What was the recommendation for Broadwell-E: 2 NUMA nodes per CPU due to the 2 separate ring buses? 😓
Jorge De Pedro Jul 13
Replying to @IanCutress
I don't see a really good advantage to using SNC; the latency disparities across the socket interconnect are quite substantial.
Ken Mitchell Jul 24
Replying to @IanCutress
2017-06 vs 2016-06 changes include: 2.1, 8, 13, 16, B.2, D