Liran Alon
Tweets
Liran Alon, Jan 31
I later saw this great talk, which explains how RISC may achieve CISC-like perf with clever micro-arch tricks. The main concepts are Macro-Fusion, in which the decoder generates a single uOp for multiple MacroOps, and RISC having both 2- and 4-byte instructions.
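To make the macro-fusion idea concrete, here is a minimal Python sketch of a decoder that emits one uOp for a fusible pair of adjacent macro-ops. The instruction names, the `FUSIBLE_PAIRS` table and the `decode` function are hypothetical and chosen only for illustration; a real decoder would also check register dependencies and encodings, which this toy model ignores.

```python
# Toy model of macro-fusion. The pairs below are illustrative assumptions,
# not the fusion rules of any real CPU.
FUSIBLE_PAIRS = {("cmp", "jne"), ("slli", "add")}

def decode(macro_ops):
    """Return a list of uOps, fusing each adjacent fusible pair into one uOp."""
    uops = []
    i = 0
    while i < len(macro_ops):
        pair = tuple(macro_ops[i:i + 2])
        if len(pair) == 2 and pair in FUSIBLE_PAIRS:
            # Two macro-ops become a single fused uOp.
            uops.append(pair[0] + "+" + pair[1])
            i += 2
        else:
            # No fusion opportunity: one macro-op, one uOp.
            uops.append(macro_ops[i])
            i += 1
    return uops

program = ["slli", "add", "ld", "cmp", "jne"]
fused = decode(program)
```

In this sketch five macro-ops decode into three uOps, which is the effect the tweet describes: the back end sees fewer, denser operations even though the ISA itself stays simple.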
Liran Alon, Jan 31
Replying to @fagiolinux @cynicalsecurity
It would've been great to see a technical paper comparing x86, ARM & RISC-V that explicitly calls out the advantages and disadvantages of each, and the various factors and tradeoff decisions that led to these specific ISA designs. Otherwise, it's just handwaving.
Liran Alon, Jan 27
Replying to @herbertbos @vu5ec
What makes these count as new vulns is that the current microcode MD_CLEAR implementation doesn't flush micro-arch buffers whose contents may theoretically propagate to MDS-leakable buffers after MD_CLEAR executes. Thus, mitigation requires a microcode update but no software change.
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
RS entries have operand fields that either hold data or are tagged with a PSrc, so that they capture their data when it is sent on the common bus by the functional units executing the uOps that produce it. RS entries are dispatched to ports only once all operands are resolved...
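The operand-capture behaviour described in this tweet can be sketched in Python as follows. This is a simplified model under stated assumptions: the `Operand` and `RSEntry` classes, the `capture` method and the tag numbers are hypothetical names for illustration, not a description of Intel's actual hardware.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Operand:
    psrc: Optional[int] = None    # tag of the physical source still being produced
    value: Optional[int] = None   # captured data, None until it is available

    def ready(self) -> bool:
        return self.value is not None

@dataclass
class RSEntry:
    name: str
    operands: List[Operand]

    def capture(self, tag: int, value: int) -> None:
        """Snoop a result broadcast; capture it if an operand waits on this tag."""
        for op in self.operands:
            if not op.ready() and op.psrc == tag:
                op.value = value

    def can_dispatch(self) -> bool:
        # The entry may go to a port only once every operand is resolved.
        return all(op.ready() for op in self.operands)

# One operand already holds data, the other waits on (hypothetical) tag 7.
entry = RSEntry("add", [Operand(value=5), Operand(psrc=7)])
before = entry.can_dispatch()
entry.capture(tag=3, value=99)    # broadcast for an unrelated tag is ignored
after_unrelated = entry.can_dispatch()
entry.capture(tag=7, value=10)    # matching tag: operand captures the data
after_match = entry.can_dispatch()
```

The entry only becomes dispatchable after the broadcast carrying the tag it was waiting on, which mirrors the "capture from the common bus" step in the tweet.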
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
I don't think the RS is what re-dispatches the load uOps. The RS entry for a load uOp is supposed to be deallocated once it is dispatched to the load port (i.e. AGU and then MOB). The load-buffer entry in the MOB is what gets re-dispatched to the DCU once the DTLB resolves the physical address.
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
The MCA-generated uOps that implement page-walks are not "stuffed loads". They have a valid PDst and an allocated RS and ROB entry, i.e. the non-speculative page-walk that happens at retirement does not perform "stuffed loads". This is described in the patent.
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
Well, more precisely, they first go through the AGU to generate the linear address and then go to the MOB, which enforces proper memory ordering, performs linear address translation, dispatches to the DCU, supports store-to-load forwarding, etc.
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
When uOps are dispatched from the FrontEnd Allocator to the OoO BackEnd engine, ROB and RS entries are created for them (after register renaming, etc.). The RS holds uOps until their operands are ready, then dispatches them to functional units via ports. Load/store uOps are dispatched from the RS to the MOB.
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
Right. It's a good question (now that I understand it). I believe the PMH is triggered anyway for the sake of hardware simplicity (it's a slow path not worth optimising). The DTLB only needs wires to the MOB, and none to the ROB to mark an entry with MCA. The PMH needs such functionality anyway to trigger a PF.
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
Yes, it doesn't interrupt the OoO flow of course, i.e. other loads/stores in the MOB that don't require the PMH. And yes, the load remains in the MOB (not the RS) until the translation arrives, and only then can it be dispatched to the DCU.
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
Maybe you're asking why, in the case of D==0 (in contrast to A==0, which may need to set multiple A bits in multiple PT levels), the DTLB doesn't just mark the uOp at this point to trigger MCA at retirement and skip triggering the PMH. It's because D may be 1 in memory (stale DTLB), and it's also not worth optimising.
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
What do you mean by parallel? In the case of a store when the dirty-bit is cleared, the TLB just asserts a miss signal that triggers the PMH, similar to not having a TLB match at all. Thus, the load/store entry in the MOB isn't given a physical address, which prevents the MOB from dispatching it to the DCU.
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
I'm not sure I understand the question. Every speculative execution path that is eventually not retired is by definition a perf hit. The whole ROB is flushed, and all uOps that executed speculatively and did not retire were just a waste of CPU cycles. Nothing special about this case.
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
That part is consistent, yes. My "not exactly" referred to the separate handling you described for the access and dirty bits. You must avoid setting either of them during speculative exec, and they don't have different types of MCA.
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
The first walk, done by the PMH while the uOp was executing speculatively, could have been aborted when the PMH deduced that it was about to access non-speculative memory. In this case, the PMH marks the parent uOp to re-dispatch only at retirement, which re-triggers the PMH as well.
Liran Alon, Jan 23
Replying to @damageboy @trav_downs and 2 others
The dirty-bit is cached in the DTLB for a trivial reason. It's no different from the perm bits, i.e. when the dirty-bit is cleared, loads from the linear address should hit the DTLB while stores should miss and trigger the PMH. Similarly, Shadow MMU implementations mark the SPTE RO if the D-bit is cleared.
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
When the retirement circuit tries to retire the parent uOp, it checks whether the uOp is marked to trigger MCA (signalled by the PMH if A/D bits need to be set). If yes, the ROB is flushed and MCA generates new uOps that implement the page-walk and set the A/D bits. Else, the PMH performs the page-walk as usual.
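The retirement-time check in this tweet can be modelled roughly as below. This is only a sketch of the control flow as the tweet states it: the `Uop` class, the `needs_assist` flag and the event strings are hypothetical, and the model deliberately says nothing about what the hardware does after the assist completes.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Uop:
    name: str
    needs_assist: bool = False   # set earlier by the PMH if A/D bits must be updated

def retire_head(rob: List[Uop]) -> List[str]:
    """Try to retire the oldest uOp in the ROB and return a log of events."""
    head = rob[0]
    if head.needs_assist:
        # Marked uOp: flush every in-flight uOp, then let the microcode assist
        # inject uOps that walk the page tables and set the A/D bits.
        rob.clear()
        return ["flush ROB", "assist uOps: page-walk", "assist uOps: set A/D bits"]
    # Unmarked uOp: retire normally.
    rob.pop(0)
    return ["retire " + head.name]

marked_rob = [Uop("store", needs_assist=True), Uop("add")]
marked_log = retire_head(marked_rob)

normal_rob = [Uop("add"), Uop("load")]
normal_log = retire_head(normal_rob)
```

The point of the model is that the assist is decided at retirement, when the uOp is known to be non-speculative, so A/D bits are never set on behalf of a path that ends up squashed.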
Liran Alon, Jan 23
Replying to @trav_downs @damageboy and 2 others
Not exactly. The DTLB just reports a miss for either a load or a store when the A-bit is 0, or for a store when the D-bit is 0. This triggers the PMH to do a PT-walk (which may use PTE caches). If the memory is non-speculative (per MTRR/PAT) or the walk needs to modify A/D bits, the PMH aborts the walk and signals the uOp to re-dispatch at retirement.
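The two decisions described in this tweet, when the DTLB reports a miss and when the PMH aborts its walk, can be written out as a small Python sketch. The names `TlbEntry`, `dtlb_hit` and `pmh_walk`, and the string return values, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TlbEntry:
    accessed: bool   # cached A-bit
    dirty: bool      # cached D-bit

def dtlb_hit(entry: Optional[TlbEntry], is_store: bool) -> bool:
    """A lookup hits only if the entry exists, A is set, and (for stores) D is set."""
    if entry is None or not entry.accessed:
        return False
    if is_store and not entry.dirty:
        return False
    return True

def pmh_walk(pte_accessed: bool, pte_dirty: bool, is_store: bool,
             speculation_allowed: bool) -> str:
    """Decide the outcome of a page walk that was started speculatively."""
    needs_ad_update = (not pte_accessed) or (is_store and not pte_dirty)
    if not speculation_allowed or needs_ad_update:
        # Abort the walk; the uOp is re-dispatched once it reaches retirement.
        return "redispatch_at_retirement"
    return "translate"

clean_page = TlbEntry(accessed=True, dirty=False)
load_hits = dtlb_hit(clean_page, is_store=False)
store_hits = dtlb_hit(clean_page, is_store=True)

# The DTLB copy may be stale: if D is already 1 in memory, the walk just translates.
stale_case = pmh_walk(pte_accessed=True, pte_dirty=True, is_store=True,
                      speculation_allowed=True)
```

Under these assumptions a load to a clean page hits while a store misses, and the walk that follows only defers to retirement when it would actually have to write A/D bits or touch memory that must not be accessed speculatively.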
Liran Alon, Jan 23
Replying to @damageboy @geofflangdale and 2 others
Read this Intel patent. In short, the access & dirty bits are set by the PMH, which executes uOps dispatched by a microcode-assist (MA) that is triggered on retirement of the parent uOp. The parent uOp is marked to trigger the MA on retirement by the PMH as well.
Liran Alon, Jan 18
Replying to @dwizzzleMSFT @smealum @rsinghal1
This discussion may interest you.