
Petr Tesařík
@ptesarik@infosec.exchange

@vbabka@mastodon.social @amonakov@mastodon.gamedev.place @ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org Those were easy times…
Now imagine an Ice Lake with a micro-op queue fed by the IDQ (Instruction Decode Queue), which offers three paths: DSB (decoded Icache), MITE (legacy decode pipeline) and MS (microcode sequencer).

Alexander Monakov
@amonakov@mastodon.gamedev.place

@ptesarik@infosec.exchange @vbabka@mastodon.social @ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org I think I'd rather work with these than Pentium 4's "trace cache"


Petr Tesařík
@ptesarik@infosec.exchange

@amonakov@mastodon.gamedev.place @vbabka@mastodon.social @ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org I can feel the pain. Then again, I was still missing an important bit in my Ice Lake case. Most likely it's not L1I aliasing but BPU (Branch Prediction Unit) aliasing. Although the statistics counters don't show any relevant difference between the fast and slow builds, it seems that a single mispredicted branch sends the instruction decoder down the wrong path, incurs a penalty for switching from the DSB to MITE, and evicts useful information from the L1 I-cache. Unfortunately, I'm unable to confirm this hypothesis, because more tracing also ruins the equilibrium, of course. But if true, it's insane that a single case of bad speculation can cost over 4% in a microbenchmark.
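
[For readers following along: one way to probe the hypothesis above is to compare the front-end delivery paths and misprediction counts between the fast and slow builds with Linux perf. This is only a sketch; `./microbench` is a hypothetical binary standing in for the benchmark, and the event names (`idq.dsb_uops`, `idq.mite_uops`, `idq.ms_uops`, `dsb2mite_switches.penalty_cycles`) are the Intel/Linux-perf spellings whose availability depends on the kernel and CPU model.]

```shell
# Events counting uops delivered by each IDQ path: DSB (decoded icache),
# MITE (legacy decode pipeline) and MS (microcode sequencer).
EVENTS=idq.dsb_uops,idq.mite_uops,idq.ms_uops
# Add the DSB->MITE switch penalty and generic branch mispredictions.
EVENTS=$EVENTS,dsb2mite_switches.penalty_cycles,branch-misses

# Print the command rather than executing it, so the sketch also works
# on machines without perf; drop the echo to actually measure.
echo perf stat -e "$EVENTS" -- ./microbench
```

Note the caveat from the post still applies: perf stat in counting mode is relatively unobtrusive, but heavier tracing (e.g. sampling or ptrace-based tools) can itself perturb the code layout and timing enough to destroy the effect being measured.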