@vbabka@mastodon.social
@amonakov@mastodon.gamedev.place @ptesarik@infosec.exchange @ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org in the end it turned out to be quite logical. The code whose performance improved was fftw with certain buffer sizes, which happened to leave dirty output data in the cache after execution. On an immediate re-run it would read cold input data, forcing a flush of the dirty cache lines and slowing itself down. Interleaving execution meant the other code paid the price of the flush, possibly leaving clean data in the cache...
@ptesarik@infosec.exchange
@vbabka@mastodon.social @amonakov@mastodon.gamedev.place @ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org Those were easy times…
Now imagine an Ice Lake with a micro-op queue fed by the IDQ (Instruction Decode Queue), which offers three paths: DSB (decoded Icache), MITE (legacy decode pipeline) and MS (microcode sequencer).