Alexander Monakov
@amonakov@mastodon.gamedev.place

@ptesarik@infosec.exchange have you checked how much of the microbenchmark runs out of the DSB? I'm actually curious how much repeated decoding happens there.

I'm very, very surprised that you see no slowdown from rethunk's forced return mispredictions. Unless the thunks are somehow not active in your case? Do you see them if you do 'perf record'/'perf report'?

Petr Tesařík
@ptesarik@infosec.exchange

@amonakov@mastodon.gamedev.place
ALL_IDQ_UOPS = 198633974709
%UOPS.DSB = 62.3%
%UOPS.MITE = 27.6%
%UOPS.MS = 10.1%

The high proportion of micro-ops from the microcode sequencer is due to the
rep movsb in raw_copy_from_user().
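
For reference, shares like the ones above can be obtained by dividing each front-end source's uop count by the sum of all three. The following is only a minimal Python sketch under stated assumptions: it assumes ALL_IDQ_UOPS is the sum of the DSB, MITE and MS counts, the helper name `uop_shares` is hypothetical, and the per-source counts are back-computed from the rounded percentages, so they are approximations rather than measured values.

```python
def uop_shares(dsb, mite, ms):
    """Return each front-end source's share of delivered uops, in percent."""
    total = dsb + mite + ms
    if total == 0:
        raise ValueError("no uops counted")
    return {
        "DSB": 100.0 * dsb / total,
        "MITE": 100.0 * mite / total,
        "MS": 100.0 * ms / total,
    }


ALL_IDQ_UOPS = 198633974709  # total reported in the post

# Approximate per-source counts, reconstructed from the rounded
# percentages in the post (an assumption, not measured counter values).
reported_pct = {"DSB": 62.3, "MITE": 27.6, "MS": 10.1}
approx = {name: round(ALL_IDQ_UOPS * pct / 100)
          for name, pct in reported_pct.items()}

shares = uop_shares(approx["DSB"], approx["MITE"], approx["MS"])
for name, pct in shares.items():
    print(f"%UOPS.{name} = {pct:.1f}%")
```

With real counter readings, the three raw counts would be passed to `uop_shares` directly instead of being reconstructed from percentages.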


Petr Tesařík
@ptesarik@infosec.exchange

@amonakov@mastodon.gamedev.place Oh, and yes, I do see a lot of hits in its_return_thunk:

Samples │        ffffffff81d940e0 :
        │        .skip 32, 0xcc
        │        SYM_CODE_START(its_return_thunk)
        │        UNWIND_HINT_FUNC
        │        ANNOTATE_NOENDBR
        │        ANNOTATE_UNRET_SAFE
        │        ret
   6088 │ffffffff81d940e0: ← ret
        │        int3
        │ffffffff81d940e1:   int3

Alexander Monakov
@amonakov@mastodon.gamedev.place

@ptesarik@infosec.exchange ah, this its_return_thunk is new, it doesn't desync the return address prediction stack!

Petr Tesařík
@ptesarik@infosec.exchange

@amonakov@mastodon.gamedev.place Oh, right, I thought I made it clear that this is a jmp to a ret, nothing more.