Brutkey

Petr Tesařík
@ptesarik@infosec.exchange

Linux kernel hacker.
Pronouns: he/him


Notes: 443 | Following: 0 | Followers: 0
Blog: https://sigillatum.tesarici.cz/
GitHub: https://github.com/ptesarik/
Petr Tesařík
@ptesarik@infosec.exchange

There's only one task that #genAI is truly good at: Deceiving its users!
Change my mind.


Petr Tesařík
@ptesarik@infosec.exchange
CW: global politics

By negotiating with Mr. Putin, you are doing Russians a disservice.
Putin does not represent Russians, because there are no citizens in Russia, only prisoners of the ruler.

Petr Tesařík
@ptesarik@infosec.exchange

@amonakov@mastodon.gamedev.place Oh, and yes, I do see a lot of hits in its_return_thunk:

Samples │        ffffffff81d940e0 :
        │        .skip 32, 0xcc
        │        SYM_CODE_START(its_return_thunk)
        │        UNWIND_HINT_FUNC
        │        ANNOTATE_NOENDBR
        │        ANNOTATE_UNRET_SAFE
        │        ret
   6088 │ffffffff81d940e0: ← ret
        │        int3
        │ffffffff81d940e1:   int3

Petr Tesařík
@ptesarik@infosec.exchange

@amonakov@mastodon.gamedev.place Interesting. Indeed, on second thought, my explanation doesn't make much sense. But the observed reality is that I can reliably get netperf throughput >= 1850 Mbps with CONFIG_RETHUNK=y but <= 1840 Mbps without it (and with no other changes to the setup). All with a tiny (64-byte) buffer size, so it is an extremely syscall-heavy workload.
EDIT: Obviously, the memory layout also changes, but I have checked that L1I cache misses are comparable (and approx. 0.1%) this time.
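For a sense of scale: the gap between the two configurations is only about 0.5%, and a 64-byte buffer means millions of send calls per second. A minimal sketch of that arithmetic, assuming one send() per 64-byte message and counting payload bits only (no TCP/IP overhead); the helper names are hypothetical:

```python
def relative_gap_percent(slower_mbps: float, faster_mbps: float) -> float:
    """Throughput gain of the faster run over the slower one, in percent."""
    return (faster_mbps - slower_mbps) / slower_mbps * 100.0


def sends_per_second(throughput_mbps: float, msg_bytes: int) -> float:
    """Messages per second needed to sustain the given payload throughput.

    Assumes one send() call per message and ignores protocol headers.
    """
    return throughput_mbps * 1_000_000 / (msg_bytes * 8)


if __name__ == "__main__":
    # Thresholds quoted in the post: >= 1850 Mbps with CONFIG_RETHUNK=y,
    # <= 1840 Mbps without it.
    print(f"gap: {relative_gap_percent(1840, 1850):.2f}%")
    print(f"sends/s at 1850 Mbps: {sends_per_second(1850, 64):,.0f}")
```

Under these assumptions both builds issue roughly 3.6 million send() calls per second, so even a tiny per-call difference is multiplied millions of times per second.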

Petr Tesařík
@ptesarik@infosec.exchange

@amonakov@mastodon.gamedev.place But are you 100% certain that the BPU cannot predict never-before-seen unconditional jumps?

Petr Tesařík
@ptesarik@infosec.exchange

I think I even know why. The unconditional jump is fully handled by the BPU, which sits before the IDQ. And since the return thunk is always at the same address, it is most likely already in the Decoded ICache. As a result, the return thunk allows the CPU to skip instruction decoding.
Does this explanation make sense, @amonakov@mastodon.gamedev.place?

Petr Tesařík
@ptesarik@infosec.exchange

The kernel may run slower.
(help text for CONFIG_MITIGATION_RETHUNK)
When I look at my benchmark results, I'm tempted to send a patch that adds: “The kernel may run faster.” Because that's what I see: on an Ice Lake system, jumping to a ret is faster than executing the same ret in place. 🤯🤯

Petr Tesařík
@ptesarik@infosec.exchange

@ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org @amonakov@mastodon.gamedev.place
I'm done, but the result is a bit disappointing.

There is a 12,000% increase in ICACHE_DATA.STALLS between the GCC7 build and the GCC13 build, but AFAICT the GCC7 memory layout was simply extremely lucky to hit no L1I aliasing, while the GCC13 layout is extremely unlucky and hits a lot of L1I aliasing on this specific Ice Lake CPU with this kernel version and configuration.

In short, if the performance of your code sucks, try re-ordering compile units and/or functions within a compile unit, and it'll get better. Or worse. But you all knew that already, didn't you?

There's one lesson learned, though:
With a little bit of luck, all of netperf fits into the L1 I-cache on modern CPUs.
With a little bit of bloomin' luck.
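To make "L1I aliasing" concrete: a 12,000% increase means the stall count grew by a factor of 121. In a set-associative cache, a few address bits select the set, so hot code placed at addresses that differ by a multiple of (line size × number of sets) competes for the same set, and more hot lines than ways in one set cause conflict misses. A minimal sketch, assuming a 32 KiB, 8-way L1I with 64-byte lines (64 sets), the geometry commonly documented for Ice Lake (verify it for the actual CPU); the helper names and the hot-address list are hypothetical:

```python
from collections import defaultdict

# Assumed L1I geometry (check for the actual CPU): 32 KiB, 8-way, 64-byte lines.
LINE_SIZE = 64
WAYS = 8
SETS = 32 * 1024 // (LINE_SIZE * WAYS)  # 64 sets


def set_index(addr: int) -> int:
    """Cache set selected by an address (bits 6..11 for this geometry)."""
    return (addr // LINE_SIZE) % SETS


def oversubscribed_sets(hot_addrs):
    """Return {set index: distinct cache lines} for sets with more lines than ways.

    Such a set cannot keep all of its hot lines resident at once, so code
    that keeps touching all of them evicts itself (conflict misses).
    """
    lines_per_set = defaultdict(set)
    for addr in hot_addrs:
        lines_per_set[set_index(addr)].add(addr // LINE_SIZE)
    return {s: sorted(l) for s, l in lines_per_set.items() if len(l) > WAYS}


if __name__ == "__main__":
    # its_return_thunk address from the perf annotate output above.
    print(set_index(0xFFFFFFFF81D940E0))
    # Hypothetical layout: nine hot functions spaced exactly 4 KiB apart
    # all map to the same set and exceed its 8 ways.
    hot = [0xFFFFFFFF81000000 + i * 4096 for i in range(9)]
    print(list(oversubscribed_sets(hot)))
```

Re-ordering compile units or functions shifts these addresses, which is how the same code can end up in a harmless or a pathological set assignment.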

Petr Tesařík
@ptesarik@infosec.exchange

@ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org @amonakov@mastodon.gamedev.place I'm close to giving up without any result.
I'm trying to decide whether my case is front-end bound. The Intel Optimization Reference Manual says that %FE_BOUND > 30% means front-end bound, and %FE_BOUND < 20% means not front-end bound. Needless to say, I'm getting approx. 27%…
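The quoted thresholds leave a band between 20% and 30% where neither verdict applies. A minimal sketch of that rule; the front-end-bound share is computed here as undelivered issue slots over total issue slots, as in Intel's Top-down method, with both slot counts taken as inputs because the number of slots per cycle differs between microarchitectures (the function names are hypothetical):

```python
def fe_bound_percent(undelivered_slots: int, total_slots: int) -> float:
    """Share of issue slots in which the front end delivered no uop, in percent."""
    if total_slots <= 0:
        raise ValueError("total_slots must be positive")
    return 100.0 * undelivered_slots / total_slots


def classify_fe_bound(pct: float) -> str:
    """Apply the > 30% / < 20% rule; anything in between stays undecided."""
    if pct > 30.0:
        return "front-end bound"
    if pct < 20.0:
        return "not front-end bound"
    return "inconclusive"


if __name__ == "__main__":
    print(classify_fe_bound(27.0))
```

A value of 27% falls squarely in the undecided band, which is exactly the situation described in the post.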

Petr Tesařík
@ptesarik@infosec.exchange

@ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org @amonakov@mastodon.gamedev.place On the bright side, there is little variation. That 27% is surprisingly stable across many runs.

Petr Tesařík
@ptesarik@infosec.exchange

Just got “No such file or directory” trying to view System.nap. Something for @vbabka to fix?
Cc @ljs@mastodonapp.uk

Petr Tesařík
@ptesarik@infosec.exchange

Hm. What would be the best way to discontinue the GitHub repository now?

Petr Tesařík
@ptesarik@infosec.exchange

Started moving my GitHub projects to #codeberg. Quite smooth, actually…
https://codeberg.org/ptesarik

UPDATE: A bit of churn to update inter-project links, but I hope a simple git grep github was enough to find them all.
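A minimal sketch of that link clean-up, assuming every project keeps the same repository name under https://codeberg.org/ptesarik (the script name relink.py and its helpers are hypothetical). It rewrites links in the files passed on the command line, e.g. the ones listed by git grep -l github, and reports any line that still mentions github:

```python
import re
import sys
from pathlib import Path

# Assumption: each project keeps its repository name after the move.
OLD_PREFIX = re.compile(r"https?://github\.com/ptesarik/")
NEW_PREFIX = "https://codeberg.org/ptesarik/"


def rewrite_links(text: str) -> str:
    """Point github.com/ptesarik/... links at codeberg.org/ptesarik/..."""
    return OLD_PREFIX.sub(NEW_PREFIX, text)


def leftover_mentions(text: str) -> list[str]:
    """Lines still containing 'github' (case-sensitive, like plain git grep)."""
    return [line for line in text.splitlines() if "github" in line]


def main(paths: list[str]) -> None:
    for name in paths:
        path = Path(name)
        text = path.read_text(encoding="utf-8")
        new_text = rewrite_links(text)
        if new_text != text:
            path.write_text(new_text, encoding="utf-8")
            print(f"updated {name}")
        # Anything left over needs a manual look (other owners, prose, etc.).
        for line in leftover_mentions(new_text):
            print(f"{name}: still mentions github: {line.strip()}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage: git grep -l github | xargs python3 relink.py, then re-run git grep github to confirm nothing relevant is left.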