@ptesarik@infosec.exchange
There's only one task that #genAI is truly good at: Deceiving its users!
Change my mind.
Linux kernel hacker.
Pronouns: he/him
By negotiating with Mr. Putin, you are doing Russians a disservice.
Putin does not represent Russians, because there are no citizens in Russia, only prisoners of the ruler.
@amonakov@mastodon.gamedev.place
ALL_IDQ_UOPS = 198633974709
%UOPS.DSB = 62.3%
%UOPS.MITE = 27.6%
%UOPS.MS = 10.1%
The high proportion of micro-ops from the microcode sequencer is due to the rep movsb in raw_copy_from_user().
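For readers unfamiliar with these numbers: each percentage is that source's share of all micro-ops delivered to the IDQ. A minimal Python sketch of the arithmetic follows. The per-source counts in it are made up for illustration (the post only gives the total and the percentages), and the function name is hypothetical.

```python
def uop_source_shares(dsb_uops, mite_uops, ms_uops):
    """Return each front-end source's share of all uops delivered to the IDQ, in percent."""
    total = dsb_uops + mite_uops + ms_uops
    if total == 0:
        raise ValueError("no uops counted")
    return {
        "DSB": 100.0 * dsb_uops / total,    # Decoded ICache (uop cache)
        "MITE": 100.0 * mite_uops / total,  # legacy decode pipeline
        "MS": 100.0 * ms_uops / total,      # microcode sequencer
    }


# Hypothetical counts, chosen only to reproduce the proportions in the post.
shares = uop_source_shares(dsb_uops=623, mite_uops=276, ms_uops=101)
for source, pct in shares.items():
    print(f"%UOPS.{source} = {pct:.1f}%")
```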
@amonakov@mastodon.gamedev.place Oh, and yes, I do see a lot of hits in its_return_thunk:
Samples │ ffffffff81d940e0 :
│ .skip 32, 0xcc
│ SYM_CODE_START(its_return_thunk)
│ UNWIND_HINT_FUNC
│ ANNOTATE_NOENDBR
│ ANNOTATE_UNRET_SAFE
│ ret
6088 │ffffffff81d940e0: ret
│ int3
│ffffffff81d940e1: int3
@amonakov@mastodon.gamedev.place Interesting. Indeed, on second thought, my explanation doesn't make much sense. But the observed reality is that I can reliably get netperf throughput >= 1850 Mbps with CONFIG_RETHUNK=y but <= 1840 Mbps without (and with no other changes to the setup). All with a tiny (64-byte) buffer size, so an extremely syscall-heavy workload.
EDIT: Obviously, the memory layout also changes, but I have checked that L1I cache misses are comparable (and approx. 0.1%) this time.
@amonakov@mastodon.gamedev.place But are you 100% certain that the BPU cannot predict never-before-seen unconditional jumps?
“The kernel may run slower.”
(help text for CONFIG_MITIGATION_RETHUNK)
When I look at my benchmark results, I'm tempted to send a patch that adds: “The kernel may run faster.” Because that's what I can see: On an Ice Lake system, jumping to a ret is faster than executing this ret in-place. 🤯
I think I even know why. The unconditional jump is fully handled by the BPU, which comes before the IDQ. And since the return thunk is always at the same address, it is most likely in the Decoded ICache already. As a result, the return thunk allows the CPU to skip instruction decoding.
Does this explanation make sense, @amonakov@mastodon.gamedev.place?
@ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org @amonakov@mastodon.gamedev.place
I'm done, but the result is a bit disappointing.
There is a 12,000% increase in ICACHE_DATA.STALLS between the GCC7 build and the GCC13 build, but AFAICT the GCC7 memory layout was simply extremely lucky to hit no L1I aliasing, and the GCC13 layout is extremely unlucky to hit a lot of L1I aliasing on this specific Ice Lake CPU with this kernel version and configuration.
In short, if the performance of your code sucks, try re-ordering compile units and/or functions within a compile unit, and it'll get better. Or worse. But that's something you all knew already, isn't it?
There's one lesson learned, though:
With a little bit of luck, all of netperf fits into the L1 I-cache on modern CPUs.
With a little bit of bloomin' luck.
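To illustrate what L1I aliasing means here: a set-associative cache picks a set from the address bits, and once more hot cache lines map to one set than the set has ways, they start evicting each other. The Python sketch below assumes a 32 KiB, 8-way L1I with 64-byte lines (so 64 sets), which is the geometry commonly cited for Ice Lake cores but is an assumption here. The addresses and the helper names are invented for the example.

```python
from collections import Counter

# Assumed L1I geometry (not stated in the thread).
LINE_SIZE = 64
WAYS = 8
CACHE_SIZE = 32 * 1024
NUM_SETS = CACHE_SIZE // (LINE_SIZE * WAYS)  # 64 sets


def cache_set(addr):
    """Set index selected by an address: line number modulo the number of sets."""
    return (addr // LINE_SIZE) % NUM_SETS


def oversubscribed_sets(hot_addrs):
    """Map set index -> number of distinct hot lines, for sets holding more lines than ways."""
    lines = {addr // LINE_SIZE for addr in hot_addrs}
    per_set = Counter(line % NUM_SETS for line in lines)
    return {s: n for s, n in per_set.items() if n > WAYS}


BASE = 0xFFFFFFFF81000000  # hypothetical kernel text address

# Unlucky layout: nine hot lines spaced 4 KiB apart all select the same set.
unlucky = [BASE + i * 4096 for i in range(9)]
# Lucky layout: nine consecutive lines spread over nine different sets.
lucky = [BASE + i * LINE_SIZE for i in range(9)]

print(oversubscribed_sets(unlucky))
print(oversubscribed_sets(lucky))
```

With this geometry the set index lies entirely within the 4 KiB page offset, so moving a function by a few cache lines changes which hot lines collide. That is why re-ordering compile units can help or hurt.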
@ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org @amonakov@mastodon.gamedev.place I'm close to giving up without any result.
I'm trying to decide if my case is front-end bound. The Intel Optimization Reference Manual says %FE_BOUND > 30% is front-end bound, and %FE_BOUND < 20% is not front-end bound. Needless to say, I'm getting approx. 27%…
@ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org @amonakov@mastodon.gamedev.place On the bright side, there is little variation. Those 27% are surprisingly stable across many runs.
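The decision rule quoted in this thread can be written down as a small Python sketch. The two thresholds are the ones given in the post. The function name and the "inconclusive" label for the gap between them are assumptions made for illustration.

```python
def classify_frontend_bound(fe_bound_pct):
    """Classify a %FE_BOUND value using the two thresholds quoted in the thread."""
    if fe_bound_pct > 30.0:
        return "front-end bound"
    if fe_bound_pct < 20.0:
        return "not front-end bound"
    # Between 20% and 30% the quoted rule gives no verdict.
    return "inconclusive"


print(classify_frontend_bound(27.0))
```

A stable 27% falls in the gap between the two thresholds, so the rule alone does not settle the question.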
Just got: “No such file or directory” trying to view System.nap. Something for @vbabka to fix?
Cc @ljs@mastodonapp.uk
Started moving my GitHub projects to #codeberg. Quite smooth, actually…
https://codeberg.org/ptesarik
UPDATE: A bit of churn to update inter-project links, but I hope a simple git grep github was enough to find them all.
Hm. What would be the best way to discontinue the GitHub repository now?