Thread | Brutkey

@mpdesouza@floss.social Good.
I replaced the copyin caller _copy_from_iter with the GCC7 version. Slightly better now: the performance difference shrank from -4.4% to -2.9%.
Now, I could continue replacing more callers until I've replaced the whole kernel and got the original performance back, but that's rather pointless. Instead, since this first change seems to have some effect, let me try to understand why.
At this point, I believe it's related to the microarchitecture: I'm seeing this on 3rd Gen Intel Xeon Scalable (Ice Lake).
@ljs@mastodonapp.uk @gnutools@fosstodon.org @amonakov@mastodon.gamedev.place

Lorenzo Stoakes
@ljs@mastodonapp.uk

@ptesarik@infosec.exchange @mpdesouza@floss.social @gnutools@fosstodon.org @amonakov@mastodon.gamedev.place ah yeah I learnt in a harsh way how uArch can play a role...

Petr Tesařík
@ptesarik@infosec.exchange

@ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org @amonakov@mastodon.gamedev.place I'm close to giving up without any result.
I'm trying to decide whether my case is front-end bound. The Intel Optimization Reference Manual says %FE_BOUND > 30% is front-end bound, and %FE_BOUND < 20% is not front-end bound. Needless to say, I'm getting approx. 27%…
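
To make the gap explicit, here is a minimal sketch of that decision rule. It assumes only the two thresholds quoted above (above 30% and below 20%); the function name and the "inconclusive" label for the range in between are hypothetical, not taken from the manual.

```python
def classify_frontend_bound(fe_bound_pct: float) -> str:
    """Classify a %FE_BOUND reading using the two thresholds quoted in the post.

    Assumption: anything in the 20%..30% range is simply reported as
    inconclusive, since the quoted rule does not cover it.
    """
    if fe_bound_pct > 30.0:
        return "front-end bound"
    if fe_bound_pct < 20.0:
        return "not front-end bound"
    return "inconclusive"


if __name__ == "__main__":
    # The reading mentioned in the post falls between the two thresholds.
    print(classify_frontend_bound(27.0))
```

With a reading of about 27%, the rule gives no verdict either way, which is exactly the problem described.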

Petr Tesařík
@ptesarik@infosec.exchange

@ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org @amonakov@mastodon.gamedev.place On the bright side, there is little variation. Those 27% are surprisingly stable across many runs.

Lorenzo Stoakes
@ljs@mastodonapp.uk

@ptesarik@infosec.exchange @mpdesouza@floss.social @gnutools@fosstodon.org @amonakov@mastodon.gamedev.place

Petr Tesařík
@ptesarik@infosec.exchange

@ljs@mastodonapp.uk @mpdesouza@floss.social @gnutools@fosstodon.org @amonakov@mastodon.gamedev.place
Hail him, indeed!
🍷

