@almayne commented Nov 24, 2025

The proposed changes give significant performance improvements on both c7g (NEOVERSEV1) and c8g (NEOVERSEV2) instances. To reproduce these values you need to run with OMP_ADAPTIVE=1. The plots below show the average time taken for 10000 iterations of sgemv on increasing square matrix/vector sizes, from 2x2 through to 1024x1024. The x axis reaches 2046 because we first run sgemv without transposition, then with it. I've also included plots of the ratio of the new timings relative to the original ones (lower is better). The ratios give the following stats:
Geometric mean for c7g_sgemv.txt: 0.890437968914142
Geometric mean for c8g_sgemv.txt: 0.7884951978206536

[4 images: plots of average sgemv time and of the ratio relative to the original]

@aditew01
Contributor

cc: @martin-frbg @Mousius

: (MN < 1050625L) ? MIN(ncpu, 40)
: ncpu;
#else
return (MN < 25600L) ? 1

@aditew01 Nov 25, 2025

Do we want to guard this for NEOVERSEV2 / NEOVERSEN2?

Author

Discussed offline: we're happy that this guard is already handled by the calling function, so no changes are needed.
