#1
Sep 2016
2×5×37 Posts
I can't say why anyone would want to do this, but I'm looking for a 2nd set of eyes.
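Can this instruction:
```
vpcompressd [mem]{k}, zmm
```
be replaced with this sequence?
```
vpcompressd zmm0{k1}{z}, zmm0   ; pack the selected dwords into the low lanes, zero the rest
kmovd       eax, k1             ; copy the mask into a GPR
mov         ecx, -1             ; PEXT needs the all-ones source in a register
pext        eax, ecx, eax       ; eax = (1 << popcount(k1)) - 1
kmovd       k1, eax             ; contiguous low-order mask
vmovdqu32   [mem]{k1}, zmm0     ; masked store of only the packed lanes
```
Last fiddled with by Mysticial on 2022-09-15 at 02:11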
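To check the equivalence being asked about without AVX-512 hardware, here is a portable scalar model in C. It is a sketch under stated assumptions: one zmm register is modeled as 16 int32 lanes, `pext32` is a software stand-in for BMI2 PEXT, and the names `zmm_model`, `compress_store`, `compress_then_masked_store`, `sequences_agree`, and `all_masks_agree` are hypothetical. It compares the memory left behind by a modeled `vpcompressd [mem]{k}, zmm` with the memory left behind by the compress / kmovd / pext / masked-store sequence, starting from identical sentinel-filled buffers so that any stray lane write would show up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar stand-in for one zmm register: 16 x 32-bit lanes. */
typedef struct { int32_t lane[16]; } zmm_model;

/* Software PEXT (BMI2 _pext_u32 on real hardware): gather the bits of src
   at the positions set in mask and pack them into the low-order bits. */
static uint32_t pext32(uint32_t src, uint32_t mask) {
    uint32_t out = 0;
    int bit = 0;
    for (int i = 0; i < 32; i++) {
        if (mask & (1u << i)) {
            if (src & (1u << i)) out |= 1u << bit;
            bit++;
        }
    }
    return out;
}

/* Model of vpcompressd [mem]{k}, zmm: writes only popcount(k) dwords. */
static void compress_store(int32_t *mem, uint16_t k, const zmm_model *v) {
    int j = 0;
    for (int i = 0; i < 16; i++)
        if (k & (1u << i)) mem[j++] = v->lane[i];
}

/* Model of the proposed replacement sequence. */
static void compress_then_masked_store(int32_t *mem, uint16_t k, const zmm_model *v) {
    zmm_model packed = {{0}};                /* vpcompressd zmm0{k1}{z}, zmm0 */
    int j = 0;
    for (int i = 0; i < 16; i++)
        if (k & (1u << i)) packed.lane[j++] = v->lane[i];

    uint32_t eax = k;                        /* kmovd eax, k1 */
    eax = pext32(0xFFFFFFFFu, eax);          /* pext eax, ecx(=-1), eax */
    uint16_t k1 = (uint16_t)eax;             /* kmovd k1, eax */
    assert(k1 == (uint16_t)((1u << j) - 1)); /* j == popcount(k): low run of ones */

    for (int i = 0; i < 16; i++)             /* vmovdqu32 [mem]{k1}, zmm0 */
        if (k1 & (1u << i)) mem[i] = packed.lane[i];
}

/* Run both on identical sentinel-filled buffers; 1 if memory ends up equal. */
static int sequences_agree(uint16_t k) {
    zmm_model v;
    int32_t a[16], b[16];
    for (int i = 0; i < 16; i++) {
        v.lane[i] = 100 + i;
        a[i] = b[i] = -7;                    /* lanes that must stay untouched */
    }
    compress_store(a, k, &v);
    compress_then_masked_store(b, k, &v);
    return memcmp(a, b, sizeof a) == 0;
}

/* Exhaustive check over every 16-bit mask. */
static int all_masks_agree(void) {
    for (uint32_t k = 0; k <= 0xFFFF; k++)
        if (!sequences_agree((uint16_t)k)) return 0;
    return 1;
}
```

The model checks memory contents only; it makes no claim about timing.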

#2
"Ben"
Feb 2007
3,733 Posts
Pack the selected mask-bit locations into the low-order bits of eax (the copy of k1), so the new mask covers exactly the compressed lanes, then mask-store them into unaligned memory. Looks good to me! Why do you want to do this, again?
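Ben's description boils down to an identity: PEXT of an all-ones source under mask k yields popcount(k) ones in the low-order bits, i.e. (1 << popcount(k)) - 1, which is exactly the store mask for the lanes vpcompressd packed to the bottom. A minimal C sketch that checks this for every 16-bit mask, assuming a software `pext32` in place of the BMI2 instruction (`packed_mask` and `packed_mask_is_low_run_for_all` are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>

/* Software PEXT, standing in for BMI2 _pext_u32. */
static uint32_t pext32(uint32_t src, uint32_t mask) {
    uint32_t out = 0;
    int bit = 0;
    for (int i = 0; i < 32; i++) {
        if (mask & (1u << i)) {
            if (src & (1u << i)) out |= 1u << bit;
            bit++;
        }
    }
    return out;
}

/* Count set bits (what popcnt would return). */
static int popcount32(uint32_t x) {
    int c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* PEXT of an all-ones source under mask k: one set bit per selected lane,
   packed into the low-order bits. */
static uint32_t packed_mask(uint16_t k) {
    return pext32(0xFFFFFFFFu, k);
}

/* Check packed_mask(k) == (1 << popcount(k)) - 1 for every 16-bit mask. */
static int packed_mask_is_low_run_for_all(void) {
    for (uint32_t k = 0; k <= 0xFFFF; k++)
        if (packed_mask((uint16_t)k) != (1u << popcount32(k)) - 1u)
            return 0;
    return 1;
}
```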

#3
Sep 2016
2·5·37 Posts
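Thanks! I think it could also be done as:
```
vpternlogd  zmm1, zmm1, zmm1, 0xFF  ; zmm1 = all 1s (vpcompressd only takes a register source)
vpcompressd zmm0{k1}{z}, zmm0       ; pack the selected dwords into the low lanes
vpcompressd zmm1{k1}{z}, zmm1       ; low popcount(k1) lanes = -1, the rest = 0
vpcmpd      k1, zmm1, [-1], 0       ; [-1] = memory constant of all 1s; predicate 0 = equality
vmovdqu32   [mem]{k1}, zmm0
```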
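The second variant rebuilds the store mask in the vector domain instead of going through a GPR. A scalar C model of that idea, assuming zmm1 starts out as all ones in a register (for example set with `vpternlogd zmm1, zmm1, zmm1, 0xFF`, since vpcompressd takes only a register source) and that the zero-masked result is then compared for equality with -1; `mask_via_compare` is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model: compress an all-ones vector with the same mask, then compare
   each lane with -1 to rebuild a contiguous low-order mask. */
static uint16_t mask_via_compare(uint16_t k) {
    int32_t ones[16];
    int32_t packed[16] = {0};                /* {z}: unselected lanes become 0 */
    for (int i = 0; i < 16; i++)
        ones[i] = -1;                        /* zmm1 = all ones (assumed setup) */

    int j = 0;
    for (int i = 0; i < 16; i++)             /* vpcompressd zmm1{k1}{z}, zmm1 */
        if (k & (1u << i)) packed[j++] = ones[i];

    uint16_t out = 0;
    for (int i = 0; i < 16; i++)             /* vpcmpd k1, zmm1, [-1], 0 (EQ) */
        if (packed[i] == -1) out |= (uint16_t)(1u << i);
    return out;
}
```

Compressing all-ones with the same mask puts -1 in exactly the low popcount(k) lanes and 0 elsewhere, so the equality compare yields the same contiguous mask as the PEXT version without the mask ever leaving the k registers.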

#4
Sep 2022
2 Posts
Ah :) Now the reason becomes clear (Zen4).
FWIW, users of compressstoreu will typically want the popcount of the mask so they can advance their pointers. That seems to favor the PEXT approach, since popcount also needs the mask in a GPR.
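To illustrate the popcount point, here is a sketch of a typical filter loop, modeled in scalar C: one GPR copy of the mask feeds both the PEXT that builds the store mask and the popcount that advances the output pointer. The predicate (`> 0`), `filter_positive`, and `filter_demo_ok` are hypothetical; on real AVX-512 hardware the modeled steps would correspond roughly to `_mm512_maskz_compress_epi32`, `_pext_u32`, `_mm512_mask_storeu_epi32`, and `_mm_popcnt_u32`, and the sketch makes no performance claim.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software PEXT, standing in for BMI2 _pext_u32. */
static uint32_t pext32(uint32_t src, uint32_t mask) {
    uint32_t out = 0;
    int bit = 0;
    for (int i = 0; i < 32; i++) {
        if (mask & (1u << i)) {
            if (src & (1u << i)) out |= 1u << bit;
            bit++;
        }
    }
    return out;
}

static int popcount32(uint32_t x) {
    int c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* Keep the positive elements of in[0..n), n a multiple of 16, writing them
   contiguously to out (which needs room for n elements). Returns the count. */
static size_t filter_positive(int32_t *out, const int32_t *in, size_t n) {
    assert(n % 16 == 0);
    int32_t *dst = out;
    for (size_t base = 0; base < n; base += 16) {
        uint32_t m = 0;                      /* the mask, held in a GPR */
        for (int i = 0; i < 16; i++)         /* stands in for a compare into k1 */
            if (in[base + i] > 0) m |= 1u << i;

        int32_t packed[16] = {0};            /* vpcompressd zmm0{k1}{z}, zmm0 */
        int j = 0;
        for (int i = 0; i < 16; i++)
            if (m & (1u << i)) packed[j++] = in[base + i];

        uint32_t store_mask = pext32(0xFFFFFFFFu, m);  /* contiguous low mask */
        for (int i = 0; i < 16; i++)         /* vmovdqu32 [dst]{k1}, zmm0 */
            if (store_mask & (1u << i)) dst[i] = packed[i];

        dst += popcount32(m);                /* same GPR copy feeds popcnt */
    }
    return (size_t)(dst - out);
}

/* Self-check on 32 inputs: odd indices are positive, even ones are <= 0. */
static int filter_demo_ok(void) {
    int32_t in[32], out[32];
    for (int i = 0; i < 32; i++)
        in[i] = (i % 2) ? i : -i;
    if (filter_positive(out, in, 32) != 16) return 0;
    for (int k = 0; k < 16; k++)
        if (out[k] != 2 * k + 1) return 0;
    return 1;
}
```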

#5
"Ben"
Feb 2007
3,733 Posts
Agreed. Yafu uses compressstoreu along with popcount in its bucket sieve.

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| AVX512-IFMA cpus | bsquared | Hardware | 17 | 2020-11-10 12:15 |
| AVX512 hardware recommendations? | kriesel | Hardware | 60 | 2020-06-23 01:05 |
| AVX512 performance on new shiny Intel kit | heliosh | Hardware | 19 | 2020-01-18 04:01 |
| 29.5 build 5 beta with AVX512 optimizations shows a 15% speed increase | simon389 | Software | 20 | 2018-12-13 21:01 |