mersenneforum.org, Programming forum: Emulating AVX512 vpcompressd (https://www.mersenneforum.org/showthread.php?t=28069)

 Mysticial 2022-09-15 02:10

Emulating AVX512 vpcompressd

I can't say why anyone would want to do this, but I'm looking for a 2nd set of eyes.

Can this instruction:
[CODE] vpcompressd [mem]{k}, zmm[/CODE]be emulated as follows:
[CODE]
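vpcompressd zmm0{k1}{z}, zmm0   ; pack the selected dwords into the low lanes, zero the rest
kmovd eax, k1                   ; copy the 16-bit mask to a GPR
mov ecx, -1                     ; PEXT needs the all-ones source in a register
pext eax, ecx, eax              ; eax = (1 << popcnt(k1)) - 1
kmovd k1, eax                   ; store mask: the low popcnt(k1) lanes
vmovdqu32 [mem]{k1}, zmm0       ; masked store of the packed lanes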
[/CODE]I haven't tested this sequence yet, but I feel like this is too simple to be correct.
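To check the sequence without AVX-512 hardware, here is a minimal scalar model in C. It is a sketch under stated assumptions, not the full hardware semantics: 16 dword lanes, with vpcompressd to memory, the zero-masked register vpcompressd, BMI2 PEXT and the masked vmovdqu32 store all written as plain loops. The function names are hypothetical. It compares the direct compress-store against the emulated sequence for all 65536 masks, using a sentinel-filled buffer so any write past the packed lanes would show up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of vpcompressd [mem]{k}, zmm with 16 dword lanes:
   lanes whose mask bit is set are written contiguously from mem[0];
   no other memory is touched. */
static void compress_store_ref(uint32_t *mem, uint16_t k, const uint32_t src[16]) {
    int out = 0;
    for (int i = 0; i < 16; i++)
        if (k & (1u << i))
            mem[out++] = src[i];
}

/* Portable model of BMI2 PEXT: gather the bits of src selected by mask
   and pack them, in order, into the low bits of the result. */
static uint32_t pext32(uint32_t src, uint32_t mask) {
    uint32_t result = 0;
    int out = 0;
    for (int i = 0; i < 32; i++) {
        if (mask & (1u << i)) {
            if (src & (1u << i))
                result |= 1u << out;
            out++;
        }
    }
    return result;
}

/* Model of the proposed sequence:
   vpcompressd zmm0{k1}{z} ; kmovd ; pext ; kmovd ; vmovdqu32 [mem]{k1} */
static void compress_store_emulated(uint32_t *mem, uint16_t k, const uint32_t src[16]) {
    uint32_t packed[16] = {0};                 /* {z}: unselected lanes become 0 */
    int out = 0;
    for (int i = 0; i < 16; i++)
        if (k & (1u << i))
            packed[out++] = src[i];
    uint16_t store_mask = (uint16_t)pext32(0xFFFFFFFFu, k); /* low popcnt(k) bits */
    for (int i = 0; i < 16; i++)               /* masked store: only masked lanes written */
        if (store_mask & (1u << i))
            mem[i] = packed[i];
}

/* Returns 1 if the emulation matches the reference for every 16-bit mask. */
static int emulation_matches_all_masks(void) {
    uint32_t src[16];
    for (int i = 0; i < 16; i++)
        src[i] = 100u + (uint32_t)i;
    for (uint32_t k = 0; k <= 0xFFFFu; k++) {
        uint32_t ref[16], emu[16];
        for (int i = 0; i < 16; i++)           /* sentinel exposes stray writes */
            ref[i] = emu[i] = 0xDEADBEEFu;
        compress_store_ref(ref, (uint16_t)k, src);
        compress_store_emulated(emu, (uint16_t)k, src);
        if (memcmp(ref, emu, sizeof ref) != 0)
            return 0;
    }
    return 1;
}
```

On real hardware the same steps map roughly onto the intrinsics _mm512_maskz_compress_epi32, _cvtmask16_u32, _pext_u32, _cvtu32_mask16 and _mm512_mask_storeu_epi32.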

 bsquared 2022-09-15 13:30

[QUOTE=Mysticial;613465]I can't say why anyone would want to do this, but I'm looking for a 2nd set of eyes.

Can this instruction:
[CODE] vpcompressd [mem]{k}, zmm[/CODE]be emulated as follows:
[CODE]
vpcompressd zmm0{k1}{z}, zmm0
kmovd eax, k1
pext eax, -1, eax (the -1 in a register of course)
kmovd k1, eax
vmovdqu32 [mem]{k1}, zmm0
[/CODE]I haven't tested this sequence yet, but I feel like this is too simple to be correct.[/QUOTE]

Compress the selected epi32 (dword) lanes into the low lanes of zmm0.
PEXT of all-ones with k1 as the mask leaves exactly popcnt(k1) set bits in the low bits of eax, which go back into k1.
The masked store then writes only those low lanes to (possibly unaligned) memory.
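The second step rests on the identity pext(~0, k) == (1 << popcnt(k)) - 1. A small C check of that identity, as a sketch: it assumes GCC or Clang for __builtin_popcount and uses a portable loop as a stand-in for BMI2 PEXT. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for BMI2 PEXT (_pext_u32): gather the bits of src
   selected by mask and pack them, in order, into the low bits. */
static uint32_t pext_model(uint32_t src, uint32_t mask) {
    uint32_t result = 0;
    unsigned out = 0;
    for (unsigned i = 0; i < 32; i++) {
        if (mask & (1u << i)) {
            if (src & (1u << i))
                result |= 1u << out;
            out++;
        }
    }
    return result;
}

/* The store mask the emulation needs: the low popcnt(k) bits set.
   __builtin_popcount is a GCC/Clang builtin. */
static uint32_t low_mask_from_popcount(uint16_t k) {
    unsigned n = (unsigned)__builtin_popcount(k);  /* n <= 16 for a 16-bit mask */
    return (1u << n) - 1u;
}

/* Returns 1 if pext(~0, k) == (1 << popcnt(k)) - 1 for every 16-bit k. */
static int pext_identity_holds(void) {
    for (uint32_t k = 0; k <= 0xFFFFu; k++)
        if (pext_model(0xFFFFFFFFu, k) != low_mask_from_popcount((uint16_t)k))
            return 0;
    return 1;
}
```

For a 16-bit mask popcnt(k) is at most 16, so the shift cannot overflow a 32-bit unsigned.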

Looks good to me!

Why do you want to do this, again? :razz:

 Mysticial 2022-09-15 17:31

[QUOTE=bsquared;613500]Compress epi32's into zmm (packing selected words into lower-order locations of zmm).
Pack the selected mask-bit locations into the lower-order bits of eax(k1).
Mask store them into unaligned memory.

Looks good to me!

Why do you want to do this, again? :razz:[/QUOTE]

Thanks! I think it could also be done as:
[CODE]
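vpcompressd zmm0{k1}{z}, zmm0      ; pack the selected dwords into the low lanes
vpternlogd zmm1, zmm1, zmm1, 0xFF  ; zmm1 = all ones (vpcompressd has no memory-source form)
vpcompressd zmm2{k1}{z}, zmm1      ; low popcnt(k1) lanes = -1, the rest = 0
vpcmpd k1, zmm2, zmm1, 0           ; predicate 0 = equal: k1 = the low popcnt(k1) lanes
vmovdqu32 [mem]{k1}, zmm0[/CODE]There may be a reason to do this soon.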
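The compare-based variant can be modeled the same way. This sketch assumes 16 dword lanes, zero-masking compress semantics and an equality compare against all ones. The function names are hypothetical. It confirms that the resulting store mask is again the low popcnt(k) bits for every 16-bit k.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of vpcompressd dst{k}{z}, src for 16 dword lanes. */
static void maskz_compress16(uint32_t dst[16], uint16_t k, const uint32_t src[16]) {
    int out = 0;
    for (int i = 0; i < 16; i++)
        dst[i] = 0;                            /* {z}: zero everything first */
    for (int i = 0; i < 16; i++)
        if (k & (1u << i))
            dst[out++] = src[i];
}

/* Model of: vpcompressd zmm2{k1}{z}, ones ; vpcmpd k1, zmm2, ones, 0 (EQ) */
static uint16_t store_mask_via_compare(uint16_t k) {
    uint32_t ones[16], packed[16];
    for (int i = 0; i < 16; i++)
        ones[i] = 0xFFFFFFFFu;
    maskz_compress16(packed, k, ones);
    uint16_t result = 0;
    for (int i = 0; i < 16; i++)
        if (packed[i] == 0xFFFFFFFFu)
            result |= (uint16_t)(1u << i);
    return result;
}

/* Returns 1 if the compare trick yields (1 << popcnt(k)) - 1 for every k. */
static int compare_trick_matches_all_masks(void) {
    for (uint32_t k = 0; k <= 0xFFFFu; k++) {
        uint32_t n = 0;
        for (int i = 0; i < 16; i++)
            n += (k >> i) & 1u;
        uint32_t expected = (1u << n) - 1u;    /* n <= 16, so no overflow */
        if (store_mask_via_compare((uint16_t)k) != expected)
            return 0;
    }
    return 1;
}
```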

 janwas 2022-09-26 15:40

Ah :) Now the reason becomes clear (Zen4).
FWIW, users of compressstoreu typically also need the popcount of the mask so they can advance the output pointer. That seems to favor the PEXT approach, since popcnt also needs the mask in a GPR, and the kmovd to eax is already there.
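A scalar sketch of that usage pattern: a hypothetical filter that keeps the elements below a limit, 16 lanes at a time, does the compress-store, then bumps the output pointer by the popcount of the mask. It assumes len is a multiple of 16 and GCC or Clang for __builtin_popcount; the comments note which AVX-512 step each loop stands in for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: copy the elements of in[0..len) that are below
   limit into out and return how many were kept. len must be a multiple of 16. */
static size_t filter_below(uint32_t *out, const uint32_t *in, size_t len, uint32_t limit) {
    uint32_t *dst = out;
    for (size_t base = 0; base < len; base += 16) {
        uint32_t k = 0;                        /* stands in for a vector compare into k1 + kmovd */
        for (int i = 0; i < 16; i++)
            if (in[base + i] < limit)
                k |= 1u << i;
        unsigned n = (unsigned)__builtin_popcount(k);  /* popcnt needs the mask in a GPR */
        int o = 0;                             /* stands in for the compress-store */
        for (int i = 0; i < 16; i++)
            if (k & (1u << i))
                dst[o++] = in[base + i];
        dst += n;                              /* advance the output pointer */
    }
    return (size_t)(dst - out);
}
```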

 bsquared 2022-09-26 17:28

[QUOTE=janwas;614206]Ah :) Now the reason becomes clear (Zen4).
FWIW users of compressstoreu will typically want to know the popcount of the mask, so they can increment pointers. Seems like that would work better with the PEXT approach because popcount also requires the mask in a GPR.[/QUOTE]

Agreed. Yafu uses compressstoreu along with popcount in its bucket sieve.

All times are UTC.