2022-09-15, 13:30   #2
bsquared
Feb 2007

3×17×73 Posts

Originally Posted by Mysticial
I can't say why anyone would want to do this, but I'm looking for a 2nd set of eyes.

Can this instruction:
    vpcompressd [mem]{k}, zmm
be emulated as follows:
    vpcompressd zmm0{k1}{z}, zmm0
    kmovd       eax, k1
    pext        eax, -1, eax     (the -1 in a register of course)
    kmovd       k1, eax
    vmovdqu32   [mem]{k1}, zmm0
I haven't tested this sequence yet, but I feel like this is too simple to be correct.
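Step 1: compress the selected epi32 elements within zmm0, packing the selected dwords into its lowest-order lanes.
Step 2: pack the selected mask bits into the low-order bits of eax and move them back to k1, so k1 ends up with popcount(k1) contiguous low bits set.
Step 3: masked-store those low lanes to (possibly unaligned) memory.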

Looks good to me!
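
For anyone who wants more than eyeballing, here is a rough scalar model in C of the three steps above. It is only a sketch: every function name is made up for illustration, it models the data movement only (not memory-fault behavior or timing), and it is no substitute for running the real instructions on AVX-512 hardware.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of 32-bit pext: gather the bits of src selected by
   mask into the low-order bits of the result. (Hypothetical helper.) */
static uint32_t pext32_model(uint32_t src, uint32_t mask) {
    uint32_t out = 0;
    unsigned pos = 0;
    for (unsigned i = 0; i < 32; i++) {
        if (mask & (1u << i)) {
            if (src & (1u << i)) out |= 1u << pos;
            pos++;
        }
    }
    return out;
}

/* Model of: vpcompressd [mem]{k}, zmm
   Selected dwords are written contiguously; nothing else is touched. */
static void compress_store_ref(uint32_t *mem, uint16_t k, const uint32_t src[16]) {
    unsigned n = 0;
    for (unsigned i = 0; i < 16; i++)
        if (k & (1u << i)) mem[n++] = src[i];
}

/* Model of the proposed replacement sequence. */
static void compress_store_emulated(uint32_t *mem, uint16_t k, const uint32_t src[16]) {
    uint32_t zmm0[16] = {0};
    unsigned n = 0;

    /* vpcompressd zmm0{k1}{z}, zmm0 : pack selected lanes low, zero the rest */
    for (unsigned i = 0; i < 16; i++)
        if (k & (1u << i)) zmm0[n++] = src[i];

    /* kmovd eax, k1 ; pext eax, -1, eax ; kmovd k1, eax */
    uint32_t k1 = pext32_model(0xFFFFFFFFu, k);

    /* vmovdqu32 [mem]{k1}, zmm0 : write only the lanes enabled in k1 */
    for (unsigned i = 0; i < 16; i++)
        if (k1 & (1u << i)) mem[i] = zmm0[i];
}

/* Returns 1 if both models leave memory identical for every 16-bit mask. */
static int sequences_agree(void) {
    uint32_t src[16];
    for (unsigned i = 0; i < 16; i++) src[i] = 100u + i;

    for (uint32_t k = 0; k < 0x10000u; k++) {
        uint32_t a[16], b[16];
        memset(a, 0xAA, sizeof a);   /* sentinel to detect stray writes */
        memset(b, 0xAA, sizeof b);
        compress_store_ref(a, (uint16_t)k, src);
        compress_store_emulated(b, (uint16_t)k, src);
        if (memcmp(a, b, sizeof a) != 0) return 0;
    }
    return 1;
}
```

Calling sequences_agree() from a small main() runs all 65536 masks. The sentinel fill is there so that a write outside the first popcount(k) dwords would show up as a mismatch.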

Why do you want to do this, again?