mersenneforum.org  

Old 2022-09-15, 02:10   #1
Mysticial
 
Sep 2016

2·5·37 Posts
Emulating AVX512 vpcompressd

I can't say why anyone would want to do this, but I'm looking for a second set of eyes.

Can this instruction:
Code:
    vpcompressd [mem]{k}, zmm
be emulated as follows:
Code:
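    vpcompressd zmm0{k1}{z}, zmm0    ; pack the selected dwords into the low lanes, zero the rest
    kmovd       eax, k1
    mov         edx, -1              ; pext takes no immediate, so the -1 goes in a register
    pext        eax, edx, eax        ; eax = popcount(k1) low-order bits set
    kmovd       k1, eax
    vmovdqu32   [mem]{k1}, zmm0      ; masked store of the packed lanes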
I haven't tested this sequence yet, but I feel like this is too simple to be correct.
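
One way to sanity-check the sequence without AVX512 hardware is a scalar model. The sketch below is only that: a plain C stand-in for each instruction, run over all 65536 possible 16-bit masks. The function names are made up for illustration, and the model covers stored values only, not fault behavior or timing. It also assumes that only the low 16 bits of k1 are set. Since kmovd copies 32 bits, stray upper mask bits would inflate the pext result, so kmovw may be the safer choice if that cannot be guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 16  /* dwords in one zmm register */

/* Software stand-in for BMI2 pext: gather the bits of src selected by mask
   into the low-order bits of the result. */
uint32_t pext32_model(uint32_t src, uint32_t mask) {
    uint32_t out = 0, next = 1;
    while (mask) {
        uint32_t lowest = mask & (~mask + 1u);  /* isolate lowest set bit */
        if (src & lowest) out |= next;
        next <<= 1;
        mask &= mask - 1u;                      /* clear lowest set bit */
    }
    return out;
}

/* Reference: vpcompressd [mem]{k}, zmm writes only popcount(k) dwords. */
void compress_store_reference(uint32_t *mem, uint16_t k, const uint32_t *src) {
    int j = 0;
    for (int i = 0; i < LANES; i++)
        if ((k >> i) & 1u) mem[j++] = src[i];
}

/* The proposed sequence, one step per instruction. */
void compress_store_emulated(uint32_t *mem, uint16_t k, const uint32_t *src) {
    uint32_t packed[LANES] = {0};               /* vpcompressd zmm0{k1}{z}, zmm0 */
    int j = 0;
    for (int i = 0; i < LANES; i++)
        if ((k >> i) & 1u) packed[j++] = src[i];
    uint32_t k2 = pext32_model(0xFFFFFFFFu, k); /* kmovd / pext / kmovd */
    assert(k2 <= 0xFFFFu);                      /* holds because k has 16 bits */
    for (int i = 0; i < LANES; i++)             /* vmovdqu32 [mem]{k1}, zmm0 */
        if ((k2 >> i) & 1u) mem[i] = packed[i];
}

/* Returns 1 if both routines leave memory identical for every 16-bit mask. */
int emulation_matches_reference(void) {
    uint32_t src[LANES];
    for (int i = 0; i < LANES; i++) src[i] = 100u + (uint32_t)i;
    for (uint32_t k = 0; k <= 0xFFFFu; k++) {
        uint32_t a[LANES], b[LANES];
        memset(a, 0xAB, sizeof a);              /* sentinel exposes stray writes */
        memset(b, 0xAB, sizeof b);
        compress_store_reference(a, (uint16_t)k, src);
        compress_store_emulated(b, (uint16_t)k, src);
        if (memcmp(a, b, sizeof a) != 0) return 0;
    }
    return 1;
}
```

Calling emulation_matches_reference() from a test harness should return 1 if the reasoning holds. On real hardware the corresponding intrinsics would presumably be _mm512_mask_compressstoreu_epi32 for the reference and _mm512_maskz_compress_epi32, _pext_u32 and _mm512_mask_storeu_epi32 for the emulation.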

Old 2022-09-15, 13:30   #2
bsquared
 
"Ben"
Feb 2007

7211₈ Posts

Compress the epi32s in zmm0 (packing the selected dwords into the low-order lanes).
Pack the set mask bits into the low-order bits of eax, then move them back into k1.
Masked-store the packed lanes to unaligned memory.

Looks good to me!

Why do you want to do this, again?
Old 2022-09-15, 17:31   #3
Mysticial
 
Sep 2016

2·5·37 Posts


Thanks! I think it could also be done as:
Code:
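    vpcompressd zmm0{k1}{z}, zmm0
    vpternlogd  zmm1, zmm1, zmm1, 0xFF    ; zmm1 = all 1s (vpcompressd takes no memory source)
    vpcompressd zmm1{k1}{z}, zmm1         ; all 1s in the low popcount(k1) lanes, zeros above
    vpcmpd      k1, zmm1, [-1], 0         ; [-1] = constant of all 1s; predicate 0 = compare for equality
    vmovdqu32   [mem]{k1}, zmm0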
There may be a reason to do this soon.
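
For what it's worth, the claim that the compress-then-compare route yields the same store mask as PEXT can be checked with a small scalar model. This is a rough sketch with hypothetical function names, not anything run on real AVX512 hardware: it compresses a vector of all 1s under mask k, compares each lane against all 1s, and checks that the resulting mask is popcount(k) low-order ones.

```c
#include <stdint.h>

#define LANES 16  /* dwords in one zmm register */

/* Model of: vpcompressd zmm1{k}{z}, <all 1s>  then  vpcmpd k, zmm1, <all 1s>, EQ */
uint16_t store_mask_via_compare(uint16_t k) {
    uint32_t packed[LANES] = {0};               /* {z}: unselected lanes are zero */
    int j = 0;
    for (int i = 0; i < LANES; i++)
        if ((k >> i) & 1u) packed[j++] = 0xFFFFFFFFu;
    uint16_t out = 0;
    for (int i = 0; i < LANES; i++)             /* lane-wise compare for equality */
        if (packed[i] == 0xFFFFFFFFu) out |= (uint16_t)(1u << i);
    return out;
}

/* What the PEXT route produces: popcount(k) low-order bits set. */
uint16_t low_ones_for_mask(uint16_t k) {
    unsigned n = 0;
    for (; k; k &= (uint16_t)(k - 1)) n++;      /* count set bits */
    return (uint16_t)((1u << n) - 1u);
}

/* Returns 1 if the two routes agree for every 16-bit mask. */
int compare_route_matches_all_masks(void) {
    for (uint32_t k = 0; k <= 0xFFFFu; k++)
        if (store_mask_via_compare((uint16_t)k) != low_ones_for_mask((uint16_t)k))
            return 0;
    return 1;
}
```

The trade-off, as far as the model can show, is that this version stays entirely in vector and mask registers but needs a second compress and an all-1s constant.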
Old 2022-09-26, 15:40   #4
janwas
 
Sep 2022

2₁₀ Posts

Ah :) Now the reason becomes clear (Zen4).
FWIW users of compressstoreu will typically want to know the popcount of the mask, so they can increment pointers. Seems like that would work better with the PEXT approach because popcount also requires the mask in a GPR.
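
To illustrate the pointer-increment pattern, here is a minimal scalar sketch of a filter loop. Everything in it is an assumption made for the example: the "keep odd values" predicate, the function names, and the requirement that n is a multiple of 16. compress_store_model stands in for the real compress-store, and the output pointer advances by the popcount of each block's mask.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16  /* dwords in one zmm register */

/* Number of set bits in a 16-bit mask. */
unsigned popcount16_model(uint16_t k) {
    unsigned n = 0;
    for (; k; k &= (uint16_t)(k - 1)) n++;
    return n;
}

/* Scalar stand-in for a compress-store of 16 dwords under mask k. */
void compress_store_model(uint32_t *dst, uint16_t k, const uint32_t *src) {
    int j = 0;
    for (int i = 0; i < LANES; i++)
        if ((k >> i) & 1u) dst[j++] = src[i];
}

/* Copy the odd values of in[0..n) to out, 16 at a time, and return how many
   were kept. Assumes n is a multiple of LANES and out has room for n values. */
size_t filter_odd(uint32_t *out, const uint32_t *in, size_t n) {
    size_t written = 0;
    for (size_t base = 0; base + LANES <= n; base += LANES) {
        uint16_t k = 0;                         /* build the mask for this block */
        for (int i = 0; i < LANES; i++)
            if (in[base + i] & 1u) k |= (uint16_t)(1u << i);
        compress_store_model(out + written, k, in + base);
        written += popcount16_model(k);         /* advance by elements stored */
    }
    return written;
}
```

With the PEXT approach the mask is already in a GPR for the popcount, which is the point made above.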
Old 2022-09-26, 17:28   #5
bsquared
 
"Ben"
Feb 2007

E89₁₆ Posts

Agreed. Yafu uses compressstoreu along with popcount in its bucket sieve.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
