mersenneforum.org  

Showing results 1 to 25 of 370
Search: Posts Made By: Mysticial
Forum: y-cruncher 2022-11-12, 12:04
Replies: 11
Views: 1,461
Posted By Mysticial
Just curious, what happens if you remove all the...

Just curious, what happens if you remove all the computation and test the raw access pattern?

This could be a good starting point for inserting hardware counters.
Forum: y-cruncher 2022-11-12, 11:45
Replies: 11
Views: 1,461
Posted By Mysticial
The mesh cache is inferior to ring cache in...

The mesh cache is inferior to ring cache in almost every performance aspect. Core-to-core latency is about double. Bandwidth is terrible (only twice the DRAM bandwidth).

It's bad enough that I...
Forum: y-cruncher 2022-11-09, 22:43
Replies: 11
Views: 1,461
Posted By Mysticial
Are the writes normal writes or non-temporal? ...

Are the writes normal writes or non-temporal?

From what I've read, Skylake X's mesh cache takes longer to retire NT-stores which can clog up your reorder buffer.

Also, about your multi-threaded...
Forum: y-cruncher 2022-11-09, 10:22
Replies: 11
Views: 1,461
Posted By Mysticial
Is adjacent line prefetch on or off? Because that...

Is adjacent line prefetch on or off? Because that effectively doubles the cache line sizes to 128 bytes. So if you're only touching 64 bytes at a time, you're doubling up your bandwidth usage. (not...
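The doubling described here can be illustrated with a toy model. This is only a sketch under simplifying assumptions: `bytes_fetched` and the two access patterns are hypothetical names invented for illustration, and a real adjacent-line prefetcher does not necessarily fetch the buddy line on every access.

```python
LINE = 64  # bytes per cache line


def bytes_fetched(addrs, pair_prefetch):
    """Count bytes pulled from memory for the lines touched by `addrs`.

    addrs: iterable of byte addresses (any address inside a line).
    pair_prefetch: if True, model adjacent line prefetch by also fetching
    the other 64-byte line of the same 128-byte aligned pair.
    """
    fetched = set()
    for addr in addrs:
        line = addr // LINE
        fetched.add(line)
        if pair_prefetch:
            fetched.add(line ^ 1)  # buddy line within the 128-byte pair
    return len(fetched) * LINE


# Hypothetical pattern: touch only one 64-byte line in every 128 bytes.
sparse = [i * 128 for i in range(1024)]
# Hypothetical pattern: touch every line sequentially.
dense = [i * 64 for i in range(1024)]

print(bytes_fetched(sparse, False), bytes_fetched(sparse, True))
print(bytes_fetched(dense, False), bytes_fetched(dense, True))
```

In this model the sparse pattern moves twice as many bytes once the buddy line is fetched, while the dense pattern is unaffected because it would have touched the buddy line anyway.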
Forum: Information & Answers 2022-10-12, 00:04
Replies: 86
Views: 19,310
Posted By Mysticial
Actually I lied. It's not 4GB to hold the...

Actually I lied. It's not 4GB to hold the transform data, it's just 2GB. I forgot the factor of two from the wrap-around.

So it's read+write 4GB of memory per iteration, or about ~60ms...
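As a rough sanity check on that order of magnitude, the arithmetic can be sketched as below. The assumptions are mine, not the post's (the preview is cut off before any derivation): "GB" is read as GiB, the memory is dual-channel DDR5 at 4400 MT/s with 16 bytes moved per transfer in total, and the theoretical peak bandwidth is fully achieved.

```python
GIB = 2**30

transform_bytes = 2 * GIB            # transform data, per the post
traffic_bytes = 2 * transform_bytes  # one read plus one write of that data

# ASSUMPTION: dual-channel DDR5-4400, i.e. 2 channels x 8 bytes per transfer.
transfers_per_sec = 4400e6
bytes_per_transfer = 16
peak_bw = transfers_per_sec * bytes_per_transfer  # bytes per second

time_ms = traffic_bytes / peak_bw * 1e3
print(f"peak bandwidth: {peak_bw / 1e9:.1f} GB/s")
print(f"time per iteration at peak: {time_ms:.1f} ms")
```

Under those assumptions the result lands close to the figure quoted in the post; real sustained bandwidth is lower than the theoretical peak, so this is a lower bound on the memory time rather than a prediction.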
Forum: Information & Answers 2022-10-11, 23:42
Replies: 86
Views: 19,310
Posted By Mysticial
If we want to look at theoreticals, a 2^33-bit...

If we want to look at theoreticals, a 2^33-bit convolution using the optimal NTT method will need ~4GB of ram to hold the transform data.

Assuming standard 2-pass approach with full pass-merging...
Forum: Information & Answers 2022-10-11, 23:31
Replies: 86
Views: 19,310
Posted By Mysticial
The best implementation I have needs about ~0.65...

The best implementation I have needs about ~0.65 seconds to do a ~2^33-bit multiply convolution on my 7950X with memory @ 4400.

Squaring would certainly be faster, though I don't have benchmarks...
Forum: Information & Answers 2022-10-11, 17:41
Replies: 86
Views: 19,310
Posted By Mysticial
What's the current state-of-the-art iteration...

What's the current state-of-the-art iteration time on F33?

I think this is well into the region where NTTs are gonna win simply based on memory usage. And a power-of-two convolution length does...
Forum: Hardware 2022-09-30, 16:35
Replies: 21
Views: 96,334
Posted By Mysticial
I bet there's some valid reason to it. It only...

I bet there's some valid reason to it. It only took me like 2 minutes to figure out how to emulate it, so I doubt they'd have overlooked it.

This post suggests it could be a last minute bug that...
Forum: Hardware 2022-09-28, 15:00
Replies: 21
Views: 96,334
Posted By Mysticial
Oh hi Peter. Long time no see! :lol: ...

Oh hi Peter. Long time no see! :lol:



Same code on Intel is fast. My test does reuse the same address, but when I replace it with a regular masked store it's only 20 cycles/instruction - from...
Forum: Hardware 2022-09-28, 02:10
Replies: 21
Views: 96,334
Posted By Mysticial
Some corrections and clarifications: After...

Some corrections and clarifications:

After speaking with Travis Downs, I think the cost is actually O(N*log(N)) instead of O(N^2) to the granularity. I missed that the # of bits in each lane...
Forum: Hardware 2022-09-27, 15:01
Replies: 20
Views: 3,462
Posted By Mysticial
The memory is rated for 4800, but the mobo is...

The memory is rated for 4800, but the mobo is running it at 4400. But as I mentioned in the other thread, I can't access the BIOS to tinker with anything.


AFAICT, 4400 is considered an overclock...
Forum: Hardware 2022-09-27, 07:42
Replies: 20
Views: 3,462
Posted By Mysticial
And the 1k benchmark: (with PBO enabled) ...

And the 1k benchmark: (with PBO enabled)

Timings for 1K FFT length (16 cores, 1 worker): 0.00 ms. Throughput: 1040087.82 iter/sec.
Timings for 1K FFT length (16 cores, 2 workers): 0.00, 0.00...
Forum: Hardware 2022-09-27, 07:09
Replies: 21
Views: 96,334
Posted By Mysticial
Dropped the benchmarks in a new thread:...

Dropped the benchmarks in a new thread: https://www.mersenneforum.org/showthread.php?t=28107

--------------------


Finally got around to looking at more of the regular reviews. Looks like some...
Forum: Hardware 2022-09-27, 06:50
Replies: 20
Views: 3,462
Posted By Mysticial
Zen4 7950X Benchmarks

Chip is running at stock. Memory is 4 x 16GB @ 4400 MT/s. So the memory is actually quite slow here.

Have results for both PBO on (up to 5.7 GHz) and PBO off (4.5 GHz) since I forgot to turn it on...
Forum: Hardware 2022-09-26, 19:52
Replies: 21
Views: 96,334
Posted By Mysticial
Yeah, I can run those later tonight.

Yeah, I can run those later tonight.
Forum: Hardware 2022-09-26, 19:26
Replies: 21
Views: 96,334
Posted By Mysticial
I can't actually access the BIOS because of a...

I can't actually access the BIOS because of a video issue. So there's no display until it boots into some OS. Linux never seems to get past the bootloader phase and I can't debug it because of no...
Forum: Hardware 2022-09-26, 19:12
Replies: 21
Views: 96,334
Posted By Mysticial
Unfortunately, my engineering mobo is so janky...

Unfortunately, my engineering mobo is so janky that I was never able to boot Linux on it. So any Linux tests will need to wait until I get a proper motherboard for it. (the joys of getting hardware...
Forum: Hardware 2022-09-26, 12:59
Replies: 21
Views: 96,334
Posted By Mysticial
Zen4's AVX512 Teardown

Embargo has lifted! So here's that teardown I've promised!

I won't get into how this happened, but AMD graciously sent me two test setups this year. A retail Zen3 setup back in January, and an ...
Forum: Programming 2022-09-15, 17:31
Replies: 4
Views: 1,706
Posted By Mysticial
Thanks! I think it could also be done as: ...

Thanks! I think it could also be done as:

vpcompressd zmm0{k1}{z}, zmm0
vpcompressd zmm1{k1}{z}, [-1] (constant of all 1s)
vpcmpd k1, zmm1, [-1], 0 (constant of all...
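For anyone wanting to check the idea, here is a scalar model in Python. It is a sketch only: the function names are hypothetical, lanes are modeled as plain integers, and the final masked store in `emulated_store` is an assumption on my part, since the instruction sequence in the preview above is cut off before its last step.

```python
LANES = 16  # 32-bit lanes in a 512-bit register


def compress_store(mem, base, k, src):
    """Reference semantics of `vpcompressd [mem]{k}, zmm`.

    Selected lanes are written contiguously starting at mem[base];
    only popcount(k) elements of memory are modified.
    """
    out = list(mem)
    j = 0
    for i in range(LANES):
        if (k >> i) & 1:
            out[base + j] = src[i]
            j += 1
    return out


def compress_zeroing(k, src):
    """Register form with zeroing: pack selected lanes low, zero the rest."""
    packed = [src[i] for i in range(LANES) if (k >> i) & 1]
    return packed + [0] * (LANES - len(packed))


def emulated_store(mem, base, k, src):
    data = compress_zeroing(k, src)          # compressed payload
    ones = compress_zeroing(k, [-1] * LANES) # -1 in the low popcount(k) lanes
    # Compare for equality with all-ones to get a mask of the low lanes.
    k2 = sum(1 << i for i in range(LANES) if ones[i] == -1)
    # ASSUMPTION: finish with an ordinary masked store using the new mask.
    out = list(mem)
    for i in range(LANES):
        if (k2 >> i) & 1:
            out[base + i] = data[i]
    return out


mem = list(range(100, 132))
src = list(range(1, 17))
for k in (0x0000, 0x0001, 0xFFFF, 0x8001, 0xAAAA, 0x1234):
    assert emulated_store(mem, 4, k, src) == compress_store(mem, 4, k, src)
```

The model only checks which memory elements end up with which values. It says nothing about fault suppression, store ordering, or performance, which are the parts that matter on real hardware.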
Forum: y-cruncher 2022-09-15, 17:15
Replies: 12
Views: 3,744
Posted By Mysticial
Oh woah! Was not expecting this! Looks like...

Oh woah! Was not expecting this!

Looks like I'll make it a new thread in the Hardware section. :razz:
Forum: y-cruncher 2022-09-15, 03:15
Replies: 12
Views: 3,744
Posted By Mysticial
Problem is that I'll be linking externally. :lol:

Problem is that I'll be linking externally. :lol:
Forum: Programming 2022-09-15, 02:10
Replies: 4
Views: 1,706
Posted By Mysticial
Emulating AVX512 vpcompressd

I can't say why anyone would want to do this, but I'm looking for a 2nd set of eyes.

Can this instruction:
vpcompressd [mem]{k}, zmm
be emulated as follows:

vpcompressd zmm0{k1}{z},...
Forum: y-cruncher 2022-09-15, 01:08
Replies: 12
Views: 3,744
Posted By Mysticial
Just wondering: Where should I post my Zen4...

Just wondering:

Where should I post my Zen4 AVX512 breakdown when the embargo lifts?

If I post it here, it's already buried beneath a bunch of posts. If I post a new thread in the Hardware...
Forum: y-cruncher 2022-09-04, 07:53
Replies: 12
Views: 3,744
Posted By Mysticial
The other problem with launch day reviews is that...

The other problem with launch day reviews is that they fail to capture the new product with optimizations for it. Simply because the developers of the benchmark/game have not had the opportunity to...

 
All times are UTC. The time now is 10:18.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
