mersenneforum.org  

Old 2020-11-14, 22:53   #45
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados
24BA₁₆ Posts

Quote:
Originally Posted by PhilF
That's a cool test bench!

ROFLMAO... Clearly Mike doesn't have cats...
Old 2020-11-15, 00:30   #46
Xyzzy
"Mike"
Aug 2002
2³·3·331 Posts

Our study is a cat-free environment. The rest of the house?

Old 2020-11-15, 00:52   #47
Xyzzy
"Mike"
Aug 2002
2³×3×331 Posts

Quote:
Originally Posted by Xyzzy
The board we ended up with (C8I) "trained" to a 1T command rate for our memory. We have never gotten 1T to work on any previous board with this memory and certainly not automatically.

Further detective work has revealed that we have "geardown mode" enabled. This is apparently a stability option.

Here is an explanation from https://www.reddit.com/r/overclockin...ram_overclock/
Quote:
What GDM does is essentially forces the tCL and tCWL timings* to use an internal half-frequency clock instead of the memory clock. That is, if you're running for example 3000MHz, instead of the timings running off of 1500MHz (the real memory clock), they will reference a 750MHz clock. To make this work, the timings have to** be rounded up and divided by two. So if you're running CAS 15 with GDM on, the system will tell you you're running at CAS 16, but technically you're actually running CAS 8 at half the frequency. The latency works out the same, CAS commands are just asserted half as often.

So that's why it increases stability: it both loosens tCL and tCWL if they are odd and reduces the rate at which the corresponding signals are asserted, which all means the memory is a little bit less stressed.

* - not sure if there are others but that's what I've read
** - "have to" might be strong wording

Quote:
Memory has two communication interfaces - the data bus which goes direct from pins on the CPU to pins on a memory chip and runs at the full DDR speed (eg 3200MT/s for DDR4-3200), and the command/address bus which goes from the CPU to ALL the memory chips via a loop-the-loop* and runs at the reduced "physical clock" speed (eg 1600MHz for DDR4-3200). The command/address bus can often be a limit on memory speed.

As /u/varexos717 said, geardown mode slows down the command/address bus by only allowing communication to take place every other cycle. The communication still only takes one cycle (as opposed to 2T command rate where a command has to be sent over two cycles), but then the bus can return to a 'neutral' level between a 1 and 0 which make it easier for the next signal to get through.

*This is not a joke. DDR5 will have a much more sensible layout.

We put in the new memory, which is the exact same as the old memory except it is dual rank instead of single rank and it has twice the capacity.

Old: https://www.gskill.com/product/165/1...35V16GB-(2x8GB)
New: https://www.gskill.com/product/165/1...5V32GB-(2x16GB)

We "erased" the motherboard's memory timings and had it go through the training process again. It ended up with the same numbers as before even though the signal/clock/whatever load is significantly increased.

As an experiment, we then forced geardown mode off and the command rate to 1. It passed a severe memory check with that setting, but any gain we measured was lost in the run-to-run variation of our benchmarks. IOW, we think the difference was negligible. So we enabled geardown mode to have a safety net for stability. We like fast things but only if they are utterly reliable.
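
The geardown-mode arithmetic from the first quoted explanation can be sketched in a few lines of Python. This is only an illustration of that explanation (which hedges its own claims), not a description of what the memory controller really does. The function names are made up for this sketch, and the quote's "3000MHz" is read here as a DDR4-3000 data rate.

```python
def memory_clock_mhz(data_rate_mts: float) -> float:
    """DDR moves data twice per clock, so the clock is half the data rate."""
    return data_rate_mts / 2


def gdm_reported_cas(tcl: int) -> int:
    """Per the quoted explanation: with GDM on, an odd tCL is rounded up to even."""
    return tcl + (tcl % 2)


def cas_latency_ns(tcl: float, clock_mhz: float) -> float:
    """Time in nanoseconds taken by tcl cycles of a clock running at clock_mhz."""
    return tcl / clock_mhz * 1000


data_rate = 3000                     # MT/s, the example used in the quote
clock = memory_clock_mhz(data_rate)  # 1500 MHz, the "real memory clock"
reported = gdm_reported_cas(15)      # CAS 15 requested, CAS 16 reported

# CAS 16 counted on the full-rate clock versus CAS 8 counted on the half-rate clock.
full_rate_ns = cas_latency_ns(reported, clock)
half_rate_ns = cas_latency_ns(reported / 2, clock / 2)

print(f"reported CAS: {reported}")
print(f"latency at {clock:.0f} MHz: {full_rate_ns:.2f} ns")
print(f"latency at {clock / 2:.0f} MHz: {half_rate_ns:.2f} ns")
```

Both latency figures come out the same, which is the quote's point: the rounded-up timing costs a little latency relative to CAS 15, but halving the reference clock by itself does not.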

Attached Thumbnails: rm.png (259.9 KB), test.png (10.6 KB)
Old 2020-11-15, 00:54   #48
Xyzzy
"Mike"
Aug 2002
2³×3×331 Posts

Attached are benchmark timings for our usual 560K and 6144K FFT lengths.

You may notice that the 560K FFT single rank and dual rank timings are very similar. We figure this is because the data is cached. The 6144K FFT data shows a surprising (to us) increase in throughput, up to 25% higher with six cores running.

Attached Files
560K-SR.txt (4.5 KB)
560K-DR.txt (4.5 KB)
6144K-SR.txt (4.5 KB)
6144K-DR.txt (4.5 KB)
Old 2020-11-16, 05:41   #49
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
29·101 Posts

25% higher! Wow!

That does say something about how memory-starved the 6-core chip is though, if rank interleaving can provide that much more bandwidth.

Could you test that dual rank memory configuration at different CPU clock speeds? I'm curious where the "knee" in performance is for the 6144K FFT. That could save a lot of power.
Old 2020-11-16, 20:32   #50
Xyzzy
"Mike"
Aug 2002
1F08₁₆ Posts

Quote:
Originally Posted by Mark Rose
Could you test that dual rank memory configuration at different CPU clock speeds? I'm curious where the "knee" in performance is for the 6144K FFT. That could save a lot of power.

We tested at three levels of power. We only tested one worker because additional workers are rarely if ever faster, and by using just one we can benchmark in a reasonable time.

ECO = 40W
STK = 57W
PBO = 105W

Code:
ECO
Timings for 6144K FFT length (1 core, 1 worker): 18.20 ms.  Throughput: 54.95 iter/sec.
Timings for 6144K FFT length (2 cores, 1 worker):  9.67 ms.  Throughput: 103.40 iter/sec.
Timings for 6144K FFT length (3 cores, 1 worker):  6.94 ms.  Throughput: 144.17 iter/sec.
Timings for 6144K FFT length (4 cores, 1 worker):  5.55 ms.  Throughput: 180.20 iter/sec.
Timings for 6144K FFT length (5 cores, 1 worker):  4.85 ms.  Throughput: 206.26 iter/sec.
Timings for 6144K FFT length (6 cores, 1 worker):  4.30 ms.  Throughput: 232.78 iter/sec.

STK
Timings for 6144K FFT length (1 core, 1 worker): 18.11 ms.  Throughput: 55.21 iter/sec.
Timings for 6144K FFT length (2 cores, 1 worker):  9.49 ms.  Throughput: 105.32 iter/sec.
Timings for 6144K FFT length (3 cores, 1 worker):  6.67 ms.  Throughput: 149.83 iter/sec.
Timings for 6144K FFT length (4 cores, 1 worker):  5.23 ms.  Throughput: 191.16 iter/sec.
Timings for 6144K FFT length (5 cores, 1 worker):  4.41 ms.  Throughput: 226.98 iter/sec.
Timings for 6144K FFT length (6 cores, 1 worker):  4.00 ms.  Throughput: 249.94 iter/sec.

PBO
Timings for 6144K FFT length (1 core, 1 worker): 18.33 ms.  Throughput: 54.55 iter/sec.
Timings for 6144K FFT length (2 cores, 1 worker):  9.58 ms.  Throughput: 104.38 iter/sec.
Timings for 6144K FFT length (3 cores, 1 worker):  6.85 ms.  Throughput: 146.04 iter/sec.
Timings for 6144K FFT length (4 cores, 1 worker):  5.24 ms.  Throughput: 190.98 iter/sec.
Timings for 6144K FFT length (5 cores, 1 worker):  4.48 ms.  Throughput: 223.26 iter/sec.
Timings for 6144K FFT length (6 cores, 1 worker):  3.84 ms.  Throughput: 260.43 iter/sec.
PS - Note that ~300 iterations per second with six cores would be perfect scaling.
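
For anyone who wants to put numbers on the scaling, here is a rough Python sketch using only the one-core and six-core throughput figures from the table above. "Perfect scaling" is taken to mean six times the one-core throughput, which works out to about 330 iter/sec, so the ~300 in the PS is a round figure. The dictionary layout and function names are made up for this sketch, and the wattage is the figure quoted for each mode, not power measured during this benchmark.

```python
# Throughput (iter/sec) copied from the 6144K FFT table above, plus the
# power figure (W) quoted for each mode.
results = {
    "ECO": {"watts": 40, "one_core": 54.95, "six_cores": 232.78},
    "STK": {"watts": 57, "one_core": 55.21, "six_cores": 249.94},
    "PBO": {"watts": 105, "one_core": 54.55, "six_cores": 260.43},
}


def scaling_efficiency(one_core: float, n_core: float, cores: int) -> float:
    """Fraction of perfect scaling, where perfect is cores * one-core throughput."""
    return n_core / (cores * one_core)


def iters_per_watt(throughput: float, watts: float) -> float:
    """Throughput per quoted watt; a crude efficiency figure."""
    return throughput / watts


summary = {}
for mode, r in results.items():
    summary[mode] = {
        "perfect": 6 * r["one_core"],
        "efficiency": scaling_efficiency(r["one_core"], r["six_cores"], 6),
        "per_watt": iters_per_watt(r["six_cores"], r["watts"]),
    }
    s = summary[mode]
    print(f"{mode}: perfect {s['perfect']:.1f} iter/sec, "
          f"scaling {s['efficiency']:.1%}, {s['per_watt']:.2f} iter/sec per W")
```

By this measure six cores reach roughly 71% (ECO), 75% (STK) and 80% (PBO) of perfect scaling, while PBO delivers about 12% more six-core throughput than ECO for a quoted power figure about 2.6 times as high.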
Attached Thumbnails: 6144K.PNG (15.3 KB)
Old 2020-11-16, 20:51   #51
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
29×101 Posts

That eco mode is super efficient! What clock speeds do you see running in eco?
Old 2020-11-16, 23:07   #52
Xyzzy
"Mike"
Aug 2002
7944₁₀ Posts

Quote:
Originally Posted by Mark Rose
That eco mode is super efficient! What clock speeds do you see running in eco?

We were going to try to write down the frequency, but it changes every second and the swings are pretty wild. That is why we used the power figure instead. And that power figure actually comes from a twelve-thread small-FFT torture test, so it is the worst-case scenario.

Old 2020-11-16, 23:18   #53
M344587487
"Composite as Heck"
Oct 2017
3×5×7² Posts

Ah this is the good stuff. If only there were a way to pipe benchmarks directly into a vein.
Old 2020-11-16, 23:46   #54
PhilF
Feb 2005
Colorado
577 Posts

Quote:
Originally Posted by M344587487
Ah this is the good stuff. If only there were a way to pipe benchmarks directly into a vein.

Lol!

Someone in this thread is addicted...
Old 2020-11-17, 00:03   #55
M344587487
"Composite as Heck"
Oct 2017
3×5×7² Posts

I can quit whenever I want!




But there'll be more right...

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.