mersenneforum.org  

2021-03-15, 07:53   #1
Lasse
Mar 2021
3² Posts
Trying to build dedicated hardware for LL testing - Poor performance

Hi All

I’m trying to assemble dedicated hardware for LL testing.
Currently I have a few i7-9700K (8 cores @ 3.6GHz) processors with a single 4GB RAM stick, but performance is nowhere near what I expected.

Testing a 103M exponent with one worker on all 8 cores takes around 30 days (25ms / iteration).
In comparison, I have a laptop doing LL testing with an E3-1575M (4 cores @ 3GHz) processor and 4 x 16GB memory, and that takes 8-9 days (6-7ms / iteration). Since the i7-9700K should be faster, I was expecting it to take less time.
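
The run times quoted here follow from the per-iteration timings, since an LL test of 2^p - 1 takes p - 2 squaring iterations. A minimal Python sketch of that arithmetic, using the figures from this post (the function name is made up for illustration, and 6.5ms is just the midpoint of the quoted 6-7ms range):

```python
SECONDS_PER_DAY = 86_400


def ll_test_days(exponent: int, ms_per_iteration: float) -> float:
    """Estimate wall-clock days for an LL test of 2**exponent - 1.

    The test needs exponent - 2 squarings, so the total time is
    iterations * time per iteration.
    """
    iterations = exponent - 2
    return iterations * ms_per_iteration / 1000 / SECONDS_PER_DAY


if __name__ == "__main__":
    exponent = 103_000_000  # the "103M exponent" from the post, rounded
    timings = [
        ("i7-9700K, 1 x 4GB", 25.0),
        ("E3-1575M, 4 x 16GB", 6.5),
    ]
    for label, ms in timings:
        print(f"{label}: {ll_test_days(exponent, ms):.1f} days")
```

With these inputs the estimate comes out at about 29.8 days for the desktop and about 7.7 days for the laptop, in line with the numbers reported above.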

When I look at the memory usage the entire system is using less than 500MB so 4GB memory should be enough. I’m wondering if there is a bottleneck on the memory bandwidth?
Would it help if I installed 2 or even 4 ram sticks to give more bandwidth?

I have been trying to find a way to see how busy the memory is with no luck.


I’m using Fedora with Linux64,Prime95,v30.3,build 6.


Any pointers would be highly appreciated.
Also, if someone has a recipe for a 2021 bang-for-the-buck hardware list, I would very much like to hear about it.


Thanks.

2021-03-15, 07:58   #2
axn
Jun 2003
11537₈ Posts

Quote:
Originally Posted by Lasse
I’m wondering if there is a bottleneck on the memory bandwidth?
Would it help if I installed 2 or even 4 ram sticks to give more bandwidth?

Yes and yes. You are severely bottlenecked on memory bandwidth.

Either 2x16GB or 4x8GB of the fastest RAM that you can get will give you the most bandwidth. Performance will scale (near) linearly with RAM bandwidth.

Last fiddled with by axn on 2021-03-15 at 07:59

2021-03-15, 08:08   #3
Lasse
Mar 2021
3² Posts

Thanks a lot for the fast reply. I will order 4 of the fastest supported RAM modules I can find and test with 2 and 4 modules.

Is there any way I can check the system to see if I'm maxing out the memory bandwidth?

2021-03-15, 08:39   #4
axn
Jun 2003
1001101011111₂ Posts

Quote:
Originally Posted by Lasse
Is there any way i can check the system to see if i'm maxing out the memory bandwidth?

If you're memory bottlenecked, downclocking the CPU will cause no (or virtually no) reduction in performance, and overclocking will give no increase either. Once you have sufficient memory bandwidth, you'll see performance track CPU clock speed much more closely. That's an indirect way of verifying this.

I don't know how to monitor the memory bandwidth usage directly, sorry.
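
The indirect check described above can be turned into a small calculation: time a few iterations at two CPU clocks and see how much of the clock change shows up as a throughput change. This is only a sketch. The function name, the 0.5 threshold, and the sample timings are all assumptions for illustration, not measurements from this thread.

```python
def clock_scaling(clock_a_ghz: float, ms_a: float,
                  clock_b_ghz: float, ms_b: float) -> float:
    """Fraction of a CPU clock change that shows up as a throughput change.

    Close to 1.0: throughput follows the clock (CPU bound).
    Close to 0.0: throughput ignores the clock (memory bound).
    """
    if clock_a_ghz == clock_b_ghz:
        raise ValueError("the two measurements need different clock speeds")
    clock_change = clock_b_ghz / clock_a_ghz - 1
    # Throughput is the inverse of the time per iteration.
    throughput_change = ms_a / ms_b - 1
    return throughput_change / clock_change


if __name__ == "__main__":
    # Hypothetical timings: dropping from 3.6GHz to 3.0GHz barely moves
    # the iteration time, which is what a memory bottleneck looks like.
    scaling = clock_scaling(3.6, 25.0, 3.0, 25.2)
    verdict = "memory bound" if scaling < 0.5 else "CPU bound"  # arbitrary cut-off
    print(f"scaling = {scaling:.2f} -> probably {verdict}")
```

To use it, change the CPU clock in the BIOS or with the operating system's frequency controls, note the ms/iteration that mprime reports at each setting, and feed both pairs into the function.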

2021-03-15, 09:14   #5
Lasse
Mar 2021
3² Posts

Thanks for the clarification. I have just tested with a single core and I’m getting the same speed as when I’m running on all 8 cores.

For reference the current setup is as follows:
Motherboard: TUF H370-PRO GAMING
CPU: I7-9700K
RAM: 1 x Kingston ValueRAM - DDR4 - 4 GB - DIMM 288-PIN - 2400 MHz / PC4-19200 - CL17 - 1.2 V
PSU: 1650W (Totally overkill)


I have now ordered:
2 x HyperX Predator - DDR4 8 GB - DIMM 288-PIN - 2666 MHz / PC4-21300 - CL13 - 1.35 V
2 x HyperX FURY - DDR4 4GB - DIMM 288-PIN - 2666 MHz / PC4-21300 - CL16 - 1.2 V
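
For a rough idea of what these modules offer, theoretical peak bandwidth is the transfer rate times 8 bytes (a DDR4 channel is 64 bits wide) times the number of channels in use. The PC4-19200 label on the current stick is exactly that figure in MB/s. A minimal sketch, with a made-up function name; real sustained bandwidth will be lower than these peaks:

```python
BYTES_PER_TRANSFER = 8  # one DDR4 channel moves 64 bits per transfer


def peak_bandwidth_gbs(mega_transfers_per_s: int, channels: int) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    return mega_transfers_per_s * BYTES_PER_TRANSFER * channels / 1000


if __name__ == "__main__":
    configs = [
        ("current: 1 x DDR4-2400, single channel", 2400, 1),
        ("ordered: DDR4-2666, dual channel", 2666, 2),
    ]
    for label, rate, channels in configs:
        print(f"{label}: {peak_bandwidth_gbs(rate, channels):.1f} GB/s")
```

On paper, that is 19.2 GB/s for the single stick against roughly 42.7 GB/s for DDR4-2666 in dual channel.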

Once the RAM is delivered and I get some time to play around with it, I will post results here.


If anyone else has any input, please do not hesitate to post :)

2021-03-15, 10:56   #6
mackerel
Feb 2016
UK
419 Posts

Faster RAM always helps in this scenario, but with an H370 mobo I believe you're limited to whatever speed the CPU officially supports. A Z370/Z390 mobo with faster RAM would have cost more but performed better.

If you keep the current mobo, then the best you can do is to put in 4 modules. Capacity doesn't matter. This helps in two ways: firstly, you get dual channel, which already doubles the bandwidth you had. Secondly, you get more than one rank per channel, which lets you make more effective use of that dual-channel bandwidth. 4x4GB might be the most economical if you can still find modules that small. You can try testing with the mismatched pairs already ordered.

4GB modules probably were always single rank.
8GB modules might have been dual rank back when they were still relatively new in 2015-ish, but they've been single rank for a long time.
16GB modules were all dual rank, but I understand newer ones coming out now are single rank.
So generally speaking if you don't need capacity, the cheapest way to rank up is to use 4 modules on a dual channel system.
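
As a quick illustration of the "rank up" point, the number of ranks each channel sees is just modules times ranks per module, divided by the number of channels. This sketch assumes the modules are spread evenly across the channels; the function name is made up for illustration.

```python
def ranks_per_channel(modules: int, ranks_per_module: int, channels: int = 2) -> int:
    """Ranks seen by each memory channel, assuming an even spread of modules."""
    if modules % channels != 0:
        raise ValueError("modules must divide evenly across the channels")
    return modules * ranks_per_module // channels


if __name__ == "__main__":
    print("2 x single rank:", ranks_per_channel(2, 1))
    print("4 x single rank:", ranks_per_channel(4, 1))
    print("2 x dual rank:  ", ranks_per_channel(2, 2))
```

Four single-rank modules and two dual-rank modules both give two ranks per channel on a dual-channel board, while two single-rank modules give only one.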

Even on a quad-core 6700K, dual-channel RAM is wholly inadequate. It was a while ago, but from memory, going from 2xSR to 4xSR (or equivalently 2xDR) at 3000 speed gave around a 20-25% speedup.

CL rating doesn't seem to make much difference in my testing so I wouldn't pay extra for it.

2021-03-15, 12:29   #7
Lasse
Mar 2021
3² Posts

Thanks. That was very helpful.


I have now ordered the following:
Gigabyte Z390 M Micro-ATX LGA1151 Intel Z390
4 x CORSAIR Vengeance DDR4 8GB 3600MHz CL18

I will update once I get everything tested. :)

2021-03-15, 12:45   #8
Uncwilly
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
3×5×7²×13 Posts

Please don't do LL tests, unless they are to double check. PRP is now the preferred test type for first-time tests. It has superior error checking built into it, so errors that slip through are very few and far between. Also, using the latest version of either Prime95 or GpuOwL will produce a file that allows the run to be verified quickly on another machine. This saves 95% of the effort that a traditional double check would take.
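
For readers new to the two tests, here is a toy Python sketch of each on small exponents. It is not how Prime95 or GpuOwL work internally: they use FFT-based multiplication, and none of the error checking or verification-file machinery mentioned above is shown. The function names and the choice of base 3 for the toy PRP test are assumptions for illustration.

```python
def lucas_lehmer(p: int) -> bool:
    """LL test: True if 2**p - 1 is prime (p must be an odd prime)."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):  # p - 2 modular squarings
        s = (s * s - 2) % m
    return s == 0


def fermat_prp(p: int, base: int = 3) -> bool:
    """Fermat PRP test: True if 2**p - 1 is a probable prime to the given base."""
    m = (1 << p) - 1
    # Modular exponentiation is essentially a chain of about p squarings.
    return pow(base, m - 1, m) == 1


if __name__ == "__main__":
    for p in (3, 5, 7, 11, 13, 17, 19):
        print(p, lucas_lehmer(p), fermat_prp(p))
```

Both tests do about p big modular squarings, so the cost per exponent is similar. The LL result is a proof of primality, while a PRP result says "probably prime", so a positive PRP result would normally be followed up with a full primality test.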

Also, consider adding a GPU. You will get more throughput from a good GPU than the CPU.

2021-03-15, 13:00   #9
Lasse
Mar 2021
1001₂ Posts

Thanks for your advice. I have only briefly looked into PRP and need to study it more, but I will definitely use PRP going forward.

Regarding GPUs, I have picked up 10 used Nvidia Tesla K80s that I'm hoping to get up and running soon. I'm just waiting for some power adapters to arrive, and the delivery time is long.

From what I could see in benchmark reports, the K80 and GPUs in general are far better suited to factor checking than to LL testing. Maybe this is not the case with PRP?

For that reason, my plan was to get some fast CPUs to do LL/PRP testing and use the GPUs for factor checking.

Please correct me if I am mistaken :)

2021-03-15, 13:42   #10
Uncwilly
6809 > 6502
"""""""""""""""""""
Aug 2003
101×103 Posts
3×5×7²×13 Posts

Some GPUs are better than others at PRP. This chart can be an approximate guide: https://www.mersenne.ca/cudalucas.php
This one covers factoring: https://www.mersenne.ca/mfaktc.php

You can punch in different cards and compare them. PRP and LL run at about the same speed, so the LL figures are a fair guide for PRP.

2021-03-15, 14:16   #11
DrobinsonPE
Aug 2020
2³×11 Posts

Quote:
Originally Posted by Lasse
Hi All

I’m trying to assemble dedicated hardware for LL testing.
Currently I have a few I7-9700K (8 cores @ 3.6GHz) processors with a single 4GB ram stick but performance is not anywhere near what I expected.

I am currently running a similar setup:

Gigabyte B365M DS3H, i7-9700K, four sticks of 4GB DDR4-2666 RAM. The motherboard limits the RAM to 2666.

Attached is a picture of my testing data for the CPU. It might be useful to you. You can save a lot of energy by decreasing the CPU clock without decreasing mprime throughput.
[Attached image: i7-9700 Efficiency Data.png]

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.