mersenneforum.org  

Old 2021-07-23, 14:59   #1
Zhangrc
 
"University student"
May 2021
Beijing, China

127 Posts
Something productive from "Is it forbidden..." thread

Quote:
Originally Posted by kriesel View Post
A 100Mdigit primality test is ~15 days on a Radeon VII GPU.
However, not every user here has a Radeon VII.

Many people, like me, run Prime95 on laptops or home computers. For safety and environmental reasons, we cannot run Prime95 24/7, so it takes us longer to finish assignments: months for 108M exponents and 1.5 years for 332M.

Last fiddled with by axn on 2021-07-25 at 13:01
Old 2021-07-23, 15:32   #2
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1011100100101₂ Posts

Quote:
Originally Posted by Zhangrc View Post
However not every user here has a Radeon VII
... Thus it takes us longer to finish assignments, that's months for 108M exponents, and 1.5 years for 332M.
The slow tail of the results distribution is not the distribution. Dobri claimed manual assignments were cat 4 and took (required as a minimum) years to complete. See https://www.mersenne.org/report_recent_results/ and note that it's quite common for 105M PRP to complete in 2 days or less from assignment, usually without any involvement of a Radeon VII. (Ben Delo does ~half and uses mprime on AWS CPUs.) Most slow outliers are ~110M or 111M and most of those take a month or less, not years. (Curtisc runs a lot on mprime or prime95 completing in ~23 days. Still not years or even months or Radeon VII.)

None of what I post should be misconstrued as disparagement of the small-throughput user or their hardware. It's all welcome, as long as it does not interfere with orderly progress, and all adds up.
Old 2021-07-23, 21:25   #3
tuckerkao
 
"Tucker Kao"
Jan 2020
Head Base M168202123

5×113 Posts

Buy an AMD Threadripper 5970X and an Nvidia GeForce 3080 Ti; exponents around M332M should then finish within at most 3 weeks.
Old 2021-07-24, 21:19   #4
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

2⁶×7 Posts

Quote:
Originally Posted by tuckerkao View Post
Buy AMD Threadripper 5970X and Nvidia Geforce 3080 Ti, exponents of the M332M should finish within at most 3 weeks.
I'm no expert on GPUs, but I thought that if you were going to buy a GPU to test exponents, the CPU would not need to be very powerful, since the GPU is doing all the work. Obviously, if one is not constrained by money, heat or power consumption, then buy the best of everything. But if one is trying to build a system with good performance without spending a fortune, then buying both a high-end CPU and a high-end GPU would be unnecessary.

Last fiddled with by drkirkby on 2021-07-24 at 21:20
Old 2021-07-24, 22:26   #5
tuckerkao
 
"Tucker Kao"
Jan 2020
Head Base M168202123

5·113 Posts

Quote:
Originally Posted by drkirkby View Post
I'm no expert on GPUs, but I thought if you were going to buy a GPU to test exponents, the CPU does not need to be very powerful - the GPU is doing all the work. Obviously if one is not constrained by money, heat or power consumption, then buy the best of everything. But if one is trying to achieve a good performance system without spending a fortune, then buying both a high-end CPU and a high-end GPU, would be unnecessary.
If I can PRP-test one exponent on my CPU with Prime95 and another on my GPU with Gpuowl, and both still run at over 90% of their normal speed on the same machine, then it will be worth it.

I'm waiting to hear from another user who has already bought a GeForce 3080 Ti about its heat output and GHz-days/day.

I use the CPU of my current old machine to run P-1 factoring on all M168,***,*23 exponents with B1 = 1,000,000 and B2 = 40,000,000, which takes around 20 hours each. The GPU of the same machine trial-factors those exponents up to 2^78, and it seems to me that both can run at the same time without significant slowdown.

When I get my new PC, which will likely be after Nov 21, 2021 (the Threadripper 5970X release date), I can run 2 PRP tests at the same time, 1 on the CPU and 1 on the GPU.

Last fiddled with by tuckerkao on 2021-07-24 at 23:18
Old 2021-07-25, 00:28   #6
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

5925₁₀ Posts

RTX 3080 Ti is fast at ~4800 GHD/d in TF, but at ~93. GHD/day for PRP, LL or P-1 it is comparable to a GTX 1080 Ti or RX 5500 XT, or ~30% of an RX 6900XT or Radeon VII, per https://www.mersenne.ca/cudalucas.php
I have multiple Radeon VIIs on a system served by a Celeron G1840, so yes, it does not take much CPU to keep GPU apps going, except when doing GCDs in P-1. I recommend about as many physical CPU cores as GPUs, plus HT, so the GPUs are unlikely to wait for each other. Also >16GB of system RAM if doing a lot of GPU P-1 on multiple 16GB-VRAM GPUs simultaneously.
Old 2021-07-25, 03:29   #7
tuckerkao
 
"Tucker Kao"
Jan 2020
Head Base M168202123

565₁₀ Posts

Quote:
Originally Posted by PhilF View Post
I'll tell you what, since you seem to really believe in luck: Tell me which number in that range that is most likely to be prime and I'll run it on a Radeon VII. If it turns out to be prime, I'll split the prize and recognition with you.

But what if it's composite? What do I get out of the deal?
It's more important to me which GPU card I should buy at the moment. It seems the AMD RX 6900XT will be the best for Gpuowl, while the Nvidia GeForce 3090 is the best for trial factoring M168173323 from 2^81 to 2^82.

How can I know exactly how many days and hours a PRP test of M168779323 needs on an AMD RX 6900XT if no one else has run it first?

Glad kriesel mentioned the difference between trial factoring and PRP on GPUs, and that the GeForce 3080 Ti does not perform well at both.

Once I get the new machine, I won't ask for anyone's help; I'll just run it myself.
Old 2021-07-25, 12:11   #8
drkirkby
 
"David Kirkby"
Jan 2021
Althorne, Essex, UK

111000000₂ Posts

Quote:
Originally Posted by kriesel View Post
RTX 3080 Ti is fast at ~4800 GHD/d in TF, but at ~93. GHD/day for PRP, LL or P-1, is comparable to a GTX 1080 Ti or RX 5500 XT or ~30% of an RX 6900XT or Radeon VII per https://www.mersenne.ca/cudalucas.php
I have multiple Radeon VIIs on a system served by a Celeron G1840, so yes it does not take much CPU to keep GPU apps going. Except when doing GCDs in P-1. I recommend about as many physical CPU cores as GPUs & HT so the GPUs are unlikely to wait for each other. Also >16GB of system ram if doing a lot of GPU P-1 on multiple 16GB-vram GPUs simultaneously.
I have a few questions about that table:
  1. What is the point of listing (GHd)²/W? I must be overlooking something, but I can't see what useful information that gives.
  2. What is JVR?
  3. What is JVR2?
The 300 W Radeon VII gives 1.053 GHd/day/W. That's more power hungry than my 8167M CPUs. I reckon I get about 240 GHd/day from one of those, which at 165 W TDP is 1.45 GHd/day/W. I will need to double-check that 240 GHd/day, as I have stopped mprime several times to play around with it, but I'm pretty sure it is more power efficient than 1.053 GHd/day/W, based on its TDP. However, I'm feeding about 490 W into the UPS when running a couple of 8167Ms flat out, which is rather more than 2*165=330 W. There are some obvious losses:
  • UPS efficiency.
  • Motherboard
  • Fans
  • PSU
  • Disks - one is a mechanical hard drive, which I guess I should remove as I am not using it.
  • GPU (75 W, but virtually idle).
  • Quad Ethernet card - I should remove that, but need to add a GPIB controller card.
The basic PC with 16 GB RAM and 8-core 2.1 GHz CPU was $1100. The 8167Ms were £300 (about $413) used each. Since it needs ECC RAM, that gets expensive, although I still have the Dell RAM and CPU that I should sell.
For PRP tests it is not clear the GPU wins, but for trial-factoring the CPUs are not good.
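The efficiency arithmetic in this post can be checked with a short Python sketch. It uses only the figures quoted above (240 GHd/day per CPU, 165 W TDP, about 490 W at the UPS input for two CPUs); the function and variable names are illustrative, and the 240 GHd/day figure is the poster's own estimate, which he says still needs double-checking.

```python
# Figures quoted in the post above; all names here are illustrative.

def efficiency(ghd_per_day: float, watts: float) -> float:
    """Return throughput per watt in GHz-days/day per W."""
    return ghd_per_day / watts

CPU_GHD_PER_DAY = 240.0  # poster's estimate for one 8167M (unverified)
CPU_TDP_W = 165.0        # rated TDP of one 8167M
WALL_POWER_W = 490.0     # measured at the UPS input, two CPUs flat out

per_cpu_at_tdp = efficiency(CPU_GHD_PER_DAY, CPU_TDP_W)
whole_system = efficiency(2 * CPU_GHD_PER_DAY, WALL_POWER_W)
power_beyond_tdp = WALL_POWER_W - 2 * CPU_TDP_W

print(f"per CPU at TDP:   {per_cpu_at_tdp:.2f} GHd/day/W")
print(f"whole system:     {whole_system:.2f} GHd/day/W")
print(f"power beyond TDP: {power_beyond_tdp:.0f} W")
```

Dividing by TDP gives the 1.45 figure, while dividing by measured wall power gives roughly 0.98, so which GPU figure it is compared against depends on whether that figure was also measured at the wall.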
Old 2021-07-25, 14:49   #9
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

3×5²×79 Posts

Quote:
Originally Posted by drkirkby View Post
I have a few questions about that table
On the CUDALucas or TF benchmark pages on mersenne.ca, pause your mouse cursor on any blue column heading, or on the downward arrow to the right of GHzDays/day, for popup descriptions.

The mersenne.ca CUDALucas benchmark page is useful within its limitations for relative comparisons between GPUs. The ~295 GHD/day values for Radeon VII are old, from old less efficient versions of Gpuowl, or from CUDALucas, and considerably understate maximum available performance with recent versions of Gpuowl.

Tdulcet reported ~75% longer run times on Colab & NVIDIA Tesla GPUs with CUDALucas than recent Gpuowl.

I've extensively benchmarked a Radeon VII across a wide variety of Gpuowl versions and all fft lengths supported in them from 3M to 192M, on Windows 10, for specified conditions. Resulting timings in ms/iter can be seen at the last attachment of https://www.mersenneforum.org/showpo...35&postcount=2. Those timings correspond to a range of performance for best version timing per fft length, from 316. to 486. GHD/day. (It might be possible to find other fft formulations that perform better; I used the first / default for each size. On occasion an alternate may perform better.)

Note that these measurements were made while the GPU was neither as aggressively clocked as I and others have been able to reliably use on Radeon VIIs with Hynix Vram, nor operating at full GPU power, nor highest performance OS/driver combo. Benchmarking was done at 86% power limit for improved power efficiency. Also, reportedly ROCm on Linux provides the highest performance, with Woltman having reported 510 GHD/day with it on IIRC 5M fft. Compare to 447. at reduced power and clock on Windows at 5M. Finally, power consumption may be elevated by the more aggressive than standard GPU fan curve I'm using.

Note also that mprime/prime95 and Gpuowl each have some fft lengths for which running the next higher fft can be faster.
I've found in benchmarking Gpuowl that the 13-smooth ffts (3.25M, 6.5M etc) tend to be slower than the next larger fft (3.5M, 7M, etc.), as does 15M.

At current wavefront ~105.1M, 5.5M fft applies, and Gpuowl V6.11-380 benchmarked at 0.821 ms/iter, which corresponds to 0.9987 day/exponent/GPU, 419. GHD/day/GPU, again at reduced GPU power, on Windows, with below-maximum reliable vram clocking. I computed ~1.53 GHD/d/W for a multi-RadeonVII system, with power measured at the AC power cord, while running prime95 on its cpu. The GPU-only efficiency would be slightly higher.
That AC input power accounts for all power used, including the system RAM, which drkirkby omitted from his list and which, at 384GiB ECC, is probably consuming considerable power in his system. Due to the high cost of a >1KW output UPS, I am running my GPU rig with inline surge suppression but no UPS.
Indicated power per GPU ranges from 190 to 212W at the 86% setting. Total AC input power divided by the number of GPUs operating was less than the nominal max GPU TDP. I'm currently running these GPUs at 80% for better power efficiency. 419. GHD/day/GPU at ~200W actual per GPU is ~2.1 GHD/d/W on the GPUs alone, omitting system overhead and conversion losses.
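As a rough check of the figures in this post, the following Python sketch converts the quoted 0.821 ms/iter into days per exponent and recomputes the GPU-only efficiency. It assumes a PRP test takes about one iteration per bit of the exponent and takes 105.1M as exactly 105,100,000; the function name is illustrative and not part of Gpuowl.

```python
SECONDS_PER_DAY = 86_400

def days_per_test(exponent: int, ms_per_iter: float) -> float:
    """Estimate run time in days, assuming ~one iteration per bit of the exponent."""
    return exponent * ms_per_iter / 1000 / SECONDS_PER_DAY

# Quoted values: wavefront exponent ~105.1M at 0.821 ms/iter (5.5M fft).
days = days_per_test(105_100_000, 0.821)

# Quoted values: 419 GHD/day per GPU at roughly 200 W actual per GPU.
gpu_only_efficiency = 419.0 / 200.0

print(f"{days:.4f} days per exponent")
print(f"{gpu_only_efficiency:.1f} GHD/d/W on the GPU alone")
```

Under those assumptions the result agrees with the 0.9987 day/exponent and ~2.1 GHD/d/W stated above.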

One Radeon VII so configured can match the throughput of the dual-26-core-8167M $5000 system under certain conditions, at better power efficiency, and original cost of the entire open frame system divided by number of GPUs was ~$700. More power efficient, and much more capital efficient per unit throughput. And would still be ~4x more cost effective today than the 8167M system if created with current GPU costs.

Last fiddled with by kriesel on 2021-07-25 at 15:45
Old 2021-07-25, 16:08   #11
axn
 
Jun 2003

5,197 Posts

Quote:
Originally Posted by drkirkby View Post
There are some obvious losses
TDP doesn't mean maximum power consumed by the CPU. A 165W TDP processor could easily consume 200W or more running flat out. Not saying that's what your CPUs are doing, but it is possible.

Also, 12 sticks of RAM consume a fair bit of power.
All times are UTC.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.