#3092 | |
If I May
"Chris Halsall"
Sep 2002
Barbados
9402₁₀ Posts |
![]() Quote:
It might also be worth testing at 332M, to see if there's any optimization which could be squeezed out using different kernels going to 81 "bits". |
#3093 | |
"Oliver"
Mar 2005
Germany
2×3×5×37 Posts |
![]() Quote:
Last time I did some benchmarks, barrett 87 and 88 were faster than 77 (Pascal series).
Oliver
Last fiddled with by TheJudger on 2019-03-14 at 23:00 |
#3094 | |
"Sam Laur"
Dec 2018
Turku, Finland
5128 Posts |
![]() Quote:
So, on the code as it is, for compute capability bigger than 1.x:
76-77 gets barrett87_mul32_gs
75-77 gets barrett77_mul32_gs
78-79 gets barrett87_mul32_gs
77-79 gets barrett79_mul32_gs
79-80 gets barrett87_mul32_gs
78-80 or 79-81 will actually get 95bit_mul32_gs

But I'd like to think that since factoring at these bit levels takes quite a while, most people would be running with the default Stages=1 set in mfaktc.ini. This is my reasoning behind that "in effect never"...

The one thing I'm not at all sure about is the 1% improvement. On real life work the difference seems to be less than that (still on Turing). I'll have to gather some more timing information, but this will take a while longer. |
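The mapping listed above can be restated as a small lookup table, which makes the "in effect never" argument easy to check. This is only an illustrative Python sketch: the bit ranges and kernel names are copied from the post, while `KERNEL_BY_RANGE` and `kernel_for` are hypothetical names and not part of mfaktc, and the code assumes that Stages=1 means each pass covers exactly one bit level.

```python
# Hypothetical lookup table: ranges and kernel names are copied from the post.
# It restates the reported behaviour; it is NOT mfaktc's kernel selection code.
KERNEL_BY_RANGE = {
    (76, 77): "barrett87_mul32_gs",
    (75, 77): "barrett77_mul32_gs",
    (78, 79): "barrett87_mul32_gs",
    (77, 79): "barrett79_mul32_gs",
    (79, 80): "barrett87_mul32_gs",
    (78, 80): "95bit_mul32_gs",
    (79, 81): "95bit_mul32_gs",
}


def kernel_for(bit_min, bit_max):
    """Return the kernel reported for a bit range, or None if the post does not list it."""
    return KERNEL_BY_RANGE.get((bit_min, bit_max))


# Assumption: with Stages=1 every pass spans a single bit level (bit_max - bit_min == 1).
single_level = [r for r in KERNEL_BY_RANGE if r[1] - r[0] == 1]

for bit_min, bit_max in sorted(single_level):
    print(f"{bit_min}-{bit_max}: {kernel_for(bit_min, bit_max)}")
```

Under that assumption, every single-level range in the list ends up on barrett87_mul32_gs, and the other kernels only appear for the two-level ranges.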
#3095 | |
"Sam Laur"
Dec 2018
Turku, Finland
5128 Posts |
![]() Quote:
So, nothing needs to be changed, it doesn't make any difference. Meh. ![]() |
#3096 | |
"Oliver"
Mar 2005
Germany
2·3·5·37 Posts |
![]() Quote:
Oliver |
#3097 |
"Arvid Björklin"
Apr 2016
Pitea, Sweden
1001001₂ Posts |
Help. I'm running the CUDA 6.5 build of mfaktc 0.21 right now. I have a GTX 960 and saw that there are also CUDA 8.0 and CUDA 10.0 versions of mfaktc. What's the difference between them, and should I run another version?
/Arvid |
#3098 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11353₈ Posts |
Test them and see which is faster on your card. Note that mfaktc tuning can make a difference of several percent for a given version. CUDA 6.5 has done well in speed comparisons in my testing in CUDALucas. (I don't have a GTX 960.)
#3099 | |
"Arvid Björklin"
Apr 2016
Pitea, Sweden
73 Posts |
![]() Quote:
The CUDA 8.0 build was just 8% faster than the 6.5 build, so the CUDA 10.0 build it is. Thanks for the help.
/Arvid
Last fiddled with by Thecmaster on 2019-03-18 at 11:09 |
#3100 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
29×167 Posts |
![]() Quote:
Was your testing with or without tuning? See https://mersenneforum.org/showpost.p...postcount=2505
Was the GPU clock constant, power limited, or allowed to fluctuate?
Last fiddled with by kriesel on 2019-03-18 at 14:16 |
#3101 | |
"Arvid Björklin"
Apr 2016
Pitea, Sweden
73 Posts |
![]() Quote:
I looked around in the mfaktc.ini file and found some interesting things to tweak, but I didn't know where to start. I have done some tuning now:
GPUSieveProcessSize=32
GPUSieveSize=128
GPUSievePrimes=110000 (this gets adjusted to 110134 when the program starts)

This gave me a bit more throughput.
With CUDA 6.5 I got 303 GHz-d/day.
With CUDA 10.0 I got 331.
After tweaking I got 337.
This is on a GTX 960 2GB.
Last fiddled with by Thecmaster on 2019-03-18 at 19:21 |
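To put the three throughput figures above in relative terms, the gains can be worked out with a few lines of Python. This is just a sketch of the arithmetic: the numbers are the ones quoted in the post, and `percent_gain` and the variable names are made up for illustration.

```python
def percent_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# Throughput in GHz-d/day on the GTX 960, as quoted in the post.
cuda65 = 303.0         # CUDA 6.5 build, default settings
cuda100 = 331.0        # CUDA 10.0 build, default settings
cuda100_tuned = 337.0  # CUDA 10.0 build after the mfaktc.ini tweaks

print(f"CUDA 10.0 vs 6.5:       {percent_gain(cuda65, cuda100):.1f}%")         # 9.2%
print(f"Tuning on CUDA 10.0:    {percent_gain(cuda100, cuda100_tuned):.1f}%")  # 1.8%
print(f"Tuned 10.0 vs stock 6.5: {percent_gain(cuda65, cuda100_tuned):.1f}%")  # 11.2%
```

On these figures, switching builds accounts for most of the gain, and the mfaktc.ini tuning adds a little under 2% on top.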
#3102 |
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada
5×907 Posts |
I have a 2080Ti GPU running mfaktc on an i7-7820X with 32GB of DDR4-3600 RAM, running large P-1 on all 8 cores. The CPU is running at 60 degrees F and the GPU at 81 degrees F.

The GPU is at about 3,900 GHz-days/day, but if I stop Prime95 the GPU throughput immediately goes to about 4,250. The GPU stays at 81 degrees F. If I restart Prime95, the GPU stays at 4,250 until about the time all 8 cores have started, have their RAM allocated, and are running P-1 again.

In other words, the total throughput of the rig is LOWER when the CPU is busy. It does about 75 GHz-days/day of P-1 while the GPU loses about 300.

I don't know if the impact would be the same if I were running LL instead of P-1 (much less RAM), though my guess is it would be about the same. |
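The trade-off described above is simple arithmetic, sketched here in Python. The figures are the rounded ones from the post (which also calls the GPU loss "about 300", while 4,250 minus 3,900 gives 350), and all variable names are made up for illustration.

```python
# Rounded throughput figures from the post, in GHz-days/day.
gpu_alone = 4250.0    # mfaktc with Prime95 stopped
gpu_with_p1 = 3900.0  # mfaktc while P-1 runs on all 8 CPU cores
cpu_p1 = 75.0         # P-1 throughput of the CPU

gpu_loss = gpu_alone - gpu_with_p1       # what the GPU gives up
combined = gpu_with_p1 + cpu_p1          # rig total with both running
net_change = combined - gpu_alone        # negative means the rig is slower overall

print(f"GPU loss with CPU busy: {gpu_loss:.0f}")
print(f"Combined throughput:    {combined:.0f}")
print(f"Net change vs GPU only: {net_change:.0f}")
```

With these numbers the combined total is 3,975 against 4,250 for the GPU alone, so running P-1 on every core costs the rig roughly 275 GHz-days/day.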