mersenneforum.org  

Great Internet Mersenne Prime Search > Hardware > GPU Computing

2019-03-14, 20:47   #3092
chalsall
If I May
"Chris Halsall"
Sep 2002
Barbados

9402₁₀ Posts

Quote:
Originally Posted by Mark Rose
It would be worth it to test numbers around 90M, 100M, and 110M, too.
Indeed. The GPU TFers are currently taking exponents from 91M upward to 77 "bits"; 90M is already done.

It might also be worth testing at 332M, to see whether any optimization could be squeezed out by using different kernels when going to 81 "bits".
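For a sense of scale, here is a rough sketch of how the raw number of trial-factoring candidates grows. It relies only on the standard fact that any factor q of M_p (p prime) has the form q = 2kp + 1, so one bit level [2^b, 2^(b+1)) holds about 2^b/(2p) candidates. Sieving and the q ≡ ±1 (mod 8) restriction are ignored, the exponents are round stand-in values rather than real assignments, and the function names are illustrative only.

```python
# Rough candidate-count comparison for trial factoring Mersenne numbers M_p.
# Uses only the fact that any factor q of M_p (p prime) has the form q = 2kp + 1.
# Sieving and the q = +/-1 (mod 8) restriction are ignored, so these are raw
# candidate counts, not runtimes.


def candidates_in_bit_level(p: int, b: int) -> float:
    """Approximate number of k with 2**b <= 2*k*p + 1 < 2**(b + 1)."""
    # The interval [2**b, 2**(b+1)) has width 2**b, and consecutive
    # candidates are 2p apart.
    return 2**b / (2 * p)


def candidates_in_range(p: int, bit_from: int, bit_to: int) -> float:
    """Sum over bit levels bit_from..bit_to-1 (e.g. 76 -> 77 is one level)."""
    return sum(candidates_in_bit_level(p, b) for b in range(bit_from, bit_to))


if __name__ == "__main__":
    # Round numbers standing in for real exponents near 91M and 332M.
    low = candidates_in_range(91_000_000, 76, 77)
    high = candidates_in_range(332_000_000, 80, 81)
    print(f"91M, 76->77 bits:  {low:.3e} candidates")
    print(f"332M, 80->81 bits: {high:.3e} candidates")
    print(f"ratio: {high / low:.2f}")
```

By this crude measure the last bit level at 332M has roughly 4.4 times as many raw candidates as the 76 to 77 bit level at 91M, before any difference in per-candidate kernel cost.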
2019-03-14, 22:56   #3093
TheJudger
"Oliver"
Mar 2005
Germany

2×3×5×37 Posts

Quote:
Originally Posted by nomead
There is a selection table in mfaktc.c that only checks for compute capability 1.x (where the speed order was 76 -> 77 -> 87 -> 88 -> 79 -> 92) and all the rest get 76 -> 87 -> 88 -> 77 -> 79 -> 92. So the barrett77_mul32_gs kernel is in effect never selected on anything newer than GTX2xx.
Are you sure about this? I'm not! Hint: check kernel_possible() in the same file.

The last time I ran benchmarks, barrett87 and barrett88 were faster than barrett77 (Pascal series).

Oliver

Last fiddled with by TheJudger on 2019-03-14 at 23:00
2019-03-15, 01:05   #3094
nomead
"Sam Laur"
Dec 2018
Turku, Finland

512₈ Posts

Quote:
Originally Posted by TheJudger
Are you sure about this? I'm not! Hint: check kernel_possible() in the same file.

The last time I ran benchmarks, barrett87 and barrett88 were faster than barrett77 (Pascal series).

Oliver
Well, I'm not 100% sure of course, but isn't kernel_possible() just called from tf() to check whether a given kernel works at all for the selected bit range, without saying anything about relative speed? I may have oversimplified a bit when I said "in effect never selected", since the selection can fall through all the way to barrett77 if 87 and 88 won't work. Ah yes, there's that extra check on the barrett87, 88 and 92 rows for whether more than one bit level is being factored at once; if so, those kernels aren't selected.

So, with the code as it is, for compute capability above 1.x:
76-77 gets barrett87_mul32_gs
75-77 gets barrett77_mul32_gs
78-79 gets barrett87_mul32_gs
77-79 gets barrett79_mul32_gs
79-80 gets barrett87_mul32_gs
78-80 or 79-81 will actually get 95bit_mul32_gs
But I'd like to think that since factoring at these bit levels takes quite a while, most people run with the default Stages=1 in mfaktc.ini, which factors one bit level at a time. That is my reasoning behind "in effect never"...
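A toy model of the selection logic, as described in this post, may make the fall-through easier to follow. It is a sketch under stated assumptions, not mfaktc's code: the kernel order is the one quoted from mfaktc.c for compute capability above 1.x, each kernel's upper bit limit is assumed from its name, the only extra rule is the single-bit-level check on the barrett87/88/92 rows, and 95bit_mul32_gs is treated as the last resort. Lower bit limits and any other checks in kernel_possible() are ignored. `Kernel`, `SELECTION_ORDER` and `select_kernel()` are hypothetical names.

```python
# Simplified, hypothetical model of the kernel selection order described above
# for compute capability > 1.x. NOT mfaktc's code: the ordering comes from the
# post, and each kernel's upper limit is assumed from its name. Lower bit
# limits and all other checks in kernel_possible() are ignored.
from typing import NamedTuple, Optional


class Kernel(NamedTuple):
    name: str
    max_bits: int            # assumed upper limit, taken from the kernel name
    single_level_only: bool  # the extra check on the barrett87/88/92 rows


# Order for compute capability > 1.x, per the post:
# 76 -> 87 -> 88 -> 77 -> 79 -> 92, with 95bit as the final fallback.
SELECTION_ORDER = [
    Kernel("barrett76_mul32_gs", 76, False),
    Kernel("barrett87_mul32_gs", 87, True),
    Kernel("barrett88_mul32_gs", 88, True),
    Kernel("barrett77_mul32_gs", 77, False),
    Kernel("barrett79_mul32_gs", 79, False),
    Kernel("barrett92_mul32_gs", 92, True),
    Kernel("95bit_mul32_gs", 95, False),
]


def select_kernel(bit_min: int, bit_max: int) -> Optional[str]:
    """Return the first kernel in the list that can handle bit_min..bit_max."""
    for k in SELECTION_ORDER:
        if bit_max > k.max_bits:
            continue  # range goes beyond what this kernel supports
        if k.single_level_only and bit_max - bit_min != 1:
            continue  # more than one bit level at once: skipped for 87/88/92
        return k.name
    return None  # nothing in the model covers this range


if __name__ == "__main__":
    for lo, hi in [(76, 77), (75, 77), (78, 79), (77, 79), (79, 80), (78, 80), (79, 81)]:
        print(f"{lo}-{hi}: {select_kernel(lo, hi)}")
```

Under these assumptions the model reproduces all seven mappings listed above, and any single bit level from 76-77 up to 86-87 lands on barrett87, which is why barrett77 is effectively skipped when Stages=1.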

The one thing I'm not at all sure about is the 1% improvement. On real-life work the difference seems to be smaller than that (still on Turing). I'll have to gather some more timing information, but this will take a while longer.
2019-03-15, 16:47   #3095
nomead
"Sam Laur"
Dec 2018
Turku, Finland

512₈ Posts

Quote:
Originally Posted by nomead
I'll have to gather some more timing information, but this will take a while longer.
Okay, I was shocked. For whatever reason, there is practically no measurable performance difference between barrett77 and barrett87 when tested on real work. Same setup as before: RTX 2080, GPU clock locked at 1800 MHz. Six exponents per kernel in the M9152xxxx range, factored from 76 to 77 bits, each credited 167.21 GHz-days. Average for the barrett77 runs: 1 hour 18 minutes 40.223 seconds. And for the barrett87 runs: 1 hour 18 minutes... 42.352 seconds. That is well within the measurement error margin. I wonder why I saw that 1% earlier, but then, that was a single run for each kernel.

So nothing needs to be changed; it doesn't make any difference. Meh.
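The averages above can be turned into a relative difference and an implied throughput with a few lines of arithmetic. This is a quick sketch; the helper names are made up for illustration, and the only inputs are the figures from the post.

```python
# Arithmetic on the barrett77 vs barrett87 averages quoted above
# (RTX 2080 at 1800 MHz, 76 -> 77 bits, 167.21 GHz-days per assignment).

SECONDS_PER_DAY = 86_400


def hms(hours: int, minutes: int, seconds: float) -> float:
    """Convert hours/minutes/seconds to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def ghz_days_per_day(credit_ghz_days: float, runtime_s: float) -> float:
    """Throughput implied by finishing one assignment in runtime_s seconds."""
    return credit_ghz_days * SECONDS_PER_DAY / runtime_s


barrett77 = hms(1, 18, 40.223)
barrett87 = hms(1, 18, 42.352)
rel_diff = (barrett87 - barrett77) / barrett77

print(f"difference: {barrett87 - barrett77:.3f} s ({rel_diff:.4%})")
print(f"barrett77: {ghz_days_per_day(167.21, barrett77):.0f} GHz-d/day")
print(f"barrett87: {ghz_days_per_day(167.21, barrett87):.0f} GHz-d/day")
```

The gap works out to about 2.1 seconds, under 0.05%, with both kernels near 3,060 GHz-days/day.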
2019-03-15, 19:21   #3096
TheJudger
"Oliver"
Mar 2005
Germany

2·3·5·37 Posts

Quote:
Originally Posted by nomead
Okay, I was shocked. For whatever reason, there is pretty much no measurable performance difference between barrett77 and 87 as tested on real work. So, again, RTX 2080, GPU clock locked at 1800 MHz. Six exponents each in the M9152xxxx range factored from 76 to 77 bits. All are reported as 167.21 GHz-days. Average for the barrett77 runs: 1 hour 18 minutes 40.223 seconds. And for the barrett87 runs: 1 hour 18 minutes... 42.352 seconds. It's well within the measurement error margin now. I wonder why I saw that 1% earlier, but then, that was for a single run for each kernel.

So, nothing needs to be changed, it doesn't make any difference. Meh.
No problem. And yes, those run-to-run variations are annoying. On a stock GeForce you have the power target, temperature target, actual temperature and so on. Even when you try to lock a specific clock rate you get those (minor) run-to-run variations. This happens on Tesla, too, although on Tesla it is much easier to make sure you're running at a fixed clock rate (just set a relatively low application clock). For benchmarks and comparisons you should always run in a realistic setting, not with something like "RAW GPU BENCH".

Oliver
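One way to decide whether a measured gap is more than run-to-run noise is to repeat each configuration several times and compare the difference of the means with its standard error. A minimal sketch using only Python's standard library; the function names are hypothetical, and the 2-standard-error cutoff is a rule of thumb assumed here, not a formal significance test.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_difference_and_stderr(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (mean(b) - mean(a), standard error of that difference).

    Uses sample variances (n - 1 denominator), so each sequence needs at
    least two runs. The two standard errors are combined without assuming
    equal variances (Welch-style).
    """
    diff = statistics.fmean(b) - statistics.fmean(a)
    stderr = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return diff, stderr


def looks_like_noise(a: Sequence[float], b: Sequence[float], k: float = 2.0) -> bool:
    """True if |mean(b) - mean(a)| is within k standard errors (rule of thumb)."""
    diff, stderr = mean_difference_and_stderr(a, b)
    return abs(diff) <= k * stderr
```

Pass the per-run times for each variant (at least two runs each); a True result means the gap is within the scatter of the runs, so a single run per kernel, as in the earlier 1% observation, cannot settle the question.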
2019-03-17, 23:31   #3097
Thecmaster
"Arvid Björklin"
Apr 2016
Pitea, Sweden

1001001₂ Posts

Help. I'm running mfaktc 0.21 (CUDA 6.5 build) right now. I have a GTX 960 and saw there are also CUDA 8.0 and CUDA 10.0 versions of mfaktc. What's the difference between them, and should I run another version?
/Arvid
2019-03-18, 01:13   #3098
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11353₈ Posts

Quote:
Originally Posted by Thecmaster
Help. I'm running mfaktc 0.21 cuda 65 right now. I have a GTX 960 and saw there was a cuda 80 and a cuda100 vercion of mfaktc. whats the diffrence between them and should I rund an other version?
/Arvid
Test them and see which is faster on your card. Note that tuning mfaktc can make a difference of several percent for a given version. CUDA 6.5 has done well in speed comparisons in my CUDALucas testing. (I don't have a GTX 960.)
2019-03-18, 11:01   #3099
Thecmaster
"Arvid Björklin"
Apr 2016
Pitea, Sweden

73 Posts

Quote:
Originally Posted by kriesel
Test them and see what's faster on your card. Note that mfaktc tuning can make a several percent difference for a set version. CUDA 6.5 has done well in speed comparisons in my testing in CUDALucas. (I don't have a GTX960.)
Just tested the CUDA 10.0 build and it was 10% faster. I will test 8.0 too and take the one with the best speed.

The 8.0 build was only 8% faster than 6.5. So 10.0 it is. Thanks for the help.
/Arvid

Last fiddled with by Thecmaster on 2019-03-18 at 11:09
2019-03-18, 14:15   #3100
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

29×167 Posts

Quote:
Originally Posted by Thecmaster
Just tested cuda 100 and got 10% faster. I will test 80 to and take the one with best speed.

The speed on 80 was just 8% faster than 65. So 100 it is. ty for help.
/Arvid
Thanks. I haven't looked beyond 8.0 myself yet; it looks like there may be some gains there for some of my fleet too.

Was your testing with or without tuning? See https://mersenneforum.org/showpost.p...postcount=2505

Was the GPU clock constant, power limited, or allowed to fluctuate?

Last fiddled with by kriesel on 2019-03-18 at 14:16
2019-03-18, 19:02   #3101
Thecmaster
"Arvid Björklin"
Apr 2016
Pitea, Sweden

73 Posts

Quote:
Originally Posted by kriesel
Thanks, I've not looked into above 8.0 myself yet, looks like there may be some gains there for some of my fleet too.

Was your testing with or without tuning? See https://mersenneforum.org/showpost.p...postcount=2505

Gpu clock constant, or power limited, or allowed to fluctuate?
No, I didn't tune any of that. I was just about to search for information on it or ask about it.

I looked around in the mfaktc.ini file and found some interesting things to tweak, but I don't know where to start.


Have done some tuning now.

GPUSieveProcessSize=32
GPUSieveSize=128
GPUSievePrimes=110000 (this gets adjusted to 110134 when the program starts)

This gave me a bit more throughput.

With 6.5 I got 303 GHz-d/Day
With 10.0 I got 331
After tweaking I got 337

This is on a GTX 960 2GB.
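Converting those figures into relative gains is simple arithmetic; a quick sketch (the `gain` helper is just for illustration, and the numbers are the GHz-d/day values from this post).

```python
# Relative gains from the GTX 960 throughput figures quoted above (GHz-d/day).


def gain(new: float, old: float) -> float:
    """Fractional improvement of new over old (0.10 means 10% faster)."""
    return new / old - 1.0


cuda65, cuda100, tuned = 303.0, 331.0, 337.0

print(f"CUDA 10.0 vs 6.5:    {gain(cuda100, cuda65):+.1%}")
print(f"tuning on CUDA 10.0: {gain(tuned, cuda100):+.1%}")
print(f"combined vs 6.5:     {gain(tuned, cuda65):+.1%}")
```

That puts the CUDA 10.0 build about 9% ahead of 6.5 on this card, with tuning adding close to 2% more.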

Last fiddled with by Thecmaster on 2019-03-18 at 19:21
2019-03-20, 18:43   #3102
petrw1
1976 Toyota Corona years forever!
"Wayne"
Nov 2006
Saskatchewan, Canada

5×907 Posts
CPU impacts GPU more than I expected.

I have a 2080 Ti GPU running mfaktc on an i7-7820X with 32GB of DDR4-3600 RAM, which is running large P-1 on all 8 cores.

The CPU is running at 60 degrees C and the GPU at 81 degrees C.

The GPU is at about 3,900 GHz-days/day, but if I stop Prime95 the GPU throughput immediately goes to about 4,250.
The GPU stays at 81 degrees C.
If I restart Prime95, the GPU stays at 4,250 until about the time all 8 cores have started, allocated their RAM and are running P-1 again.

In other words, the total throughput of the rig is LOWER when the CPU is busy: it does about 75 GHz-days/day of P-1 while the GPU loses about 300.

I don't know whether the impact would be the same if I were running LL instead of P-1 (much less RAM), though my guess is it would be about the same.
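The trade-off can be tallied directly. A small sketch using the rounded figures from this post (GHz-days/day); note that 4,250 minus 3,900 is 350, a bit more than the roughly 300 GPU loss estimated above, since all the inputs are approximate.

```python
# Rig throughput with and without P-1 on the CPU, using the approximate
# figures from the post (all in GHz-days/day).
GPU_WITH_P1 = 3_900  # mfaktc while Prime95 runs P-1 on all 8 cores
GPU_ALONE = 4_250    # mfaktc with Prime95 stopped
CPU_P1 = 75          # P-1 output of the CPU

total_with_p1 = GPU_WITH_P1 + CPU_P1
total_without = GPU_ALONE

print(f"GPU loss:     {GPU_ALONE - GPU_WITH_P1}")
print(f"rig with P-1: {total_with_p1}")
print(f"rig without:  {total_without}")
print(f"net change:   {total_with_p1 - total_without:+}")
```

Whether the GPU loss is taken as 300 or 350, the combined output drops by roughly 225 to 275 GHz-days/day while P-1 runs.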

All times are UTC.


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.