mersenneforum.org  

Old 2017-03-24, 01:22   #2696
James Heinrich
 
 
"James Heinrich"
May 2004
ex-Northern Ontario

3×23×47 Posts

Could you (and anyone else with one) please submit a benchmark for 1080Ti:
http://www.mersenne.ca/mfaktc.php#benchmark

Even taking your clock speed into account I'm only expecting 1350 GHz-d/d and you're getting 1480...
Old 2017-04-06, 16:18   #2697
storm5510
Random Account
 
 
Aug 2009
U.S.A.

2⁸×7 Posts

It seems like I read something here a while back that the author of mfaktc is no longer working on this application. If this is true, then is anyone else going to pick it up and move forward?
Old 2017-04-10, 19:37   #2698
TheJudger
 
 
"Oliver"
Mar 2005
Germany

2126₈ Posts

Quote:
Originally Posted by storm5510
It seems like I read something here a while back that the author of mfaktc is no longer working on this application. If this is true, then is anyone else going to pick it up and move forward?
Some minor changes in devel version but nothing serious. Seems like the current code runs fine on Pascal generation so no new code is needed.

Oliver
Old 2017-04-25, 05:37   #2699
Mark Rose
 
 
"/X\(‘-‘)/X\"
Jan 2013

101101110001₂ Posts

For what it's worth, the Linux tar is missing the necessary line to compile for the 1060/1070/1080. It's

Code:
NVCCFLAGS += --generate-code arch=compute_61,code=sm_61
for anyone else who runs into this.
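
A minimal sketch of how that flag follows from the card's compute capability (6.1 for these Pascal cards, as mfaktc itself reports in its device info). The helper name `gencode_flag` is made up for illustration and is not part of mfaktc or its build files:

```python
def gencode_flag(compute_capability: str) -> str:
    """Build an nvcc --generate-code flag from a capability string like '6.1'."""
    major, minor = compute_capability.split(".")
    arch = f"{int(major)}{int(minor)}"  # '6.1' -> '61'
    return f"--generate-code arch=compute_{arch},code=sm_{arch}"


# Line to add to the build flags for a GTX 1060/1070/1080
print("NVCCFLAGS += " + gencode_flag("6.1"))
```

The same pattern should apply to other cards by substituting their compute capability, assuming the installed CUDA toolkit supports that architecture.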
Old 2017-06-07, 23:20   #2700
Ethan (EO)
 
 
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996

92₁₀ Posts

Code:
C:\Users\ethan\Desktop\mfaktc-0.21>.\mfaktc-win-64.exe -tf 66362159 75 76
mfaktc v0.21 (64bit built)
[...]
CUDA version info
  binary compiled for CUDA  8.0
  CUDA runtime version      8.0
  CUDA driver version       8.0

CUDA device info
  name                      GeForce GTX 1080 Ti
  compute capability        6.1
  max threads per block     1024
  max shared memory per MP  98304 byte
  number of multiprocessors 28
  clock rate (CUDA cores)   1683MHz
  memory clock rate:        5505MHz
  memory bus width:         352 bit

[...]
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Jun 07 16:14 |    0   0.1% |  6.088   1h37m |   1704.62    82485    n.a.%
Jun 07 16:14 |    4   0.2% |  6.014   1h36m |   1725.59    82485    n.a.%
Jun 07 16:14 |    9   0.3% |  6.001   1h35m |   1729.33    82485    n.a.%
Jun 07 16:14 |   12   0.4% |  6.006   1h35m |   1727.89    82485    n.a.%
Jun 07 16:14 |   16   0.5% |  6.003   1h35m |   1728.76    82485    n.a.%
Jun 07 16:14 |   24   0.6% |  6.005   1h35m |   1728.18    82485    n.a.%
Jun 07 16:14 |   25   0.7% |  6.004   1h35m |   1728.47    82485    n.a.%
Jun 07 16:14 |   37   0.8% |  6.006   1h35m |   1727.89    82485    n.a.%
Jun 07 16:15 |   40   0.9% |  6.008   1h35m |   1727.32    82485    n.a.%
Jun 07 16:15 |   45   1.0% |  6.008   1h35m |   1727.32    82485    n.a.%
Jun 07 16:15 |   49   1.1% |  6.015   1h35m |   1725.31    82485    n.a.%
Jun 07 16:15 |   52   1.3% |  6.017   1h35m |   1724.73    82485    n.a.%
Jun 07 16:15 |   60   1.4% |  6.010   1h34m |   1726.74    82485    n.a.%
This is from an "ASUS ROG STRIX GeForce® GTX 1080 TI 11GB OC Edition VR Ready 5K HD Gaming HDMI DisplayPort DVI Overclocked PC Graphics Card" (P/N ROG-STRIX-GTX1080TI-O11G-GAMING).

TDP Limit is raised to 120% and Boost Frequency Offset is set to +127; the card is limited by TDP Cap but ends up running around 2050MHz indefinitely; it needs about 80% fan to keep GPU temp at 60C in 27C ambient.
Old 2017-06-08, 04:23   #2701
Rodrigo
 
 
Jun 2010
Pennsylvania

3²×103 Posts

Pretty impressive.

Out of curiosity, what was the power draw for the 1080 during the TF run?

Last fiddled with by Rodrigo on 2017-06-08 at 04:24
Old 2017-06-08, 07:42   #2702
Ethan (EO)
 
 
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996

2²×23 Posts

Quote:
Originally Posted by Rodrigo
Pretty impressive.

Out of curiosity, what was the power draw for the 1080 during the TF run?
The card was reporting ~290W. That works out to 5.9GHzd/d/W (or 68kHzd/J), and over 10000(GHzd/d)^2/W. With a price of $760, this yields JVR of 1.24 and JVR2 of 2138!

I should note that this isn't a hard-core overclock - it's just lifting TDP and Boost caps and letting the chip run as it will at lowish temps. The clocking is completely dynamic according to internal temp/voltage/frequency/load tables. Since the card is still being limited by TDP caps under these conditions, lowering memory frequency and lowering voltage might produce even higher results.

Unfortunately I've got to make this card work for a living for a little while before setting it on GIMPS work
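
For anyone wanting to reproduce the efficiency arithmetic above, here is a rough sketch. It assumes a steady ~1728 GHz-d/day (a typical value from the log posted earlier) and the ~290W the card reported; the function and key names are illustrative only. JVR and JVR2 are left out because their definitions are not given in this thread.

```python
SECONDS_PER_DAY = 86_400


def efficiency(ghzd_per_day: float, watts: float) -> dict:
    """Derive per-watt and per-joule figures from throughput and power draw."""
    per_watt = ghzd_per_day / watts
    return {
        # GHz-d/day per watt
        "ghzd_per_day_per_watt": per_watt,
        # one watt over one day is 86400 joules, so divide by that; GHz -> kHz
        "khzd_per_joule": per_watt * 1e9 / SECONDS_PER_DAY / 1e3,
        # (GHz-d/day)^2 per watt
        "ghzd_per_day_squared_per_watt": ghzd_per_day ** 2 / watts,
    }


result = efficiency(1728.0, 290.0)
for name, value in result.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the figures land close to the ones quoted above (a little under 6 GHz-d/day per watt, just under 70 kHz-d per joule, and a bit over 10000 (GHz-d/day)^2/W); small differences come down to which throughput sample and rounding are used.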
Old 2017-08-05, 16:14   #2703
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

12EB₁₆ Posts
Less console output please

Good program. Is there a way in mfaktc 0.20 to reduce the interim output by a factor of about ten to 200? It sometimes generates over 30 or even 200 screen lines a minute, and accumulates about 22MB of screen output per week, redirected into a log file for later inspection. In the upper bit levels that I usually don't run, it's not too bad, but the default bit levels assigned are quite verbose per unit time. These rates are as observed on a GTX480; a modern high-end GPU would exceed them considerably.

I don't see a way in mfaktc.ini (other than paring down the print format drastically to reduce file size for the same line count):

# possible values for PrintMode:
# 0: print a new line for each finished class
# 1: overwrite the current line (more compact output on screen, still large volume but lacks crlf in redirected output)
# 2: not supported, output status after every ~ 10 classes
# 3: not supported, output status after every ~ 20 classes
# 4: not supported, output status after every ~ 50 classes
# 5: not supported, output status after every ~ 100 classes
# 6: not supported, output status after every ~ 200 classes
#
# Default: PrintMode=0

PrintMode=0

Autoselection of frequency of screen output to no more often than a user-settable interval in seconds (perhaps 1, 2, 5, 10, 20, 50, 100, 200, 500) and at resumption and completion of a bit level would be great. As things stand now, the frequency is quite high and strongly dependent on the bit level being run.
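
To make the request concrete, here is a minimal sketch of the kind of time-based throttle described above. It is not mfaktc code: `ThrottledStatus` and its parameters are hypothetical names. The first update always prints (which covers resumption), later updates print only once the interval has elapsed, and `force=True` is meant for completion of a bit level.

```python
import time


class ThrottledStatus:
    """Emit a status line at most once per `min_interval` seconds."""

    def __init__(self, min_interval: float, clock=time.monotonic, emit=print):
        self.min_interval = min_interval
        self.clock = clock          # injectable so the logic can be tested
        self.emit = emit
        self.last_emit = None       # None means nothing has been printed yet

    def update(self, line: str, force: bool = False) -> bool:
        """Print `line` if forced, if it is the first line, or if the interval elapsed."""
        now = self.clock()
        due = self.last_emit is None or now - self.last_emit >= self.min_interval
        if force or due:
            self.emit(line)
            self.last_emit = now
            return True
        return False


# Simulate 100 classes finishing every 0.25 s with a 10 s interval
fake_now = [0.0]
printed = []
status = ThrottledStatus(10.0, clock=lambda: fake_now[0], emit=printed.append)
for cls in range(100):
    status.update(f"class {cls}")
    fake_now[0] += 0.25
status.update("bit level complete", force=True)
print(len(printed), "lines printed instead of 101")
```

Keeping the existing per-class behavior as the default and making the interval opt-in would leave current users unaffected.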

Sample outputs:

Starting trial factoring M151289627 from 2^69 to 2^70 (0.79 GHz-days)
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Jul 19 05:16 | 0 0.1% | 0.213 n.a. | 333.93 82485 n.a.%
Jul 19 05:16 | 9 0.2% | 0.213 n.a. | 333.93 82485 n.a.%
Jul 19 05:16 | 13 0.3% | 0.302 4m49s | 235.52 82485 n.a.%
Jul 19 05:16 | 24 0.4% | 0.213 n.a. | 333.93 82485 n.a.%
Jul 19 05:16 | 28 0.5% | 0.214 n.a. | 332.37 82485 n.a.%
Jul 19 05:16 | 33 0.6% | 0.213 n.a. | 333.93 82485 n.a.%
Jul 19 05:16 | 37 0.7% | 0.214 n.a. | 332.37 82485 n.a.%
Jul 19 05:16 | 40 0.8% | 0.300 4m46s | 237.09 82485 n.a.%
Jul 19 05:16 | 48 0.9% | 0.213 n.a. | 333.93 82485 n.a.%
Jul 19 05:16 | 49 1.0% | 0.213 n.a. | 333.93 82485 n.a.%
Jul 19 05:16 | 52 1.1% | 0.213 n.a. | 333.93 82485 n.a.%

(Screen output every ~0.29 seconds! Would prefer about 1/100 or 1/200 as often)


Starting trial factoring M139505099 from 2^75 to 2^76 (54.85 GHz-days)

found a valid checkpoint file!

Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Aug 05 09:03 | 780 17.0% | 14.800 3h16m | 333.56 82485 n.a.%
Aug 05 09:03 | 784 17.1% | 14.409 3h11m | 342.61 82485 n.a.%
Aug 05 09:03 | 789 17.2% | 14.261 3h08m | 346.16 82485 n.a.%
Aug 05 09:03 | 792 17.3% | 14.862 3h16m | 332.17 82485 n.a.%
Aug 05 09:04 | 796 17.4% | 14.262 3h08m | 346.14 82485 n.a.%
Aug 05 09:04 | 801 17.5% | 14.261 3h08m | 346.16 82485 n.a.%
Aug 05 09:04 | 804 17.6% | 14.261 3h08m | 346.16 82485 n.a.%
Aug 05 09:04 | 817 17.7% | 14.261 3h07m | 346.16 82485 n.a.%
Aug 05 09:05 | 820 17.8% | 14.263 3h07m | 346.12 82485 n.a.%

(Screen output every ~11 seconds; would prefer about 1/10 as often)

Last fiddled with by kriesel on 2017-08-05 at 16:15
Old 2017-08-05, 16:37   #2704
James Heinrich
 
 
"James Heinrich"
May 2004
ex-Northern Ontario

6253₈ Posts

For short-runtime assignments you can use the less-classes version of mfaktc
http://download.mersenne.ca/mfaktc/m...a-versions.zip
Old 2017-08-05, 16:55   #2705
storm5510
Random Account
 
 
Aug 2009
U.S.A.

2⁸·7 Posts

Quote:
Originally Posted by kriesel
...Autoselection of frequency of screen output to no more often than a user-settable interval in seconds (perhaps 1, 2, 5, 10, 20, 50, 100, 200, 500) and at resumption and completion of a bit level would be great. As things stand now, the frequency is quite high and strongly dependent on the bit level being run.

(Screen output every ~0.29 seconds! Would prefer about 1/100 or 1/200 as often)

(Screen output every ~11 seconds; would prefer about 1/10 as often)
The suggestion above of interval seconds could be in percent. No more than 1%. Actually, the best thing to do is nothing. Leave it alone!
Old 2017-08-07, 16:52   #2706
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

29×167 Posts
less classes for CUDA6.5?

Quote:
Originally Posted by storm5510
The suggestion above of interval seconds could be in percent. No more than 1%. Actually, the best thing to do is nothing. Leave it alone!
If it were up to me, the behavior people are used to would not only still be available, it would be the default so as not to annoy or inconvenience anyone happy with & preferring the status quo.

Mfaktc relative performance, on GTX480 on Windows 7 Pro, informal benchmark follows

V0.20 CUDA6.5 64-bit & V355.60 driver package
Starting trial factoring M139505237 from 2^75 to 2^76 (54.85 GHz-days)
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Aug 07 06:29 | 0 0.1% | 14.252 3h47m | 346.38 82485 n.a.%
Aug 07 06:29 | 3 0.2% | 14.277 3h47m | 345.78 82485 n.a.%
Aug 07 06:30 | 4 0.3% | 14.254 3h47m | 346.33 82485 n.a.%
Aug 07 06:30 | 7 0.4% | 14.329 3h48m | 344.52 82485 n.a.%
Aug 07 06:30 | 12 0.5% | 14.246 3h46m | 346.53 82485 n.a.%
Aug 07 06:30 | 15 0.6% | 14.270 3h46m | 345.95 82485 n.a.%
Aug 07 06:31 | 19 0.7% | 14.262 3h46m | 346.14 82485 n.a.%
Aug 07 06:31 | 24 0.8% | 14.271 3h46m | 345.92 82485 n.a.%
Aug 07 06:31 | 28 0.9% | 14.247 3h45m | 346.50 82485 n.a.%
Aug 07 06:31 | 39 1.0% | 14.263 3h45m | 346.12 82485 n.a.%
...
Aug 07 09:48 | 3963 85.8% | 14.282 32m22s | 345.65 82485 n.a.%

V0.20 Less Classes CUDA 4.2 & V355.60 driver package
Starting trial factoring M139505237 from 2^75 to 2^76 (54.85 GHz-days)
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Aug 07 10:02 | 0 1.0% | 145.12 3h49m | 340.18 82485 n.a.%
Aug 07 10:05 | 3 2.1% | 144.54 3h46m | 341.55 82485 n.a.%
Aug 07 10:07 | 4 3.1% | 144.51 3h43m | 341.62 82485 n.a.%
Aug 07 10:09 | 7 4.2% | 144.80 3h42m | 340.92 82485 n.a.%

V0.21 Less Classes CUDA 8.0 & V378.92 driver package (won't run with V355.60 driver package)
Starting trial factoring M139505237 from 2^75 to 2^76 (54.85 GHz-days)
Date Time | class Pct | time ETA | GHz-d/day Sieve Wait
Aug 07 10:28 | 0 1.0% | 145.01 3h49m | 340.43 82485 n.a.%
Aug 07 10:31 | 3 2.1% | 145.42 3h47m | 339.47 82485 n.a.%
Aug 07 10:33 | 4 3.1% | 145.41 3h45m | 339.49 82485 n.a.%
Aug 07 10:36 | 7 4.2% | 145.07 3h42m | 340.29 82485 n.a.%
Aug 07 10:38 | 12 5.2% | 145.60 3h40m | 339.05 82485 n.a.%
Aug 07 10:41 | 15 6.3% | 153.94 3h50m | 320.68 82485 n.a.%
Aug 07 10:43 | 19 7.3% | 153.97 3h48m | 320.62 82485 n.a.%

simple summary (GHz-d/day):
V0.20 CUDA 6.5                 346.38   100%   (fastest)
V0.21 Less Classes CUDA 4.2    340.18   98.2%
V0.21 Less Classes CUDA 8.0    340.43   98.3%
Running the less-classes versions could cost ~0.9 week per year of throughput.
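
The percentages and the weeks-per-year estimate can be checked with a short sketch. It uses only the GHz-d/day figures from the runs above; the dictionary labels and function names are illustrative, and a year is taken as 52 weeks.

```python
WEEKS_PER_YEAR = 52

baseline = 346.38  # GHz-d/day, regular build on CUDA 6.5
variants = {
    "less classes, CUDA 4.2": 340.18,
    "less classes, CUDA 8.0": 340.43,
}


def relative(rate: float, base: float) -> float:
    """Throughput as a fraction of the baseline."""
    return rate / base


def weeks_lost_per_year(rate: float, base: float) -> float:
    """Weeks of baseline-equivalent work lost per year at the lower rate."""
    return (1.0 - rate / base) * WEEKS_PER_YEAR


for label, rate in variants.items():
    pct = relative(rate, baseline) * 100
    lost = weeks_lost_per_year(rate, baseline)
    print(f"{label}: {pct:.1f}% of baseline, ~{lost:.2f} weeks/year lost")
```

Both less-classes builds come out at roughly 0.9 week per year behind the regular build, which matches the estimate above.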

Note: the v0.20 checkpoint file was ignored by both flavors of less-classes. Each less-classes version of mfaktc started the bit depth over at each version change, so 3 hours of throughput were lost when switching. I was able to get it back by reverting to the V0.20 CUDA6.5 64-bit version, which I was doing anyway for that 1.8% speed advantage.

Any chance of compiling a less-classes version for CUDA6.5?

Last fiddled with by kriesel on 2017-08-07 at 17:05

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.