mersenneforum.org  

Old 2016-11-28, 08:26   #1
storm5510
Random Account

Aug 2009
Not U. + S.A.

2·1,163 Posts
Prime95 vs. CUDALucas

This is in regard to the following assignment:

Quote:
Test=A5992806202E1212029B4F9445CE945D,79437629,75,1
I expected CUDALucas (Culu) to run much faster than Prime95. The attached image shows a comparison between the two running the same test: Culu runs only 13% faster than Prime95. Does anyone have any idea why this is happening?

Edit: I am using an Nvidia GTX 750 Ti and CUDA 8.
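
For readers unfamiliar with the assignment line quoted above, here is a minimal Python sketch that splits it into fields. The field names are assumptions based on the usual Prime95 worktodo layout (assignment ID, exponent, bits already trial-factored, P-1 done flag); `parse_test_line` is a hypothetical helper, not part of Prime95 or CUDALucas.

```python
def parse_test_line(line):
    """Split a 'Test=' worktodo entry into named fields (names are assumed)."""
    kind, _, rest = line.strip().partition("=")
    aid, exponent, tf_bits, pm1_done = rest.split(",")
    return {
        "kind": kind,                        # "Test" means a first-time LL test
        "assignment_id": aid,                # opaque ID issued by the server
        "exponent": int(exponent),           # p in 2^p - 1
        "trial_factored_bits": int(tf_bits), # assumed: factored up to 2^bits
        "pminus1_done": pm1_done == "1",     # assumed: P-1 already run
    }

entry = parse_test_line("Test=A5992806202E1212029B4F9445CE945D,79437629,75,1")
print(entry["exponent"], entry["trial_factored_bits"], entry["pminus1_done"])
```

Under these assumptions the line asks for an LL test of 2^79437629 - 1, already trial-factored to 75 bits with P-1 done.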
[Attached image: p95_culu.jpg (94.0 KB), comparison of Prime95 and CUDALucas running the same test]

Last fiddled with by storm5510 on 2016-11-28 at 08:34 Reason: Additional Information
Old 2016-11-28, 08:44   #2
fivemack
(loop (#_fork))

Feb 2006
Cambridge, England

1100100110110₂ Posts

And what processor are you comparing it against?

Modern graphics cards do not have exceptionally good double-precision performance: a GTX 1080 peaks at 256 double-precision gigaflops, which is the same peak as a quad-core 4 GHz Haswell. The GTX 750 Ti peaks at about 40 gigaflops, so it is slower than a single core of a 4 GHz Haswell.
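
The CPU figures in this comparison can be reproduced with a little arithmetic. The sketch below assumes 16 double-precision FLOPs per cycle per Haswell core (two FMA units, each working on four doubles, counting a fused multiply-add as two operations); that constant and the helper name `peak_dp_gflops` are assumptions for illustration, not something stated in the post.

```python
def peak_dp_gflops(cores, clock_ghz, dp_flops_per_cycle):
    """Theoretical peak = cores x clock (GHz) x FLOPs per cycle per core."""
    return cores * clock_ghz * dp_flops_per_cycle

# Assumption: 2 FMA units x 4 doubles per 256-bit vector x 2 ops per FMA.
HASWELL_DP_FLOPS_PER_CYCLE = 16

quad_core = peak_dp_gflops(4, 4.0, HASWELL_DP_FLOPS_PER_CYCLE)
one_core = peak_dp_gflops(1, 4.0, HASWELL_DP_FLOPS_PER_CYCLE)
gtx_750_ti = 40.0  # approximate peak quoted in the post

print(quad_core)              # 256.0
print(one_core)               # 64.0
print(one_core > gtx_750_ti)  # True
```

These are theoretical peaks only; real LL throughput also depends on memory bandwidth and on how well the code uses the hardware.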

Last fiddled with by fivemack on 2016-11-28 at 08:45
Old 2016-11-28, 10:51   #3
ATH
Einyen

Dec 2003
Denmark

2³×3²×47 Posts

The last Nvidia cards with "good" double-precision performance were the GTX 580/590, and then the original Titan from 2013 and the Titan Black / Titan Z from 2014 in the 700 series.

By "good" I mean 1/3 of the card's single-precision performance. All consumer cards since then have DP performance of 1/24 or 1/32 of their SP performance.

http://www.mersenne.ca/cudalucas.php


Maybe you should use the SP performance for factoring with mfaktc instead.

Your 750Ti has 1306 GFLOPs SP and 40.8 GFLOPs DP:
https://en.wikipedia.org/wiki/GeForce_700_series
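
As a quick check, the two figures quoted for the 750 Ti give the SP-to-DP ratio directly. This is plain arithmetic on the numbers in the post; the variable names are only illustrative.

```python
sp_gflops = 1306.0  # single precision, as quoted above
dp_gflops = 40.8    # double precision, as quoted above

ratio = sp_gflops / dp_gflops
print(f"SP is about {ratio:.1f} times faster than DP")  # about 32.0
```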

Last fiddled with by ATH on 2016-11-28 at 10:56
Old 2016-11-28, 16:54   #4
storm5510
Random Account

Aug 2009
Not U. + S.A.

100100010110₂ Posts

Quote:
Originally Posted by fivemack
And what processor are you comparing it against?
i5-3570 @ 3.4 GHz.

Quote:
Originally Posted by ATH
Maybe you should use the SP performance for factoring with mfaktc instead.

Your 750Ti has 1306 GFLOPs SP and 40.8 GFLOPs DP:
So, you are saying there are two different ways to run Culu and mfaktc? I am not familiar with SP and DP. How do I do this?
Old 2016-11-28, 17:16   #5
CRGreathouse

Aug 2006

5,987 Posts

Quote:
Originally Posted by storm5510
So, you are saying there are two different ways to run Culu and mfaktc? I am not familiar with SP and DP. How do I do this?
SP is single-precision (32 bits), DP is double-precision (64 bits). Your GPU can do single-precision operations 32 times faster than it can do double-precision operations, so you'd be better off doing work that requires only single-precision. (If SP were, say, only 4 times faster than DP, you'd be better off doing DP work.)
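
To make the SP/DP difference concrete, here is a small Python sketch using only the standard library. A double carries a 53-bit significand and a single carries 24 bits, so the integer 2^24 + 1 survives in a double but not in a single. `to_single` is a hypothetical helper that round-trips a value through a 32-bit float.

```python
import struct
import sys

def to_single(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

n = 2**24 + 1            # 16777217 needs 25 significant bits
as_double = float(n)
as_single = to_single(as_double)

print(sys.float_info.mant_dig)  # 53 significand bits in a double
print(int(as_double) == n)      # True: the double holds it exactly
print(int(as_single) == n)      # False: the single rounds to 16777216
```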

I think that ATH was suggesting that you use mfaktc instead of Culu, rather than switching modes of one or the other.
Old 2016-11-28, 17:28   #6
ATH
Einyen

Dec 2003
Denmark

6470₈ Posts

Yes, CUDALucas requires double precision, and it is therefore slow because it runs at only 1/32 of your card's single-precision performance.

It would probably be more beneficial for GIMPS and for the amount of GHz-days accumulating on your account (if you care about that) if you do factoring on the card with mfaktc (single precision) instead of LL tests with CUDALucas (double precision).
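
For context on what an LL test does, here is a minimal Python sketch of the Lucas-Lehmer test using exact big-integer arithmetic. It is only an illustration of the recurrence; programs such as Prime95 and CUDALucas perform the squaring step with floating-point FFTs rather than the plain integer multiply used here.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for an odd prime p."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):     # p - 2 iterations, each one a squaring mod m
        s = (s * s - 2) % m
    return s == 0

for p in (3, 5, 7, 11, 13, 17, 19, 23):
    print(p, lucas_lehmer(p))
```

For the exponent in this thread, 79437629, that loop would run about 79 million times, each time squaring a number of about 79 million bits, which is why the speed of the squaring dominates everything else.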

Last fiddled with by ATH on 2016-11-28 at 17:30
Old 2016-11-28, 17:36   #7
storm5510
Random Account

Aug 2009
Not U. + S.A.

2·1,163 Posts

Quote:
Originally Posted by ATH
It would probably be more beneficial for GIMPS and for the amount of GHz-days accumulating on your account (if you care about that) if you do factoring on the card with mfaktc (single precision)...
This is primarily what I have been doing. I wanted to see how CUDALucas would perform on this hardware. Obviously, not as well as on other hardware. Case closed.
Old 2022-04-17, 16:21   #8
Magellan3s

Mar 2022

3·23 Posts

Quote:
Originally Posted by ATH
Yes, CUDALucas requires double precision, and it is therefore slow because it runs at only 1/32 of your card's single-precision performance.

It would probably be more beneficial for GIMPS and for the amount of GHz-days accumulating on your account (if you care about that) if you do factoring on the card with mfaktc (single precision) instead of LL tests with CUDALucas (double precision).
If we could get CUDALucas to work with single precision the performance would be 32 times higher!
Old 2022-04-17, 17:49   #9
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

2⁵·3²·19 Posts

Quote:
Originally Posted by Magellan3s
If we could get CUDALucas to work with single precision the performance would be 32 times higher!
No, because a single-precision FFT would require many, many more operations per iteration to keep error levels low enough for the computation to be correct.

It's not impossible, just less efficient.
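
The trade-off described here can be seen in a toy example. The Python sketch below squares a large integer with a floating-point FFT: the number is cut into words of `bits_per_word` bits, the words are convolved through the FFT, and each result is rounded back to an integer. The answer is only right if every value lands within 0.5 of the true integer. This is a simplified zero-padded FFT in double precision, not the weighted transform real LL programs use, and `fft` and `square_via_fft` are hypothetical helper names.

```python
import cmath
import random

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def square_via_fft(x, bits_per_word, n_words):
    """Square x via FFT; return (result, largest distance from an integer).

    n_words must be a power of two and x must fit in n_words words.
    """
    mask = (1 << bits_per_word) - 1
    words = [(x >> (bits_per_word * i)) & mask for i in range(n_words)]
    padded = words + [0] * n_words          # zero-pad to avoid wrap-around
    spectrum = fft(padded)
    conv = fft([v * v for v in spectrum], invert=True)
    total, max_err = 0, 0.0
    for i, c in enumerate(conv):
        value = c.real / len(padded)        # undo the FFT scaling
        nearest = round(value)
        max_err = max(max_err, abs(value - nearest))
        total += nearest << (bits_per_word * i)  # carries resolve in the sum
    return total, max_err

random.seed(1)
n_words = 256
for bits in (8, 16):
    x = random.getrandbits(bits * n_words)
    result, err = square_via_fft(x, bits, n_words)
    print(bits, result == x * x, err)
```

The rounding error grows as the words get larger. A double has 53 significand bits to absorb that error and a single has only 24, so a single-precision version must use much smaller words, which means more words and a longer transform for the same number.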
Old 2022-04-17, 21:08   #10
Magellan3s

Mar 2022

3×23 Posts

Quote:
Originally Posted by VBCurtis
No, because a single-precision FFT would require many, many more operations per iteration to keep error levels low enough for the computation to be correct.

It's not impossible, just less efficient.
Ah, how many more times faster would it be though?
Old 2022-04-18, 05:11   #11
VBCurtis

"Curtis"
Feb 2005
Riverside, CA

2⁵×3²×19 Posts

Per iteration, slower. That's what I mean by "less efficient". Otherwise it would have been implemented by now.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
