mersenneforum.org  

Old 2019-01-13, 11:51   #1
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

29·167 Posts
Default CUDA - Class problems. Factor divisible by 2, 3, 5, 7, or 11

I did an advanced search on the forum, and found that this error usually hints at an improper CUDA toolkit/CC/driver/executable configuration: recompiling with the appropriate CC in the makefile, or reinstalling the toolkit/driver, usually solved the issue for both the mfaktc and mmff programs.

Unfortunately, a friend of mine ran into this same error with mmff.exe on Windows 10, but at first glance his configuration is correct.

He has a Pascal card (either GTX 1050 or GTX 1060) and just installed the toolkit and the driver from Nvidia. Here is the screenshot of the issue:

Code:
mmff v0.28 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  MORE_CLASSES              enabled

Runtime options
  GPU Sieving               enabled
WARNING: Cannot read GPUSievePrimes from mmff.ini, using default value (82486)
  GPUSievePrimes            depends on worktodo entry
  GPUSieveSize              16M bits
WARNING: Cannot read GPUSieveProcessSize from mmff.ini, using default value (8)
  GPUSieveProcessSize       8K bits
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  CheckpointDelay           300s
  StopAfterFactor           disabled
  PrintMode                 compact
  V5UserID                  (none)
  ComputerID                (none)
WARNING, no ProgressFormat specified in mmff.ini, using default
  TimeStampInResults        no

CUDA version info
  binary compiled for CUDA  10.0
  CUDA runtime version      10.0
  CUDA driver version       10.0

CUDA device info
  name                      GeForce GTX 1050 Ti with Max-Q Design
  compute capability        6.1
  maximum threads per block 1024
  number of mutliprocessors 6 (unknown number of shader cores)
  clock rate                1417MHz

got assignment: k*2^167+1, k range 1835000000 to 1836000000 (198-bit factors)
Starting trial factoring of k*2^167+1 in k range: 1835M to 1836M (198-bit factors)
 k_min = 1835000000
 k_max = 1836000000
Using GPU kernel "mfaktc_barrett204_F160_191gs"
ERROR: Class problems.  Factor divisible by 2, 3, 5, 7, or 11
Now, apart from the missing configuration file mmff.ini, I can't see any errors.
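For reference, the two mmff.ini warnings in the log are easy to silence. A minimal sketch of the file, using only the keys and default values the log itself reports (the exact file syntax is an assumption on my part):

```ini
; hypothetical minimal mmff.ini -- keys and defaults taken from the
; warnings in the log above; exact syntax is an assumption
GPUSievePrimes=82486
GPUSieveProcessSize=8
```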

The makefile "Makefile.win" is set to produce code for CC 3.0 and above (including 6.1, which covers Pascal cards) through the command

Code:
--generate-code arch=compute_61,code=sm_61
the card and the CC are recognized, the kernel is correct...

In red are the parts I have never seen before (but then my newest GPU card is a GTX 980...)

What might possibly have gone wrong?

Luigi

Last fiddled with by ET_ on 2019-01-13 at 11:53
ET_ is offline   Reply With Quote
Old 2019-01-13, 14:12   #2
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1101011000000₂ Posts
Default

I don't run mmff. But when I run mfaktx or CUDAPm1 and get a bad factor, it has tended to indicate a gpu hardware problem. In such cases, running a thorough memory test (multiple patterns, multiple repeats, full memory range) has revealed bad gpu memory.
kriesel is offline   Reply With Quote
Old 2019-01-13, 14:44   #3
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

29·167 Posts
Default

Quote:
Originally Posted by kriesel View Post
I don't run mmff. But when I run mfaktx or CUDAPm1 and get a bad factor, it has tended to indicate a gpu hardware problem. In such cases, running a thorough memory test (multiple patterns, multiple repeats, full memory range) has revealed bad gpu memory.
Thanks Ken, but it does not look like a GPU problem, as the GPU worked fine before the transition to CUDA 10...
ET_ is offline   Reply With Quote
Old 2019-01-13, 15:00   #4
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁶·107 Posts
Default

It may still be worth running, since it could rule out memory issues; alternatively, the hardware's health may have declined recently. I had a gpu whose memory error rate increased drastically within a year. (Probably faster than that; I only tested the memory a year apart.)
kriesel is offline   Reply With Quote
Old 2019-01-13, 19:19   #5
Dylan14
 
Dylan14's Avatar
 
"Dylan"
Mar 2017

2×3³×11 Posts
Default

Just finished a memory test (memtestg80.exe) with the GTX 1050 Ti mentioned in the first post of this thread, using 3 GB of memory and 1000 test iterations. There were errors, but only in the random-blocks part of the test, and not every iteration had them. I'm not sure whether errors in that part imply a failure in mmff. If need be, I can redo the test with fewer iterations and upload the output.
Dylan14 is offline   Reply With Quote
Old 2019-01-14, 02:09   #6
tServo
 
tServo's Avatar
 
"Marv"
May 2009
near the Tannhäuser Gate

17×47 Posts
Default

Luigi,

The stuff in red from your post doesn't matter.

I suspect the number he is trying to factor is out-of-range.
mmff comes with a "test" worktodo.txt file that has a number of known Fermat factors.
Has this test file been run?
tServo is offline   Reply With Quote
Old 2019-01-14, 09:15   #7
ET_
Banned
 
ET_'s Avatar
 
"Luigi"
Aug 2002
Team Italia

12EB₁₆ Posts
Default

Quote:
Originally Posted by tServo View Post
Luigi,

The stuff in red from your post doesn't matter.

I suspect the number he is trying to factor is out-of-range.
mmff comes with a "test" worktodo.txt file that has a number of known Fermat factors.
Has this test file been run?
It is part of the worktodo.txt file distributed with the executable.
I tested it on a GTX 680, and it worked nicely and didn't throw the error, as those k values are very small relative to the actual k sizes.

Last fiddled with by ET_ on 2019-01-14 at 09:15
ET_ is offline   Reply With Quote
Old 2019-01-15, 15:11   #8
Dylan14
 
Dylan14's Avatar
 
"Dylan"
Mar 2017

252₁₆ Posts
Default

To further debug the issue, I've made the following changes to the file tf_validate.h, which verifies that a reported factor is not itself divisible by small primes:


in line 271, comment out the exit(1);
in line 274, comment out the exit(1);


These changes let the code carry on after encountering the class-problems error. I recompiled without issue and then ran the sample worktodo-test256.txt file included in the source (renamed to worktodo.txt, of course). When I ran mmff this time, it produced a bunch of errors and then appeared to hang. I've included the output from this run; see the attachment.
Attached Files
File Type: txt output_immediateexitdisabled.txt (13.9 KB, 230 views)
Dylan14 is offline   Reply With Quote
Old 2019-01-15, 17:03   #9
kriesel
 
kriesel's Avatar
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2⁶·107 Posts
Default

Quote:
Originally Posted by Dylan14 View Post
Just finished a memory test (memtestg80.exe) with the GTX 1050 Ti mentioned in the first post of this thread, using 3 GB of memory and 1000 test iterations. There were errors, but only in the random-blocks part of the test, and not every iteration had them. I'm not sure whether errors in that part imply a failure in mmff. If need be, I can redo the test with fewer iterations and upload the output.
I suspect your card has some hardware issue. It may not be reliable enough for mmff.
If I understand you correctly, you are seeing issues with it in both mmff and memtestg80.

Try the CUDALucas 2.06 May 2017 beta with the -memtest option, using as much memory coverage as it will let you run (nearly all of the 4 GB on the gpu card, minus ~100 MB for the program itself), and follow up with a full double-check run. Standard operating procedure is to duplicate a run of a known Mersenne prime, such as M6972593. If it can't pass that test repeatedly, it's probably not reliable enough for mmff or other number-theory software either.
In your memtestg80 testing, what did it show in terms of error counts and locations? I've had good results, for a limited time, running trial factoring on a card that had become unusable first for P-1 and then for LL testing. As the memory cells failed over time, the damage came to cover more of the address space. P-1 is the most memory-hungry, primality testing is intermediate, and trial factoring has a small footprint. Eventually the card became unusably unreliable even for TF and was retired.
How old is your card? (Warranty expired?)
Do you see any visual artifacts if you use it to drive a display?

EDIT:
Several choices for gpu testing are listed at https://www.raymond.cc/blog/having-p...st-its-memory/

Last fiddled with by kriesel on 2019-01-15 at 17:16
kriesel is offline   Reply With Quote
Old 2019-01-15, 21:08   #10
Dylan14
 
Dylan14's Avatar
 
"Dylan"
Mar 2017

2×3³×11 Posts
Default

Quote:
Originally Posted by kriesel View Post
I suspect your card has some hardware issue. It may not be reliable enough for mmff.
If I understand you correctly, you are seeing issues with it in both mmff and memtestg80.

Try the CUDALucas 2.06 May 2017 beta with the -memtest option, using as much memory coverage as it will let you run (nearly all of the 4 GB on the gpu card, minus ~100 MB for the program itself), and follow up with a full double-check run. Standard operating procedure is to duplicate a run of a known Mersenne prime, such as M6972593. If it can't pass that test repeatedly, it's probably not reliable enough for mmff or other number-theory software either.
In your memtestg80 testing, what did it show in terms of error counts and locations? I've had good results, for a limited time, running trial factoring on a card that had become unusable first for P-1 and then for LL testing. As the memory cells failed over time, the damage came to cover more of the address space. P-1 is the most memory-hungry, primality testing is intermediate, and trial factoring has a small footprint. Eventually the card became unusably unreliable even for TF and was retired.
How old is your card? (Warranty expired?)
Do you see any visual artifacts if you use it to drive a display?

EDIT:
Several choices for gpu testing are listed at https://www.raymond.cc/blog/having-p...st-its-memory/

CUDALucas -memtest: ran with 125 chunks of memory, 1 iteration. No errors found.
CUDALucas double check: used M36 and M37 (exponents 2976221 and 3021377, respectively). Both came up prime, as expected.
memtestg80: as stated before, the errors only occur in the random-blocks phase. It does not tell me where the errors occur, just how many; see the file memtestg80-output.txt, included in the zip attached to this post.
The GTX 1050 Ti that I am testing is in a laptop, which I received in late June 2018. The warranty is active until 6/10/19.
No artifacts are present.
Attached Files
File Type: zip gputesting.zip (10.1 KB, 232 views)
Dylan14 is offline   Reply With Quote
Old 2019-01-15, 23:35   #11
Prime95
P90 years forever!
 
Prime95's Avatar
 
Aug 2002
Yeehaw, FL

29×277 Posts
Default

Try a PM to TheJudger. He maintains mfaktc and is very familiar with all the CUDA changes over the years. mmff is a derivative of mfaktc.
Prime95 is offline   Reply With Quote