mersenneforum.org  

Old 2019-04-13, 14:15   #2773
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest


Quote:
Originally Posted by GhettoChild View Post
I think I just now saw the tEXPONENT and cEXPONENT files update in sync with screen output instead of the checkpoint interval this time; but it's hard to be certain.

I'm going to try adding the -c command line flag with 100000 for the checkpoint value to see if this error occurs that way. After that I'll try setting screen output larger than checkpoint right in the ini file to begin with and use no command line flags/switches to see if the error also occurs.
I've run some test cases in v2.06 (May 5, 2017 build), with console output redirected to a log file and saveallcheckpoints=1 for easy review after the fact. It looks to me that if reportiterations <= checkpointiterations and checkpointiterations is a multiple of reportiterations, the program acts as expected, saving checkpoints every specified number of iterations. But if reportiterations > checkpointiterations, checkpoints are saved at the least common multiple of the reportiterations and checkpointiterations settings in effect. For now, I recommend picking checkpointiterations according to your tolerance for lost computing time, and using an equal or smaller value than that for reportiterations.

Also, investigate why your system is having errors that make restarting from the last checkpoint an issue. Test the CPU/system-side memory as well as the GPU side, etc. Does prime95/mprime run reliably on it?

SaveAllCheckpoints=1
reportiterations=100000
checkpointiterations=500000
checkpoints at 500000 iteration intervals as intended
(up to 6pm 4/12/19 in my test folder)

reduced checkpointiterations to 200000 in ini file and restarted;
increased reportiterations to 500000 via Y keyboard input twice;
checkpoints then occurred at 1M intervals (the least common multiple, which is where the nominal screen and checkpoint intervals coincide)
(up to 1am 4/13/19 in my test)

reduced reportiterations to 200000 via y keyboard input at 1 am;
a checkpoint occurred 400K later, at 1:02 am (the software correctly did a checkpoint at the next n×200K point after the change);
after that, checkpoints and screen reports occurred at 200000-iteration intervals as intended
(up to 7:16 am)

reduced reportiterations to 100000 via y keyboard input at 7:16 am; this gave
checkpoints at 200K and screen reports at 100K, as intended
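The pattern in these tests can be summarized in a minimal sketch (the function name is mine and this models the observed behavior, not CUDALucas's actual code):

```python
from math import gcd

def lcm(a, b):
    return a * b // gcd(a, b)

def effective_checkpoint_interval(report_iters, checkpoint_iters):
    # Observed behavior: if checkpointiterations is a multiple of
    # reportiterations, checkpoints land on the configured interval;
    # otherwise they land on the least common multiple of the two settings.
    if checkpoint_iters % report_iters == 0:
        return checkpoint_iters
    return lcm(report_iters, checkpoint_iters)

print(effective_checkpoint_interval(100_000, 500_000))  # 500000, as intended
print(effective_checkpoint_interval(500_000, 200_000))  # 1000000, the observed 1M interval
```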

Last fiddled with by kriesel on 2019-04-13 at 14:20
Old 2019-04-13, 16:20   #2774
ATH
Einyen
 
 
Dec 2003
Denmark


Does anyone know what causes this error, which stops the CUDALucas run?

Code:
CUDALucas.cu(1989) : cudaSafeCall() Runtime API error 6: the launch timed out and was terminated.
Resetting device and restarting from last checkpoint.

CUDALucas.cu(1115) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.
Old 2019-04-13, 17:07   #2775
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest


Quote:
Originally Posted by GhettoChild View Post
I've made this complaint in previous versions before and you've managed to code it back in again. It's quite an annoying error because it causes immense loss of processing time and waste of utility bills.
When did you first see and report this behavior (a date, or preferably a post link here, or a version)? (A quick forum search on "GhettoChild" didn't turn up anything relevant prior to yesterday.)
Edit: a better search turned up https://www.mersenneforum.org/showpo...postcount=2207
to which owftheevil responds in https://www.mersenneforum.org/showpo...postcount=2208

Quote:
Thanks for pointing this out. I had assumed that checkpoints would be less frequent than screen reports, so this was intended to save time in case the cufft hang bug manifested. I'll set that to go with whichever is more frequent.
Quote:
Originally Posted by ATH View Post
Anyone know what causes this error which stops the CUDALucas run?

Code:
CUDALucas.cu(1989) : cudaSafeCall() Runtime API error 6: the launch timed out and was terminated.
Resetting device and restarting from last checkpoint.

CUDALucas.cu(1115) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.
A search of some of my GPU run logs (some spanning a year or two) yielded no matching occurrences, and I have nothing about those error numbers in the bug-and-wish list or the edited readme. Sorry. Good luck, and please post any info updates or answers on these.

Last fiddled with by kriesel on 2019-04-13 at 17:32
Old 2019-04-14, 02:49   #2776
GhettoChild
 
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada


@kriesel:
I have not tested 2.06 yet. I get 1-5 errors per day on a single instance of 2.05.1 when I run more than one instance simultaneously (currently one instance on a 4M FFT exponent and one on a 14M FFT exponent). Errors are more likely to trigger when I run the Chrome browser while the two instances are processing. A single instance can go many days without errors unless ambient temperatures get too high. The errors are not difficult to manage or avoid; they frequently go away when the instance retests the iteration in question. It's just how far back the restart goes, and the resulting reprocessing, that can be a huge waste.

I tested again the difference between specifying the checkpoint and screen-output intervals only in the ini file versus via command-line switches/flags. The results are the same: it uses the screen-output value instead of the checkpoint value, no matter which one is smaller, and no matter whether the values are specified only in the ini file or in both the ini file and the command line. Again, this is on version 2.05.1, as I have not used 2.06 yet, but I will try the newer one later tonight.

Quote:
Originally Posted by ATH View Post
Anyone know what causes this error which stops the CUDALucas run?

Code:
CUDALucas.cu(1989) : cudaSafeCall() Runtime API error 6: the launch timed out and was terminated.
Resetting device and restarting from last checkpoint.

CUDALucas.cu(1115) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.
I would guess you have a program that is hogging the GPU. I get a similar, but not identical, error from CUDAPm1 during or at the start of stage 2 if I already have 1 or 2 instances of CUDALucas running simultaneously. I also see the graphics driver reset when I'm running Chrome with enough tabs or activity at the same time as CUDALucas. I'm using a GTX 770 with low-spec PC hardware.

Last fiddled with by GhettoChild on 2019-04-14 at 02:51
Old 2019-04-25, 22:05   #2777
GhettoChild
 
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada


I get identical results in v2.05.1 and v2.06beta. The screen-output iteration value overrides the checkpoint value whether it is specified in the .ini file, on the command line, or in both simultaneously. If I set screen output to 500,000 and checkpoint to 100,000, the checkpoint will only update every 500,000 iterations.

I decided to use the checkpoint files from 2.05.1 to continue processing in 2.06beta. I didn't want to toss weeks of processing without confirmation that they are incompatible.

Right now I have CUDAPm1 v0.20 (CUDA 5.5) running a 14432K FFT and CUDALucas v2.06beta (CUDA 8.0). I can't run the CUDA 10 build because it produces a CRT DLL error prompt; similarly with CUDAPm1 v0.22.
Old 2019-04-26, 01:09   #2778
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest


Quote:
Originally Posted by GhettoChild View Post
I get identical results in v2.05.1 and v2.06beta. The screen-output iteration value overrides the checkpoint value whether it is specified in the .ini file, on the command line, or in both simultaneously. If I set screen output to 500,000 and checkpoint to 100,000, the checkpoint will only update every 500,000 iterations.

I decided to use the checkpoint files from 2.05.1 to continue processing in 2.06beta. I didn't want to toss weeks of processing without confirmation that they are incompatible.

Right now I have CUDAPm1 v0.20 (CUDA 5.5) running a 14432K FFT and CUDALucas v2.06beta (CUDA 8.0). I can't run the CUDA 10 build because it produces a CRT DLL error prompt; similarly with CUDAPm1 v0.22.
Set screen output to 100,000, or 50,000. I don't understand why you seem so attached to extremely rare screen updates (several hours between updates, per a prior post). The overhead of reporting every 50,000 or 10,000 iterations instead of every 500,000 is small; it's just another line of text. Some people think nothing of running mfaktc at low bit levels on high exponents, which completes several factor classes and outputs several lines of screen output per SECOND.

Continuing with CUDALucas v2.06 from a v2.05.1 checkpoint, or even one from a slightly earlier version, is no problem. (An mfaktc version or class-count change, by contrast, restarts from the beginning of a bit level.)

I recommend NOT running large FFT lengths until you get your rig and configuration more stable at modest FFT lengths. (Note that the self-tests only go up to an 8M FFT length.)

Running CUDAPm1 on a 4GB GPU alongside another CUDA GIMPS program is probably asking for trouble, especially on a host-system-RAM-starved machine. CUDAPm1 is described as ALPHA software (less mature than beta, which is less mature than a release). P-1 stage 2 is the most memory-hungry computation type there is in GIMPS and will try to use most or all of the GPU RAM. CUDAPm1 checks available GPU memory early in a run and apparently assumes an equal amount will be available hours or days later: at the beginning of stage 2, it decides how many primes between B1 and B2 to work on simultaneously, based partly on available memory. If another program is coming and going, varying its requirements and therefore the amount of memory available, that could be a problem.
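A toy model of that failure mode (all numbers and names here are illustrative assumptions, not CUDAPm1's actual internals):

```python
def stage2_buffer_count(free_mem_bytes, bytes_per_buffer):
    # At the start of stage 2, choose how many residue buffers (and hence
    # how many primes between B1 and B2) to work on simultaneously,
    # based on the memory free at that moment.
    return free_mem_bytes // bytes_per_buffer

MB = 2**20
buffers = stage2_buffer_count(3500 * MB, 118 * MB)  # plan made with 3.5 GB free
print(buffers)  # 29 simultaneous buffers

# Hours later another CUDA app takes 1.5 GB; the stage-2 plan still
# expects 29 buffers' worth of memory, so allocations start failing.
free_later = (3500 - 1500) * MB
print(buffers * 118 * MB > free_later)  # True: the original plan no longer fits
```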

It sounds like the issues you're having more than negate any plausible slight gain from running multiple applications. Multiple instances on one GPU do seem to work with mfaktc and provide some aggregate throughput increase, but mfaktc does not need much GPU memory or system memory.

What vintage is your computer? Three GB sounds like too little RAM to me, and consistent with perhaps a Core2 Duo system. Running prime95 primality tests on old systems can be very expensive per test (~$12 per 85M primality test without any restarts, counting only the electricity), much more than cloud-computing costs. I've bought used workstation-grade system boxes (solidly built, well documented and labeled, tool-free maintenance, dual Xeon, Windows OS, etc.) with 12GB to 128GB of RAM for US$200-800, some including an older GPU or two, capable of doing the same test for a lot less. I don't think I have any hardware with less host-system RAM than the RAM in an individual GPU installed in it. Asking the OS and driver to reliably handle GPU-to-host transfers spilling directly to the paging file seems risky to me.
Old 2019-04-26, 13:34   #2779
tServo
 
 
"Marv"
May 2009
near the Tannhäuser Gate


Quote:
Originally Posted by ATH View Post
Anyone know what causes this error which stops the CUDALucas run?

Code:
CUDALucas.cu(1989) : cudaSafeCall() Runtime API error 6: the launch timed out and was terminated.
Resetting device and restarting from last checkpoint.

CUDALucas.cu(1115) : cudaSafeCall() Runtime API error 46: all CUDA-capable devices are busy or unavailable.
ATH,
What video board and what exponent are you using?
Does this happen soon after starting?
Windows or Linux?
If windoze, could a virus check or update be going on?

I have seen this when too much is going on in the system.
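For reference, the two numeric codes in ATH's log map to runtime-API error names roughly as follows (the enum names are my recollection of the CUDA runtime API and may differ by CUDA version; the messages are from the log itself):

```python
# Numeric CUDA runtime-API error codes from ATH's log, with the message
# CUDALucas printed and the enum name I believe each corresponds to
# (the names are an assumption, not verified against ATH's CUDA version).
cuda_errors = {
    6:  ("cudaErrorLaunchTimeout",       # typically the OS display watchdog killing a long kernel
         "the launch timed out and was terminated"),
    46: ("cudaErrorDevicesUnavailable",  # device busy or held by another process
         "all CUDA-capable devices are busy or unavailable"),
}

for code, (name, message) in sorted(cuda_errors.items()):
    print(f"Runtime API error {code}: {name}: {message}")
```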
Old 2019-04-26, 15:10   #2780
ATH
Einyen
 
 
Dec 2003
Denmark


Quote:
Originally Posted by tServo View Post
ATH,
What video board and what exponent are you using?
Does this happen soon after starting?
Windows/linux?
If windoze, could a virus check or update be going on?

I have seen this when too much is going on in the system.
It is an old Titan Black, and it mostly happened when I used the CUDA 5.0 version of CUDALucas, so I stopped using that version. But I'm about to scrap this card now; it keeps giving bad results randomly, even though it shows no errors when I test it with GPUmemtest, the CUDALucas extended self-tests, and the mfaktc extended self-tests.
Old 2019-04-27, 06:28   #2781
GhettoChild
 
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada


Core 2 Quad Q6600. It was a test install that became my main PC after my install medium died during the install and my main PC (same generation) had its main drive die. The PC is only pulling 375 watts according to the UPS; 4 years ago I was running a GTX 295 the same way and it pulled 350 watts. Electricity pricing in my home is arbitrary, but I don't have air conditioning, so the power consumption is no worse than that of those with A/C. I turn off all this high-drain number processing during heat waves.

As for why I want really large/slow screen-output intervals and much smaller/faster checkpoint updates: I run these CUDA programs for days and weeks at a time, observing performance, errors, and ETA. I don't want to scroll through pages and pages of output lines per instance/program to observe its performance trend, at roughly 100,000 iterations per hour for 3-5 days and more of output. That's a waste.
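For concreteness, the log volume works out roughly as follows (assuming the 100,000/hr figure above is the iteration rate; actual rates vary by GPU and FFT length):

```python
def report_lines(iters_per_hour, report_interval, days):
    # Number of screen-report lines produced over a run of the given length.
    return iters_per_hour * 24 * days / report_interval

# At ~100,000 iterations/hour with reportiterations=100000:
print(report_lines(100_000, 100_000, 5))    # 120.0 lines over a 5-day run
print(report_lines(100_000, 100_000, 21))   # 504.0 lines over a 3-week run
print(report_lines(100_000, 500_000, 21))   # 100.8 lines with the larger interval
```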
Old 2019-04-27, 15:04   #2782
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest


Quote:
Originally Posted by GhettoChild View Post
I don't want to scroll through pages and pages of screens full of output lines per instance/program to observe its performance trend. 100,000/hr for 3-5days and more of output. That's a waste.
I know the feeling. I log all GIMPS GPU runs to disk. Mfaktc and mfakto are very verbose, and there's no control of that whatsoever for a given exponent, bit level, and number of classes. At least in CUDALucas we have some control of the interim output rate. https://www.youtube.com/watch?v=jv9sDn_2XkI
Old 2019-05-21, 00:15   #2783
flashjh
 
 
"Jerry"
Nov 2011
Vancouver, WA

CUDALucas 2.06

Hello all!

I compiled new Windows builds of CUDALucas 2.06 for all of CUDA 4.0 through 10.1. Support for win32 ended at CUDA 6.5.

No big changes: removed the NVML requirement and made small updates for new CUDA versions.

2.06 is here

Lib files are here, if you need them

I was able to do basic testing, but please test thoroughly before using these for production.

Everyone, let me know what you find that needs to be fixed and what you would like changed.

~Cheers



Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.