mersenneforum.org > Extra Stuff > Blogorrhea > kriesel
2018-06-01, 22:42   #12   kriesel
TF & P-1 optimization and tradeoff with each other and primality testing

Some basics are covered at https://www.mersenne.org/various/math.php. The numbers there are presumably for some vintage of Prime95's cpu-oriented code.

Trial factoring is quick at low bit depths and exponentially slower as the bit depth increases: each bit level takes roughly as long as all the preceding levels combined. The GhzD/day rating also declines as the bit level rises, so deep levels are somewhat worse still, an effect only partially offset by the slowly declining density of prime candidate factors. P-1 factoring is effective at finding factors of sizes that would take unbearably long, or be completely infeasible, to reach by trial factoring. Smaller candidate factors are more likely to divide a given Mersenne number, so the lowest bit levels of TF are the most productive per unit of computing time, and the initial factoring effort is trial factoring at the lowest available levels. It therefore makes sense to begin with TF, before P-1, with the lowest TF levels first. Finding a factor early is the most productive outcome, because it is fast and allows skipping the slower remaining TF levels, the time-consuming P-1, and the very time-consuming primality tests.
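To make that scaling concrete, here's a toy Python model (my own sketch, not GIMPS code), using the rule of thumb from the math page above that the chance of a factor between 2^b and 2^(b+1) is about 1/b, and assuming bit level b costs work proportional to 2^b:
Code:
# Toy model of TF economics: my sketch, not GIMPS code.
# Assumptions: P(factor between 2^b and 2^(b+1)) ~ 1/b (per the mersenne.org
# math page), and the work for bit level b is proportional to 2^b.
def payoff_per_work(b):
    return (1.0 / b) / 2.0 ** b

base = payoff_per_work(70)
for b in range(70, 76):
    print(f"bit level {b}: relative payoff {payoff_per_work(b) / base:.3f}")
The payoff per unit of work roughly halves with each added bit level, which is why TF starts at the lowest unfinished level and is only worth running to the recommended depth.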

Recommended TF depths for gpus are available in charts per gpu model. For example,
http://www.mersenne.ca/cudalucas.php?model=684 for GTX1080 or
http://www.mersenne.ca/cudalucas.php?model=716 for Vega 56.

Full status per exponent is available at James Heinrich's site, for example http://www.mersenne.ca/exponent/290001377. Poking around there at a few exponents of specific interest can be useful; it readily illustrates the difference between the preferred cpu (PrimeNet) and gpu (GPU72) thresholds for TF depth and P-1 bounds. There's also http://www.mersenne.ca/graphs/factor...s_20180601.png

P-1 is more complicated, since there are many parameters. At the outset of an exponent, CUDAPm1 evaluates many combinations of the adjustable parameters and selects the set it predicts will produce the greatest probable time saving, allowing for its own run time and probability of finding a factor versus the number and duration of primality tests saved, within the limits of the gpu memory size. Prime95 does similarly, within the limits of the system memory utilization cap entered by the user. Gpuowl depends on the user to specify bounds. I recommend using the https://www.mersenne.ca/prob.php?gue...sts=2&K=1&C=-1 P-1 bounds for a given exponent, with the GPU72 TF bit level.

TF can be, and routinely is, scattered among multiple systems and users for a given exponent over time, by issuing separate assignments for each step in bit level (going from 73 to 74 bits, or 74 to 75, etc.). This is practical because all that needs to be communicated is a concise worktodo line, with AID, exponent, and starting and ending bit levels. Verifying a claimed factor is an extremely fast computation, so fast that the PrimeNet server double-checks every submitted factor.
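For illustration, a hypothetical TF worktodo line (the exponent and bit levels here are invented; [aid] stands for the 32-character assignment ID):
Code:
Factor=[aid],290001377,80,81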

P-1 is usually done as a single once-through run for the exponent, in up to two stages. Stage 1 assembles a power of 3 whose exponent is the product of powers of many small primes up to B1, modulo the number being tested; a gcd computation is then performed, which might find a factor and end the run. If no factor is found, stage 2 runs, using whatever B2 bound was user-specified or program-selected, checking many cases for factors generated by also including a single prime between B1 and B2. At the end of a lengthy stage 2, another gcd computation determines whether a factor was found. The save files for stage 1 or 2 are of order megabytes to tens of megabytes each for current GIMPS wavefront exponents, so it is not practical to upload them to the PrimeNet server or hand them out with assignments. While CUDAPm1 does not support extending a run from the bounds used for a save file to higher bounds, some other P-1 software does. With such software, the user can make a P-1 run to modest bounds, which might find a factor, and if none is found, extend the bounds and try again, at less cost than two completely separate runs, and possibly with some net savings relative to one initial run at the higher bounds.
Prime95/mprime support bounds-extension runs from existing save files. From prime95's undoc.txt:
"By default P-1 work does not delete the save files when the work unit completes.
This lets you run P-1 to a higher bound at a later date. You can force
the program to delete save files by adding this line to prime.txt:
KeepPminus1SaveFiles=0"
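To make the two-stage description above concrete, here is a minimal Python sketch of stage 1 only; this is my illustration, not the production algorithm, which uses FFT-based multiplication and many refinements. The demonstration target is F. N. Cole's famous factor of M67, 193707721, whose f − 1 = 2^3 · 3^3 · 5 · 67 · 2677 is smooth enough that B1 = 2700 suffices:
Code:
# Minimal P-1 stage 1 sketch (no stage 2): my illustration, not production code.
# If f divides 2^p - 1 and f - 1 is B1-smooth, then f divides
# gcd(3^E - 1, 2^p - 1), where E is the product of all prime powers q^k <= B1.
from math import gcd
from sympy import primerange

def pminus1_stage1(p, B1):
    M = (1 << p) - 1                  # the Mersenne number 2^p - 1
    x = 3                             # conventional stage-1 base
    for q in primerange(2, B1 + 1):
        qk = q
        while qk * q <= B1:           # largest power of q not exceeding B1
            qk *= q
        x = pow(x, qk, M)             # accumulate x = 3^E mod M
    g = gcd(x - 1, M)
    return g if 1 < g < M else None

# Cole's factor of M67: 193707721 - 1 = 2^3 * 3^3 * 5 * 67 * 2677, so B1 = 2700 works.
print(pminus1_stage1(67, 2700))       # -> 193707721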

Gpus are far faster at trial factoring, which uses single-precision computation, than at Lucas-Lehmer testing, probable-prime (PRP) testing, or P-1 factoring, which use double-precision computation. There are separate ratings for TF and LL at James's site, and the ratio between them varies among gpu models. Gpus I've checked have TF/LL rating ratios from 11.3 to 15.5, except an Intel HD 620 igp at 22.5, implying they're an order of magnitude or more, more effective at trial factoring. All GTX 10xx tried were 15.5; older CUDA gpus were 11 to 13; I've seen reports that RTX models are around 40. For comparison, TF/LL ratios for cpus I have range from 0.72 to 1.25, with newer cpus tending toward the lower end.

Because of the combination of the above considerations, all TF is usually done before P-1, followed by primality testing and, much later, double-checking. However, since some first primality tests were done before gpu TF was common, their TF was only taken to the significantly lower PrimeNet cpu-tradeoff-based levels, so it can pay to run additional TF on gpus (typically 3 more bit levels) before performing a DC on such exponents. When all phases were done on cpus, it made sense to do P-1 before the last TF bit level or two.

Another factoring method is ECM. It was an effective use of computing time only somewhere below p < 20M, so it is not useful at the current wavefront of GIMPS progress.


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

2018-06-03, 19:11   #13   kriesel
Assorted handy links

The black hole of number theory reading lists http://www.numbertheory.org/ntw/N4.html

Binomial distribution, useful for estimating probabilities of factors https://www.easycalculation.com/stat...stribution.php
Online binomial calculator http://stattrek.com/online-calculator/binomial.aspx

Online factoring calculator for up to 10^12 https://www.calculatorsoup.com/calcu...me-factors.php
Online factoring calculator for up to 14 digits https://www.calculator.net/factoring-calculator.html
Online big-number calculator (limits unknown, will evaluate 2^44497) https://www.calculator.net/big-number-calculator.html
Online factoring calculator for large numbers https://alpertron.com.ar/ECM.HTM

Detailed exponent lookup at mersenne.ca (note, exponents below 10^9 may lag mersenne.org until the next sync) http://www.mersenne.ca/exponent/
Exponent status lookup at mersenne.org https://www.mersenne.org/report_exponent/
Manual assignments at mersenne.org https://www.mersenne.org/manual_assignment/
Manual results reporting at mersenne.org https://www.mersenne.org/manual_result/
gpu computing threads http://www.mersenneforum.org/forumdisplay.php?f=92
appeal to number theory folks for sound checks leading to Jacobi check for LL, Gerbicz check for PRP http://mersenneforum.org/showthread.php?t=22509
getting reliable results from unreliable hardware http://www.mersenneforum.org/showthread.php?t=22471
gpu computing cheat sheet & LL testing FAQ (old) http://www.mersenneforum.org/showthread.php?t=16780

Search the mersenne forum! http://mersenneforum.org/search.php

Available Software thread http://www.mersenneforum.org/showthread.php?t=22450

GPU Computing by gpu type
NVIDIA CUDA:
mfaktc trial factoring http://www.mersenneforum.org/showthread.php?t=12827
CUDAPm1 P-1 factoring http://www.mersenneforum.org/showthread.php?t=17835
CUDALucas Lucas-Lehmer test http://www.mersenneforum.org/showthread.php?t=12576
PRP on CUDA gpus (none known)

OpenCl based (AMD mostly)
mfakto trial factoring (AMD, Intel) http://mersenneforum.org/showthread.php?t=15646
P-1 factoring: see gpuOwL v4.x or v7.x for a form of P-1 performed with the PRP test; Gpuowl v6.8 or later for separate P-1
gpuOwL PRP with Gerbicz check (previously Lucas Lehmer test) (AMD, Intel and recently NVIDIA via OpenCl) http://www.mersenneforum.org/showthread.php?t=22204
clLucas Lucas-Lehmer test (AMD) http://mersenneforum.org/showthread.php?t=18297

How deep to factor with gpus vs. lltest with gpus (this example is for the NVIDIA gtx1070) http://www.mersenne.ca/cudalucas.php?model=683

Primenet server problems thread http://www.mersenneforum.org/showthread.php?p=464094

NVIDIA TDR info
http://developer.download.nvidia.com...n_Recovery.htm
https://www.pugetsystems.com/labs/hp...xperience-777/


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

2018-06-22, 13:32   #14   kriesel
Found a new prime? Really? What next?

What's the process if you're the lucky discoverer of a new prime?

My take on an appropriate protocol:

Step one should be to check configuration and worker logs carefully. For gpu-based computation, there are known issues that will generate bad interim residues and a final indication of a prime, for an exponent that does not yield an actual prime, simply as a side effect of a computation gone wrong. Such errors need to be checked for, and either the cause addressed or ruled out. For cpu-based computation, look in the computation logs for any indication of a past computing error rate. In any case, check system logs for signs of system stability issues. The overall error rate in LL tests is around 1.6% for cpus, 2% for gpus; individual hardware units may have a much higher rate: 10, 20, 50, even 100%. Such unreliable hardware should be identified and repaired or deactivated. When possible, run software for PRP with the Gerbicz check instead, which has superior error detection and handling.

If running prime95 or mprime and operating through PrimeNet, various people should receive notification through the PrimeNet server automatically. But sometimes that automatic email doesn't work. (It's not as if there's frequent volume.) Gpu run results can be manually submitted and should also generate such notification.

Promptly send a private message, with the same content to the extent possible, to prime95, ewmayer, skurowski, m29 (luke welsh), madpoo, and maybe a couple of other discoverers of previous Mersenne primes, giving the exponent, what it was run on, and when the run completed. Copy them a log excerpt and answer questions. Promptly, because it establishes the time frame, and there could be money involved, depending on who found it first.
Either PrimeNet notification or private messaging sets in motion the process of redundant independent verification runs. Multiple people run verifications using different software and hardware, to reduce the chance of a false positive slipping through even temporarily (which also tests that the various software can find the new prime).

Other than that private message, don't announce there's a discovery until it is verified.
There's a process for putting a press release together. Notifying the Mersenne Research Inc. board members and officers sets verification and disclosure into motion.

Preserve all related files. The last save file in particular may be requested for a rerun by someone else. Make an offline, preferably offsite, backup.

Initiate your own double-check, different hardware and software, perhaps before notifying anyone.

Follow the terms and conditions of use. https://www.mersenne.org/legal/
Note especially the Discovery Non-Disclosure Period described there.

Get guidance from the people who have dealt with it before.


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

2018-06-28, 01:11   #15   kriesel
NVIDIA-SMI

NVIDIA-smi is a command-line utility available on Linux and Windows; the following relates to Windows. The executable can be buried deep in the directory tree, but a Windows Explorer search, followed by drag and drop onto a command prompt window, makes short work of finding it, no typing involved. It gives status for several parameters on all gpus and lists the processes using them. The output below is what's produced when no command-line parameters are given. There are pages of info available on the program's options, including looping output; use -h to display them.
Code:
$ C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_neutral_cc2df69582aea972\nvidia-smi.exe  
Wed Jun 27 19:35:12 2018  
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 378.66                 Driver Version: 378.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070   WDDM  | 0000:03:00.0     Off |                  N/A |
| 89%   85C    P2   108W / 158W |    345MiB /  8192MiB |     98%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Quadro 2000        WDDM  | 0000:1C:00.0     Off |                  N/A |
|100%   90C    P0    N/A /  N/A |   1016MiB /  1024MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 105... WDDM  | 0000:28:00.0     Off |                  N/A | 
| 49%   81C    P0    66W /  75W |    154MiB /  4096MiB |     98%      Default | 
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      4972    C   ...ktc\2\mfaktc-win-64.LessClasses-CUDA8.exe N/A      |
|    0      6884    C   ...faktc\mfaktc-win-64.LessClasses-CUDA8.exe N/A      |
|    1       512    C   ...-q2000\CUDAPm1_win64_20130923_CUDA_55.exe N/A      |
|    2      6500    C   ...050ti\mfaktc-win-64.LessClasses-CUDA8.exe N/A      |
+-----------------------------------------------------------------------------+
It's fully implemented for Quadro and Tesla, and a subset is available for GeForce drivers and GTX/RTX gpus. Here's an example of --query output for an RTX 2080 Super:
Code:
"c:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe" --query

==============NVSMI LOG==============

Timestamp                           : Thu Jul 09 11:26:02 2020
Driver Version                      : 442.19
CUDA Version                        : 10.2

Attached GPUs                       : 1
GPU 00000000:03:00.0
    Product Name                    : GeForce RTX 2080 SUPER
    Product Brand                   : GeForce
    Display Mode                    : Enabled
    Display Active                  : Enabled
    Persistence Mode                : N/A
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : WDDM
        Pending                     : WDDM
    Serial Number                   : N/A
    GPU UUID                        : GPU-449f386b-dcaf-5433-af27-7650c45bd88f
    Minor Number                    : N/A
    VBIOS Version                   : 90.04.7A.00.CD
    MultiGPU Board                  : No
    Board ID                        : 0x300
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : G001.0000.02.04
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x03
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1E8110DE
        Bus Id                      : 00000000:03:00.0
        Sub System Id               : 0x30813842
        GPU Link Info
            PCIe Generation
                Max                 : 1
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 0 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : 51 %
    Performance State               : P2
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : Not Active
            HW Power Brake Slowdown : Not Active
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 8192 MiB
        Used                        : 1364 MiB
        Free                        : 6828 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 11 MiB
        Free                        : 245 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 100 %
        Memory                      : 1 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            SRAM Correctable        : N/A
            SRAM Uncorrectable      : N/A
            DRAM Correctable        : N/A
            DRAM Uncorrectable      : N/A
        Aggregate
            SRAM Correctable        : N/A
            SRAM Uncorrectable      : N/A
            DRAM Correctable        : N/A
            DRAM Uncorrectable      : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending Page Blacklist      : N/A
    Temperature
        GPU Current Temp            : 66 C
        GPU Shutdown Temp           : 100 C
        GPU Slowdown Temp           : 97 C
        GPU Max Operating Temp      : 89 C
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : Supported
        Power Draw                  : 126.22 W
        Power Limit                 : 125.00 W
        Default Power Limit         : 250.00 W
        Enforced Power Limit        : 125.00 W
        Min Power Limit             : 125.00 W
        Max Power Limit             : 292.00 W
    Clocks
        Graphics                    : 1545 MHz
        SM                          : 1545 MHz
        Memory                      : 7500 MHz
        Video                       : 1425 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 2100 MHz
        SM                          : 2100 MHz
        Memory                      : 7751 MHz
        Video                       : 1950 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 3188
            Type                    : C
            Name                    : C:\Users\...\rtx2080super-mfaktc
\mfaktc-2047-win-64.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 6992
            Type                    : C
            Name                    : C:\Users\...\tx2080super-mfaktc
\3\mfaktc-2047-win-64.exe
            Used GPU Memory         : Not available in WDDM driver model
        Process ID                  : 8156
            Type                    : C
            Name                    : C:\Users\...\rtx2080super-mfaktc
\2\mfaktc-2047-win-64.exe
             Used GPU Memory         : Not available in WDDM driver model
It can show gpu serial numbers on some gpus.

It fails if any NVIDIA gpu has a driver or hardware issue:
Code:
"c:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe" --query
  Unable to determine the device handle for GPU 0000:06:00.0: Unknown Error
Putting the command inside a simple batch file allows refreshing the output with a single keystroke, at low cpu overhead.
Code:
:loop
C:\Windows\System32\DriverStore\FileRepository\nv_dispi.inf_amd64_neutral_cc2df69582aea972\nvidia-smi.exe  
pause
goto loop
Fix the directory path there to match the location in your system. Name it something convenient, like nv.bat.
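Alternatively, nvidia-smi's built-in looping option (one of the options -h lists) refreshes the display automatically, for example every five seconds:
Code:
nvidia-smi -l 5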


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

2018-10-07, 14:48   #16   kriesel
TF & LL GhzD/day ratings & ratios and SP/DP ratios for certain GPUs

The attached PDF shows TF GhzD/day rating, LL GhzD/day rating, their ratio, and SP/DP ratio for certain gpus. TF and LL throughput are actually functions of exponent and other variables, not constants; values here are for representative inputs relevant to the current GIMPS wavefront. Data are listed in a table and shown on a log chart. Feel free to PM me with data for additional gpus. LL ratings are mostly based on CUDALucas performance, which is substantially slower than recent versions of Gpuowl on the same inputs and gpu model.
Based on these performance ratios, some gpus are much better suited to TF (NVIDIA GTX 10xx and later), while others are well suited to PRP/GEC/proof, P-1, or LL DC with the Jacobi check (recent AMD models, Vega 56 and newer, especially the Radeon VII).


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
Attached: tf ll ghzd and ratios vs gpu model.pdf (44.7 KB)

2018-12-07, 16:48   #17   kriesel
P-1 bounds determination

(following originated as https://www.mersenneforum.org/showpo...3&postcount=36)

As far as I can determine, PrimeNet is not doing the B1, B2, d, e, or NRP determination and dictating to the applications; rather, most applications optimize the bounds and other parameters (unless specified by the user), and afterward tell PrimeNet in the results record what parameters were selected and used.
(However, note that if the bounds reported for a P-1 completed without finding a factor are insufficient, PrimeNet does not retire the P-1 factoring task for the exponent, but will reissue it to someone else. Much of the first attempt's computation is then duplicated elsewhere, which is inefficient.)

The applications mprime, prime95, and CUDAPm1 (but not Gpuowl v5.0's PRP-1, or later Gpuowl versions' P-1), unless the user specifies otherwise, estimate the P-1 factoring run time and the probability of saving some number of primality tests, seeking to optimize the probable savings in total computing time for the exponent. The probability of finding a P-1 factor is computed over combinations of many B1 values and several B2 values, for:
  • a given prior TF level (number of bits trial factored to)
  • a given number of future primality tests potentially saved, typically 1 or 2,
  • available memory resource limits (system or gpu),
  • and probably the system / gpu's performance characteristics / benchmark results.
The mprime, prime95, and CUDAPm1 programs try many combinations of B1 and B2 values while seeking that optimum.

Or the user dictates the P-1 bounds in the worktodo line (or command line, as applicable). For mprime or prime95, explicit bounds can be specified in a Pminus1 worktodo line, but not in a Pfactor line. It seems like a lot of work, and a poor bet, that the average user can do better than the coded optimization algorithm created by the author of prime95, mprime, gwnum, and some bits of Gpuowl.
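For illustration, a hypothetical Pminus1 line with explicit bounds, borrowing the exponent and bounds from the log excerpt below ([aid] stands for the assignment ID):
Code:
Pminus1=[aid],1,2,89787821,-1,730000,14782500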

From experiments with prime95 at somewhat larger exponents, it appears that the optimization calculation also occurs during prime95 Test > Status output generation, which shows considerable lag when P-1 work is queued compared to other computation types. There appears to be no caching of previously computed optimal P-1 bounds. In my experience, prime95 status output without a stack of P-1 work assignments is essentially instantaneous, while the example attached takes 5 seconds, even immediately after a preceding one. With larger P-1 exponents or more P-1 assignments (deeper work caching, or more complete dedication of a system to P-1 work than the 1/4 in my example), I expect that 5 seconds to increase.

prime95.log:
Code:
Got assignment [aid redacted]: P-1 M89787821
Sending expected completion date for M89787821: Dec 05 2018
...
 [Thu Dec 06 09:17:24 2018 - ver 29.4]
Sending result to server: UID: Kriesel/emu, M89787821 completed P-1, B1=730000, B2=14782500, E=12, Wg4: 123E2311, AID: redacted

PrimeNet success code with additional info:
CPU credit is 7.3113 GHz-days.
The prime95 worktodo.txt record for a PrimeNet-given P-1 assignment contains no B1 or B2 specification.
Code:
Pfactor=[aid],1,2,89794319,-1,76,2
George's description of the optimization process is in the P-1 Factoring section of https://www.mersenne.org/various/math.php.
It's also there to read in the source code.

CUDAPm1 example:
worktodo entry from manual assignment:
Code:
PFactor=[aid],1,2,292000031,-1,81,2
program output (the literal "zu" strings are a known printf size_t formatting artifact on Windows builds):
Code:
CUDAPm1 v0.20
------- DEVICE 1 -------
name                GeForce GTX 480
Compatibility       2.0
clockRate (MHz)     1401
memClockRate (MHz)  1848
totalGlobalMem      zu
totalConstMem       zu
l2CacheSize         786432
sharedMemPerBlock   zu
regsPerBlock        32768
warpSize            32
memPitch            zu
maxThreadsPerBlock  1024
maxThreadsPerMP     1536
multiProcessorCount 15
maxThreadsDim[3]    1024,1024,64
maxGridSize[3]      65535,65535,65535
textureAlignment    zu
deviceOverlap       1

CUDA reports 1426M of 1536M GPU memory free.
Index 91
Using threads: norm1 256, mult 128, norm2 32.
Using up to 1408M GPU memory.
Selected B1=1830000, B2=9607500, 2.39% chance of finding a factor
  Starting stage 1 P-1, M292000031, B1 = 1830000, B2 = 9607500, fft  length = 16384K
Aaron Haviland rewrote part of CUDAPm1's bounds-selection code in v0.22 (https://www.mersenneforum.org/showpo...&postcount=646), building on his earlier 2014 fork (https://www.mersenneforum.org/showpo...&postcount=592).

Gpuowl's PRP-1 implementation takes a somewhat different approach, and requires user selection of B1. It defaults to B2 = p but allows another B2 to be user-specified. See https://www.mersenneforum.org/showth...=22204&page=70, posts 765-767, for Preda's description of gpuowl v5.0 P-1 handling. (See posts 694-706 for his earlier B1-only development: https://www.mersenneforum.org/showth...=22204&page=64.) Gpuowl's P-1 bounds defaults, cost, and algorithm continue to evolve, with substantial performance increases in v6.11, and significant cost reduction, by overlapping P-1 with PRP squarings, introduced in v7.0. As of v6.11, for p ~ 104M, B1 defaults to 1M and B2 to 30 × B1. As of v7.0, I think, Gpuowl does not run stage 2 if the available gpu RAM is insufficient for 15 or more buffers; on a 16GB gpu, that's above a 900M exponent, in my experience.

Looking up an exponent on mersenne.ca provides separate guide bounds for GPU and CPU use. For example, https://www.mersenne.ca/exponent/104089423 shows

PrimeNet B1=450000, B2=22000000
GPU72 B1=650000, B2=24000000

When I reserve a block of ~30 to run on a GPU, I'll typically specify the GPU72 bounds for the last (largest) exponent of the block. That way the bounds are at least sufficient for all, retiring the P-1 task for every exponent in the block, regardless of whether the primality test will run on CPU or GPU. It's also about 40% faster than taking the Gpuowl default B1=1000000, B2=30000000, letting me do more of them in a day, with a near-optimal probability-weighted saving of compute time overall from finding factors. That helps reduce the number that get inadequately run by other GIMPS participants on CPUs with the default prime95 memory allocation, which does stage 1 only, no stage 2.



(Code authors are welcome to weigh in re any errors, omissions, nuances etc.)


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

2019-01-11, 22:17   #18   kriesel
What limits trial factoring

What limits how high an exponent or factoring depth we run?

1) Utility per unit run time. There's a tradeoff between the probability of finding a factor by trial factoring (TF) versus by P-1 factoring, weighed against the time it takes to complete a conclusive primality test or pseudoprime test.

Consider a few cases (ignoring P-1 for now for simplicity):
A) 40 Mersenne numbers are trial factored from their current level to a bit depth that corresponds to a 2.5% probability of finding a factor.
This takes the same time as primality testing one exponent.
There's a two percent error rate in LL tests, so on average each unfactored Mersenne number requires 2.04 primality tests. So in that average lot of 40 exponents, there's a net 1.04 test-times saved: 39 first tests, 39 second tests, 78 × 0.02 = 1.56 third tests, 0.03 fourth tests. Total effort is equivalent to 1 + 39 + 39 + 1.56 + 0.03 = 80.59 tests.

B) Factor the same 40 as above one bit deeper, which takes as long as all preceding factoring, or two primality tests of effort. The odds of finding a factor go up by 1.7%, to 4.2%. Of the 40 considered before,
1.68 are factored, leaving 38.32 to primality test. 38.32 × 2 tests = 76.64, plus 38.32 × 2 × 0.02 = 1.53, plus 38.32 × 2 × 0.02 × 0.02 = 0.03; total effort = 2 + 76.64 + 1.53 + 0.03 = 80.2. This is better than A. (If the idea of finding factors of 1.68 Mersenne numbers bothers you, consider doing 4000 exponents instead.)

C) Same as B, but the odds of finding a factor go up by only 1.2%, to 3.6%. Of the 40 considered before, 1.44 are factored, leaving 38.56 to primality test. 38.56 × 2 = 77.12, plus 38.56 × 2 × 0.02 = 1.54,
plus 38.56 × 2 × 0.02 × 0.02 = 0.03; total effort = 2 + 77.12 + 1.54 + 0.03 = 80.69. This is worse than A.
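The arithmetic of cases A-C can be reproduced in a few lines of Python (my sketch; the 2% error rate and TF costs are the assumptions stated above):
Code:
# Reproduce cases A-C above: TF cost (in units of one primality test) plus
# first tests, double checks, and the geometric tail of ~2%-rate re-tests.
def total_effort(n, tf_cost, p_factor, err=0.02):
    survivors = n * (1 - p_factor)          # exponents still needing testing
    effort, batch = tf_cost, survivors * 2  # first test + double check each
    while batch > 1e-6:
        effort += batch
        batch *= err                        # mismatches force another test
    return effort

print(f"A: {total_effort(40, 1, 0.025):.2f}")   # -> 80.59
print(f"B: {total_effort(40, 2, 0.042):.2f}")   # -> 80.20
print(f"C: {total_effort(40, 2, 0.036):.2f}")   # -> 80.69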

Now enter P-1. It's complicated. Given the exponent, the prior trial factoring level, estimate functions for the probability of finding a factor at various B1 and B2 bounds and the corresponding run times, the number of primality tests that could be saved by finding a factor, and run-time estimates for the exponent on the same hardware, the programs try a lot of B1 and B2 combinations, estimate the probable net savings in run time for each, and go with whatever maximizes the estimated time saved.

2) Preference or traits of the user. Some people are not willing to wait out the long run times of some exponent/bit-level combinations. Faster run times per bit level are associated with very large exponents, very low bit levels, or both. Some people enjoy finding factors quickly. Some prioritize assisting the search for new primes. Some value finding an actual factor more than the knowledge that a given Mersenne number is composite with no known factor.

3) Utility to finding new Mersenne primes. New Mersenne primes are likely to be found in the smaller part of the unsearched exponent range. Factoring effort on exponents ten times, or even triple, the approximate value of the next find doesn't help bring that find about. For equal computing time, many more exponents can be tested at low exponent values than at high; primality testing with the best algorithms scales as approximately p^2.1.

4) Software feature limits.
A) Trial factoring is not currently limited by the features of available software. Mfaktc supports trial factors up to 95 bits; mfakto up to 92 bits. (Those are each more than enough to cover the 86-bit-or-less optimal TF for exponents up to 10^9, mersenne.org's limit. 92 bits is good for exponents to about 2^32; 95 bits to almost 2^33, per lookups like https://www.mersenne.ca/exponent/8183844937.)
B) The max supported exponent is 2^32 − 1 in mfaktc and mfakto. Modifying them to support larger exponents would make them slower.
C) Factor5 is not limited in exponent or bit level, but is limited in practice by run time / performance. Some build options of Ernst Mayer's Mfactor program are not limited in exponent or bit level, and would be faster than Factor5, but are also limited in practice by run time / performance.
D) Availability of software making efficient use of the available computing hardware and available APIs and drivers. There is software for CUDA and OpenCL, but not for OpenGL, VULKAN, etc. For now, that seems to leave out some older GPUs or IGPs.

5) Supported parameters at the assignment and result coordination sites mersenne.org and mersenne.ca.
Mersenne.org supports work assignments and exponent status up to 10^9. Mersenne.ca supports exponent status up to 10^10 (and even some factor data above 10^10), and TF work assignments up to 2^32.

6) Memory requirements on a gpu are not typically an issue for trial factoring. Its memory footprint is measured in MB, while VRAM capacity per gpu is in GB.

7) Run time versus reliability, and probable hardware lifetime, can be a limiting issue. Factoring an exponent to 84 bits on a Quadro 4000 takes months. Going to 94 bits would take about 2^10 ≈ 1024 times as long, so years even on a GTX 1080 Ti (1.4 years estimated for https://www.mersenne.ca/exponent/6013456871, without adjustment for the required code modifications). The lower performance of integrated graphics processors imposes significant limits; e.g., the HD 4600, HD 620, and UHD 630 are all around 18-20 GhzD/day throughput rating, while the GTX 1080 and above are over 1000, so what the faster cards do in a day takes the igps months.

Are there more?


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

2019-02-24, 01:52   #19   kriesel
Error rates

(This originated as https://www.mersenneforum.org/showpo...1&postcount=23 and https://www.mersenneforum.org/showpo...5&postcount=55)

What are typical error rates? The usual figure is about 2% per LL test near the wavefront. That might be from before the addition of the Jacobi check to prime95. It has fluctuated over time as exponents increased and additional code was written, additional bugs introduced and later fixed, additional error checks added, etc. It will go up somewhat as larger exponents take longer to run, requiring time roughly in proportion to p^2.1, at constant hourly hardware reliability. The probability of an LL test being in error goes up considerably if the error counts accumulated during a prime95 run are nonzero: even a single recorded illegal sumout raises the probability of an erroneous final residue to around 40%, if I recall madpoo's recent post about that correctly. Hardware also tends to get less reliable with age.

PRP with the Gerbicz check (GEC) is much more reliable at producing correct final residues, and sufficiently recent versions of prime95/mprime or gpuowl also produce a proof file that allows avoiding over 99.5% of double-check effort, so run PRP with proof whenever possible. PRP/GEC was bulletproof on a very unreliable system I tested it on. It is still possible to have errors in the final residue with PRP and the Gerbicz check, but it is unlikely, and it's the best we can do for now. There was a case in prime95 where an error could occur in code outside the GEC's reach; that code has since been modified to address it. The PRP/GEC overall error rate is thought to be orders of magnitude smaller than the LL/Jacobi-check error rate: so low that, with only a few known erroneous results after two years of frequent use, we lack a sufficient empirical or statistical basis to compute it. In a check of PRP tests on exponents > 50M reported 2019-08-12 to 2021-08-12 (>123,000 verified PRP/GEC or PRP/GEC/proof results), 3 bad results were found, indicating an error rate of ~24 per million PRP tests. At least one of those 3 errors (possibly all 3) was from before the error hardening of the final-residue handling outside the GEC's reach.

In any event, run self tests such as double checks regularly, at least annually, to check system reliability on these very unforgiving calculations.

Error rate does depend on the software and hardware used. Mlucas, CUDALucas, clLucas, and some LL-capable versions of gpuowl do LL without the Jacobi check; the Jacobi check has a 50% chance of detecting an error if one occurs. Hardware with unreliable memory is more error-prone. Overclocking too far or overheating increases error rates.
In CUDALucas, there are CUDA levels and gpu models that interact badly, even on highly reliable hardware. These produce errors such as, at some point in the usual LL sequence, all zeros being returned. If that happens before the subtraction of 2, then FFF...FFD (the equivalent of −2) is the result. It gets squared and 2 subtracted, and voila, now you have 000...002, since (−2)^2 − 2 = 2; it then iterates at 2 until the end. These sorts of errors can be triggered at will. Some of them, under certain circumstances, have the side effect of making the iterations go much faster than expected; if something seems too good to be true, it probably is. (CUDA 4.0 or 4.1, 1024 threads, or certain fft lengths typically are trouble in CUDALucas, if I recall correctly.) That is an example where the probability of first and second tests matching on a false positive may be 100%; more typical would be of order 10^-6 to 10^-12. The CUDALucas 2.06 May 5, 2017 version has software traps for these error residues built in. There are other modes of error; the recent false positive from CUDALucas 2.05.1 resulted in an interim residue of value zero, which I'm guessing is some failure to copy an array of values. Don't run CUDALucas versions earlier than 2.06, and don't let your friends either.

Other applications also have characteristic error residues. Someone who wanted to use such bugs as the CUDALucas early zero bug to fake finding a prime would be disappointed, as the error would be quickly discovered early in the verification process.
I've created application-specific reference threads for several of the popular GIMPS applications. Most of them have a post with a bug and wishlist tabulation attached, specific to that application.
https://www.mersenneforum.org/forumdisplay.php?f=154
It helps to know what to avoid and how to avoid it.
If you identify any issues that are not listed there yet, please PM me with details.
As such issues are identified, they might be fixable, or code to detect and guard against them could be added, if still of sufficient interest. (Fixing or trapping for CUDA 4.0 or 4.1 issues is not of much interest now, since many GPUs are running at CUDA 8 or above.)

It's common practice for the applications to keep more than one save file, and to be able to restart from one or the other if something is detected to have gone seriously wrong in the past minutes of a lengthy run, thereby perhaps saving most of the time already expended. Some users will run duplicate months-long runs side by side on two sets of hardware, periodically comparing interim 64-bit residues, which should match along the way.
Re the odds of matching wrong residues:
Given that the number of residues from completed first and second LL checks of all primes below the current mersenne.org limit is about 2n ≈ 50,847,478 × 2 ≈ 101,694,956, while the number of possible unique 64-bit residues is r = 2^64 = 18,446,744,073,709,551,616 (with only the zero value indicating a correctly completed LL test of any of the 50-plus Mersenne prime exponents), the chance of one randomly distributed wrong residue coinciding with another randomly distributed wrong residue is very slim. If every prime exponent produced one randomly distributed wrong unique residue, the last wrong one, which has the most other residues to dodge and so the highest odds of coinciding, would have a chance of about 2n/(r − 2n) ≈ 5.5×10^-12 of coinciding with another residue. If only 2% of residues are wrong and the wrong ones are randomly distributed, that chance drops by 49%, to ~2.8×10^-12. The odds of any of the wrong residues coinciding with another residue by random chance are ~0.00014 if every exponent has one wrong randomly distributed residue, ~2.9×10^-6 if 2% do. (Note that the preceding figures do not account for run times, and thus error rates, climbing with exponent; alternately, assume progress occurs roughly in sync with computing speed advances, so that run time and error rate do not grow.)
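A quick sanity check of the headline figure (my sketch; assumes wrong residues are uniformly random 64-bit values):
Code:
# Collision odds for the "last" randomly-misplaced residue, per the text above.
n = 50_847_478                 # prime exponents below 10^9
r = 2 ** 64                    # possible unique 64-bit residues
two_n = 2 * n                  # ~101,694,956 residues: first test + DC each
print(f"{two_n / (r - two_n):.2g}")   # -> 5.5e-12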
The problem is that bad residues from software or hardware issues are not randomly distributed. If they were, we would not be patching and trapping and searching databases for known application-specific bad residues as markers of which exponents to double- or triple-check. https://www.mersenneforum.org/showpo...&postcount=142 https://www.mersenneforum.org/showpo...&postcount=150
There is an LL primality-test error rate of ~2% per exponent, and similarly on second checks; we iterate until there's a match.
We're always on the lookout for ways to reduce and to catch errors (without hurting performance too much). Some error detections, if efficient enough, will increase net accurate throughput.

We know from some gpu runs that some bugs or misconfigurations will preferentially stabilize on a specific wrong res64 result, not a random wrong one. One such value is a false positive, as madpoo has long known and dealt with. So that's an existence proof of a nonrandom result from error, one that occurs despite a nonzero offset. A patch to detect and halt such runs was added. (See item 4 in the CUDALucas bug and wish list attached at https://www.mersenneforum.org/showpo...24&postcount=3)
Quote:
# (in perl form) application-specific bad residues, indicative of some problem causing the calculation to go wrong
# for applications other than gpuowl their detection means the run should be halted and the problem fixed before continuing
# for gpuowl, the Gerbicz check will cause a lot of iterations recalculation requiring more time. Fixing the issue is recommended
%badresidues=(
'cllucas', '0x0000000000000002, 0xffffffff80000000',
'cudalucas', '0x0000000000000000, 0x0000000000000002, 0xffffffff80000000, 0xfffffffffffffffd',
'cudapm1', '0x0000000000000000, 0x0000000000000001, 0xfff7fffbfffdfffe, 0xfff7fffbfffdffff, 0xfff7fffbfffffffe, 0xfff7fffbffffffff, '.
'0xfff7fffffffdfffe, 0xfff7fffffffdffff, 0xfff7fffffffffffe, 0xfff7ffffffffffff, 0xfffffffbfffdfffe, 0xfffffffbfffdffff, '.
'0xfffffffbfffffffe, 0xfffffffbffffffff, 0xfffffffffffdfffe, 0xfffffffffffdffff, 0xfffffffffffffffe, 0xffffffffffffffff',
'gpuowl', '0x0000000000000000',
'mfaktc', '',
'mfakto', ''
); #fff* added to cudapm1 list 7/19/18
# note, since second to last LL iteration's full residue can be +-2^[(p+1)/2], for a Mersenne prime,
# and, above M127, that looks like in a res64, '0x0000000000000000, 0xffffffffffffffff', special handling may be required
# for iteration p-3 for cllucas and cudalucas; add checks for below to cllucas and cudalucas checking code as (ok) exceptions to bad residues
$llpm3okresidues='0x0000000000000000, 0xffffffffffffffff';
# see http://www.mersenneforum.org/showthread.php?t=5862
# see also http://www.hoegge.dk/mersenne/resultspenultimate.txt
# http://www.hoegge.dk/mersenne/penultimate.txt
You might find the strategic double check thread https://www.mersenneforum.org/showth...462#post508462 and the triple check thread https://www.mersenneforum.org/showth...=17108&page=82 interesting background also.

Historically, error rates were somewhat higher. https://www.mail-archive.com/mersenn.../msg07476.html

With the approximate empirical 2% error rate per completed primality test, and certain assumptions that seem plausible, the chance of one exponent (total, out of the 50-million-plus prime exponents p < 10^9, not individually per prime exponent) having two matched wrong residues is ~2.9 ppm. This seems to me a lower bound for matched wrong residues slipping past error detection. It's difficult to estimate probabilities for the nonrandom sources of incorrect matching residues (undetected software bugs, malicious reports, etc.), which are additional. So let's suppose for now that the combined chance of random and nonrandom error producing matching wrong residues is 10 ppm. Assuming further that it is distributed uniformly and independently over the ~50,847,478 prime exponents below 10^9, containing a probable ~55 Mersenne primes, the chance of matching wrong residues occurring, times the chance of that coinciding with a Mersenne prime, is 10 ppm × 55/50,847,478, or about 1.08×10^-11. If we instead assume the occurrence of matching wrong residues is somehow connected to the Mersenne number being prime, the probability estimate of missing a Mersenne prime rises to the assumed 10 ppm; if connected to the number being composite, it falls to zero. We could make various sets of assumptions about the relative weights of these hypotheses (independent, prime-linked, composite-linked) and compute blended probabilities as estimates of the real one. At some point, such estimates rest on too shaky a foundation of assumptions and guesses to pursue further; perhaps someone with a better background in statistics could help here.
Working three weighting cases here for illustration, carrying much higher precision than justified (hypothesis weights in percent for each case):

independent: 99.98, 98, 34
prime-linked: 0.01, 1, 33
composite-linked: 0.01, 1, 33

0.9998 × 1.08×10^-11 + 0.0001 × 10 ppm + 0.0001 × 0 ppm ≈ 0.0010108 ppm
0.98 × 1.08×10^-11 + 0.01 × 10 ppm + 0.01 × 0 ppm ≈ 0.1000106 ppm
0.34 × 1.08×10^-11 + 0.33 × 10 ppm + 0.33 × 0 ppm ≈ 3.3000037 ppm
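The three weighted estimates can be reproduced with a short script (my sketch; the 10 ppm matched-wrong-residue chance and ~55 primes among 50,847,478 exponents are the assumptions stated above):
Code:
# Reproduce the three weighted estimates above, under the stated assumptions.
P_MATCH = 10e-6                          # assumed matched-wrong-residue chance
P_INDEP = P_MATCH * 55 / 50_847_478      # if errors are independent of primality
for w_ind, w_prime, w_comp in [(0.9998, 0.0001, 0.0001),
                               (0.98, 0.01, 0.01),
                               (0.34, 0.33, 0.33)]:
    p = w_ind * P_INDEP + w_prime * P_MATCH + w_comp * 0.0
    print(f"{p / 1e-6:.7f} ppm")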
Intuition tells me to weight "independent" heavily, but it's unclear how many nines to give it.

Now note that the earlier assumption that the chance of error is distributed uniformly among the prime exponents is a convenient simplification for estimating probabilities, but it is wrong. It would be hard to get the primality of M2 or M3 wrong; it gets easier, and an error more likely, as the exponent grows. I suppose we could sum the relative run times of all prime exponents and assign a computed probability of error proportional to individual run time, fit to the empirical experience.

The odds of three matching wrong residues due to independent error would be much smaller. As I recall, triple checking was done on all exponents below ~3M. Some have had many more matching residues reported; see for example https://www.mersenne.org/report_expo...=101000&full=1
and note that in that range, any matching PRP results were preceded by matching LL results. It's my understanding that so far, all GIMPS discoveries of Mersenne primes were made by a first LL test, not by a double check or later test.

At the outset I gave a rough LL test error rate of 2%. From my own running experience, it's clearly possible to do much better. Over a period producing 447 verified LL tests, I also produced 6 bad residues, for a rate of 1.32% overall: 1.2% on prime95, 1.47% on gpus. Also, for a small sample of verified PRP3 tests, zero errors (23 prime95, 1 gpuowl). More to the point, decommissioning cpus and gpus that produce bad residues has led to zero gpu-produced bad residues since late 2017, and only one cpu-produced bad residue since then. Adding more software checks would also help prevent completion and submission of bad runs.

It's also clearly possible to do much worse than the 2% figure. A small sample of 47 LL tests on 21 100M-digit exponents yielded at least 9 bad residues, possibly as many as 13, for an estimated error rate of 19% per LL test in that region. Extrapolating based on run time yields, for 300M-digit exponents, an estimated 88% error rate per test.

George computed the likely number of Mersenne primes below p = 10^9 according to the Wagstaff conjecture and posted the result, 57.09, at https://www.mersenneforum.org/showpo...&postcount=204. That's a bit higher than the 55 I used above in computing estimates of matching-wrong-residue probability, but not by enough to shift the probabilities much: two more primes is about a 4% greater value for the primes' generally negligible contribution.


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

2019-03-30, 14:58   #20   kriesel
Costs

Cost will vary widely depending on the age, speed, and efficiency of the computing hardware; local electrical rates, including any applicable taxes; whether the waste heat provides a comfort-heating benefit, is vented outdoors at no additional cost, or adds to air-conditioning load and cost; and whether equipment is purchased for the purpose (and under what depreciation assumptions) or was already bought for other reasons, so that only the possibility of increased wear and tear is considered.

Some ballpark figures (all US$), mostly from my own fleet, for primality testing around 85M, per exponent tested. These include 4-year straight-line depreciation to zero salvage/resale value of hardware purchased used or new, electricity at $0.11663/kWh, and neither heating benefit nor cooling penalty:

gpu, PRP in gpuowl or LL in CUDALucas: around $1.07 (Radeon VII), $2.29 (RX 480) to $3.23 for modern new AMD or NVIDIA gpus, up to $4.75 to $9 for old used CUDA 2.x gpus;
cpu: $3.37 for an e5-2670 or e5-2690, up to $6.50 for an i7-7500U laptop, $6.70 for an i7-8750H laptop, $7.40 for an X5650 tower, $9.30 for an E5645 tower, $11.40 for an E5520, $19.50 for a Core 2 Duo; 32-bit Intel processors are even higher. (Very old cpus can be both too slow to finish most assignments within their expiration limits and cost hundreds of dollars per primality test at 85M, or $3000 to $5500 for a Pentium 133, which would also take about 45 years!)
Price, timings, and wattage for a used Samsung S7 phone running Mlucas 18, provided by ewmayer, yielded around $8.60.

The electrical cost alone ranges from $0.81 (Radeon VII) or $1.71 (GTX 1080) to $8 (Quadro 5000) for gpus tested; new laptops $0.72; e5-26x0 $2.20; i3-370M $3.36; X5650 $6.20; E5645 $7.55; E5520 $8.12; Core 2 Duo $12; S7 phone $2.93.
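For a rough check of figures like these, the arithmetic is straightforward. Here's a sketch with hypothetical inputs (a $600 gpu straight-line depreciated over 4 years, 250 W at the wall, my electric rate, ~2 days per 85M test; illustrative values, not measurements):
Code:
# Ballpark cost per primality test: straight-line depreciation plus electricity.
# Hypothetical inputs; none of these numbers are measurements.
def cost_per_test(price_usd, watts, days_per_test,
                  rate_per_kwh=0.11663, dep_years=4):
    depreciation = price_usd / (dep_years * 365.25) * days_per_test
    electricity = (watts / 1000) * 24 * days_per_test * rate_per_kwh
    return depreciation + electricity

# e.g., a $600 gpu drawing 250 W, taking ~2 days per 85M exponent:
print(f"${cost_per_test(600, 250, 2):.2f}")     # -> $2.22, same ballpark as above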

Costs only matter if the software will run the desired operands successfully. There was a time when Preda reported good results on a (16GB) Radeon VII, but users were unable to run gpuowl P-1 successfully on Windows. CUDAPm1 runs on a variety of NVIDIA hardware ranging from 1 to 11 GB, but is unable to do both stages on any gpu I have tried above p ~ 432,500,000. Prime95 on the FMA3-capable i7-8750H seems to be the best bet for high-p P-1; I have 901M running now.

For my natural gas heating, furnace specs, central AC specs, and utility rates, the heating benefit reduces the net electrical cost by 20.6%, while cooling costs increase it 36% and the non-heating-season sales tax adds another 5.5%. (Sales tax is not applied to heating fuel or electricity during the heating season here.) These effects combine to make the marginal electrical cost 78% higher in the cooling season than in the heating season.
Therefore, some systems that are economic to run during the heating season are not when there's no heating benefit, and additional systems become uneconomic during the cooling season.

Using cloud computing is an interesting alternative. It's hard to beat free, as in free trials for hundreds of hours. Otherwise, costs vary, but around $7 per 85M exponent is feasible at spot rates, lower than the electrical cost of some of my existing hardware. Some rough data and links related to cloud computing for GIMPS follow.

How-to guide for running LL tests on the Amazon EC2 cloud
https://www.mersenneforum.org/showpo...21&postcount=1
Amazon 36 cores on EC2 with 144 GB RAM and 2x900 GB SSD is $0.6841 per hour.
2017 cost per primality test at 80M $6.21 (extrapolates to about $7.05/85M)
https://www.mersenneforum.org/showpo...6&postcount=23
2019 current EC2 costs ~$.019/hr $6.4 to 9.7 for 89M primality test (so ~$5.8 and up for 84M)
https://www.mersenneforum.org/showpo...37&postcount=2
Google Colaboratory "Colab" (free) https://www.mersenneforum.org/showthread.php?t=24839

M344587487 contemplating providing a PRP testing service at around $5/85M
https://www.mersenneforum.org/showth...138#post512138

https://www.phoronix.com/scan.php?pa...acket-Roll-Out
32 ARM cores @ 3.3Ghz + 128GB of RAM and 480GB of SSD storage $1/hour
This worked out, per https://www.mersenneforum.org/showpo...9&postcount=23, to 30.73 ms/iter at 84M: an astonishingly costly $717 per 84M exponent.
Ernst Mayer estimates several instances rather than a single instance would produce better performance and cost/throughput.
Debian 9, Ubuntu 16.04 LTS, and Ubuntu 18.04 LTS are the current operating system options for this Ampere instance type.
Numerous instance types here. Note discounts at reserved and spot. https://www.packet.com/cloud/servers/

google compute
https://www.mersenneforum.org/showpo...96&postcount=4
free trial
https://cloud.google.com/free/docs/gcp-free-tier

Microsoft Azure
https://www.mersenneforum.org/showthread.php?t=21440

https://www.atlantic.net/cloud-hosting/pricing/
https://www.hetzner.com/cloud
https://www.scaleway.com/pricing/
https://www.ovh.com/world/vps/
https://us.ovhcloud.com/products/ser...ucture-servers

For contrast, personal gpu cost is ~$2 and up per 85M: https://www.mersenneforum.org/showpo...44&postcount=3
Judicious clock and voltage tweaking may improve those numbers. For an example of electrical power variation with clock, see https://www.mersenneforum.org/showpo...1&postcount=52


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Old 2019-06-22, 16:20   #21
kriesel

Default reserving a specific exponent

See https://www.mersenneforum.org/showpo...68&postcount=1 for reserving on prime95.

The following assumes gpu application(s) gpuowl, CUDALucas etc. are already running and are not near completion of the current work item(s). Add gpu application stops and starts as appropriate if that assumption is not valid. It also assumes that the exponents are being selected from the strategic double and triple checks thread https://www.mersenneforum.org/showthread.php?t=24148.
  • Post which DC or TC candidates you're taking.
  • Grab the lines of the exponents that you will do.
continue, with either Method 1:
  • Go to https://www.mersenne.org/manual_assignment/
  • Set preferred work type to double-check LL or double-check PRP tests as applicable
  • Set both Optional exponent range fields to the same one of the grabbed exponents
  • Click Get Assignments
  • If it succeeds, copy the assignment with AID into the appropriate GPU worktodo.txt file, and save. (An example assignment line is shown after these lists.) Note that to match PRP residue type using gpuowl, you must properly select a gpuowl version. See https://www.mersenneforum.org/showpo...3&postcount=15
  • If it fails, it is likely with error 40, no assignment available. Use a prime95 session as uncwilly described, at https://www.mersenneforum.org/showpo...68&postcount=1, to get the assignment, then copy that to the GPU worktodo.txt, overwriting any no-AID entry.
  • Repeat previous steps of method 1 for each DC or TC. Depending on how many you're doing, it can get tedious.
or continue with Method 2:
  • reserve as described by uncwilly in a prime95 session in https://www.mersenneforum.org/showpo...68&postcount=1
  • Copy the new assignments with AIDs from the prime95 worktodo file to the gpu(s) worktodo file(s), and save, then close the gpu worktodo file(s).
  • Move the new assignments in the prime95 worktodo file, to the end of the prime95 worktodo.txt file behind some ongoing lengthy work intended for the cpu, save the prime95 worktodo file, stop and restart prime95, and check with Test, Status, that the assignments order in effect on the cpu is as intended.
For either method, wrap-up:
  • If you're unable to get a reservation for any exponent you posted you were taking, by either method, edit your initial post to exclude it. If the edit period already expired, post which you were unable to reserve in a new message.
Optionally: if any gpu assignment is about to expire, because manual extension sometimes does not work, try
  • Open the prime95 worktodo file, temporarily put that assignment's worktodo entry at the front of the prime95 work list, save the prime95 worktodo file, stop and restart prime95 computation to reach at least 0.1% of completion on the expiring exponent, before the deadline for starting the exponent arrives. (See assignment rules https://www.mersenne.org/thresholds/)
  • Go to the prime95 menu Advanced-> Manual Communication
  • Make sure "Contact PrimeNet server now" and "Send new expected completion dates to server" are checked. Click OK, and wait for prime95 to finish talking with the PrimeNet server
  • Stop prime95, move the extended assignment to the end of the prime95 worktodo file, save and close prime95 worktodo file, restart prime95, check with Test, Status.
  • After the gpu assignment completes, remove the corresponding assignment entry from the prime95 worktodo file, save and close the prime95 worktodo file, stop computation and resume in prime95, delete the prime95 save file and backup files for the relatively few iterations performed on the cpu for the completed gpu assignment.
By putting it at the end of the cpu's work list, behind other lengthy work intended for the cpu, the prime95 instance should report status periodically from there, yet not use up CPU cycles partially duplicating GPU work.
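For illustration, a reserved double-check entry as copied into a gpu worktodo file might look like the following (the AID and exponent here are made up):
DoubleCheck=0123456789ABCDEF0123456789ABCDEF,56831987,75,1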

A side effect of using this method for extension is it changes the assignment from manual to the prime95 instance used to extend it. At least that is what the assignments page shows when I use it.

Because this assignment or extension method might confuse PrimeNet about what's happening on the prime95 instance, and any expirations may count against that instance/system in getting new assignments, consider using an old slow CPU for making these reservations or extensions, where it will make little difference regarding assignment of future work via PrimeNet.


Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1

Old 2019-07-22, 16:54   #22
kriesel

Default Worktodo entry formats

This is a quick reference for worktodo entry formats for multiple GIMPS programs.
These are a best-effort summary and not guaranteed to be correct or complete. For confirmation, see the performance or help output of the respective programs, their documentation files, and communications by the authors.

Syntax used here:

\ [ or \ ] means a literal [ or ] is present. (It needs that space between, or it itself does unwanted formatting of this post)
[a|x] either a or x (but not both)
[a] optional a, nul otherwise
[a|b|<nul>] either a or b or neither, but not both
[a[,b[,c]]] a is optional, if neither ,b nor ,c are present; ,b is optional, if ,c is not present; ,c is optional

(parenthetical comment on an example, that should not be included in an actual worktodo entry)
whatever is outside of <> and [] and () is literal
<variable> means substitute the string representation of the value of variable in place of <variable>
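As an illustration of this notation (exponent reused from an example below, AID made up), the format Factor=[<AID>,]<exponent>,<from>,<to> permits both of the following concrete entries:
Factor=332298607,76,77
Factor=0123456789ABCDEF0123456789ABCDEF,332298607,76,77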

Variables used here:
AID = 32-character hexadecimal assignment ID
B1 = first stage bound
B2 = second stage bound
comment = free form text not processed by the application
exponent = power in m = 2^exponent - 1, typically prime
k, b, n, c = parameters of the number m = k·b^n + c, where n is typically prime. For a Mersenne number, k=1, b=2, c=-1.
from = trial factoring starting bit level; lowest candidate factor ~ 2^from
to = trial factoring ending bit level; highest candidate factor ~ 2^to
prp_base = the number that gets raised to a high power in the PRP test, typically 3
how_far_factored = bit level previously factored to in TF

tests_saved = (usually integer) number of future primality tests saved if a factor is found. Usually issued by the server as 2 for a first-test candidate, 1 for a double-check candidate, or 0 if a sufficient-bounds P-1 has already been completed. (Note that "sufficient" may be smaller bounds than would be optimal. I do not know specifically what the PrimeNet server's threshold for "sufficient" is.)
Optionally, the value can be manually reduced from 2 to 1 if it is known with certainty the primality test will be a PRP with GEC and proof generation, which has a high likelihood of requiring only one primality test. Optionally, in mprime/prime95 or CUDAPm1, the value can be manually increased up to 10 for aggressive P-1 factoring (larger inputs are reduced to 10).
Mprime/prime95 supports rational-number (floating point) tests_saved values in decimal, compatible with C %f format, such as 1.4 or 1.16 or 0.96. Mprime/prime95 computes bounds for the entered value, subject to a maximum of 10 and various other considerations including allowable memory usage, and rounds tests_saved in its output to two significant digits, without affecting the bounds selected and used.
Values may be restricted to be at least 0 or 1 depending on application and context. (CUDAPm1 requires an integer > 0. Gpuowl processes the input value as an unsigned integer. Mlucas processes the input as an unsigned long, and exits if the value > 2, with a message that the value should be 0, 1, or 2.)

p-1_done = 1 if done to adequate bounds, 0 if not done already to adequate bounds
squarings = number of squaring iterations for performing a PRP verification/certificate generation
nul = null string, no characters. Not to be confused with \0 the nul character.
nth_run = [1|2|3] corresponding to start values 2/7, 6/5, or random1/random2 for P+1
nworker = natural number for the worker header / section; 1 2 3 etc as applicable

residue_type: From the prime95 undoc.txt:
"PRP supports 5 types of residues for compatibility with other PRP programs. If a is the PRP base and N is the number being tested, then the residue types are:
1 = 64-bit residue of a^(N-1), a traditional Fermat PRP test used by most other programs
2 = 64-bit residue of a^((N-1)/2)
3 = 64-bit residue of a^(N+1), only available if b=2
4 = 64-bit residue of a^((N+1)/2), only available if b=2
5 = 64-bit residue of a^(N*known_factors-1), same as type 1 if there are no known factors"
Additionally, there is a residue type 0 in some versions of gpuowl, denoting simultaneous PRP and P-1; most versions of gpuowl perform only a single PRP residue type, without simultaneous P-1.
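As a tiny worked illustration of a type 1 residue (my own example, not from the programs' documentation): for p = 7, N = 2^7 - 1 = 127, and base a = 3, a^(N-1) = 3^126 ≡ 1 (mod 127), as expected from Fermat's little theorem since 127 is prime; for a composite Mersenne number, the residue would almost certainly differ from 1.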

Typically,
Factor=[<AID>,]<exponent>,<from>,<to>
PFactor=[<AID>,]<k>,<b>,<n>,<c>,<how_far_factored>,<tests_saved>
Test=[<AID>,]<exponent>,<how_far_factored>,<p-1_done>
Doublecheck=[<AID>,]<exponent>,<how_far_factored>,<p-1_done>
PRP varies by program (see below)

Formats and examples follow for several commonly or historically used GIMPS programs.
Note some are known to be case-sensitive. PFactor != Pfactor for example.


Mfaktc or Mfakto
Factor=[<AID>,]<exponent>,<from>,<to>
[#|//|\\]whatever comment you want, including valid worktodo entry lines
Factor=7F797F1F8BC4B5C234CB29D8BAD8B680,93040447,74,75
// some comment
CUDAPm1
PFactor=[<AID>,]1,2,<exponent>,-1,<how_far_factored>,<tests_saved>[[#|//|\\]<comment>]
[#|//|\\]whatever comment you want, including valid worktodo entry lines
PFactor=1,2,415000043,-1,82,2
# some comment
Note, to specify the bounds, use the command-line -b1 and -b2 options. Also, CUDAPm1 internally pads tests_saved assuming a primality test error rate of 1.8% per test times two tests. Source code indicates support for inline comments on worktodo lines also.
CUDALucas or cllucas
Test=[<AID>,]<exponent>,<how_far_factored>,<p-1_done>
Doublecheck=[<AID>,]<exponent>,<how_far_factored>,<p-1_done>
[#|//|\\]whatever comment you want, including valid worktodo entry lines
Test=402143717,81,1
DoubleCheck=4BACCC7E79F9878B2D2F606C6DF40123,50879989,74,1
\\ some comment
Gpuowl (may vary in syntax and availability versus version)
(A note of caution, PRP residue type is dependent on gpuowl version. To qualify as a PRP DC, residue types must match.)
TF (V3.7 - v3.9?)
Factor=[<AID>,]<exponent>,<from>,<to>
Factor=332298607,76,77

P-1
[B1=<B1>[,B2=<B2>];]PFactor=[<AID>|0],1,2,<exponent>,-1,<how_far_factored>,<tests_saved> (v6.x)
B1=790000,B2=16590000;PFactor=A125254BD75564243D4B73D4EC601234,1,2,91538501,-1,77,2 (v6.x)
PFactor=B1:<B1>,<exponent> (v4.x)
PFactor=B1:20000,83780327 (v4.x)

LL (version <0.7)
[Test|DoubleCheck]=[<AID>|0],<exponent>,<how_far_factored>,<p-1_done>
[Test|DoubleCheck]=<exponent>
DoubleCheck=0,70100200,0,0
Test=70100200
LL (version >~v6.11-252)
DoubleCheck=[AID|0],<exponent>,<how_far_factored>,<p-1_done>
DoubleCheck=1AAFFAAD0000000FFFF,51456287,74,1

PRP (version >0.6; varies in residue type by gpuowl version, type 1 preferred)
PRP=[<AID>|0],<k>,<b>,<n>,<c>,<how_far_factored>,<p-1_done>
PRP=0,1,2,1500000041,-1,87,1

PRP with merged P-1 (V7.x, standard PRP residue type 1, seed 3)
[B1=<B1>[,B2=<B2>];]PRP=[<AID>|0],1,2,<exponent>,-1,<how_far_factored>,<tests_saved>
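A made-up v7.x example (zero AID; bounds, exponent, and TF level invented for illustration):
B1=1000000,B2=30000000;PRP=0,1,2,106928347,-1,77,1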

PRP-1 (PRP residue type 0)
[B1=<B1>,][B2=<B2>,]PRP=[<AID>|0],1,2,<exponent>,-1,<how_far_factored>,<tests_saved> (~v4.7 - v5.0; defaults to B2=exponent)
B1=2000000,B2=40000000;PRP=0,1,2,82252003,-1,76,0

PRP-CF is not supported in any version to date

Comment
Prefacing a worktodo line with a semicolon ; (or perhaps anything unexpected) will cause it to be ignored, functioning as a comment, except for the error message it will generate.
Also, for a PRP line, additional content following <p-1_done> and a separating comma appears to be ignored, so it could serve as an inline comment about the work item in certain gpuowl versions.
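For example (illustrative only):
;Factor=332298607,76,77 (retained for reference, not processed)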
prime95 / mprime
Supports Fermat numbers and Mersenne numbers. Only Mersenne numbers are covered here. Supports ECM also, but since that is not useful for finding new Mersenne primes, it is not currently covered here.
<AID>: an already-issued assignment identifier
N/A: no AID issued yet, and don't get one at the next PrimeNet checkins
<nul>: no AID issued yet, but get one at the next PrimeNet checkin

Worker header (precedes worktodo entries for the given worker)
\ [Worker #<nworker>\ ]
[Worker #1]
TF
Factor=[<AID>,|N/A,|<nul>]<exponent>,<from>,<to>
Factor=N/A,1257787,1,64

P-1
Pminus1=[<AID>,|N/A,|<nul>]<k>,<b>,<n>,<c>,<B1>,<B2>[,"comma-separated-list-of-known-factors"]
Pfactor=[<AID>,|N/A,|<nul>]<k>,<b>,<n>,<c>,<how_far_factored>,<tests_saved>
Pminus1=1,2,660000031,-1,3600000,180000000
Pfactor=N/A,1,2,500000693,-1,82,2
Pfactor=N/A,1,2,500000693,-1,82,1.4

P+1 (new with v30.6)
(Note, not productive or recommended for GIMPS wavefront work!)
Pplus1=[<AID>,|N/A,|<nul>]<k>,<b>,<n>,<c>,<B1>,<B2>,<nth_run>[,<how_far_factored>][,"comma-separated-list-of-known-factors"]
Pplus1=N/A,1,2,103598543,-1,1000000,30000000,1
Pplus1=N/A,1,2,103598543,-1,1000000,30000000,2
Pplus1=1,2,103598543,-1,1000000,30000000,3

LL
Test=[<AID>,|N/A,|<nul>]<exponent>,<how_far_factored>,<p-1_done>
Doublecheck=[<AID>,|N/A,|<nul>]<exponent>,<how_far_factored>,<p-1_done>
Test=N/A,82589933,82,1

PRP (and PRP DC for manual assignments, or most versions)
PRP=[<AID>,|N/A,|<nul>]<k>,<b>,<n>,<c>[,<how_far_factored>,<tests_saved>[,<prp_base>,<residue_type>[,"comma-separated-list-of-known-factors"]]]
PRP=N/A,1,2,82589933,-1 (Mersenne prime record)
PRP=N/A,1,2,268435459,1,80,0,3,5,"3" (Wagstaff number)
PRP=1,2,82589933,-1,82,0 (to have PrimeNet issue an AID for it at the next checkin)
NOTE: as of v30.x, it's recommended to include <how_far_factored>,<tests_saved>, to prevent repeating unnecessary TF from 0 bits, and to prevent repeating unnecessary P-1 factoring

PRPDC (via PrimeNet API with new enough prime95/mprime version; per Woltman post; not independently confirmed yet)
PRPDC= instead of PRP=, otherwise same as for PRP as shown above
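By analogy with the PRP examples above (illustrative; not independently confirmed, per the note):
PRPDC=N/A,1,2,82589933,-1,82,1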

PRP-CF / PRP-CF-DC
see above, PRP, with comma separated list of known factors
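A made-up illustrative entry using the long-known factorization 2^67 - 1 = 193707721 x 761838257287; listing one known factor directs a PRP test of the remaining cofactor (field values other than the factor are invented):
PRP=N/A,1,2,67,-1,0,0,3,5,"193707721"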

Comment
;whatever comment you want, including valid worktodo entry line copies

PRP Certificate/verification (beginning at v30.1b1)
Cert=AID,k,b,n,c,squarings
Cert=(redacted),1,2,97457587,-1,380694
Mlucas
Supports Fermat numbers and Mersenne numbers. Only Mersenne numbers are covered here.
LL
[Test|DoubleCheck]=[<AID>,]<exponent>,<how_far_factored>,<p-1_done>
<exponent> (does an LL test of the corresponding Mersenne number)
DoubleCheck=B83D23BF447184F586470457AD1E03AF,22831811,66,1
Test=DDD21F2A0B252E499A9F9020E02FE232,48295213,69,0
Test=332220523,80,1
332220523

PRP
V19 adds PRP and PRP DC forms, for Mersenne numbers only, residue-type 1 or 5:
PRP=<AID>,<k>,<b>,<n>,<c>,<to>,<tests_saved>
PRP=<AID>,<k>,<b>,<n>,<c>,<to>,<tests_saved>,<base-of-first-test>,<residue-type>
PRP=<AID>,1,2,<n>,-1,<to>,<tests_saved>
PRP=<AID>,1,2,<n>,-1,<to>,<tests_saved>,<base-of-first-test>,[1|5]
PRP=0123456789ABCDEF0123456789ABCDEF,1,2,332220523,-1,0
PRP=0123456789ABCDEF0123456789ABCDEF,1,2,332220523,-1,0,3,1

P-1 (new with V20.0; following is from my reading of its Mlucas.c and error messages from experiments; note, full length AID is required, even for manually generated unassigned lines)
P[M|m]inus1=<AID>,<k>,<b>,<n>,<c>,<B1>,<B2>[,"comma-separated-list-of-known-factors"]
P[f|F]actor=<AID>,<k>,<b>,<n>,<c>,<how_far_factored>,<tests_saved>
PFactor=00000000000000000000000000000000,1,2,468000023,-1,82,2
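A made-up Pminus1 example, with the required full-length (here all-zero) AID and invented bounds:
Pminus1=00000000000000000000000000000000,1,2,468000023,-1,3600000,180000000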

ECM (this appears in the V20 source code but, from a quick look, is not functional for Mersennes)
Mlucas primenet.py (work queuing)
[Test|DoubleCheck]=<AID>,<exponent>,<how_far_factored>,<p-1_done>
PRP=<AID>,<k>,<b>,<n>,<c>,<to>,<tests_saved>
PRP=<AID>,<k>,<b>,<n>,<c>,<to>,<tests_saved>,<base-of-first-test>,<residue-type>

Mfactor
Supports TF on Mersenne numbers, double mersennes, and Fermat numbers.
No worktodo support. Command line options only.

Factor5
Supports TF on Mersenne numbers.
No worktodo support. Command line options only.

Top of this reference thread: https://www.mersenneforum.org/showpo...89&postcount=1
Top of reference tree: https://www.mersenneforum.org/showpo...22&postcount=1
