mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Prime95 v30.4/30.5/30.6 (https://www.mersenneforum.org/showthread.php?t=26376)

James Heinrich 2021-08-27 09:53

[QUOTE=diep;586606]other than source doesn't download[/QUOTE]The source link should be [url]https://www.mersenne.org/ftp_root/gimps/p95v306b4.source.zip[/url]

moytrage 2021-08-28 14:32

Link to all Prime95 soft versions
 
[QUOTE=diep;586599]is p95 30.3 v6 the latest build?[/QUOTE]

You can see a full list of Prime95 software versions at [url]https://www.mersenne.org/ftp_root/gimps/[/url]. This link gives you a browsable, downloadable directory with an archive of all versions, from oldest to newest. Scroll down to the very bottom to see the newest versions.

diep 2021-08-28 21:05

thanks for the reactions!

Question: how do I have P95 log each k * 2^n - 1 Riesel residue to a text file on disk after the number has been found to be composite?

So something like this in a text file, and nothing else (except when it's prime, of course! But I see the box could even be made to beep in that case):

32767*2^4011041-1 is not prime. LLR Res64: 04F8A5D38CED2C05 Time : 17772.735 sec.
32767*2^4012161-1 is not prime. LLR Res64: 41073B7FFFBA3580 Time : 16653.764 sec.
32767*2^4013153-1 is not prime. LLR Res64: C1396DC9FC070AA2 Time : 15868.032 sec.
32767*2^4014993-1 is not prime. LLR Res64: 51C0FD8DBB946348 Time : 16358.199 sec.
32767*2^4015793-1 is not prime. LLR Res64: 3582E75CB445F99D Time : 16236.208 sec.
32767*2^4017473-1 is not prime. LLR Res64: 3E7C80E3227E14DF Time : 17978.968 sec.
32767*2^4018593-1 is not prime. LLR Res64: E1DEB5A16C48F9B9 Time : 16855.197 sec.
32767*2^4019985-1 is not prime. LLR Res64: 465780C4776B339C Time : 17471.892 sec.
32767*2^4021121-1 is not prime. LLR Res64: E869291EA7523B47 Time : 16854.644 sec.
32767*2^4022049-1 is not prime. LLR Res64: DB919CF0776935F4 Time : 16119.627 sec.
32767*2^4023809-1 is not prime. LLR Res64: B4569E92043DE985 Time : 17582.490 sec.
32767*2^4024929-1 is not prime. LLR Res64: DDE16460198F436E Time : 16172.751 sec.
32767*2^4025649-1 is not prime. LLR Res64: 6DA8FE747A427D9B Time : 15618.931 sec.
32767*2^4026945-1 is not prime. LLR Res64: 50DBCA4AA364B7AD Time : 14572.847 sec.
32767*2^4028273-1 is not prime. LLR Res64: 437022E8C82A4A0C Time : 14656.529 sec.
32767*2^4029569-1 is not prime. LLR Res64: B338BAAB048344EA Time : 18490.592 sec.
32767*2^4030193-1 is not prime. LLR Res64: 928E6CEEE20DC864 Time : 18691.475 sec.
32767*2^4031409-1 is not prime. LLR Res64: 87F2E55E3E4785D3 Time : 16519.308 sec.
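
Whichever program ends up writing them, result lines in this format are easy to post-process. A minimal sketch, assuming each result sits on one line exactly as in the sample above; the names `RESULT_RE`, `RieselResult` and `parse_result_line` are illustrative and not part of Prime95 or LLR.

```python
import re
from typing import NamedTuple, Optional

# Matches composite-result lines in the format quoted above, e.g.
# "32767*2^4011041-1 is not prime. LLR Res64: 04F8A5D38CED2C05 Time : 17772.735 sec."
RESULT_RE = re.compile(
    r"^(?P<k>\d+)\*2\^(?P<n>\d+)-1 is not prime\.\s+"
    r"LLR Res64: (?P<res64>[0-9A-F]{16})\s+"
    r"Time : (?P<seconds>\d+(?:\.\d+)?) sec\.$"
)


class RieselResult(NamedTuple):
    k: int          # multiplier in k*2^n-1
    n: int          # exponent in k*2^n-1
    res64: str      # 64-bit residue as 16 hex digits
    seconds: float  # reported test time


def parse_result_line(line: str) -> Optional[RieselResult]:
    """Return the parsed result, or None if the line has another format."""
    m = RESULT_RE.match(line.strip())
    if m is None:
        return None
    return RieselResult(int(m["k"]), int(m["n"]), m["res64"], float(m["seconds"]))


if __name__ == "__main__":
    sample = ("32767*2^4011041-1 is not prime. "
              "LLR Res64: 04F8A5D38CED2C05 Time : 17772.735 sec.")
    print(parse_result_line(sample))
```

Lines that do not match (for example a line reporting a prime) come back as `None`, so they can be handled separately.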

diep 2021-08-28 21:11

Please note that the P95 GUI messes up my mouse cursor. Is there a way to fix that? (This is on win2019pro.)

diep 2021-08-28 22:30

P95 generates a proof afterwards, which wastes tens of thousands of squarings, if not more. How do I turn this off?
It also writes a .residues file, seemingly for this, all while we are trying to do a PRP test quickly. How do I stop it from writing that?

I had already added some things to the different files:

to prime.txt (of version 30.6) :
ProofUploads=0
PreallocateDisk=0
PrintTimedIterations=0
OutputIterations=2000000

And to local.txt:
CertWork=0

Though I have no idea what it does; undoc.txt does not say what this variable means.

What more can i modify?

Thanks in Advance,
Vincent

VBCurtis 2021-08-29 00:14

Why are you trying to make P95 do what LLR is designed for?

paulunderwood 2021-08-29 01:56

[QUOTE=VBCurtis;586772]Why are you trying to make P95 do what LLR is designed for?[/QUOTE]

I recommended off-forum that Vincent use Prime95, mainly because there is no 64-bit Windows version of LLR that has GEC (Gerbicz Error Correction). Also, Prime95 has the nice worktodo.add service, making administration of candidates easier on his multi-core system.

They're about equal for speed of tests.

At the moment the main problem he has is turning off proof generation, which is unnecessary and uses lots of disk space.

Prime95 2021-08-29 03:18

Try WorkerDiskSpace=0

diep 2021-08-29 10:22

[QUOTE=VBCurtis;586772]Why are you trying to make P95 do what LLR is designed for?[/QUOTE]

cllr.exe is 40% slower than all this.

cllr64.exe version 3.8.23 with gwnum 29.8 is the same speed as P95, but Paul is scaring me by saying it has no Gerbicz error checking.

P.S. cllr.exe is 40% slower systematically across all 12 processes, so that is an average. Really every process.

And with scheduling by hand in Windows I haven't figured out how to do this cleverly; right now I do it with tons of mouse clicks for each process. I give each process its own socket, which normally gives optimal speed on such a 2-socket Intel box.

kriesel 2021-08-29 11:42

Per the whatsnew.txt that ships with mprime / prime95, v29.4-v29.8 have GEC but not proof file generation.

Zhangrc 2021-08-29 11:50

[QUOTE=kriesel;586799]Per the whatsnew.txt that ships with mprime / prime95, v29.4-v29.8 have GEC but not proof file generation.[/QUOTE]

They are about 5% slower than the current version. I see no point using them.

diep 2021-08-29 12:13

[QUOTE=Prime95;586779]Try WorkerDiskSpace=0[/QUOTE]

Obviously trying that now, and then I'll measure the speed difference against the old 29.8.

diep 2021-08-29 12:14

[QUOTE=Zhangrc;586800]They are about 5% slower than the current version. I see no point using them.[/QUOTE]

What hardware are you benchmarking this on?

Zhangrc 2021-08-29 16:08

[QUOTE=diep;586804]What hardware are you benchmarking this on?[/QUOTE]

On my laptop AMD R7 4800H @4GHz, 8G*2 DDR4, and 512G NVME SSD.

diep 2021-08-29 17:44

[QUOTE=Zhangrc;586814]On my laptop AMD R7 4800H @4GHz, 8G*2 DDR4, and 512G NVME SSD.[/QUOTE]

Ah yes, AMD. I believe you right away. "Only a 5% speed gain" would be small then.

cllr64.exe version 3.8.23 (gwnum 29.8) looks identical in timing to the latest P95 (gwnum 30.6) on my Intel Xeon E5-2699 v4 ES, 22 cores per CPU, so 44 cores in total.
Under full load it runs at 2.0 GHz and draws 360 W measured at the wall (any extra GPU power from moving the mouse a lot is not included, as I just let it calculate).

With water cooling, of course, which keeps the CPU cores at 50C here.

Modern processors eat easily 10-20% more power when running far above room temperature (or far under room temperature).

Cruelty 2021-08-29 23:20

Just migrated from 29.8 to 30.6 on Win64 for some of my PRP tests.
Is there a definitive way to disable proof generation?
This solution:
[code]WorkerDiskSpace=0[/code] doesn't disable it and every couple of minutes (ca. 2) I get a new "residue file" > 512MB.

Prime95 2021-08-30 04:35

[QUOTE=Cruelty;586832]Just migrated from 29.8 to 30.6 on Win64 for some of my PRP tests.
Is there a definitive way to disable proof generation?[/QUOTE]

Try ProofPower=0

Cruelty 2021-08-30 10:20

[QUOTE=Prime95;586843]Try ProofPower=0[/QUOTE]
I tried putting it both in local.txt and prime.txt, but it doesn't work. Residue file is still being created+updated frequently.

diep 2021-08-30 13:51

Avoids massive amounts of files here!
Makes things more workable here! (the WorkerDiskSpace=0)

Cruelty 2021-08-30 14:55

This is strange, as I left the PC running for several hours and now "residue" files are gone, so I am not really sure what fixed that :unsure:

kriesel 2021-08-30 18:01

[QUOTE=Cruelty;586861]This is strange, as I left the PC running for several hours and now "residue" files are gone, so I am not really sure what fixed that :unsure:[/QUOTE]Stopping and restarting will apply new settings. Or in some cases the next worktodo assignment will benefit from them without a stop/restart of the program. There are also some settings that are applied immediately, such as changing allowed ram for P-1/P+1/ECM, generating a worker stop/continue-with-new-limits if applicable.

Cruelty 2021-08-30 20:41

I have edited config files while the application was not running, and started it after I saved the configs, so it must have been the next worktodo assignment that triggered it.
Thanks for the clarification :smile:

chalsall 2021-08-30 21:00

[QUOTE=Cruelty;586887]I have edited config files while the application was not running, and started it after I saved the configs, so it must have been the next worktodo assignment that triggered it.[/QUOTE]

Not running as in paused? Or fully exited? Very different states.

[URL="https://www.youtube.com/watch?v=rksCTVFtjM4"]Have you tried turning it off and on again?[/URL] is asked by all Help Desks for a reason (no matter the complaint).

tha 2021-08-31 13:45

I use: Linux64,Prime95,v30.6,build 3

My settings:
[CODE]Temporary disk space limit in GB/worker (6.000000):
Daytime P-1/P+1/ECM stage 2 memory in GB (10.800000):
Nighttime P-1/P+1/ECM stage 2 memory in GB (10.800000):
Upload bandwidth limit in Mbps (5.000000):
Upload large files time period start (00:00):
Upload large files time period end (24:00):
Download limit for certification work in MB/day (400):
Skip advanced resource settings (Y):
[/CODE]

Typical workload:
[CODE]
Pminus1=N/A,1,2,9380477,-1,2000000,21600000,71
Pminus1=N/A,1,2,9380521,-1,2000000,21600000,71
Pminus1=N/A,1,2,9380633,-1,2000000,21600000,71
Pminus1=N/A,1,2,9381019,-1,2000000,21600000,71[/CODE]

Whenever Mprime is the only application running, I never see things go wrong. When I run Firefox and surf the internet, usually youtube, multiple times a week the SSD will start to do an insane amount of traffic handling, the screen will freeze except for the mouse pointer which will be severely hampered and eventually mprime will be killed by the OS (Ubuntu 20.04)

Has done that for multiple weeks.

Anything I can do to get better clues of what happens?

axn 2021-08-31 14:01

You're probably running low on RAM. What's your swapfile settings?

kriesel 2021-08-31 14:47

[QUOTE=tha;586922]I use: Linux64,Prime95,v30.6,build 3
... When I run Firefox and surf the internet, usually youtube, multiple times a week the SSD will start to do an insane amount of traffic handling[/QUOTE]
What are your system specs, especially ram installed? What does a good system monitor say about free ram, & committed ram, when running just mprime and when also running a web browser? (Top or similar in Linux, Task Manager in Windows, either/both if WSL is in play.)

It may help to pare back the mprime ram allowance from 10.8 GiB to 9 or 8 (making sure day and night settings match). Or reduce number of open tabs in the web browser. Firefox is a notorious memory hog. A quick web search of "firefox memory usage per tab" yields many links, including [url]https://support.mozilla.org/en-US/kb/firefox-uses-too-much-memory-or-cpu-resources?redirectslug=firefox-uses-too-much-memory-ram&redirectlocale=en-US[/url] (Who knew Firefox has its own task manager?)
See also mprime's PauseWhileRunning as a possibility.
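
To put numbers on that advice, the figures a system monitor shows can also be read straight from Linux's `/proc/meminfo`. A rough sketch: the parser follows the usual `Name:   value kB` layout of that file, while `ram_left_gib` and the 16 GiB sample are purely illustrative (the poster's installed RAM is not stated in the thread).

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in KiB}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def ram_left_gib(info, allowance_gib):
    """RAM left for everything else if the stage 2 allowance is fully used."""
    return info["MemTotal"] / 2**20 - allowance_gib


if __name__ == "__main__":
    # Hypothetical machine with 16 GiB installed and no swap; on a real
    # system, pass open("/proc/meminfo").read() instead.
    sample = (
        "MemTotal:       16777216 kB\n"
        "MemAvailable:    4194304 kB\n"
        "SwapTotal:             0 kB\n"
    )
    info = parse_meminfo(sample)
    print(f"left with a 10.8 GiB allowance: {ram_left_gib(info, 10.8):.1f} GiB")
    print(f"left with an 8 GiB allowance:   {ram_left_gib(info, 8.0):.1f} GiB")
```

If the remainder is smaller than what the browser plus the rest of the desktop typically need, and there is little or no swap, the OS killing mprime is the expected outcome.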

[QUOTE=chalsall;586889][URL="https://www.youtube.com/watch?v=rksCTVFtjM4"]Have you tried turning it off and on again?[/URL][/QUOTE]Nice t-shirt.

chalsall 2021-08-31 16:03

[QUOTE=kriesel;586926]Top or similar in Linux...[/QUOTE]

I like to use [C]vmstat -w 60[/C] for long-term trending. Also, installing and running [C]nmon[/C] is worth the effort.

pepi37 2021-08-31 23:52

[QUOTE=Prime95;586843]Try ProofPower=0[/QUOTE]
Since you write this application, please add a switch for users who do not need proof generation.
I use it for PRP on CRUS-based sequences as well as on near-repdigit sequences, on Win and Linux, and I love this piece of software, so I would like to use the latest version, but without proof generation.

Thanks
Hope this is not a big problem for you

PhilF 2021-09-01 00:17

[QUOTE=chalsall;586932]Also, installing and running [C]nmon[/C] is worth the effort.[/QUOTE]

Thanks for that!

But you were right -- it was a LOT of effort:

aptitude update
aptitude install nmon

:smile:

Prime95 2021-09-01 02:04

[QUOTE=pepi37;586951]Since you write this application, please add a switch for users who do not need proof generation.[/QUOTE]

What's wrong with ProofPower=0?

pepi37 2021-09-01 11:11

[QUOTE=Prime95;586958]What's wrong with ProofPower=0?[/QUOTE]


Thank you! Works perfect!
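
For anyone scripting this across several machines, the setting can be written with a small helper. This is a sketch under assumptions: it treats prime.txt as plain `key=value` lines with optional `[section]` headers and places global options before the first header; `set_option` is a made-up name, not a Prime95 tool. As noted earlier in the thread, edit the file while the program is fully exited, or expect the change to take effect on restart or on the next assignment.

```python
def set_option(text, key, value):
    """Return config text with key=value set in the global (pre-section) area."""
    lines = text.splitlines()
    # Assumption: global settings end where the first "[section]" header begins.
    end = next(
        (i for i, ln in enumerate(lines) if ln.lstrip().startswith("[")),
        len(lines),
    )
    for i in range(end):
        if "=" in lines[i] and lines[i].split("=", 1)[0].strip() == key:
            lines[i] = f"{key}={value}"  # overwrite an existing entry
            break
    else:
        lines.insert(end, f"{key}={value}")  # otherwise add it to the global area
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    before = "OutputIterations=2000000\n[PrimeNet]\nDebug=0\n"
    print(set_option(before, "ProofPower", "0"))
```

To apply it to a real file, read prime.txt, pass its contents through `set_option`, and write the result back, keeping a backup copy first.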

SethTro 2021-09-02 19:55

I missed a comma in a PMinus1 assignment which resulted in only stage one happening (which it completed).
When I added the exponent back with

PMinus1=1,2,21317,-1,30000000000,0,67,2

[CODE]
[Worker #5 Sep 1 23:29] P-1 on M21317 with B1=30000000000, B2=TBD
[Worker #5 Sep 1 23:29] Using AVX FFT length 1K
[Worker #5 Sep 1 23:29] M21317 stage 1 complete. 0 transforms. Time: 0.887 sec.
[Worker #5 Sep 1 23:29] Stage 1 GCD complete. Time: 0.001 sec.
[Worker #5 Sep 1 23:29] Available memory is 31000MB.
(6 hours to make up its mind)
[Worker #5 Sep 2 05:36] With trial factoring done to 2^67, optimal B2 is 614891451*B1 = 18446743530000000000.
[Worker #5 Sep 2 05:36] If no prior P-1, chance of a new factor is 41.6%

[/CODE]I've seen 200*B1 and 1000*B1 with ecm but never 614,000,000*B1
Also weird that it took 6 hours to make up its mind.

Prime95 2021-09-03 01:18

[QUOTE=SethTro;587096]
PMinus1=1,2,21317,-1,30000000000,0,67,2
I've seen 200*B1 and 1000*B1 with ecm but never 614,000,000*B1
Also weird that it took 6 hours to make up its mind.[/QUOTE]

The 6 hours bug is fixed in 30.7. I've also capped the max B2/B1 at 1000 (can override with a prime.txt setting).

It may well be that large B2/B1 ratio makes sense. Prime95 looks at the incremental cost associated with increasing the B2/B1 ratio and asks "if I invested that cost in increasing B1, which would have the better chance of finding a factor"?

With your ultra large B1 I suppose there is little to be gained in raising B1, thus prime95 concludes a very large B2/B1 ratio makes sense. Your version finally stopped at a ratio of 614000000 because B2*B1 overflowed 64 bits.
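
The numbers in the log are consistent with a 64-bit ceiling. The check below is a sketch that reads "overflowed 64 bits" as B2 = ratio * B1 no longer fitting in an unsigned 64-bit integer; that reading is an assumption, not something taken from the Prime95 source, and `max_b2_ratio` is an illustrative name.

```python
U64_MAX = 2**64 - 1


def max_b2_ratio(b1, limit=U64_MAX):
    """Largest integer ratio for which ratio * b1 still fits under the limit."""
    return limit // b1


if __name__ == "__main__":
    b1 = 30_000_000_000
    logged_ratio = 614_891_451          # from the "optimal B2 is ..." log line
    logged_b2 = logged_ratio * b1

    print("B2 from the log:        ", logged_b2)
    print("fits in 64 bits:        ", logged_b2 <= U64_MAX)
    print("largest ratio that fits:", max_b2_ratio(b1))
```

The logged ratio sits just below the largest ratio that fits, which matches the description of the search stopping at the overflow point.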

Prime95 2021-09-03 03:19

[QUOTE=SethTro;587096]
PMinus1=1,2,21317,-1,30000000000,0,67,2[/QUOTE]

30.7 is going to have real trouble running stage 2 on this. Maybe if you keep the B2/B1 ratio low it will work.

30.7 uses a new prime pairing algorithm that is mainly targeting the common cases: B1 from 50,000 to 100,000,000 and B2/B1 from 10 to 200.

axn 2021-09-03 04:28

21317 has ECM done till half t45. If you convert that to a TF depth and use that instead of 67, maybe it will generate reasonable B2?

EDIT:- Or realize that asking P95 to generate "optimal" B2 with completely made up numbers like "67,2" is piling nonsense upon nonsense, and instead just specify the B2 manually (say B1*200).

EDIT2:- But naturally, this small an exponent should use GMP-ECM to do stage 2.

pepi37 2021-09-03 06:38

To raise an old question from the dust: can an option be implemented in Prime95 to stop a k when a prime is found (same as LLR has)? It would be great for CRUS searches.

SethTro 2021-09-03 17:58

I think changing worker threads reset my progress on ECM.

BL;UP
Worker #2 is on curve 646, then I change to 1 worker with 8 threads and the ECM work item gets moved to Worker #1 and progress gets reset back to curve 1.

I do have e0125429 and e0125429_1 so it's likely some status has been saved but it's hard to inspect these files by hand to see which curve it thinks it's on.

Excerpts from my logs


[CODE]
[Worker #2 Sep 3 00:47] ECM on M125429: curve #644 with s=8339439611366174, B1=3000000, B2=300000000
[Worker #2 Sep 3 00:59] Stage 1 complete. 78140227 transforms, 1 modular inverses. Time: 668.117 sec.
[Worker #2 Sep 3 01:03] Stage 2 complete. 28174299 transforms, 2 modular inverses. Time: 255.439 sec.
[Worker #2 Sep 3 01:03] ECM on M125429: curve #645 with s=8285950124090187, B1=3000000, B2=300000000
[Worker #2 Sep 3 01:14] Stage 1 complete. 78140227 transforms, 1 modular inverses. Time: 670.593 sec.
[Worker #2 Sep 3 01:18] Stage 2 complete. 28174299 transforms, 2 modular inverses. Time: 252.148 sec.
[Worker #2 Sep 3 01:18] Stage 2 GCD complete. Time: 0.004 sec.
[Worker #2 Sep 3 01:18] ECM on M125429: curve #646 with s=5349357897010496, B1=3000000, B2=300000000

Main Menu

1. Test/Primenet
2. Test/Worker threads
3. Test/Status
4. Test/Stop
5. Test/Exit
6. Advanced/Test
7. Advanced/Time
8. Advanced/P-1
9. Advanced/ECM
10. Advanced/Manual Communication
11. Advanced/Unreserve Exponent
12. Advanced/Quit Gimps
13. Options/CPU
14. Options/Resource Limits
15. Options/Preferences
16. Options/Torture Test
17. Options/Benchmark
18. Help/About
19. Help/About PrimeNet Server
Your choice: 3

Below is a report on the work you have queued and any expected completion
dates.
[Worker thread #1]
No work queued up.
[Worker thread #2]
M125429, ECM 750 curves B1=3000000, Sun Sep 5 12:14 2021
[Worker thread #3]
No work queued up.
[Worker thread #4]
No work queued up.
[Worker thread #5]
No work queued up.
[Worker thread #6]
No work queued up.
[Worker thread #7]
No work queued up.


Hit enter to continue: [Worker #2 Sep 3 01:30] Stage 1 complete. 78140227 transforms, 1 modular inverses. Time: 688.191 sec.
[Worker #2 Sep 3 01:30] Using 892MB of memory in stage 2.
[Worker #2 Sep 3 01:30] Stage 2 init complete. 160213 transforms, 1 modular inverses. Time: 1.430 sec.
[Worker #2 Sep 3 01:34] Stage 2 complete. 28174299 transforms, 2 modular inverses. Time: 263.829 sec.
[Worker #2 Sep 3 01:34] Stage 2 GCD complete. Time: 0.006 sec.
[Worker #2 Sep 3 01:34] ECM on M125429: curve #647 with s=2997245615684352, B1=3000000, B2=300000000
[Worker #4 Sep 3 01:42] Resuming.
[Worker #4 Sep 3 01:42] No work to do at the present time. Waiting.

Main Menu

1. Test/Primenet
2. Test/Worker threads
3. Test/Status
4. Test/Stop
5. Test/Exit
6. Advanced/Test
7. Advanced/Time
8. Advanced/P-1
9. Advanced/ECM
10. Advanced/Manual Communication
11. Advanced/Unreserve Exponent
12. Advanced/Quit Gimps
13. Options/CPU
14. Options/Resource Limits
15. Options/Preferences
16. Options/Torture Test
17. Options/Benchmark
18. Help/About
19. Help/About PrimeNet Server
Your choice: 2

Number of workers to run (7): 1

CPU cores to use (multithreading) (1): 8

Accept the answers above? (Y): Y
[Main thread Sep 3 01:43] Restarting all worker windows.
Main Menu

1. Test/Primenet
2. Test/Worker threads
3. Test/Status
4. Test/Stop
5. Test/Exit
6. Advanced/Test
7. Advanced/Time
8. Advanced/P-1
9. Advanced/ECM
10. Advanced/Manual Communication
11. Advanced/Unreserve Exponent
12. Advanced/Quit Gimps
13. Options/CPU
14. Options/Resource Limits
15. Options/Preferences
16. Options/Torture Test
17. Options/Benchmark
18. Help/About
19. Help/About PrimeNet Server
Your choice: [Worker #1 Sep 3 01:43] Resuming.
[Worker #1 Sep 3 01:43] Worker stopped.
[Worker #4 Sep 3 01:43] Resuming.
[Worker #4 Sep 3 01:43] Worker stopped.
[Worker #7 Sep 3 01:43] Resuming.
[Worker #7 Sep 3 01:43] Worker stopped.
[Worker #3 Sep 3 01:43] Resuming.
[Worker #3 Sep 3 01:43] Worker stopped.
[Worker #6 Sep 3 01:43] Resuming.
[Worker #6 Sep 3 01:43] Worker stopped.
[Worker #5 Sep 3 01:43] Resuming.
[Worker #5 Sep 3 01:43] Worker stopped.
[Worker #2 Sep 3 01:43] Worker stopped.
[Main thread Sep 3 01:43] Restarting all worker windows using new settings.
[Main thread Sep 3 01:43] Too many sections in worktodo.txt. Moving work from section #2 to #1.
[Main thread Sep 3 01:43] Too many sections in worktodo.txt. Moving work from section #3 to #1.
[Main thread Sep 3 01:43] Too many sections in worktodo.txt. Moving work from section #4 to #1.
[Main thread Sep 3 01:43] Too many sections in worktodo.txt. Moving work from section #5 to #1.
[Main thread Sep 3 01:43] Too many sections in worktodo.txt. Moving work from section #6 to #1.
[Main thread Sep 3 01:43] Too many sections in worktodo.txt. Moving work from section #7 to #1.
[Worker #1 Sep 3 01:43] Worker starting
[Worker #1 Sep 3 01:43] Setting affinity to run worker on CPU core #2
[Worker #1 Sep 3 01:43] Setting affinity to run helper thread 1 on CPU core #3
[Worker #1 Sep 3 01:43] Setting affinity to run helper thread 3 on CPU core #5
[Worker #1 Sep 3 01:43] Setting affinity to run helper thread 4 on CPU core #6
[Worker #1 Sep 3 01:43] Setting affinity to run helper thread 5 on CPU core #7
[Worker #1 Sep 3 01:43] Setting affinity to run helper thread 2 on CPU core #4
[Worker #1 Sep 3 01:43] Using FMA3 FFT length 6K, Pass1=128, Pass2=48, clm=2, 6 threads
[Worker #1 Sep 3 01:43] ECM on M125429: curve #1 with s=4174744624872270, B1=3000000, B2=300000000
[Worker #1 Sep 3 02:38] Stage 1 complete. 78140227 transforms, 1 modular inverses. Time: 3308.460 sec.
[Worker #1 Sep 3 02:56] Stage 2 complete. 28174303 transforms, 2 modular inverses. Time: 1042.426 sec.
[Worker #1 Sep 3 02:56] ECM on M125429: curve #2 with s=1008188240961680, B1=3000000, B2=300000000
[Worker #1 Sep 3 03:49] Stage 1 complete. 78140227 transforms, 1 modular inverses. Time: 3180.581 sec.
[Worker #1 Sep 3 04:06] Stage 2 complete. 28174299 transforms, 2 modular inverses. Time: 1009.045 sec.
[Worker #1 Sep 3 04:06] ECM on M125429: curve #3 with s=2585226154145931, B1=3000000, B2=300000000
[/CODE]
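
A reset like this is easy to spot by scanning the saved screen output for the curve counter. A small sketch, assuming log lines shaped like the excerpt above; `last_curves` is an illustrative helper, and it reads only the text log, not the ECM save files.

```python
import re

# "[Worker #2 Sep 3 01:34] ECM on M125429: curve #647 with s=..., B1=..., B2=..."
# search() is used because menu prompts can precede the bracket on the same line.
CURVE_RE = re.compile(r"\[Worker #(\d+) [^\]]*\] ECM on M(\d+): curve #(\d+)")


def last_curves(lines):
    """Map (worker, exponent) to the last curve number seen, in log order."""
    last = {}
    for line in lines:
        m = CURVE_RE.search(line)
        if m:
            worker, exponent, curve = map(int, m.groups())
            last[(worker, exponent)] = curve
    return last


if __name__ == "__main__":
    log = [
        "[Worker #2 Sep 3 01:34] ECM on M125429: curve #647 with s=2997245615684352, B1=3000000, B2=300000000",
        "[Main thread Sep 3 01:43] Too many sections in worktodo.txt. Moving work from section #2 to #1.",
        "[Worker #1 Sep 3 04:06] ECM on M125429: curve #3 with s=2585226154145931, B1=3000000, B2=300000000",
    ]
    for (worker, exponent), curve in last_curves(log).items():
        print(f"worker {worker}: M{exponent} last seen at curve #{curve}")
```

The same exponent showing up under two workers with very different curve numbers is the pattern to look for.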

Prime95 2021-09-03 19:30

[QUOTE=SethTro;587187]I think changing worker threads reset my progress on ECM.

I do have e0125429 and e0125429_1 so it's likely some status has been saved but it's hard to inspect these files by hand to see which curve it thinks it's on.[/QUOTE]

ECM save file names are a mess. The _1 is there so that multiple workers can all ECM the same number.

Step 1: backup the two files.
Step 2: delete and/or rename files until prime95 uses the one with 640+ curves.

Sorry for the trouble.

SethTro 2021-09-03 21:26

That worked.

Thanks for the advice and thanks for writing all of the code :)

Glenn 2021-09-11 00:29

Latest Prime95 Updating?
 
At what point should I update my Prime95 software? I'm currently using 30.6 Build 4, which seems to be the latest easily available, though I'm now reading about 30.7 in this thread.

James Heinrich 2021-09-11 00:47

Perhaps the better question is why the [url=https://www.mersenne.org/download/]download page[/url] still points to v30.3b6?

Glenn 2021-09-11 00:53

[QUOTE=James Heinrich;587668]Perhaps the better question is why the [url=https://www.mersenne.org/download/]download page[/url] still points to v30.3b6?[/QUOTE]

Good question! I would think that the home page should point to the latest stable version, whether 30.6b4 or a different version.:smile:

axn 2021-09-11 02:52

[QUOTE=Glenn;587666]though I'm now reading about 30.7 in this thread.[/QUOTE]
IIUC, 30.7 is still WIP with no ETA.

Supposedly with a better stage 2, and possibly integrated P-1 stage 1 with PRP.

Even if it dropped tomorrow, it will be a while before people had a chance to test and find out all the bugs. So we're ways away from 30.7 becoming "official".

I join James H in asking why 30.6 is still not official.

Prime95 2021-09-11 03:29

[QUOTE=axn;587672]I join James H in asking why 30.6 is still not official.[/QUOTE]

Several reasons:
1) The benefits are not too great for the average user. A PRP tester will see somewhat faster P-1 stage 2.
2) Upgrading to 30.6b4 will restart P-1 stage 2 from scratch. Let's face it, prior to upgrading most users won't read the fine print that this will happen. Upgrading from 30.6 to 30.7 has the same problem -- let's pay this upgrade frustration only once.
3) On my Linux boxes with no swap area I get occasional crashes during stage 2. The OS gives me the informative error message of "Killed". I suspect mprime is out of memory, but I've no idea why this happens during stage 2 rather than stage 2 init when gobs of memory is allocated.

30.7 is almost ready for some testing. The only feature remaining is Alder Lake support.

axn 2021-09-11 03:38

[QUOTE=Prime95;587675]Several reasons:
1) The benefits are not too great for the average user. A PRP tester will see somewhat faster P-1 stage 2.
2) Upgrading to 30.6b4 will restart P-1 stage 2 from scratch. Let's face it, prior to upgrading most users won't read the fine print that this will happen. Upgrading from 30.6 to 30.7 has the same problem -- let's pay this upgrade frustration only once.
...
30.7 is almost ready for some testing. The only feature remaining is Alder Lake support.
[/quote]
Fair enough.

[QUOTE=Prime95;587675]I suspect mprime is out of memory, but I've no idea why this happens during stage 2 rather than stage 2 init when gobs of memory is allocated.[/QUOTE]
I have observed this a few times myself. It happens when I fire up something (typically more tabs in my Firefox) which asks for more memory. I don't see any reason why this wouldn't happen with previous versions, except for the fact that the new stage 2 actually can use all available memory across the exponent range.

azhad 2021-09-20 06:43

Prime95 v30.6 b4 Out of Memory crashes
 
Hi,

I want to report that the latest beta, 30.6, crashes with Out of Memory, which does not happen with 30.3. I have reported this before, and I confirm it occurs randomly.

I have 24GB of RAM, and Prime95 30.3 can use 20GB.

For 30.6, I have to set the limit at 16GB, and even then it sometimes gives an Out of Memory message. The problem is that half the time Prime95 gets killed, so I have to keep watch when stage 2 runs. I hope this does not happen in 30.7. I can test if needed.

Kind Regards.

chalsall 2021-09-20 18:32

[QUOTE=azhad;588210]I have 24GB of RAM, and Prime95 30.3 can use 20GB.[/QUOTE]

This is very likely a misconfiguration. You are "pushing the edge case" too close to the edge; leave some margin.

How much virtual memory do you have available? Are *any* other programs running which might require unswappable RAM (note: some OSs launch CPU and memory hungry tasks at times)?

And, just for clarity for diagnosing your particular use-case...

Is this Prime95 (WinBlows) or mprime (Linux)? In either/both case(s), please always give the full underlying version of the distribution to assist with the analysis matrix.

kriesel 2021-09-20 19:14

On: i7-8750H laptop, 16GB ram, Win10 x64 Home 20H2 build 19042.1165,
with prime95 V30.6b4, I'm running a large-exponent P-1 stage 2. On launch it happily updates progress at about 1 hour intervals. Over time the update interval increases. I've seen it up to about a day. Generally it is a case of heavy swapping to SSD. A full prime95 stop, exit, restart generally clears it up, and it reappears later.
Originally I was running that with 12GB allowed in resource limits, with no such issue. I've cut it back to 10 and it's still happening.
Other loads on the system are a multitab Firefox session (3 Google Colab, and a bunch of other stuff, total 16 tabs);
Task Manager; mfaktc on the discrete GTX 1050 Ti; nothing on the IGP;
Ubuntu 18.04 LTS on WSL1 and Mlucas using nominally 2 cores +HT, but really wildly core-hopping 4 cores as it goes. Also have top running at a slow update rate.

I just pared prime95 back from 10GB allowed to 9, to see if that behaves better.
Something has 27G committed of a 39GB dynamically sized virtual memory, and 15G ram in use per Windows Task Manager.
Ubuntu top indicates Mlucas using only 1% of ram.
Windows Task Manager indicates top ram users are
prime95 8.5 GB
Firefox 0.7 GB
Mlucas 0.15 GB
The rest combined shown in Task Manager does not add up to the other 5+GB; maybe ~1. Firefox fluctuates up to 1GB without me touching it, and prime95's usage varies too.
Windows has a limit of page file size = 3 x ram size, so this system could go to ~60-64GB total virtual size before hitting a hard limit.

On another system with 16GB, multiple GPUs, and Win10 also, I've seen trouble (system crashes upon running out of virtual memory) when pushing above 60GB virtual. Gpuowl can push it over the edge if too many P-1 stage 2 or GCD coincide, even though the stage 2 are happening mostly on the GPUs.

Zhangrc 2021-09-21 02:53

[QUOTE=kriesel;588247]I just pared prime95 back from 10GB allowed to 9, to see if that behaves better.
[/QUOTE]
Try turning off the swapfile.sys and allocating 8GB for your Prime95 (If there are no other applications running on your computer) and see if this happens again.

kriesel 2021-09-21 04:02

[QUOTE=Zhangrc;588298]If there are no other applications running on your computer)[/QUOTE]Re-read the post before yours: Firefox, mfaktc, WSL Ubuntu, Mlucas, top, and Task Manager are all running.

kriesel 2021-09-23 17:55

prime95 idles lots of cores on a worker during P-1 GCD
 
1 Attachment(s)
See attachment. Same will apply to dual-manycore-Xeons and some other configurations.
P-1 during GCD idles all but 1 core of a worker. On a Xeon Phi this may be 64/n -1 or 68/n -1 cores, where n is number of workers. Depending on exponent the duration may be considerable.
Gpuowl handled this situation by running parallel threads, speculatively executing the next P-1 stage or the following PRP while the GCD ran in a separate thread. With normal bounds, the factor-finding probabilities are such that this pays off ~98% of the time when the same exponent is involved. And if it is a succession of P-1 on different exponents, or the following work is of another type on different exponents, whatever payoff there is occurs 100% of the time. Perhaps this approach would be productive on hyperthreaded CPUs in prime95 / mprime also. (And Mlucas?)
In the M880M case I was running, it took an hour for a GCD.

Similarly, during preallocation of disk space for the proof residues of an M500M PRP, it took a few minutes while nearly all cores of the worker were idle.
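
The speculative approach described above can be sketched in a few lines. This is a toy illustration of the control flow only, not Gpuowl's or prime95's code: `run_with_speculation` is a made-up name, the numbers are tiny, and in CPython the global interpreter lock means a big-integer `math.gcd` would not actually overlap with other Python work.

```python
from concurrent.futures import ThreadPoolExecutor
from math import gcd


def run_with_speculation(x, n, next_stage):
    """Start gcd(x, n) in a helper thread and run next_stage() meanwhile.

    Returns (factor, result). If the GCD finds a nontrivial factor, the
    speculative result is thrown away (None); otherwise factor is None.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(gcd, x, n)  # the slow, single-threaded step
        speculative = next_stage()        # only useful if no factor turns up
        g = pending.result()              # wait for the GCD before committing
    if 1 < g < n:
        return g, None
    return None, speculative


if __name__ == "__main__":
    # No factor found: the speculative work is kept.
    print(run_with_speculation(10, 21, lambda: "stage 2 started"))
    # Factor found: the speculative work is discarded.
    print(run_with_speculation(30, 21, lambda: "stage 2 started"))
```

The cost of the scheme is the discarded work in the minority of cases where the GCD does find a factor.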

chalsall 2021-09-23 18:09

[QUOTE=kriesel;588492]Similarly, during preallocation of disk space for the proof residues of an M500M PRP, it took a few minutes while nearly all cores of the worker were idle.[/QUOTE]

Coordination of the concurrency of processes is a non-trivial problem space.

There are cases where it simply makes sense to go single-threaded. Particularly when IOPS are involved.

Computers are cheap. Talented humans are ***very*** expensive... :smile:

kriesel 2021-09-23 18:20

[QUOTE=chalsall;588494]Coordination of the concurrency of processes is a non-trivial problem space.
... Talented humans are ***very*** expensive... :smile:[/QUOTE]Except when they're free. George does what he does not for the money.

After watching George deliver very well for a quarter century, it seems clear to me he's up to the task. Multiple workers using multiple cores each, plus a PrimeNet communications thread. Maybe it just goes on a to-do-someday list beneath some other priorities. Or maybe there are good reasons not to try it, that I'm unaware of.

Gpuowl source provides an example of how the GCD parallelism may be handled. Different situation GPU & CPU combined there, but still.

For proof space preallocation, the potential time saving is smaller, but one could compute a time estimate for space preallocation and a time estimate for when depositing the first proof residue will be needed, parallelize only when there's a comfortable time margin, and also ensure that it waits for completion of preallocation.

V30.7 is in preparation. AFAIK this includes P-1 speed improvements in primes pairing, & Alder Lake support. Not sure what else.

chalsall 2021-09-23 18:36

[QUOTE=kriesel;588497]Except when they're free. George does what he does not for the money.[/QUOTE]

Time is the fundamental currency. Perfect is the enemy of good. [URL="https://www.youtube.com/watch?v=v0nmHymgM7Y"]Much like a poem, software is never finished. Simply abandoned.[/URL]

George /might/ have made a conscious decision that the effort required (including all the "in the wild" debugging) was not worth the tiny amount of throughput which /might/ be gained.

Or, maybe, he's just busy with other stuff... :tu:

kriesel 2021-09-23 19:12

Let's ballpark these for ppm of system productivity.

P-1 GCD 1 hour at 880M on Xeon Phi 7210. I have another similar-exponent P-1 that's projecting a week left to go for about half of stage 2. So let's assume 30 days for both stages on 880M, 7210; 60 minutes x 2 stages / (30x24x60) x 15/16 ~2600 ppm = 0.26% of P-1 time, which is ~1/40 of PRP time, so ~62. ppm of exponent (TF + P-1 + PRP) time. That might become worthwhile to pursue at some point, depending on what other optimization opportunities remain and effort needed.

Preallocate PRP proof space 3 minutes at 500M on Xeon Phi 7210.
Forecast PRP time 328.5 days ~473040 minutes. 3/473040 x 15cores/16cores= 6. ppm of PRP time.
That would need to be a very quick modification to be worth the programming and test time. Seems unlikely.
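
The two ballpark figures above can be reproduced with a few lines of arithmetic. This is only a sketch: the function name `ppm_of_runtime` is made up for illustration, and the 15/16 factor is taken from the post, presumably as the share of cores left idle while one core does the serial step.

```python
def ppm_of_runtime(serial_minutes, total_minutes, idle_fraction=15 / 16):
    """Parts per million of total run time lost to a serial step (illustrative helper)."""
    return serial_minutes / total_minutes * idle_fraction * 1e6

# P-1 at 880M: two GCDs of about 60 minutes each, about 30 days for both stages.
p1_ppm = ppm_of_runtime(2 * 60, 30 * 24 * 60)

# PRP at 500M: about 3 minutes of proof-space preallocation, 473040 minutes forecast.
prp_ppm = ppm_of_runtime(3, 473040)

print(f"P-1 GCD: ~{p1_ppm:.0f} ppm of P-1 time")
print(f"Proof preallocation: ~{prp_ppm:.1f} ppm of PRP time")
```

The first figure lands near the ~2600 ppm quoted for P-1, the second near the ~6 ppm quoted for PRP.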

chalsall 2021-09-23 19:35

[QUOTE=kriesel;588500]Let's ballpark these for ppm of system productivity.[/QUOTE]

Let's... :wink:

You are working at the extreme edge. I understand the reasoning, but I would argue this should not inform "general policy".

My P-1'ers (using mprime (Linux64,Prime95,v30.5,build 2)) are currently taking about 5 seconds for the GCDs (single-threaded). Not a problem, in my Universe.

kriesel 2021-09-23 19:59

Assuming the ~p[SUP]2.1[/SUP] scaling also applies to GCD operations, and you're doing ~[B]106M[/B] P-1, there's a factor of ~4.2 unexplained difference in GCD speed in your favor. Maybe faster cores giving faster GCDs, and correspondingly faster stages too.

Timing I gave for large exponent was using ~10GB in stage 2, prime95 V30.6b4.


edit: chalsall's small exponent ~[B]27.4M[/B] more than explains the rest of the speed ratio. 5.05sec x 2 /2hr29min = 0.11% potential speedup for him. Except, i3-9100 is 4-core no hyperthreading. Gpuowl's parallelism came about because Mihai took pity on my multi-Radeon VII/slow-cpu-for-GCD P-1 factory, which spent ~5 minutes of a 40 minute wavefront P-1 factoring in single-cpu-core GCD with the GPU idle and waiting. System didn't have enough max ram to support dual-instance P-1 on its GPUs to mitigate it. 40/35 = ~14% P-1 speedup via speculative parallelism. As always, it's George's call what is worth George's time and what is not.

chalsall 2021-09-23 20:11

[QUOTE=kriesel;588502]...there's nearly a factor of 5 unexplained difference in GCD speed in your favor.[/QUOTE]

All I can do is give you my empirical.

[CODE][Work thread Sep 23 09:30] M27430621 stage 1 complete. 2997862 transforms. Time: 2750.641 sec.
[Work thread Sep 23 09:30] Starting stage 1 GCD - please be patient.
[Work thread Sep 23 09:30] Stage 1 GCD complete. Time: 5.052 sec.
[Work thread Sep 23 09:30] D: 462, relative primes: 857, stage 2 primes: 3303121, pair%=90.33
[Work thread Sep 23 09:30] Using 9996MB of memory.
[Work thread Sep 23 09:30] Stage 2 init complete. 7751 transforms. Time: 15.059 sec.

[Work thread Sep 23 11:12] M27430621 stage 2 complete. 4016210 transforms. Time: 6135.924 sec.
[Work thread Sep 23 11:12] Starting stage 2 GCD - please be patient.
[Work thread Sep 23 11:12] Stage 2 GCD complete. Time: 5.054 sec.
[Work thread Sep 23 11:12] M27430621 completed P-1, B1=1039000, B2=56821000, Wi8: C6D8FB56
[Comm thread Sep 23 11:12] Sending result to server: UID: [redacted]/usbenv, M27430621 completed P-1, B1=1039000, B2=56821000, Wi8: C6D8FB56, AID: 8B45B0E3C88E84E8B42236C07C5F070A

[Work thread Sep 23 11:58] M27430643 stage 1 complete. 2997862 transforms. Time: 2749.861 sec.
[Work thread Sep 23 11:58] Starting stage 1 GCD - please be patient.
[Work thread Sep 23 11:58] Stage 1 GCD complete. Time: 5.043 sec.
[Work thread Sep 23 11:58] D: 462, relative primes: 857, stage 2 primes: 3303121, pair%=90.33
[Work thread Sep 23 11:58] Using 9996MB of memory.
[Work thread Sep 23 11:59] Stage 2 init complete. 7751 transforms. Time: 15.055 sec.

[Work thread Sep 23 13:41] M27430643 stage 2 complete. 4016210 transforms. Time: 6143.559 sec.
[Work thread Sep 23 13:41] Starting stage 2 GCD - please be patient.
[Work thread Sep 23 13:41] Stage 2 GCD complete. Time: 5.052 sec.
[Work thread Sep 23 13:41] M27430643 completed P-1, B1=1039000, B2=56821000, Wi8: C6B2FB4A
[Comm thread Sep 23 13:41] Sending result to server: UID: [redacted]/usbenv, M27430643 completed P-1, B1=1039000, B2=56821000, Wi8: C6B2FB4A, AID: 30BC556ED1625FFF02A0B1960F00B038[/CODE]

[CODE][chalsall@usbwalker prime]$ cat /proc/cpuinfo | grep name
model name : Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz
model name : Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz
model name : Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz
model name : Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz[/CODE]
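
For anyone wanting to pull the same comparison out of their own logs, here is a minimal sketch that sums the GCD times and the stage times and reports the GCD share. It assumes the log lines look like the excerpt above; the regular expressions and the name `gcd_share` are illustrative, not part of mprime.

```python
import re

# Patterns modeled on the log excerpt above (assumed format).
GCD_RE = re.compile(r"Stage (\d) GCD complete\. Time: ([\d.]+) sec\.")
STAGE_RE = re.compile(r"stage (\d) complete\. \d+ transforms\. Time: ([\d.]+) sec\.")

def gcd_share(log_text):
    """Return (gcd_seconds, stage_seconds, gcd_seconds / (gcd + stage))."""
    gcd = sum(float(t) for _, t in GCD_RE.findall(log_text))
    stages = sum(float(t) for _, t in STAGE_RE.findall(log_text))
    return gcd, stages, gcd / (gcd + stages)

sample = """\
[Work thread Sep 23 09:30] M27430621 stage 1 complete. 2997862 transforms. Time: 2750.641 sec.
[Work thread Sep 23 09:30] Stage 1 GCD complete. Time: 5.052 sec.
[Work thread Sep 23 11:12] M27430621 stage 2 complete. 4016210 transforms. Time: 6135.924 sec.
[Work thread Sep 23 11:12] Stage 2 GCD complete. Time: 5.054 sec.
"""

gcd_s, stage_s, share = gcd_share(sample)
print(f"GCD {gcd_s:.3f} s, stages {stage_s:.3f} s, share {share:.4%}")
```

On the first exponent in the excerpt the GCD share comes out at roughly 0.11%.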

chalsall 2021-09-23 20:37

[QUOTE=kriesel;588502]edit: chalsall's small exponent ~[B]27.4M[/B] more than explains the rest of the speed ratio. 5.05sec x 2 /2hr29min = 0.11% potential speedup for him.[/QUOTE]

I would argue that for future readers it might have been more valuable for you to quote my message to yours in a new post, rather than editing your post speaking to my subsequent post.

I deeply appreciate your curation skills, Ken. :tu:

It's a job description that few appreciate. And those that do, would only take on if the subject domain was important enough...

S485122 2021-09-30 07:33

"Sending interim residue" Mxxx / AID
 
Prime95 30.6 b4
Nothing dramatic but an inconsistency nevertheless [noparse];-)[/noparse]

When interim residues are sent to the server, between or during the periodic communications, the output to the screen and to the prime.log file has the following format:[code][Comm thread Sep 16 19:22] Sending interim residue 40000000 for M58193041[/code]
But when the residues are sent together with a result the AID is used instead of the M followed by the exponent :[code][Comm thread Sep 16 23:26] Sending interim residue 55000000 for assignment 172076D8AD6993D981F397637613B8DC
[Comm thread Sep 16 23:26] Sending result to server: UID: S485122/i9-10920X, M58193041 is not prime. Res64: 1B5E1783A3861E57. Wh4: 67E20740,22995864,00000000, AID: 172076D8AD6993D981F397637613B8DC[/code]
(Never mind the AID in clear: the assignment has been completed, so the assignment and its ID are bygones.)
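
Until the two forms are unified, a log-scraping script has to accept both. A minimal sketch, assuming the line shapes shown above and a 32-hex-digit assignment ID; `INTERIM_RE` and `parse_interim` are names invented for this example:

```python
import re

# Accept either "for M<exponent>" or "for assignment <AID>" (assumed formats).
INTERIM_RE = re.compile(
    r"Sending interim residue (?P<iteration>\d+) for "
    r"(?:M(?P<exponent>\d+)|assignment (?P<aid>[0-9A-F]{32}))"
)

def parse_interim(line):
    """Return a dict with iteration plus either exponent or aid, or None."""
    m = INTERIM_RE.search(line)
    if m is None:
        return None
    return {k: v for k, v in m.groupdict().items() if v is not None}

lines = [
    "[Comm thread Sep 16 19:22] Sending interim residue 40000000 for M58193041",
    "[Comm thread Sep 16 23:26] Sending interim residue 55000000 for assignment 172076D8AD6993D981F397637613B8DC",
]
for line in lines:
    print(parse_interim(line))
```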

kriesel 2021-09-30 12:21

Found the following in an mprime run log immediately after starting mprime v30.6b4:[CODE][Main thread Sep 30 12:04] Mersenne number primality test program version 30.6
[Main thread Sep 30 12:04] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 55 MB
[Main thread Sep 30 12:04] Starting worker.
[Main thread Sep 30 12:04] Stopping all worker windows.
[Work thread Sep 30 12:04] Worker starting
[Work thread Sep 30 12:04] Worker stopped.
[Main thread Sep 30 12:04] Execution halted.
[Main thread Sep 30 12:04] Choose Test/Continue to restart[/CODE]That's hard to do when it's a Google Colab background process, no menu, no keyboard, no means of input.
Stop and Continue the notebook section seems to have worked.
No idea what caused the immediate stop.

ixfd64 2021-09-30 19:40

Any chance we could write PRP results to [C]results.txt[/C] too?

I understand that [C]results.txt[/C] has been deprecated in favor of the JSON file, but it would be nice to have data that is more human-readable. Or as a compromise, could we have an option to "pretty print" the JSON strings?

James Heinrich 2021-09-30 19:45

[QUOTE=ixfd64;589095]Any chance we could write PRP results to [C]results.txt[/C] too?

I understand that [C]results.txt[/C] has been deprecated in favor of the JSON file, but it would be nice to have data that is more human-readable. Or as a compromise, could we have an option to "pretty print" the JSON strings?[/QUOTE]If by pretty-print you mean presenting JSON over multiple lines with indenting and such then no, as this will break manual results which is based on the assumption that one-line=one-result.

I have no objection if George wants to add output to the non-JSON output, but support for any new format will not be added to manual results parsing (we don't want users submitting less data).

I'm curious what part you find less-than-readable about the JSON results? If it would be universally considered helpful the JSON elements could be re-ordered without causing any problems.

Prime95 2021-09-30 20:05

[QUOTE=ixfd64;589095]Any chance we could write PRP results to [C]results.txt[/C] too?
[/QUOTE]

Try "OutputComposites=1" and, in case you might get very lucky, "OutputPrimes=1"

chalsall 2021-09-30 20:31

[QUOTE=James Heinrich;589096]If by pretty-print you mean presenting JSON over multiple lines with indenting and such then no, as this will break manual results which is based on the assumption that one-line=one-result.[/QUOTE]

IMHO, JSON was designed for machines, not humans.

For the latter tools are available, for development, testing, and QA purposes.

[CODE]chalsall@hobbit:~$ echo '{"hello":"World"}' | jq
{
"hello": "World"
}
[/CODE]
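
The same idea works for a whole results file without touching it, so the one-line=one-result assumption stays intact for manual submission. A minimal sketch; the field names in the sample line are placeholders, not a statement of what the real JSON results contain:

```python
import json

def pretty_results(lines):
    """Yield an indented rendering of each one-line JSON result; the input is not modified."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        yield json.dumps(json.loads(line), indent=2, sort_keys=True)

# Placeholder record; the keys are hypothetical.
sample = ['{"status": "C", "exponent": 58193041, "worktype": "PRP-3"}']
for block in pretty_results(sample):
    print(block)
```

To view a real file, pass an open file handle instead of `sample`; only the on-screen rendering is indented.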

ixfd64 2021-09-30 22:40

[QUOTE=Prime95;589100]Try "OutputComposites=1" and, in case you might get very lucky, "OutputPrimes=1"[/QUOTE]

Yes, that's exactly what I was looking for. I checked [C]undoc.txt[/C] and can't believe I didn't see these. Thank you!

Zhangrc 2021-10-01 12:27

[QUOTE=kriesel;588502]Assuming the ~p[SUP]2.1[/SUP] scaling also applies to GCD operations[/QUOTE]
The time complexity is almost linear logarithmic, O(log(ab)).
[CODE]
int gcd(int x, int y)
{
    if (x < y) return gcd(y, x);   // ensure x >= y
    if (y == 0) return x;          // if y=0, x is GCD
    if (!(x % 2))
    {
        if (!(y % 2))              // x, y both even
            return 2 * gcd(x >> 1, y >> 1);
        else                       // x is even, y is odd
            return gcd(x >> 1, y);
    }
    else
    {
        if (!(y % 2))              // x is odd, y is even
            return gcd(x, y >> 1);
        else                       // x, y both odd
            return gcd(y, x - y);
    }
}
[/CODE]
and AFAIK there's no speed to gain using multithreaded GCD. It's a kind of function iteration where each step depends on the results from the last step.
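
To see how the step count behaves on operands wider than a machine word, here is a rough iterative version of the same binary method using Python's arbitrary-precision integers. It is a sketch for counting steps, not how any GIMPS program computes its GCD; `binary_gcd` and the choice of test operands are illustrative.

```python
import math

def binary_gcd(x, y):
    """Iterative binary GCD for non-negative ints; returns (gcd, loop_steps)."""
    if x == 0:
        return y, 0
    if y == 0:
        return x, 0
    shift = 0
    steps = 0
    while (x | y) & 1 == 0:      # both even: factor out a common 2
        x >>= 1
        y >>= 1
        shift += 1
        steps += 1
    while x & 1 == 0:            # make x odd
        x >>= 1
        steps += 1
    while y:
        while y & 1 == 0:        # y even, x odd: 2 is not a common factor
            y >>= 1
            steps += 1
        if x > y:
            x, y = y, x
        y -= x                   # both odd: the difference is even (or zero)
        steps += 1
    return x << shift, steps

p = 127
g, steps = binary_gcd(2**p - 1, pow(3, 10**6, 2**p - 1))
print(g, steps)
```

The number of steps grows linearly with the bit length of the inputs, but each shift or subtraction on an n-bit multiprecision operand itself touches on the order of n bits, so the total work of this bit-at-a-time method is quadratic in the bit length.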

tha 2021-10-01 14:44

What course of action would be recommended for this core segmentation fault? Nothing else was running at the time.

[CODE]
[Comm thread Oct 1 13:40] Sending result to server: UID: Tha/Z-170, M9262289 completed P-1, B1=2000000, B2=146000000, Wi8: E7182B42
[Comm thread Oct 1 13:40]
[Work thread Oct 1 13:40]
[Work thread Oct 1 13:40] P-1 on M9194659 with B1=5000000, B2=TBD
[Work thread Oct 1 13:40] Setting affinity to run helper thread 1 on CPU core #2
[Work thread Oct 1 13:40] Using FMA3 FFT length 480K, Pass1=384, Pass2=1280, clm=4, 4 threads
[Work thread Oct 1 13:40] Setting affinity to run helper thread 2 on CPU core #3
[Work thread Oct 1 13:40] Setting affinity to run helper thread 3 on CPU core #4
[Comm thread Oct 1 13:40] PrimeNet success code with additional info:
[Comm thread Oct 1 13:40] CPU credit is 3.9457 GHz-days.
[Comm thread Oct 1 13:40] Done communicating with server.
[Work thread Oct 1 13:48] M9194659 stage 1 is 13.85% complete. Time: 458.543 sec.
[Work thread Oct 1 13:56] M9194659 stage 1 is 27.71% complete. Time: 454.645 sec.
[Work thread Oct 1 14:03] M9194659 stage 1 is 41.58% complete. Time: 458.479 sec.
[Work thread Oct 1 14:11] M9194659 stage 1 is 55.44% complete. Time: 456.359 sec.
[Work thread Oct 1 14:18] M9194659 stage 1 is 69.30% complete. Time: 460.225 sec.
[Work thread Oct 1 14:26] M9194659 stage 1 is 83.16% complete. Time: 436.106 sec.
[Work thread Oct 1 14:33] M9194659 stage 1 is 97.02% complete. Time: 436.115 sec.
[Work thread Oct 1 14:35] M9194659 stage 1 complete. 14429932 transforms. Time: 3254.242 sec.
[Work thread Oct 1 14:35] With trial factoring done to 2^71, optimal B2 is 91*B1 = 455000000.
[Work thread Oct 1 14:35] If no prior P-1, chance of a new factor is 8.92%
[Work thread Oct 1 14:35] D: 1050, relative primes: 2848, stage 2 primes: 23755332, pair%=94.69
[Work thread Oct 1 14:35] D: 1050, relative primes: 2819, stage 2 primes: 23755332, pair%=94.61
[Work thread Oct 1 14:35] Using 11021MB of memory.
[Work thread Oct 1 14:35] Stage 2 init complete. 27785 transforms. Time: 16.280 sec.
[Work thread Oct 1 14:44] M9194659 stage 2 is 7.41% complete. Time: 526.239 sec.
[Work thread Oct 1 14:52] M9194659 stage 2 is 14.96% complete. Time: 526.555 sec.
[Work thread Oct 1 15:01] M9194659 stage 2 is 22.61% complete. Time: 526.607 sec.
[Work thread Oct 1 15:10] M9194659 stage 2 is 30.29% complete. Time: 526.931 sec.
[Work thread Oct 1 15:19] M9194659 stage 2 is 37.96% complete. Time: 526.598 sec.
[Work thread Oct 1 15:27] M9194659 stage 2 is 45.63% complete. Time: 526.763 sec.
[Work thread Oct 1 15:36] M9194659 stage 2 is 53.28% complete. Time: 526.889 sec.
[Work thread Oct 1 15:45] M9194659 stage 2 is 60.82% complete. Time: 526.498 sec.
[Work thread Oct 1 15:54] M9194659 stage 2 is 68.22% complete. Time: 526.293 sec.
[Work thread Oct 1 16:03] M9194659 stage 2 is 75.58% complete. Time: 526.346 sec.
[Work thread Oct 1 16:11] M9194659 stage 2 is 82.96% complete. Time: 532.123 sec.
[Work thread Oct 1 16:20] M9194659 stage 2 is 90.33% complete. Time: 526.133 sec.
[Work thread Oct 1 16:29] M9194659 stage 2 is 97.73% complete. Time: 526.361 sec.
[Work thread Oct 1 16:32] M9194659 stage 2 complete. 26612490 transforms. Time: 7011.609 sec.
[Work thread Oct 1 16:32] Stage 2 GCD complete. Time: 1.302 sec.
[Work thread Oct 1 16:32] P-1 found a factor in stage #2, B1=5000000, B2=455000000.
[Work thread Oct 1 16:32] M9194659 has a factor: 1184258844011627168698855977003308763574671960534764772350463733066848456139362163010122679897458456388791012515229078287455071 (P-1, B1=5000000, B2=455000000)
[Comm thread Oct 1 16:32] Sending result to server: UID: Tha/Z-170, M9194659 has a factor: 1184258844011627168698855977003308763574671960534764772350463733066848456139362163010122679897458456388791012515229078287455071 (P-1, B1=5000000, B2=455000000)
[Comm thread Oct 1 16:32]
[Work thread Oct 1 16:32]
[Work thread Oct 1 16:32] P-1 on M9262387 with B1=2000000, B2=TBD
[Work thread Oct 1 16:32] Setting affinity to run helper thread 1 on CPU core #2
[Work thread Oct 1 16:32] Using FMA3 FFT length 480K, Pass1=384, Pass2=1280, clm=4, 4 threads
[Work thread Oct 1 16:32] Setting affinity to run helper thread 2 on CPU core #3
[Work thread Oct 1 16:32] Setting affinity to run helper thread 3 on CPU core #4
[Comm thread Oct 1 16:32] PrimeNet success code with additional info:
[Comm thread Oct 1 16:32] Composite factor 1184258844011627168698855977003308763574671960534764772350463733066848456139362163010122679897458456388791012515229078287455071 = 367786361 * 201803650751527169 * 267438738796927 * 101994518728598164687111 * 441343633 * 422426099229769 * 23010527886744119 * 62376566657 * 2185975009297
[Comm thread Oct 1 16:32] Already have factor 367786361 for M9194659
[Comm thread Oct 1 16:32] Already have factor 201803650751527169 for M9194659
[Comm thread Oct 1 16:32] Already have factor 267438738796927 for M9194659
[Comm thread Oct 1 16:32] Already have factor 441343633 for M9194659
[Comm thread Oct 1 16:32] Already have factor 422426099229769 for M9194659
[Comm thread Oct 1 16:32] Already have factor 23010527886744119 for M9194659
[Comm thread Oct 1 16:32] Already have factor 62376566657 for M9194659
[Comm thread Oct 1 16:32] Already have factor 2185975009297 for M9194659
[Comm thread Oct 1 16:32] CPU credit is 11.8294 GHz-days.
Segmentation fault (core dumped)
henk@Z170:~/mersenne$ ^C

[/CODE]
[CODE]

Linux64,Prime95,v30.6,build 3

Your choice: 14

Consult readme.txt prior to changing any of these settings.

Temporary disk space limit in GB/worker (6.000000):
Daytime P-1/P+1/ECM stage 2 memory in GB (10.800000):
Nighttime P-1/P+1/ECM stage 2 memory in GB (10.800000):
Upload bandwidth limit in Mbps (5.000000):
Upload large files time period start (00:00):
Upload large files time period end (24:00):
Download limit for certification work in MB/day (400):
Skip advanced resource settings (Y): [/CODE]

kriesel 2021-10-01 14:49

Parallelism in GCD computation; runtime order
 
[QUOTE=Zhangrc;589131]The time complexity is almost linear logarithmic, O(log(ab)).
and AFAIK there's no speed to gain using multithreaded GCD. It's a kind of function iteration where each step depends on the results from the last step.[/QUOTE]Thanks for that for single-precision operands. The larger operand would be Mp. The smaller would be ~Mp/2. o(log Mp[SUP]2[/SUP]/2) would be o(p[SUP]2[/SUP]). [URL]https://stackoverflow.com/questions/18137019/running-time-of-gcd-function-recursively-euclid-algorithm[/URL]

Now consider the usual GIMPS case where one of the operands is Mp, p~10[SUP]8[/SUP], requiring millions of words to store, not just one, and even simple operations may take time ~proportional to p to execute once. The other starting operand is 3[SUP]large_power[/SUP] Mod Mp, so ~p-1 bits large on average. Also note that potential factors are typically >2[SUP]64[/SUP] before P-1 is attempted, due to prior TF depth, so the final operands are likely multi-word also. Even assuming the operands are stored in multi-word arrays of >1M packed 64-bit-unsigned initially, one can imagine dividing x>>1 into some small m parallel threads for most of the work, and alternatively computing a new least significant word and bit offset and mask. I don't see how you avoid at least proportional to p iterations, since the algorithm posted removes one bit out of p bits per pass. What is the run time for multiprecision general subtraction, o(n) each where n is the number of bits in the lesser operand?

Anything less than ~p[SUP]2.1[/SUP] scaling (really O(p[SUP]2[/SUP] log p log log p)) makes my case stronger, that scaling downward from larger exponents to the wavefront exponents indicates significant single-core-GCD time at the wavefront.

GIMPS code parallelizes individual iterations of PRP or LL or P-1 powering. The fact that one iteration depends on the previous does not preclude parallelizing in iterations or individual operations within iterations, and gaining considerably in speed with available numbers of processor cores. (Four cores / task optimal throughput is routine in CPU-based code prime95, Mlucas. GPU code uses much higher parallelism for the same algorithms.)

Making use of parallelism to speed individual iterations of GCD is possible, as a quick web search reveals: [URL]https://www.sciencedirect.com/science/article/pii/S1570866707000585[/URL]. No one's coded it yet for GIMPS, choosing to focus optimization efforts on larger portions of the P-1 factoring computation first. That paper gives o(n / log n) for n[SUP]1+e[/SUP] processors "where [I]n[/I] is the bit-length of the larger input" which is p for Mp. That is a LOT of processors. GPUs have lots of processors, but not that many. Converting that back to sequential not parallel gives o(p[SUP]2+e[/SUP]/log p).

There are probably slight savings to be had based on knowledge of Mp and its possible factors always being odd, and knowing an Mp-oriented GCD can always be called with a specific operand order, avoiding the initial x>y test.

The order of GMP's GCD is given in [URL]https://oaciss.uoregon.edu/icpp18/publications/pos123s2-file1.pdf[/URL] as [QUOTE]The GCD algorithms used by GMP for large input are essentially O(N^(1+ε) log N) for a fairly large value of ε [6, section 15.3.3][/QUOTE](An interesting paper in its own right from 2018, extrapolating that modular integers may be useful to compute GCDs faster using multiple GPUs.)

[URL]http://www.iaeng.org/IJCS/issues_v42/issue_4/IJCS_42_4_01.pdf[/URL] is a somewhat different approach, where many GCDs are done in parallel on CPU or GPU.

Prime95 2021-10-01 15:19

[QUOTE=tha;589140]What course of action would be recommended for this core segmentation fault? Nothing else was running at the time.[/QUOTE]

Probably already fixed (in 30.7). Looks like the "message too large from server" bug.

Zhangrc 2021-10-02 03:05

[QUOTE=kriesel;589141]o(log Mp[SUP]2[/SUP]/2) would be o(p[SUP]2[/SUP]). [/QUOTE]
No because log(ab)=log(a)+log(b). O(log Mp[SUP]2[/SUP]/2) is O(log 2[SUP]2p-1[/SUP]) = O(2p-1).
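
The identity is easy to check numerically for a few small Mersenne exponents. A minimal sketch using exact integer arithmetic; `log2_floor` is a helper named here for illustration:

```python
def log2_floor(n):
    """floor(log2(n)) for a positive integer."""
    return n.bit_length() - 1

for p in (13, 61, 127, 521):
    mp = 2**p - 1
    assert log2_floor(mp) == p - 1                # log2(Mp) is just under p
    assert log2_floor(mp * mp // 2) == 2 * p - 2  # log2(Mp^2/2) is just under 2p - 1
print("log2(Mp^2 / 2) grows like 2p - 1, linear in p")
```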

kriesel 2021-10-02 14:51

[QUOTE=Zhangrc;589190]No.[/QUOTE]
In gmp, & gmplib, which IIRC is used in prime95 / mprime, gpuowl, and now also Mlucas v20.x, P-1 GCD (and probably P+1 GCD in mprime/ prime95), per [URL]https://gmplib.org/manual/Nomenclature-and-Types[/URL]
"A limb means the part of a multi-precision number that fits in a single machine word. (We chose this word because a limb of the human body is analogous to a digit, only larger, and containing several digits.) Normally a limb is 32 or 64 bits."

[URL]https://gmplib.org/manual/Binary-GCD[/URL]
"At small sizes GMP uses an [B]O(N^2)[/B] binary style GCD." where N is number of limbs per operand.
"Currently, the binary algorithm is used for GCD only when N < 3." So x,y each less than 129 bits at most.

[URL]https://gmplib.org/manual/Lehmer_0027s-Algorithm[/URL]
Per iteration, reduces inputs by almost a word size.
"The resulting algorithm is asymptotically O(N^2), just as the Euclidean algorithm and the binary algorithm." where N is number of limbs per operand.
For Mp of interest to GIMPS wavefront activity, N= p/limbsize = ~10^8/64 or /32.

[URL]https://gmplib.org/manual/Subquadratic-GCD[/URL] says
"For inputs larger than GCD_DC_THRESHOLD, GCD is computed via the HGCD (Half GCD) function, as a generalization to Lehmer’s algorithm."
"The asymptotic running time of both HGCD and GCD is O(M(N)*log(N)), where M(N) is the time for multiplying two N-limb numbers." and where N is number of limbs per operand.
M(N) for Mp-sized operands is O(p log p log log p) as per Knuth and many other sources for large operands using fft methods. Log2 Mp is ~p; N is p/64 or p/32 depending on unsigned-64-bit or 32-bit in gmp; log N is log (p/64 or p/32) = log p minus a small constant = ~log p for large p such as the >10^8 bit operands common in GIMPS first test or preparatory P-1.
So O(M(N)*log(N)) ~ O(p log p log log p * log p) = [B]O(p (log p)^2 log log p)[/B]
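
To put numbers on N and on the claim that log N is close to log p, here is a small sketch. It assumes N is simply the bit length divided by the limb size, rounded up; `limbs` is a name chosen for this example.

```python
import math

def limbs(p, limb_bits=64):
    """Number of limbs needed to hold a p-bit operand such as Mp."""
    return -(-p // limb_bits)   # ceiling division

p = 10**8
for limb_bits in (64, 32):
    n = limbs(p, limb_bits)
    print(f"{limb_bits}-bit limbs: N = {n}, log2 N = {math.log2(n):.2f}, "
          f"log2 p = {math.log2(p):.2f}")
```

With 64-bit limbs, log2 N is exactly 6 less than log2 p, so replacing log N by log p changes only a constant term.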

(Hmm, where does one find or determine a value for GCD_DC_THRESHOLD?)

[URL]https://www.ams.org/journals/mcom/2008-77-261/S0025-5718-07-02017-0/home.html[/URL] confirms asymptotic run time O(n (log n)^2 log log n) where n is number of bits of an operand.

SethTro 2021-10-02 18:26

[QUOTE=kriesel;589205]
(Hmm, where does one find or determine a value for GCD_DC_THRESHOLD?)
[/QUOTE]

I wrote some of that documentation :)

You can find thresholds at [URL]https://gmplib.org/devel/thres/[/URL]
[URL="https://gmplib.org/devel/thres/GCD_DC_THRESHOLD"]GCD_DC_THRESHOLD[/URL] seems to be around 300-400 depending on platform.

tha 2021-10-02 20:12

[QUOTE=Prime95;589142]Probably already fixed (in 30.7). Looks like the "message too large from server" bug.[/QUOTE]

Can we download and test that version? Or should we wait for more wrinkles to be ironed out?

James Heinrich 2021-10-02 20:16

[QUOTE=tha;589223]Can we download and test that version? Or should we wait for more wrinkles to be ironed out?[/QUOTE]It's still George's development version, not yet ready for testing.

kriesel 2021-10-05 13:47

For M58793159, 3M fft length may be a bit too aggressive in prime95 v30.6b4; got roundoff error 0.4375 >0.4 at iteration 292709 on Celeron G1840.

kruoli 2021-10-10 19:07

v30.6b4: If stage 1 GCD is disabled, the optimal B2 seems to always be calculated as 100*B1. The worktodo had another value for B2 and the usual multiplicator for same-sized exponents when stage 1 GCD is enabled is way lower.

Prime95 2021-10-10 20:02

[QUOTE=kruoli;590118]v30.6b4: If stage 1 GCD is disabled, the optimal B2 seems to always be calculated as 100*B1. The worktodo had another value for B2 and the usual multiplicator for same-sized exponents when stage 1 GCD is enabled is way lower.[/QUOTE]

ECM? Does this happen in 30.7? If so, what is the worktodo.txt entry?

Prime95 2021-10-10 20:04

[QUOTE=kriesel;589533]For M58793159, 3M fft length may be a bit too aggressive in prime95 v30.6b4; got roundoff error 0.4375 >0.4 at iteration 292709 on Celeron G1840.[/QUOTE]

0.4375 is not that scary (to me). However, you aren't near the 3M FFT limit. Please advise if more (and worse) errors occur as you move higher.

James Heinrich 2021-10-10 20:10

[QUOTE=Prime95;590124]ECM? Does this happen in 30.7? If so, what is the worktodo.txt entry?[/QUOTE]Maybe when there's a new version released, George should post in the sticky thread of the previous version to let us lurkers know there's a new version on offer. I for one almost never browse the forum to see what's new, I just check my watched threads when I get emailed.
[url]https://www.mersenneforum.org/showthread.php?t=27180[/url]

kruoli 2021-10-10 20:31

[QUOTE=Prime95;590124]ECM? Does this happen in 30.7? If so, what is the worktodo.txt entry?[/QUOTE]

This was with P-1, e.g. [C]Pminus1=1,2,27318307,-1,1200000,40500000,74[/C]. I'll retry with 30.7 soon. If you look at the exponent page of M[M]27318307[/M], you'll see I have two results there (mprime was killed by the kernel and the worktodo.txt content was still the same as of two days prior, I did not notice that before restarting and duplicated work that way, I thought WellBehavedWork would write the file more frequently; I digress). The first was before I added the stage 1 skip, the second one after, so it's definitely the same worktodo entry.
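
For comparing the B2 in a worktodo line against the 100*B1 the program chose, a small parser helps. This sketch assumes the field order k,b,n,c,B1,B2 followed by the trial-factoring depth, which is how the entry above reads; the documentation shipped with the program is the authority on the actual layout, and `Pminus1Entry` and `parse_pminus1` are names made up here.

```python
from typing import NamedTuple

class Pminus1Entry(NamedTuple):
    k: int
    b: int
    n: int
    c: int
    b1: int
    b2: int
    tf_bits: int

def parse_pminus1(line):
    """Parse 'Pminus1=k,b,n,c,B1,B2,tf_bits' (assumed field order)."""
    key, _, value = line.strip().partition("=")
    if key != "Pminus1":
        raise ValueError(f"not a Pminus1 entry: {line!r}")
    fields = [int(f) for f in value.split(",")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    return Pminus1Entry(*fields)

entry = parse_pminus1("Pminus1=1,2,27318307,-1,1200000,40500000,74")
print(entry, entry.b2 / entry.b1)
```

For this entry the requested B2 is 33.75 times B1, well below 100.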

Prime95 2021-10-10 21:17

[QUOTE=kruoli;590130]This was with P-1, e.g. [C]Pminus1=1,2,27318307,-1,1200000,40500000,74[/C]. I'll retry with 30.7 soon. If you look at the exponent page of M[M]27318307[/M], you'll see I have two results there (mprime was killed by the kernel and the worktodo.txt content was still the same as of two days prior, I did not notice that before restarting and duplicated work that way, I thought WellBehavedWork would write the file more frequently; I digress). The first was before I added the stage 1 skip, the second one after, so it's definitely the same worktodo entry.[/QUOTE]

FYI: 30.7 always skips stage 1 GCD for P-1 (you'll get it for free during stage 2 init)
FYI2: Brent-Suyama is no more.

Do you keep old save file around for possibly later increasing B2? Could this bug have something to do with picking up an old save file and increasing its bounds rather than a consequence of stage1gcd=0?

kruoli 2021-10-10 21:21

Ah, okay. So I cannot proceed to stage 2 when a factor would be found in stage 1? (Yes, this is not necessary for most use cases, just asking.)

The problem I described earlier does therefore not occur in 30.7, I just tested it, sorry for the tumult.

kruoli 2021-10-10 21:24

[QUOTE=Prime95;590131]FYI2: Brent-Suyama is no more.[/QUOTE]
Sometimes things have to go. :smile: I think you mentioned that earlier.
Edit: I had the setting still active and the program wrote that it had used Brent-Suyama to the JSON file, maybe this should be ignored and not written to the result file? I'll delete the setting nonetheless, but I hope I do not forget to do this for all of my machines...

[QUOTE=Prime95;590131]Do you keep old save file around for possibly later increasing B2? Could this bug have something to do with picking up an old save file and increasing its bounds rather than a consequence of stage1gcd=0?[/QUOTE]

That might be it! Because of running all jobs a second time (unintentionally), they were run two times with the same B1, the second time only resumed. I had not watched the machine for a while, otherwise I should have seen this. Shall I try this with 30.7, too?

Prime95 2021-10-10 23:33

[QUOTE=kruoli;590132]Ah, okay. So I cannot proceed to stage 2 when a factor would be found in stage 1? (Yes, this is not necessary for most use cases, just asking.).[/QUOTE]

It can be done. Read undoc.txt regarding Stage1GCD setting.

[QUOTE=kruoli;590134]That might be it! Because of running all jobs a second time (unintentionally), they were run two times with the same B1, the second time only resumed. I had not watched the machine for a while, otherwise I should have seen this. Shall I try this with 30.7, too?[/QUOTE]

If you can reproduce a problem, I'll investigate a fix.

LaurV 2021-10-12 09:36

Not sure if this was reported or maybe even fixed in the last versions, I still have few computers using v30.3, and sometimes, for whatever reasons, they can't connect to the server (it may be network/rights related, my IT guys get paranoid sometimes, which is not a bad thing). The worktodo is therefore exhausted and the computers are waiting to get work for days (usually, over the weekend, when I can't attend them).

What I found out repeatedly is that in such case the computers can't connect to the server ever, even if P95 is restarted, but they will connect to the server if the spool file is deleted (moved to another folder), even if that is done during P95 runs. Putting the spool file back - error, can't connect to the server. Taking it out, no issue, connect, get new assignments, put it back, can't connect (but the work is progressing normal, and proof files are stacked up locally - especially for PRPCF assignments, which take little time to finish).

First time (second time, third time) we assumed that the spool file got malformed or it suffered some damage, so we just deleted it and continue from there. We tried first to recover unreported stuff from it, using a hex editor (which was quite successful). But the issue re-appeared few more times, therefore we decided to zip such file and keep it.

The file will crash the P95 connection if we unzip it in P95 folder, regardless of computer (i.e. if we put it on another computer, that will not be able to connect to the server and get and/or report work either).

@George: do you need it? (maybe to track what happens, etc), the zip is 7360 bytes (i.e. not big).

kriesel 2021-10-12 13:32

1 Attachment(s)
If the worker window estimates 31 days to go on a 50M fft CERT, why does the client tell the PrimeNet server it has one day to go?
If it has a month of high priority 50M fft CERT work to do, why does it interrupt that to run unneeded-for-a-month-at-least 3360K and 3456K benchmarks?
Will v30.7bx address these?
Are there settings I can apply to address them in v30.6b4?

Prime95 2021-10-12 14:23

[QUOTE=LaurV;590225]
The file will crash the P95 connection if we unzip it in P95 folder, regardless of computer (i.e. if we put it on another computer, that will not be able to connect to the server and get and/or report work either).

@George: do you need it? (maybe to track what happens, etc), the zip is 7360 bytes (i.e. not big).[/QUOTE]

Sure. PM me and I will look into it.

Prime95 2021-10-12 14:40

[QUOTE=kriesel;590250]If the worker window estimates 31 days to go on a 50M fft CERT, why does the client tell the PrimeNet server it has one day to go?
If it has a month of high priority 50M fft CERT work to do, why does it interrupt that to run unneeded-for-a-month-at-least 3360K and 3456K benchmarks?
Will v30.7bx address these?
Are there settings I can apply to address them in v30.6b4?[/QUOTE]

30.7b5 will send the estimated completion date as shown in Test/Status (which in your case is much sooner than 31 days). Auto-bench, test/status, and server estimated completion dates will all assume CERT work executes before other work types.

For now, in 30.6b4 you can turn auto-bench off.

ixfd64 2021-10-12 16:08

[QUOTE=Prime95;590131]FYI2: Brent-Suyama is no more.[/QUOTE]

I noticed it's not mentioned in [C]undoc.txt[/C] anymore. I'm guessing it's been completely removed from Prime95?

Viliam Furik 2021-10-12 16:22

[QUOTE=Prime95;590259]30.7b5 will send the [B]estimated completion date as shown in Test/Status (which in your case is much sooner than 31 days)[/B]. Auto-bench, test/status, and server estimated completion dates will all assume CERT work executes before other work types.

For now, in 30.6b4 you can turn auto-bench off.[/QUOTE]

But that's not the correct completion date. The 31-day estimate by the worker is the correct one.

kriesel 2021-10-12 18:39

[QUOTE=Prime95;590259]30.7b5 will send the estimated completion date as shown in Test/Status (which in your case is much sooner than 31 days). Auto-bench, test/status, and server estimated completion dates will all assume CERT work executes before other work types.

For now, in 30.6b4 you can turn auto-bench off.[/QUOTE]Thanks. Looking forward to b5 or 6.
From prime.log:

[CODE][Fri Oct 8 09:13:18 2021 - ver 30.6]
Updating computer information on the server
Sending expected completion date for M843112609: Oct 8 2021
...
[Tue Oct 12 08:23:44 2021 - ver 30.6]
Updating computer information on the server
Sending expected completion date for M63367621: Oct 16 2021
Sending expected completion date for M843112609: Oct 12 2021[/CODE]Oct 12 ~1:15 pm local, downed briefly to update to v30.7b4 (can't download v30.7b5 yet)
otherwise it's been running 24/7, and is now ~12.57% complete.
So linear extrapolation from ~4.17 days elapsed at 12.57% complete: 4.17/12.57 * 87.43% remaining ~ 29.0 days more, i.e. Nov 10.
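
The extrapolation above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes the CERT proceeds at a constant rate, which is not necessarily how Prime95 computes its own estimates.

```python
from datetime import date, timedelta

def remaining_days(elapsed_days: float, percent_done: float) -> float:
    """Linear extrapolation: days still needed, assuming a constant rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    days_per_percent = elapsed_days / percent_done
    return days_per_percent * (100.0 - percent_done)

# Figures from the post: ~4.17 days elapsed, ~12.57% complete on Oct 12 2021.
left = remaining_days(4.17, 12.57)
finish = date(2021, 10, 12) + timedelta(days=round(left))
print(f"{left:.1f} days remaining, finishing around {finish:%b %d}")
```

With the numbers quoted here it gives about 29.0 days, i.e. Nov 10.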

While adding
AutoBench=0
to prime.txt, I noticed that v30.6b4 had apparently changed my manual prime.txt setting from
WorkPreference=155
to
WorkPreference=151
without my knowledge. I reset that while in the editor.

Upon resumption of the big CERT with v30.7b4, Test/Status claims completion late on Oct [B]15[/B], ~3.3 days out. Better than claiming same-day or next-day, but that still seems ~8.8x too soon.
And what it reports to the server is next-day.
[CODE][Tue Oct 12 13:37:07 2021 - ver 30.7]
Exchanging program options with server
Updating computer information on the server
Sending expected completion date for M63367621: Oct 17 2021
Sending expected completion date for M843112609: Oct [B]13[/B] 2021[/CODE]
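
To track what the client reports over time, the "Sending expected completion date" lines can be pulled out of prime.log with a short script. This is a rough sketch, not part of Prime95: the regular expression assumes the line format seen in the excerpts above, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Assumed line format, taken from the prime.log excerpts in this post.
LINE_RE = re.compile(
    r"Sending expected completion date for (M\d+): (\w{3}) +(\d{1,2}) (\d{4})"
)

def reported_dates(log_text):
    """Return (exponent, date) pairs for every reported completion date."""
    results = []
    for match in LINE_RE.finditer(log_text):
        exponent, month, day, year = match.groups()
        when = datetime.strptime(f"{month} {day} {year}", "%b %d %Y").date()
        results.append((exponent, when))
    return results

sample = """[Tue Oct 12 08:23:44 2021 - ver 30.6]
Updating computer information on the server
Sending expected completion date for M63367621: Oct 16 2021
Sending expected completion date for M843112609: Oct 12 2021"""

for exponent, when in reported_dates(sample):
    print(exponent, when.isoformat())
```

Comparing those dates against a hand extrapolation makes the gap between the reported and the realistic completion date easy to see.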

kruoli 2021-10-26 08:30

[QUOTE=kruoli;591583]It is completely stuck, every hour it states:
[CODE][Worker #3 Oct 25 18:26] Restarting worker to do priority work.
[Worker #3 Oct 25 18:26] Resuming.
[Worker #3 Oct 25 18:26] No work to do at the present time. Waiting.[/CODE]

I release this reservation.[/QUOTE]

What could have caused that certification to be unable to begin?

This was 30.6b3, Windows 7, Intel i7 3630QM. CPU-hours was set to 8.

Additional information:
[QUOTE=kruoli;591529]I have CertWork=1, upload and download limits to really high values, CertWork[B]er[/B] is set to the according worker etc. […] Prime95 shows no network activity.[/QUOTE]

ixfd64 2021-12-11 17:49

I don't know if this has been resolved in later Prime95 versions, but I found a minor edge-case issue: if you pause a worker during a Jacobi error check on the last iteration of an LL test, the worker only stops after finishing the error check and completing the first few iterations of the next exponent. It only affects double checks, as the Gerbicz error check for PRP tests is near-instantaneous.

kriesel 2021-12-28 02:56

Time to unsticky?
 
Since v30.7 became the current release, how about unstickying this thread?


All times are UTC.
