mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Prime95 v30.4/30.5/30.6 (https://www.mersenneforum.org/showthread.php?t=26376)

Prime95 2021-01-04 21:41

[QUOTE=lycorn;568304]I have not yet installed Build 5,
Any ideas/thoughts/suggestions?[/QUOTE]

There was a bug or two in accessing the prime pairing bit array. Try build 5 and keep me updated.

[QUOTE=UBR47K;568338]
I'm hitting another issue right now, mprime gets killed by OOM[/QUOTE]

Trying to replicate here on a Linux quad-core system with 8GB memory. Set max mem allocation to 7GB.

[QUOTE=PhilF;568345]I don't remember if the P-1/ECM memory allocation refers to [I]per worker[/I] or not[/QUOTE]

The memory allocation is per system, not per worker. Only PRP emergency memory allocation is per worker.

One can get per-worker memory limits but not from the menus/dialog boxes. See undoc.txt.

[QUOTE=nordi;568366]I was also running into OOM problems when testing earlier builds of 30.4. They were supposed to be fixed in the new version, though. Which operating system are you using?[/QUOTE]

UBR47K's problem is different since the M1277 ECM was in stage 1. The primary fix for you was to limit stage 2 temporaries to 100,000.

Prime95 2021-01-05 01:00

The overallocating memory problem seems to be specific to Linux. If I add a malloc_trim() call at the end of each curve, the mprime process is not killed.

If any Linux gurus have insights, I'd appreciate your sharing them. I'm a little baffled as my reading of the mallopt man page seems to indicate malloc_trim is called automatically once 128KB can be freed.
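
For reference, a minimal sketch of the workaround described above, written in Python via ctypes rather than in mprime's own C. It assumes glibc on Linux; `trim_heap` and `run_curve` are hypothetical names, not anything from mprime. One possible factor in the behaviour: per the mallopt man page, the automatic trim governed by M_TRIM_THRESHOLD applies to contiguous free memory at the top of the heap, so freed blocks sitting below a live allocation may stay with the process until malloc_trim() is called explicitly.

```python
import ctypes
import ctypes.util


def trim_heap(pad=0):
    """Ask glibc to return free heap pages to the OS.

    Returns glibc's result (1 if memory was released, 0 if not),
    or None when malloc_trim is unavailable (non-glibc platforms).
    """
    libc_name = ctypes.util.find_library("c")
    if libc_name is None:
        return None
    try:
        malloc_trim = ctypes.CDLL(libc_name).malloc_trim
    except (OSError, AttributeError):
        return None
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    return malloc_trim(pad)


def run_curve(curve_number):
    """Hypothetical stand-in for one ECM curve: allocate temporaries, then free them."""
    temporaries = [bytearray(64 * 1024) for _ in range(256)]
    checksum = sum(len(block) for block in temporaries)
    del temporaries
    return checksum


results = []
for curve in range(3):
    run_curve(curve)
    # Workaround from the post: trim explicitly at the end of each curve.
    results.append(trim_heap())
```

Whether this actually lowers the resident size depends on how fragmented the heap is, so it is worth watching the process with a memory monitor while testing.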

axn 2021-01-05 02:02

[QUOTE=Prime95;568380]See undoc.txt. Please run a test on a known B-S factor.[/QUOTE]

Unless primes are being paired the exact same way from before, there is a good chance that 30.4 with B-S enabled will [B]not[/B] find the factor

Prime95 2021-01-05 02:06

[QUOTE=axn;568426]Unless primes are being paired the exact same way from before, there is a good chance that 30.4 with B-S enabled will [B]not[/B] find the factor[/QUOTE]

Doh! Of course you are right.

LaurV 2021-01-05 02:56

[QUOTE=axn;568426]Unless primes are being paired the exact same way from before, there is a good chance that 30.4 with B-S enabled will [B]not[/B] find the factor[/QUOTE]
Or will find factors that the first run didn't :razz:

tha 2021-01-05 07:52

[CODE]
Your choice: [Work thread Jan 5 08:51] Worker starting
[Work thread Jan 5 08:51] Setting affinity to run worker on CPU core #1
[Work thread Jan 5 08:51]
[Work thread Jan 5 08:51] P-1 on M15575663 with B1=1500000, B2=30000000
[Work thread Jan 5 08:51] Setting affinity to run helper thread 1 on CPU core #2
[Work thread Jan 5 08:51] Using FMA3 FFT length 800K, Pass1=320, Pass2=2560, clm=4, 4 threads
[Work thread Jan 5 08:51] Setting affinity to run helper thread 3 on CPU core #4
[Work thread Jan 5 08:51] Cannot continue stage 2 from old P-1 save file. Restarting stage 2 from the beginning.
[Work thread Jan 5 08:51] Setting affinity to run helper thread 2 on CPU core #3
[Work thread Jan 5 08:51] D: 840, relative primes: 1713, stage 2 primes: 1743704, pair%=90.11
[Work thread Jan 5 08:51] Using 11061MB of memory.
[Work thread Jan 5 08:51] Stage 2 init complete. 16961 transforms. Time: 12.169 sec.
Segmentation fault (core dumped)
[/CODE]
This is reproducible.

I renamed the file mF57663 so mprime couldn't find it and restarted it. Seems to work.

About a 33% increase in speed. What is behind that?

tha 2021-01-05 12:56

I turned on Brent–Suyama again manually to compare the results. It costs about a 3% speed penalty in exchange for an occasional factor beyond the B2 value. I'll leave it on.

lycorn 2021-01-05 19:27

[QUOTE=Prime95;568383]There was a bug or two in accessing the prime pairing bit array. Try build 5 and keep me updated.

[/QUOTE]

I just got home to find Prime95 (Build 5) had stopped after ~ 20 hours of work. The symptoms and exception code are the same as before. In case you're willing to do some debugging, the fault offset is [B]0x0000000002345399[/B].
This was the only error recorded. The application had been functioning perfectly since I launched it. Just restarted it and it's happily chugging along.

Prime95 2021-01-05 22:52

[QUOTE=Prime95;568415]The overallocating memory problem seems to be specific to Linux. If I add a malloc_trim() call at the end of each curve, the mprime process is not killed.[/QUOTE]

I'm still working on this. An Ubuntu build with debugging on seems to work. A CentOS build without debugging (the way official versions are built) does not. I presume the -g command-line argument links in a different heap allocator.

[QUOTE=tha;568446][CODE]
[Work thread Jan 5 08:51] Cannot continue stage 2 from old P-1 save file. Restarting stage 2 from the beginning.
Segmentation fault (core dumped)
[/CODE]

About a 33% increase in speed. What is behind that?[/QUOTE]

I have a fix for this. Making new builds will be spotty as my wife has my laptop. Her Mac is in the shop for butterfly keyboard repair.

Dig around in the 20M thread. The speed boost comes from a new gwnum feature that does (a+b)*c in one call, saving some memory bandwidth. More speed comes from better prime pairing, ~90% vs. ~30%, using a Mihai Preda idea.

[QUOTE=tha;568468]I turned on Brent–Suyama again manually to compare the results. It costs about a 3% speed penalty in exchange for an occasional factor beyond the B2 value. I'll leave it on.[/QUOTE]

B-S will find fewer factors than before. With prime pairing roughly 3x better (~90% vs. ~30%), there are about 1/3 fewer opportunities for a B-S prime > B2 to be included.
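
A rough back-of-the-envelope check of the "about 1/3 fewer" figure, using the prime count from tha's log. The cost model is an assumption, not something stated in the thread: it treats pair% as the fraction of stage 2 primes that end up in a pair, charges one stage 2 step per pair and one per unpaired prime, and takes each step as one opportunity for Brent–Suyama to pick up a prime above B2.

```python
def stage2_steps(primes, pair_fraction):
    """Steps needed under the assumed model: one per pair, one per unpaired prime."""
    paired = primes * pair_fraction
    return paired / 2 + (primes - paired)


PRIMES = 1743704  # "stage 2 primes" from the log above

old = stage2_steps(PRIMES, 0.30)    # ~30% pairing, the older figure quoted in the post
new = stage2_steps(PRIMES, 0.9011)  # pair%=90.11 from the log
reduction = 1 - new / old

print(f"old: {old:.0f}  new: {new:.0f}  reduction: {reduction:.1%}")
```

Under these assumptions the reduction comes out at roughly 35%, in line with the "about 1/3 fewer" estimate.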

[QUOTE=lycorn;568508]I just got home to find Prime95 (Build 5) had stopped after ~ 20 hours of work. The symptoms and exception code are the same as before. In case you're willing to do some debugging, the fault offset is [B]0x0000000002345399[/B].
This was the only error recorded. The application had been functioning perfectly since I launched it. Just restarted it and it's happily chugging along.[/QUOTE]

Can you send me details as to your machine, worktodo, and memory settings? Thanks.

lycorn 2021-01-05 23:54

[QUOTE=Prime95;568530]

Can you send me details as to your machine, worktodo, and memory settings? Thanks.[/QUOTE]

CPU: Intel i5-7400 (Kaby Lake) @ 3GHz. 16 GB DDR4 2400 memory (dual channel - 2 x 8GB). Everything at default settings. The machine was made by Dell, and has been rock solid since November 2018, when I started using it. Never had crashes, BSODs, etc, and it has found a fair number of ECM factors for exponents < 1M. It came with Windows 10 preinstalled, and the regular updates have been applied without any problems. The first time Prime95 died was on December 29th, while running v30.4 build 3 (or 4, I'm not 100% sure). Then it happened again on January 3rd, running build 4, and then today, running build 5. As I posted earlier, the symptoms were similar on all occasions.

The worktodo I'm currently using is:

[Worker #1]

ECM2=blah-blah,1,2,547273,-1,250000,25000000,400
ECM2=blah-blah,1,2,547291,-1,250000,25000000,400
ECM2=blah-blah,1,2,542911,-1,250000,25000000,300

[Worker #2]

ECM2=blah-blah,1,2,547453,-1,250000,25000000,400
ECM2=blah-blah,1,2,547487,-1,250000,25000000,400
ECM2=blah-blah,1,2,542987,-1,250000,25000000,300


[Worker #3]

ECM2=blah-blah,1,2,547583,-1,250000,25000000,400
ECM2=blah-blah,1,2,543019,-1,250000,25000000,300

[Worker #4]

ECM2=blah-blah,1,2,547397,-1,250000,25000000,400
ECM2=blah-blah,1,2,547609,-1,250000,25000000,400
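
A small sketch of how the ECM2 lines above can be read. The field order assumed here (assignment ID, then k, b, n, c for the number k*b^n+c, then B1, B2, and the curve count) is an assumption based on these entries and should be checked against undoc.txt; `parse_ecm2` and `Ecm2Entry` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass
class Ecm2Entry:
    assignment_id: str
    k: int
    b: int
    n: int
    c: int
    b1: int
    b2: int
    curves: int

    def number(self):
        """Human-readable form of k*b^n+c, e.g. 2^547273-1."""
        head = f"{self.b}^{self.n}" if self.k == 1 else f"{self.k}*{self.b}^{self.n}"
        return f"{head}{self.c:+d}"


def parse_ecm2(line):
    """Parse one 'ECM2=...' worktodo line under the assumed field order."""
    key, _, value = line.strip().partition("=")
    if key != "ECM2":
        raise ValueError(f"not an ECM2 line: {line!r}")
    fields = value.split(",")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    assignment_id, *numbers = fields
    return Ecm2Entry(assignment_id, *(int(x) for x in numbers))


entry = parse_ecm2("ECM2=blah-blah,1,2,547273,-1,250000,25000000,400")
print(entry.number(), entry.b1, entry.b2, entry.curves)
```

Read this way, every entry in the list is an ECM run on a Mersenne number 2^n-1 with B1=250000 and B2=25000000, at 300 or 400 curves each.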

LaurV 2021-01-06 03:16

[QUOTE=lycorn;568536]The machine [...] has been rock solid since November 2018, when I started using it. Never had crashes, BSODs, etc, and it has found [/QUOTE]
What are the temperatures of the toy? (That's the most important info, which I didn't see in the post.)

It may or may not be related to switching to v30; it could just be coincidental. The machine is old, so it may need some maintenance, you know, removing the dust clogs from the fans, re-seating the CPU (changing/reapplying the thermal paste), etc. We do this yearly, or even every 6 months or so. You know, my grandma was a virgin for a very long time, but suddenly she wasn't. Luckily for me, otherwise I wouldn't be here, and who would post stupid things on mersenneforum? Haha.

Your system may well need none of that, but the new version of the program may be stressing the hardware a bit more than the old one, pushing it over the limit of stability. When you (general you) say your computer is stable, it is/was stable under the conditions you used it in. Any stable computer becomes unstable if you push it, and any crap computer is stable if you only type text documents on it. You could temporarily revert to v29 (or v30.3?), which was stable before, and see if the machine is still stable for a week or so. If you do only ECM, it won't matter much anyhow. If it is not stable anymore, you need dusting/re-seating, replacing or oiling the fans, etc., like I said. Stable computers can suddenly become unstable sometimes.

If it is still stable with the old version, you still don't know if the issue is the new version of the program. It may be a bug in the new version, but it also could be that the new version is pushing the system a bit more, beyond its stability limit, which you were already very close to before. The best way in that case, after upgrading to v30.4 again, is to try reducing the clocks just a little. If it becomes stable again, then the issue is not with P95. You still need dusting. Take the mop.

On the other hand, it still could be some newly introduced bug in v30.4. It has happened in the past, so you did well to report it. If so, George will fix it, as usual (for sure, he is now at home in quarantine and has absolutely nothing else to do :razz:)


All times are UTC.