mersenneforum.org  

2021-01-04, 21:41   #12
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
1CF3₁₆ Posts

Quote:
Originally Posted by lycorn
I have not yet installed Build 5,
Any ideas/thoughts/suggestions?

There was a bug or two in accessing the prime pairing bit array. Try build 5 and keep me updated.

Quote:
Originally Posted by UBR47K
I'm hitting another issue right now, mprime gets killed by OOM

I'm trying to replicate this here on a quad-core Linux system with 8GB of memory, with the max memory allocation set to 7GB.

Quote:
Originally Posted by PhilF
I don't remember if the P-1/ECM memory allocation refers to per worker or not

The memory allocation is per system, not per worker. Only the PRP emergency memory allocation is per worker.

Per-worker memory limits are possible, but not from the menus or dialog boxes. See undoc.txt.

Quote:
Originally Posted by nordi
I was also running into OOM problems when testing earlier builds of 30.4. They were supposed to be fixed in the new version, though. Which operating system are you using?

UBR47K's problem is different, since the M1277 ECM was in stage 1. The primary fix for you was to limit stage 2 temporaries to 100,000.

Last fiddled with by Prime95 on 2021-01-04 at 21:42

2021-01-05, 01:00   #13
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
16363₈ Posts

The overallocating memory problem seems to be specific to Linux. If I add a malloc_trim() call at the end of each curve, the mprime process is not killed.

If any Linux gurus have insights, I'd appreciate your sharing them. I'm a little baffled as my reading of the mallopt man page seems to indicate malloc_trim is called automatically once 128KB can be freed.
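
For anyone who wants to poke at this, here is a minimal sketch of the workaround, written in Python via ctypes rather than in mprime's own C code. The work loop and the names `trim_heap` and `run_curves` are made up for illustration; only `malloc_trim()` itself is the real glibc call. One possible explanation for the behaviour, going by the man pages: the automatic trim governed by M_TRIM_THRESHOLD only applies to free memory at the top of the heap, while an explicit malloc_trim() (glibc 2.8 and later) can also release whole free pages in the middle of the heap.

```python
import ctypes
import sys


def trim_heap(pad=0):
    """Call glibc's malloc_trim(pad).

    Returns 1 if memory was handed back to the OS, 0 if not, and None
    when malloc_trim() is not available (non-Linux or non-glibc systems).
    """
    if not sys.platform.startswith("linux"):
        return None
    try:
        libc = ctypes.CDLL(None)  # symbols already loaded into this process
        malloc_trim = libc.malloc_trim
    except (OSError, AttributeError):
        return None
    malloc_trim.argtypes = [ctypes.c_size_t]
    malloc_trim.restype = ctypes.c_int
    return malloc_trim(pad)


def run_curves(n_curves):
    """Hypothetical work loop: allocate scratch buffers, free them, then trim."""
    results = []
    for _ in range(n_curves):
        # Stand-in for per-curve temporaries (about 16 MB in small chunks).
        temporaries = [bytearray(64 * 1024) for _ in range(256)]
        del temporaries  # returned to the allocator, not necessarily to the OS
        results.append(trim_heap())  # the call proposed above, once per curve
    return results


if __name__ == "__main__":
    print(run_curves(3))
```

Watching the process RSS with and without the `trim_heap()` call is one way to see whether freed memory is actually going back to the system.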

2021-01-05, 02:02   #14
axn
Jun 2003
2×2,459 Posts

Quote:
Originally Posted by Prime95
See undoc.txt. Please run a test on a known B-S factor.

Unless primes are being paired exactly the same way as before, there is a good chance that 30.4 with B-S enabled will not find the factor.

2021-01-05, 02:06   #15
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7,411 Posts

Quote:
Originally Posted by axn
Unless primes are being paired exactly the same way as before, there is a good chance that 30.4 with B-S enabled will not find the factor.

Doh! Of course you are right.

2021-01-05, 02:56   #16
LaurV
Romulan Interpreter
Jun 2011
Thailand
83·113 Posts

Quote:
Originally Posted by axn
Unless primes are being paired exactly the same way as before, there is a good chance that 30.4 with B-S enabled will not find the factor.

Or will find factors that the first run didn't.

2021-01-05, 07:52   #17
tha
Dec 2002
2·3⁴·5 Posts

Code:
Your choice: [Work thread Jan 5 08:51] Worker starting
[Work thread Jan 5 08:51] Setting affinity to run worker on CPU core #1
[Work thread Jan 5 08:51] 
[Work thread Jan 5 08:51] P-1 on M15575663 with B1=1500000, B2=30000000
[Work thread Jan 5 08:51] Setting affinity to run helper thread 1 on CPU core #2
[Work thread Jan 5 08:51] Using FMA3 FFT length 800K, Pass1=320, Pass2=2560, clm=4, 4 threads
[Work thread Jan 5 08:51] Setting affinity to run helper thread 3 on CPU core #4
[Work thread Jan 5 08:51] Cannot continue stage 2 from old P-1 save file.  Restarting stage 2 from the beginning.
[Work thread Jan 5 08:51] Setting affinity to run helper thread 2 on CPU core #3
[Work thread Jan 5 08:51] D: 840, relative primes: 1713, stage 2 primes: 1743704, pair%=90.11
[Work thread Jan 5 08:51] Using 11061MB of memory.
[Work thread Jan 5 08:51] Stage 2 init complete. 16961 transforms. Time: 12.169 sec.
Segmentation fault (core dumped)

This is reproducible.

I renamed the file mF57663 so mprime couldn't find it and restarted it. Seems to work.

There is about a 33% increase in speed. What is the background behind that?

Last fiddled with by tha on 2021-01-05 at 08:30

2021-01-05, 12:56   #18
tha
Dec 2002
2·3⁴·5 Posts

I turned on Brent–Suyama again manually to compare the results. There is about a 3% penalty in exchange for an occasional factor beyond the B2 bound. I am leaving it on.

2021-01-05, 19:27   #19
lycorn
Sep 2002
Oeiras, Portugal
2²×3×11² Posts

Quote:
Originally Posted by Prime95
There was a bug or two in accessing the prime pairing bit array. Try build 5 and keep me updated.

I just got home to find Prime95 (Build 5) had stopped after ~20 hours of work. The symptoms and exception code are the same as before. In case you're willing to do some debugging, the fault offset is 0x0000000002345399.
This was the only error recorded. The application had been functioning perfectly since I launched it. I just restarted it and it's happily chugging along.

2021-01-05, 22:52   #20
Prime95
P90 years forever!
Aug 2002
Yeehaw, FL
7411₁₀ Posts

Quote:
Originally Posted by Prime95
The overallocating memory problem seems to be specific to Linux. If I add a malloc_trim() call at the end of each curve, the mprime process is not killed.

I'm still working on this. An Ubuntu build with debugging on seems to work. A CentOS build without debugging (the way official versions are built) does not. I presume the -g command line arg links in a different heap allocator.

Quote:
Originally Posted by tha
Code:
[Work thread Jan 5 08:51] Cannot continue stage 2 from old P-1 save file.  Restarting stage 2 from the beginning.
Segmentation fault (core dumped)
There is about a 33% increase in speed. What is the background behind that?

I have a fix for the segmentation fault. Making new builds will be spotty as my wife has my laptop. Her Mac is in the shop for butterfly keyboard repair.

Dig around in the 20M thread. The speed boost comes from a new gwnum feature that computes (a+b)*c in one call, saving some memory bandwidth. More speed comes from better prime pairing (~90% vs. ~30%), using an idea from Mihai Preda.
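
To make the pair% figure concrete, here is a rough sketch in Python (not Prime95 code). It counts how many stage 2 primes pair up as k*D - r and k*D + r around the nearest multiple of D, which is the classic scheme where one multiplication covers both primes of a pair. The function names and the small bounds are assumptions for illustration, D is assumed to be an even composite such as 30, 210 or 840, and this is not the Mihai Preda idea used in 30.4.

```python
from math import isqrt


def prime_sieve(limit):
    """Sieve of Eratosthenes: sieve[i] is 1 if i is prime, else 0."""
    sieve = bytearray([1]) * (limit + 1)
    sieve[0:2] = b"\x00\x00"
    for p in range(2, isqrt(limit) + 1):
        if sieve[p]:
            sieve[p * p :: p] = bytes(len(range(p * p, limit + 1, p)))
    return sieve


def pairing_stats(b1, b2, d):
    """Naive pairing of primes in (b1, b2] around multiples of d.

    Returns (total primes, number of pairs, multiplications needed),
    where a pair is two primes k*d - r and k*d + r sharing one multiplication.
    """
    sieve = prime_sieve(b2)
    total = pairs = 0
    for q in range(b1 + 1, b2 + 1):
        if not sieve[q]:
            continue
        total += 1
        centre = ((q + d // 2) // d) * d  # nearest multiple of d
        partner = 2 * centre - q          # mirror image of q around it
        # Count each pair once, from its smaller member.
        if q < partner <= b2 and sieve[partner]:
            pairs += 1
    return total, pairs, total - pairs


if __name__ == "__main__":
    for b1, b2, d in [(30, 100, 30), (1000, 100000, 210)]:
        total, pairs, mults = pairing_stats(b1, b2, d)
        print(f"D={d}: {total} primes, pair%={200 * pairs / total:.2f}, "
              f"{mults} multiplications")
```

The higher the pair%, the closer the multiplication count gets to half the number of primes, which is why better pairing translates directly into a faster stage 2.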

Quote:
Originally Posted by tha
I turned on Brent–Suyama again manually to compare the results. There is about a 3% penalty in exchange for an occasional factor beyond the B2 bound. I am leaving it on.

BS will find fewer factors than before. With 3x better prime pairing, there are 1/3 fewer opportunities for a BS prime > B2 to be included.

Quote:
Originally Posted by lycorn
I just got home to find Prime95 (Build 5) had stopped after ~20 hours of work. The symptoms and exception code are the same as before. In case you're willing to do some debugging, the fault offset is 0x0000000002345399.
This was the only error recorded. The application had been functioning perfectly since I launched it. I just restarted it and it's happily chugging along.

Can you send me details as to your machine, worktodo, and memory settings? Thanks.

2021-01-05, 23:54   #21
lycorn
Sep 2002
Oeiras, Portugal
2²×3×11² Posts

Quote:
Originally Posted by Prime95
Can you send me details as to your machine, worktodo, and memory settings? Thanks.

CPU: Intel i5-7400 (Kaby Lake) @ 3GHz. 16 GB DDR4 2400 memory (dual channel - 2 x 8GB). Everything at default settings. The machine was made by Dell, and has been rock solid since November 2018, when I started using it. Never had crashes, BSODs, etc., and it has found a fair number of ECM factors for exponents < 1M. It came preinstalled with Windows 10, and the regular updates have been done without any problems. The first time Prime95 died was on December 29th, while running v30.4 build 3 (or 4, I'm not 100% sure). Then it happened again on January 3rd, running build 4, and then today, running build 5. As I posted earlier, the symptoms were similar on all occasions.

The worktodo I'm currently using is:

[Worker #1]

ECM2=blah-blah,1,2,547273,-1,250000,25000000,400
ECM2=blah-blah,1,2,547291,-1,250000,25000000,400
ECM2=blah-blah,1,2,542911,-1,250000,25000000,300

[Worker #2]

ECM2=blah-blah,1,2,547453,-1,250000,25000000,400
ECM2=blah-blah,1,2,547487,-1,250000,25000000,400
ECM2=blah-blah,1,2,542987,-1,250000,25000000,300


[Worker #3]

ECM2=blah-blah,1,2,547583,-1,250000,25000000,400
ECM2=blah-blah,1,2,543019,-1,250000,25000000,300

[Worker #4]

ECM2=blah-blah,1,2,547397,-1,250000,25000000,400
ECM2=blah-blah,1,2,547609,-1,250000,25000000,400
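
For readers not used to worktodo files, here is a small Python sketch of how those ECM2 lines break down. It assumes the field order is assignment key, k, b, n, c, B1, B2, curves (so each entry is ECM on k*b^n+c, here 2^n-1); the parser and its names are hypothetical and not part of Prime95.

```python
from dataclasses import dataclass


@dataclass
class EcmEntry:
    worker: str   # section header the line appeared under
    key: str      # assignment key (redacted as "blah-blah" above)
    k: int
    b: int
    n: int
    c: int
    b1: int       # stage 1 bound
    b2: int       # stage 2 bound
    curves: int   # number of curves to run


def parse_worktodo(text):
    """Parse ECM2 lines, assuming key,k,b,n,c,B1,B2,curves field order."""
    entries = []
    worker = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            worker = line[1:-1]
            continue
        if not line.startswith("ECM2="):
            continue  # other work types are ignored in this sketch
        fields = line[len("ECM2="):].split(",")
        k, b, n, c, b1, b2, curves = (int(f) for f in fields[1:8])
        entries.append(EcmEntry(worker, fields[0], k, b, n, c, b1, b2, curves))
    return entries


SAMPLE = """\
[Worker #1]

ECM2=blah-blah,1,2,547273,-1,250000,25000000,400
ECM2=blah-blah,1,2,547291,-1,250000,25000000,400

[Worker #2]

ECM2=blah-blah,1,2,547453,-1,250000,25000000,400
"""

if __name__ == "__main__":
    for e in parse_worktodo(SAMPLE):
        print(f"{e.worker}: {e.k}*{e.b}^{e.n}{e.c:+d}, "
              f"B1={e.b1}, B2={e.b2}, {e.curves} curves")
```

Read this way, every entry above is ECM on a Mersenne number below M1000000 with B1=250000 and B2=25000000, at 300 or 400 curves each.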

Last fiddled with by LaurV on 2021-01-06 at 03:25 Reason: removed keys from the assignments - general wisdom is that it is not good to post those publicly

2021-01-06, 03:16   #22
LaurV
Romulan Interpreter
Jun 2011
Thailand
83×113 Posts

Quote:
Originally Posted by lycorn
The machine [...] has been rock solid since November 2018, when I started using it. Never had crashes, BSODs, etc, and it has found

What are the temperatures of the toy? (That's the most important info, and I didn't see it in the post.)

It may or may not be related to switching to v30; it could just be coincidental. The machine is old, so it may need some maintenance, you know: removing the dust clogs from the fans, re-seating the CPU (changing/reapplying the thermal paste), etc. We do this yearly, or even every 6 months or so. You know, my grandma was a virgin for a very long time, but suddenly she wasn't. Luckily for me, otherwise I wouldn't be here, and who would post stupid things on mersenneforum? Haha.

Your system may well need none of this, but the new version of the program may be stressing the hardware a bit more than the old one, pushing it over the limit of stability. When you (general you) say your computer is stable, it is/was stable under the conditions you used it in. Any stable computer becomes unstable if you push it, and any crap computer is stable if you only type text documents on it. You may try to temporarily revert to v29 (or v30.3?), which was stable before, and see if the machine is still stable for a week or so. If you do only ECM, it won't matter much anyhow. If it is not stable anymore, you need dusting/re-seating, changing or oiling the fans, etc., like I said. Stable computers can suddenly become unstable sometimes.

If it is still stable with the old version, you still don't know whether the issue is the new version of the program. It may be a bug in the new version, but it could also be that the new version is pushing the system a bit more, beyond its stability limit, which you were already very close to. The best approach in that case, after upgrading to v30.4 again, is to try reducing the clocks just a little. If it becomes stable again, then the issue is not with P95. You still need dusting. Take the mop.

On the other hand, it still could be some newly introduced bug in v30.4; it has happened in the past, so you did well reporting it. If so, George will fix it, as usual (for sure, he is now at home in quarantine and has absolutely nothing else to do).

Last fiddled with by LaurV on 2021-01-06 at 07:19

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.