mersenneforum.org  

Old 2022-09-04, 08:24   #1
testcb00
 
Sep 2022

2·3 Posts
Default Segmentation fault (core dumped) exists in FFT 224K and 240K only

Hi everyone, I am new to this forum.

I recently bought an old server and I am trying to use Prime95 to run a burn-in test.

However, I find that the server cannot complete the test at FFT 224K and FFT 240K. After a few minutes, "Segmentation fault (core dumped)" is shown and the Prime95 program crashes.

If I use Smallest FFTs / Small FFTs / Large FFTs, the server can run for 24 hours without a failure. It seems that those tests do not use FFT 224K and FFT 240K.

Does the server have a problem, or is this a bug?



Server Details
Supermicro X9SRG-F
Intel Xeon E5-2648Lv2 10C20T
256GB DDR3-1600 LRDIMM (Samsung M386B8G70DE0-CK03Q 64GB x4)
Old 2022-09-14, 08:57   #2
testcb00
 
Sep 2022

2·3 Posts
Default Prime95 Error exists when Memory to use (in MB) > 81919MB in FFT 224K/240K

Regarding the previous topic: https://www.mersenneforum.org/showthread.php?t=28043


Server Details
Supermicro X9SRG-F
Intel Xeon E5-2648Lv2 10C20T
128GB DDR3-1866 RDIMM (Samsung M393B2G70DB0-CMA 16GB x8)

After some testing, I find that the error only occurs when "Memory to use (in MB)" is greater than 81919 MB. At 81920 MB, for example, the error occurs, whether AVX is enabled or disabled.

I am not sure whether this is a bug in the software or a flaw in the Xeon E5-2600 v2 CPU...

I have also tested a Xeon E5-2600 v4 server with 128 GB RAM; whether I disable AVX2 or AVX, that test shows no errors.
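
A side note on the threshold itself: assuming the "Memory to use (in MB)" setting counts mebibytes (an assumption; nothing in this thread confirms it), 81920 MB is exactly 80 GiB, so the first failing value sits on a round boundary. A minimal sketch of that arithmetic, with names chosen only for illustration:

```python
# Assumption: "MB" in the Prime95 dialog means MiB (2**20 bytes).
MIB = 2**20
GIB = 2**30

last_good_mb = 81919   # largest setting reported to work
first_bad_mb = 81920   # smallest setting reported to crash


def mb_to_gib(mb: int) -> float:
    """Convert a size given in MiB to GiB."""
    return mb * MIB / GIB


print(first_bad_mb * MIB)        # failing setting expressed in bytes
print(mb_to_gib(first_bad_mb))   # exactly 80 GiB
print(mb_to_gib(last_good_mb))   # just under 80 GiB
```

This only locates the boundary; it does not by itself show whether the cause is the software or the hardware.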
Old 2022-09-14, 16:39   #3
Batalov
 
"Serge"
Mar 2008
Phi(4,2^7658614+1)/2

7·1,427 Posts
Default

Quote:
Originally Posted by testcb00
After some testing, I find that the error only occurs when "Memory to use (in MB)" is greater than 81919 MB. At 81920 MB, for example, the error occurs, whether AVX is enabled or disabled.

I am not sure whether this is a bug in the software or a flaw in the Xeon E5-2600 v2 CPU...

...or faulty memory.
Recommendation: test on a different, similarly configured node and/or swap the memory modules.

And don't create new threads; continue discussing a specific topic in the same thread.
Old 2022-09-14, 17:23   #4
testcb00
 
Sep 2022

2×3 Posts
Default

Quote:
Originally Posted by Batalov
...or faulty memory.
Recommendation: test on a different, similarly configured node and/or swap the memory modules.

And don't create new threads; continue discussing a specific topic in the same thread.

memtest86 passed 3x 4 tests.
That is why I think the memory is not faulty.
Old 2022-09-14, 18:37   #5
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1101111001000₂ Posts
Default

Quote:
Originally Posted by testcb00
Hi everyone, I am new to this forum.

I recently bought an old server and I am trying to use Prime95 to run a burn-in test.

However, I find that the server cannot complete the test at FFT 224K and FFT 240K. After a few minutes, "Segmentation fault (core dumped)" is shown and the Prime95 program crashes.

If I use Smallest FFTs / Small FFTs / Large FFTs, the server can run for 24 hours without a failure. It seems that those tests do not use FFT 224K and FFT 240K.

Does the server have a problem, or is this a bug?

Welcome to the forum. Please consider using the reference info collection. There are also existing threads for such questions.

If your hardware cannot run prime95 continuously for months, or at least days, it has issues. Check for dust buildup, dead or slow fans, etc. Try running some short useful manual assignments; PRP-CF on smallish exponents would be quick and includes excellent error detection.

The 224K and 240K FFTs are not used much; they correspond to exponents of about 5M.
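
For a rough sense of scale, assuming "K" means 1024 here (so a 224K FFT has 224 × 1024 elements) and taking the ~5M exponent figure at face value, the sketch below computes how many bits of the exponent each FFT element would carry on average. The helper name is made up for illustration, and the result is only a back-of-the-envelope number, not Prime95's actual FFT size limits.

```python
# Assumption: "224K" means 224 * 1024 FFT elements.
def bits_per_element(exponent: int, fft_k: int) -> float:
    """Average number of exponent bits carried per FFT element."""
    return exponent / (fft_k * 1024)


for fft_k in (224, 240):
    elements = fft_k * 1024
    print(fft_k, elements, round(bits_per_element(5_000_000, fft_k), 2))
```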

Last fiddled with by kriesel on 2022-09-14 at 18:37
Old 2022-09-14, 20:08   #6
testcb00
 
Sep 2022

110₂ Posts
Default

Quote:
Originally Posted by kriesel
Welcome to the forum. Please consider using the reference info collection. There are also existing threads for such questions.

If your hardware cannot run prime95 continuously for months, or at least days, it has issues. Check for dust buildup, dead or slow fans, etc. Try running some short useful manual assignments; PRP-CF on smallish exponents would be quick and includes excellent error detection.

The 224K and 240K FFTs are not used much; they correspond to exponents of about 5M.

I am sorry, I am a noob in this area.

If my understanding is correct, PRP-CF stands for Probable Prime Cofactor. This is a new term to me...
I can only guess that you are suggesting I make the hardware calculate the base of a probable prime.

May I know the procedure for the following?

"Try running some short useful manual assignments; PRP-CF on smallish exponents would be quick and includes excellent error detection."

Last fiddled with by testcb00 on 2022-09-14 at 20:08
Old 2022-09-14, 22:41   #7
kriesel
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1101111001000₂ Posts
Default

You've done some of https://www.mersenne.org/gettingstarted/ already, to be able to run a torture test.
For PRP-CF, which consists of short assignments and uses the Gerbicz error check with its excellent error detection rate,
either
a) create a PrimeNet account and configure your prime95 install to use it to fetch assignments and report results, and select a work type to do, as shown in the attachment,
or, if you only want to do brief testing and prefer to copy/paste assignments and results instead,
b) get assignments from https://www.mersenne.org/manual_assignment/ and paste them into a worktodo.txt file in the prime95 program's directory, start the program, and report the results.json.txt output for completed assignments at https://www.mersenne.org/manual_result/

Important! If you decide to quit running them before finishing them all, unreserve any assignments you don't complete.
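
Option b) can also be scripted. The sketch below is only an illustration: the helper name `append_assignments` and the directory argument are made up, and the assignment lines must be copied verbatim from the manual assignment page (no particular line format is assumed here).

```python
from pathlib import Path


def append_assignments(prime95_dir: str, lines: list[str]) -> Path:
    """Append assignment lines, copied verbatim from the manual
    assignment page, to worktodo.txt in the Prime95 directory."""
    worktodo = Path(prime95_dir) / "worktodo.txt"
    # Strip surrounding whitespace and drop empty lines from the paste.
    cleaned = [ln.strip() for ln in lines if ln.strip()]
    # Open in append mode so existing assignments are preserved.
    with worktodo.open("a", encoding="utf-8") as fh:
        for ln in cleaned:
            fh.write(ln + "\n")
    return worktodo
```

Run it before starting the program, as in step b). Reporting the results and unreserving unfinished assignments still happens on the website.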
Attachment: prpcf.png
Old 2022-09-15, 09:30   #8
testcb00
 
Sep 2022

2×3 Posts
Default

Quote:
Originally Posted by kriesel
You've done some of https://www.mersenne.org/gettingstarted/ already, to be able to run a torture test.
For PRP-CF, which consists of short assignments and uses the Gerbicz error check with its excellent error detection rate,
either
a) create a PrimeNet account and configure your prime95 install to use it to fetch assignments and report results, and select a work type to do, as shown in the attachment,
or, if you only want to do brief testing and prefer to copy/paste assignments and results instead,
b) get assignments from https://www.mersenne.org/manual_assignment/ and paste them into a worktodo.txt file in the prime95 program's directory, start the program, and report the results.json.txt output for completed assignments at https://www.mersenne.org/manual_result/

Important! If you decide to quit running them before finishing them all, unreserve any assignments you don't complete.

Thank you for your explanation, kriesel.

I tried option a) as you described, but I see that only a little RAM is used. My understanding is that GIMPS work does not use all of the hardware. I would like to know whether there are ways to increase the RAM usage. My goal is to test with more than 81919 MB of memory in use.
Attachment: 1822.png
Old 2022-09-26, 21:31   #9
falk
 
Sep 2022
Munich, Germany

24 Posts
Default Possible bug in Prime95 Version 30.8 build 15 Win64

I am a novice user coming here because the torture test failed.

A few days after reading stress.txt, I am here to report that this is most likely a software bug in Prime95, not a hardware issue with my system. That is extremely frustrating, because people run Prime95 to debug their hardware, not their testing software ...

I am a novice, but these are the observations that led to my conclusion:

1. If I start the torture test simply by clicking Start in the torture test dialog right after launching the program, the test reports several errors (rounding errors etc.) and eventually crashes while accessing address -all FFs-. All of this happens within a minute.

2. This behaviour is deterministic.

3. Running the torture test with ANY size other than the default (a blend of all) runs OK.

4. My system has 8 physical Intel cores and 128 GB memory, is not overclocked, and has no other issues. The CPU remains at ~60 degrees C.

5. Memtest86 passes fine, incl. Test 13, the Hammer Test. y-cruncher runs fine. Prime95 runs fine with any preset size. AI workloads run fine. Only the default config crashes within a minute. Always, and always at the very same moment.

Still, stress.txt insists that it MUST be my hardware because the software cannot fail.

Is anybody here on the forum able to help me out? I am unsure whether I can still trust my system ...

Thanks, Falk
Old 2022-09-26, 21:59   #10
falk
 
Sep 2022
Munich, Germany

24 Posts
Default Second user here with the exact same issue, forum please take us seriously

I am a second user who came here with the exact same issue; forum, please take us seriously.

I created another thread in the Software forum, but I now see that @testcb00 has the exact same issue. So, let me continue the discussion here.

At first, I was 98% sure that this is a bug in Prime95; now, with the posting of @testcb00, I am 99.9% sure.

Let me add some of my details to what @testcb00 already posted:

1. Prime95 fails within a minute if started via the default torture test option. Always, and always in the exact same way (reporting a few rounding errors, then accessing invalid address -all FFs-).

2. Prime95 does NOT fail if started as a torture test with any of the fixed sizes.

My speculation: one of the sizes is grayed out, and the default (a blend of all) possibly still uses the grayed-out option and then fails.

3. memtest86 never fails, y-cruncher does not fail, nothing else fails, no overclocking, just 60 deg C.

4. Intel 7900X (10 phys. cores), 128 GB RAM. Note that, like @testcb00, I have memory in excess of 81919MB! Prime95 Version 30.8 build 15 Win64.

My suspicion is that Prime95 is broken for the default torture test on systems with large memory configs. But nobody in the forum or among the maintainers has taken it seriously enough over the last few weeks.

stress.txt says if Prime95 fails then it MUST be your hardware. Well, then ...


Otherwise, maybe somebody who is not a noob can have a closer look at this anyway. I would kindly appreciate any help.

P.S.
Running Prime95 in any other way, like participating in the actual search, doesn't help as Prime95 won't fail except for the exact default config!

P.S.2
It still may be an issue with memory at some high address with both the system of @testcb00 and myself. Therefore, I would highly appreciate somebody giving the default torture test a short spin on a 128GB Windows system. Just to make sure there is no regression bug in Prime95 ...

P.S.3
Regarding the reference info linked by @kriesel: I scanned it and nothing there applies to the case of @testcb00 or myself, AFAICS.

Last fiddled with by falk on 2022-09-26 at 22:19
Old 2022-09-26, 22:13   #11
falk
 
Sep 2022
Munich, Germany

24 Posts
Default

I have now found a similar topic in another subforum.

It is probably better to continue the discussion there.

-> https://www.mersenneforum.org/showth...248#post614248

Last fiddled with by falk on 2022-09-26 at 22:14
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
