mersenneforum.org mprime segmentation fault on RHEL

 2004-02-08, 23:38 #1 bej   Jan 2004 3·5 Posts mprime segmentation fault on RHEL Is anyone running mprime on Red Hat Enterprise Linux 3? On a newly installed system, I get a segmentation fault after mprime runs for a while. Initially it appeared to be crashing after Stage 1 GCD completed so I set Stage1GCD=0 in prime.ini, but now it crashes at the end of stage 1 of P-1 factoring. I don't know much about debugging under Linux, but I tried downloading the prime95 source and building mprime with debug but got a seg fault much sooner so that was no help. I've also tried the statically linked mprime and it fails also. Also installed Windows XP on the machine temporarily, and prime95 runs OK there, so I don't think it's a hardware problem. Any suggestions on how to debug/resolve this problem? Thanks, Brian
 2004-02-10, 22:43 #2 geoff Mar 2003 New Zealand 1157₁₀ Posts I am just guessing here, but could memory allocation be the problem? Before starting stage 2 P-1 a big chunk of memory has to be allocated. What are your memory settings (DayMemory and NightMemory lines in local.ini), and how much total (virtual) memory is available on your machine? When you ran Prime95 on Windows, was it the same version as the mprime version you ran on Linux? Was there any swapping activity when stage 2 P-1 started on Windows? What made me think of this is that I recently upgraded from Linux kernel 2.4 to 2.6 and found that my version 0 swap partition couldn't be used, I had to reformat it as version 1 swap. Does RHE Linux 3 use kernel 2.6?
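A minimal sketch of the check suggested above: read the DayMemory and NightMemory values out of local.ini and compare them with the RAM and swap the kernel reports. The helper name `ini_get` and the temporary stand-in for local.ini are assumptions made for illustration; on a real system, point `ini_get` at the actual local.ini in the mprime directory.

```shell
#!/bin/sh
# Print the value of key $1 from ini file $2 (first match only).
# ini_get is a hypothetical helper, not part of mprime.
ini_get() {
    sed -n "s/^$1=//p" "$2" | head -n 1
}

# Stand-in for local.ini, using the values quoted later in this thread.
ini=$(mktemp)
printf 'DayMemory=32\nNightMemory=32\n' > "$ini"

echo "DayMemory   (MB): $(ini_get DayMemory "$ini")"
echo "NightMemory (MB): $(ini_get NightMemory "$ini")"
rm -f "$ini"

# On Linux, show physical memory and swap for comparison.
if [ -r /proc/meminfo ]; then
    grep -E '^(MemTotal|SwapTotal):' /proc/meminfo
fi
```

If the configured limits are far below MemTotal plus SwapTotal, as they are in this thread, plain over-allocation becomes an unlikely explanation.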
 2004-02-11, 00:55 #3 Xyzzy Aug 2002 8404₁₀ Posts It uses 2.4... I've tested mprime and it works fine in RHAS 2 and 3...
 2004-02-11, 05:52 #4 bej Jan 2004 15₁₀ Posts It doesn't seem to be related to memory allocation (or over-allocation). I had the memory limits at 32M/32M day/night, and my system has 512M real memory and a 1G swap partition. I don't notice any swapping before it crashes -- up until it crashes, my VM usage is only running at 76-77M. I probably didn't run the exact same version of Prime95 on Windows on this machine -- I have version 23.5 of mprime and probably ran either 23.4 or 23.7 of prime95. And as Xyzzy states, RHEL ES 3 uses kernel 2.4 (2.4.21). It's good to hear that someone has run mprime OK on RHEL3... though that doesn't help me with my problem. My other Linux system that is running OK (RH8) uses kernel 2.4.20. It's still running mprime 23.4 -- I'll have to give that a try also. Any other suggestions? Is there anything I might be able to get out of the core file with the downloaded versions of mprime? Thanks.
 2004-02-12, 01:40 #5 geoff Mar 2003 New Zealand 1157₁₀ Posts The only other thing I can think of is to check exactly what the resource limits are for the user running mprime, e.g. check that virtual memory has not been limited with 'ulimit -v' etc. You can run gdb on mprime with 'gdb mprime core' but it probably won't be much help without the debugging symbols, unless you are good at following assembly. If it segfaulted in a library function then it will at least tell you which one. To build mprime with debugging information I think you will need the full installation of binutils with all the cross-compiling utilities, not usually installed by default. I haven't done this myself, maybe someone else has?
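The two checks above can be sketched as a short script. The function names `show_limits` and `core_backtrace` are made up for this example. The `-batch` and `-ex` options are standard gdb options, but `-ex` may be missing from older gdb releases; in that case run 'gdb mprime core' interactively and type `bt` at the prompt.

```shell
#!/bin/sh
# Print the per-user limits most relevant to a crashing mprime.
show_limits() {
    echo "virtual memory (kB): $(ulimit -v)"
    echo "core file size (blocks): $(ulimit -c)"
}

# Print a backtrace from a core file without an interactive session.
# $1 = path to the binary, $2 = path to the core file.
core_backtrace() {
    gdb -batch -ex bt "$1" "$2"
}

show_limits
# Example (not run here): core_backtrace ./mprime core
```

If the core file size limit is 0, no core is written when mprime crashes; `ulimit -c unlimited` in the shell that launches mprime lifts that limit for the session.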
 2004-02-12, 06:20 #6 bej Jan 2004 1111₂ Posts ulimit -v is unlimited, so that shouldn't be a problem. gdb shows mprime died in free(). I tried building mprime again. If I use the .o files that ship in the sources23.zip, I seg fault immediately when I start running (the menu stuff is all OK if I run mprime -M, the seg fault comes immediately after I select Test/Continue). I downloaded and rebuilt the latest binutils with coff support and rebuilt mprime (after a make clean so I build from the included .obj files rather than the .o files), but it still faults immediately the same way. Anyone know how mprime should be built? Or any other suggestions on tracking down the original seg fault? Thanks.
2004-02-12, 19:41   #7
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

5·1,553 Posts

Quote:
 Originally Posted by bej Anyone know how mprime should be built? Or any other suggestions on tracking down the original seg fault? Thanks.
Make sure the data segment for mult.o is on a 32-byte boundary. Use the right dummyXX.o file to move the data segment around.
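A rough way to test whether an address sits on a 32-byte boundary, sketched in shell. The helper `is_aligned` and both sample addresses are hypothetical; where mult.o's data actually lands in the linked binary has to be read from something like `objdump -h mprime` or a linker map, and that step is not shown here.

```shell
#!/bin/sh
# Succeed if hex address $1 (with or without a 0x prefix) is a multiple
# of $2 bytes (default 32). is_aligned is a hypothetical helper.
is_aligned() {
    addr=${1#0x}
    boundary=${2:-32}
    [ $(( 0x$addr % $boundary )) -eq 0 ]
}

# Sample addresses chosen only to show both outcomes.
for a in 0x08100020 0x08100010; do
    if is_aligned "$a"; then
        echo "$a is 32-byte aligned"
    else
        echo "$a is NOT 32-byte aligned"
    fi
done
```

An address is 32-byte aligned exactly when its low five bits are zero, so in hex the last digit must be 0 and the second-to-last digit must be even.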

 2004-02-13, 05:09 #8 bej   Jan 2004 3·5 Posts Thanks. I overlooked the alignment comments in the makefile. I've now rebuilt mprime with debug information, and it's running. In fact, it has now just completed P-1 stage 1 and stage 1 GCD, and has started P-1 stage 2. It's never gotten this far before on this system. I guess that's good, but now I'm running a version of mprime minus the security module. Will this prevent me from reporting results with primenet or otherwise affect the end results? Any suggestions on where I should go from here? Thanks.
 2004-02-14, 01:08 #9 geoff     Mar 2003 New Zealand 13·89 Posts Could you post a minimal local.ini, prime.ini, and worktodo.ini that will trigger the segfault on your system? It would be good to see if someone else can reproduce it. I am running Debian, but I'll try it out using kernel 2.4.21. Just to clarify your first post, does sprime segfault at the same place as mprime? And have you been able to complete a long run torture test on this machine?
 2004-02-14, 08:32 #10 bej Jan 2004 3×5 Posts The debug version of mprime I built had gotten well into LL testing with no errors/faults so I stopped it and rebuilt mprime with optimization turned back on and debug turned off... It seg faulted at seemingly the same place (at the end of stage 1 GCD) as mprime 23.4/23.5. And yes, sprime seems to fault at the same place, but I never looked at a core from it. The odd thing is though, the core from the mprime I built myself looks completely different from the previous cores. This one shows the fault in gwcopy() with the following backtrace:
Code:
#0 0x0808aae0 in gwcopy ()
#1 0x08265448 in ?? ()
#2 0x0806d43d in pminus1 ()
#3 0x0805e67f in pfactor ()
#4 0x08058f6f in primeContinue ()
#5 0x08071876 in linuxContinue ()
#6 0x080737ea in main_menu ()
#7 0x08071197 in main ()
I have run a long torture test OK (12+ hours) on this system. Here are my ini files (minus personal info).
Code:
* prime.ini
AskedAboutMemory=1
UsePrimenet=1
DialUp=0
DaysOfWork=1
WorkPreference=0
OutputIterations=100
ResultsFileIterations=999999999
DiskWriteTime=30
NetworkRetryTime=2
NetworkRetryTime2=240
DaysBetweenCheckins=3
TwoBackupFiles=1
SilentVictory=0
* local.ini
OldCpuType=12
OldCpuSpeed=2659
ComputerID=hermes
CPUHours=24
DayMemory=32
NightMemory=32
DayStartTime=450
DayEndTime=1410
Pid=488
LastEndDatesSent=1076563108
RollingStartTime=0
SelfTest768Passed=1
RollingAverage=999
SelfTest1024Passed=1
SelfTest8Passed=1
SelfTest10Passed=1
SelfTest896Passed=1
SelfTest12Passed=1
SelfTest14Passed=1
* worktodo.ini
Test=14010833,65,0
I've had memory at both 32/32 and 128/128 -- same fault. It always seems to fail at the beginning of P-1 stage 2. Is there a way I can bypass P-1 factoring completely maybe? I thought SkipTrialFactoring=1 sounded like it would do it, but didn't seem to have any effect. Thanks.
 2004-02-14, 17:07 #11 Prime95 P90 years forever!     Aug 2002 Yeehaw, FL 5·1,553 Posts To skip P-1 edit your worktodo.ini and change the ",0" to ",1"
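The edit described above can also be scripted. This is a sketch that assumes every entry has the three-field form shown earlier in the thread (Test=exponent,bits,flag); the function name `skip_pminus1` is made up, and it writes to stdout rather than changing the file in place.

```shell
#!/bin/sh
# Rewrite Test= lines so a trailing ",0" becomes ",1".
# Reads the file named in $1 and prints the result; the file is untouched.
skip_pminus1() {
    sed 's/^\(Test=[0-9]*,[0-9]*\),0$/\1,1/' "$1"
}

# Demonstrate on a temporary copy of the worktodo.ini line from this thread.
tmp=$(mktemp)
printf 'Test=14010833,65,0\n' > "$tmp"
skip_pminus1 "$tmp"    # prints: Test=14010833,65,1
rm -f "$tmp"
```

Stop mprime before changing worktodo.ini, redirect the output to a new file, and inspect it before replacing the original.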


All times are UTC.
