mersenneforum.org  

Old 2017-03-20, 12:41   #12
rudi_m
 
Jul 2005

2×7×13 Posts

Unfortunately, this official 29.1 release still has the bug where it cannot write the TF state files on exit:
http://mersenneforum.org/showthread.php?t=21988
Old 2017-03-20, 22:04   #13
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

3·2,621 Posts

Quote:
Originally Posted by rudi_m View Post
Unfortunately, this official 29.1 release still has the bug where it cannot write the TF state files on exit:
http://mersenneforum.org/showthread.php?t=21988
Try build 14. Mprime now waits up to 5 seconds when sent a SIGTERM so that it can shut down workers gracefully and create save files.

Do you still see the crash problem reported in your link???
Old 2017-03-21, 11:48   #14
Explorer09
 
May 2014

100001₂ Posts

Quote:
Originally Posted by Prime95 View Post
Try build 14. Mprime now waits up to 5 seconds when sent a SIGTERM so that it can shut down workers gracefully and create save files.

Do you still see the crash problem reported in your link???
@Prime95
I'm not sure how you fixed that bug, but if it were me, I would propose a wait loop at the end of main() to prevent the main thread from exiting too early.

Correct me if my fix is wrong.

Code:
--- a/linux/prime.c	2017-03-08 13:32:28.000000000 +0800
+++ b/linux/prime.c	2017-03-21 19:42:38.459523041 +0800
@@ -356,6 +356,8 @@ int main ( int argc, char *argv[])
 		linuxContinue ("Another mprime is already running!\n", ALL_WORKERS, TRUE);
 	}
 
+	while (WORKER_THREADS_STOPPING) Sleep (50);
+
 /* Write the worktodo file in case the WELL_BEHAVED_WORK flag caused us */
 /* to delay writing the file. */
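
For comparison, here is a minimal sketch of the "wait up to 5 seconds on SIGTERM" behaviour described above, combined with the 50 ms polling from the patch. It is not taken from the mprime source: `stop_requested`, `workers_stopping`, `install_sigterm_handler`, `sleep_ms` and `wait_for_workers` are hypothetical names used only for illustration, and POSIX `sigaction`/`nanosleep` stand in for whatever mprime actually uses.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-ins for mprime's internal state (assumed names). */
static volatile sig_atomic_t stop_requested = 0;   /* set when SIGTERM arrives */
static volatile sig_atomic_t workers_stopping = 0; /* nonzero while workers still save */

/* Signal handler: it only sets a flag, which is async-signal-safe. */
static void on_sigterm(int signo) {
    (void)signo;
    stop_requested = 1;
}

/* Install the SIGTERM handler. Returns 0 on success, -1 on failure. */
static int install_sigterm_handler(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Sleep for ms milliseconds; a signal may cut the sleep short. */
static void sleep_ms(long ms) {
    struct timespec ts;
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Poll every poll_ms until the workers have stopped or timeout_ms has
   been counted. Returns 1 if the workers stopped in time, 0 on timeout. */
static int wait_for_workers(long timeout_ms, long poll_ms) {
    long waited = 0;
    assert(poll_ms > 0);
    while (workers_stopping && waited < timeout_ms) {
        sleep_ms(poll_ms);
        waited += poll_ms;
    }
    return !workers_stopping;
}
```

Calling `wait_for_workers(5000, 50)` at the end of main() once `stop_requested` is set would bound the wait to about five seconds, whereas an unbounded `while` loop would hang forever if a worker never clears the flag.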
Old 2017-03-21, 12:36   #15
rudi_m
 
Jul 2005

2×7×13 Posts

Quote:
Originally Posted by Prime95 View Post
Try build 14. Mprime now waits up to 5 seconds when sent a SIGTERM so that it can shut down workers gracefully and create save files.

Do you still see the crash problem reported in your link???
Thanks, this seems to work.
Old 2017-03-21, 13:00   #16
rudi_m
 
Jul 2005

2·7·13 Posts

BTW, the TF speed is amazing now :) It now makes the CPU hotter than LL does. I think we need a TF stress test.

I have one machine which can't stand the new TF speed:

Code:
[Mar21 13:47] CPU1: Core temperature/speed normal
[  +0.000001] CPU6: Package temperature/speed normal
[  +0.000000] CPU5: Core temperature/speed normal
[  +0.000001] CPU2: Package temperature/speed normal
[  +0.000002] CPU5: Package temperature/speed normal
[  +0.000001] CPU1: Package temperature/speed normal
[  +0.000006] mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 128: 0000000088012a82
[  +0.000001] mce: [Hardware Error]: TSC 1807058ba4c4ac mce:
[  +0.000001] mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1490100628 SOCKET 0 APIC 3 microcode 9e
[  +0.000000] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 128: 0000000088012a82
[  +0.000001] mce: [Hardware Error]: TSC 1807058ba4c26e mce:
[  +0.000000] mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1490100628 SOCKET 0 APIC 2 microcode 9e
[  +0.000015] CPU0: Package temperature/speed normal
[  +0.000000] CPU7: Package temperature/speed normal
[  +0.000001] CPU4: Package temperature/speed normal
[  +0.000001] CPU3: Package temperature/speed normal
29.1 build 8 was still slow enough for this CPU. Are there any options to run build 14 a bit slower? I have already tried HyperthreadTF=0.
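
Two fields in the log above can be decoded mechanically. The sketch below assumes that `TIME` is a Unix timestamp in seconds and that the value after the colon in `PROCESSOR 0:506e3` is the CPUID leaf 1 signature, which matches how the Linux mce code normally prints them. The helper names `decode_cpu_sig` and `format_utc` are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Family, model and stepping extracted from a CPUID leaf 1 EAX value. */
struct cpu_sig { unsigned family, model, stepping; };

/* Decode an x86 CPUID signature using the standard field layout. */
static struct cpu_sig decode_cpu_sig(unsigned eax) {
    struct cpu_sig s;
    unsigned base_family = (eax >> 8) & 0xF;
    unsigned base_model  = (eax >> 4) & 0xF;
    unsigned ext_model   = (eax >> 16) & 0xF;
    unsigned ext_family  = (eax >> 20) & 0xFF;

    s.stepping = eax & 0xF;
    /* The extended family is added only when the base family is 0xF. */
    s.family = (base_family == 0xF) ? base_family + ext_family : base_family;
    /* The extended model applies to base families 0x6 and 0xF. */
    s.model = (base_family == 0x6 || base_family == 0xF)
                  ? (ext_model << 4) | base_model
                  : base_model;
    return s;
}

/* Format a Unix timestamp as "YYYY-MM-DD HH:MM:SS" in UTC.
   Returns 0 on success, -1 on failure. */
static int format_utc(time_t t, char *buf, size_t len) {
    struct tm tm_utc;
    assert(buf != NULL);
    if (gmtime_r(&t, &tm_utc) == NULL)
        return -1;
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm_utc) > 0 ? 0 : -1;
}
```

With the values from the log, `decode_cpu_sig(0x506e3)` yields family 6, model 0x5E, stepping 3, and `format_utc(1490100628, ...)` yields 2017-03-21 12:50:28 UTC.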
Old 2017-03-21, 17:45   #17
Madpoo
Serpentine Vermin Jar
 
Jul 2014

5²·7·19 Posts

Quote:
Originally Posted by rudi_m View Post
BTW, the TF speed is amazing now :) It now makes the CPU hotter than LL does. I think we need a TF stress test.

I have one machine which can't stand the new TF speed:

...

29.1 build 8 was still slow enough for this CPU. Are there any options to run build 14 a bit slower? I have already tried HyperthreadTF=0.
I think you've somewhat answered your own question... it's even more of a stress test than LL, so you've just uncovered a problem with your cooling (or you've overclocked too much?).

The wrong answer is to try and slow down a program to keep it from overheating the CPU. LOL
Old 2017-03-21, 22:10   #18
rudi_m
 
Jul 2005

2×7×13 Posts

Quote:
Originally Posted by Madpoo View Post
I think you've somewhat answered your own question... it's even more of a stress test than LL, so you've just uncovered a problem with your cooling (or you've overclocked too much?).

The wrong answer is to try and slow down a program to keep it from overheating the CPU. LOL
Actually, "generic" answers like yours, given without knowing the background, are often wrong answers ;)

This is a rented machine in a data center. It's not overclocked and I can't do anything about the cooling. My only options are to stop running mprime, switch to a more expensive host, or run it a bit slower.

If all their customers ran mprime, they would need to improve cooling and raise prices a lot. Nevertheless, I called support and they gave me a new machine immediately. This one works now; CPU temperature is "only" about 90 °C.

BTW, in our office I also run some servers optimized for low noise and cheap cooling in a room which is constantly over 30 °C! These machines also can't run mprime on all cores, but they are absolutely stable with only 2 threads per machine and have finished hundreds of double checks successfully over the years. I don't see what's wrong with "putting less load" on the machines to avoid expensive air-conditioning systems (purchase, installation, energy bill).

Moreover, the reason I ran TF instead of LL in the first place was that TF did not make the CPUs as hot as LL ;)

Last fiddled with by rudi_m on 2017-03-21 at 22:13
Old 2017-03-24, 03:07   #19
Madpoo
Serpentine Vermin Jar
 
Jul 2014

5²×7×19 Posts

Quote:
Originally Posted by rudi_m View Post
Actually, "generic" answers like yours, given without knowing the background, are often wrong answers ;)
LOL... touché. :) I assumed you were overclocking.
Old 2017-03-24, 17:12   #20
rudi_m
 
Jul 2005

2×7×13 Posts

Quote:
Originally Posted by Madpoo View Post
LOL... touché. :) I assumed you were overclocking.
Hehe, never mind. You did motivate me to ask my host for better cooling, although I never thought they would fix it. BTW, I now believe that these "Hardware Error" logs were not caused by the heat alone. I've seen a lot of hot CPUs but never got such logs. Maybe the optimized TF code just exposed a genuinely broken part of that CPU.

BTW, I have one issue with the new option "HyperthreadTF". It seems to be evaluated only at mprime startup. If my first worktodo line is a TF job, the worker uses HT forever, so later LL lines also use HT.

It would be nice to have this fixed; otherwise mixed job queues do not perform as well as they could.
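
To illustrate the requested behaviour, here is a minimal sketch that picks the thread count per work unit instead of once at startup. It is not mprime code: the `work_type` enum, the `ht_config` struct and its fields, and `threads_for_work` are all hypothetical names, and only the HyperthreadTF option itself comes from this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work types; not mprime's actual identifiers. */
enum work_type { WORK_TF, WORK_LL };

/* Hypothetical per-worker settings (assumed field names). */
struct ht_config {
    int physical_cores;   /* physical cores assigned to the worker */
    int threads_per_core; /* 2 on a hyperthreaded CPU, otherwise 1 */
    int hyperthread_tf;   /* nonzero: use hyperthreads for trial factoring */
    int hyperthread_ll;   /* nonzero: use hyperthreads for LL tests */
};

/* Decide the thread count for a single work unit. Calling this for every
   worktodo entry, rather than once at startup, lets a TF job followed by
   an LL job run with different thread counts. */
static int threads_for_work(const struct ht_config *cfg, enum work_type type) {
    int use_ht;
    assert(cfg != NULL);
    assert(cfg->physical_cores > 0 && cfg->threads_per_core > 0);
    use_ht = (type == WORK_TF) ? cfg->hyperthread_tf : cfg->hyperthread_ll;
    return use_ht ? cfg->physical_cores * cfg->threads_per_core
                  : cfg->physical_cores;
}
```

For a 6-core hyperthreaded CPU with HT enabled for TF only, this returns 12 threads for a TF entry and 6 for the LL entry that follows it.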
Old 2017-03-25, 08:42   #21
Gimarel
 
Apr 2010

2×3×37 Posts
Crash in Advanced/Time

I get a SIGFPE when I try to use Advanced/Time in 29.1 build 14 for Linux 64-bit.

Code:
Your choice: 7

Exponent to time (10000000): 77025397
Number of Iterations (10): 

Accept the answers above? (Y): 
[New Thread 0x7ffff63e4700 (LWP 10256)]
         Main Menu

     1.  Test/Primenet
     2.  Test/Worker threads
     3.  Test/Status
     4.  Test/Continue
     5.  Test/Exit
     6.  Advanced/Test
     7.  Advanced/Time
     8.  Advanced/P-1
     9.  Advanced/ECM
    10.  Advanced/Manual Communication
    11.  Advanced/Unreserve Exponent
    12.  Advanced/Quit Gimps
    13.  Options/CPU
    14.  Options/Preferences
    15.  Options/Torture Test
    16.  Options/Benchmark
    17.  Help/About
    18.  Help/About PrimeNet Server
Your choice: [Main thread Mar 25 09:36] Starting worker.
[New Thread 0x7ffff6be5700 (LWP 10257)]
[New Thread 0x7ffff53e2700 (LWP 10258)]
[New Thread 0x7ffff4be1700 (LWP 10259)]
[Thread 0x7ffff6be5700 (LWP 10257) exited]
[Thread 0x7ffff53e2700 (LWP 10258) exited]
[Thread 0x7ffff4be1700 (LWP 10259) exited]
[Work thread Mar 25 09:36] Worker starting

Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7ffff63e4700 (LWP 10256)]
0x0000000000437e01 in SetPriority ()
(gdb) bt
#0  0x0000000000437e01 in SetPriority ()
#1  0x00000000004391df in primeTime ()
#2  0x000000000043a314 in LauncherDispatch ()
#3  0x000000000043a528 in Launcher ()
#4  0x00000000004655fa in ThreadStarter ()
#5  0x00007ffff78c6064 in start_thread (arg=0x7ffff63e4700) at pthread_create.c:309
#6  0x00007ffff6ee462d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
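
The backtrace does not show why SetPriority faults. On x86-64 Linux, however, SIGFPE is most often raised by an integer division or modulo by zero rather than by floating-point code, so a zero count used as a divisor would be one thing to check. The sketch below only illustrates that kind of guard. `pick_core` is a hypothetical helper and makes no claim about what SetPriority actually computes.

```c
#include <assert.h>

/* Hypothetical helper: map a worker index onto one of num_cores cores by
   wrapping around. Returns -1 when no valid core count is available. */
static int pick_core(int worker_index, int num_cores) {
    assert(worker_index >= 0);
    /* Integer % 0 traps with SIGFPE on x86, so reject it explicitly. */
    if (num_cores <= 0)
        return -1;
    return worker_index % num_cores;
}
```

A caller would treat -1 as "affinity unknown" and skip setting it instead of crashing.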
Old 2017-03-26, 00:52   #22
Ducho
 
Nov 2016
Toronto, ON, CA

516 Posts
29.1 b14 from TF to LL keeps using HT cores

Hi, I think I found a "use HT" issue.

Using 29.1 b14, Windows 10 x64
Xeon 1650v4 DDR4-2400 ECC

Short version:
After finishing a TF low-limits task, LL started using the hyperthreaded cores.
HT is configured for TF but not for LL (as suggested).

Long version:
1 worker thread, 6 cpu cores,
use HT for TF
don't use HT for LL
I had an LL worker running, then I updated to 29.1 b14 today (I was using 28.10) and got curious about TF performance.
Then some TF work was reserved.
I closed Prime95 and reordered worktodo to do some TF first.
When the LL task restarted, it kept using all 12 logical cores.

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
