|
|
#12 |
|
Jul 2005
2×7×13 Posts |
Unfortunately, this official 29.1 release still has the bug where it cannot write the TF state files on exit:
http://mersenneforum.org/showthread.php?t=21988 |
|
|
|
|
|
#13 | |
|
P90 years forever!
Aug 2002
Yeehaw, FL
3·2,621 Posts |
Quote:
Do you still see the crash problem reported in your link??? |
|
|
|
|
|
|
#14 | |
|
May 2014
100001₂ Posts |
Quote:
I'm not sure how you fixed that bug. If it were me, I would propose a wait loop at the end of main() to prevent the main thread from exiting too early. Correct me if my fix is wrong. Code:
--- a/linux/prime.c 2017-03-08 13:32:28.000000000 +0800
+++ b/linux/prime.c 2017-03-21 19:42:38.459523041 +0800
@@ -356,6 +356,8 @@ int main ( int argc, char *argv[])
linuxContinue ("Another mprime is already running!\n", ALL_WORKERS, TRUE);
}
+ while (WORKER_THREADS_STOPPING) Sleep (50);
+
/* Write the worktodo file in case the WELL_BEHAVED_WORK flag caused us */
/* to delay writing the file. */
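
The idea behind the patch above can be tried in isolation. The following is a minimal sketch, not mprime's actual code: `workers_stopping` is a hypothetical stand-in for `WORKER_THREADS_STOPPING`, `sleep_ms` stands in for `Sleep`, and the worker only pretends to write a state file. It shows the main thread polling a flag every 50 ms until the worker reports that its shutdown work is done.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical stand-in for mprime's WORKER_THREADS_STOPPING flag. */
static atomic_int workers_stopping = 0;
/* Set by the worker once its (simulated) state file has been written. */
static atomic_int state_saved = 0;

/* Sleep for the given number of milliseconds using POSIX nanosleep. */
static void sleep_ms(long ms)
{
    struct timespec ts;
    assert(ms >= 0);
    ts.tv_sec = ms / 1000;
    ts.tv_nsec = (ms % 1000) * 1000000L;
    nanosleep(&ts, NULL);
}

/* Simulated worker: needs some time to save its state, then clears the flag. */
static void *worker(void *arg)
{
    (void)arg;
    sleep_ms(100);                       /* pretend to write the TF state file */
    atomic_store(&state_saved, 1);
    atomic_store(&workers_stopping, 0);  /* shutdown work is finished */
    return NULL;
}

/* Start a detached worker that is "stopping", then wait the way the patch does.
   Returns 1 if the state had been saved by the time the wait loop ended. */
static int shutdown_and_wait(void)
{
    pthread_t tid;

    atomic_store(&state_saved, 0);
    atomic_store(&workers_stopping, 1);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return 0;
    pthread_detach(tid);

    /* Same shape as the proposed fix: poll until the workers are done. */
    while (atomic_load(&workers_stopping))
        sleep_ms(50);

    return atomic_load(&state_saved);
}
```

Calling `shutdown_and_wait()` from `main()` before returning keeps the process alive until the worker has finished. Where the threads are joinable, `pthread_join` would do the same job without polling; the polling loop is shown here only because it mirrors the patch.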
|
|
|
|
|
|
|
#15 |
|
Jul 2005
2×7×13 Posts |
|
|
|
|
|
|
#16 |
|
Jul 2005
2·7·13 Posts |
BTW the TF speed is amazing now :) It now makes the CPU hotter than LL does. I think we need a TF stress test.
I have one machine which can't stand the new TF speed: Code:
[Mar21 13:47] CPU1: Core temperature/speed normal
[ +0.000001] CPU6: Package temperature/speed normal
[ +0.000000] CPU5: Core temperature/speed normal
[ +0.000001] CPU2: Package temperature/speed normal
[ +0.000002] CPU5: Package temperature/speed normal
[ +0.000001] CPU1: Package temperature/speed normal
[ +0.000006] mce: [Hardware Error]: CPU 5: Machine Check: 0 Bank 128: 0000000088012a82
[ +0.000001] mce: [Hardware Error]: TSC 1807058ba4c4ac mce:
[ +0.000001] mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1490100628 SOCKET 0 APIC 3 microcode 9e
[ +0.000000] mce: [Hardware Error]: CPU 1: Machine Check: 0 Bank 128: 0000000088012a82
[ +0.000001] mce: [Hardware Error]: TSC 1807058ba4c26e mce:
[ +0.000000] mce: [Hardware Error]: PROCESSOR 0:506e3 TIME 1490100628 SOCKET 0 APIC 2 microcode 9e
[ +0.000015] CPU0: Package temperature/speed normal
[ +0.000000] CPU7: Package temperature/speed normal
[ +0.000001] CPU4: Package temperature/speed normal
[ +0.000001] CPU3: Package temperature/speed normal |
|
|
|
|
|
#17 | |
|
Serpentine Vermin Jar
Jul 2014
5²·7·19 Posts |
Quote:
The wrong answer is to try and slow down a program to keep it from overheating the CPU. LOL |
|
|
|
|
|
|
#18 | |
|
Jul 2005
2×7×13 Posts |
Quote:
This is a rented machine in a data center. It's not overclocked and I can't do anything about cooling. I could only stop running mprime, switch to a more expensive hoster, or just run it a bit slower. If all their customers ran mprime, they would need to improve cooling and increase the prices a lot.

Nevertheless, I called support and they gave me a new machine immediately. This one works now; CPU temperature is "only" about 90 °C.

BTW in our office I also run some servers optimized for low noise and cheap cooling, in a room which is constantly over 30 °C! These machines also can't run mprime on all cores. But they are absolutely stable with only 2 threads per machine and have finished hundreds of double checks successfully over the years. I don't see what's wrong with "putting less load" on the machines to avoid expensive air-conditioning systems (purchase, installation, energy bill).

Moreover, the reason I run TF instead of LL at all was that TF did not make the CPUs as hot as LL ;)

Last fiddled with by rudi_m on 2017-03-21 at 22:13 |
|
|
|
|
|
|
#19 |
|
Serpentine Vermin Jar
Jul 2014
5²×7×19 Posts |
|
|
|
|
|
|
#20 |
|
Jul 2005
2×7×13 Posts |
Hehe, never mind. You still motivated me to ask my hoster for better cooling, although I never thought they would fix it. BTW I now believe that these "Hardware Error" logs were not caused by the heat alone. I've seen a lot of hot CPUs but never got such logs. Maybe the optimized TF code just uncovered a genuinely broken part of that CPU.
BTW I have one issue with the new option "HyperthreadTF". It seems to be evaluated only at mprime startup. If my first worktodo line is a TF job, then this worker uses HT forever, and later LL lines will also use HT. It would be nice to have this fixed; otherwise mixed job queues do not perform as well as they could. |
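
To make the request concrete, here is a minimal sketch of what a per-job decision could look like. It is not taken from the mprime source: the enum, the struct, its `hyperthread_ll` field, and `threads_for_job` are all hypothetical names, and it assumes two logical CPUs per physical core. The point is only that the hyperthreading choice is looked up each time a job starts rather than once at startup.

```c
#include <assert.h>

/* Hypothetical work types; these are not mprime's real identifiers. */
enum work_type { WORK_TF, WORK_LL };

/* Hypothetical settings mirroring an option such as HyperthreadTF. */
struct ht_options {
    int hyperthread_tf;  /* nonzero: use hyperthreads for trial factoring */
    int hyperthread_ll;  /* nonzero: use hyperthreads for LL tests */
};

/* Number of threads for ONE job, decided from that job's own work type.
   Assumes 2 logical CPUs per physical core when hyperthreading is used. */
static int threads_for_job(enum work_type type, int physical_cores,
                           const struct ht_options *opt)
{
    int use_ht;

    assert(physical_cores > 0);
    use_ht = (type == WORK_TF) ? opt->hyperthread_tf : opt->hyperthread_ll;
    return use_ht ? physical_cores * 2 : physical_cores;
}
```

With `hyperthread_tf = 1` and `hyperthread_ll = 0` on a 6-core machine, a TF job would get 12 threads and the LL job that follows it would drop back to 6, which is the behavior a mixed queue needs.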
|
|
|
|
|
#21 |
|
Apr 2010
2×3×37 Posts |
I get a SIGFPE when I try to use Advanced/Time in 29.1 build 14 for Linux 64-bit.
Code:
Your choice: 7
Exponent to time (10000000): 77025397
Number of Iterations (10):
Accept the answers above? (Y):
[New Thread 0x7ffff63e4700 (LWP 10256)]
Main Menu
1. Test/Primenet
2. Test/Worker threads
3. Test/Status
4. Test/Continue
5. Test/Exit
6. Advanced/Test
7. Advanced/Time
8. Advanced/P-1
9. Advanced/ECM
10. Advanced/Manual Communication
11. Advanced/Unreserve Exponent
12. Advanced/Quit Gimps
13. Options/CPU
14. Options/Preferences
15. Options/Torture Test
16. Options/Benchmark
17. Help/About
18. Help/About PrimeNet Server
Your choice: [Main thread Mar 25 09:36] Starting worker.
[New Thread 0x7ffff6be5700 (LWP 10257)]
[New Thread 0x7ffff53e2700 (LWP 10258)]
[New Thread 0x7ffff4be1700 (LWP 10259)]
[Thread 0x7ffff6be5700 (LWP 10257) exited]
[Thread 0x7ffff53e2700 (LWP 10258) exited]
[Thread 0x7ffff4be1700 (LWP 10259) exited]
[Work thread Mar 25 09:36] Worker starting
Program received signal SIGFPE, Arithmetic exception.
[Switching to Thread 0x7ffff63e4700 (LWP 10256)]
0x0000000000437e01 in SetPriority ()
(gdb) bt
#0 0x0000000000437e01 in SetPriority ()
#1 0x00000000004391df in primeTime ()
#2 0x000000000043a314 in LauncherDispatch ()
#3 0x000000000043a528 in Launcher ()
#4 0x00000000004655fa in ThreadStarter ()
#5 0x00007ffff78c6064 in start_thread (arg=0x7ffff63e4700) at pthread_create.c:309
#6 0x00007ffff6ee462d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
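
The backtrace does not show why SetPriority() faults. On x86-64 Linux, though, a SIGFPE reported as "Arithmetic exception" usually comes from an integer division or modulo by zero (or from INT_MIN / -1), not from floating-point math. Assuming something of that kind is involved here, a guarded division could be sketched as follows; `checked_div` is a hypothetical helper and not part of mprime.

```c
#include <limits.h>

/* Hypothetical helper: integer division that returns `fallback` instead of
   trapping in the two cases where x86 integer division raises SIGFPE. */
static int checked_div(int num, int den, int fallback)
{
    if (den == 0)
        return fallback;               /* division by zero */
    if (num == INT_MIN && den == -1)
        return fallback;               /* quotient does not fit in an int */
    return num / den;
}
```

A divisor derived from a thread or core count that is still zero when Advanced/Time starts would be one way to reach such a fault, but that is only a guess from the symptom.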
|
|
|
|
|
|
#22 |
|
Nov 2016
Toronto, ON, CA
516 Posts |
Hi, I think that I found a "use HT" issue.
Using 29.1 b14, Windows 10 x64, Xeon 1650v4, DDR4-2400 ECC.
Short version: After finishing a TF low limits task, LL started using the hyperthreading cores. HT is configured for TF but not for LL (as suggested).
Long version: 1 worker thread, 6 CPU cores, use HT for TF, don't use HT for LL. I had an LL worker running, then I updated to 29.1 b14 today (was using 28.10) and got curious about TF performance. Then some TF work was reserved. I closed Prime95 and reordered worktodo to do some TF first. When the LL task restarted, it kept using 12 cores. |
|
|
|