mersenneforum.org Prime95 version 28.9 / 28.10

2017-01-14, 02:22   #177
Uncwilly
6809 > 6502

Aug 2003
101×103 Posts

2509₁₆ Posts

Quote:
 Originally Posted by Prime95 Get your DLLs here: https://www.open-mpi.org/software/hwloc/v1.11/
It was in the zip file. I just didn't notice it during the unzip. But I found it and copied it to the second machine.

2017-01-23, 18:29   #178
tuxbg

Jan 2017

2 Posts

Quote:
 Originally Posted by Prime95
Prime95 version 28.9 build 2 is available. From whatsnew.txt:
Code:
1) Since GPUs are so much better at trial factoring than CPUs, benchmarking no longer times mprime's trial factoring by default. Two new benchmarking options are available: OnlyBenchThroughput and OnlyBenchMaxCPUs. See undoc.txt for details.
2) Slightly reduced the memory bandwidth requirements for several large FFTs. May lead to about a 1% speed increase for users testing 100 million digit numbers.
3) If running more than one worker, prime95 looks for any sin/cos data that it can share among the workers. Depending on the FFT sizes you are running, this could lead to a very slight reduction in needed memory bandwidth.
4) Method for choosing the best FFT implementation changed. In previous versions, the FFT implementation that resulted in the fastest single-worker timing was used. In this version, the FFT implementation with the best throughput is selected. For FMA3 FFTs I used a 4-core Skylake to measure best throughput. For AVX FFTs I used a 4-core Sandy Bridge. Not many FFTs were affected, but you may see a few percent variation in throughput with this version.
5) Improved AVX2 trial factoring in the 64-bit executable. Trial factoring should still be done on a GPU. A GPU is on the order of 100 times more efficient at trial factoring than a CPU!!!
6) Trial factoring now defines one "iteration" as processing 128KB of sieve, or 1M possible factors. In previous versions an iteration was 16KB of sieve in 32-bit executables and 48KB in 64-bit executables. The trial factoring benchmark still times processing 16KB of sieve.
7) Trial factoring in 64-bit executables is now multi-threaded.
8) On initial install, the default number of workers is set to the number of cores / 4, with multithreading turned on.
9) The worker windows menu choice now enforces a minimum number of multi-threaded cores for some work types to ensure timely completion of assignments. Also, the worker windows menu choice no longer allows assigning work to hyperthreads (they are rarely beneficial in mprime). This behavior can be overridden with the ConfigureHyperthreads undoc.txt feature.

Download links:
Windows 64-bit: ftp://mersenne.org/gimps/p95v2810.win64.zip
Linux 64-bit: ftp://mersenne.org/gimps/p95v2810.linux64.tar.gz
Mac OS X: ftp://mersenne.org/gimps/p95v289.MacOSX.zip
FreeBSD 10 64-bit: ftp://mersenne.org/gimps/p95v289.FreeBSD10-64.tar.gz (not ready yet)
Windows 32-bit: ftp://mersenne.org/gimps/p95v2810.win32.zip
Linux 32-bit: ftp://mersenne.org/gimps/p95v2810.linux32.tar.gz
Source: ftp://mersenne.org/gimps/p95v2810.source.zip (not ready yet)
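The "128KB of sieve, or 1M possible factors" correspondence in item 6 works out if each sieve bit represents one candidate factor (that bit-per-candidate reading is my inference from the numbers, not something whatsnew.txt states); a quick sanity check:

```python
# One 28.10 trial-factoring "iteration" processes 128KB of sieve.
SIEVE_BYTES = 128 * 1024

# Assuming one candidate factor per sieve bit:
candidates_per_iteration = SIEVE_BYTES * 8
print(candidates_per_iteration)   # 1048576 = 1M possible factors

# The old iteration sizes from item 6, for comparison:
print(16 * 1024 * 8)              # 131072 candidates (32-bit builds)
print(48 * 1024 * 8)              # 393216 candidates (64-bit builds)
```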
Hello. I have a strange issue with Prime95 28.10. If I run a custom 800K-800K test with 90% of my RAM using FMA3, I can run it endlessly. But if I try the exact same settings with AVX instead of FMA3, Prime95 stops the workers in about 3 minutes.

2017-01-23, 21:36   #179
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

7403₁₀ Posts

Quote:
 Originally Posted by tuxbg
Hello. I have a strange issue with Prime95 28.10. If I run a custom 800K-800K test with 90% of my RAM using FMA3, I can run it endlessly. But if I try the exact same settings with AVX instead of FMA3, Prime95 stops the workers in about 3 minutes.
Is this a Skylake? Has the BIOS been updated?

2017-01-24, 05:56   #180
tuxbg

Jan 2017

2 Posts

Quote:
 Originally Posted by Prime95 Is this a Skylake? Has the BIOS been updated?
Hello. My motherboard is an Asus X99-A 2 with the latest BIOS. I'm using an i7-5960X. I see this behaviour both at stock and with the CPU and RAM overclocked.

2017-01-30, 08:59   #181
rudi_m

Jul 2005

10110110₂ Posts

Quote:
 Originally Posted by Prime95 For TF users only: Link to Linux 29.1: https://www.dropbox.com/s/a53l99b68u...ime64.tgz?dl=0 TF code is still being worked on, so this version will be replaced soon.
BTW, thanks a lot for 29.1! TF speed is more than twice as fast on my Skylakes. And the TF threads still have no negative impact on LL threads running in parallel :)

2017-03-07, 04:28   #182
vsuite

Jan 2010

2·3·19 Posts

Quote:
 Originally Posted by vsuite Are there any settings to make an i7 quad core with hyperthreading seem like an 8 core machine so I can benchmark with 5 or 6 LL threads please?
I asked this question because I wonder whether 4 threads on a 4-core HT i7 gives the maximum throughput.

Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
CPU speed: 3990.82 MHz, 4 hyperthreaded cores
Timings for 1024K FFT length (4 cpus, 4 workers): 6.75, 6.77, 6.88, 6.78 ms. Throughput: 588.75 iter/sec.
Timings for 1024K FFT length (4 cpus hyperthreaded, 4 workers): 7.01, 6.99, 7.03, 7.01 ms. Throughput: 570.72 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers): 8.54, 8.51, 8.58, 8.54 ms. Throughput: 468.27 iter/sec.
Timings for 1280K FFT length (4 cpus hyperthreaded, 4 workers): 8.78, 8.76, 8.91, 8.76 ms. Throughput: 454.29 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers): 10.16, 10.20, 10.27, 10.23 ms. Throughput: 391.61 iter/sec.
Timings for 1536K FFT length (4 cpus hyperthreaded, 4 workers): 10.53, 10.52, 10.63, 10.50 ms. Throughput: 379.35 iter/sec.
Timings for 1792K FFT length (4 cpus, 4 workers): 12.42, 12.43, 12.86, 12.34 ms. Throughput: 319.79 iter/sec.
Timings for 1792K FFT length (4 cpus hyperthreaded, 4 workers): 12.73, 13.51, 12.64, 14.93 ms. Throughput: 298.67 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 13.71, 13.86, 13.92, 13.72 ms. Throughput: 289.80 iter/sec.
Timings for 2048K FFT length (4 cpus hyperthreaded, 4 workers): 14.34, 14.27, 14.39, 14.22 ms. Throughput: 279.59 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 17.96, 17.96, 18.06, 17.86 ms. Throughput: 222.71 iter/sec.
Timings for 2560K FFT length (4 cpus hyperthreaded, 4 workers): 18.63, 18.62, 18.31, 18.17 ms. Throughput: 217.05 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 21.53, 21.79, 21.47, 21.45 ms. Throughput: 185.53 iter/sec.
Timings for 3072K FFT length (4 cpus hyperthreaded, 4 workers): 22.09, 22.24, 22.09, 22.43 ms. Throughput: 180.10 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 24.98, 25.53, 25.20, 25.22 ms. Throughput: 158.54 iter/sec.
Timings for 3584K FFT length (4 cpus hyperthreaded, 4 workers): 26.14, 25.63, 25.88, 25.93 ms. Throughput: 154.49 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 28.66, 28.68, 28.96, 28.73 ms. Throughput: 139.11 iter/sec.
Timings for 4096K FFT length (4 cpus hyperthreaded, 4 workers): 29.71, 29.33, 29.84, 29.37 ms. Throughput: 135.31 iter/sec.
Timings for 5120K FFT length (4 cpus, 4 workers): 35.66, 35.97, 36.01, 35.79 ms. Throughput: 111.55 iter/sec.
Timings for 5120K FFT length (4 cpus hyperthreaded, 4 workers): 38.51, 38.96, 36.54, 38.47 ms. Throughput: 104.99 iter/sec.
Timings for 6144K FFT length (4 cpus, 4 workers): 42.15, 42.54, 42.02, 41.96 ms. Throughput: 94.86 iter/sec.
Timings for 6144K FFT length (4 cpus hyperthreaded, 4 workers): 43.98, 43.97, 44.13, 43.62 ms. Throughput: 91.06 iter/sec.
Timings for 7168K FFT length (4 cpus, 4 workers): 49.11, 49.92, 49.27, 49.16 ms. Throughput: 81.03 iter/sec.
Timings for 7168K FFT length (4 cpus hyperthreaded, 4 workers): 52.03, 51.71, 51.90, 51.76 ms. Throughput: 77.15 iter/sec.
Timings for 8192K FFT length (4 cpus, 4 workers): 56.63, 56.62, 56.66, 56.55 ms. Throughput: 70.65 iter/sec.
Timings for 8192K FFT length (4 cpus hyperthreaded, 4 workers): 58.61, 57.98, 59.05, 58.59 ms. Throughput: 68.31 iter/sec.
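The throughput figures in these benchmark lines are just the per-worker iteration rates summed; a small check against the first 1024K line (timings copied from above):

```python
def throughput(timings_ms):
    """Total iterations/sec across workers, each completing one
    iteration every t milliseconds."""
    return sum(1000.0 / t for t in timings_ms)

# 4 cpus, 4 workers, 1024K FFT:
print(round(throughput([6.75, 6.77, 6.88, 6.78]), 2))  # ~588.70 (table: 588.75)
```

The tiny discrepancy comes from the timings themselves being rounded to two decimals.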
Throughput is similar to the 4-cpus-hyperthreaded case if Prime95 is made to think it is an 8-core CPU.

Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
CPU speed: 3979.14 MHz, 8 cores
Timings for 1024K FFT length (8 cpus, 8 workers): 14.20, 13.91, 14.07, 14.08, 14.35, 13.97, 14.38, 13.89 ms. Throughput: 567.23 iter/sec.
Timings for 1280K FFT length (8 cpus, 8 workers): 18.64, 17.00, 17.49, 17.46, 17.68, 17.52, 17.84, 17.46 ms. Throughput: 453.90 iter/sec.
Timings for 1536K FFT length (8 cpus, 8 workers): 21.18, 21.10, 21.17, 21.04, 21.33, 20.82, 21.49, 21.08 ms. Throughput: 378.25 iter/sec.
Timings for 1792K FFT length (8 cpus, 8 workers): 27.10, 24.77, 25.65, 25.48, 25.63, 25.29, 26.03, 25.21 ms. Throughput: 312.13 iter/sec.
Timings for 2048K FFT length (8 cpus, 8 workers): 30.11, 27.77, 27.79, 27.79, 28.52, 28.33, 29.09, 28.77 ms. Throughput: 280.67 iter/sec.
Timings for 2560K FFT length (8 cpus, 8 workers): 39.15, 36.08, 36.51, 36.54, 37.20, 36.74, 37.56, 37.24 ms. Throughput: 215.59 iter/sec.
[Mon Dec 05 23:57:12 2016]
Timings for 3072K FFT length (8 cpus, 8 workers): 44.09, 44.07, 43.48, 43.49, 44.80, 43.98, 45.17, 44.55 ms. Throughput: 181.01 iter/sec.
Timings for 3584K FFT length (8 cpus, 8 workers): 54.59, 50.56, 50.89, 50.90, 51.72, 50.77, 51.83, 51.47 ms. Throughput: 155.15 iter/sec.
Timings for 4096K FFT length (8 cpus, 8 workers): 58.57, 58.75, 58.54, 58.74, 60.00, 58.68, 59.38, 59.15 ms. Throughput: 135.65 iter/sec.
Timings for 5120K FFT length (8 cpus, 8 workers): 78.79, 73.09, 73.40, 73.41, 74.53, 73.62, 74.60, 74.00 ms. Throughput: 107.54 iter/sec.
Timings for 6144K FFT length (8 cpus, 8 workers): 90.20, 83.53, 88.52, 88.50, 86.48, 85.48, 86.64, 85.88 ms. Throughput: 92.10 iter/sec.
Timings for 7168K FFT length (8 cpus, 8 workers): 118.17, 93.58, 104.24, 104.27, 100.25, 99.53, 100.65, 99.77 ms. Throughput: 78.31 iter/sec.
Timings for 8192K FFT length (8 cpus, 8 workers): 119.92, 112.98, 119.11, 119.10, 114.97, 113.93, 115.30, 114.60 ms. Throughput: 68.86 iter/sec.
Throughput improves slightly when NUMCPUs is 7, but then it drops again at 6, 5, and even 4. All things being equal, throughput should have improved.

Prime95 is not optimized to run 6, 5, or 4 threads on a 4-core hyperthreaded 4790K when the chip is treated as a 6-, 5-, or 4-core chip.

There is only one way for 7 threads to be used: AB CD EF Go (each letter pair is one physical core's two hyperthreads; 'o' marks an idle hyperthread). Releasing a different thread of the 8 changes nothing: AB Co DE FG is equivalent to oA BC DE FG.
There are 2 ways for 6 threads to be used: AB CD EF oo and AE BF Co Do. The latter is better: threads E and F are free to run on any core, while each of A, B, C, and D is pinned to a specific core.
There are 2 ways for 5 threads to be used: AB CD Eo oo and AE Bo Co Do. The latter is better: thread E is free to run on any core, while each of A, B, C, and D is pinned to a specific core.
There are 3 ways for 4 threads to be used: AB CD oo oo, AB Co Do oo, and Ao Bo Co Do. The last is best: each of A, B, C, and D gets a specific core to itself.
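The case analysis above amounts to counting the ways to split k threads across 4 physical cores with at most 2 hyperthreads each, treating the cores as interchangeable; a short sketch that reproduces the counts:

```python
from itertools import combinations_with_replacement

def occupancy_patterns(k, cores=4, smt=2):
    """Distinct per-core thread counts summing to k, with at most
    `smt` threads on any core. Cores are interchangeable, so each
    sorted tuple is one pattern."""
    return [p for p in combinations_with_replacement(range(smt + 1), cores)
            if sum(p) == k]

for k in (7, 6, 5, 4):
    print(k, occupancy_patterns(k))
# 7 -> 1 pattern, 6 -> 2, 5 -> 2, 4 -> 3, matching the post above
```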

I guess Prime95 could be optimized to choose Ao Bo Co Do whenever 4 threads are run simultaneously.

What I recommend is that processing be optimized to run 5 threads as above, or even 6.

I postulate that there should be a slight total-throughput increase at 5 threads [possibly dependent on the memory system], and then a drop-off up to 8 threads. I don't think Prime95 allows benchmarking this specific configuration.

2017-05-03, 06:06   #183
Harrywill
"Harry Willam"

May 2017
USA

2²×5 Posts

Version 29.1 may include unspecified updates

