mersenneforum.org  

2017-01-14, 02:22   #177
Uncwilly
6809 > 6502
 
"""""""""""""""""""
Aug 2003
101×103 Posts

250916 Posts
Quote:
Originally Posted by Prime95
It was in the zip file. I just didn't notice it during the unzip. But I found it and copied it to the second machine.
2017-01-23, 18:29   #178
tuxbg
 
Jan 2017

2 Posts
Quote:
Originally Posted by Prime95
Prime95 version 28.9 build 2 is available. From whatsnew.txt:

Code:
1)  Since GPUs are so much better at trial factoring than CPUs, benchmarking no longer times
    mprime's trial factoring by default.  Two new benchmarking options are available:
    OnlyBenchThroughput and OnlyBenchMaxCPUs.  See undoc.txt for details.
2)  Slightly reduced the memory bandwidth requirements for several large FFTs.  May lead to
    about a 1% speed increase for users testing 100 million digit numbers.
3)  If running more than one worker, prime95 looks for any sin/cos data that it can share among
    the workers.  Depending on the FFT sizes you are running, this could lead to a very slight
    reduction in needed memory bandwidth.
4)  Method for choosing the best FFT implementation changed.  In previous versions, the FFT
    implementation that resulted in the fastest single worker timing was used.  In this version
    the FFT implementation that had the best throughput was selected.  For FMA3 FFTs I used a
    4-core Skylake to measure best throughput.  For AVX FFTs I used a 4-core Sandy Bridge
    to measure best throughput.  Not many FFTs were affected, but you may see a few percent
    variation in throughput with this version.
5)  Improved AVX2 trial factoring in 64-bit executable.  Trial factoring should still be done
    on a GPU.  A GPU is on the order of 100 times more efficient at trial factoring than a CPU!!!
6)  Trial factoring now defines one "iteration" as processing 128KB of sieve, or 1M possible
    factors.  In previous versions an iteration was defined as 16KB of sieve in 32-bit executables
    and 48KB in 64-bit executables.  The trial factoring benchmark still times processing 16KB of sieve.
7)  Trial factoring in 64-bit executables is now multi-threaded.
8)  On initial install, the default settings for number of workers will be set to
    the number of cores / 4 with multithreading turned on.
9)  The worker windows menu choice now enforces a minimum number of multi-threaded cores for some
    work types to ensure timely completion of assignments.  Also, the worker windows menu choice
    no longer allows assigning work to hyperthreads (they are rarely beneficial in mprime).
    This behavior can be overridden with the ConfigureHyperthreads undoc.txt feature.
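Item 6 implies a fixed ratio between sieve size and candidate factors: 128KB is 128 × 1024 × 8 = 1,048,576 bits, which matches "1M possible factors" if each sieve bit stands for one candidate. The sketch below assumes exactly that (an inference from the numbers in item 6, not something taken from the prime95 source) and converts the old and new "iteration" definitions into candidate counts, so per-iteration timings from different versions can be compared.

```python
# Assumption (inferred from item 6, not from the prime95 source):
# each bit of the trial-factoring sieve represents one candidate factor.
BITS_PER_BYTE = 8

def candidates(sieve_kb: int) -> int:
    """Number of candidate factors covered by a sieve of sieve_kb kilobytes."""
    return sieve_kb * 1024 * BITS_PER_BYTE

new_iter = candidates(128)   # new definition: 128KB of sieve per iteration
old_32bit = candidates(16)   # old definition in 32-bit executables
old_64bit = candidates(48)   # old definition in 64-bit executables

print(new_iter)                         # 1048576, the "1M possible factors"
print(new_iter / old_32bit)             # 8.0
print(round(new_iter / old_64bit, 2))   # 2.67
```

Under that assumption, one new iteration does the work of 8 old 32-bit iterations or about 2.67 old 64-bit ones, and the 16KB benchmark figure covers 1/8 of a new iteration.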

Download links:
Windows 64-bit: ftp://mersenne.org/gimps/p95v2810.win64.zip
Linux 64-bit: ftp://mersenne.org/gimps/p95v2810.linux64.tar.gz
Mac OS X: ftp://mersenne.org/gimps/p95v289.MacOSX.zip
FreeBSD 10 64-bit: ftp://mersenne.org/gimps/p95v289.FreeBSD10-64.tar.gz (not ready yet)
Windows 32-bit: ftp://mersenne.org/gimps/p95v2810.win32.zip
Linux 32-bit: ftp://mersenne.org/gimps/p95v2810.linux32.tar.gz
Source: ftp://mersenne.org/gimps/p95v2810.source.zip (not ready yet)
Hello. I have a strange issue with Prime95 28.10. If I run a custom 800K-800K torture test with 90% of my RAM using FMA3, I can run it indefinitely. But if I run the exact same settings with AVX instead of FMA3, Prime95 stops the workers after about 3 minutes.
2017-01-23, 21:36   #179
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

7403₁₀ Posts
Quote:
Originally Posted by tuxbg
Hello. I have a strange issue with Prime95 28.10. If I run a custom 800K-800K torture test with 90% of my RAM using FMA3, I can run it indefinitely. But if I run the exact same settings with AVX instead of FMA3, Prime95 stops the workers after about 3 minutes.
Is this a Skylake? Has the BIOS been updated?
2017-01-24, 05:56   #180
tuxbg
 
Jan 2017

2 Posts
Quote:
Originally Posted by Prime95
Is this a Skylake? Has the BIOS been updated?
Hello. My motherboard is an Asus X99-A 2 with the latest BIOS. I'm using an i7-5960X. I'm seeing this behaviour with both stock and overclocked CPU and RAM.
2017-01-30, 08:59   #181
rudi_m
 
Jul 2005

10110110₂ Posts
Quote:
Originally Posted by Prime95
For TF users only:

Link to Linux 29.1: https://www.dropbox.com/s/a53l99b68u...ime64.tgz?dl=0

TF code is still being worked on, so this version will be replaced soon.
BTW, thanks a lot for 29.1! TF is more than twice as fast on my Skylakes, and the TF threads still have no negative impact on LL threads running in parallel :)
2017-03-07, 04:28   #182
vsuite
 
Jan 2010

2·3·19 Posts
Quote:
Originally Posted by vsuite
Are there any settings to make an i7 quad-core with hyperthreading appear as an 8-core machine, so I can benchmark with 5 or 6 LL threads, please?
I asked this question because I wonder whether running 4 threads on a 4-core HT i7 processor gives the maximum throughput.

Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
CPU speed: 3990.82 MHz, 4 hyperthreaded cores
Timings for 1024K FFT length (4 cpus, 4 workers): 6.75, 6.77, 6.88, 6.78 ms. Throughput: 588.75 iter/sec.
Timings for 1024K FFT length (4 cpus hyperthreaded, 4 workers): 7.01, 6.99, 7.03, 7.01 ms. Throughput: 570.72 iter/sec.
Timings for 1280K FFT length (4 cpus, 4 workers): 8.54, 8.51, 8.58, 8.54 ms. Throughput: 468.27 iter/sec.
Timings for 1280K FFT length (4 cpus hyperthreaded, 4 workers): 8.78, 8.76, 8.91, 8.76 ms. Throughput: 454.29 iter/sec.
Timings for 1536K FFT length (4 cpus, 4 workers): 10.16, 10.20, 10.27, 10.23 ms. Throughput: 391.61 iter/sec.
Timings for 1536K FFT length (4 cpus hyperthreaded, 4 workers): 10.53, 10.52, 10.63, 10.50 ms. Throughput: 379.35 iter/sec.
Timings for 1792K FFT length (4 cpus, 4 workers): 12.42, 12.43, 12.86, 12.34 ms. Throughput: 319.79 iter/sec.
Timings for 1792K FFT length (4 cpus hyperthreaded, 4 workers): 12.73, 13.51, 12.64, 14.93 ms. Throughput: 298.67 iter/sec.
Timings for 2048K FFT length (4 cpus, 4 workers): 13.71, 13.86, 13.92, 13.72 ms. Throughput: 289.80 iter/sec.
Timings for 2048K FFT length (4 cpus hyperthreaded, 4 workers): 14.34, 14.27, 14.39, 14.22 ms. Throughput: 279.59 iter/sec.
Timings for 2560K FFT length (4 cpus, 4 workers): 17.96, 17.96, 18.06, 17.86 ms. Throughput: 222.71 iter/sec.
Timings for 2560K FFT length (4 cpus hyperthreaded, 4 workers): 18.63, 18.62, 18.31, 18.17 ms. Throughput: 217.05 iter/sec.
Timings for 3072K FFT length (4 cpus, 4 workers): 21.53, 21.79, 21.47, 21.45 ms. Throughput: 185.53 iter/sec.
Timings for 3072K FFT length (4 cpus hyperthreaded, 4 workers): 22.09, 22.24, 22.09, 22.43 ms. Throughput: 180.10 iter/sec.
Timings for 3584K FFT length (4 cpus, 4 workers): 24.98, 25.53, 25.20, 25.22 ms. Throughput: 158.54 iter/sec.
Timings for 3584K FFT length (4 cpus hyperthreaded, 4 workers): 26.14, 25.63, 25.88, 25.93 ms. Throughput: 154.49 iter/sec.
Timings for 4096K FFT length (4 cpus, 4 workers): 28.66, 28.68, 28.96, 28.73 ms. Throughput: 139.11 iter/sec.
Timings for 4096K FFT length (4 cpus hyperthreaded, 4 workers): 29.71, 29.33, 29.84, 29.37 ms. Throughput: 135.31 iter/sec.
Timings for 5120K FFT length (4 cpus, 4 workers): 35.66, 35.97, 36.01, 35.79 ms. Throughput: 111.55 iter/sec.
Timings for 5120K FFT length (4 cpus hyperthreaded, 4 workers): 38.51, 38.96, 36.54, 38.47 ms. Throughput: 104.99 iter/sec.
Timings for 6144K FFT length (4 cpus, 4 workers): 42.15, 42.54, 42.02, 41.96 ms. Throughput: 94.86 iter/sec.
Timings for 6144K FFT length (4 cpus hyperthreaded, 4 workers): 43.98, 43.97, 44.13, 43.62 ms. Throughput: 91.06 iter/sec.
Timings for 7168K FFT length (4 cpus, 4 workers): 49.11, 49.92, 49.27, 49.16 ms. Throughput: 81.03 iter/sec.
Timings for 7168K FFT length (4 cpus hyperthreaded, 4 workers): 52.03, 51.71, 51.90, 51.76 ms. Throughput: 77.15 iter/sec.
Timings for 8192K FFT length (4 cpus, 4 workers): 56.63, 56.62, 56.66, 56.55 ms. Throughput: 70.65 iter/sec.
Timings for 8192K FFT length (4 cpus hyperthreaded, 4 workers): 58.61, 57.98, 59.05, 58.59 ms. Throughput: 68.31 iter/sec.
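The "Throughput" figures above appear to be the sum of each worker's iteration rate, 1000 / (ms per iteration). That is an inference from the numbers in this post rather than documented behaviour; the sketch below reproduces the two 1024K results to within the rounding of the printed timings.

```python
def throughput(timings_ms: list[float]) -> float:
    """Combined iterations/sec of all workers, assuming each worker
    contributes 1000 / (its ms per iteration)."""
    return sum(1000.0 / t for t in timings_ms)

# 1024K FFT, 4 cpus, 4 workers (reported: 588.75 iter/sec)
four_cores = [6.75, 6.77, 6.88, 6.78]
# 1024K FFT, 4 cpus hyperthreaded, 4 workers (reported: 570.72 iter/sec)
four_ht = [7.01, 6.99, 7.03, 7.01]

print(f"{throughput(four_cores):.2f}")  # 588.70
print(f"{throughput(four_ht):.2f}")     # 570.62
```

The small gaps to the reported values come from the per-worker timings being printed to only two decimals.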
Throughput is similar to the "4 cpus hyperthreaded" case if Prime95 is made to think it is an 8-core CPU:

Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
CPU speed: 3979.14 MHz, 8 cores
Timings for 1024K FFT length (8 cpus, 8 workers): 14.20, 13.91, 14.07, 14.08, 14.35, 13.97, 14.38, 13.89 ms. Throughput: 567.23 iter/sec.
Timings for 1280K FFT length (8 cpus, 8 workers): 18.64, 17.00, 17.49, 17.46, 17.68, 17.52, 17.84, 17.46 ms. Throughput: 453.90 iter/sec.
Timings for 1536K FFT length (8 cpus, 8 workers): 21.18, 21.10, 21.17, 21.04, 21.33, 20.82, 21.49, 21.08 ms. Throughput: 378.25 iter/sec.
Timings for 1792K FFT length (8 cpus, 8 workers): 27.10, 24.77, 25.65, 25.48, 25.63, 25.29, 26.03, 25.21 ms. Throughput: 312.13 iter/sec.
Timings for 2048K FFT length (8 cpus, 8 workers): 30.11, 27.77, 27.79, 27.79, 28.52, 28.33, 29.09, 28.77 ms. Throughput: 280.67 iter/sec.
Timings for 2560K FFT length (8 cpus, 8 workers): 39.15, 36.08, 36.51, 36.54, 37.20, 36.74, 37.56, 37.24 ms. Throughput: 215.59 iter/sec.
[Mon Dec 05 23:57:12 2016]
Timings for 3072K FFT length (8 cpus, 8 workers): 44.09, 44.07, 43.48, 43.49, 44.80, 43.98, 45.17, 44.55 ms. Throughput: 181.01 iter/sec.
Timings for 3584K FFT length (8 cpus, 8 workers): 54.59, 50.56, 50.89, 50.90, 51.72, 50.77, 51.83, 51.47 ms. Throughput: 155.15 iter/sec.
Timings for 4096K FFT length (8 cpus, 8 workers): 58.57, 58.75, 58.54, 58.74, 60.00, 58.68, 59.38, 59.15 ms. Throughput: 135.65 iter/sec.
Timings for 5120K FFT length (8 cpus, 8 workers): 78.79, 73.09, 73.40, 73.41, 74.53, 73.62, 74.60, 74.00 ms. Throughput: 107.54 iter/sec.
Timings for 6144K FFT length (8 cpus, 8 workers): 90.20, 83.53, 88.52, 88.50, 86.48, 85.48, 86.64, 85.88 ms. Throughput: 92.10 iter/sec.
Timings for 7168K FFT length (8 cpus, 8 workers): 118.17, 93.58, 104.24, 104.27, 100.25, 99.53, 100.65, 99.77 ms. Throughput: 78.31 iter/sec.
Timings for 8192K FFT length (8 cpus, 8 workers): 119.92, 112.98, 119.11, 119.10, 114.97, 113.93, 115.30, 114.60 ms. Throughput: 68.86 iter/sec.
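To put "similar" into numbers, the sketch below takes the reported throughputs for a few FFT sizes from the three runs in this post and computes how far the hyperthreaded 4-worker run and the 8-worker run fall below the plain 4-worker run. Only values printed above are used; the choice of FFT sizes is arbitrary.

```python
# Reported throughputs (iter/sec) from the benchmarks above, as
# (4 cpus, 4 workers), (4 cpus hyperthreaded, 4 workers), (8 cpus, 8 workers)
reported = {
    "1024K": (588.75, 570.72, 567.23),
    "2048K": (289.80, 279.59, 280.67),
    "4096K": (139.11, 135.31, 135.65),
    "8192K": (70.65, 68.31, 68.86),
}

def loss_pct(base: float, other: float) -> float:
    """Percentage by which `other` falls short of `base`."""
    return 100.0 * (base - other) / base

for fft, (plain, ht, eight) in reported.items():
    print(f"{fft}: HT -{loss_pct(plain, ht):.1f}%, "
          f"8 workers -{loss_pct(plain, eight):.1f}%")
```

Both alternative configurations land a few percent below the plain 4-worker run at every size listed, which is the sense in which they are "similar".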
Throughput improves slightly when NUMCPUs is 7, but then drops again at 6, 5 and even 4. All things being equal, throughput should improve.

Prime95 is not optimized to run 6, 5 or 4 threads on a 4-core hyperthreaded 4790 when the chip is treated as a 6-, 5- or 4-core chip.

There is only one way for 7 threads to be used: AB CD EF Go, i.e. releasing one of the 8 threads. AB Co DE FG is considered equivalent to oA BC DE FG.
There are 2 ways for 6 threads to be used: AB CD EF oo and AE BF Co Do. The latter is better. Threads E and F are free to be assigned to any hyperthread, but each of A, B, C and D is assigned to a specific core.
There are 2 ways for 5 threads to be used: AB CD Eo oo and AE Bo Co Do. The latter is better. Thread E is free to be assigned to any hyperthread, but each of A, B, C and D is assigned to a specific core.
There are 3 ways for 4 threads to be used: AB CD oo oo, AB Co Do oo and Ao Bo Co Do. The last is best. Each of A, B, C and D is assigned to a specific core.
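The counting above can be checked mechanically. In this notation each pair of letters is one physical core with two hyperthreads and "o" is an idle slot, so up to reordering of cores a placement is just how many threads each of the 4 cores carries (0, 1 or 2). The sketch below enumerates those occupancy patterns and, following the reasoning in this post, ranks a pattern higher when it spreads threads over more physical cores. That ranking rule is the poster's hypothesis, not measured Prime95 behaviour, and the helper names are illustrative only.

```python
from itertools import combinations_with_replacement

CORES = 4             # physical cores on the i7-4790K
THREADS_PER_CORE = 2  # two hyperthreads per core

def placements(n_threads: int) -> list[tuple[int, ...]]:
    """Ways to spread n_threads over the cores, as per-core thread counts
    in descending order (so the order of cores does not matter)."""
    counts = range(THREADS_PER_CORE, -1, -1)  # 2, 1, 0
    return [c for c in combinations_with_replacement(counts, CORES)
            if sum(c) == n_threads]

def cores_used(pattern: tuple[int, ...]) -> int:
    """Number of physical cores running at least one thread."""
    return sum(1 for c in pattern if c > 0)

def render(pattern: tuple[int, ...]) -> str:
    """Draw a pattern in the post's style: x = busy thread, o = idle."""
    return " ".join("x" * c + "o" * (THREADS_PER_CORE - c) for c in pattern)

for n in (7, 6, 5, 4):
    options = placements(n)
    best = max(options, key=cores_used)  # hypothesis: most cores used wins
    print(n, [render(p) for p in options], "best:", render(best))
```

The counts agree with the post (1, 2, 2 and 3 placements for 7, 6, 5 and 4 threads), and the "best" pattern for 5 threads is xx xo xo xo, i.e. AE Bo Co Do.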

I guess Prime95 is already optimized to choose Ao Bo Co Do whenever 4 threads run simultaneously.

What I recommend is that Prime95's thread placement be optimized to run 5 threads as above, or even 6 threads.

I postulate that there should be a slight increase in total throughput at 5 threads (possibly dependent on the memory system), followed by a drop-off up to 8 threads. I don't think Prime95 allows benchmarking of this specific configuration.
2017-05-03, 06:06   #183
Harrywill
 
"Harry Willam"
May 2017
USA

2²×5 Posts

Version 29.1 may include unspecified updates

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.