Prime95 30.7
Prime95 version 30.7 build 9 is available.
P-1/P+1/ECM users should consider upgrading to help with testing. Intel Alder Lake users definitely need to upgrade to iron out any issues. Win11 users should also consider upgrading to test for affinity issues. First-time PRP users can consider upgrading for the P-1 stage 2 speed boost.

WARNING: If you upgrade in the middle of P-1/P+1/ECM stage 2, then all your stage 2 work will be lost -- stage 2 starts from scratch.

From whatsnew.txt:
[CODE]
1) Better prime pairing in stage 2 of ECM/P-1/P+1. This usually results in slightly better stage 2 timings or less memory used. Save file formats changed - upgrading to 30.7 while ECM/P-1/P+1 work is in stage 2 will result in stage 2 being restarted from scratch.
2) P-1 converted to use P+1 style stage 2. From the user's perspective there is no difference. Internally a modular inverse is required at stage 2 init, but there is one multiplication saved for every D-block processed. For all common P-1 cases, this is a little faster.
3) ECM/P-1/P+1 no longer use a bit map for prime pairs. Instead a compressed pairing map is created to save memory. For large B2 values this also results in fewer calls to generate pairing maps. It also makes stage 2 save files smaller.
4) Some minor changes in AVX-512 FFT crossovers. ECM/P-1/P+1 all changed to rollback to the last save file and switch to a larger FFT size should an excessive roundoff error be encountered.
5) Support for asymmetric processor architectures such as Intel's Alder Lake.
6) Torture test dialog now asks for number of cores to test along with a "Use hyperthreading" checkbox. Previously, the dialog box asked for total number of torture threads to execute.
7) Versions 30.4/30.5/30.6 were underestimating the cost of P-1 stage 2 relative to P-1 stage 1. Expect this version to use lower stage 2 bounds in P-1.
[/CODE]
Download links:
Windows 64-bit: [URL]https://mersenne.org/ftp_root/gimps/p95v307b9.win64.zip[/URL]
Linux 64-bit: [URL]https://mersenne.org/ftp_root/gimps/p95v307b9.linux64.tar.gz[/URL]
FreeBSD 64-bit: [URL]https://mersenne.org/ftp_root/gimps/p95v307b9.FreeBSD11-64.tar.gz[/URL]
Windows 32-bit: [URL]https://mersenne.org/ftp_root/gimps/p95v307b9.win32.zip[/URL]
Linux 32-bit: [URL]https://mersenne.org/ftp_root/gimps/p95v307b9.linux32.tar.gz[/URL]
Windows 64-bit Service: [URL]https://mersenne.org/ftp_root/gimps/p95v307b9.win64.service.zip[/URL]
Windows 32-bit Service: [URL]https://mersenne.org/ftp_root/gimps/p95v307b9.win32.service.zip[/URL]
Source: [URL]https://mersenne.org/ftp_root/gimps/p95v307b9.source.zip[/URL]

Please report any bugs you may find by email or posting in this thread. |
1) Benchmarking broken. Fixed in build 2.
2) Most non-Mersenne FFTs broken. Fixed in build 3.
3) Hyperthreaded torture tests not setting affinity properly for small FFTs. Fixed in build 4.
4) Hyperthreaded in-place torture tests crash for small FFTs. Fixed in build 4.
5) Semi-obscure ECM crash. If an ECM curve needed modular inverses in stage 2 and a subsequent curve needed none (more memory available), then a crash occurred. Fixed in build 4.
6) Assume CERTs will complete before all other work types in computing estimated completion dates. Fixed in build 5.
7) A low-memory situation during stage 2 init of ECM could lead to a crash writing a save file. Fixed in build 5.
8) During stage 2 init, checking for a restart due to a reduction in available memory was infrequent. Fixed in build 5 - might reduce chance of an out-of-memory event.
9) Options/Benchmark tries to run a hyperthreaded benchmark on non-hyperthreaded CPUs. Fixed in build 6.
10) Another possible crash bug in stage 2 init when memory settings change. Fixed in build 6.
11) ECM sometimes generated excessive roundoff error, usually at start up which then forced using a larger FFT size than necessary. Fixed in build 6.
12) On stage 2 restart due to more memory now available, stage 2 % complete was erroneously reported to be 100%. Fixed in build 8.
13) On stage 2 restart due to less memory being available, stage 2 might restart from scratch. Fixed in build 8.
14) Rare radix conversion excessive roundoff error affecting PRP of non-base-2 numbers. Fixed in build 8.
15) Trial factoring crashes. Fixed in build 9. |
How you can help:
1) Help fine-tune the P-1 stage 1 vs stage 2 cost function. In preferences, set output iterations low -- like 10000. Report the typical P-1 stage 1 timings vs. typical stage 2 timings as well as minimal architectural info. Example for one of my machines:
[code]Skylake CPU. FMA FFT, 106M exponent: stage 1 = 83.9 sec, stage 2 = 129 sec.[/code]
The optimal P-1 bounds depend on the stage 2 to stage 1 timing ratio. I'm seeing stage 2 anywhere from 30-50% slower.

2) Alder Lake and Win11 -- verify CPU affinities make sense and are working as expected. Add to prime.txt:
[code]AffinityVerbosity=2
AffinityVerbosityTorture=2
AffinityVerbosityTime=2
AffinityVerbosityBench=2[/code]
Run regular work, torture test, benchmarks, and even advanced/time. Prime95 should prefer assigning work to the performance cores. Make sure the CPU affinities output to each worker window make sense. Bring up Task Manager to verify that the work is being done on the cores prime95 assigned each worker. Try running on a subset of cores. For example, 1 worker running on 2 hyperthreaded cores -- do the 4 threads in fact run on only 2 performance cores according to Task Manager? |
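For anyone collecting the stage 1 vs. stage 2 timings requested above, here is a minimal Python sketch that pulls the two numbers out of a report line and prints the stage 2 to stage 1 ratio. The helper name `stage_ratio` and the regular expression are my own assumptions, written against the one-line report format shown in the example; this is not part of prime95.

```python
import re

# Assumed report format, copied from the example report in the post:
#   "... stage 1 = 83.9 sec, stage 2 = 129 sec."
REPORT = re.compile(r"stage 1 = ([\d.]+) sec, stage 2 = ([\d.]+) sec")


def stage_ratio(report: str) -> float:
    """Return stage 2 time divided by stage 1 time for one report line."""
    match = REPORT.search(report)
    if match is None:
        raise ValueError("no stage timings found in report line")
    stage1 = float(match.group(1))
    stage2 = float(match.group(2))
    return stage2 / stage1


line = "Skylake CPU. FMA FFT, 106M exponent: stage 1 = 83.9 sec, stage 2 = 129 sec."
ratio = stage_ratio(line)
# A ratio of 1.50 would mean stage 2 took 50% longer than stage 1.
print(f"stage 2 / stage 1 = {ratio:.2f}")
```

Posting the ratio alongside the raw timings and the CPU/FFT type should make reports from different machines easier to compare.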
I invite any adventurous Alder Lake owner to try it on both Win11 and [URL="https://www.howtogeek.com/744328/how-to-install-the-windows-subsystem-for-linux-on-windows-11/"]WSL[/URL] Ubuntu. And native Linux too if you've got dual-boot in place.
|
[QUOTE=kriesel;589238]And native Linux too if you've got dual-boot in place.[/QUOTE]
Are those who have Linux as primary boot welcome as well? :smile: |
[QUOTE=chalsall;589239]Are those who have Linux as primary boot welcome as well? :smile:[/QUOTE]Sure. There's a little advantage to Win, WSL, and Lin on identical hardware for a 3-way comparison on performance and proper core handling, but I don't think there's a capacity limit at this party. IIRC Windows requires primary partition, Linux doesn't.
|
[QUOTE=kriesel;589242]Windows requires primary partition, Linux doesn't.[/QUOTE]
You support my argument, sir... Micro$oft doesn't "play well with others". Some have learnt to stop playing the game with MicroCrap, and have gone "all in" with Linux as the primary OS. Particularly, being tricked into thinking running virtual environments under WinBlows 10 (now being forced to WinCrows 11) simulating Linux through some kind of virtual shell is somehow doing the same thing as running a "full-up Linux stack" has been empirically shown to be little more than "Snake Oil". Sincerely... No issues (between the two of us). :chalsall: |
Excuse me, but where's the source code?
|
[QUOTE=chalsall;589243]You support my argument, sir...
Micro$oft doesn't "play well with others". Particularly, being tricked into thinking running virtual environments...[/QUOTE]Hmm, you seem a bit zealous. This thread is about a new release of prime95 / mprime. It isn't the place for refighting the favorite-OS wars. Or whether single-boot, multi-boot, or VM is the one true way, or any other techno-religious-fervor conflict. They're all just tools. Don't blame the hammer for a lack of screwdriver-ness. Or do, but in the proper threads.
[QUOTE=chalsall;589222]I'm giving Ubuntu one more chance.[/QUOTE]Heck, run Fedora VMs of various versions on Fedora host OS if you like, and let us know how V30.7 behaves and performs on VM vs host. Or find issues with V30.7 on Fedora host OS. WSL or VM are tools for having multiple environments available on the same hardware at the same time. |
I noticed that for ECM on small exponents, stage 2 init now takes a lot longer than before.
[code]
version 30.6b4
[Worker #4 Oct 3 11:59] ECM on [B]M20393[/B]: curve #264 with s=652720576976964, B1=3000000, B2=TBD
[Worker #4 Oct 3 12:02] Stage 1 complete. 77076114 transforms, 1 modular inverses. Time: 191.655 sec.
[Worker #4 Oct 3 12:02] Available memory is 11000MB.
[Worker #4 Oct 3 12:02] Optimal [B]B2 is 176*B1 = 528000000[/B].
[Worker #4 Oct 3 12:03] D: 6930, relative primes: 21344, stage 2 primes: 27534330, pair%=96.81
[Worker #4 Oct 3 12:03] Stage 2 uses [B]929MB of memory[/B], 2 FFTs per prime pair, 3-mult modinv pooling, pool size 35165.
[Worker #4 Oct 3 12:03] Stage 2 init complete. 560562 transforms, 1 modular inverses. Time: [B]8.126 sec[/B].
[Worker #4 Oct 3 12:04] Stage 2 complete. 29840810 transforms, 2 modular inverses. Time: [B]99.321 sec[/B].
[Worker #4 Oct 3 12:04] Stage 2 GCD complete. Time: 0.001 sec.

version 30.7b1
[Worker #4 Oct 3 15:17] ECM on M20393: curve #301 with s=7945291737592001, B1=3000000, B2=TBD
[Worker #4 Oct 3 15:20] Stage 1 complete. 77076114 transforms, 1 modular inverses. Total time: 179.730 sec.
[Worker #4 Oct 3 15:20] Available memory is 11000MB.
[Worker #4 Oct 3 15:20] Optimal [B]B2 is 100*B1 = 300000000[/B].
[Worker #4 Oct 3 15:21] D: 2772, relative primes: 2664, stage 2 primes: 16035509, pair%=86.48
[Worker #4 Oct 3 15:21] Stage 2 uses [B]75MB of memory[/B], 2 FFTs per prime pair, 3-mult modinv pooling, pool size 2706.
[Worker #4 Oct 3 15:21] Stage 2 init complete. 109141 transforms, 2 modular inverses. Time: [B]31.491 sec[/B].
[Worker #4 Oct 3 15:22] Stage 2 complete. 19634829 transforms, 31 modular inverses. Total time: [B]52.660 sec.[/B]
[Worker #4 Oct 3 15:22] Stage 2 GCD complete. Time: 0.001 sec.
[/code][code]
version 30.6b4
[Worker #3 Oct 3 11:57] ECM on [B]M307409[/B]: curve #139 with s=96291502140021, B1=250000, B2=TBD
[Worker #3 Oct 3 12:02] Stage 1 complete. 6387044 transforms, 1 modular inverses. Time: 316.008 sec.
[Worker #3 Oct 3 12:02] Available memory is 11000MB.
[Worker #3 Oct 3 12:02] Optimal [B]B2 is 154*B1 = 38500000[/B].
[Worker #3 Oct 3 12:02] D: 4620, relative primes: 6955, stage 2 primes: 2325683, pair%=92.69
[Worker #3 Oct 3 12:02] Stage 2 uses [B]2651MB of memory[/B], 2 FFTs per prime pair, 3-mult modinv pooling, pool size 7693.
[Worker #3 Oct 3 12:02] Stage 2 init complete. 182767 transforms, 1 modular inverses. Time: [B]10.380 sec[/B].
[Worker #3 Oct 3 12:05] Stage 2 complete. 2656544 transforms, 1 modular inverses. Time: [B]137.281 sec[/B].
[Worker #3 Oct 3 12:05] Stage 2 GCD complete. Time: 0.030 sec.

version 30.7b1
[Worker #3 Oct 3 15:14] ECM on M307409: curve #161 with s=8109473831276158, B1=250000, B2=TBD
[Worker #3 Oct 3 15:20] Stage 1 complete. 6387044 transforms, 1 modular inverses. Total time: 326.664 sec.
[Worker #3 Oct 3 15:20] Available memory is 11000MB.
[Worker #3 Oct 3 15:20] Optimal [B]B2 is 147*B1 = 36750000[/B].
[Worker #3 Oct 3 15:20] D: 2772, relative primes: 3600, stage 2 primes: 2225256, pair%=97.96
[Worker #3 Oct 3 15:20] Stage 2 uses [B]1056MB of memory[/B], 2 FFTs per prime pair, 3-mult modinv pooling, pool size 2652.
[Worker #3 Oct 3 15:20] Stage 2 init complete. 125837 transforms, 2 modular inverses. [B]Time: 15.606 sec[/B].
[Worker #3 Oct 3 15:22] Stage 2 complete. 2412103 transforms, 3 modular inverses. [B]Total time: 132.420 sec[/B].
[Worker #3 Oct 3 15:22] Stage 2 GCD complete. Time: 0.032 sec.
[/code]
The reduced memory usage is really impressive, though! It saves >90% on M20,393 and 60% on M307,409. This is a Zen 2 Ryzen 3950X with one worker per CPU thread at 2.8GHz with Linux. |
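The memory savings quoted above can be rechecked straight from the "Stage 2 uses ... of memory" lines. The Python below is only a sketch: the helper names `stage2_memory_mb` and `percent_saved` are made up for illustration, and the regular expression assumes the log wording shown in this post (including the forum's [B] tags). Keep in mind that the two versions also picked different B2 values, so the runs are not strictly like-for-like.

```python
import re

# Matches "Stage 2 uses 929MB of memory", with or without the forum's [B] tag.
MEMORY = re.compile(r"Stage 2 uses (?:\[B\])?(\d+)MB of memory")


def stage2_memory_mb(log_line: str) -> int:
    """Extract the stage 2 memory figure (in MB) from one log line."""
    match = MEMORY.search(log_line)
    if match is None:
        raise ValueError("no stage 2 memory figure in log line")
    return int(match.group(1))


def percent_saved(old_mb: int, new_mb: int) -> float:
    """Memory saved by the new version, as a percentage of the old usage."""
    return 100.0 * (old_mb - new_mb) / old_mb


# Log lines copied from the post: (30.6b4 line, 30.7b1 line) per exponent.
runs = {
    "M20393": (
        "Stage 2 uses [B]929MB of memory[/B], 2 FFTs per prime pair",
        "Stage 2 uses [B]75MB of memory[/B], 2 FFTs per prime pair",
    ),
    "M307409": (
        "Stage 2 uses [B]2651MB of memory[/B], 2 FFTs per prime pair",
        "Stage 2 uses [B]1056MB of memory[/B], 2 FFTs per prime pair",
    ),
}

savings = {}
for name, (old_line, new_line) in runs.items():
    old_mb = stage2_memory_mb(old_line)
    new_mb = stage2_memory_mb(new_line)
    savings[name] = percent_saved(old_mb, new_mb)
    print(f"{name}: {old_mb}MB -> {new_mb}MB, {savings[name]:.1f}% saved")
```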
Observation: Stage 2 progress % splits start out bigger (relative to 30.6) and progressively become smaller towards the end. This makes ETA calculations tricky.
[CODE]
[Work thread Oct 3 05:34] Conversion of stage 1 result complete. 5 transforms, 1 modular inverse. Time: 1.704 sec.
[Work thread Oct 3 05:34] D: 1848, relative primes: 4800, stage 2 primes: 20796549, pair%=99.71
[Work thread Oct 3 05:34] Using 10995MB of memory.
[Work thread Oct 3 05:35] Stage 2 init complete. 9481 transforms. Time: 55.307 sec.
[Work thread Oct 3 05:48] M5266619 stage 2 is 5.43% complete. Time: 838.973 sec.
[Work thread Oct 3 06:03] M5266619 stage 2 is 10.95% complete. Time: 840.818 sec.
[Work thread Oct 3 06:17] M5266619 stage 2 is 16.51% complete. Time: 845.246 sec.
[Work thread Oct 3 06:31] M5266619 stage 2 is 22.10% complete. Time: 841.122 sec.
[Work thread Oct 3 06:45] M5266619 stage 2 is 26.96% complete. Time: 841.132 sec.
[Work thread Oct 3 06:59] M5266619 stage 2 is 31.68% complete. Time: 841.467 sec.
[Work thread Oct 3 07:13] M5266619 stage 2 is 36.42% complete. Time: 842.283 sec.
[Work thread Oct 3 07:27] M5266619 stage 2 is 41.18% complete. Time: 842.197 sec.
[Work thread Oct 3 07:41] M5266619 stage 2 is 45.96% complete. Time: 842.407 sec.
[Work thread Oct 3 07:55] M5266619 stage 2 is 50.75% complete. Time: 841.396 sec.
[Work thread Oct 3 08:09] M5266619 stage 2 is 55.56% complete. Time: 843.942 sec.
[Work thread Oct 3 08:23] M5266619 stage 2 is 60.38% complete. Time: 843.597 sec.
[Work thread Oct 3 08:37] M5266619 stage 2 is 65.21% complete. Time: 842.223 sec.
[Work thread Oct 3 08:51] M5266619 stage 2 is 70.06% complete. Time: 842.269 sec.
[Work thread Oct 3 09:05] M5266619 stage 2 is 74.67% complete. Time: 841.707 sec.
[Work thread Oct 3 09:19] M5266619 stage 2 is 79.22% complete. Time: 842.902 sec.
[Work thread Oct 3 09:33] M5266619 stage 2 is 83.56% complete. Time: 842.358 sec.
[Work thread Oct 3 09:47] M5266619 stage 2 is 87.78% complete. Time: 842.067 sec.
[Work thread Oct 3 10:01] M5266619 stage 2 is 91.74% complete. Time: 842.497 sec.
[Work thread Oct 3 10:15] M5266619 stage 2 is 95.51% complete. Time: 843.865 sec.
[Work thread Oct 3 10:29] M5266619 stage 2 is 99.25% complete. Time: 844.053 sec.
[Work thread Oct 3 10:32] M5266619 stage 2 complete. 21204894 transforms. Total time: 17866.727 sec.
[/CODE] |
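Since the percent gained per interval shrinks as stage 2 progresses, one workaround for ETA estimates is to extrapolate from the most recent few progress lines only, instead of from the run as a whole. The sketch below does that in Python. The function names and the `window` parameter are my own assumptions, and the regular expression assumes the progress-line wording shown in the log above; it is not how prime95 itself computes completion estimates.

```python
import re

# Matches e.g. "stage 2 is 95.51% complete. Time: 843.865 sec."
PROGRESS = re.compile(r"stage 2 is ([\d.]+)% complete\. Time: ([\d.]+) sec")


def parse_progress(lines):
    """Return (percent_complete, interval_seconds) for each progress line."""
    points = []
    for line in lines:
        match = PROGRESS.search(line)
        if match:
            points.append((float(match.group(1)), float(match.group(2))))
    return points


def eta_seconds(points, window=3):
    """Estimate the remaining seconds from the last `window` intervals."""
    if len(points) < 2:
        raise ValueError("need at least two progress lines")
    recent = points[-(window + 1):]
    gained = recent[-1][0] - recent[0][0]          # percent gained in window
    elapsed = sum(secs for _, secs in recent[1:])  # time spent gaining it
    if gained <= 0:
        raise ValueError("no progress within the window")
    return (100.0 - recent[-1][0]) * elapsed / gained


# Four consecutive progress lines copied from the log in the post.
log = [
    "[Work thread Oct 3 09:33] M5266619 stage 2 is 83.56% complete. Time: 842.358 sec.",
    "[Work thread Oct 3 09:47] M5266619 stage 2 is 87.78% complete. Time: 842.067 sec.",
    "[Work thread Oct 3 10:01] M5266619 stage 2 is 91.74% complete. Time: 842.497 sec.",
    "[Work thread Oct 3 10:15] M5266619 stage 2 is 95.51% complete. Time: 843.865 sec.",
]
points = parse_progress(log)
remaining = eta_seconds(points)
print(f"estimated time remaining: {remaining:.0f} sec")
```

Because the splits keep shrinking, even a short window will tend to come in a little low near the end, so treat the result as a lower bound rather than a precise ETA.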