mersenneforum.org (https://www.mersenneforum.org/index.php)
-   Software (https://www.mersenneforum.org/forumdisplay.php?f=10)
-   -   Prime95 30.7 (https://www.mersenneforum.org/showthread.php?t=27180)

kriesel 2022-03-02 16:58

[QUOTE=Prime95;600986]Will fix.[/QUOTE]Thank you. I've turned off the get occasional cert work also there, so after the current cert completes today that should be the end of it. If not, will post again.

storm5510 2022-03-25 15:42

A different direction, if I may. An observation:

ECMs. I like ECMs. Most people don't seem to give them much attention, from the way it looks. Using [I]Prime95 v30.7 B9[/I], there is a long delay between the completion of Stage 1 and the beginning of Stage 2. I compared this with an older version, 29.x (I'm not sure which). There was no delay there, but Stage 2 took somewhat longer to complete. Whatever the latest version does during the delay isn't printed to the screen until it is complete, and then it goes by so rapidly that I can't read all of it. I see the detail line from Stage 1 before.

Perhaps I am giving it too much Stage 2 RAM for what I am running: 1024 MB. When it begins, Stage 2 states it is using 124 MB. I think I will drop the allocation to 256 MB and see what happens.

I have tested 30.7 B9 on three dissimilar systems with no problems. I also tested [I]mprime[/I], same version, on an Ubuntu system. No issues there either.

tshinozk 2022-04-30 03:29

2 Attachment(s)
On p95v307b9.win64, I cannot change the number of cores when I run the benchmark.
If I set it to 1 core, the UI shows 1 core but Prime95.exe uses all cores.

p95v308b13.win64 also has the same issue.

James Heinrich 2022-04-30 03:35

[QUOTE=tshinozk;604996]If I set to 1 core, UI shows 1 core but Prime95.exe uses all cores.[/QUOTE]Your screenshot shows [c]1-18[/c] for "Number of CPU cores to benchmark".

storm5510 2022-04-30 23:27

[QUOTE=tshinozk;604996]On p95v307b9.win64, I cannot change the number of cores when I run the benchmark.
If I set it to 1 core, the UI shows 1 core but Prime95.exe uses all cores.

p95v308b13.win64 also has the same issue.[/QUOTE]

You're specifying a range. I don't think it works this way. Just put 18.

kriesel 2022-05-01 00:07

1 Attachment(s)
[QUOTE=James Heinrich;604997]Your screenshot shows [c]1-18[/c] for "Number of CPU cores to benchmark".[/QUOTE]Not a problem. See the note at the bottom of the screen capture's benchmark pane. Prime95 explicitly supports ranges, lists, or lists of ranges, benchmarking successively through 1, 2, 3, ... 18 cores on a single worker by entering 1-18 for example.
Check for other applications using lots of cycles. Firefox can be very CPU and memory intensive.
That can really distort both prime95 benchmark results and what Task Manager CPU monitoring shows.

Best benchmarking results will be obtained when all other processes practical are idle or absent.
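
To make the range syntax concrete, here is a minimal Python sketch of how a core-count entry such as 1-18 or a list of ranges could be expanded into the successive core counts to benchmark. This is only an illustration of the idea described above, not Prime95's actual parser; the function name `parse_core_spec` is hypothetical.

```python
def parse_core_spec(spec: str) -> list[int]:
    """Expand an entry such as "1-18" or "1,2,4-8" into a list of core counts.

    Hypothetical helper for illustration only; Prime95's real parsing
    rules may differ.
    """
    counts: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"descending range: {part!r}")
            counts.extend(range(low, high + 1))
        else:
            counts.append(int(part))
    return counts


print(parse_core_spec("1"))        # a single core count
print(parse_core_spec("1-4"))      # a range
print(parse_core_spec("1,2,4-6"))  # a list mixed with a range
```

Under this reading, entering 1-18 asks for eighteen separate benchmark passes (1 core, then 2 cores, and so on), while entering 1 asks for exactly one pass on a single core.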

James Heinrich 2022-05-01 00:33

[QUOTE=kriesel;605045]Not a problem.[/QUOTE]It's only a problem because [I]tshinozk[/I] claimed to have entered 1 but got benchmarks (at some point) using all cores.

tshinozk 2022-05-01 14:17

I ran some old versions.
From Task Manager or the benchmark results, I can tell which ones have the issue.

p95v306b4.win64 OK (single core)
Timings for 2048K FFT length (1 core, 1 worker): 5.01 ms. Throughput: 199.74 iter/sec.

p95v307b1.win64 OK (single core), but fails to complete
p95v307b2.win64 NG (all cores)
p95v307b3.win64 NG (all cores)
p95v307b4.win64 fails to run, stops immediately
p95v307b5.win64 NG (all cores)
p95v307b7.win64 NG (all cores)
p95v307b8.win64 NG (all cores)
p95v307b9.win64 NG (all cores)

p95v308b13.win64 NG (all cores)
Timings for 2048K FFT length (1 core, 1 worker): 0.57 ms. Throughput: 1741.37 iter/sec.

tshinozk 2022-05-02 02:22

It seems that Alder Lake has the issue too.
The 1-core result is too fast, even if Alder Lake is running at over 5 GHz.

12900K:
[url]https://mersenneforum.org/showpost.php?p=602718&postcount=64[/url]
Timings for 2048K FFT length (8 cores, 1 worker): 0.62 ms. Throughput: 1602.22 iter/sec.

[url]https://mersenneforum.org/showpost.php?p=602744&postcount=65[/url]
FFTlen=2048K all-complex, Type=3, Arch=8, Pass1=128, Pass2=16384, clm=4 (1 core, 1 worker): 0.62 ms. Throughput: 1624.29 iter/sec.

12700K:
[url]https://mersenneforum.org/showpost.php?p=605017&postcount=69[/url]
Timings for 2048K FFT length (1 core, 1 worker): 4.57 ms. Throughput: 218.94 iter/sec.
Timings for 2048K FFT length (8 cores, 1 worker): 0.64 ms. Throughput: 1564.59 iter/sec.
It appears that this is normal.

Zhangrc 2022-05-02 10:28

[QUOTE=tshinozk;605105]It seems that Alder Lake has the issue too.
The 1-core result is too fast, even if Alder Lake is running at over 5 GHz.

12900K:
[url]https://mersenneforum.org/showpost.php?p=602718&postcount=64[/url]
Timings for 2048K FFT length (8 cores, 1 worker): 0.62 ms. Throughput: 1602.22 iter/sec.

[url]https://mersenneforum.org/showpost.php?p=602744&postcount=65[/url]
FFTlen=2048K all-complex, Type=3, Arch=8, Pass1=128, Pass2=16384, clm=4 (1 core, 1 worker): 0.62 ms. Throughput: 1624.29 iter/sec.

It appears that this is normal.[/QUOTE]

Maybe the all-complex FFT with the AVX-512 instruction set is faster.
FFTs use complex numbers; if we compute on a complex number directly instead of computing the real and imaginary parts separately, we could get an over 2x speedup.

tshinozk 2022-05-03 02:00

The "Benchmark all-complex FFTs" option is not much faster than normal on my machine with AVX-512.

Timings for 2048K FFT length (1 core, 1 worker): 0.65 ms. Throughput: 1535.94 iter/sec.

Timings for 2048K all-complex FFT length (1 core, 1 worker): 0.65 ms. Throughput: 1546.01 iter/sec.

And both have the issue (running on all cores).

James Heinrich 2022-05-03 02:08

[QUOTE=tshinozk;605158]And both have the issue.(running using all cores)[/QUOTE]Have you fixed the issue where you're entering [c]1-18[/c] for "CPU cores to benchmark"? Change that to [c]1[/c] if you only want to test 1 core...

tshinozk 2022-05-03 02:32

No.
Prime95.exe uses all cores, even if I enter 1 in "Number of CPU cores to benchmark" textbox.

Timings for 2048K FFT length (1 core, 1 worker): 0.61 ms. Throughput: 1627.02 iter/sec.
Timings for 2100K FFT length (1 core, 1 worker): 0.75 ms. Throughput: 1338.92 iter/sec.
Timings for 2160K FFT length (1 core, 1 worker): 0.80 ms. Throughput: 1243.79 iter/sec.


Throughput for 1 core is expected to be around 100-200 iter/sec for such FFT lengths.
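
A quick way to spot affected runs in a results file is to parse the benchmark lines and flag any 1-core entry whose throughput is far above that expectation. The following is a rough Python sketch; the regular expression is modelled only on the lines quoted in this thread, and the 250 iter/sec cutoff is an assumed margin above the 100-200 iter/sec range, not a value taken from Prime95.

```python
import re

# Pattern modelled on the benchmark lines quoted in this thread (assumption:
# other Prime95 output formats may not match).
BENCH_LINE = re.compile(
    r"Timings for (?P<fft>\d+)K (?:all-complex )?FFT length "
    r"\((?P<cores>\d+) cores?, (?P<workers>\d+) workers?\): "
    r"(?P<ms>[\d.]+) ms\.\s+Throughput: (?P<rate>[\d.]+) iter/sec\."
)


def suspicious_single_core(lines, max_rate=250.0):
    """Return (fft_size_K, iter_per_sec) for 1-core results above max_rate.

    max_rate is a hypothetical threshold chosen for illustration.
    """
    flagged = []
    for line in lines:
        match = BENCH_LINE.search(line)
        if match and int(match["cores"]) == 1 and float(match["rate"]) > max_rate:
            flagged.append((int(match["fft"]), float(match["rate"])))
    return flagged


sample = [
    "Timings for 2048K FFT length (1 core, 1 worker): 5.01 ms. Throughput: 199.74 iter/sec.",
    "Timings for 2048K FFT length (1 core, 1 worker): 0.61 ms. Throughput: 1627.02 iter/sec.",
    "Timings for 2048K FFT length (8 cores, 1 worker): 0.64 ms. Throughput: 1564.59 iter/sec.",
]
print(suspicious_single_core(sample))
```

Only the second sample line is flagged: it claims 1 core but reports a throughput typical of the 8-core line.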

tshinozk 2022-05-03 05:08

1 Attachment(s)
"FFT timings benchmark" does not have the issue.
I can see the multi-core scaling.

Timing FFTs using 1 core:
Best time for 2048K FFT length: 4.987 ms., avg: 5.001 ms.
Timing FFTs using 2 cores:
Best time for 2048K FFT length: 2.606 ms., avg: 2.891 ms.
Timing FFTs using 3 cores:
Best time for 2048K FFT length: 1.768 ms., avg: 2.066 ms.
Timing FFTs using 4 cores:
Best time for 2048K FFT length: 1.328 ms., avg: 1.853 ms.
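
The scaling can be quantified from the best times above. This small Python sketch computes the speedup relative to 1 core and the parallel efficiency (speedup divided by core count); the numbers are the ones listed in this post, and the function name `scaling` is just for illustration.

```python
# Best times in ms for the 2048K FFT, copied from the results above.
best_ms = {1: 4.987, 2: 2.606, 3: 1.768, 4: 1.328}


def scaling(times_ms):
    """Return (cores, speedup, efficiency) rows relative to the 1-core time."""
    base = times_ms[1]
    rows = []
    for cores in sorted(times_ms):
        speedup = base / times_ms[cores]
        rows.append((cores, speedup, speedup / cores))
    return rows


for cores, speedup, efficiency in scaling(best_ms):
    print(f"{cores} core(s): speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these figures the speedup is roughly 1.9x, 2.8x and 3.8x for 2, 3 and 4 cores, which is the near-linear scaling one would expect when the requested core count is actually honoured.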

tshinozk 2022-05-04 06:38

1 Attachment(s)
I tried reducing the number of active cores in BIOS setup.
Even with only 4 cores active (14 cores disabled) and Hyper-Threading off, Prime95.exe still uses all cores.


Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
CPU speed: 3286.47 MHz, 4 cores
Timings for 2048K FFT length (1 core, 1 worker): 1.42 ms. Throughput: 703.46 iter/sec.

Micke 2022-05-05 10:20

*edit*
I just don't use this bugged old version anymore, which is listed as the current version on [URL]https://www.mersenne.org/download/#download[/URL].

tshinozk 2022-05-05 14:23

1 Attachment(s)
I also ran mprime on Linux (Mint) on the same machine.

Timings for 2048K FFT length (1 core, 1 worker): 0.47 ms. Throughput: 2119.90 iter/sec.

mprime still uses all cores.
This result corresponds to the case of 12900k in post #203.

tshinozk 2022-05-06 02:03

2 Attachment(s)
My BIOS (X299) cannot disable AVX-512, while it seems Rocket Lake can.
I found that the "noxsave" kernel parameter disables AVX.
[url]https://stackoverflow.com/questions/13965178/how-do-i-disable-avx-instructions-on-a-linux-computer[/url]

I ran Linux with "noxsave" as a GRUB boot option.
As a result, all AVX variants are disabled, and throughput is very slow with SSE.
mprime still uses all cores, while the old version does not.

Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
CPU features: Prefetchw, SSE, SSE2, SSE4

Prime95 64-bit version 30.7
Timings for 2048K FFT length (1 core, 1 worker): 1.14 ms. Throughput: 880.96 iter/sec.
Timings for 2048K FFT length (2 cores, 1 worker): 1.14 ms. Throughput: 879.03 iter/sec.

Prime95 64-bit version 30.6
Timings for 2048K FFT length (1 core, 1 worker): 16.28 ms. Throughput: 61.41 iter/sec.
Timings for 2048K FFT length (2 cores, 1 worker): 8.29 ms. Throughput: 120.60 iter/sec.

S485122 2022-05-06 05:37

[QUOTE=tshinozk;605315]My BIOS (X299) cannot disable AVX-512, while it seems Rocket Lake can.
I found that the "noxsave" kernel parameter disables AVX.
[url]https://stackoverflow.com/questions/13965178/how-do-i-disable-avx-instructions-on-a-linux-computer[/url]

I ran Linux with "noxsave" as a GRUB boot option.
As a result, all AVX variants are disabled, and throughput is very slow with SSE.
...[/QUOTE]
It might be easier to tell Prime95 whether you want it to use AVX-512 features: stop (and exit?) mprime, then edit local.txt.

From undoc.txt:
[code]The program supports many different code paths for PRP/LL testing depending on
the CPU type. It also has a few different factoring code paths. You can
force the program to choose a specific code path by setting the proper
combination of these settings in local.txt:
CpuSupportsRDTSC=0 or 1
CpuSupportsCMOV=0 or 1
CpuSupportsPrefetch=0 or 1
CpuSupportsSSE=0 or 1
CpuSupportsSSE2=0 or 1
CpuSupports3DNow=0 or 1
CpuSupportsAVX=0 or 1
CpuSupportsFMA3=0 or 1
CpuSupportsFMA4=0 or 1
CpuSupportsAVX2=0 or 1
CpuSupportsAVX512F=0 or 1
This shouldn't be necessary though as the program uses the CPUID instruction
to see if the CPU supports these features.[/code]
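
For anyone scripting this, here is a minimal Python sketch that sets one of these keys in the text of a local.txt-style file. It assumes that such top-level settings belong before the first [section] header, which is an assumption about the file layout rather than something undoc.txt states here; the helper name `set_global_option` and the sample file contents are hypothetical. Stop and exit the program before editing the real file.

```python
def set_global_option(text: str, key: str, value: str) -> str:
    """Return INI-style text with key=value set before the first [section].

    Hypothetical helper: assumes top-level options precede all sections.
    """
    lines = text.splitlines()
    # The top-level area ends where the first [section] header begins.
    end = next(
        (i for i, line in enumerate(lines) if line.strip().startswith("[")),
        len(lines),
    )
    for i in range(end):
        name, sep, _ = lines[i].partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"  # overwrite an existing setting
            break
    else:
        lines.insert(end, f"{key}={value}")  # otherwise add it at the end of the top-level area
    return "\n".join(lines) + "\n"


# Placeholder contents, not real Prime95 settings.
sample = "SomeExistingOption=1\n[Worker #1]\nAnotherOption=2\n"
print(set_global_option(sample, "CpuSupportsAVX512F", "0"))
```

Calling it a second time with a different value replaces the existing line instead of adding a duplicate.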

kruoli 2022-05-06 08:24

It [I]should[/I] not be necessary to disable AVX-512 to get a working-as-expected configuration. But I concur with Jacob: it is worth trying whether disabling AVX-512 the way he described leads to the same behaviour.

@tshinozk: Have you had a look at v30.8 already? While it is not ready for release yet, it is worth checking whether it has this bug fixed. If not, you can post to the 30.8 thread that the bug is also present there; I guess it is more likely to get attention there.

tshinozk 2022-05-06 15:55

v30.8b14 has the issue, which started with v30.7b2.

I tried running prime95.exe with the CpuSupportsXXX settings in local.txt.
I can disable the AVX variants, but the issue is not resolved.

Can nobody reproduce the issue where prime95 cannot change the number of cores in benchmark?

James Heinrich 2022-05-06 16:50

[QUOTE=tshinozk;605348]v30.8b14 has the issue, starting from v30.7b2[/QUOTE]As [i]kruoli[/i] said, if the problem exists in the current development version you may want to post in the [url=https://www.mersenneforum.org/showthread.php?t=27366]v30.8 thread[/url] since that version is currently being worked on (and v30.7 is not).

kriesel 2022-05-06 16:54

v30.7b9, i5-1035G1, & v30.8b14 benchmarking
 
5 Attachment(s)
Yes.
Mprime/prime95 normally runs at lower priority to other user applications and system tasks. It yields the CPU for those higher priority tasks as much as they can use. With significant system load, benchmarking on available cycles on half the cores or less may look very similar to benchmarking on all of them.
The following is a quick benchmark on my laptop I'm typing on now, which has lots of tabs in Firefox open, and numerous (dozens) remote desktop sessions going as client (display). Firefox alone was using around 2 cores' throughput out of the 4 real on this Windows 10 system.[CODE][May 6 11:13:06] Worker starting
[May 6 11:13:06] Your timings will be written to the results.bench.txt file.
[May 6 11:13:06] Compare your results to other computers at http://www.mersenne.org/report_benchmarks
[May 6 11:13:06] Benchmarking multiple workers to measure the impact of memory bandwidth
[May 6 11:13:07] Timing 2048K FFT, 1 core, 1 worker. Average times: 5.59 ms. Total throughput: 179.00 iter/sec.
[May 6 11:13:22] Timing 2048K FFT, 1 core hyperthreaded, 1 worker. Average times: 6.72 ms. Total throughput: 148.83 iter/sec.
[May 6 11:13:38] Timing 2048K FFT, 2 cores, 1 worker. Average times: 5.35 ms. Total throughput: 187.04 iter/sec.
[May 6 11:13:53] Timing 2048K FFT, 2 cores hyperthreaded, 1 worker. Average times: 6.63 ms. Total throughput: 150.94 iter/sec.
[May 6 11:14:09] Timing 2048K FFT, 3 cores, 1 worker. Average times: 5.58 ms. Total throughput: 179.11 iter/sec.
[May 6 11:14:25] Timing 2048K FFT, 3 cores hyperthreaded, 1 worker. Average times: 5.90 ms. Total throughput: 169.37 iter/sec.
[May 6 11:14:41] Timing 2048K FFT, 4 cores, 1 worker. Average times: 5.46 ms. Total throughput: 183.05 iter/sec.
[May 6 11:14:56] Timing 2048K FFT, 4 cores hyperthreaded, 1 worker. Average times: 5.86 ms. Total throughput: 170.73 iter/sec.
[May 6 11:15:12]
[May 6 11:15:12] Throughput benchmark complete.
[May 6 11:15:12] Throughput benchmark complete.
[May 6 11:15:12] Worker stopped.
[/CODE]Those timings are suspiciously similar even given the other system loads. See first two attachments.

So I switched to its hardware twin "martinette": no Firefox running, only 1 VNC remote desktop server running, prime95 v30.8b14, and CPU utilization of ~5-10% with prime95 paused. Benchmarking 1-4 cores with and without hyperthreading, 1 worker, Windows 11 Task Manager shows all physical cores running saturated regardless of the indicated core count during prime95 benchmarking; the logical cores' CPU load percentages are somewhat affected by whether hyperthreading is used (attachments 3-5).


Another oddity is that it would not run 8192 fft benchmarking. In v30.7b9 I could specify 2048-2048 as fft lengths to benchmark, 1-4 cores, 1 worker only.

V30.8b14 exited benchmarking instantly, to stopped state, if I specified 8192-8192. As does v30.7b9, for 4096 only or 8192 only. There might be more. 6144 behaved ok.

kruoli 2022-05-06 18:55

[QUOTE=kriesel;605353]See first two attachments.[/QUOTE]

The first attachment is especially worrisome because main memory and both the SSD are fully utilized. What are "other system loads"? Do they justify these loads? If not, it looks like extensive swapping.

kriesel 2022-05-06 20:32

[QUOTE=kruoli;605360]What are "other system loads"?[/QUOTE]Mostly Firefox-gone-wild. I've dumped a lot of tabs which gave very temporary relief. It gets in strange states sometimes if left running a long time, including when it has downloaded a FF update but not yet applied it & restarted FF. I tend to leave it running as long as possible with multiple Google Colab sessions, and many other tabs open. Looks like time for a clean start again RSN.
Checking just now, Help, About Firefox includes "Restart to update Firefox".
And after temporarily taming Firefox, prime95 v30.7b9 demonstrates the same all-physical-cores-busy-when-benchmarking-1 issue as v30.8b14.

Prime95 2022-05-06 21:32

[QUOTE=tshinozk;605348]v30.8b14 has the issue, starting from v30.7b2 .[/QUOTE]

Try v30.8b15

kriesel 2022-05-06 22:10

[QUOTE=Prime95;605369]Try v30.8b15[/QUOTE]Downloadable where?
Last I found in the v30.8 beta thread is b14, [url]https://mersenneforum.org/showpost.php?p=603242&postcount=501[/url]

James Heinrich 2022-05-06 22:16

[QUOTE=kriesel;605373]Downloadable where?[/QUOTE]Usual place, you just haven't given George time to post his changelog and links in the thread.
[url]https://www.mersenne.org/ftp_root/gimps/p95v308b15.win64.zip[/url]

edit: George has posted now: [url]https://www.mersenneforum.org/showpost.php?p=605385&postcount=551[/url]

tshinozk 2022-05-07 01:46

Thanks,
v30.8b15 fixed the issue.
I can find a sweet spot for multi-core scaling.

Timings for 2048K FFT length (1 core, 1 worker): 5.16 ms. Throughput: 193.83 iter/sec.
Timings for 2048K FFT length (2 cores, 1 worker): 2.63 ms. Throughput: 380.49 iter/sec.
...

kriesel 2022-05-13 23:26

Prime95 v30.7b9 undoc.txt says:
[CODE]In prime.txt,
ProofPower=x (x can be from 5 to 12)
ProofPowerAdjust=y (y can be from -2 to +3)
these override the selection of proof power based on temporary disk space available.[/CODE]I've put ProofPower=11 or =12 in prime.txt (before the [primenet] or [worker] sections) of various installs of prime95 V30.7b9 and v30.8b14.
It seems to have NO effect.

I still get proof power 9 for 60M PRP DC assignments, or power 10 for 109M PRP assignments.
The .residues files are preallocated to ~4 and 14GB (not the 32-160 GB resource limit). Worker window text is also consistent with default proof powers not the specified values.
Stopping and exiting the program, switching to ProofPowerAdjust=3, deleting the brief run attempts' residue and save files, and starting again from scratch, also fails on v30.7b9 and v30.8b14; resulted in 4 and 14GB residue files again, and the worker windows still indicate 4 GB and 14GB residue file required, and proof power 9 and 10, as if ProofPowerAdjust=0.

Forum search "ProofPower" shows that it worked for some users and not others, back at v30.4/5/6. Even if placed in both prime.txt and local.txt.

(My current use case is to get actual prime95 squarings counts for high proof powers on several sample exponents, quickly, without running large exponents that would default to high proof powers, take months each to complete, and not help the wavefronts. Trying to fill a table for powers 5-12 for reference info and understanding actual costs versus exponent of various proof powers.)

tdulcet 2022-05-17 12:21

[QUOTE=kriesel;605844]I've put ProofPower=11 or =12 in prime.txt (before the [primenet] or [worker] sections) of various installs of prime95 V30.7b9 and v30.8b14.
It seems to have NO effect.[/QUOTE]

Those options are for setting the maximum PRP proof power, but it will not let one set a higher proof power than the optimal for each exponent. You can override this by putting it in the [C][PrimeNet][/C] section, as Prime95/MPrime allows the PrimeNet server to reduce the proof power.
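
As a rough illustration of that placement, this Python sketch puts a key inside a named section of prime.txt-style text, creating the section if it is missing. The helper name `set_section_option` and the sample contents are hypothetical, and matching the section header case-insensitively is an assumption (the thread spells it both [primenet] and [PrimeNet]).

```python
def set_section_option(text: str, section: str, key: str, value: str) -> str:
    """Return INI-style text with key=value set inside [section].

    Hypothetical helper for illustration; not part of Prime95.
    """
    lines = text.splitlines()
    header = f"[{section}]"
    start = next(
        (i for i, line in enumerate(lines) if line.strip().lower() == header.lower()),
        None,
    )
    if start is None:
        # Section not present: append it with the new setting.
        lines += [header, f"{key}={value}"]
        return "\n".join(lines) + "\n"
    # The section runs until the next [header] or the end of the file.
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].strip().startswith("[")),
        len(lines),
    )
    for i in range(start + 1, end):
        name, sep, _ = lines[i].partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            break
    else:
        lines.insert(end, f"{key}={value}")
    return "\n".join(lines) + "\n"


# Placeholder contents, not real Prime95 settings.
sample = "SomeGlobalOption=1\n[PrimeNet]\nSomeServerOption=2\n[Worker #1]\nSomeWorkerOption=3\n"
print(set_section_option(sample, "PrimeNet", "ProofPower", "12"))
```

The point of the sketch is only where the line ends up: inside the [PrimeNet] section rather than in the top-level area of the file.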

kriesel 2022-05-19 16:49

[QUOTE=tdulcet;605976]You can override this by putting it in the [C][PrimeNet][/C] section, as Prime95/MPrime allows the PrimeNet server to reduce the proof power.[/QUOTE]Thanks for the tip. That was not clear from the docs. Putting both[code]ProofPower=12
ProofPowerAdjust=3[/code]in the [PrimeNet] section of prime.txt & stop, exit & restart the app seems to be working; worker window includes:
[code][May 19 11:08] Setting affinity to run helper thread 1 on CPU core #2
[May 19 11:08] Setting affinity to run helper thread 2 on CPU core #3
[May 19 11:08] Setting affinity to run helper thread 3 on CPU core #4
[May 19 11:08] Starting Gerbicz error-checking PRP test of M61686073 using AVX-512 FFT length 3360K, Pass1=640, Pass2=5376, clm=1, 4 threads
[May 19 11:08] Preallocating disk space for the proof interim residues file p68B6073.residues
[May 19 11:09] PRP proof using power=[B]12[/B] and 64-bit hash size.
[May 19 11:09] Proof requires [B]31.6GB[/B] of temporary disk space and uploading a 100MB proof file.
[May 19 11:10] Iteration: 10000 / 61686073 [0.01%], ms/iter: 8.488, ETA: 6d 01:24
[May 19 11:12] Iteration: 20000 / 61686073 [0.03%], ms/iter: 8.185, ETA: 5d 20:12
...[/code]Being able to cause higher than optimal proof levels is the difference between getting level 11 squarings counts in several days at 60M, or in several months at 415+M, and getting any level 12 squarings counts from prime95 soon or never. (For proof effort [URL="https://www.mersenneforum.org/showpost.php?p=604768&postcount=24"]reference info;[/URL] link may change; & for George re actual observed algorithm behavior for proof level exponent transitions selection, documentation)
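
The disk space figure in the log can be reproduced with a back-of-the-envelope calculation. The Python sketch below assumes that the temporary residues file holds 2^power interim residues of about p bits each; that model is an assumption that happens to match the 31.6GB reported above, not something confirmed from the Prime95 source. The function name `proof_residue_bytes` is hypothetical.

```python
def proof_residue_bytes(exponent: int, power: int) -> int:
    """Estimate temporary disk space for a PRP proof.

    Assumed model: 2**power interim residues, each about `exponent` bits,
    rounded up to whole bytes.
    """
    residue_bytes = (exponent + 7) // 8
    return (2 ** power) * residue_bytes


for power in (9, 12):
    size = proof_residue_bytes(61686073, power)
    print(f"power {power}: {size / 1e9:.1f} GB")
```

For M61686073 at power 12 this gives about 31.6 GB (decimal), in line with the log, and each step down in proof power halves the estimate.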

tdulcet 2022-05-20 09:01

[QUOTE=kriesel;606107]Thanks for the tip. That was not clear from the docs. Putting both[code]ProofPower=12
ProofPowerAdjust=3[/code]in the [PrimeNet] section of prime.txt & stop, exit & restart the app seems to be working[/QUOTE]

Yeah, it is an undocumented option only designed to be used by the PrimeNet server in an emergency. See @Prime95's comment (source: [C]commonb.c[/C], lines 11,827-11,828):
[CODE] // We have a way for the PrimeNet server to change (reduce) the proof power. This would reduce server proof
// processing load somewhat and reduce the bandwidth required for obtaining proof files. We hope to never use this option.[/CODE]Note that only [C]ProofHashLength[/C], [C]ProofPower[/C] and [C]ProofPowerMult[/C] can be set in the [C][PrimeNet][/C] section.

[QUOTE=kriesel;606107]Being able to cause higher than optimal proof levels is the difference between getting level 11 squarings counts in several days at 60M, or in several months at 415+M, and getting any level 12 squarings counts from prime95 soon or never.[/QUOTE]

Yes, this is currently the only way to use proof power 12 in Prime95/MPrime. You actually may be able to go higher than 12 (or lower than 5) if you wanted, as from looking at the code, I do not see any obvious assert statements or other checks like GpuOwl has. Anyway, I am looking forward to seeing your results.

tha 2022-05-20 14:35

If one adds lines to worktodo and then starts mprime -m, the program begins by reading the worktodo file and making reservations with the server. The new assignment keys are written to the worktodo file. Upon completing this, nothing happens, and nothing signals that it is waiting for input.

On the other hand, after choosing option 4, mprime starts by resending the menu to the output as if it is expecting further input.

So, the requests are:

A. Output of the menu after finishing making reservations with the server.
B. Suppressing the output of the menu after menu option 4 is chosen.


All times are UTC.