mersenneforum.org  

Old 2022-10-02, 20:03   #749
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·19·181 Posts

Quote:
Originally Posted by kriesel View Post
Two outstanding issues as far as I know (haven't tested v30.8b17 yet)

1) Observed on Windows 7 Pro x64, dual Xeon E5-2670, prime95 V30.8b15, running two instances started with start /node 0 or 1 and /affinity 0x5555, intended as one on each side of the QPI:
when a worker window assigns cores, the following message is produced repeatedly, with a variety of hex values consisting of 3 or c at various offsets:
Error setting affinity to cpuset 0x000000c0: No error
(refer to attachment of https://mersenneforum.org/showpost.p...&postcount=731)
The hex digits 3 and c are binary 0011 and 1100, i.e. two adjacent bits set.
Windows numbers the two logical cores of x2 hyperthreaded physical core #0 as 0,1,
while Linux numbers them 0,n, where n is the number of physical hyperthreaded cores present in the system.
So it appears to me that prime95, a Windows application, may be using an inappropriate affinity mask for Windows. We don't usually want two prime95 compute threads running on the same physical core,
except when using hyperthreading while performing TF, which we'd rather do on GPUs anyway for performance. I've been starting the dual instances with the equivalent for /node 0 and /node 1 of the following:
Code:
start /node 0 /affinity 0x5555 /d "C:\Users\ ... \prime95-x64" prime95.exe
It never occurred to me that prime95 would default to no hyperthreading for FFT multiplies, and generally benchmark better without hyperthreading, yet insist on being able to set both hyperthreads of each physical core as available, even trying to override the user.
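
To make the mask arithmetic above concrete, here is a minimal Python sketch that decodes an affinity mask into logical processor indices and maps them to physical cores under the two numbering schemes described in this post. The function names are hypothetical, and the figure of 8 physical cores per package is taken from the "16 threads on 8 cores" remark; nothing here is prime95's own code.

```python
def logical_cpus(mask: int) -> list[int]:
    """Return the logical processor indices whose bits are set in mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def core_consecutive(cpu: int, threads_per_core: int = 2) -> int:
    """Physical core index if siblings are numbered 0,1 / 2,3 / ... (the Windows scheme described above)."""
    return cpu // threads_per_core


def core_strided(cpu: int, physical_cores: int) -> int:
    """Physical core index if siblings are numbered 0,n / 1,n+1 / ... (the Linux scheme described above)."""
    return cpu % physical_cores


start_mask = 0x5555  # the /affinity value from the start command
cpus = logical_cpus(start_mask)
print("logical CPUs in 0x5555:", cpus)
print("cores, consecutive numbering:", sorted({core_consecutive(c) for c in cpus}))
print("cores, strided numbering (n=8):", sorted({core_strided(c, 8) for c in cpus}))

# The cpuset from the error message: two adjacent bits, i.e. both
# hyperthreads of one physical core under consecutive numbering.
print("logical CPUs in 0xc0:", logical_cpus(0xC0))
```

Under consecutive numbering, 0x5555 selects one logical processor on each of the 8 physical cores. Under strided numbering the same mask would cover only 4 physical cores, with both hyperthreads of each, which is why it matters which convention a mask is written for.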

Should I be using 0xFFFF instead, which allows 16 threads on 8 cores and lets the threads flop about between logical processors? That seems counter to the documentation's guidance. From readme.txt:
Code:
Use hyperthreading
------------------

Except for trial factoring, which is best left for GPUs to do, hyperthreading often offers no performance
benefit while using more electricity.  You can try test if hyperthreading speeds up your worker windows by
selecting these options.
Or, for that matter, counter to the "not recommended" wording built into the GUI dialog boxes.

I found in early experimentation that without the affinity mask included, the Windows 7 start /Node 1 command did not properly place the second instance on NUMA node 1, instead running both instances on the various hyperthreads of node 0 (CPU package 0), saturating all its logical processors, and leaving the second processor package idle, reducing performance.

To restate:
Code:
start /node 0 /affinity 0x5555 /d "C:\Users\ ... \prime95-x64" prime95.exe
start /node 1 /affinity 0x5555 /d "C:\Users\ ... \2\prime95-x64" prime95.exe
Good performance, but error messages;

without affinity masks,
Code:
start /node 0 /d "C:\Users\ ... \prime95-x64" prime95.exe
start /node 1 /d "C:\Users\ ... \2\prime95-x64" prime95.exe
No error messages, but both land on node 0, so half the performance.

Using affinity mask 0xFFFF in the start command seems to resolve the error messages, but the selection of hyperthreads is irregular in that case. It is also not stable over time: threads switch back and forth between the two logical cores of a physical core.
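
One possible reading of the difference between the two masks, sketched in Python: the cpusets in the error messages (0x03, 0x0c, 0x30, 0xc0) each cover both hyperthreads of one core, and none of them fits inside 0x5555, while all of them fit inside 0xFFFF. Windows documents that a thread affinity mask must be a subset of the process affinity mask. Whether that is the actual cause of the message is an assumption, and it would not account for the errors still reported under 0xFFFF in the follow-up post. The helper names are hypothetical.

```python
def core_pair_mask(core: int) -> int:
    """Mask covering both hyperthreads of a core, assuming siblings are numbered 2k and 2k+1."""
    return 0b11 << (2 * core)


def is_subset(cpuset: int, allowed: int) -> bool:
    """True if every bit set in cpuset is also set in allowed."""
    return cpuset & ~allowed == 0


# Cores 0..3 give the four cpuset values seen in the error messages.
worker_cpusets = [core_pair_mask(core) for core in range(4)]

for allowed in (0x5555, 0xFFFF):
    for cpuset in worker_cpusets:
        fits = is_subset(cpuset, allowed)
        print(f"process mask 0x{allowed:04x}  cpuset 0x{cpuset:08x}  fits: {fits}")
```

The check only illustrates the bit arithmetic; it says nothing about how prime95 chooses its cpusets or how node-relative masks from start /node are translated.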
Attached Thumbnails:
no HT configuration prime95.png
logical core hopping without hyperthread enabled in prime95.png

Last fiddled with by kriesel on 2022-10-02 at 20:43
Old 2022-10-02, 22:18   #750
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·19·181 Posts
Rapid logical processor switching

On the dual E5-2670 Win 7, again:

Node 1, using affinity mask 0xFFFF, alternates among all 16 logical cores, changing rapidly between them. This node is doing P-1. Affinity-related error messages remain, plus there's a new variant. The new message does not relate to the optional proof storage directory or the temporary directory, because those fields are blank. The path specified in the start command is known to be valid, since it was copied and pasted from Explorer, and it is on the system's boot drive.
Code:
[Oct 2 15:55:30] Worker starting
[Oct 2 15:55:30] Setting affinity to run worker on CPU core #1
[Oct 2 15:55:30] Error setting affinity to cpuset 0x00000003: No such file or directory
[Oct 2 15:55:30] Optimal P-1 factoring of M116509073 using up to 49152MB of memory.
[Oct 2 15:55:30] Assuming no factors below 2^77 and 1.3 primality tests saved if a factor is found.
[Oct 2 15:55:30] Optimal bounds are B1=805000, B2=201211000
[Oct 2 15:55:30] Chance of finding a factor is an estimated 5.23%
[Oct 2 15:55:30] 
[Oct 2 15:55:32] Setting affinity to run helper thread 1 on CPU core #2
[Oct 2 15:55:32] Setting affinity to run helper thread 2 on CPU core #3
[Oct 2 15:55:32] Using AVX FFT length 6400K, Pass1=640, Pass2=10K, clm=2, 4 threads
[Oct 2 15:55:32] Setting affinity to run helper thread 3 on CPU core #4
[Oct 2 15:55:32] Error setting affinity to cpuset 0x0000000c: No error
[Oct 2 15:55:32] Error setting affinity to cpuset 0x00000030: No error
[Oct 2 15:55:32] Error setting affinity to cpuset 0x000000c0: No error
[Oct 2 15:55:33] Ignoring suggested B2 value, using B2=201653100 from the save file
[Oct 2 15:55:37] Available memory is 49089MB.
[Oct 2 15:55:39] Setting affinity to run helper thread 1 on CPU core #2
[Oct 2 15:55:39] Setting affinity to run helper thread 2 on CPU core #3
[Oct 2 15:55:39] Error setting affinity to cpuset 0x00000030: No error
[Oct 2 15:55:39] Switching to AVX FFT length 7M, Pass1=448, Pass2=16K, clm=4, 4 threads
[Oct 2 15:55:39] Estimated stage 2 vs. stage 1 runtime ratio: 0.890
[Oct 2 15:55:39] Error setting affinity to cpuset 0x0000000c: No error
[Oct 2 15:55:39] Setting affinity to run helper thread 3 on CPU core #4
[Oct 2 15:55:39] Error setting affinity to cpuset 0x000000c0: No error
[Oct 2 15:55:40] Using 49055MB of memory.  D: 1650, 200x687 polynomial multiplication.
[Oct 2 15:55:51] Setting affinity to run polymult helper thread on CPU core #2
[Oct 2 15:55:51] Setting affinity to run polymult helper thread on CPU core #3
[Oct 2 15:55:51] Setting affinity to run polymult helper thread on CPU core #4
[Oct 2 15:55:51] Error setting affinity to cpuset 0x0000000c: No error
[Oct 2 15:55:51] Error setting affinity to cpuset 0x00000030: No error
[Oct 2 15:55:51] Error setting affinity to cpuset 0x000000c0: No error
[Oct 2 15:58:41] Stage 2 init complete. 5235 transforms. Time: 186.005 sec.
[Oct 2 15:59:52] M116509073 stage 2 at B2=58167450 [16.98%]
[Oct 2 16:25:20] M116509073 stage 2 at B2=72373950 [25.20%].  Time: 1528.234 sec.
[Oct 2 16:51:05] M116509073 stage 2 at B2=86580450 [33.42%].  Time: 1544.864 sec.
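
For anyone comparing runs, a log like the one above can be summarized with a short Python sketch that tallies the affinity errors by cpuset and message and estimates the stage 2 rate from the last two progress lines. The regular expressions are written against the lines shown in this post only, and the sample is a subset of that log; treating "Time:" as the elapsed time since the previous progress line is an assumption, although it matches the timestamps here.

```python
import re
from collections import Counter

LOG = """\
[Oct 2 15:55:30] Error setting affinity to cpuset 0x00000003: No such file or directory
[Oct 2 15:55:32] Error setting affinity to cpuset 0x0000000c: No error
[Oct 2 15:55:32] Error setting affinity to cpuset 0x00000030: No error
[Oct 2 15:55:32] Error setting affinity to cpuset 0x000000c0: No error
[Oct 2 15:55:39] Error setting affinity to cpuset 0x00000030: No error
[Oct 2 16:25:20] M116509073 stage 2 at B2=72373950 [25.20%].  Time: 1528.234 sec.
[Oct 2 16:51:05] M116509073 stage 2 at B2=86580450 [33.42%].  Time: 1544.864 sec.
"""

ERROR_RE = re.compile(r"Error setting affinity to cpuset (0x[0-9a-fA-F]+): (.+)$")
PROGRESS_RE = re.compile(r"\[(\d+\.\d+)%\]\.\s+Time: (\d+\.\d+) sec\.")


def summarize(log: str):
    """Return (error counts keyed by (cpuset, message), list of (percent, seconds))."""
    errors = Counter()
    progress = []
    for line in log.splitlines():
        match = ERROR_RE.search(line)
        if match:
            errors[(int(match.group(1), 16), match.group(2))] += 1
            continue
        match = PROGRESS_RE.search(line)
        if match:
            progress.append((float(match.group(1)), float(match.group(2))))
    return errors, progress


errors, progress = summarize(LOG)
for (cpuset, message), count in sorted(errors.items()):
    print(f"0x{cpuset:08x}  {message!r}  x{count}")

# Rate from the last two progress lines; assumes "Time:" is the interval between them.
(prev_pct, _), (last_pct, last_secs) = progress[-2:]
secs_per_pct = last_secs / (last_pct - prev_pct)
remaining_hours = (100.0 - last_pct) * secs_per_pct / 3600
print(f"{secs_per_pct:.0f} s per percent, roughly {remaining_hours:.1f} h of stage 2 left at this rate")
```

Running the same summary on logs from the 0x5555 and 0xFFFF configurations would give a quick side-by-side of error counts and throughput.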
Attached Thumbnails:
rapid logical processor switching.png
Old 2022-10-02, 23:22   #751
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

2·19·181 Posts

Same system as previous post:
A second instance, specified in start command to run on NUMA node 1, without affinity mask used, lands on node 0 instead. But no error messages. Note first instance's iteration time more than doubled, and NUMA node 1 is idle.
Attached Thumbnails:
node 1 without affinity mask lands on node 0 instead.png
Old 2022-10-04, 23:35   #752
storm5510
Random Account
 
 
Aug 2009
Not U. + S.A.

2×3×389 Posts

Quote:
Originally Posted by kriesel View Post
Same system as previous post:
A second instance, specified in start command to run on NUMA node 1, without affinity mask used, lands on node 0 instead. But no error messages. Note first instance's iteration time more than doubled, and NUMA node 1 is idle.
NUMA: Non-Uniform Memory Access.

Off-topic: Why would the BIOS in a single CPU system have this enabled? It is on my old Xeon...
Old 2022-10-05, 17:08   #753
bplenhart
 
"Brian Lenhart"
Oct 2013

11 Posts
FreeBSD version of mprime

Are there any plans to update the FreeBSD version past 30.7b9?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
