mersenneforum.org  

Old 2023-09-21, 15:56   #1398
storm5510
Random Account
 
Aug 2009
Oceanus Procellarum

2³×13×29 Posts

Quote:
Originally Posted by rogue
Adding threads won't always scale and sometimes threads are competing for work...

I have seen this firsthand with my recent build. If I allow Prime95 to run unhindered, it will use all 20 threads on this CPU. If I restrict it, the processing rate increases. It took some experimentation to find a good combination of workers and helpers. What I ended up with was four workers with 16 helpers. The helpers did the majority of the work, and at a lower temperature while maintaining a good clock speed.

I learned that more is not always the best way to go!
Old 2023-09-21, 20:16   #1399
Happy5214
 
"Alexander"
Nov 2008
The Alamo City

3²·113 Posts

Quote:
Originally Posted by rogue
The following sieves still use x86 FPU logic either in the worker or for factor validation: afsieve, cwsieve, dmdsieve, fkbnsieve, gfndsieve, k1b2sieve, pixsieve, smsieve, xyyxsieve. This means that they will not run on ARM until they are changed to use MpArith or MpArithVec. So if anyone wants to write such routines for those sieves, feel free to take a crack at it.

Some workers have AVX logic (cwsieve, psieve, xyyxsieve), but that is conditionally compiled for x86 CPUs and used only if the x86 CPU supports the AVX functionality that is needed by that worker.

twinsieve, dmdsieve, and fbncsieve also use FPU routines. Is afsieve included because of the assembly file in the folder? That file uses SSE, and I couldn't find any FPU calls in that folder.

Last fiddled with by Happy5214 on 2023-09-21 at 20:18 Reason: Quoting context
Old 2023-09-21, 20:36   #1400
pepi37
 
Dec 2011
After 1.58M nines:)

2³·13·17 Posts

Quote:
Originally Posted by rogue
Adding threads won't always scale and sometimes threads are competing for work. Note that the amount of available memory will also have an impact. Note that each worker will be using 8 GB of memory to hold the primes.

Run with -W4 on both machines to see the impact of thrashing on the machine with 28 cores.

You can also set the number of primes per chunk to a fixed value by using -w and adding 'f' to the parameter passed to it. For example -w5e8f will use about 4 GB per worker and won't increase or decrease the number of primes per chunk.

You might also need it to run for a few minutes to reduce the impact of "peaks and valleys" of the rate calculation done when sieving starts.

A single GPU could be faster than all of your CPU cores combined.

So if my machine has 32 GB, of which let's say 30 GB is usable, then a well-chosen -w setting should give better speed. Should I set it so that every worker gets 1 GB, so that in theory the sieving speed increases?
OK, let's try it that way...
Old 2023-09-21, 21:17   #1401
rogue
 
"Mark"
Apr 2003
Between here and the

2⁴·467 Posts

Quote:
Originally Posted by Happy5214
twinsieve, dmdsieve, and fbncsieve also use FPU routines. Is afsieve included because of the assembly file in the folder? That file uses SSE, and I couldn't find any FPU calls in that folder.

afsieve and pixsieve have hand-coded asm routines in .S files. There is no non-x86 equivalent of that code. It shouldn't be too hard to replace with MpArithVec. MpArithVec might actually be faster than the ASM on x86.
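
For readers wondering what a portable, integer-only replacement for FPU or hand-written asm modular multiplication can look like, here is a minimal Python sketch of Montgomery multiplication for an odd modulus below 2^64. The thread does not describe MpArith's internals, so treating it as Montgomery-style is an assumption, and every name below (mont_setup, mont_mul, mulmod) is hypothetical rather than taken from the mtsieve source.

```python
# Sketch only: Montgomery multiplication with R = 2^64.
# Assumption: this resembles what an integer-only MpArith-style routine does;
# it is NOT copied from mtsieve.
R_BITS = 64
R = 1 << R_BITS
MASK = R - 1


def mont_setup(p):
    """Precompute constants for an odd modulus p < 2^64."""
    if p % 2 == 0 or p >= R:
        raise ValueError("p must be odd and smaller than 2^64")
    neg_p_inv = (-pow(p, -1, R)) & MASK  # -p^(-1) mod R (Python 3.8+)
    r2 = (R * R) % p                     # R^2 mod p, used to enter Montgomery form
    return neg_p_inv, r2


def mont_mul(a, b, p, neg_p_inv):
    """Return a*b*R^(-1) mod p for a, b < p (Montgomery reduction)."""
    t = a * b
    m = (t * neg_p_inv) & MASK           # chosen so that t + m*p is divisible by R
    u = (t + m * p) >> R_BITS            # exact division by R, result < 2p
    return u - p if u >= p else u


def mulmod(a, b, p):
    """Compute (a*b) mod p using only integer multiplies, adds and shifts."""
    neg_p_inv, r2 = mont_setup(p)
    am = mont_mul(a % p, r2, p, neg_p_inv)  # a*R mod p
    bm = mont_mul(b % p, r2, p, neg_p_inv)  # b*R mod p
    prod = mont_mul(am, bm, p, neg_p_inv)   # a*b*R mod p
    return mont_mul(prod, 1, p, neg_p_inv)  # leave Montgomery form
```

In a real sieve the conversion into Montgomery form would be done once per prime and reused across many multiplications; mulmod converts on every call only to keep the sketch short.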

Last fiddled with by rogue on 2023-09-21 at 21:18
Old 2023-09-22, 18:00   #1402
pepi37
 
Dec 2011
After 1.58M nines:)

1768₁₀ Posts

Quote:
Originally Posted by rogue
You can also set the number of primes per chunk to a fixed value by using -w and adding 'f' to the parameter passed to it. For example -w5e8f will use about 4 GB per worker and won't increase or decrease the number of primes per chunk.

./srsieve2 -P 1e16 -W36 -w1e8f -i b767_n.boinc -O factorsnew.txt
gives me about 39M P/s and uses nearly 30 GB of RAM on 36 threads.

This "f" is a cool thing :)
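
The numbers in this exchange are consistent with a simple rule of thumb, sketched below in Python. The 8 bytes per prime figure is an assumption inferred from "-w5e8f will use about 4 GB per worker"; it is not taken from the srsieve2 source, and the function names are made up for illustration. Real usage will be somewhat higher because of other allocations.

```python
# Rough memory estimate for the -W (workers) and -w (primes per chunk) options.
# Assumption: each prime held by a worker costs 8 bytes (inferred from the thread).
BYTES_PER_PRIME = 8


def sieve_memory_gb(workers, primes_per_chunk, bytes_per_prime=BYTES_PER_PRIME):
    """Approximate memory (in GB, 1 GB = 1e9 bytes) used to hold the primes."""
    return workers * primes_per_chunk * bytes_per_prime / 1e9


def max_primes_per_chunk(budget_gb, workers, bytes_per_prime=BYTES_PER_PRIME):
    """Largest fixed chunk size that keeps all workers within budget_gb."""
    return int(budget_gb * 1e9) // (workers * bytes_per_prime)


print(sieve_memory_gb(1, 5e8))        # one worker with -w5e8f
print(sieve_memory_gb(36, 1e8))       # -W36 -w1e8f
print(max_primes_per_chunk(30, 36))   # chunk size fitting 36 workers in 30 GB
```

Under that assumption, -W36 -w1e8f works out to about 28.8 GB, which matches the "nearly 30 GB" observed above.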
Old 2023-09-23, 14:28   #1403
Happy5214
 
"Alexander"
Nov 2008
The Alamo City

3²·113 Posts

I've attached a new patch that finishes off the MpArith rewrite for fkbnsieve, and also fixes a bug in the previous patch. A ton of trailing spaces were deleted (basically, in whatever files I had open), and an unused SSE header was deleted in the srsieve2 code.
Attached Files
File Type: gz fkbnsieve.tar.gz (5.5 KB, 0 views)

All times are UTC. The time now is 11:56.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
