mersenneforum.org  

2023-09-18, 18:20   #1387
Happy5214
"Alexander"
Nov 2008
The Alamo City
3²×113 Posts

Quote:
Originally Posted by rogue
Simply put, I haven't gotten around to it yet. Some of the sievers use x86 asm only for factor validation. Those have to be fixed. A couple call fairly complex x86 routines. I need to find the time to motivate myself to replace those.
So Montgomery arithmetic is the goal for everything. I hope I can find some other use for the ODROID I bought to test the ARM assembly code.

Accept the attached fkbnsieve patch as an assist. I copied the template from srsieve2. No attribution is needed, as I added nothing original.
Attached Files
File Type: zip fkbnsieve.zip (715 Bytes, 7 views)
2023-09-19, 16:00   #1388
storm5510
Random Account
Aug 2009
Oceanus Procellarum
2³·13·29 Posts

In my long-run project for Gary Barnes, srsieve2 is not a real problem. It is PFGW. I know, it is not part of the framework. It simply takes too much time, IMO. I dropped the "phase=20000" from the list. 16,000 is as far as I will go with it, which is running now. I anticipate a month to six weeks just to get it that far. Maybe more...
2023-09-19, 16:25   #1389
rogue
"Mark"
Apr 2003
Between here and the
2⁴×467 Posts

Quote:
Originally Posted by Happy5214
So Montgomery arithmetic is the goal for everything. I hope I can find some other use for the ODROID I bought to test the ARM assembly code.

Accept the attached fkbnsieve patch as an assist. I copied the template from srsieve2. No attribution is needed, as I added nothing original.
Thank you!
2023-09-20, 04:13   #1390
LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand
2·5,179 Posts

Quote:
Originally Posted by storm5510
srsieve2 is not a real problem. It is PFGW. <...> It simply takes too much time, IMO.
That means you sieve too little. The problem is usually neither the sieve nor the prime-proving program; it is the user, and it is called PEBKAC. Try splitting the time sensibly between sieving and proving. Why would you use pfgw if sieving would eliminate candidates faster? And if that is not the case, if pfgw eliminates candidates faster, you cannot complain that "it takes too much time". With a good task split, sieving and proving would eliminate candidates at the same speed, and then the only complaint could be that testing itself takes too much time, in its entirety. But if it were easy, everybody would do it, and we wouldn't be arguing here about it...
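
The balance described above can be sketched as a simple stopping rule: keep sieving while the sieve removes a candidate faster than a primality test would. This is only an illustration; the function name and the sample timings are hypothetical, and both figures are assumed to be measured on the same hardware budget.

```python
def keep_sieving(sec_per_factor: float, sec_per_test: float) -> bool:
    """Return True while sieving eliminates a candidate faster than testing.

    sec_per_factor: seconds the sieve currently needs to find one factor
                    (sieves such as srsieve2 report a "sec per factor" figure).
    sec_per_test:   average seconds for one primality test of a remaining
                    candidate (hypothetical figure, measured by the user).
    """
    if sec_per_factor <= 0 or sec_per_test <= 0:
        raise ValueError("timings must be positive")
    return sec_per_factor < sec_per_test


# Hypothetical numbers: a factor every 864 s versus a test taking 1200 s.
print(keep_sieving(864.0, 1200.0))
# The same sieve rate against a 300 s test.
print(keep_sieving(864.0, 300.0))
```

With the first pair of numbers the rule says to continue sieving; with the second it says to switch to testing.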

Last fiddled with by LaurV on 2023-09-20 at 04:18
2023-09-20, 04:55   #1391
gd_barnes
"Gary"
May 2007
Overland Park, KS
3100₁₆ Posts

Quote:
Originally Posted by storm5510
In my long-run project for Gary Barnes, srsieve2 is not a real problem. It is PFGW. I know, it is not part of the framework. It simply takes too much time, IMO. I dropped the "phase=20000" from the list. 16,000 is as far as I will go with it, which is running now. I anticipate a month to six weeks just to get it that far. Maybe more...
You remember me reminding you that I didn't think you would want to complete that to n=20K? Actually getting to n=16K is quite a stretch but I'm rooting for you to make it.

I don't think you understand the way testing works. Sieving is only 5-10% of any effort. Testing is the other 90-95% of it. If you have sieved about far enough, testing should take ~10-20 times as long as sieving does. If it takes more than 20X then you likely have not sieved far enough. If it takes less than 10X then you likely sieved too far. In between is the golden zone for CPU efficiency. Srbsieve does pretty well in instructing srsieve2 to hit that zone.
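
The 10X to 20X rule of thumb above can be written down as a small check. This is a sketch only: the function name, the returned strings, and the sample hours are hypothetical, and the thresholds are simply the two figures quoted in the post.

```python
def sieve_depth_verdict(sieve_hours: float, test_hours: float,
                        low: float = 10.0, high: float = 20.0) -> str:
    """Classify a run by the ratio of testing time to sieving time."""
    if sieve_hours <= 0:
        raise ValueError("sieve_hours must be positive")
    ratio = test_hours / sieve_hours
    if ratio > high:
        return "sieve deeper"      # testing dominates: likely under-sieved
    if ratio < low:
        return "sieved too far"    # sieving took too large a share
    return "in the zone"           # testing is 10-20 times the sieving time


# Hypothetical runs, each with 10 hours of sieving.
for test_hours in (50.0, 150.0, 300.0):
    print(test_hours, sieve_depth_verdict(10.0, test_hours))
```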

There's nothing wrong with how long the various latest versions of the testing programs such as LLR or PFGW take to test.
2023-09-20, 14:20   #1392
rogue
"Mark"
Apr 2003
Between here and the
2⁴·467 Posts

I have posted mtsieve 2.5.4 to sourceforge. Here are the changes:

Code:
framework:
   Fix issue where small primes might not be tested because the initial worker
   stops processing its chunk so that another worker can continue.

fkbnsieve: version 1.6
   Remove x86 asm in factor validation.

srsieve2/srsieve2cl: version 1.7.7
   Fix a performance issue when starting with -i with tens of thousands of sequences.
   If you are sieving tens of thousands of sequences avoid using input files where
   k's are not in ascending sequence.  Performance for loading the sequences will
   take a noticeable hit if the input is not sorted by ascending k.
   Fix memory usage upon startup when searching for square free part of k.
   Remove x86 asm use so that it can run on ARM factors.
   Enforce generic sieving for k > 2^63.
The following sieves still use x86 FPU logic either in the worker or for factor validation: afsieve, cwsieve, dmdsieve, fkbnsieve, gfndsieve, k1b2sieve, pixsieve, smsieve, xyyxsieve. This means that they will not run on ARM until they are changed to use MpArith or MpArithVec. So if anyone wants to write such routines for those sieves, feel free to take a crack at it.

Some workers have AVX logic (cwsieve, psieve, xyyxsieve), but that is conditionally compiled for x86 CPUs and used only if the x86 CPU supports the AVX functionality that is needed by that worker.
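
For readers unfamiliar with the Montgomery arithmetic mentioned earlier in the thread, here is a minimal, portable sketch of how a factor of k*b^n+c could be validated without any x86-specific code. It is not the MpArith code from mtsieve; all names below are hypothetical, and Python's big integers stand in for the 64-bit and 128-bit machine words a C++ version would use. It needs Python 3.8 or later for `pow(p, -1, R)`.

```python
R_BITS = 64
R = 1 << R_BITS          # Montgomery radix, chosen to match a 64-bit word
MASK = R - 1


def mont_setup(p: int):
    """Precompute constants for an odd modulus p < 2^64."""
    if p % 2 == 0 or not (0 < p < R):
        raise ValueError("p must be odd and fit in 64 bits")
    p_inv_neg = (-pow(p, -1, R)) & MASK   # -p^(-1) mod R
    r2 = (R * R) % p                      # R^2 mod p, converts into Montgomery form
    return p_inv_neg, r2


def redc(t: int, p: int, p_inv_neg: int) -> int:
    """Montgomery reduction: return t * R^(-1) mod p for 0 <= t < p*R."""
    m = ((t & MASK) * p_inv_neg) & MASK
    u = (t + m * p) >> R_BITS             # exact division by R
    return u - p if u >= p else u


def mont_mul(a: int, b: int, p: int, p_inv_neg: int) -> int:
    """Multiply two residues that are both in Montgomery form."""
    return redc(a * b, p, p_inv_neg)


def is_factor(p: int, k: int, b: int, n: int, c: int) -> bool:
    """Check whether p divides k*b^n + c using Montgomery exponentiation."""
    p_inv_neg, r2 = mont_setup(p)
    base = mont_mul(b % p, r2, p, p_inv_neg)   # b in Montgomery form
    acc = mont_mul(1, r2, p, p_inv_neg)        # 1 in Montgomery form
    e = n
    while e:                                   # square-and-multiply
        if e & 1:
            acc = mont_mul(acc, base, p, p_inv_neg)
        base = mont_mul(base, base, p, p_inv_neg)
        e >>= 1
    power = redc(acc, p, p_inv_neg)            # back to normal form: b^n mod p
    return (k % p * power + c) % p == 0


print(is_factor(13, 3, 2, 2, 1))   # 3*2^2+1 = 13, so 13 is a factor
```

The point of the technique is that `redc` replaces the division by p with shifts, masks, and multiplications, which is why it ports to ARM without assembly.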
2023-09-20, 15:25   #1393
storm5510
Random Account
Aug 2009
Oceanus Procellarum
2³·13·29 Posts

Quote:
Originally Posted by gd_barnes
You remember me reminding you that I didn't think you would want to complete that to n=20K? Actually getting to n=16K is quite a stretch but I'm rooting for you to make it.

I don't think you understand the way testing works. Sieving is only 5-10% of any effort. Testing is the other 90-95% of it. If you have sieved about far enough, testing should take ~10-20 times as long as sieving does. If it takes more than 20X then you likely have not sieved far enough. If it takes less than 10X then you likely sieved too far. In between is the golden zone for CPU efficiency. Srbsieve does pretty well in instructing srsieve2 to hit that zone.

There's nothing wrong with how long the various latest versions of the testing programs such as LLR or PFGW take to test.
I'll get it to 16K. All instances are in the 12K to 16K phase now. Barring any problems, it probably will be late next month or even into November before they finish.

All instances are using the latest releases of srbsieve, srsieve2, and PFGW. I was able to slip in the latter two on-the-fly when they were not being used. srbsieve, I had to stop everything to replace it. There were no problems in restarting each one.

I found each console parked against the left side of the screen this morning. I usually keep them horizontally staggered in the center, which makes it easier to tell them apart. What happened there, I don't know. Each was still running, though.

I have had experience with testing in past years before PRP replaced LL. TF went fast, P-1 took longer, and LL took way longer. I saw a post somewhere a few days ago where an individual was discussing running a PRP on a wavefront exponent. It was going to take 66 days. I don't think I could do that.

When these finish at 16K, I plan to put each instance into a zip file and send them to you via email. The initial single run to 500 will be included.
2023-09-20, 16:49   #1394
rogue
"Mark"
Apr 2003
Between here and the
2⁴×467 Posts

I use ConsoleZ. It is a Windows app that allows you to open multiple command prompts in a single window, so you have less desktop clutter.
2023-09-20, 18:42   #1395
storm5510
Random Account
Aug 2009
Oceanus Procellarum
2³·13·29 Posts

Quote:
Originally Posted by rogue
I use ConsoleZ. It is a Windows app that allows you to open multiple command prompts in a single window, so you have less desktop clutter.
Thanks. I will check this out.

Windows PowerShell ISE can have multiple tabs, but it will only run things related to it, such as scripts.
2023-09-20, 20:24   #1396
pepi37
Dec 2011
After 1.58M nines:)
2³·13·17 Posts

Quote:
Originally Posted by rogue
I have posted mtsieve 2.5.4 to sourceforge. Here are the changes:

Code:
framework:
   Fix issue where small primes might not be tested because the initial worker
   stops processing its chunk so that another worker can continue.

fkbnsieve: version 1.6
   Remove x86 asm in factor validation.

srsieve2/srsieve2cl: version 1.7.7
   Fix a performance issue when starting with -i with tens of thousands of sequences.
   If you are sieving tens of thousands of sequences avoid using input files where
   k's are not in ascending sequence.  Performance for loading the sequences will
   take a noticeable hit if the input is not sorted by ascending k.
   Fix memory usage upon startup when searching for square free part of k.
   Remove x86 asm use so that it can run on ARM factors.
   Enforce generic sieving for k > 2^63.
The following sieves still use x86 FPU logic either in the worker or for factor validation: afsieve, cwsieve, dmdsieve, fkbnsieve, gfndsieve, k1b2sieve, pixsieve, smsieve, xyyxsieve. This means that they will not run on ARM until they are changed to use MpArith or MpArithVec. So if anyone wants to write such routines for those sieves, feel free to take a crack at it.

Some workers have AVX logic (cwsieve, psieve, xyyxsieve), but that is conditionally compiled for x86 CPUs and used only if the x86 CPU supports the AVX functionality that is needed by that worker.
I am using this release, the latest one.

Code:
 ./srsieve2 -P 1e15 -W 28 -i b767_n.boinc -O factors.txt -f B
srsieve2 v1.7.7, a program to find factors of k*b^n+c numbers for fixed b and variable k and n
Sieving with single sequence c=1 logic for p >= 1957275377549
BASE_MULTIPLE = 30, POWER_RESIDUE_LCM = 720, LIMIT_BASE = 720
Split 1 base 767 sequence into 15 base 767^120 sequences.
Legendre summary:  Approximately 1 bytes needed for Legendre tables
         1 total sequences
         1 are eligible for Legendre tables
         0 are not eligible for Legendre tables
         1 have Legendre tables in memory
         0 cannot have Legendre tables in memory
         0 have Legendre tables loaded from files
         1 required building of the Legendre tables
518400 bytes used for congruent q and ladder indices
259200 bytes used for congruent qs and ladders
Sieve started: 1957275377549 <= p <= 1e15 with 1377 terms (1000102 <= n <= 1099918, k*767^n+1) (expecting 248 factors)
Increasing worksize to 1000000000 since each chunk is tested in less than a second
  p=1961325128843, 29.04M p/sec, 3 factors found at 864 sec per factor (last 2 min), 0.0% done. ETC 2024-09-02 22:02
So it is running on 28 threads, and htop confirms that all threads are at 100%. Why is an 8-core Ryzen 5700X (HT disabled) at 4 GHz only a little slower than this CPU with 28 threads at 2.8 GHz?

Simple math shows that 4*8 is far less than 28*2.8.
Ryzen at 4 GHz: 27M p/sec
Xeon at 2.8 GHz: 29M to 32M p/sec
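
The "simple math" above can be made explicit by normalising each rate by cores times clock speed. This is a rough sketch under stated assumptions: the function name is hypothetical, the 28 threads are assumed to map to 28 physical cores, and cores times GHz is only a crude proxy for compute capacity.

```python
def rate_per_core_ghz(p_per_sec: float, cores: int, ghz: float) -> float:
    """Sieve rate divided by (cores * clock), in p/sec per core-GHz."""
    return p_per_sec / (cores * ghz)


# Figures taken from the post above.
ryzen = rate_per_core_ghz(27e6, 8, 4.0)        # 8 cores at 4 GHz
xeon_low = rate_per_core_ghz(29e6, 28, 2.8)    # 28 threads at 2.8 GHz
xeon_high = rate_per_core_ghz(32e6, 28, 2.8)

print(f"Ryzen: {ryzen / 1e6:.2f}M p/sec per core-GHz")
print(f"Xeon:  {xeon_low / 1e6:.2f}M to {xeon_high / 1e6:.2f}M p/sec per core-GHz")
```

On these numbers the Ryzen delivers roughly 0.84M p/sec per core-GHz against roughly 0.37M to 0.41M for the Xeon, so each Xeon thread is doing less than half the work per clock.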

Last fiddled with by pepi37 on 2023-09-20 at 20:56
2023-09-21, 12:34   #1397
rogue
"Mark"
Apr 2003
Between here and the
2⁴·467 Posts

Adding threads won't always scale, and sometimes threads are competing for work. The amount of available memory will also have an impact: each worker will be using 8 GB of memory to hold the primes.

Run with -W4 on both machines to see the impact of thrashing on the machine with 28 cores.

You can also set the number of primes per chunk to a fixed value by using -w and adding 'f' to the parameter passed to it. For example -w5e8f will use about 4 GB per worker and won't increase or decrease the number of primes per chunk.
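
The memory figures in this post imply about 8 bytes per prime (1e9 primes in 8 GB, 5e8 primes in about 4 GB). A quick estimate of the total across workers, as a sketch only: the 8-byte figure is inferred from those two numbers rather than taken from the mtsieve source, the function name is hypothetical, and GB here means 1e9 bytes.

```python
BYTES_PER_PRIME = 8  # assumption inferred from "5e8 primes ~ 4 GB" above


def prime_memory_gb(primes_per_chunk: float, workers: int = 1) -> float:
    """Approximate memory (in GB of 1e9 bytes) used to hold the primes."""
    return primes_per_chunk * BYTES_PER_PRIME * workers / 1e9


print(prime_memory_gb(1e9))        # one worker at the auto-increased worksize
print(prime_memory_gb(5e8))        # one worker with -w5e8f
print(prime_memory_gb(1e9, 28))    # 28 workers (-W 28) at worksize 1e9
print(prime_memory_gb(5e8, 4))     # 4 workers (-W4) with -w5e8f
```

Under this assumption, 28 workers at a worksize of 1e9 would want about 224 GB just for the primes, which is one way to see why thrashing is plausible on the 28-thread machine.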

You might also need it to run for a few minutes to reduce the impact of "peaks and valleys" of the rate calculation done when sieving starts.

A single GPU could be faster than all of your CPU cores combined.
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
