[QUOTE=chalsall;588494]Coordination of the concurrency of processes is a non-trivial problem space.
... Talented humans are ***very*** expensive... :smile:[/QUOTE]Except when they're free. George does what he does not for the money. After watching George deliver very well for a quarter century, it seems clear to me he's up to the task; prime95 already coordinates multiple workers using multiple cores each, plus a PrimeNet communications thread. Maybe it just goes on a to-do-someday list beneath some other priorities. Or maybe there are good reasons not to try it that I'm unaware of.

The gpuowl source provides an example of how the GCD parallelism may be handled. It's a different situation there, GPU and CPU combined, but still. For proof space preallocation the potential time saving is smaller, but one could compute a time estimate for the space preallocation and a time estimate for when depositing the first proof residue will be needed, parallelize only when there's a comfortable time margin, and also ensure the run waits for completion of the preallocation.

V30.7 is in preparation. AFAIK this includes P-1 speed improvements in prime pairing, and Alder Lake support. Not sure what else.
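To make the two suggestions concrete, here's a minimal sketch in Python, with stand-in functions for the real work (none of this reflects prime95's or gpuowl's actual internals): start the stage-1 GCD on a spare core so stage 2 can proceed, and only overlap the proof-space preallocation when the estimated margin before the first interim residue is comfortable.

[CODE]# Sketch only: hypothetical stand-ins for the real, expensive operations.
import threading, time, math

def do_stage1(p):        time.sleep(0.1); return p * 12345 + 7    # pretend stage-1 residue
def do_stage2(p):        time.sleep(0.2)                          # long stage-2 work
def do_gcd(residue, p):  return math.gcd(residue, 2**p - 1)       # the serial, single-core step

def p_minus_1(p):
    residue = do_stage1(p)
    # Speculative parallelism: start the stage-1 GCD on a spare core
    # so stage 2 can begin immediately instead of waiting on the GCD.
    gcd_thread = threading.Thread(target=do_gcd, args=(residue, p))
    gcd_thread.start()
    do_stage2(p)
    gcd_thread.join()        # collect the GCD before reporting / moving on

def maybe_overlap_prealloc(prealloc, est_prealloc_min, min_to_first_residue):
    # Only parallelize when there's a comfortable time margin (the factor 2 is arbitrary).
    if min_to_first_residue > 2 * est_prealloc_min:
        t = threading.Thread(target=prealloc)
        t.start()
        return t             # caller must join() before depositing the first proof residue
    prealloc()               # otherwise do it serially, up front
    return None

p_minus_1(31)                # toy run with a tiny exponent
[/CODE]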
[QUOTE=kriesel;588497]Except when they're free. George does what he does not for the money.[/QUOTE]
Time is the fundamental currency. Perfect is the enemy of good. [URL="https://www.youtube.com/watch?v=v0nmHymgM7Y"]Much like a poem, software is never finished. Simply abandoned.[/URL] George /might/ have made a conscious decision that the effort required (including all the "in the wild" debugging) was not worth the tiny amount of throughput which /might/ be gained. Or, maybe, he's just busy with other stuff... :tu:
Let's ballpark these for ppm of system productivity.
P-1 GCD: 1 hour at 880M on a Xeon Phi 7210. I have another similar-exponent P-1 that's projecting a week left to go for about half of stage 2, so let's assume 30 days for both stages at 880M on the 7210. 60 minutes x 2 stages / (30 x 24 x 60) x 15/16 ~ 2600 ppm = 0.26% of P-1 time, which is ~1/40 of PRP time, so ~62 ppm of total exponent (TF + P-1 + PRP) time. That might become worthwhile to pursue at some point, depending on what other optimization opportunities remain and the effort needed.

Preallocate PRP proof space: 3 minutes at 500M on a Xeon Phi 7210. Forecast PRP time 328.5 days ~ 473040 minutes. 3/473040 x 15 cores/16 cores ~ 6 ppm of PRP time. That would need to be a very quick modification to be worth the programming and test time. Seems unlikely.
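For anyone who wants to replay the arithmetic, here it is as a few lines of Python, using only the figures quoted above:

[CODE]# Ballpark from the figures above: 880M P-1 GCDs on a Xeon Phi 7210, 500M PRP proof prealloc.
gcd_min = 60                      # one GCD at 880M takes ~1 hour
p1_min = 30 * 24 * 60             # assumed ~30 days for both P-1 stages
idle_fraction = 15 / 16           # share of the system idled during the serial step

gcd_ppm_of_p1 = gcd_min * 2 / p1_min * idle_fraction * 1e6      # ~2600 ppm of P-1 time
gcd_ppm_of_prp = gcd_ppm_of_p1 / 40                             # P-1 ~1/40 of PRP time -> ~65 ppm
# folding TF and P-1 themselves into the denominator brings this to the ~62 ppm quoted above

prealloc_min = 3                  # proof-space preallocation at 500M
prp_min = 328.5 * 24 * 60         # forecast PRP time, ~473040 minutes
prealloc_ppm = prealloc_min / prp_min * idle_fraction * 1e6     # ~6 ppm of PRP time

print(round(gcd_ppm_of_p1), round(gcd_ppm_of_prp), round(prealloc_ppm))
[/CODE]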
[QUOTE=kriesel;588500]Let's ballpark these for ppm of system productivity.[/QUOTE]
Let's... :wink: You are working at the extreme edge. I understand the reasoning, but I would argue this should not inform "general policy". My P-1'ers (using mprime (Linux64,Prime95,v30.5,build 2)) are currently taking about 5 seconds for the GCDs (single-threaded). Not a problem, in my Universe.
Assuming the ~p[SUP]2.1[/SUP] scaling also applies to GCD operations, and you're doing ~[B]106M[/B] P-1, there's a factor of ~4.2 unexplained difference in GCD speed in your favor. Maybe faster cores giving faster GCDs, and correspondingly faster stages too.
Timing I gave for the large exponent was using ~10GB in stage 2, prime95 v30.6b4.

edit: chalsall's small exponent, ~[B]27.4M[/B], more than explains the rest of the speed ratio. 5.05 sec x 2 / 2hr 29min = 0.11% potential speedup for him. Except, the i3-9100 is 4-core, no hyperthreading.

Gpuowl's parallelism came about because Mihai took pity on my multi-Radeon-VII / slow-cpu-for-GCD P-1 factory, which spent ~5 minutes of a 40-minute wavefront P-1 factoring in single-cpu-core GCD with the GPU idle and waiting. The system didn't have enough max RAM to support dual-instance P-1 on its GPUs to mitigate it. 40/35 ~ 14% P-1 speedup via speculative parallelism.

As always, it's George's call what is worth George's time, and what is not worthwhile.
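A quick check of the "more than explains" claim, assuming GCD time scales roughly like p[SUP]2.1[/SUP] as above:

[CODE]# Assuming GCD cost scales roughly like p^2.1 (same scaling assumed earlier).
big_p, small_p = 880_000_000, 27_430_621
big_gcd_s, small_gcd_s = 3600.0, 5.05         # ~1 hour at 880M vs 5.05 s at 27.4M

predicted = (big_p / small_p) ** 2.1          # ~1500x
observed = big_gcd_s / small_gcd_s            # ~710x
print(round(predicted), round(observed))
# Predicted exceeds observed, so the smaller exponent (plus faster cores)
# more than accounts for the GCD speed difference.
[/CODE]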
[QUOTE=kriesel;588502]...there's nearly a factor of 5 unexplained difference in GCD speed in your favor.[/QUOTE]
All I can do is give you my empirical. [CODE][Work thread Sep 23 09:30] M27430621 stage 1 complete. 2997862 transforms. Time: 2750.641 sec.
[Work thread Sep 23 09:30] Starting stage 1 GCD - please be patient.
[Work thread Sep 23 09:30] Stage 1 GCD complete. Time: 5.052 sec.
[Work thread Sep 23 09:30] D: 462, relative primes: 857, stage 2 primes: 3303121, pair%=90.33
[Work thread Sep 23 09:30] Using 9996MB of memory.
[Work thread Sep 23 09:30] Stage 2 init complete. 7751 transforms. Time: 15.059 sec.
[Work thread Sep 23 11:12] M27430621 stage 2 complete. 4016210 transforms. Time: 6135.924 sec.
[Work thread Sep 23 11:12] Starting stage 2 GCD - please be patient.
[Work thread Sep 23 11:12] Stage 2 GCD complete. Time: 5.054 sec.
[Work thread Sep 23 11:12] M27430621 completed P-1, B1=1039000, B2=56821000, Wi8: C6D8FB56
[Comm thread Sep 23 11:12] Sending result to server: UID: [redacted]/usbenv, M27430621 completed P-1, B1=1039000, B2=56821000, Wi8: C6D8FB56, AID: 8B45B0E3C88E84E8B42236C07C5F070A
[Work thread Sep 23 11:58] M27430643 stage 1 complete. 2997862 transforms. Time: 2749.861 sec.
[Work thread Sep 23 11:58] Starting stage 1 GCD - please be patient.
[Work thread Sep 23 11:58] Stage 1 GCD complete. Time: 5.043 sec.
[Work thread Sep 23 11:58] D: 462, relative primes: 857, stage 2 primes: 3303121, pair%=90.33
[Work thread Sep 23 11:58] Using 9996MB of memory.
[Work thread Sep 23 11:59] Stage 2 init complete. 7751 transforms. Time: 15.055 sec.
[Work thread Sep 23 13:41] M27430643 stage 2 complete. 4016210 transforms. Time: 6143.559 sec.
[Work thread Sep 23 13:41] Starting stage 2 GCD - please be patient.
[Work thread Sep 23 13:41] Stage 2 GCD complete. Time: 5.052 sec.
[Work thread Sep 23 13:41] M27430643 completed P-1, B1=1039000, B2=56821000, Wi8: C6B2FB4A
[Comm thread Sep 23 13:41] Sending result to server: UID: [redacted]/usbenv, M27430643 completed P-1, B1=1039000, B2=56821000, Wi8: C6B2FB4A, AID: 30BC556ED1625FFF02A0B1960F00B038[/CODE]

[CODE][chalsall@usbwalker prime]$ cat /proc/cpuinfo | grep name
model name      : Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz
model name      : Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz
model name      : Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz
model name      : Intel(R) Core(TM) i3-9100 CPU @ 3.60GHz[/CODE]
[QUOTE=kriesel;588502]edit: chalsall's small exponent ~[B]27.4M[/B] more than explains the rest of the speed ratio. 5.05sec x 2 /2hr29min = 0.11% potential speedup for him.[/QUOTE]
I would argue that for future readers it might have been more valuable for you to quote my message in a new post, rather than editing your earlier post to speak to my subsequent post. I deeply appreciate your curation skills, Ken. :tu: It's a job description that few appreciate. And those that do would only take it on if the subject domain was important enough...
"Sending interim residue" Mxxx / AID
Prime95 30.6 b4
Nothing dramatic, but an inconsistency nevertheless [noparse];-)[/noparse] When interim residues are sent to the server between or during the periodic communications, the output to the screen and to the prime.log file has the following format:[code][Comm thread Sep 16 19:22] Sending interim residue 40000000 for M58193041[/code]But when a residue is sent together with a result, the AID is used instead of the M followed by the exponent:[code][Comm thread Sep 16 23:26] Sending interim residue 55000000 for assignment 172076D8AD6993D981F397637613B8DC
[Comm thread Sep 16 23:26] Sending result to server: UID: S485122/i9-10920X, M58193041 is not prime. Res64: 1B5E1783A3861E57. Wh4: 67E20740,22995864,00000000, AID: 172076D8AD6993D981F397637613B8DC[/code](Never mind the AID in the clear: the assignment has been completed, so the assignment and its ID are bygones.)
Found the following in an mprime run log immediately after starting mprime v30.6b4:[CODE][Main thread Sep 30 12:04] Mersenne number primality test program version 30.6
[Main thread Sep 30 12:04] Optimizing for CPU architecture: Core i3/i5/i7, L2 cache size: 256 KB, L3 cache size: 55 MB
[Main thread Sep 30 12:04] Starting worker.
[Main thread Sep 30 12:04] Stopping all worker windows.
[Work thread Sep 30 12:04] Worker starting
[Work thread Sep 30 12:04] Worker stopped.
[Main thread Sep 30 12:04] Execution halted.
[Main thread Sep 30 12:04] Choose Test/Continue to restart[/CODE]That's hard to do when it's a Google Colab background process: no menu, no keyboard, no means of input. Stop and Continue the notebook section seems to have worked. No idea what caused the immediate stop.
Any chance we could write PRP results to [C]results.txt[/C] too?
I understand that [C]results.txt[/C] has been deprecated in favor of the JSON file, but it would be nice to have data that is more human-readable. Or as a compromise, could we have an option to "pretty print" the JSON strings?
[QUOTE=ixfd64;589095]Any chance we could write PRP results to [C]results.txt[/C] too?
I understand that [C]results.txt[/C] has been deprecated in favor of the JSON file, but it would be nice to have data that is more human-readable. Or as a compromise, could we have an option to "pretty print" the JSON strings?[/QUOTE]If by pretty-print you mean presenting the JSON over multiple lines with indenting and such, then no, as that would break manual results submission, which is based on the assumption that one line = one result. I have no objection if George wants to add PRP output to the non-JSON results file, but support for any new format will not be added to manual results parsing (we don't want users submitting less data). I'm curious what part you find less than readable about the JSON results? If it would be universally considered helpful, the JSON elements could be re-ordered without causing any problems.
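For what it's worth, here is a generic illustration (not the actual manual-results parser, and the field names are only examples) of why indented JSON breaks line-oriented parsing while compact one-line JSON doesn't:

[CODE]import json

# Example result with illustrative field names (not necessarily prime95's exact schema).
result = {"status": "C", "exponent": 58193041, "worktype": "PRP-3", "res64": "1B5E1783A3861E57"}

compact = json.dumps(result)              # one line == one result
pretty = json.dumps(result, indent=2)     # spread over several lines

for line in compact.splitlines():
    json.loads(line)                      # works: every line is a complete JSON object

for line in pretty.splitlines():
    try:
        json.loads(line)                  # fails: individual lines are only fragments
    except json.JSONDecodeError:
        pass
[/CODE]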