mersenneforum.org > Factoring Projects > YAFU

2022-03-22, 02:09   #1
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

SIQS Across Multiple Machines

SIQS is already multi-threaded, but is it (at least) theoretically possible to run it across more machines, possibly via openmpi? Or, is it already available and I'm just not aware of it?

2022-03-22, 09:24   #2
henryzz ("David", Sep 2007, Liverpool (GMT/BST))

Quote:
Originally Posted by EdH
SIQS is already multi-threaded, but is it (at least) theoretically possible to run it across more machines, possibly via openmpi? Or, is it already available and I'm just not aware of it?
I'm not aware of this being available. For numbers large enough to see even a vague benefit from this, CADO-NFS probably isn't a bad option.

2022-03-22, 13:31   #3
bsquared ("Ben", Feb 2007)

It is not a built-in capability, but it is possible to do. (I have done it, e.g., on this C130.) "Just" start N instances of yafu on N machines in N different folders. The options -siqsT (time limit) or -siqsR (relations limit) could be used if needed. Manual gathering and post-processing tests would be required.
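That launch step might be sketched as follows; the hostnames, folder names, and thread counts are hypothetical, passwordless ssh is assumed, and the function only prints the commands (swap echo for eval, or pipe to sh, to actually dispatch them):

```shell
# Print one yafu launch command per machine. Each machine sieves in its
# own folder so the siqs.dat files never collide. Hosts are placeholders.
launch_cmds() {
  composite="$1"; timelimit="$2"
  for host in node1 node2 node3; do
    echo "ssh $host 'cd ~/siqs-$host && yafu \"siqs($composite)\" -siqsT $timelimit'"
  done
}

launch_cmds 1234567 600
```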

2022-03-22, 13:59   #4
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

Quote:
Originally Posted by bsquared
It is not a built-in capability, but it is possible to do. (I have done it, e.g., on this C130.) "Just" start N instances of yafu on N machines in N different folders. The options -siqsT (time limit) or -siqsR (relations limit) could be used if needed. Manual gathering and post-processing tests would be required.
Thanks! I will work on some scripts or such. Are you saying that simply supplying siqs(<composite>) -siqsT <time> on each machine yields relations that can be combined on one machine for the final count? Would I simply cat the relations into siqs.dat? And would I need to stop and restart the host machine's instance for it to find all the relations?

2022-03-22, 14:16   #5
bsquared ("Ben", Feb 2007)

Quote:
Originally Posted by EdH
Thanks! I will work on some scripts or such. Are you saying, simply supplying siqs(<composite>) -siqsT <time> on each can be combined on one for the final count? Would I simply cat the relations to siqs.dat? And, would I need to stop and restart the host machine for it to find all the relations?
I think yes, to all. Each instance starts with random polynomials, so siqs(<composite>) -siqsT <time> is sufficient. Cat all the relation files together, then restart one process on the combined file for post-processing. Repeat as necessary. If you do have to repeat, delete each remote siqs.dat first; otherwise all of them will resume, and the next time you combine .dat files you'll have many duplicates. I recommend backing up the combined file before post-processing, just to prevent blunders.
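The gather-and-back-up step above can be sketched minimally (file names are hypothetical, and fetching each remote siqs.dat, e.g. via scp, is assumed to have happened already):

```shell
# Concatenate per-machine relation files into one siqs.dat for
# post-processing, keeping a backup copy first as insurance.
combine_rels() {
  out="$1"; shift
  cat "$@" > "$out"
  cp "$out" "$out.bak"
}

# Example with placeholder per-node relation files:
printf 'rel1\nrel2\n' > node1.dat
printf 'rel3\n' > node2.dat
combine_rels siqs.dat node1.dat node2.dat
```

After combining, restart yafu on the composite in the folder holding the combined siqs.dat; if more relations are still needed, clear the remote siqs.dat files before resuming, as noted above.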

If you are thinking of doing this on C105+ numbers for some reason then 3LP is faster, but only with parameters that are not auto-selected. Also I haven't tested 3LP on many machines; it might get tricky with the batch factoring that is used in 3LP. We can talk over strategies if this is something you're interested in.

2022-03-22, 15:09   #6
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

Thanks! I'm kind of only interested in playing at this point; it may not lead to anything. I'm running ECM on a cluster with ecmpi, and NFS with CADO-NFS/Msieve across the same set of machines, but whichever host is up currently runs SIQS on sub-100-digit composites that fail ECM. A couple of my hosts have 40 threads and run SIQS pretty quickly, but others are a bit more limited, so I thought I'd look at distributing SIQS. The 40-thread hosts are often tied up for many hours on Msieve LA. I may get somewhere, or I may abandon the idea.

2022-03-23, 12:36   #7
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

Hey Ben,

I think I've laid out a concept for some scripts, but I'm wondering about needed relations. Is that value calculated or from a table?

Maybe I'll make up another "How I. . ." thread, with a comparison between YAFU SIQS and CADO-NFS on the same machines. The tune function sets the crossover for working on a single machine, but for my current setup I use a much lower crossover, because SIQS is limited to the host machine while CADO-NFS includes a bunch of external clients. I don't expect that to change if I distribute SIQS, but I'd like to "play" and see what results.

Thanks,
-Ed

2022-03-23, 12:55   #8
bsquared ("Ben", Feb 2007)

Quote:
Originally Posted by EdH
Hey Ben,

I think I've laid out a concept for some scripts, but I'm wondering about needed relations. Is that value calculated or from a table?

Maybe I'll make up another "How I. . ." thread, with a comparison between YAFU SIQS and CADO-NFS on the same machines. The tune function sets the crossover for working on a single machine, but for my current setup I use a much lower crossover, because SIQS is limited to the host machine while CADO-NFS includes a bunch of external clients. I don't expect that to change if I distribute SIQS, but I'd like to "play" and see what results.

Thanks,
-Ed
If you are talking normal SIQS and not 3LP, then there is no table or calculation. Raw relations are added directly to a tree structure that maintains an exact count of cycles on the fly. So we stop gathering as soon as we have enough cycles. That's on one machine, where the master process has control over all relations found. For multiple machines you'll probably have to develop your own table of approximate raw relation counts to gather for a given input size.

The tune.c file in yafu has raw relation counts for its built-in numbers, as a start:

Code:
uint32_t siqs_actualrels[NUM_SIQS_PTS] = {17136, 32337,
		63709, 143984, 242825, 589192, 847299, 1272852, 1709598};

double siqs_sizes[NUM_SIQS_PTS] = {60, 65, 70, 75, 80, 85, 90, 95, 100};
3LP is done differently but if your crossover is that low then 3LP will not be useful.
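For sizes between the points in that tune.c table, one could interpolate to get a rough raw-relation target. A sketch (the function name is made up, and plain linear interpolation between neighboring points is only a crude estimate given how steeply the counts grow):

```shell
# Interpolate a raw-relation target from the tune.c table quoted above.
# Linear between the two bracketing digit sizes; crude but serviceable.
rels_for_size() {
  awk -v d="$1" 'BEGIN {
    n = split("60 65 70 75 80 85 90 95 100", s)
    split("17136 32337 63709 143984 242825 589192 847299 1272852 1709598", r)
    for (i = 1; i < n; i++)
      if (d+0 >= s[i]+0 && d+0 <= s[i+1]+0) {
        printf "%d\n", r[i] + (d - s[i]) * (r[i+1] - r[i]) / (s[i+1] - s[i])
        exit
      }
  }'
}

rels_for_size 98
```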

2022-03-23, 15:31   #9
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

Quote:
Originally Posted by bsquared
. . .
3LP is done differently but if your crossover is that low then 3LP will not be useful.
Thanks! If distributed SIQS is easy to implement and at all effective, it might move that crossover up. Then maybe 3LP might even be something to play with.

As an aside, factordb has become a poor source for composites. I tried several 70-80 digit composites from its list and they all crashed SIQS with "c not divisible by 4 in Q2(x) variation!" messages. After too much testing, I discovered it was due to them all being made up entirely of small factors. I had to construct my own test composites.

2022-03-23, 19:30   #10
EdH ("Ed Hall", Dec 2009, Adirondack Mtns)

I guess I've been successful! I have scripts that distribute the SIQS across four machines total and factors are returned. I started with a host that has 24 threads and added one client with 8 threads and two clients with 4 threads each. All are running YAFU 2.07.

After a small amount of adjusting/balancing of -siqsR I compared a normal run on the host (24 threads) to a distributed run with all four machines (40 threads total). The single machine run for a c90 took 128 seconds, while the distributed run took 89 seconds. The relations were well above the listed requirement, so more adjustment could trim even more time.

I then tried the runs with a c100. The single machine took 14:18, while the distributed run took 11:58. However, I'm sure I can adjust that down. But, I'm totally lost in the relations counts. Although I showed a combined total of 112466 relations when 120304 were needed, I got these lines on the final run:
Code:
414622 rels found: 50508 full + 364114 from 3307115 partial, (1569949.62 rels/sec)
. . .
414622 rels found: 50508 full + 364114 from 3307115 partial, (1569940.81 rels/sec)
414622 rels found: 50508 full + 364114 from 3307115 partial, (1565675.04 rels/sec)

now at relation 500001
now at relation 1000001
The balancing is definitely going to be complicated. I'm going to have to take every machine I add into account, because I have such a variety. But maybe, once I do some calculations, I can set somewhat permanent values in place for a fixed set of machines.

2022-03-23, 20:06   #11
bsquared ("Ben", Feb 2007)

Quote:
Originally Posted by EdH
But, I'm totally lost in the relations counts. Although I showed a combined total of 112466 relations when 120304 were needed, I got these lines on the final run:
Code:
414622 rels found: 50508 full + 364114 from 3307115 partial, (1569949.62 rels/sec)
. . .
414622 rels found: 50508 full + 364114 from 3307115 partial, (1569940.81 rels/sec)
414622 rels found: 50508 full + 364114 from 3307115 partial, (1565675.04 rels/sec)

now at relation 500001
now at relation 1000001
First off, good work! A reduction in time is great for your mix of CPUs.

To hopefully shed some light on relation counts: what we really need are N cycles (120304 of them, in your case). These are formed by combining full and partial relations. When yafu says something like
A rels found: B full + C from D partial
what it means is that it has A total cycles: B of them are from full relations (no large primes) and C of them are cycles generated from partial relations (some number of large primes). D is the count of raw partial relations, and it's D that you want to split among processors. My hint above from the tune.c file shows the number of D relations you need to gather, approximately, for a c100. It looks instead like you may be trying to split A among processors.
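Since D is the number to split among machines, a small helper to pull it out of a status line might look like this (the helper name is made up; the line format is the one shown above):

```shell
# Extract the raw-partial count D from a yafu status line of the form
# "A rels found: B full + C from D partial, (X rels/sec)".
raw_partials() {
  sed -n 's/.*from \([0-9][0-9]*\) partial.*/\1/p'
}

line="414622 rels found: 50508 full + 364114 from 3307115 partial, (1569949.62 rels/sec)"
echo "$line" | raw_partials
```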

-siqsR specifies the number of D relations (raw relations) to gather before stopping.

The (rels/sec) figure that is reported is also raw (D) relations per sec. So if you know the approximate rels/s figure for each of your computers over a range of input sizes, then you should be able to compute how long to run a job by dividing the total D relations needed by the rels/s sum of participating computers.
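That division can be sketched as a back-of-the-envelope calculation (the per-machine rates below are made-up placeholders; measure your own machines' rels/sec over the relevant input sizes):

```shell
# Seconds to run = raw relations needed / sum of per-machine rels/sec.
estimate_secs() {
  target="$1"; shift
  sum=0
  for rate in "$@"; do sum=$((sum + rate)); done
  echo $((target / sum))
}

# e.g. ~3.3M raw relations split across three machines:
estimate_secs 3300000 40000 15000 8000
```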

Last fiddled with by bsquared on 2022-03-23 at 20:18