"Ed Hall"
SIQS Across Multiple Machines
SIQS is already multithreaded, but is it (at least) theoretically possible to run it across more machines, possibly via openmpi? Or, is it already available and I'm just not aware of it?

"David"
"David"
I'm not aware of this being available. For numbers that could even vaguely benefit from this, CADO-NFS probably isn't a bad option.

"Ben"
It is not a builtin capability, but it is possible to do. (I have done it, e.g., on this C130.) "Just" start N instances of yafu on N machines in N different folders. The options siqsT (time limit) or siqsR (relations limit) could be used if needed. Manual gathering and postprocessing tests would be required.
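The "N instances in N folders" idea can be sketched as a small job planner. This is only an illustration: the machine names, the per-machine folder naming, and the helper `build_jobs` are hypothetical, and the command-line spelling (`siqs(...)` as the expression, `-threads`, `-siqsR`) is an assumption that should be checked against the yafu documentation for the installed version. Nothing is launched; the function only describes what would be started where.

```python
def build_jobs(number, machines):
    """Return one job description per machine.

    machines maps a (hypothetical) machine name to a tuple of
    (threads, siqsR target). Nothing is executed here; the caller decides
    how to start each job (ssh, a scheduler, by hand) in its own folder.
    """
    jobs = []
    for name, (threads, raw_rels) in machines.items():
        jobs.append({
            "machine": name,
            "workdir": f"siqs_{name}",  # separate folder per instance
            "argv": [
                "yafu", f"siqs({number})",
                "-threads", str(threads),
                "-siqsR", str(raw_rels),  # assumed option spelling
            ],
        })
    return jobs
```

Gathering the relation files from each folder and running the postprocessing is still a manual step, as noted above, and is not covered by this sketch.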

"Ed Hall"
"Ben"
If you are thinking of doing this on C105+ numbers for some reason then 3LP is faster, but only with parameters that are not autoselected. Also I haven't tested 3LP on many machines; it might get tricky with the batch factoring that is used in 3LP. We can talk over strategies if this is something you're interested in. 

"Ed Hall"
Thanks! I'm kind of only interested in playing at this point. It may not lead to anything. I'm running ECM on a cluster with ecmpi and NFS with CADO-NFS/Msieve across the same set of machines, but SIQS for composites under 100 digits that fail ECM currently runs only on whichever host is up. A couple of my hosts have 40 threads and run SIQS pretty quickly, but others are a bit more limited, so I thought I'd look at distributing SIQS. The 40-thread hosts are often tied up for many hours on Msieve LA. I may get somewhere or I may abandon the idea.

"Ed Hall"
Hey Ben,
I think I've laid out a concept for some scripts, but I'm wondering about needed relations. Is that value calculated or from a table? Maybe I'll make up another "How I. . ." thread, with a comparison between YAFU SIQS and CADO-NFS on the same machines. The tune function sets the crossover for working on a single machine, but for my current setup I use a much lower crossover, because SIQS is limited to the host machine while CADO-NFS includes a bunch of external clients. I don't expect that to change if I distribute SIQS, but I'd like to "play" and see what results.
Thanks, Ed

"Ben"
The tune.c file in yafu has raw relation counts for its builtin numbers, as a start: Code:
uint32_t siqs_actualrels[NUM_SIQS_PTS] = {17136, 32337, 63709, 143984, 242825, 589192, 847299, 1272852, 1709598};
double siqs_sizes[NUM_SIQS_PTS] = {60, 65, 70, 75, 80, 85, 90, 95, 100};
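For sizes that fall between the table entries, one rough way to get a starting figure is to interpolate. The sketch below copies the two arrays from the tune.c excerpt and interpolates the logarithm of the count linearly between neighbouring sizes. That interpolation rule is an assumption made here for illustration, not something yafu is known to do, and the counts come from yafu's builtin tuning numbers, so other inputs may need more or fewer relations.

```python
import math

# Values copied from the tune.c excerpt quoted above.
SIQS_SIZES = [60, 65, 70, 75, 80, 85, 90, 95, 100]
SIQS_ACTUALRELS = [17136, 32337, 63709, 143984, 242825,
                   589192, 847299, 1272852, 1709598]


def estimate_raw_rels(digits):
    """Estimate a raw relation count for a composite with `digits` digits.

    Assumption (not from yafu): log(count) is interpolated linearly
    between neighbouring table entries. Sizes outside c60-c100 are
    rejected rather than extrapolated.
    """
    if not SIQS_SIZES[0] <= digits <= SIQS_SIZES[-1]:
        raise ValueError("size outside the c60-c100 table range")
    for i in range(len(SIQS_SIZES) - 1):
        lo, hi = SIQS_SIZES[i], SIQS_SIZES[i + 1]
        if lo <= digits <= hi:
            frac = (digits - lo) / (hi - lo)
            log_rels = ((1 - frac) * math.log(SIQS_ACTUALRELS[i])
                        + frac * math.log(SIQS_ACTUALRELS[i + 1]))
            return round(math.exp(log_rels))
```

At a table size the function returns the table value itself; in between it returns a value between the two neighbours, which is only meant as a first guess to be adjusted after a trial run.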

"Ed Hall"
As an aside, factordb has become a poor source for composites. I tried several 70-80 digit composites from its list and they all crashed SIQS with "c not divisible by 4 in Q2(x) variation!" messages. After too much testing, I discovered it was due to them all being made up entirely of small factors. I had to construct my own test composites.

"Ed Hall"
I guess I've been successful! I have scripts that distribute the SIQS across four machines total and factors are returned. I started with a host that has 24 threads and added one client with 8 threads and two clients with 4 threads each. All are running YAFU 2.07.
After a small amount of adjusting/balancing of siqsR I compared a normal run on the host (24 threads) to a distributed run with all four machines (40 threads total). The single machine run for a c90 took 128 seconds, while the distributed run took 89 seconds. The relations were well above the listed requirement, so more adjustment could trim even more time. I then tried the runs with a c100. The single machine took 14:18, while the distributed run took 11:58. However, I'm sure I can adjust that down. But, I'm totally lost in the relations counts. Although I showed a combined total of 112466 relations when 120304 were needed, I got these lines on the final run: Code:
414622 rels found: 50508 full + 364114 from 3307115 partial, (1569949.62 rels/sec)
. . .
414622 rels found: 50508 full + 364114 from 3307115 partial, (1569940.81 rels/sec)
414622 rels found: 50508 full + 364114 from 3307115 partial, (1565675.04 rels/sec)
now at relation 500001
now at relation 1000001
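When collecting numbers from several machines it can help to pull these progress lines apart mechanically instead of reading them by eye. The following is a minimal sketch that assumes the line format is exactly as shown in the output above; the dictionary keys are labels chosen here, following the words in the line, and are not yafu names.

```python
import re

# Pattern for lines of the form shown above, e.g.
# "414622 rels found: 50508 full + 364114 from 3307115 partial, (1569949.62 rels/sec)"
REL_LINE = re.compile(
    r"(\d+) rels found: (\d+) full \+ (\d+) from (\d+) partial, "
    r"\(([\d.]+) rels/sec\)"
)


def parse_progress(line):
    """Return the five numbers from a progress line, or None if no match."""
    m = REL_LINE.search(line)
    if m is None:
        return None
    total, full, from_partial, partial = (int(g) for g in m.groups()[:4])
    return {
        "total": total,                # first number on the line
        "full": full,                  # "... full"
        "from_partial": from_partial,  # "... from"
        "partial": partial,            # "... partial"
        "rels_per_sec": float(m.group(5)),
    }
```

Lines that do not match, such as "now at relation 500001", simply return `None`, so the function can be run over a whole log file.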
"Ben"
To hopefully shed some light on relation counts: what we really need are N cycles (120304 of them, in your case). These are formed by combining full and partial relations. When yafu says something like
A rels found: B full + C from D partial
what it means is that it has A total cycles: B of them are from full relations (no large primes) and C of them are cycles generated from partial relations (some number of large primes). D is the count of raw partial relations, and it's D that you want to split among processors.

My hint above from the tune.c file shows the number of D relations you need to gather, approximately, for a c100. It looks instead like you may be trying to split A among processors. siqsR specifies the number of D relations (raw relations) to gather before stopping. The (rels/sec) figure that is reported is also raw (D) relations per sec. So if you know the approximate rels/s figure for each of your computers over a range of input sizes, then you should be able to compute how long to run a job by dividing the total D relations needed by the rels/s sum of participating computers.
Last fiddled with by bsquared on 2022-03-23 at 20:18
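The division described in this post can be written out as a short calculation. This is a sketch under stated assumptions: the per-machine rates in the usage note are made-up placeholders, not measurements from this thread, and the helper `plan_split` is hypothetical. It divides the raw (D) relation target by the summed rels/sec to get a sieving time, then gives each machine a siqsR share proportional to its rate.

```python
import math


def plan_split(total_raw_rels, rates):
    """Split a raw (D) relation target across machines by sieving rate.

    rates maps a machine name to its measured raw rels/sec.
    Returns (per-machine raw relation targets, estimated sieving seconds).
    """
    total_rate = sum(rates.values())
    if total_rate <= 0:
        raise ValueError("need at least one machine with a positive rate")
    seconds = total_raw_rels / total_rate
    # Every machine sieves for the same wall time, so its share is
    # rate * time. Round up so the combined target never falls short.
    shares = {name: math.ceil(rate * seconds) for name, rate in rates.items()}
    return shares, seconds
```

For example, `plan_split(1709598, {"host": 6000, "client1": 2000, "client2": 1000, "client3": 1000})` uses the c100 table value with placeholder rates and estimates about 171 seconds of sieving. The estimate covers sieving only; gathering the relations and postprocessing come on top of it.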

