mersenneforum.org (https://www.mersenneforum.org/index.php)
-   CADO-NFS (https://www.mersenneforum.org/forumdisplay.php?f=170)
-   -   Team sieve for Kosta C198 (https://www.mersenneforum.org/showthread.php?t=25492)

VBCurtis 2020-04-26 21:06

Team sieve for Kosta C198
 
If you have some spare cores, my usual server/port is now serving workunits for a C198 from the Kosta numbers (M107^12 + 1). A=30, so expect about 5.5GB of RAM per process. The default is now 4 threads per process; naturally, you can set it to whatever you like via "--override t {number}" on the command line.
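For anyone who hasn't run one of these clients, a minimal sketch of the kind of command line I mean (the URL is a placeholder, and I'm quoting the flags from memory of cado-nfs-client.py, so check --help on your checkout):
[code]
# Placeholder server URL; substitute the real server/port for this job.
# "--override t 4" asks the client to swap in 4 threads in place of whatever
# thread count the server's workunit template specifies.
./cado-nfs-client.py --server=https://myserver.example:8000 --override t 4
[/code]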

swellman and I plan to sieve this from Q=2-80M, with the 15e queue handling 80-700M. In a sense, it's a JV version of what we just did for the C217: use CADO for what it is best at (a large sieve region at small Q), and The Cloud in the region that minimizes wasted effort. If it works well, I think we can use this hybrid to factor numbers of up to 201 or 202 digits on the 15e queue.

I can personally sieve about 1MQ/day, and I'm planning a month of sieving.

axn 2020-04-27 02:59

My cores are occupied for the next 5 days or so, but I can join in afterwards.

So, the exact same command line, and it will pull in the new poly file and root file and start crunching?

Is it better to run one client with all the threads or split into multiple (say 2 or 3) clients on a single CPU with 12 threads?

VBCurtis 2020-04-27 04:23

Same exact command line, correct. I don't know what is most efficient; on my home machine I often run two 6-threaded instances, so that I can kill one when something else catches my attention.
The server itself is running 10 4-threaded instances.
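If it helps to picture it, here is roughly how I'd start two 6-threaded clients on one 12-thread box (the client IDs and URL are made up for the example; any unique IDs will do):
[code]
# Sketch only: two independent clients sharing one CPU, each easy to kill separately.
./cado-nfs-client.py --server=https://myserver.example:8000 --clientid box1.a --override t 6 &
./cado-nfs-client.py --server=https://myserver.example:8000 --clientid box1.b --override t 6 &
[/code]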

swellman 2020-04-28 19:08

Server seems to be unavailable to both my machines.

EdH 2020-04-28 19:49

I just started adding machines. I hope I didn't break something again! I did before, but that was because the machines I tried didn't have enough RAM. This time I made sure all of them had at least 8GB, with >6GB free. I had several already running and then added a few more. Maybe the initial download for the more recent machines was too heavy a load on the server.

[later]
I think I will kill all my machines that never finished downloading the initial files, leave running the ones that only failed to upload, and see if that affects the server in a positive manner. . .

[later, still]
It is serving again. I aborted several machines. Is that what fixed it, or just coincidence? I will start adding again, but at a slower pace. . .

[Even later]
It appears that I cannot download the roots1 file in any acceptable time frame, and that the download keeps the server from performing other tasks while it's in progress. Speedtest.net shows my download speed at >22Mbps, but the file has only reached about 20MB after a couple of minutes. I have taken to manually copying the roots1 file from an already-running client into the others I am trying to start. This appears to be working.
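Concretely, the copy is something along these lines (hostnames, paths, and the exact file name are placeholders; the real location is wherever your client's download directory lives):
[code]
# Hypothetical sketch of the manual workaround: reuse the roots file a working client
# already fetched so a new client doesn't have to pull it from the struggling server.
scp oldbox:~/client/download/*.roots1.gz newbox:~/client/download/
[/code]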

VBCurtis 2020-04-28 21:35

Hrmm... I see lots of clients from eFarm, a couple from seven, and the localhost clients are all still running. Work is being accepted and given out.
Strange.
Yield at Q=7M is higher than it was at Q=2M, so I started the sieving a little low.
Holy wow does Ed have some clients going!
eFarm 40 appears to be crashing repeatedly, though.

EdH 2020-04-29 16:44

I added a few more machines today with no download issues and, more importantly, I didn't seem to hang the server at all. :smile:

From my vantage eFarm.40 looks like it's running correctly ATM.

Is there (or, will there be) a cloudygo page for this team sieve?

VBCurtis 2020-04-29 16:53

Thanks to Ed and Seth's clients, we're flying through this one!
Presently Q=11.3M, just over 18M relations found.
We started at Q=2M, so yield is a tick below 2 so far. Not great, and slightly tempting to use I=16, but at this yield we'll still get ~150M relations from a Q-range that doesn't sieve well on the 15e queue; "free" relations, so to speak!
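For anyone following along, the rough arithmetic behind those figures:
[code]
yield      ~ 18M relations / (11.3M - 2M special-q)  ~ 1.9 relations per special-q
projection ~ 1.9 x (80M - 2M)                        ~ 150M relations from the CADO range
[/code]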

I think on any larger input we'd do this "free" CADO sieving at I=16 to reduce the Q-range that has to be done on the 15e queue.

Thanks for all the assistance! We've done 5MQ in a day, so just ~14 more days at this pace will complete it to Q=80M. Hopefully that helps Seth decide whether to bother with a cloudygo tracker; I felt bad that we stopped the 2,1165 job like the day after he got the tracker up!

EdH 2020-04-30 14:28

I am seeing an occasional full bucket on more than one machine:
[code]
ERROR:root:Command resulted in exit code 134
ERROR:root:Stderr: code BUG() : condition most_full > 1 failed in reserve at .../las-threads.cpp:123 -- Abort
Aborted (core dumped)
[/code]Is there something I should adjust on my end, or, as long as it's only a few, should I not worry about it?

My current scripts take the failing machine out of service until I can review and restart it. I can easily change that to let it continue on its own, which is my current plan unless you have another idea. So far, the restarted machines appear to run fine.
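The change is roughly this shape (not my actual harness, just a sketch of the idea; the URL and flags are placeholders):
[code]
# Sketch: relaunch the client instead of parking the machine, so an occasional
# exit-code-134 abort only costs the current workunit.
while true; do
    ./cado-nfs-client.py --server=https://myserver.example:8000 --override t 4
    echo "client exited with status $?, restarting in 60s" >&2
    sleep 60
done
[/code]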

VBCurtis 2020-04-30 15:55

There is a setting, "bkmult", that in my experience occasionally self-adjusts when a buckets-full message pops up. It usually just results in a slower run for that one workunit, but when it happens often I have added tasks.sieve.bkmult=1.10 to the parameter file (the default is 1, from what I can tell, and this only seems to happen on large jobs).

You can try setting --override bkmult=1.10 or 1.12 and see if the errors cease; but I don't know why you get a crash when I just get a slowdown from that particular bucket, so maybe I'm talking about a different "bucket".
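In other words, something like this on the client, or the equivalent line in the server's parameter file (I'm going from memory on the exact --override spelling, so double-check against your client's --help):
[code]
# Client-side attempt (syntax from memory; verify on your version):
./cado-nfs-client.py --server=https://myserver.example:8000 --override bkmult 1.10

# Or on the server, in the parameter/snapshot file:
#   tasks.sieve.bkmult = 1.10
[/code]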

You could also try the CADO mailing list on [url]http://cado-nfs.gforge.inria.fr/support.html[/url]
But if that machine isn't using the most-current GIT, that may be fruitless (what dev wants to hear "on the version from 7 months ago, I have this bug...."?).

EDIT: "don't worry about it" is fine, too. You're not getting many bad WUs from this problem.

EdH 2020-04-30 16:04

[QUOTE=VBCurtis;544284]
. . .
EDIT: "don't worry about it" is fine, too. You're not getting many bad WUs from this problem.[/QUOTE]
The failure is easy to skip with my scripts, and the machines that have shown it continued on fine when restarted, so I'll go with "don't worry about it."

