mersenneforum.org  

Old 2020-05-07, 02:52   #23
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

4,021 Posts

Quote:
Originally Posted by axn View Post
Server unreachable for 40 minutes. Problem at my end or server?
All my machines are suffering, too.
Old 2020-05-07, 04:10   #24
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

3×1,667 Posts

Oops! My home desktop crashed. It seems it somehow took down the server (I must have forgotten to run cado within screen). It'll be up momentarily.
EDIT: Nope, that was merely a coincidence. I get the error: too many failed workunits (max 100).
This is not the max-timed-out error; that setting, tasks.maxtimedout, is set to 5000, which is plenty.

It seems the buckets-full errors Ed gets on a few clients added up. I checked the params.c90 file to see if there's a setting to loosen this and allow the factorization to keep going, but I may need to build a new job in a new folder starting at the Q we left off at. More shortly.

Re-Edit: started CADO anew at Q=57.66M. Building factor base now, should be up in 10-15 min for workunits.

Last fiddled with by VBCurtis on 2020-05-07 at 04:20
Old 2020-05-07, 11:52   #25
axn
 
Jun 2003

3×17×101 Posts

More problems. Just now, the clients are unable to connect to the server.
Old 2020-05-07, 12:40   #26
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

4,021 Posts

Quote:
Originally Posted by axn View Post
More problems. Just now, the clients are unable to connect to the server.
When I woke up the slumberers this morning, they all got in each other's way because they hadn't stopped gracefully last night, due to the outage. The new run by Curtis might have compounded my local troubles, since the machines (or at least some of them) wanted a new roots1 file.

I have totally removed all my machines and will hope this allows the server to catch up. I will check later to see if I should add any back. I will have to evaluate my scripts again, to lessen the load.

Although I previously chose to ignore the "full" issue, I did actually address it when I found some, by installing the latest dev on those machines. I also upgraded some others. However, I have found that the latest dev slows my machines by over 10%. This is repeatable: WUs that averaged 14.5 minutes took 16.5, those that took 27 minutes took 33, etc. Going back to the earlier installs brought the times back to their earlier lengths. But in all of this, I did keep an eye out for the "full" issues and thought I had minimized them. I must not have minimized them enough.

My apologies for breaking the server. I hope it is back up now that my machines have been removed.
Old 2020-05-07, 13:15   #27
axn
 
Jun 2003

3×17×101 Posts

Well, FWIW, things are running fine now. Maybe you can start adding the clients back in a staggered fashion?
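
One way to picture a staggered restart is a schedule of start delays, one per client, so that the machines do not all contact the server at once. The Python sketch below only computes such a schedule; the host names, the 60-second spacing, and the 15-second jitter are hypothetical values chosen for illustration, not anything used in this thread, and launching the clients is left out entirely.

```python
import random


def staggered_delays(hosts, spacing_s=60.0, jitter_s=15.0, seed=None):
    """Return (host, delay_in_seconds) pairs with evenly spaced starts.

    Each host starts spacing_s seconds after the previous one, plus a
    small random jitter so restarts do not line up exactly.
    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    schedule = []
    for index, host in enumerate(hosts):
        delay = index * spacing_s + rng.uniform(0.0, jitter_s)
        schedule.append((host, delay))
    return schedule


# Hypothetical host names, for illustration only.
hosts = [f"client{n:02d}" for n in range(1, 6)]
for host, delay in staggered_delays(hosts, seed=1):
    print(f"{host}: start after {delay:.0f} s")
```

Because the spacing is larger than the jitter, the start times stay in order; a larger spacing gives the server more time to hand out workunits and files between arrivals.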
Old 2020-05-07, 14:31   #28
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

4,021 Posts

Quote:
Originally Posted by axn View Post
Well, FWIW, things are running fine now. Maybe you can start adding the clients back in a staggered fashion?
Done so. I think they are all back up, properly. I'll try to keep an eye on this thread. . .

(I had to distribute the roots1 file locally. Some of my machines wanted to d/l the 200+MB file at about 5kB/s! 40000 seconds!?! Not sure what's going on there.)
Old 2020-05-07, 15:52   #29
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

3×1,667 Posts

The server is on a university network, and I've experienced 100Mbit speeds downloading relations files from it to my home desktop.
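
For scale, the two rates mentioned in this thread can be compared with a few lines of Python. This is only back-of-the-envelope arithmetic: the file is taken as exactly 200 MB (it was described as "200+MB"), units are decimal, and the function name is made up for the example.

```python
def download_seconds(size_bytes, rate_bytes_per_s):
    """Time in seconds to transfer size_bytes at a constant rate."""
    return size_bytes / rate_bytes_per_s


ROOTS_FILE_BYTES = 200e6        # assumed: 200 MB, a lower bound for "200+MB"
SLOW_RATE = 5e3                 # 5 kB/s, as seen on some clients
FAST_RATE = 100e6 / 8           # 100 Mbit/s converted to bytes per second

slow = download_seconds(ROOTS_FILE_BYTES, SLOW_RATE)
fast = download_seconds(ROOTS_FILE_BYTES, FAST_RATE)

print(f"at 5 kB/s:     {slow:.0f} s (about {slow / 3600:.1f} hours)")
print(f"at 100 Mbit/s: {fast:.0f} s")
```

The gap of more than three orders of magnitude suggests the slow transfers were limited by something other than the server's link.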

Bummer about the speed loss due to updating software!

Things look good presently from my end. Q=59.9M, 5.8M new relations since the restart last night.
Old 2020-05-07, 22:08   #30
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

4,021 Posts

Quote:
Originally Posted by VBCurtis View Post
The server is on a university network, and I've experienced 100Mbit speeds downloading relations files from it to my home desktop.

Bummer about the speed loss due to updating software!

Things look good presently from my end. Q=59.9M, 5.8M new relations since the restart last night.
Well, I found two more of my machines with "most_full" issues and updated CADO-NFS on both. They immediately slowed by 2+ minutes per run. I'm left with opting for errors or slowing progress. If the errors compound to server stoppages, it's a bigger issue, so I'm choosing slower running when I come across these "exit code 134," "most_full" errors.
Old 2020-05-07, 22:38   #31
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

7665₈ Posts

After some research through previous logs, I have come to the idea (possibly false) that the slowdown is actually from CADO-NFS adjusting some internal values to prevent the "most_full" conditions from arising. This comes at a cost to performance.
Old 2020-05-07, 22:40   #32
VBCurtis
 
"Curtis"
Feb 2005
Riverside, CA

3·1,667 Posts

If your errors are related to the bkmult bucket full, then the lost speed may be illusory to an extent: your old version may have either run fast with a smaller bucket, or crashed/had to redo a whole workunit on a larger setting. I've set bkmult=1.10 or 1.12 on 190+ jobs because ever newer versions would run into buckets-full, increase bkmult, and restart on a Q. If that happens a bunch, I've found it's faster to just set bkmult from the get-go (I did not set it on this job).

Regardless of the cause of the error: if you crashed every 10 workunits on that machine, but were 10% faster, your new copy of CADO is just as fast as the old one, since it's doing 10 slow WUs in the time the old copy did 11 but failed on one (if the failure took almost the whole time, at least).
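
The break-even argument above can be checked with a short Python sketch. The function name, the 10- and 11-minute workunit times, and the `failure_cost` parameter are illustrative assumptions, not CADO-NFS settings; `failure_cost` is the fraction of a full workunit's time that a failed attempt burns before it dies.

```python
import math


def good_wus_per_hour(wu_minutes, attempts_per_failure=None, failure_cost=1.0):
    """Completed workunits per hour for one client.

    attempts_per_failure: one attempt out of this many fails
                          (None means the client never fails).
    failure_cost: fraction of wu_minutes lost on each failed attempt.
    """
    if attempts_per_failure is None:
        return 60.0 / wu_minutes
    good = attempts_per_failure - 1
    minutes = good * wu_minutes + failure_cost * wu_minutes
    return 60.0 * good / minutes


# Old copy: 10-minute WUs, but 1 attempt in 11 fails after nearly full time.
old_rate = good_wus_per_hour(10.0, attempts_per_failure=11, failure_cost=1.0)
# New copy: 10% slower, never fails.
new_rate = good_wus_per_hour(11.0)
# Old copy again, but failures die almost immediately (2% of a WU).
old_early = good_wus_per_hour(10.0, attempts_per_failure=11, failure_cost=0.02)

print(f"old, late failures:  {old_rate:.2f} WU/h")
print(f"new, no failures:    {new_rate:.2f} WU/h")
print(f"old, early failures: {old_early:.2f} WU/h")
print("break-even:", math.isclose(old_rate, new_rate))
```

Under these assumptions the two copies tie only when a failure costs about a whole workunit. If failures abort within seconds, the older copy still completes more workunits per hour, although the model ignores the cost of failed workunits piling up on the server.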

I hope that makes you feel a bit better!

I saw errors most often on #40 and #53. I saw 53 today, so I bet you caught that one today too.
Old 2020-05-08, 02:45   #33
EdH
 
"Ed Hall"
Dec 2009
Adirondack Mtns

4,021 Posts

Quote:
Originally Posted by VBCurtis View Post
If your errors are related to the bkmult bucket full, then the lost speed may be illusory to an extent: your old version may have either run fast with a smaller bucket, or crashed/had to redo a whole workunit on a larger setting. I've set bkmult=1.10 or 1.12 on 190+ jobs because ever newer versions would run into buckets-full, increase bkmult, and restart on a Q. If that happens a bunch, I've found it's faster to just set bkmult from the get-go (I did not set it on this job).

Regardless of the cause of the error: if you crashed every 10 workunits on that machine, but were 10% faster, your new copy of CADO is just as fast as the old one, since it's doing 10 slow WUs in the time the old copy did 11 but failed on one (if the failure took almost the whole time, at least).

I hope that makes you feel a bit better!

I saw errors most often on #40 and #53. I saw 53 today, so I bet you caught that one today too.
All my errors were early, between 13 and 27 seconds into the run. #40 was one for sure, but I don't remember if #53 was. I updated two or three others besides #40.

I'll try to remember to check #53 for sure tomorrow. I changed #49 from an i5 to an i7 today, to squeeze just a little more from it. #49 was the slowest of my machines at about 33 minutes/WU. With a brand new CADO-NFS install, it's running at around 27 minutes/WU now. Not real impressive, but every little bit helps.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.