mersenneforum.org  

mersenneforum.org > Great Internet Mersenne Prime Search > Data > Marin's Mersenne-aries

2017-05-31, 03:08   #1475
ewmayer
2ω=0

Sep 2002
República de California

2×3×1,637 Posts

Quote:
Originally Posted by GP2
OK. I'm monitoring each milestone and at the 4M mark, all four exponents match so far.

Looks like the issue was overheating: after an outright system freeze-up during the night, David replaced the 'decent-quality' fan-based cooler on the Ryzen with a water cooler. I restarted my octet of 3072K DCs 3 hours ago, and there has been no sign of anything but steady crunching since. Yesterday those 8 jobs were, between them, throwing fatal-ROE-and-interval-retries more than once per hour.

The restart also gave me a chance to try one more throughput-related test: I queued up the 8 DCs I just grabbed above in addition to the original 8 (all @3072K) and assigned one to each of the 16 logical cores of the system [8 physical cores, each mapping to 2 logical]. That proved very bad. 8 jobs (on cores 0,2,4,6,8,10,12,14, i.e. 1 per physical core using AMD's core-numbering system) yield 0.042 sec/iter each, for a total throughput of 8/0.042 ≈ 190 iter/sec, but 16 jobs (1 each on cores 0-15) push that up to 0.11 sec/iter, for a total throughput of 16/0.11 ≈ 145 iter/sec, a massive drop of roughly 24%.
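For anyone who wants to redo the throughput arithmetic above, here is a minimal Python sketch. The function name is made up for illustration and has nothing to do with Mlucas itself; the timings are the ones quoted in the post. With these inputs the relative drop comes out a little under 24%.

```python
def total_throughput(jobs: int, sec_per_iter: float) -> float:
    """Aggregate iterations per second when `jobs` run side by side,
    each taking `sec_per_iter` seconds per iteration."""
    return jobs / sec_per_iter


# Timings quoted in the post: 8 jobs at 0.042 s/iter, 16 jobs at 0.11 s/iter.
one_per_physical_core = total_throughput(8, 0.042)
one_per_logical_core = total_throughput(16, 0.11)

# Fraction of total throughput lost by going from 8 jobs to 16.
relative_drop = 1 - one_per_logical_core / one_per_physical_core

print(f"8 jobs:  {one_per_physical_core:.0f} iter/sec")
print(f"16 jobs: {one_per_logical_core:.0f} iter/sec")
print(f"drop:    {relative_drop:.1%}")
```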

Fingers crossed that all the overheating-data-corruption badness didn't hose any of the 8 original DCs - Gord, how are those 4 TC residues looking compared to the ones I posted?
2017-05-31, 14:45   #1476
kladner

"Kieren"
Jul 2011
In My Own Galaxy!

10040₁₀ Posts

45699887
matched
2017-06-02, 03:26   #1477
GP2

Sep 2003

29·89 Posts

Quote:
Originally Posted by ewmayer
Just finished these DCs [53647547,53648423,53648893,53648981] on the Ryzen system - all 4 final residues mismatch those of the first-test submission.

Very close to the halfway mark on the triple-check run: three exponents are at 26M and one laggard is at 25M.

All the interim residues are matching yours so far.

If there is some secret algorithm to make Mlucas robust on flaky hardware, you should let George know so he can implement it for mprime too.

Let me know if any exponents in the new batches of 8 + 8 end up mismatching.
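The interim-residue comparison described above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout (iteration count mapped to a hex residue string) and the sample values are assumptions made up for the example, not the actual residues from these runs or any file format used by Mlucas or mprime.

```python
def first_mismatch(run_a, run_b):
    """Return the lowest iteration present in both runs whose residues
    differ, or None if every shared checkpoint agrees."""
    for iteration in sorted(run_a.keys() & run_b.keys()):
        if run_a[iteration].lower() != run_b[iteration].lower():
            return iteration
    return None


# Placeholder residues, invented purely to exercise the function.
double_check = {1_000_000: "0123456789ABCDEF", 2_000_000: "FEDCBA9876543210"}
triple_check = {1_000_000: "0123456789abcdef", 2_000_000: "fedcba9876543210"}

print(first_mismatch(double_check, triple_check))  # None: all checkpoints agree
```

Comparing at every shared checkpoint, rather than only at the end, shows where two runs diverged, which is what makes the milestone-by-milestone monitoring useful.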

Last fiddled with by GP2 on 2017-06-02 at 03:28
2017-06-02, 07:16   #1478
ewmayer
2ω=0

Sep 2002
República de California

2·3·1,637 Posts

Quote:
Originally Posted by GP2
Very close to the halfway mark on the triple-check run: three exponents are at 26M and one laggard is at 25M.

All the interim residues are matching yours so far.

If there is some secret algorithm to make Mlucas robust on flaky hardware, you should let George know so he can implement it for mprime too.

Thanks! I confess I'm (pleasantly) shocked that none of the multiple fatal-ROE-retries in my runs have been accompanied by a silent-but-deadly data corruption. Wish I could claim credit for some 'secret sauce' behind that, but that would be dishonest. If and when I garner more than a trivial user base, we may get information on whether the alleged robustness is in any way systematic, or whether this particular instance of hardware flakiness is a lucky fluke.

FYI, here is the number of such occurrences in each of my 4 runs, based on grepping the exponent status file:

p53647547.stat:62
p53648423.stat:53
p53648893.stat:57
p53648981.stat:72
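The counts above have the shape of `grep -c` output (file name, colon, number of matching lines). A rough Python equivalent is sketched below. The search string is a placeholder: the post does not say which message text was grepped for, so `NEEDLE` is an assumption you would need to replace with whatever your own status files contain.

```python
from pathlib import Path


def count_matching_lines(path, needle):
    """Count lines containing `needle`, like `grep -c needle path`."""
    with open(path, errors="replace") as fh:
        return sum(1 for line in fh if needle in line)


def count_per_stat_file(directory, needle, pattern="p*.stat"):
    """Map each file matching `pattern` to its number of matching lines."""
    return {
        p.name: count_matching_lines(p, needle)
        for p in sorted(Path(directory).glob(pattern))
    }


if __name__ == "__main__":
    # NEEDLE is a placeholder, not the actual text logged by Mlucas.
    NEEDLE = "ROE"
    for name, count in count_per_stat_file(".", NEEDLE).items():
        print(f"{name}:{count}")
```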
2017-06-05, 16:09   #1479
Madpoo
Serpentine Vermin Jar

Jul 2014

29·113 Posts
Weird list...

Here's a strange list of 5 exponents that have a very high chance of having been done wrong the first time. All 5 are currently assigned, which means technically you'd be poaching them, but all of them were assigned to "anonymous" users back in Sep/Oct of 2014 who haven't been heard from in years.

Since they wouldn't normally expire for many years to come (when the double-checking gets up into the 70M range), we may as well check them now, since the assignments have clearly been abandoned.

Code:
DoubleCheck=73964809,75,1
DoubleCheck=73965077,75,1
DoubleCheck=73919003,75,1
DoubleCheck=78181099,75,1
DoubleCheck=73681423,75,1
They have bad/good ratios from 4.5 up to 21.0 (that last one) for any given month... This user/computer has been on our target list for a while due to a period of several months when it just churned out one bad result after another, and a LOT of them (30+ in a single month somehow).

If there are other likely cases of a "very likely bad" result with an assignment that's so old it's probably abandoned, I'll put those up later as well, but these 5 have been bugging me, just sitting there...

EDIT: In addition to those 5, there were only a couple of others that fall into the category of "assigned, but really really old". So, here are all 7 of those along with the relevant stats so you can see just what the ratios look like:
Code:
exponent	Bad	Good	Unk	Sus	Solo	Mis	worktodo
73681423	21	1	1	0	1	0	DoubleCheck=73681423,75,1
74207999	14	1	1	0	1	0	DoubleCheck=74207999,75,1
73964809	25	3	2	0	2	0	DoubleCheck=73964809,75,1
73965077	25	3	2	0	2	0	DoubleCheck=73965077,75,1
73919003	33	5	1	0	1	0	DoubleCheck=73919003,75,1
78181099	27	6	1	0	1	0	DoubleCheck=78181099,75,1
66882859	2	0	1	0	1	0	DoubleCheck=66882859,74,1
Those 2 new additions were last updated 1 to nearly 2 years ago.
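To see the ratios directly, here is a small Python sketch over the table above. It reads the Bad and Good columns as the counts behind the quoted bad/good ratios, which is my interpretation; it does reproduce the 4.5 and 21.0 endpoints mentioned earlier. Treating a row with zero good results as an infinite ratio is a convention chosen for this example only.

```python
# (exponent, bad, good) copied from the table above.
ROWS = [
    (73681423, 21, 1),
    (74207999, 14, 1),
    (73964809, 25, 3),
    (73965077, 25, 3),
    (73919003, 33, 5),
    (78181099, 27, 6),
    (66882859, 2, 0),
]


def bad_good_ratio(bad, good):
    """Bad/good ratio; infinite when there are no good results at all
    (a convention for this sketch, not something the table defines)."""
    return float("inf") if good == 0 else bad / good


ratios = {exp: bad_good_ratio(bad, good) for exp, bad, good in ROWS}

# Print the worst offenders first.
for exp, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{exp}\t{ratio:.1f}")
```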

Last fiddled with by Madpoo on 2017-06-05 at 17:26
2017-06-05, 23:59   #1480
bgbeuning

Dec 2014

2²·3²·7 Posts

Queued up the first 5

Quote:
DoubleCheck=73964809,75,1
DoubleCheck=73965077,75,1
DoubleCheck=73919003,75,1
DoubleCheck=78181099,75,1
DoubleCheck=73681423,75,1
But they do not show up as reserved and
my "manual comm" button in prime95 is disabled.
2017-06-06, 01:06   #1481
GP2

Sep 2003

29×89 Posts

Quote:
Originally Posted by bgbeuning
Queued up the first 5

But they do not show up as reserved and
my "manual comm" button in prime95 is disabled.

They can't be reserved, since other (anonymous) users have already reserved them.

However, those anonymous users have not been heard from since 2014, so they are probably not going to complete the exponents.

You can just run the exponents yourself despite the lack of a reservation and the program will report the results to PrimeNet in the usual way.
2017-06-06, 07:59   #1482
GP2

Sep 2003

29·89 Posts

The triple checks on exponents 53648893, 53648981, 53648423, 53647547 will finish within a few hours, in that order. The first one is already at 51M and the slowest is almost at 49M. All interim residues are still matching.

Since it seems almost certain that the first-time LL tests (all by the same user) were wrong, here are some more exponents by that same user that are relatively close in time frame and in exponent value:

DoubleCheck=53643523,73,1
DoubleCheck=53643913,73,1
DoubleCheck=53644573,73,1
DoubleCheck=53644883,73,1
DoubleCheck=53647171,73,1
DoubleCheck=53648677,73,1
DoubleCheck=53648729,73,1
DoubleCheck=53679289,73,1
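For scripting against lists like the one above, a worktodo line can be split into its fields with a short Python sketch. The field names are my reading of the Prime95/mprime worktodo format (exponent, trial-factoring depth in bits, and a flag for whether P-1 has been done) and should be checked against the program's own documentation; the `DoubleCheck` class and the function name are hypothetical.

```python
from typing import NamedTuple


class DoubleCheck(NamedTuple):
    exponent: int
    tf_bits: int      # assumed meaning: trial factoring done to 2^tf_bits
    pm1_done: bool    # assumed meaning: P-1 factoring already performed


def parse_doublecheck(line: str) -> DoubleCheck:
    """Parse a line such as 'DoubleCheck=53643523,73,1'."""
    key, _, value = line.strip().partition("=")
    if key != "DoubleCheck":
        raise ValueError(f"not a DoubleCheck line: {line!r}")
    fields = value.split(",")
    # Use the last three fields so that an optional leading
    # assignment ID, if present, is ignored.
    exponent, bits, pm1 = fields[-3:]
    return DoubleCheck(int(exponent), int(bits), pm1 == "1")


print(parse_doublecheck("DoubleCheck=53643523,73,1"))
```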
2017-06-06, 14:02   #1483
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

2·3·1,193 Posts

I took:

DoubleCheck=53643523,73,1
2017-06-06, 16:17   #1484
Madpoo
Serpentine Vermin Jar

Jul 2014

110011001101₂ Posts

Quote:
Originally Posted by GP2
...
Since it seems almost certain that the first-time LL tests (all by the same user) were wrong, here are some more exponents by that same user that are relatively close in time frame and in exponent value...

FYI, that particular user/cpu has a mixed history with an overall track record of 52 bad and 178 good.

Here is the breakdown by year/month, so depending on when a result came in it may have better/worse odds:
Code:
YYYY-MM	Bad	Good	Unknown
2008-11	3	2	0
2008-12	2	0	0
2009-1	0	14	7
2009-3	0	2	1
2009-4	0	2	3
2009-5	0	4	7
2009-6	0	1	1
2009-7	0	1	4
2009-8	0	7	2
2009-9	0	1	1
2009-10	0	2	4
2009-11	0	2	6
2009-12	0	4	3
2010-1	0	0	4
2010-2	0	2	1
2010-3	0	2	6
2010-4	0	0	4
2010-5	0	1	7
2010-6	0	4	2
2010-7	0	5	3
2010-8	0	3	6
2010-9	0	2	0
2010-10	0	8	3
2010-11	0	8	1
2010-12	0	2	3
2011-1	0	1	14
2011-2	0	1	8
2011-3	0	1	7
2011-4	0	0	10
2011-5	0	3	8
2011-6	0	1	14
2011-7	0	4	10
2011-8	0	1	11
2011-9	0	2	12
2011-10	0	2	17
2011-11	0	0	8
2011-12	0	0	24
2012-1	0	1	2
2012-2	9	7	2
2012-3	0	1	22
2012-4	3	2	0
2012-5	1	3	17
2012-7	0	2	31
2012-8	1	2	17
2012-9	0	3	27
2012-10	1	1	14
2012-11	1	4	17
2012-12	0	0	23
2013-1	0	0	9
2013-2	0	0	17
2013-3	0	0	18
2013-4	0	0	24
2013-5	0	0	6
2013-6	0	0	22
2013-7	0	2	14
2013-8	0	1	17
2013-9	0	0	17
2013-10	0	0	28
2013-11	1	2	19
2013-12	0	1	13
2014-1	1	5	11
2014-2	7	6	11
2014-3	5	9	19
2014-4	6	4	7
2014-6	0	2	21
2014-7	2	2	16
2014-8	1	2	13
2014-9	0	0	15
2014-10	0	0	14
2014-11	0	0	10
2014-12	0	0	14
2015-2	0	0	17
2015-4	0	0	14
2015-6	0	0	19
2015-9	2	2	8
2015-10	2	2	4
2015-11	1	2	1
2015-12	0	0	4
2016-1	0	0	4
2016-3	0	0	12
2016-5	0	0	4
2016-6	0	0	8
2016-8	0	1	16
2016-10	0	0	6
2016-12	0	0	7
2017-1	3	2	0
2017-2	0	8	3
2017-3	0	3	4
2017-4	0	0	4
2017-6	0	1	3
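A breakdown like the one above is easy to roll up by year. The Python sketch below does that for an excerpt only (the 2014 rows through August and the 2017 rows, copied from the table); the function names are invented for illustration, and "bad fraction" here means bad / (bad + good), ignoring results that are still unknown.

```python
from collections import defaultdict

# Excerpt of the table above: YYYY-MM, Bad, Good, Unknown.
EXCERPT = """\
YYYY-MM Bad Good Unknown
2014-1 1 5 11
2014-2 7 6 11
2014-3 5 9 19
2014-4 6 4 7
2014-6 0 2 21
2014-7 2 2 16
2014-8 1 2 13
2017-1 3 2 0
2017-2 0 8 3
2017-3 0 3 4
2017-4 0 0 4
2017-6 0 1 3
"""


def yearly_totals(text):
    """Sum the Bad/Good/Unknown columns per year, skipping the header."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4 or not parts[1].isdigit():
            continue  # header or malformed line
        year = parts[0].split("-")[0]
        for i, value in enumerate(parts[1:]):
            totals[year][i] += int(value)
    return {year: tuple(t) for year, t in totals.items()}


def bad_fraction(bad, good):
    """Share of verified results that were bad, or None if none verified."""
    verified = bad + good
    return bad / verified if verified else None


for year, (bad, good, unknown) in sorted(yearly_totals(EXCERPT).items()):
    print(year, bad, good, unknown, bad_fraction(bad, good))
```

Grouping by a coarser period smooths out months with only one or two verified results, where a single bad test swings the ratio wildly.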
2017-06-06, 16:52   #1485
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1101111110110₂ Posts

Quote:
Originally Posted by Madpoo
FYI, that particular user/cpu has a mixed history with an overall track record of 52 bad and 178 good.

I think it would be a good idea to strategically double-check at least half of this user's exponents. Based on the data that comes back, we might then double-check the other half.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.