mersenneforum.org  

2018-11-08, 15:44   #1
cdepth
Nov 2018
1 Posts
Weird Jacobi check behaviour

Hello everyone :)

I upgraded my Prime95 software to the latest Win64 version a couple of weeks ago. It worked like a charm until yesterday, when I had to manually stop the program for a few minutes. After resuming it, the Jacobi check run on one of my assignments failed. Then it performed the Jacobi check on some backup files, which kept failing until one of them (written at a previous manual stop) passed the test. Nothing so far seems strange, but every time a Jacobi check is performed on the assignment, it fails and it restores the same backup file, even if Prime95 has only been running for a few seconds since restoring the backup file.

My wild guess is that the residue $R_i$ in that backup file is, unexpectedly, not 2 less than a square mod $M_p$, so $\left(\frac{R_i + 2}{M_p}\right) = -1$. It still passes the Jacobi test because it is also not 2 more than a square mod $M_p$, so $\left(\frac{R_i - 2}{M_p}\right) = -1$. The next residue $R_{i+1} = R_i^2 - 2$ then has $\left(\frac{R_{i+1} - 2}{M_p}\right) = \left(\frac{R_i^2 - 4}{M_p}\right) = \left(\frac{(R_i + 2)(R_i - 2)}{M_p}\right) = \left(\frac{R_i + 2}{M_p}\right)\left(\frac{R_i - 2}{M_p}\right) = (-1)(-1) = 1$, and the symbol stays 1 for every subsequent residue, since all future residues plus 2 are squares (unless another hardware error kicks in). I do not really know whether the software also checks that $\left(\frac{R_i + 2}{M_p}\right)$ equals 1, though.
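
The scenario guessed at above can be reproduced on a toy scale. The following is only a sketch, not Prime95's code: it assumes the small Mersenne prime M31 as a stand-in for the real exponent, and the helper names (`jacobi`, `ll_step`, `symbols_after`, `find_bad_backup`) are made up for illustration. It searches for a "backup" value whose own check passes although the value plus 2 is not a square, then shows that every residue computed from it fails the check.

```python
def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd positive n, using the standard binary algorithm."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a  # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


P = 31
MP = 2**P - 1  # M31, a known Mersenne prime (illustrative stand-in only)


def ll_step(r: int) -> int:
    """One Lucas-Lehmer iteration: R -> R^2 - 2 mod Mp."""
    return (r * r - 2) % MP


def symbols_after(start: int, steps: int) -> list[int]:
    """Jacobi symbols of (R - 2) for the `steps` residues that follow `start`."""
    symbols = []
    r = start
    for _ in range(steps):
        r = ll_step(r)
        symbols.append(jacobi(r - 2, MP))
    return symbols


def find_bad_backup() -> int:
    """Smallest x > 2 with (x - 2 / Mp) = -1 and (x + 2 / Mp) = -1 (hypothetical bad backup)."""
    x = 3
    while not (jacobi(x - 2, MP) == -1 and jacobi(x + 2, MP) == -1):
        x += 1
    return x


if __name__ == "__main__":
    print("healthy run from R0 = 4:", symbols_after(4, 10))
    bad = find_bad_backup()
    print("bad backup passes its own check:", jacobi(bad - 2, MP) == -1)
    print("symbols after restoring it:", symbols_after(bad, 10))
```

In a healthy run the symbol of (R - 2) is -1 at every step, which is what the check expects. A file like the one found by `find_bad_backup` looks fine when tested directly, yet every continuation from it yields +1, matching the "restore, run a few seconds, fail again" loop described above.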

2018-11-08, 16:11   #2
GP2
Sep 2003
3·863 Posts

I think a Jacobi check in an LL test only has a 50% chance of detecting an error, as opposed to a Gerbicz error check in a PRP test, which is much more foolproof. So maybe your "good" backup file is actually bad, too.
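
The 50% figure can be illustrated with a quick simulation. This is a rough sketch under a stated assumption: a hardware error is modelled as replacing the residue with a uniformly random value mod Mp, which is a simplification. The exponent 61 and the names `jacobi` and `detection_rate` are chosen only for the example.

```python
import random


def jacobi(a: int, n: int) -> int:
    """Jacobi symbol (a/n) for odd positive n, using the standard binary algorithm."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a  # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


P = 61
MP = 2**P - 1  # M61, a known Mersenne prime (illustrative stand-in only)


def detection_rate(trials: int, seed: int = 0) -> float:
    """Fraction of randomly corrupted residues that the Jacobi check would flag."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(trials):
        # Assumption: the error leaves an essentially random residue behind.
        corrupted = rng.randrange(3, MP)
        # A healthy LL residue R always has Jacobi symbol -1 for (R - 2);
        # anything else is reported as an error.
        if jacobi(corrupted - 2, MP) != -1:
            detected += 1
    return detected / trials


if __name__ == "__main__":
    print(f"detected fraction: {detection_rate(4000, seed=1):.3f}")
```

Under this model roughly half of the corrupted residues still have the expected symbol, so a save file that passes the check is not proof that it is good.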

Last fiddled with by GP2 on 2018-11-08 at 16:14

2018-12-18, 15:19   #3
nomead
"Sam Laur"
Dec 2018
Turku, Finland
14A₁₆ Posts

I ran into something vaguely similar now.

I've had a long pause of not doing anything GIMPS related, but managed to scrape together a very low budget setup that has a tolerable speed for current FFT lengths. (AMD Ryzen 3 2200G paired with the second cheapest memory, DDR4 2666 MHz CL19). Ran various kinds of hardware tests to make sure that everything is working well. My decision from the beginning was not to do any CPU overclocking, but maybe tweak the memory timings a bit. I managed to get the memory clock up to 3333 MHz and stable; stable here meaning that it ran Memtest86 for about a day with no hiccups. So I thought it's good enough for Prime95 as well, and to err on the safe side, backed down to 3200 MHz.

Well, no. I didn't get any error messages as such during the work, but the Jacobi check would fail and then progress would revert quite a long way back before it would find a save file that would check OK. And in one case it went all the way back to iteration 0 since all it had was bad files.

It seems that the default Jacobi error checking interval is 12 hours, and Prime95 saves files every 30 minutes and keeps three backups. So with the default settings it is quite possible that all saved files are bad, since even the oldest saved backup may have been written several hours after the last Jacobi test.
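
To put numbers on that, here is a back-of-the-envelope sketch using the figures quoted above (30-minute saves, three backups, 12-hour check interval). The function names are invented for the example, and the calculation deliberately ignores any extra files kept because they already passed a Jacobi check.

```python
def oldest_backup_age_hours(save_interval_min: float, num_backups: int) -> float:
    """Upper bound on the age of the oldest rolling save file, in hours."""
    return save_interval_min * num_backups / 60.0


def unprotected_window_hours(jacobi_interval_h: float,
                             save_interval_min: float,
                             num_backups: int) -> float:
    """Part of the check interval that no rolling save file reaches back across.

    An error that strikes inside this window is present in every rolling
    backup by the time the next Jacobi check runs.
    """
    reach = oldest_backup_age_hours(save_interval_min, num_backups)
    return max(0.0, jacobi_interval_h - reach)


if __name__ == "__main__":
    print(oldest_backup_age_hours(30, 3))        # hours covered by the rolling backups
    print(unprotected_window_hours(12, 30, 3))   # default 12-hour interval
    print(unprotected_window_hours(3, 30, 3))    # 3-hour interval
```

With the defaults the rolling backups reach back at most 1.5 hours, leaving 10.5 hours of each interval uncovered; a 3-hour interval shrinks that to 1.5 hours.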

Two things fixed it, though. One bodge (since the hardware was still flaky as I later found out) and then the proper one after that.

For the bodge, there are a couple settings in undoc.txt that can help. In prime.txt I set:
JacobiErrorCheckingInterval=3
JacobiBackupFiles=4

So when it finds an error, not that much work is lost. But the problem remains that the Jacobi test only catches 50% of the errors.

The proper fix, of course, was to back down the memory clock to the specified 2666 MHz. Over 24h later and no further errors have been detected. I'm not willing to say that it's stable now, but perhaps after it's been running a week or two and has produced several matching double-check LL residues.

And in any case, I found that the memory overclock didn't bring any truly noticeable gains in performance. The difference between 3200 and 2666 MHz was around 1% in iteration time. Not worth it at all if it means losing work at random and having to backtrack to an earlier state. Now running at about 5.52 ms/iter with all four cores on the same worker. According to the benchmarks, the throughput is slightly better when running four worker threads as well, but I'll switch to that only when the system has been stable and productive for a while longer.

2018-12-18, 17:37   #4
kriesel
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
11766₈ Posts

In undoc.txt:
Quote:
You can control how often Jacobi error checking is run. Default is 12 hours.
If a Jacobi test takes 30 seconds, then the default represents an overhead of
30 / (12 * 60 * 60) or 0.07% overhead. Each Jacobi test has a 50% chance of
discovering if a hardware error has occurred in the last time interval. In prime.txt:
JacobiErrorCheckingInterval=N (default is 12)
where N is in hours.
You can control how many save files are kept that have passed the Jacobi error check.
This value is in addition to the value set by the NumBackupFiles setting. So if
NumBackupFiles=3 and JacobiBackupFiles=2 then 5 save files are kept - the first three
may or may not pass a Jacobi test, the last two save files have passed the Jacobi error
check. In prime.txt:
JacobiBackupFiles=N (default is 2)

The Jacobi check is somewhat time consuming, so by default it is done rarely. Since it is limited to a 50% detection rate, you may want to switch from LL testing to PRP after finishing the current assignment(s). That's easy to do; switch the "type of work to get" (Test, Worker windows, in prime95).


You can have prime95 permanently save intermediate files. In undoc.txt again:
Quote:
You can have the program generate save files every n iterations. The files
will have a .XXX extension where XXX equals the current iteration divided
by n. In prime.txt enter:
InterimFiles=n

You may want to periodically clean these out after an exponent completes, to conserve disk space.

2018-12-18, 20:48   #5
Mark Rose
"/X\(‘-‘)/X\"
Jan 2013
5563₈ Posts

Quote:
Originally Posted by nomead View Post
AMD Ryzen 3 2200G paired with the second cheapest memory, DDR4 2666 MHz CL19.
The 4 cores in that CPU have slow AVX2, so 2666 MHz memory wouldn't bottleneck the cores.

I would enable Overdrive in the bios, which is automatic overclocking, to better take advantage of your memory's speed.

2018-12-19, 07:20   #6
nomead
"Sam Laur"
Dec 2018
Turku, Finland
2×3×5×11 Posts

Quote:
Originally Posted by kriesel View Post
The Jacobi check is somewhat time consuming, so by default it is done rarely. Since it is limited to a 50% detection rate, you may want to switch from LL testing to PRP after finishing the current assignment(s). That's easy to do; switch the "type of work to get" (Test, Worker windows, in prime95).
Ok, I know how to change the preferred type of assignment. However, I'm totally unfamiliar with PRP - I guess there's a lot to learn after years away from using Prime95. Is it about as fast as LL?

And in any case, the Jacobi check takes about 15 seconds on the 50M-something exponent I'm double checking at the moment, so I would hardly consider that time consuming. That is something like 0.14% extra when doing it every three hours.
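
For anyone wanting to redo the overhead arithmetic with their own timings, a tiny sketch follows (the function name is made up; the inputs are the figures quoted in this thread):

```python
def jacobi_overhead_percent(check_seconds: float, interval_hours: float) -> float:
    """Time spent in the Jacobi check as a percentage of the interval between checks."""
    return 100.0 * check_seconds / (interval_hours * 3600.0)


if __name__ == "__main__":
    # undoc.txt example: 30-second check, default 12-hour interval
    print(round(jacobi_overhead_percent(30, 12), 2))
    # this machine: 15-second check, 3-hour interval
    print(round(jacobi_overhead_percent(15, 3), 2))
```

The first case reproduces the 0.07% from undoc.txt, and the second gives roughly double that for the shorter interval.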

Quote:
Originally Posted by kriesel View Post
You can have prime95 permanently save intermediate files. In undoc.txt again: You may want to periodically clean these out after an exponent completes, to conserve disk space.
I somehow missed that option. Thanks for the pointer. It would definitely be better to actually RTFM than just skim through it.

2018-12-19, 07:55   #7
nomead
"Sam Laur"
Dec 2018
Turku, Finland
101001010₂ Posts

Quote:
Originally Posted by Mark Rose View Post
The 4 cores in that CPU have slow AVX2, so 2666 MHz memory wouldn't bottleneck the cores.

I would enable Overdrive in the bios, which is automatic overclocking, to better take advantage of your memory's speed.
Yes, I knew about the limitations of the architecture when making the choice, but with the current Intel availability and especially prices, an equivalent i3-based system was out of the question. Anyway, I may consider upgrading just the processor when the low-range Zen 2 parts come out. I've heard the AVX2 should be much improved on those. Will it still match the Intel implementation? I can only hope... but I doubt it. But we'll see.

And there is not much headroom for CPU overclocking, since I'm using the barely adequate stock cooler. 78C all the time, yay... and that is on the 3.5 GHz base clock. Besides, I thought the automatic overclocking really didn't do much for all-core loads, just ones using 1 or 2 cores, or short tasks requiring a temporary boost.

2018-12-19, 09:56   #8
preda
"Mihai Preda"
Apr 2015
3²·151 Posts

Quote:
Originally Posted by nomead View Post
Ok, I know how to change the preferred type of assignment. However, I'm totally unfamiliar with PRP - I guess there's a lot to learn after years away from using Prime95. Is it about as fast as LL?
Yes, PRP is as fast as LL, and the PRP error-check is very strong. Thus PRP is very good for validating hardware (because it reports errors early and confidently).

Last fiddled with by preda on 2018-12-19 at 10:55

2018-12-19, 10:32   #9
SELROC
13·613 Posts

Quote:
Originally Posted by preda View Post
Yes, PRP is as fast as LL, and the PRP error-check is very strong. Thus PRP is very good for validating hardware (because it reports errors early and confidently).

With mprime I do hardware validation for the CPU, the RAM, and a bunch of other hardware components. With gpuowl I think I can validate both the CPU and GPUs.


Thus it is wonderful to be able to test against Small Primes :-)

2018-12-19, 13:19   #10
nomead
"Sam Laur"
Dec 2018
Turku, Finland
2×3×5×11 Posts

Yup... M50925509 finished, and as I expected, the residue didn't match the earlier test. Jacobi error check failed about 1/4 into the exponent, and at that point, the testing restarted from the previous saved file. Something must have happened later on in the test, or that previous file was bad as well, and I got that other side of the 50% chance of catching errors.

2018-12-19, 16:13   #11
R. Gerbicz
"Robert Gerbicz"
Oct 2005
Hungary
2×733 Posts

Quote:
Originally Posted by nomead View Post
Something must have happened later on in the test, or that previous file was bad as well, and I got that other side of the 50% chance of catching errors.
Your run could still be good, as there are currently two different residues:
https://www.mersenne.org/report_expo...925509&exp_hi=
This is when we do triple (and possibly quadruple, etc.) checks, so there is some overhead.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.