
2017-08-26, 06:10   #133
Antonio

"Antonio Key"
Sep 2011
UK

213₁₆ Posts

gap11, an update to gap10, provides a modest 1.5-2% speed boost on my Ivy Bridge using 4 threads.
I was unable to speed-test it on my Skylake laptop because of thermal limiting.
Code:
// version 1.05 Remove always false tests from prec_prime              Antonio Key
//		and next_prime, alternative would be to replace
//		with asserts as the program failed elsewhere if
//		the tests were ever true.
//		Removed always true test in gap search
//		Removed bound test on P2 in gap search
//		Replaced result sort routine
//		On screen display of process state (sieve or gap search)
//		Let compiler decide if AVX2 is available to use
//		Update default_unknowngap
//		Moved the #define version so that it was easier to check
//		against the version comments and keep them consistent.
The usual Windows executables are provided along with the required DLL.
Attached Files
 gap11_code.7z (18.9 KB)
 gap11_exe.7z (279.8 KB)
 cygwin1.7z (859.6 KB)

2017-08-31, 07:02   #134
robert44444uk

Jun 2003
Oxford, UK

1,933 Posts

Is there any reason for the drop-off in speed as we moved from 6e18 to 7e18? My speed has dropped from 32e9 to 29.6e9 n/sec. The only change of variables compared to 6e18 is that I now have -unknowngap 1382. Presumably the next_prime/prev_prime calculations take marginally longer because the n surviving the sieve are larger, but this does seem like a large drop-off. Also, gap11 is marginally slower than gap10 on my machine: at the end of the test gap11 was running at 29.25e9, compared to 29.35e9 for gap10.
2017-08-31, 09:46   #135
pinhodecarlos

"Carlos Pinho"
Oct 2011
Milton Keynes, UK

2·23·107 Posts

My speed has increased in line with Antonio's figures. Try rebooting the machine and check whether any svchost.exe service is running in the background.
2017-08-31, 11:41   #136
Antonio

"Antonio Key"
Sep 2011
UK

531₁₀ Posts

Quote:
 Originally Posted by robert44444uk Is there any reason for the drop-off in speed as we moved from 6e18 to 7e18? My speed has dropped from 32e9 to 29.6e9 n/sec. The only change of variables compared to 6e18 is that I now have -unknowngap 1382. Presumably the next_prime/prev_prime calculations take marginally longer because the n surviving the sieve are larger, but this does seem like a large drop-off. Also, gap11 is marginally slower than gap10 on my machine: at the end of the test gap11 was running at 29.25e9, compared to 29.35e9 for gap10.
I don't know of any reason for a slowdown between gap10 and gap11 unless, as Carlos says, some Windows service is the cause (I had a problem with the Windows 10 update service on my laptop at one point, but it magically sorted itself out).
For testing on my i5 (Ivy Bridge) I used a batch file containing the following:
Code:
Rem Reference Code
gap10 -n1 6e18 -n2 625e16 -n 6e18 -res1 0 -res2 15 -res 0 -m1 1190 -m2 8151 -unknowngap 1382 -numcoprime 27 -sb 24 -bs 18 -t 4 -mem 12.25
ren gap_solutions.txt gap10_solutions.txt
ren gap_report.txt gap10_report.txt
Rem Test Code
gap11 -n1 6e18 -n2 625e16 -n 6e18 -res1 0 -res2 15 -res 0 -m1 1190 -m2 8151 -unknowngap 1382 -numcoprime 27 -sb 24 -bs 18 -t 4 -mem 12.25
I compared the wall time reported at the end of each run to get the reported speedup, and checked that the solutions and report files were identical.
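The identical-output check described above can also be scripted. The following is a minimal C sketch, not part of the gap source: the helper name files_identical is hypothetical, and it simply compares two files byte by byte (on Windows, `fc /b file1 file2` does the same job from a batch file).

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if the two files have identical
   contents, 0 if they differ, and -1 if either cannot be opened. */
static int files_identical(const char *path1, const char *path2)
{
    FILE *f1 = fopen(path1, "rb");
    FILE *f2 = fopen(path2, "rb");
    int result = -1;

    if (f1 && f2) {
        int c1, c2;
        /* Read in lockstep until the bytes differ or both files end. */
        do {
            c1 = getc(f1);
            c2 = getc(f2);
        } while (c1 == c2 && c1 != EOF);
        result = (c1 == c2); /* equal here only if both hit EOF together */
    }
    if (f1) fclose(f1);
    if (f2) fclose(f2);
    return result;
}
```

For example, files_identical("gap10_solutions.txt", "gap_solutions.txt") should return 1 when the two runs agree.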
Are you using the provided exe or compiling it yourself? If you are compiling it yourself, which version of gcc are you using? I'm using version 6.3.0.

2017-08-31, 13:00   #137
axn

Jun 2003

4962₁₀ Posts

Quote:
 Originally Posted by robert44444uk Is there any reason for the drop-off in speed as we moved from 6e18 to 7e18? My speed has dropped from 32e9 to 29.6e9 n/sec. The only change of variables compared to 6e18 is that I now have -unknowngap 1382
Sieving would take slightly longer since more primes are used in the sieve. Also, the program reports the cumulative speed since the start of the run (instead of incremental -- feature request?), and the initial iterations are slower, so it takes time for the speed to stabilize.
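A rough illustration of the first point, under a loudly stated assumption: the thread does not say which bound the sieve uses, so suppose, purely for illustration, that it uses every prime up to √n. The prime number theorem gives π(B) ≈ B / ln B, so:

```latex
\begin{aligned}
\pi\!\left(\sqrt{6\times10^{18}}\right) &\approx \frac{2.449\times10^{9}}{\ln(2.449\times10^{9})} \approx \frac{2.449\times10^{9}}{21.62} \approx 1.133\times10^{8} \\
\pi\!\left(\sqrt{7\times10^{18}}\right) &\approx \frac{2.646\times10^{9}}{\ln(2.646\times10^{9})} \approx \frac{2.646\times10^{9}}{21.70} \approx 1.219\times10^{8} \\
\frac{1.219\times10^{8}}{1.133\times10^{8}} &\approx 1.076
\end{aligned}
```

Under that assumption there would be roughly 7.6% more sieving primes at 7e18 than at 6e18. This only shows the direction of the effect; it does not say how much of the observed slowdown the sieve accounts for.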

2017-08-31, 13:35   #138
pinhodecarlos

"Carlos Pinho"
Oct 2011
Milton Keynes, UK

133A₁₆ Posts

Quote:
 Originally Posted by axn Sieving would take slightly longer since more primes are used in the sieve. Also, the program reports the cumulative speed since the start of the run (instead of incremental -- feature request?), and the initial iterations are slower, so it takes time for the speed to stabilize.
Speed takes a while to stabilise, at least 12 hours; at least, that's how long my computer takes to report its maximum speed.
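To illustrate why a cumulative average lags behind the current speed, here is a minimal C sketch that reports both figures. It is not the gap program's code (its reporting routine is not shown in this thread); the struct and function names are hypothetical, and time and progress are passed in as plain doubles (seconds elapsed and n values covered).

```c
/* Hypothetical progress tracker, for illustration only. */
typedef struct {
    double start_time, start_count; /* values at the start of the run */
    double last_time, last_count;   /* values at the previous report */
} rate_tracker;

static void rate_init(rate_tracker *r, double t, double count)
{
    r->start_time = r->last_time = t;
    r->start_count = r->last_count = count;
}

/* Cumulative rate: averages over the whole run, so slow early
   iterations keep pulling the figure down long after the real
   speed has settled. */
static double rate_cumulative(const rate_tracker *r, double t, double count)
{
    return (count - r->start_count) / (t - r->start_time);
}

/* Incremental rate: covers only the interval since the previous
   report, then records the current point for the next call. */
static double rate_incremental(rate_tracker *r, double t, double count)
{
    double rate = (count - r->last_count) / (t - r->last_time);
    r->last_time = t;
    r->last_count = count;
    return rate;
}
```

With one hour at 20e9 n/sec followed by one hour at 30e9 n/sec, the incremental figure after the second hour is 30e9 while the cumulative one is only 25e9, which is why a cumulative display keeps creeping upwards long after the real speed has stopped changing.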

2017-10-15, 02:03   #139
danaj

"Dana Jacobsen"
Feb 2011
Bangkok, TH

1614₈ Posts

Thanks to a friendly, cooperative 64-bit factoring competition/hackathon last week, we now have a somewhat faster mulredc (aka mont_mulmod), using some asm written by Ben Buhrow of yafu fame. I also have a tweak to the addmod asm. What is the best way to get this applied? It would be nice if the gap-finding program were on GitHub so I could submit a pull request.
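For readers unfamiliar with mulredc: Montgomery multiplication replaces the division in a*b mod n with multiplications and a shift by 2^64. The sketch below is a portable C version using GCC/Clang's unsigned __int128; it is not the asm by Dana Jacobsen and Ben Buhrow, and the function names are illustrative rather than taken from the gap source. It assumes n is odd and a, b < n. Because it forms (t - m*n)/2^64 from the high words only, it does not overflow even when n is close to 2^64.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128; /* GCC/Clang extension on 64-bit targets */

/* n^-1 mod 2^64 for odd n by Newton iteration. Starting from x = n is
   correct to 3 bits (n*n == 1 mod 8), and each step doubles that. */
static uint64_t mont_inverse(uint64_t n)
{
    assert(n & 1);
    uint64_t x = n;
    for (int i = 0; i < 5; i++) /* 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits */
        x *= 2 - n * x;
    return x;
}

/* Montgomery product: a*b*2^-64 mod n, for a, b < n and ninv = n^-1 mod 2^64. */
static uint64_t mont_mul(uint64_t a, uint64_t b, uint64_t n, uint64_t ninv)
{
    u128 t = (u128)a * b;
    uint64_t m = (uint64_t)t * ninv;                   /* m*n == t (mod 2^64) */
    uint64_t mn_hi = (uint64_t)(((u128)m * n) >> 64);
    uint64_t t_hi = (uint64_t)(t >> 64);
    uint64_t r = t_hi - mn_hi;       /* low words cancel, so this is exact */
    return t_hi < mn_hi ? r + n : r; /* both highs are < n, so one fix-up suffices */
}

/* Convert into Montgomery form: a*2^64 mod n. */
static uint64_t to_mont(uint64_t a, uint64_t n)
{
    return (uint64_t)(((u128)a << 64) % n);
}

/* Convert back out: multiplying by 1 strips the factor 2^64. */
static uint64_t from_mont(uint64_t a, uint64_t n, uint64_t ninv)
{
    return mont_mul(a, 1, n, ninv);
}
```

Values are converted with to_mont once, multiplied many times with mont_mul, and converted back with from_mont, so the conversion cost is amortised over a whole loop such as a modular exponentiation.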
2017-10-15, 06:01   #140
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

2×733 Posts

Quote:
 Originally Posted by danaj Thanks to a friendly, cooperative 64-bit factoring competition/hackathon last week, we now have a somewhat faster mulredc (aka mont_mulmod), using some asm written by Ben Buhrow of yafu fame. I also have a tweak to the addmod asm. What is the best way to get this applied? It would be nice if the gap-finding program were on GitHub so I could submit a pull request.
The first option is simply to modify the source and post the code here, but you can also put it on GitHub; you have my permission.

Since this looks like a purely arithmetic speedup, a full test shouldn't be necessary, but I don't like untested code, so please test it and compare the results with, say, gap10.c or even gap11.c [Antonio tested these two, so I've accepted them]. Your code should also handle full 64-bit n values, up to n < 2^64 - 2^32 [in the latest range, [9.25e18, 10e18], we are already in that area].
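Why full 64-bit n matters for addmod: 2^63 ≈ 9.223e18, so every n in [9.25e18, 10e18] has its top bit set, and a + b for a, b < n can exceed 2^64 and wrap around. A minimal C sketch of an overflow-safe addmod, assuming a, b < n (addmod64 and addmod64_naive are hypothetical names, not functions from the gap source):

```c
#include <assert.h>
#include <stdint.h>

/* (a + b) mod n for a, b < n, valid for every n < 2^64. Instead of
   forming a + b (which can wrap once n > 2^63), compare a with n - b:
   a >= n - b exactly when a + b >= n. */
static uint64_t addmod64(uint64_t a, uint64_t b, uint64_t n)
{
    assert(a < n && b < n);
    return (a >= n - b) ? a - (n - b) : a + b;
}

/* Naive version, shown only for contrast: wrong whenever a + b
   overflows 64 bits. */
static uint64_t addmod64_naive(uint64_t a, uint64_t b, uint64_t n)
{
    return (a + b) % n;
}
```

With n = 9500000000000000001, a = n - 1 and b = n - 2, the true sum 2n - 3 exceeds 2^64, so the naive version returns a wrong residue while addmod64 returns n - 3.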

2017-10-18, 04:16   #141
Antonio

"Antonio Key"
Sep 2011
UK

3²×59 Posts

gap12 code is now available:
Code:
// version 1.06		Speed up in assembly routines		  Dana Jacobsen
//			mulmod & addmod by Dana Jacobsen
//			mont_prod64 asm thanks to Ben Buhrow
//			Replace in-code tests for AVX2 use with	  Antonio Key
//			conditional compile directives
//			Cosmetic change - Now displays upper and lower bounds
//			of n for the current test, rather than just lower bound.
The changes give around a 1.5% speed boost on my desktop i5-3570K (Ivy Bridge, no AVX2) and around 5.5% on my laptop i7-6700HQ (Skylake, AVX2). The latter figure may not be accurate because of thermal limiting, but Dana saw roughly the same gain in his results (see the hardware thread).
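As a sketch of what replacing in-code AVX2 tests with conditional compilation can look like (this is not the gap12 code; or_words and simd_path are hypothetical names): GCC and Clang define __AVX2__ when AVX2 code generation is enabled, for example with -mavx2, or with -march=native on a CPU that supports it, so the choice is fixed when the program is built rather than tested at run time.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* Hypothetical helper: dst[i] |= src[i] for n 64-bit words
   (e.g. merging two sieve bitmaps). */
static void or_words(uint64_t *dst, const uint64_t *src, size_t n)
{
    size_t i = 0;
#ifdef __AVX2__
    /* AVX2 build: process four 64-bit words per 256-bit operation. */
    for (; i + 4 <= n; i += 4) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(dst + i));
        __m256i b = _mm256_loadu_si256((const __m256i *)(src + i));
        _mm256_storeu_si256((__m256i *)(dst + i), _mm256_or_si256(a, b));
    }
#endif
    /* Scalar tail, or the whole array when AVX2 is not compiled in. */
    for (; i < n; i++)
        dst[i] |= src[i];
}

/* Reports which path this binary was built with. */
static const char *simd_path(void)
{
#ifdef __AVX2__
    return "AVX2";
#else
    return "scalar";
#endif
}
```

The trade-off is that an executable built with AVX2 enabled will typically crash with an illegal-instruction fault on a CPU without AVX2, such as the Ivy Bridge i5 above, so a non-AVX2 build is still needed for such machines.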

The usual Windows executables are provided; the required DLL is available in post #133.
Attached Files
 gap12_code.7z (18.6 KB)
 gap12_exe.7z (285.3 KB)

2017-10-18, 20:29   #142
pinhodecarlos

"Carlos Pinho"
Oct 2011
Milton Keynes, UK

2×23×107 Posts

Thank you, Antonio. I upgraded the client this morning, but the speed is still stabilising, so I'm waiting to see the claimed improvement. Nevertheless, good stuff.
2017-10-19, 07:29   #143
R. Gerbicz

"Robert Gerbicz"
Oct 2005
Hungary

10110111010₂ Posts

Quote:
 Originally Posted by Antonio gap12 code now available: [..] The changes give around 1.5% speed boost on my desktop i5-3570k (Ivybridge, no AVX2), and around 5.5% speed boost on my laptop i7-6700HQ (Skylake, AVX2), the latter result may not be accurate due to thermal limiting, but Dana was showing around the same gain in his results (see the hardware thread).
Thanks for your and Dana's efforts!

