20210907, 09:06  #23  
"Dana Jacobsen"
Feb 2011
Bangkok, TH
909_{10} Posts 
Quote:
Quote:
64bit: 64bit next_prime: https://github.com/danaj/MathPrime...b8/util.c#L127 64bit prev_prime: https://github.com/danaj/MathPrime...b8/util.c#L155 The algorithm is: 1. If the value is tiny, use a fixed data set. 2. If the value is in our small saved sieve, use that. 3. Skip forward in a mod30 wheel doing a primality test. Steps 1 and 2 are useful for the many calls done with small values. The primality test: https://github.com/danaj/MathPrime...mality.c#L1274 That's hardcoded tiny trial division followed by "AES" BPSW, which is deterministic for 64bit inputs. The MR base 2 test followed by a fast Lucas test. On an x86 for native inputs, there seems to be no benefit in running a Fermat test rather than a MillerRabin test. For that matter, even with GMP I wasn't able to measure any advantage. For the full test, a hashed MillerRabin set (2 or 3 total tests needed) is slightly faster for base2 pseudoprimes, but not obviously faster for primes. On platforms without fast Montgomery math, it may be different. Larger than 64bit: next_prime: https://github.com/danaj/MathPrime...mp_main.c#L332 prev_prime: https://github.com/danaj/MathPrime...mp_main.c#L360 surround_primes: https://github.com/danaj/MathPrime...mp_main.c#L389 For next and prev prime, with values less than 121 bits we use a very similar simple algorithm. Skip forward using a mod30 wheel, do trivial trial division, then call the primality test. primality test: https://github.com/danaj/MathPrime...mality.c#L1471 which is pretests (trial division), then ES BPSW (base2 MillerRabin then extrastrong Lucas test). pretests: https://github.com/danaj/MathPrime...mp_main.c#L108 which is overly complicated amounts of trial division. This includes cached large GCDs (because GMP does those very quickly), and also a choice between simple trivial division and a treesieve routine (D Bernstein describes the concept at a high level, Jens K Andersen has a nice detailed description of the concept, my implementation may or may not be particularly efficient, but it's much faster than oneatatime trial division for large enough inputs). This leaves numbers larger than 120 bits, as well as surround_primes, which is meant for prime gaps. Here we do a partial sieve with a width of about 30 merits and a convoluted "educated guess" as to a depth that will give us the best performance. After the sieve, the ES BPSW test is done on candidates in order. It repeats the process if no result is found of course, though at 30 merits this isn't common. surround_primes sieves 20 merits on either side of the given input, then tests candidates. If both candidates are not found, it repeats with 40, 80 merits, etc. to find whichever one (or both) haven't been found. I have had on my todo list for a long time, going over the method in https://arxiv.org/abs/2012.03771, which is claimed to be faster. 

20210907, 22:10  #24  
"Seth"
Apr 2019
3^{2}×41 Posts 
Quote:
Nothing in it helps with very small primorials or single calls to surround_primes. But it has a number of useful speedups for running many sequential surround_primes. I did recently add a GPU tester to my repository that uses CGBN which I've found to be very useful for doing math on 2562048 bit numbers. the CGBN repository had a prewritten millerrabin implementation I made use, on my 1080ti it's much faster using all the 12 cores of my Ryzen 3900x 

20210908, 06:35  #25  
Mar 2021
2×3^{3} Posts 
Quote:
Quote:
Code:
(x1 x2 x3) * (y1 y2 y3) = x1y1 x1y2+x2y1 x1y3+x2y2+x3y1 x2y3+x3y2 x3y3 Code:
(1 x2 x3) * (1 y2 y3) = 1 x2+y2 x2y2+x3+y3 x2y3+x3y2 x3y3 

20210908, 17:13  #26 
"Seth"
Apr 2019
3^{2}·41 Posts 
CGBN doesn't have BITS+1 optimizations or and I doubt the author will want to add them. It's much more oriented around large input numbers.

20210909, 07:37  #27  
Jan 2018
111101_{2} Posts 
I have not been looking at the forum too regularly lately, lots of posts:
Quote:
@Dana/@Craig I have a question regarding the bignum library in C++ under windows, let me elaborate a little first: The fastest library for bignum calculations is GMP and GMP is native under Unix/Linux, but not under windows. Perl has a GMP library incorporated in the strawberry core files. So under Perl it is relatively easy to call on GMP functions. C++ itself does not seem to have a bignum library incorporated (that I could find, maybe I did not search in the right place?). @Craig: how did you solve this? What bignum library are you using in cuda C++? If one would like to use GMP under C++ windows, google will show some links about MinGw and using GMP from that instance. Does anybody have any experience with setting that up? Help would be appreciated! Kind regards Michiel 

20210909, 17:44  #28 
Mar 2021
2·3^{3} Posts 
I use GMP for my CPU code but I've only used linux. I have my laptop set up so I can boot into either linux or Windows. I'm not sure if that will work for you.
For the GPU stuff I have written all my own bignum routines. It looks like CGBN is a good option now but it didn't exist when I started. I haven't tried it yet. There might be some instances where a specialized routine will be faster (e.g. 65bit numbers). 
20210909, 17:49  #29  
Jun 2003
Oxford, UK
19×103 Posts 
Quote:


20210909, 21:14  #30  
Jan 2018
75_{8} Posts 
Thnx Graig, I was wondering, after installing Strawberry Perl, a gmp.h file is installed at the laptop in the directory C:\Strawberry\c\include. The C++ code seems to accept a reference to gmp.h (include gmp.h). Could this work? I will try later with some test code, but what are your thoughts? Worth a try? Or does the GPU not work welll with GMP in your experience?
I spotted this thread online: https://stackoverflow.com/questions/...nggmplibrary If GMP and Cuda do not mix, I am interested in your homebrew code ;) I found some references to other libraries for GPU: CUMP https://github.com/skystar0227/CUMP XMP https://github.com/NVlabs/xmp Campary https://homepages.laas.fr/mmjoldes/campary/ Any thoughts on those? And an article regarding GMP implementation: http://individual.utoronto.ca/haojun...724_Report.pdf I get a feeling programming for a GPU will require a lot of programming and research ... Kind regards Michiel Quote:


20210910, 06:39  #31 
Mar 2021
54_{10} Posts 
I think you will be better off with a library that is designed to work well with the GPU architecture instead of trying to run GMP in the GPU.
CUMP looks like it is for floating point numbers. XMP looks like an old version of CGBN (XMP 2.0) Campary doesn't have any documentation that I could find. I would start with CGBN. It is written by NVIDIA and well tested. It will handle larger numbers than my code at the moment and it is a more complete library. You will probably still want GMP for your CPU code. I haven't used MinGW or run GMP/CUDA on Windows so I can't help there. 
20210910, 06:47  #32  
Mar 2021
54_{10} Posts 
Quote:
It says CGBN currently requires 4, 8, 16 or 32 thread per CGBN group. What is a CGBN group? Is it a single bignum? Also, the size must be evenly divisible by 32. Does this mean it won't work for a 65 bit number? Or can you treat it as a 96 bit number with leading 0s and everything will work correctly? 

20210910, 07:46  #33  
"Seth"
Apr 2019
369_{10} Posts 
Quote:
IMO CGBN isn't going to be of much use for 65 bit numbers, maybe 96 and defiantly 128 bit numbers but not 65 bits. 

Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
HTML5 prime number searching?  Dale Mahalko  Software  7  20150321 19:55 
Elementary Literature on probabilistic aspects of prime searching  hhh  Math  7  20140318 14:37 
Question about multicore prime searching.  Trilo  Riesel Prime Search  33  20130919 21:21 
idea for twin prime searching  MiniGeek  Twin Prime Search  8  20110130 00:18 
Searching for user BlueSoda (no its not a prime)  ltd  Prime Sierpinski Project  3  20060323 19:27 