20221104, 16:18  #1 
Nov 2022
2 Posts 
Java modPow slower than pow following by mod
testing how long it would take to raise 3 to a power x mod 2^p  1.
I tried 2^p  1 where p = the highest known mersenne prime 2^82589933 because of Java the exponent must be int so less than 2^32  1 I chose 2^26 so I try something like Int a = 26; int n = 82589933 int j =0; BigInteger b = new BigInteger(“2”); BigInteger c = new BigInteger (“3”); BigInteger d= new BigInteger (“0”); BigInteger e= new BigInteger (“0”); BigInteger p= new BigInteger (“0”); BigInteger large = new BigInteger (“0”); BigInteger test= new BigInteger(“0”); d=b.pow(a); j = d.intVal(); // exponent must be int e=b.pow(n); // p=e.subtract(BigInteger.ONE); //prime large = c.pow(j); // 3^(2^26); test = large.mod(p); … this takes about a minute and a half to run then I try test = c.modPow(d,p) and I give up after an hour I would feel better if they took roughly the same time. 
20221104, 18:20  #2 
"Mark"
Apr 2003
Between here and the
17·401 Posts 
2^p  1 for your p is a very large number at over 2 million digits. Of course it will take a long time to compute.

20221104, 19:15  #3 
Nov 2022
2 Posts 
but. … it didn’t take a long time .
this code is 2 ^ 82539833 a min and forty seconds e=b.pow(n); // p=e.subtract(BigInteger.ONE); //prime It’s the modPow that is hours+ 
20221104, 19:26  #4  
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7,103 Posts 
Quote:
The highest known mersenne prime is not "2^82589933" nor is its exponent. 2^{26} = 67108864. Reached in 26 squarings. 2^82589933 is hugely bigger. Reached in 82,589,933 squarings. Try working your way up gradually, rather than in one giant leap, & include accurate timing in the program, to determine run time scaling in acceptable testing time. For example p= 13 127 1279 11213 110503 graph resulting timings log/log & regression fit a power function to the timing data for which run time is a function of exponent. (Very small values may have initialization & other overhead greater than computation time, and should be excluded from the power fit.) Well written, tuned and optimized compiled GPU code on a fast GPU (gpuowl 6.11380 on Radeon VII) takes ~10 hours for PRP3 computation on M82589933, and scales as good fft implementations do ~p^{2} log p log log p ~p^{2.1}. Higher powers are bad. Long multiplication algorithm scales as ~p^{3}. Naive implementations have been posted, whose run time scalings indicate very long wavefront timings, in one case longer than the estimated age of the universe. Last fiddled with by kriesel on 20221104 at 19:43 

20221104, 19:31  #5 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
7,103 Posts 

20221104, 19:31  #6 
If I May
"Chris Halsall"
Sep 2002
Barbados
2A6E_{16} Posts 

20221105, 11:22  #7  
"Tilman Neumann"
Jan 2016
Germany
2×7×37 Posts 
Quote:
I extended your test program to a broader class of inputs. The (appended) program computes c^d mod p in both ways, where * a = 2,3,4,5,... * c is a random number in [3, 1023] * d is a random number about the size of 2^a, * and p is a random number about the size of 2^(2^a) The results confirm the issue for random numbers of this kind. (see below) Looking at BigInteger.modPow() source code, it looks quite sophisticated, using Montgomery squaring and multiplication and even tearing the modulus into an "odd part" and power of two, and finally combining thew results using the chinese remainder theorem. But something the developers must have gotten terribly wrong. Probably this is not the main cause, but one thing that irritated me for a long time is the following source code comment on BigInteger.montgomeryMultiply(): // Montgomery multiplication. These are wrappers for // implMontgomeryXX routines which are expected to be replaced by // virtual machine intrinsics. We don't use the intrinsics for // very large operands: MONTGOMERY_INTRINSIC_THRESHOLD should be // larger than any reasonable crypto key. MONTGOMERY_INTRINSIC_THRESHOLD is 512 and measured in ints; so the intrinsics (optimized assembler code) is not used for modulus > 16384 bit. This suggests that the method was never supposed to be optimized for large inputs. Possibly it was only tested for the input sizes required by Java's own crypto libraries. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> test logs (getting interesting at a > 14) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a = 2, c = 310, d = 7, computed in 0 ms  p (8 bit) computed in 1 ms r = c^d (58 bit) computed in 0 ms r % p (7 bit) computed in 0 ms c.modPow(d,p) (7 bit) computed in 0 ms results equal? true a = 3, c = 36, d = 22, computed in 0 ms  p (24 bit) computed in 0 ms r = c^d (114 bit) computed in 0 ms r % p (24 bit) computed in 0 ms c.modPow(d,p) (24 bit) computed in 0 ms results equal? true a = 4, c = 388, d = 24, computed in 0 ms  p (26 bit) computed in 0 ms r = c^d (207 bit) computed in 0 ms r % p (26 bit) computed in 0 ms c.modPow(d,p) (26 bit) computed in 1 ms results equal? true a = 5, c = 379, d = 61, computed in 0 ms  p (62 bit) computed in 0 ms r = c^d (523 bit) computed in 0 ms r % p (60 bit) computed in 0 ms c.modPow(d,p) (60 bit) computed in 0 ms results equal? true a = 6, c = 820, d = 183, computed in 0 ms  p (184 bit) computed in 0 ms r = c^d (1772 bit) computed in 0 ms r % p (183 bit) computed in 0 ms c.modPow(d,p) (183 bit) computed in 0 ms results equal? true a = 7, c = 174, d = 201, computed in 0 ms  p (203 bit) computed in 0 ms r = c^d (1497 bit) computed in 0 ms r % p (203 bit) computed in 1 ms c.modPow(d,p) (203 bit) computed in 0 ms results equal? true a = 8, c = 581, d = 345, computed in 0 ms  p (347 bit) computed in 0 ms r = c^d (3168 bit) computed in 0 ms r % p (347 bit) computed in 0 ms c.modPow(d,p) (347 bit) computed in 0 ms results equal? true a = 9, c = 118, d = 1159, computed in 0 ms  p (1161 bit) computed in 0 ms r = c^d (7977 bit) computed in 1 ms r % p (1159 bit) computed in 0 ms c.modPow(d,p) (1159 bit) computed in 0 ms results equal? true a = 10, c = 181, d = 2157, computed in 0 ms  p (2158 bit) computed in 1 ms r = c^d (16178 bit) computed in 1 ms r % p (2158 bit) computed in 0 ms c.modPow(d,p) (2158 bit) computed in 1 ms results equal? true a = 11, c = 490, d = 3648, computed in 0 ms  p (3642 bit) computed in 0 ms r = c^d (32601 bit) computed in 3 ms r % p (3642 bit) computed in 2 ms c.modPow(d,p) (3642 bit) computed in 0 ms results equal? true a = 12, c = 983, d = 11308, computed in 0 ms  p (11308 bit) computed in 1 ms r = c^d (112414 bit) computed in 10 ms r % p (11305 bit) computed in 8 ms c.modPow(d,p) (11305 bit) computed in 3 ms results equal? true a = 13, c = 871, d = 24939, computed in 0 ms  p (24941 bit) computed in 0 ms r = c^d (243568 bit) computed in 11 ms r % p (24941 bit) computed in 9 ms c.modPow(d,p) (24941 bit) computed in 21 ms results equal? true a = 14, c = 331, d = 24853, computed in 0 ms  p (24854 bit) computed in 0 ms r = c^d (208037 bit) computed in 6 ms r % p (24854 bit) computed in 6 ms c.modPow(d,p) (24854 bit) computed in 18 ms results equal? true a = 15, c = 714, d = 76517, computed in 0 ms  p (76519 bit) computed in 1 ms r = c^d (725365 bit) computed in 31 ms r % p (76519 bit) computed in 34 ms c.modPow(d,p) (76519 bit) computed in 154 ms results equal? true a = 16, c = 803, d = 143429, computed in 0 ms  p (143430 bit) computed in 1 ms r = c^d (1383984 bit) computed in 104 ms r % p (143429 bit) computed in 87 ms c.modPow(d,p) (143429 bit) computed in 349 ms results equal? true a = 17, c = 520, d = 380768, computed in 0 ms  p (380770 bit) computed in 2 ms r = c^d (3435429 bit) computed in 78 ms r % p (380770 bit) computed in 146 ms c.modPow(d,p) (380770 bit) computed in 2583 ms results equal? true a = 18, c = 792, d = 578458, computed in 0 ms  p (578460 bit) computed in 0 ms r = c^d (5570179 bit) computed in 321 ms r % p (578460 bit) computed in 293 ms c.modPow(d,p) (578460 bit) computed in 6322 ms results equal? true a = 19, c = 370, d = 708558, computed in 0 ms  p (708559 bit) computed in 0 ms r = c^d (6044979 bit) computed in 309 ms r % p (708556 bit) computed in 330 ms c.modPow(d,p) (708556 bit) computed in 9976 ms results equal? true a = 20, c = 888, d = 2737668, computed in 0 ms  p (2737669 bit) computed in 1 ms r = c^d (26813859 bit) computed in 1849 ms r % p (2737667 bit) computed in 2863 ms c.modPow(d,p) (2737667 bit) computed in 148685 ms results equal? true a = 21, c = 126, d = 3460325, computed in 0 ms  p (3460327 bit) computed in 1 ms r = c^d (24143657 bit) computed in 1902 ms r % p (3460323 bit) computed in 2721 ms c.modPow(d,p) (3460323 bit) computed in 260967 ms results equal? true a = 22, c = 621, d = 10025779, computed in 0 ms  p (10025777 bit) computed in 3 ms r = c^d (93023684 bit) computed in 19541 ms r % p (10025777 bit) computed in 19307 ms c.modPow(d,p) (10025777 bit) computed in 2299139 ms results equal? true Last fiddled with by Till on 20221105 at 11:25 Reason: interesting tests at a>14 

20221105, 17:24  #8 
"Tilman Neumann"
Jan 2016
Germany
2·7·37 Posts 
After some more investigations, it seems to me that Montgomery multiplication is a bad choice for this kind of numbers (small base, big exponent, big modulus).
Let B be the base, E the exponent, N the modulus. With Montgomery multiplication, modPow() does something like O(log E) multiplications/squares (and possibly modulus computations) of Nsize numbers. If B^E is not much bigger than N, then a small fraction of those required Montgomery mults/squares are enough to match the runtime of the "silly" B^E computation. Doing "naive" repeated squaring (without Montgomery) might be even better because it starts with very small products/squares, whose size will surpass the modulus in only a few last operations. Last fiddled with by Till on 20221105 at 17:27 Reason: clean up 
Thread Tools  
Similar Threads  
Thread  Thread Starter  Forum  Replies  Last Post 
Java applet alternative  a1call  Programming  19  20191108 22:31 
Simple Atkin sieve java example  lucaf  Computer Science & Computational Number Theory  8  20171006 20:56 
Java Quadratic Sieve  Ilya Gazman  Factoring  3  20160222 11:32 
Java based program  GordonBM  Information & Answers  11  20120424 08:27 
Understandable QS java Implementation  ThiloHarich  Factoring  41  20051128 21:18 