![]() |
![]() |
#78 | |
Sep 2016
1011100102 Posts |
![]() Quote:
I think this is well into the region where NTTs are gonna win simply based on memory usage. And a power-of-two convolution length does not require any irrational base shenanigans. |
|
![]() |
![]() |
![]() |
#79 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×29×127 Posts |
![]()
Not sure what the minimim feasible timing is for current hardware under US$20k system cost.
Or how a block of time using a supercomputer might change things, or what that would cost. But here are some datapoints for disparate affordable hardware, to place an upper bound on feasible run time. Ernst probably has a better idea of what's currently possible. Maybe Drkirkby's dual-28-core system can do better than any of the CPU data below. Mlucas.cfg excerpt for dual Xeon E5-2697V2, in ubuntu atop WSL, system cost me ~$1.5K years ago: Code:
491520 msec/iter = 1452.54 ROE[avg,max] = [0.248991718, 0.281250000] radices = 960 16 16 32 32 0 0 0 0 0 524288 msec/iter = 1597.16 ROE[avg,max] = [0.248288750, 0.312500000] radices = 1024 16 16 32 32 0 0 0 0 0 #above were done with 24 cores, 0:23 If scaling by Geekbench 3 64 bit multicore can be trusted, e5-2697v2 52150 vs e5-2698v4 73848 implies ~308. years on a dual e5-2698V4. https://cpu-benchmark.org/cpu/intel-xeon-e5-2697-v2/ https://cpu-benchmark.org/cpu/intel-xeon-e5-2698-v4/ See also end of https://www.mersenneforum.org/showpo...6&postcount=14 for timing on a tenth generation Intel laptop, ~$600, a millenium plus to run. Comparing via CPUID CPU-Z benchmark, an AMD Ryzen 5950x might be around 120 years. From https://www.mersenneforum.org/mayer/README.html, a Xeon Phi 7250 can do ~552.5msec/iter at 512M, extrapolating to ~150.5 years for F33 Pepin test. (At one time that would have been a $5K system.) In 2017, at https://mersenneforum.org/showpost.p...&postcount=178 Ernst wrote of a 32-core Xeon that he estimated could do a gigadigit mersenne primality test in ~20 years. F33 is considerably longer; ~2.5^2.1 ~6.85 times as long, 137. years. 8-core Ryzen: ~58 years for gigadigit Mersenne; extrapolates to ~397. years for F33. https://mersenneforum.org/showpost.p...&postcount=182 Looking a bit beyond the current state of the art: To attempt F33 with gpuowl, it would need to be extended to handle >32 bit exponents, add 512M ffts, and implement the Pepin test. Looking at it the other way around, to get F33 PRP run time under 10 years, would require an iteration time < 10 * 365 * 24 * 3600 / 2^33 ~ 0.0367 seconds; 36.7msec. |
![]() |
![]() |
![]() |
#80 |
Sep 2016
2×5×37 Posts |
![]()
The best implementation I have needs about ~0.65 seconds to do a ~2^33-bit multiply convolution on my 7950X with memory @ 4400.
Squaring would certainly be faster, though I don't have benchmarks for that. Either way, still way too slow. Not an apples-to-apples comparison anyway since it can't do exactly 2^33-bit convolution length, but it can probably be modified to match it exactly with no overhead to avoid irrational base weights. And yes, it is 100% memory-bound. |
![]() |
![]() |
![]() |
#81 |
Sep 2016
2·5·37 Posts |
![]()
If we want to look at theoreticals, a 2^33-bit convolution using the optimal NTT method will need ~4GB of ram to hold the transform data.
Assuming standard 2-pass approach will full pass-merging on both ends, each iteration will require reading and writing 8 GB of memory. Zen4 with 6000 memory seems to be around 70 GB copy bandwidth. So we can put a lower-bound of ~120ms using a hypothetical memory-optimal implementation. (which goes to show how far off from optimal my own implementation still is) Last fiddled with by Mysticial on 2022-10-11 at 23:47 |
![]() |
![]() |
![]() |
#82 |
If I May
"Chris Halsall"
Sep 2002
Barbados
2·72·113 Posts |
![]() |
![]() |
![]() |
![]() |
#83 |
Sep 2016
2·5·37 Posts |
![]()
Actually I lied. It's not 4GB to hold the transform data, it's just 2GB. I forgot the factor of two from the wrap-around.
So it's read+write 4GB of memory per iteration, or about ~60ms lower-bound on a Zen4 box. |
![]() |
![]() |
![]() |
#84 |
If I May
"Chris Halsall"
Sep 2002
Barbados
2B4216 Posts |
![]() |
![]() |
![]() |
![]() |
#85 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
163068 Posts |
![]()
Oops. (Case of mistaken identity.)
Quote:
|
|
![]() |
![]() |
![]() |
#86 |
"Daniel Jackson"
May 2011
14285714285714285714
3·251 Posts |
![]()
In other words, it would require a large cluster of PCs to get this factored in a reasonable time, say <10 years.
|
![]() |
![]() |
![]() |
#87 |
Tribal Bullet
Oct 2004
32·5·79 Posts |
![]()
Ancient history trying P-1 on F31/F33 (start at post #13)
|
![]() |
![]() |
![]() |
#88 |
Oct 2016
32 Posts |
![]()
Hey there,
I've written a C code that uses GMP to check if there is any factor less than sqrt of a Mersenne number. It's using trial division and couldn't finish checking 2^127-1 in two days so it is SLOW! :) But maybe someone else needs a similar thing or can find it useful for his/her needs. You can run it after compile like ./jumper 47 and it will check 2^47-1 I've named it "jumper", because it's jumping -2*p after founds a good point to start. Any comment/commit is welcome, C is not a language that I've experienced so maybe there could be improvements. https://github.com/tanaydin/jumper/ Last fiddled with by tanaydin on 2022-12-05 at 10:53 |
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Do you know this method to factorize? | Godzilla | Miscellaneous Math | 28 | 2017-10-31 18:14 |
mathematica7.0 can easily factorize 10^67+1111 | aaa120 | Factoring | 14 | 2008-12-07 13:14 |