mersenneforum.org I want to factorize 2^1277-1
 Register FAQ Search Today's Posts Mark Forums Read

2022-10-11, 17:41   #78
Mysticial

Sep 2016

1011100102 Posts

Quote:
 Originally Posted by kriesel P-1 factoring on F33 is a 2-years-long undertaking in Mlucas on good hardware, as Ernst has been doing it. Generally a primality test is dozens of times as long as a reasonable bounds P-1 attempt. A Pepin test on F33 is probably a several decades long undertaking on any CPUs reasonably available. Are you prepared to alter Gpuowl to perform a Pepin test, on fft lengths ~512M not yet implemented in gpuowl, on F33 and rent a high end GPU (recent Tesla or AMD Instinct) for some years? I estimate a billion-digit Mersenne PRP run would take ~6 years in gpuowl on a radeon VII, and primality tests scale as p2.1, so ~6 * (233/3321928171)2.1 ~ 44 years. An A100 may cut that to ~12-15 years. During that time faster GPUs will appear, that could shorten it somewhat.
What's the current state-of-the-art iteration time on F33?

I think this is well into the region where NTTs are gonna win simply based on memory usage. And a power-of-two convolution length does not require any irrational base shenanigans.

 2022-10-11, 22:23 #79 kriesel     "TF79LL86GIMPS96gpu17" Mar 2017 US midwest 2×29×127 Posts Not sure what the minimim feasible timing is for current hardware under US$20k system cost. Or how a block of time using a supercomputer might change things, or what that would cost. But here are some datapoints for disparate affordable hardware, to place an upper bound on feasible run time. Ernst probably has a better idea of what's currently possible. Maybe Drkirkby's dual-28-core system can do better than any of the CPU data below. Mlucas.cfg excerpt for dual Xeon E5-2697V2, in ubuntu atop WSL, system cost me ~$1.5K years ago: Code:  491520 msec/iter = 1452.54 ROE[avg,max] = [0.248991718, 0.281250000] radices = 960 16 16 32 32 0 0 0 0 0 524288 msec/iter = 1597.16 ROE[avg,max] = [0.248288750, 0.312500000] radices = 1024 16 16 32 32 0 0 0 0 0 #above were done with 24 cores, 0:23 At 2^33 iterations, a Pepin test of F33 on 480M - 512M fft would be ~400. to 436. years on that. If scaling by Geekbench 3 64 bit multicore can be trusted, e5-2697v2 52150 vs e5-2698v4 73848 implies ~308. years on a dual e5-2698V4. https://cpu-benchmark.org/cpu/intel-xeon-e5-2697-v2/ https://cpu-benchmark.org/cpu/intel-xeon-e5-2698-v4/ See also end of https://www.mersenneforum.org/showpo...6&postcount=14 for timing on a tenth generation Intel laptop, ~$600, a millenium plus to run. Comparing via CPUID CPU-Z benchmark, an AMD Ryzen 5950x might be around 120 years. From https://www.mersenneforum.org/mayer/README.html, a Xeon Phi 7250 can do ~552.5msec/iter at 512M, extrapolating to ~150.5 years for F33 Pepin test. (At one time that would have been a$5K system.) In 2017, at https://mersenneforum.org/showpost.p...&postcount=178 Ernst wrote of a 32-core Xeon that he estimated could do a gigadigit mersenne primality test in ~20 years. F33 is considerably longer; ~2.5^2.1 ~6.85 times as long, 137. years. 8-core Ryzen: ~58 years for gigadigit Mersenne; extrapolates to ~397. years for F33. https://mersenneforum.org/showpost.p...&postcount=182 Looking a bit beyond the current state of the art: To attempt F33 with gpuowl, it would need to be extended to handle >32 bit exponents, add 512M ffts, and implement the Pepin test. Looking at it the other way around, to get F33 PRP run time under 10 years, would require an iteration time < 10 * 365 * 24 * 3600 / 2^33 ~ 0.0367 seconds; 36.7msec.
 2022-10-11, 23:31 #80 Mysticial     Sep 2016 2×5×37 Posts The best implementation I have needs about ~0.65 seconds to do a ~2^33-bit multiply convolution on my 7950X with memory @ 4400. Squaring would certainly be faster, though I don't have benchmarks for that. Either way, still way too slow. Not an apples-to-apples comparison anyway since it can't do exactly 2^33-bit convolution length, but it can probably be modified to match it exactly with no overhead to avoid irrational base weights. And yes, it is 100% memory-bound.
 2022-10-11, 23:42 #81 Mysticial     Sep 2016 2·5·37 Posts If we want to look at theoreticals, a 2^33-bit convolution using the optimal NTT method will need ~4GB of ram to hold the transform data. Assuming standard 2-pass approach will full pass-merging on both ends, each iteration will require reading and writing 8 GB of memory. Zen4 with 6000 memory seems to be around 70 GB copy bandwidth. So we can put a lower-bound of ~120ms using a hypothetical memory-optimal implementation. (which goes to show how far off from optimal my own implementation still is) Last fiddled with by Mysticial on 2022-10-11 at 23:47
2022-10-11, 23:54   #82
chalsall
If I May

"Chris Halsall"
Sep 2002

2·72·113 Posts

Quote:
 Originally Posted by Mysticial Assuming standard 2-pass approach will full pass-merging on both ends, each iteration will require reading and writing 8 GB of memory.
Ouch! I'm constrained to 8 KB of D for one of my code paths. Also another 8 KB for I (read: code). Great fun! 8^)

 2022-10-12, 00:04 #83 Mysticial     Sep 2016 2·5·37 Posts Actually I lied. It's not 4GB to hold the transform data, it's just 2GB. I forgot the factor of two from the wrap-around. So it's read+write 4GB of memory per iteration, or about ~60ms lower-bound on a Zen4 box.
2022-10-12, 00:46   #84
chalsall
If I May

"Chris Halsall"
Sep 2002

2B4216 Posts

Quote:
 Originally Posted by Mysticial Actually I lied.
Please forgive me for this. I can get a little intense sometimes...

You didn't lie. You made a mistake. Mistakes happen. It is how we learn.

2022-10-13, 14:23   #85
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

163068 Posts
dept. of corrections

Oops. (Case of mistaken identity.)
Quote:
 Originally Posted by kriesel See also end of https://www.mersenneforum.org/showpo...6&postcount=14 for timing on a tenth eleventh generation Intel laptop (i7-1165G7) , ~$600$900, a millenium plus to run. Comparing via CPUID CPU-Z benchmark, an AMD Ryzen 5950x might be around 120 238. years.

 2022-10-14, 17:13 #86 Stargate38     "Daniel Jackson" May 2011 14285714285714285714 3·251 Posts In other words, it would require a large cluster of PCs to get this factored in a reasonable time, say <10 years.
 2022-10-14, 21:41 #87 jasonp Tribal Bullet     Oct 2004 32·5·79 Posts Ancient history trying P-1 on F31/F33 (start at post #13)
 2022-12-05, 10:52 #88 tanaydin   Oct 2016 32 Posts A C code for educational purposes Hey there, I've written a C code that uses GMP to check if there is any factor less than sqrt of a Mersenne number. It's using trial division and couldn't finish checking 2^127-1 in two days so it is SLOW! :) But maybe someone else needs a similar thing or can find it useful for his/her needs. You can run it after compile like ./jumper 47 and it will check 2^47-1 I've named it "jumper", because it's jumping -2*p after founds a good point to start. Any comment/commit is welcome, C is not a language that I've experienced so maybe there could be improvements. https://github.com/tanaydin/jumper/ Last fiddled with by tanaydin on 2022-12-05 at 10:53

 Similar Threads Thread Thread Starter Forum Replies Last Post Godzilla Miscellaneous Math 28 2017-10-31 18:14 aaa120 Factoring 14 2008-12-07 13:14

All times are UTC. The time now is 20:19.

Wed Feb 1 20:19:57 UTC 2023 up 167 days, 17:48, 0 users, load averages: 1.19, 1.12, 1.02