![]() |
![]() |
#1 |
"Oliver"
Sep 2017
Porta Westfalica, DE
7·223 Posts |
![]()
When experimenting with stage1_norm, it occured to me that stricter bounds will miss results that would be found with less strict bounds. Very loose bounds on the other head miss a lot results, too. I ran polynomial selection on a GPU on a C168 and had this as the first coefficient to search above 100k for different maximum stage 1 norms:
Code:
1e20 coeff 100500 specialq 1 - 1 other 2348 - 5636 2e20 coeff 100500 specialq 1 - 1 other 4696 - 11272 5e20 coeff 100500 specialq 1 - 1 other 11742 - 28181 1e21 coeff 100500 specialq 1 - 1 other 23484 - 56363 2e21 coeff 100500 specialq 1 - 1 other 46969 - 112727 5e21 coeff 100500 specialq 1 - 1 other 117424 - 281819 1e22 coeff 100500 specialq 1 - 1 other 234849 - 563638 2e22 coeff 100500 specialq 1 - 1 other 469698 - 1127277 5e22 coeff 100500 specialq 1 - 1 other 1004521 - 2410852 1e23 coeff 100500 specialq 1 - 13 other 632808 - 1518741 2e23 coeff 100500 specialq 1 - 138 other 398644 - 956747 5e23 coeff 100500 specialq 1 - 2943 other 216417 - 519402 1e24 coeff 100500 specialq 1 - 29673 other 136334 - 327202 2e24 coeff 100500 specialq 1 - 299091 other 85885 - 206124 5e24 coeff 100500 specialq 1 - 6342692 other 46625 - 111901 1e25 coeff 100500 specialq 1 - 63930721 other 29372 - 70493 2e25 coeff 100500 specialq 1 - 644375701 other 18503 - 44408 5e25 coeff 100500 specialq 1 - 4294967295 other 10045 - 24108 1e26 coeff 100500 specialq 1 - 4294967295 other 6327 - 15187 2e26 coeff 100500 specialq 1 - 4294967295 other 3986 - 9567 5e26 coeff 100500 specialq 1 - 4294967295 other 2164 - 5194 1e27 coeff 100500 specialq 1 - 4294967295 other 1363 - 3272 2e27 coeff 100500 specialq 1 - 4294967295 other 858 - 2061 5e27 coeff 100500 specialq 1 - 4294967295 other 466 - 1119 1e28 coeff 100500 specialq 1 - 4294967295 other 293 - 704 2e28 coeff 100500 specialq 1 - 4294967295 other 185 - 444 5e28 coeff 100500 specialq 1 - 4294967295 other 100 - 241 1e29 coeff 100500 specialq 1 - 4294967295 other 62 - 151 2e29 coeff 100500 specialq 1 - 4294967295 other 39 - 95 5e29 coeff 100500 specialq 1 - 4294967295 other 21 - 51 1e30 coeff 100500 specialq 1 - 4294967295 other 13 - 32 2e30 coeff 100500 specialq 1 - 4294967295 other 8 - 20 5e30 coeff 100500 specialq 1 - 4294967295 other 4 - 11 At 1e23, special Q starts to grow. At 5e25, special Q is maxed out at UINT32_MAX. The bound that msieve itself suggested was ca. 2e25. Which are the screws I need to toy with to get something like incr or nq from CADO? Which other ways do I have to tell it to search harder/less hard with a given stage1_norm? For size optimising, I saw that it is limited to four threads for some reason. As an aside, does the polynomial selection make use of every CUDA CU? Power usage on the GPU is quite low, even when there is little waiting for the CPU. |
![]() |
![]() |
![]() |
#2 |
Apr 2010
26210 Posts |
![]()
The stage1_norm of msieve is something like a combined value for nq and P of the cado parameters. nq relates to special-q and P to "other".
I think there was an msieve branch for 64-bit special-q but that was never finished. Last fiddled with by Gimarel on 2023-03-06 at 19:08 |
![]() |
![]() |
![]() |
#3 |
Tribal Bullet
Oct 2004
1101111011012 Posts |
![]()
I have been working very part-time on merging the specialq64 branch back to trunk but the two branches have diverged quite a lot, and the mods to use special-q larger than 2^32 involve replacing 32-bit arithmetic in many places with 64-bit or multiple-precision arithmetic. The merge can also port fast arithmetic from the GPU code to the CPU implementation.
Don't expect that I will finish anytime soon. At the same time large special-q will become important for polynomial selection on very large inputs, otherwise most of the search space is wasted. GPU polynomial selection uses GPU sorting, which is memory bound and not flops bound. The code imposes a 300MB limit on the memory used, and current-size problems don't really need more than that. Making the memory use 10x bigger will find 10x stage 1 hits in 10x time, and so is a wash. |
![]() |
![]() |
![]() |
#4 | |
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
22×3×7×73 Posts |
![]() Quote:
Also is it possible to reduce the size of each sort through a bucketing system(not a clue whether this would work for gpu vs cpu)? Apologies if these turn out to be silly questions. |
|
![]() |
![]() |
![]() |
#5 |
Tribal Bullet
Oct 2004
5·23·31 Posts |
![]()
You can change the memory allocated to GPU sorting with 'gpu_mem_mb=X' added to the list of command line options, but the 300MB upper limit is compiled in.
The sort code comes from CUB and is a very complex radix sort. Before that the code used a very complex radix sort from Sean Baxter's ModernGpu library, which I think was even faster but which unfortunately stopped working for me around CUDA 5.5 |
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
factmsieve.py does not wait for completion of polynomial selection by msieve | maxal | Msieve | 0 | 2022-09-20 20:38 |
Running msieve polynomial selection in parallel? | ryanp | Msieve | 9 | 2019-11-16 19:45 |
reduce number of coefficient for polynomial selection with msieve on GPU | aein | Factoring | 3 | 2017-02-25 16:42 |
msieve 1.52 with GPU polynomial selection | cgy606 | Msieve | 16 | 2016-10-06 14:16 |
2^877-1 polynomial selection | fivemack | Factoring | 47 | 2009-06-16 00:24 |