mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Msieve

Reply
 
Thread Tools
Old 2023-03-06, 17:20   #1
kruoli
 
kruoli's Avatar
 
"Oliver"
Sep 2017
Porta Westfalica, DE

7·223 Posts
Default Understanding msieve's Polynomial Selection Parametrisation

When experimenting with stage1_norm, it occured to me that stricter bounds will miss results that would be found with less strict bounds. Very loose bounds on the other head miss a lot results, too. I ran polynomial selection on a GPU on a C168 and had this as the first coefficient to search above 100k for different maximum stage 1 norms:
Code:
1e20
coeff 100500 specialq 1 - 1 other 2348 - 5636
2e20
coeff 100500 specialq 1 - 1 other 4696 - 11272
5e20
coeff 100500 specialq 1 - 1 other 11742 - 28181
1e21
coeff 100500 specialq 1 - 1 other 23484 - 56363
2e21
coeff 100500 specialq 1 - 1 other 46969 - 112727
5e21
coeff 100500 specialq 1 - 1 other 117424 - 281819
1e22
coeff 100500 specialq 1 - 1 other 234849 - 563638
2e22
coeff 100500 specialq 1 - 1 other 469698 - 1127277
5e22
coeff 100500 specialq 1 - 1 other 1004521 - 2410852
1e23
coeff 100500 specialq 1 - 13 other 632808 - 1518741
2e23
coeff 100500 specialq 1 - 138 other 398644 - 956747
5e23
coeff 100500 specialq 1 - 2943 other 216417 - 519402
1e24
coeff 100500 specialq 1 - 29673 other 136334 - 327202
2e24
coeff 100500 specialq 1 - 299091 other 85885 - 206124
5e24
coeff 100500 specialq 1 - 6342692 other 46625 - 111901
1e25
coeff 100500 specialq 1 - 63930721 other 29372 - 70493
2e25
coeff 100500 specialq 1 - 644375701 other 18503 - 44408
5e25
coeff 100500 specialq 1 - 4294967295 other 10045 - 24108
1e26
coeff 100500 specialq 1 - 4294967295 other 6327 - 15187
2e26
coeff 100500 specialq 1 - 4294967295 other 3986 - 9567
5e26
coeff 100500 specialq 1 - 4294967295 other 2164 - 5194
1e27
coeff 100500 specialq 1 - 4294967295 other 1363 - 3272
2e27
coeff 100500 specialq 1 - 4294967295 other 858 - 2061
5e27
coeff 100500 specialq 1 - 4294967295 other 466 - 1119
1e28
coeff 100500 specialq 1 - 4294967295 other 293 - 704
2e28
coeff 100500 specialq 1 - 4294967295 other 185 - 444
5e28
coeff 100500 specialq 1 - 4294967295 other 100 - 241
1e29
coeff 100500 specialq 1 - 4294967295 other 62 - 151
2e29
coeff 100500 specialq 1 - 4294967295 other 39 - 95
5e29
coeff 100500 specialq 1 - 4294967295 other 21 - 51
1e30
coeff 100500 specialq 1 - 4294967295 other 13 - 32
2e30
coeff 100500 specialq 1 - 4294967295 other 8 - 20
5e30
coeff 100500 specialq 1 - 4294967295 other 4 - 11
Up to 5e22, "other" increases. Then it drops off.
At 1e23, special Q starts to grow. At 5e25, special Q is maxed out at UINT32_MAX.
The bound that msieve itself suggested was ca. 2e25.

Which are the screws I need to toy with to get something like incr or nq from CADO? Which other ways do I have to tell it to search harder/less hard with a given stage1_norm?

For size optimising, I saw that it is limited to four threads for some reason.

As an aside, does the polynomial selection make use of every CUDA CU? Power usage on the GPU is quite low, even when there is little waiting for the CPU.
kruoli is offline   Reply With Quote
Old 2023-03-06, 19:07   #2
Gimarel
 
Apr 2010

26210 Posts
Default

The stage1_norm of msieve is something like a combined value for nq and P of the cado parameters. nq relates to special-q and P to "other".
I think there was an msieve branch for 64-bit special-q but that was never finished.

Last fiddled with by Gimarel on 2023-03-06 at 19:08
Gimarel is offline   Reply With Quote
Old 2023-03-07, 20:54   #3
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

1101111011012 Posts
Default

I have been working very part-time on merging the specialq64 branch back to trunk but the two branches have diverged quite a lot, and the mods to use special-q larger than 2^32 involve replacing 32-bit arithmetic in many places with 64-bit or multiple-precision arithmetic. The merge can also port fast arithmetic from the GPU code to the CPU implementation.

Don't expect that I will finish anytime soon. At the same time large special-q will become important for polynomial selection on very large inputs, otherwise most of the search space is wasted.

GPU polynomial selection uses GPU sorting, which is memory bound and not flops bound. The code imposes a 300MB limit on the memory used, and current-size problems don't really need more than that. Making the memory use 10x bigger will find 10x stage 1 hits in 10x time, and so is a wash.
jasonp is offline   Reply With Quote
Old 2023-03-09, 10:20   #4
henryzz
Just call me Henry
 
henryzz's Avatar
 
"David"
Sep 2007
Liverpool (GMT/BST)

22×3×7×73 Posts
Default

Quote:
Originally Posted by jasonp View Post
I have been working very part-time on merging the specialq64 branch back to trunk but the two branches have diverged quite a lot, and the mods to use special-q larger than 2^32 involve replacing 32-bit arithmetic in many places with 64-bit or multiple-precision arithmetic. The merge can also port fast arithmetic from the GPU code to the CPU implementation.

Don't expect that I will finish anytime soon. At the same time large special-q will become important for polynomial selection on very large inputs, otherwise most of the search space is wasted.

GPU polynomial selection uses GPU sorting, which is memory bound and not flops bound. The code imposes a 300MB limit on the memory used, and current-size problems don't really need more than that. Making the memory use 10x bigger will find 10x stage 1 hits in 10x time, and so is a wash.
Would it be possible to reduce the 300MB enough to fit in L2 cache? The 40 series gpus have 48-72MB of L2 cache.
Also is it possible to reduce the size of each sort through a bucketing system(not a clue whether this would work for gpu vs cpu)?

Apologies if these turn out to be silly questions.
henryzz is offline   Reply With Quote
Old 2023-03-09, 20:09   #5
jasonp
Tribal Bullet
 
jasonp's Avatar
 
Oct 2004

5·23·31 Posts
Default

You can change the memory allocated to GPU sorting with 'gpu_mem_mb=X' added to the list of command line options, but the 300MB upper limit is compiled in.

The sort code comes from CUB and is a very complex radix sort. Before that the code used a very complex radix sort from Sean Baxter's ModernGpu library, which I think was even faster but which unfortunately stopped working for me around CUDA 5.5
jasonp is offline   Reply With Quote
Reply

Thread Tools


Similar Threads
Thread Thread Starter Forum Replies Last Post
factmsieve.py does not wait for completion of polynomial selection by msieve maxal Msieve 0 2022-09-20 20:38
Running msieve polynomial selection in parallel? ryanp Msieve 9 2019-11-16 19:45
reduce number of coefficient for polynomial selection with msieve on GPU aein Factoring 3 2017-02-25 16:42
msieve 1.52 with GPU polynomial selection cgy606 Msieve 16 2016-10-06 14:16
2^877-1 polynomial selection fivemack Factoring 47 2009-06-16 00:24

All times are UTC. The time now is 14:18.


Thu Jun 8 14:18:19 UTC 2023 up 294 days, 11:46, 1 user, load averages: 1.29, 1.52, 1.56

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2023, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.

≠ ± ∓ ÷ × · − √ ‰ ⊗ ⊕ ⊖ ⊘ ⊙ ≤ ≥ ≦ ≧ ≨ ≩ ≺ ≻ ≼ ≽ ⊏ ⊐ ⊑ ⊒ ² ³ °
∠ ∟ ° ≅ ~ ‖ ⟂ ⫛
≡ ≜ ≈ ∝ ∞ ≪ ≫ ⌊⌋ ⌈⌉ ∘ ∏ ∐ ∑ ∧ ∨ ∩ ∪ ⨀ ⊕ ⊗ 𝖕 𝖖 𝖗 ⊲ ⊳
∅ ∖ ∁ ↦ ↣ ∩ ∪ ⊆ ⊂ ⊄ ⊊ ⊇ ⊃ ⊅ ⊋ ⊖ ∈ ∉ ∋ ∌ ℕ ℤ ℚ ℝ ℂ ℵ ℶ ℷ ℸ 𝓟
¬ ∨ ∧ ⊕ → ← ⇒ ⇐ ⇔ ∀ ∃ ∄ ∴ ∵ ⊤ ⊥ ⊢ ⊨ ⫤ ⊣ … ⋯ ⋮ ⋰ ⋱
∫ ∬ ∭ ∮ ∯ ∰ ∇ ∆ δ ∂ ℱ ℒ ℓ
𝛢𝛼 𝛣𝛽 𝛤𝛾 𝛥𝛿 𝛦𝜀𝜖 𝛧𝜁 𝛨𝜂 𝛩𝜃𝜗 𝛪𝜄 𝛫𝜅 𝛬𝜆 𝛭𝜇 𝛮𝜈 𝛯𝜉 𝛰𝜊 𝛱𝜋 𝛲𝜌 𝛴𝜎𝜍 𝛵𝜏 𝛶𝜐 𝛷𝜙𝜑 𝛸𝜒 𝛹𝜓 𝛺𝜔