mersenneforum.org Faster GPU-ECM with CGBN

2022-03-08, 13:32   #133
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

1259₁₆ Posts

I understood about the -gpucurves, but what confused me was the

Code:
CGBN<512, 4> running kernel<56 block x 256 threads> input number is 246 bits

lines. I see now that they are based on the input number size and automatically taken care of by the program. I had thought maybe there were more options to provide. Thanks for helping me understand this and for a great speedup.
2022-03-08, 16:39   #134
chris2be8

Sep 2009

2⁶·37 Posts

ecm-gpu downloaded from https://gitlab.inria.fr/zimmerma/ecm.git works for B1=11e7:

Code:
chris@4core:~/ecm-cgbn.2/ecm> date;time ./ecm -gpu -cgbn -save test2.save 110000000 1
2022-03-08, 19:43   #135
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

1001001011001₂ Posts

Sorry if this has an "elementary" answer, but is there an optimum value that B1 should be a multiple of? I'm currently basing my B1 values on what 896 curves need for the different t-levels. Should I adjust B1 to a close multiple of a base value, then adjust the -gpucurves, accordingly, or am I complicating things?
2022-03-08, 20:46   #136
SethTro

"Seth"
Apr 2019

19×23 Posts

Quote:
 Originally Posted by EdH Sorry if this has an "elementary" answer, but is there an optimum value that B1 should be a multiple of? I'm currently basing my B1 values on what 896 curves need for the different t-levels. Should I adjust B1 to a close multiple of a base value, then adjust the -gpucurves, accordingly, or am I complicating things?

TL;DR: If you are still running stage 2 (B2), you should probably set B1 for each t-level based on this chart, then round the number of curves to the nearest multiple of 896. This is probably within 20% of optimal for >= t45. You could slightly optimize by increasing B1 if you round down or decreasing B1 if you round up (so that ecm -v prints "expected number of curves to find a factor" equal to the number of curves you are using).

In practice for small factors everything is really fast so for a single number who cares, but if you were working on factordb or a huge amount of numbers (>5000) you would want to do something smarter. In theory the code could run one curve for 896 different numbers or something.

It can also make sense to tune the B1/B2 ratio based on how much RAM you have and how fast your CPU is versus your GPU. For example see the discussion here. I wrote some hacky shell code to do this at sethtro/misc-scripts/ecm_gpu_optimizer
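
The rounding rule in the TL;DR above can be sketched in a few lines of Python. This is only an illustration: the function name `round_curves` and the target of 2000 curves are hypothetical, and 896 is simply the batch size mentioned in this thread (other cards default to other values, e.g. 832 in a later post). It reports which way B1 would be nudged so that the "expected number of curves" printed by ecm -v moves toward the rounded count.

```python
def round_curves(target_curves, batch=896):
    """Round a curve count to the nearest multiple of the GPU batch size.

    Returns (rounded_count, b1_hint). The hint follows the advice above:
    fewer curves than planned -> raise B1, more curves -> lower B1.
    At least one full batch is always run.
    """
    batches = max(1, (target_curves + batch // 2) // batch)
    rounded = batches * batch
    if rounded < target_curves:
        hint = "increase B1"
    elif rounded > target_curves:
        hint = "decrease B1"
    else:
        hint = "keep B1"
    return rounded, hint


# Hypothetical target; real targets come from the t-level chart.
print(round_curves(2000))
```

How far to move B1 is not computed here; the thread's suggestion is to adjust it by hand until ecm -v reports an expected curve count matching the rounded value.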

2022-03-08, 22:36   #137
EdH

"Ed Hall"
Dec 2009
Adirondack Mtns

4697₁₀ Posts

Thanks. This gives me something to study. Unfortunately, the machine I was able to get to run the GPU has only 2 cores and 8 GB RAM. But, I have a script now that sends the residues to a second machine and moves to the next B1 level. Of course, now the GPU is the bottleneck since I'm only running stage 1 operations on its machine. I'm still looking at what might be best for my setup.
2022-03-17, 08:55   #138
SethTro

"Seth"
Apr 2019

19×23 Posts

Quote:
 Originally Posted by chris2be8 The older version without -cgbn took about 9 hours to do the same job. Many thanks for the speed up.
Fun fact: if you follow the advice about custom kernel size, you can potentially make this an additional 40% faster.

Code:
$ echo "1044362381090522430349272504349028000743722878937901553864893424154624748141120681170432021570621655565526684395777956912757565835989960001844742211087555729316372309210417" | ./ecm -cgbn -v 11e5 0
GMP-ECM 7.0.5-dev [configured with GMP 6.2.99, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Using B1=1100000, B2=0, sigma=3:1276799189-3:1276800020 (832 curves)
Compiling custom kernel for 640 bits should be ~144% faster
CGBN<1024, 8> running kernel<26 block x 256 threads> input number is 569 bits
Computing 1158 bits/call, 96372/1586512 (6.1%), ETA 106 + 7 = 113 seconds (~135 ms/curves)
Computing 1158 bits/call, 212172/1586512 (13.4%), ETA 97 + 15 = 113 seconds (~135 ms/curves)
Computing 1158 bits/call, 327972/1586512 (20.7%), ETA 89 + 23 = 112 seconds (~135 ms/curves)

After changing

- typedef cgbn_params_t<8, 1024> cgbn_params_1024;
+ typedef cgbn_params_t<8, 640> cgbn_params_1024;

$ echo "1044362381090522430349272504349028000743722878937901553864893424154624748141120681170432021570621655565526684395777956912757565835989960001844742211087555729316372309210417" | ./ecm -cgbn -v 11e5 0
GMP-ECM 7.0.5-dev [configured with GMP 6.2.99, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Using B1=1100000, B2=0, sigma=3:230651649-3:230652480 (832 curves)
CGBN<640, 8> running kernel<26 block x 256 threads> input number is 569 bits
Computing 1863 bits/call, 146292/1586512 (9.2%), ETA 67 + 7 = 74 seconds (~89 ms/curves)
Computing 1863 bits/call, 332592/1586512 (21.0%), ETA 60 + 16 = 76 seconds (~92 ms/curves)
Computing 1863 bits/call, 518892/1586512 (32.7%), ETA 52 + 25 = 77 seconds (~93 ms/curves)

Last fiddled with by SethTro on 2022-03-17 at 09:07
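
As a rough check on the "40% faster" figure, the per-curve timings printed in the two runs above can be compared directly. The sketch below only uses the ms/curve values copied from those progress lines; the function name `speedup` is made up for illustration, and since the samples come from early in each run the result is an estimate rather than a final timing.

```python
from statistics import mean

# ms/curve values copied from the "Computing ..." lines of the two runs above.
ms_per_curve_1024 = [135, 135, 135]  # default CGBN<1024, 8> kernel
ms_per_curve_640 = [89, 92, 93]      # custom CGBN<640, 8> kernel


def speedup(before, after):
    """Return (throughput gain, time saved per curve) as fractions."""
    b, a = mean(before), mean(after)
    return b / a - 1.0, 1.0 - a / b


gain, saved = speedup(ms_per_curve_1024, ms_per_curve_640)
print(f"throughput gain: {gain:.0%}, time saved per curve: {saved:.0%}")
```

The two numbers answer different questions: the throughput gain says how many more curves finish per unit time, while the time saved says how much shorter each curve is, so a single "N% faster" claim depends on which one is meant.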

2022-03-17, 09:16   #139
Gimarel

Apr 2010

2²×3×19 Posts

If trying custom kernel sizes, also try 768 bits. For me (GTX 2060 Super) that's faster than 640 bits.
2022-03-17, 09:37   #140
henryzz
Just call me Henry

"David"
Sep 2007
Liverpool (GMT/BST)

2³·7·107 Posts

Quote:
 Originally Posted by Gimarel If trying custom kernel sizes, also try 768 bits. For me (GTX 2060 Super) that's faster than 640 bits.
If that's the case, then a kernel benchmark that identifies the fastest kernels for each card would be useful. I currently have a version with all the possible kernels added up to 300 digits or so.

2022-03-17, 16:53   #141
chris2be8

Sep 2009

2368₁₀ Posts

Quote:
 Originally Posted by SethTro Fun fact: if you follow the advice about custom kernel size, you can potentially make this an additional 40% faster.
That won't be much help to me; it already takes the CPU much longer to do stage 2 than the GPU takes to do stage 1.

I've looked at your chart for recommended B1 and B2 values, but it confuses my script's calculations of how much ECM to do for a number of a given size. I need to do some serious thinking to get it to all work together.

2022-04-03, 03:52   #142
wombatman
I moo ablest echo power!

May 2013

1,801 Posts

Hi, I've built this under WSL2, and everything works quite nicely, but when I run the test script (gpu_throughput_test.sh), CGBN fails when the input number is large enough: "No available CGBN Kernel large enough to process N(1864 bits)" I saw some posts earlier in the thread that might apply, but I thought it would be best to ask before I start messing with anything.
2022-04-03, 06:22   #143
SethTro

"Seth"
Apr 2019

665₈ Posts

Quote:
 Originally Posted by wombatman Hi, I've built this under WSL2, and everything works quite nicely, but when I run the test script (gpu_throughput_test.sh), CGBN fails when the input number is large enough: "No available CGBN Kernel large enough to process N(1864 bits)" I saw some posts earlier in the thread that might apply, but I thought it would be best to ask before I start messing with anything.
This is expected. I'm balancing binary size and compile time vs range of numbers that can be tested.

If you want to run ECM on numbers > 1020 bits look around line 670 in cgbn_stage1.cu

Last fiddled with by SethTro on 2022-04-03 at 06:22
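
To illustrate why a 1864-bit input is rejected while 246-bit and 569-bit inputs run, here is a minimal Python sketch of "pick the smallest compiled kernel that fits". It is not the code in cgbn_stage1.cu. The default sizes (512, 1024) are inferred from the CGBN<512, 4> and CGBN<1024, 8> lines quoted in this thread, the 4-bit margin is inferred from the "> 1020 bits" remark, and the name `select_kernel` is hypothetical.

```python
def select_kernel(n_bits, kernel_sizes=(512, 1024), margin_bits=4):
    """Return the smallest kernel size (in bits) that can hold an n_bits input.

    kernel_sizes and margin_bits are assumptions inferred from the thread,
    not values read from the GMP-ECM source.
    """
    for size in sorted(kernel_sizes):
        if n_bits <= size - margin_bits:
            return size
    raise ValueError(
        f"No available CGBN Kernel large enough to process N({n_bits} bits)"
    )


print(select_kernel(246))                    # smallest default kernel
print(select_kernel(569))                    # needs the larger default kernel
print(select_kernel(569, (512, 640, 1024)))  # a 640-bit kernel would fit
```

Under these assumptions, adding a smaller kernel that still fits the input (such as 640 or 768 bits for a 569-bit number) is what yields the speedups reported earlier, and adding a larger one is what lifts the size limit, at the cost of longer compile times and a bigger binary.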


Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.