mersenneforum.org  

2022-03-07, 11:58   #122
RichD
Sep 2008
Kansas

2×17×107 Posts

Quote:
Originally Posted by EdH
I do wonder if I should try to add CGBN to the Colab GMP-ECM session.
That might be helpful. I was thinking of the GPU Msieve LA process but I'm in the wrong thread. I see it was recently posted there. Many thanks!
2022-03-07, 14:08   #123
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns

1001001011001₂ Posts

Quote:
Originally Posted by RichD
That might be helpful. I was thinking of the GPU Msieve LA process but I'm in the wrong thread. I see it was recently posted there. Many thanks!
You're quite welcome. I always hope someone can get some use from the threads.

I will have to think about some things a bit. The Colab GPU LA was quite complicated to get arranged and is still more involved than the rest, but the Colab portion actually simplified after a bit. It started out as about five separate code blocks.

Maybe for the Colab GPU GMP-ECM, I can simply add a function with a switch whether to include CGBN.

A few things to think about, but sometimes too many cause me to step back and go do something else.
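A switch like that could look something like the following bash sketch. It assumes GMP-ECM's `--enable-gpu` and `--with-cgbn-include=DIR` configure options as described in its README.gpu (check the exact spelling for the version being built); the function name and the yes/no switch are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build GMP-ECM configure arguments, adding the
# CGBN include path only when the first argument is "yes".
# Usage: ecm_configure_args yes|no [CGBN_INCLUDE_DIR]
ecm_configure_args() {
  local args="--enable-gpu"
  if [ "$1" = "yes" ]; then
    args="$args --with-cgbn-include=$2"
  fi
  printf '%s\n' "$args"
}
```

Usage would be along the lines of `./configure $(ecm_configure_args yes "$HOME/CGBN/include/cgbn")`, where the CGBN path is only an example. The command substitution is left unquoted on purpose so it splits into separate arguments, which means the path must not contain spaces.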
2022-03-07, 16:01   #124
RichD
Sep 2008
Kansas

2·17·107 Posts

Quote:
Originally Posted by EdH
You're quite welcome. I always hope someone can get some use from the threads.
I like the cookbook approach. Everything you need in one post.

Jeff Gilchrist did some work early on when things were much easier. Just ECM, GGNFS and Msieve. Now with all the different branches it is hard to keep them all straight. Thanks for all your work.

Quote:
Originally Posted by EdH
A few things to think about, but sometimes too many cause me to step back and go do something else.
Yeah, I have that problem too - AAADD.
2022-03-07, 16:48   #125
chris2be8
Sep 2009

2⁶·37 Posts

Quote:
Originally Posted by SethTro
@chris2be8. I think you have an old version of the code. That limit was removed at some point. Can you check that you are using https://gitlab.inria.fr/zimmerma/ecm.git and not my personal repository (https://github.com/sethtroisi/gmp-ecm).
I was using https://github.com/sethtroisi/gmp-ecm which probably explains it. I'll try https://gitlab.inria.fr/zimmerma/ecm.git once the job that's running on the GPU now has ended (it built OK but I've not tested it yet).
2022-03-07, 20:53   #126
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns

7×11×61 Posts

I guess I installed everything OK, and it is working. I ran a test of B1 values only (B2=0) on a c170 to compare timings on my K20X card. ECM chose 896 curves out of the 2688 shading units (CUDA cores). (I still don't know why so few.):
Code:
8875593388...97<170>:
Completed 1e3 with     CGBN in 00:00
Completed 1e3 without  CGBN in 00:01
Completed 15e3 with    CGBN in 00:03
Completed 15e3 without CGBN in 00:08
Completed 12e4 with    CGBN in 00:24
Completed 12e4 without CGBN in 01:02
Completed 1e6 with     CGBN in 03:17
Completed 1e6 without  CGBN in 08:35
Completed 6e6 with     CGBN in 19:41
Completed 6e6 without  CGBN in 51:32
Now, if only I can get stage 2 to be a bit more competitive. . .
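To put a number on the gap, the mm:ss pairs in the table can be turned into a ratio. A minimal sketch; `to_seconds` and `cgbn_speedup` are hypothetical helpers, not part of GMP-ECM:

```bash
#!/usr/bin/env bash
# Convert an "mm:ss" time from the table into seconds.
# awk reads "08" as the number 8, so leading zeros are harmless.
to_seconds() {
  echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Speedup from CGBN: (time without CGBN) / (time with CGBN), 2 decimals.
# Usage: cgbn_speedup WITH WITHOUT
cgbn_speedup() {
  local with without
  with=$(to_seconds "$1")
  without=$(to_seconds "$2")
  awk -v a="$without" -v b="$with" 'BEGIN { printf "%.2f\n", a / b }'
}
```

`cgbn_speedup 03:17 08:35` (B1=1e6) prints 2.61 and `cgbn_speedup 19:41 51:32` (B1=6e6) prints 2.62, so stage 1 with CGBN ran about 2.6x faster on the K20X for the B1 values from 15e3 up. The 1e3 row can't be used because its CGBN time rounds to 00:00.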
2022-03-07, 21:12   #127
WraithX
Mar 2006

3×173 Posts

Quote:
Originally Posted by EdH
I guess I installed everything OK and it is working. I ran a test of B1 values only (B2=0) on a c170 to compare timings for my K20X card. ECM chose 896 curves, out of the 2688 shading units. (I still don't know why so few.):
. . .
Are the "without CGBN" times still using the gpu? Or is that cpu time to complete 896 curves?

Could you run those tests again with "-gpucurves 2688"? I'd be interested to see if you can get more curves done in the same time, or more time.

Also, for some reason, I seem to remember gpu-ecm running best at half of the number of CUDA cores, but maybe that is different with CGBN? Maybe another test with "-gpucurves 1344" and/or "-gpucurves 5376"?
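Trying those values can be scripted with the same flags EdH uses for his timing runs (`-gpu`, `-cgbn`, `-q`, and B2=0), plus `-gpucurves`. This is a rough sketch: the function names are hypothetical, the binary path is whatever you built, and bash's `$SECONDS` only has one-second resolution, so it is only useful for runs that take a while.

```bash
#!/usr/bin/env bash
# Candidate -gpucurves values around a card's CUDA core count:
# half, 1x and 2x (e.g. 1344, 2688, 5376 for a 2688-core K20X).
gpucurves_candidates() {
  local cores=$1
  echo "$(( cores / 2 )) $cores $(( cores * 2 ))"
}

# Hypothetical sweep: stage 1 only (B2=0) once per candidate value.
# Usage: gpucurves_sweep ECM_BINARY COMPOSITE B1 CORES
gpucurves_sweep() {
  local ecm_bin=$1 comp=$2 b1=$3 cores=$4 n start
  for n in $(gpucurves_candidates "$cores"); do
    start=$SECONDS
    echo "$comp" | "$ecm_bin" -gpu -cgbn -gpucurves "$n" -q "$b1" 0 > /dev/null
    echo "gpucurves=$n: $(( SECONDS - start )) s"
  done
}
```

For example, `gpucurves_sweep ./ecm "$comp" 1e6 2688`. Since the curve count changes between runs, compare curves per second (count divided by time) rather than the raw times.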
2022-03-07, 22:43   #128
SethTro
"Seth"
Apr 2019

19×23 Posts

Quote:
Originally Posted by WraithX
Are the "without CGBN" times still using the gpu? Or is that cpu time to complete 896 curves?

Could you run those tests again with "-gpucurves 2688"? I'd be interested to see if you can get more curves done in the same time, or more time.

Also, for some reason, I seem to remember gpu-ecm running best at half of the number of CUDA cores, but maybe that is different with CGBN? Maybe another test with "-gpucurves 1344" and/or "-gpucurves 5376"?
You can run `./gpu_throughput_test.sh` from the gmp-ecm folder, and it should test several multiples of the default number of curves (1/4x, 1/2x, 1x, 2x, 4x, 8x). If the default (1x) is a poor starting point, the script takes the ecm binary as an optional first parameter and the number of curves as an optional second parameter, so something like `./gpu_throughput_test.sh ./ecm 1344`.

For example, on my 970 it looks like 832 is the best number of curves for 1024-bit numbers (up to C300), but for smaller numbers (< C150) 3328, which is 4x the default, is the best.


Code:
TESTING (2^269-1)/13822297 B1=128000
Step 1 took 139ms
Computing 832 Step 1 took 683ms of CPU time / 33471ms of GPU time
Throughput: 24.857 curves per second (on average 40.23ms per Step 1)

CGBN<512, 4> running kernel<4 block x 256 threads> input number is 246 bits
Computing 224 Step 1 took 35ms of CPU time / 7710ms of GPU time
Throughput: 29.054 curves per second (on average 34.42ms per Step 1)

CGBN<512, 4> running kernel<7 block x 256 threads> input number is 246 bits
Computing 416 Step 1 took 9ms of CPU time / 7646ms of GPU time
Throughput: 54.408 curves per second (on average 18.38ms per Step 1)

CGBN<512, 4> running kernel<13 block x 256 threads> input number is 246 bits
Computing 832 Step 1 took 17ms of CPU time / 7629ms of GPU time
Throughput: 109.055 curves per second (on average 9.17ms per Step 1)

CGBN<512, 4> running kernel<26 block x 256 threads> input number is 246 bits
Computing 1664 Step 1 took 21ms of CPU time / 7844ms of GPU time
Throughput: 212.141 curves per second (on average 4.71ms per Step 1)

CGBN<512, 4> running kernel<52 block x 256 threads> input number is 246 bits
Computing 3328 Step 1 took 33ms of CPU time / 13393ms of GPU time
Throughput: 248.482 curves per second (on average 4.02ms per Step 1)

CGBN<512, 4> running kernel<104 block x 256 threads> input number is 246 bits
Computing 6656 Step 1 took 83ms of CPU time / 27894ms of GPU time
Throughput: 238.620 curves per second (on average 4.19ms per Step 1)



TESTING (2^499-1)/20959 B1=64000
Step 1 took 81ms
Computing 832 Step 1 took 384ms of CPU time / 18396ms of GPU time
Throughput: 45.228 curves per second (on average 22.11ms per Step 1)

CGBN<512, 4> running kernel<4 block x 256 threads> input number is 485 bits
Computing 224 Step 1 took 18ms of CPU time / 3956ms of GPU time
Throughput: 56.626 curves per second (on average 17.66ms per Step 1)

CGBN<512, 4> running kernel<7 block x 256 threads> input number is 485 bits
Computing 416 Step 1 took 16ms of CPU time / 3882ms of GPU time
Throughput: 107.165 curves per second (on average 9.33ms per Step 1)

CGBN<512, 4> running kernel<13 block x 256 threads> input number is 485 bits
Computing 832 Step 1 took 6ms of CPU time / 3856ms of GPU time
Throughput: 215.783 curves per second (on average 4.63ms per Step 1)

CGBN<512, 4> running kernel<26 block x 256 threads> input number is 485 bits
Computing 1664 Step 1 took 14ms of CPU time / 4154ms of GPU time
Throughput: 400.610 curves per second (on average 2.50ms per Step 1)

CGBN<512, 4> running kernel<52 block x 256 threads> input number is 485 bits
Computing 3328 Step 1 took 37ms of CPU time / 7469ms of GPU time
Throughput: 445.558 curves per second (on average 2.24ms per Step 1)

CGBN<512, 4> running kernel<104 block x 256 threads> input number is 485 bits
Computing 6656 Step 1 took 47ms of CPU time / 15017ms of GPU time
Throughput: 443.217 curves per second (on average 2.26ms per Step 1)



TESTING 2^997-1 B1=32000
Step 1 took 73ms
Computing 832 Step 1 took 182ms of CPU time / 9450ms of GPU time
Throughput: 88.045 curves per second (on average 11.36ms per Step 1)

CGBN<1024, 8> running kernel<7 block x 256 threads> input number is 997 bits
Computing 224 Step 1 took 28ms of CPU time / 3294ms of GPU time
Throughput: 67.994 curves per second (on average 14.71ms per Step 1)

CGBN<1024, 8> running kernel<13 block x 256 threads> input number is 997 bits
Computing 416 Step 1 took 27ms of CPU time / 3161ms of GPU time
Throughput: 131.591 curves per second (on average 7.60ms per Step 1)

CGBN<1024, 8> running kernel<26 block x 256 threads> input number is 997 bits
Computing 832 Step 1 took 38ms of CPU time / 3450ms of GPU time
Throughput: 241.137 curves per second (on average 4.15ms per Step 1)

CGBN<1024, 8> running kernel<52 block x 256 threads> input number is 997 bits
Computing 1664 Step 1 took 37ms of CPU time / 7034ms of GPU time
Throughput: 236.566 curves per second (on average 4.23ms per Step 1)

CGBN<1024, 8> running kernel<104 block x 256 threads> input number is 997 bits
Computing 3328 Step 1 took 63ms of CPU time / 14158ms of GPU time
Throughput: 235.059 curves per second (on average 4.25ms per Step 1)

CGBN<1024, 8> running kernel<208 block x 256 threads> input number is 997 bits
Computing 6656 Step 1 took 105ms of CPU time / 29785ms of GPU time
Throughput: 223.465 curves per second (on average 4.47ms per Step 1)
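Picking the best count out of logs like these is easy to automate. A minimal sketch (the function name is made up) that reads the curve count from each "Computing N" line together with its throughput figure:

```bash
#!/usr/bin/env bash
# Read "curves throughput" pairs (one per line) on stdin and print the
# curve count with the highest throughput in curves per second.
# "+ 0" forces numeric comparison in every awk implementation.
best_gpucurves() {
  awk '{ v = $2 + 0 } NR == 1 || v > best { best = v; n = $1 } END { print n }'
}

# Example with the 2^997-1 (1024-bit) CGBN runs above; prints 832:
#   printf '%s\n' "224 67.994" "416 131.591" "832 241.137" \
#     "1664 236.566" "3328 235.059" "6656 223.465" | best_gpucurves
```

Fed the 246-bit runs instead, it returns 3328, matching the observation that smaller inputs favor 4x the default.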
2022-03-07, 23:01   #129
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns

11131₈ Posts

Quote:
Originally Posted by WraithX
Are the "without CGBN" times still using the gpu? Or is that cpu time to complete 896 curves?

Could you run those tests again with "-gpucurves 2688"? I'd be interested to see if you can get more curves done in the same time, or more time.

Also, for some reason, I seem to remember gpu-ecm running best at half of the number of CUDA cores, but maybe that is different with CGBN? Maybe another test with "-gpucurves 1344" and/or "-gpucurves 5376"?
Those were comparing GPU times between ECM compiled with CGBN and ECM compiled without CGBN. I'm running the test again with -cgbn present and absent, using the ECM that was compiled with CGBN:
Code:
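# Stage 1 only (B2=0) with CGBN enabled, on GPU device 0.
# $comp (the composite) and $b1 (the B1 bound) are set by the calling script.
function runecmCGBN {
  result=$(echo "$comp" | "$HOME/Math/ecm-cgbn/ecm-cgbn" -cgbn -gpu -gpudevice 0 -q "$b1" 0)
}

# Same binary and arguments, but without -cgbn.
function runecm {
  result=$(echo "$comp" | "$HOME/Math/ecm-cgbn/ecm-cgbn" -gpu -gpudevice 0 -q "$b1" 0)
}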
So far the times are pretty close.

I will play with the throughput test and other values later.
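The "Completed ... in mm:ss" lines in the tables suggest a small driver around these two functions. The actual script isn't shown, so this is only a guess at its shape; `compare_cgbn` and `fmt_mmss` are hypothetical, and the timing relies on bash's `$SECONDS`:

```bash
#!/usr/bin/env bash
# Format a number of seconds as mm:ss, the style used in the tables.
fmt_mmss() {
  printf '%02d:%02d\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# Hypothetical driver: time runecmCGBN and runecm for each B1 given.
# Assumes $comp is set and both functions are already defined.
compare_cgbn() {
  local b1 start
  for b1 in "$@"; do
    start=$SECONDS
    runecmCGBN
    echo "Completed $b1 with CGBN in $(fmt_mmss $(( SECONDS - start )))"
    start=$SECONDS
    runecm
    echo "Completed $b1 without CGBN in $(fmt_mmss $(( SECONDS - start )))"
  done
}
```

It would be called as `compare_cgbn 1e3 15e3 12e4 1e6 6e6` with `$comp` set. Because `b1` is declared `local` inside the loop, the shell's dynamic scoping makes it visible to `runecmCGBN` and `runecm`.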
2022-03-08, 00:56   #130
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns

7×11×61 Posts

Quote:
Originally Posted by EdH
So far the times are pretty close.
. . .
Nearly the same:
Code:
8875593388...97<170>:
Completed 1e3 with     CGBN in 00:00
Completed 1e3 without  CGBN in 00:01
Completed 15e3 with    CGBN in 00:03
Completed 15e3 without CGBN in 00:08
Completed 12e4 with    CGBN in 00:24
Completed 12e4 without CGBN in 01:02
Completed 1e6 with     CGBN in 03:17
Completed 1e6 without  CGBN in 08:33
Completed 6e6 with     CGBN in 19:38
Completed 6e6 without  CGBN in 51:22
2022-03-08, 01:13   #131
EdH
"Ed Hall"
Dec 2009
Adirondack Mtns

7×11×61 Posts

Not really sure how to use this. Would merely changing the -gpucurves value make all the other values change or would I adjust other things? Is this in the docs?
Code:
$ bash gpu_throughput_test.sh
 
TESTING (2^269-1)/13822297 B1=128000
Step 1 took 275ms
Computing 896 Step 1 took 1219ms of CPU time / 65682ms of GPU time
Throughput: 13.641 curves per second (on average 73.31ms per Step 1)

CGBN<512, 4> running kernel<4 block x 256 threads> input number is 246 bits
Computing 224 Step 1 took 23ms of CPU time / 12473ms of GPU time
Throughput: 17.959 curves per second (on average 55.68ms per Step 1)

CGBN<512, 4> running kernel<7 block x 256 threads> input number is 246 bits
Computing 448 Step 1 took 16ms of CPU time / 12474ms of GPU time
Throughput: 35.915 curves per second (on average 27.84ms per Step 1)

CGBN<512, 4> running kernel<14 block x 256 threads> input number is 246 bits
Computing 896 Step 1 took 32ms of CPU time / 12460ms of GPU time
Throughput: 71.913 curves per second (on average 13.91ms per Step 1)

CGBN<512, 4> running kernel<28 block x 256 threads> input number is 246 bits
Computing 1792 Step 1 took 19ms of CPU time / 14248ms of GPU time
Throughput: 125.769 curves per second (on average 7.95ms per Step 1)

CGBN<512, 4> running kernel<56 block x 256 threads> input number is 246 bits
Computing 3584 Step 1 took 45ms of CPU time / 22182ms of GPU time
Throughput: 161.573 curves per second (on average 6.19ms per Step 1)

CGBN<512, 4> running kernel<112 block x 256 threads> input number is 246 bits
Computing 7168 Step 1 took 70ms of CPU time / 44416ms of GPU time
Throughput: 161.384 curves per second (on average 6.20ms per Step 1)



TESTING (2^499-1)/20959 B1=64000
Step 1 took 184ms
Computing 896 Step 1 took 617ms of CPU time / 32883ms of GPU time
Throughput: 27.248 curves per second (on average 36.70ms per Step 1)

CGBN<512, 4> running kernel<4 block x 256 threads> input number is 485 bits
Computing 224 Step 1 took 8ms of CPU time / 6256ms of GPU time
Throughput: 35.808 curves per second (on average 27.93ms per Step 1)

CGBN<512, 4> running kernel<7 block x 256 threads> input number is 485 bits
Computing 448 Step 1 took 16ms of CPU time / 6233ms of GPU time
Throughput: 71.872 curves per second (on average 13.91ms per Step 1)

CGBN<512, 4> running kernel<14 block x 256 threads> input number is 485 bits
Computing 896 Step 1 took 17ms of CPU time / 6235ms of GPU time
Throughput: 143.703 curves per second (on average 6.96ms per Step 1)

CGBN<512, 4> running kernel<28 block x 256 threads> input number is 485 bits
Computing 1792 Step 1 took 24ms of CPU time / 7151ms of GPU time
Throughput: 250.600 curves per second (on average 3.99ms per Step 1)

CGBN<512, 4> running kernel<56 block x 256 threads> input number is 485 bits
Computing 3584 Step 1 took 31ms of CPU time / 11108ms of GPU time
Throughput: 322.648 curves per second (on average 3.10ms per Step 1)

CGBN<512, 4> running kernel<112 block x 256 threads> input number is 485 bits
Computing 7168 Step 1 took 87ms of CPU time / 22239ms of GPU time
Throughput: 322.312 curves per second (on average 3.10ms per Step 1)



TESTING 2^997-1 B1=32000
Step 1 took 180ms
Computing 896 Step 1 took 326ms of CPU time / 16376ms of GPU time
Throughput: 54.714 curves per second (on average 18.28ms per Step 1)

CGBN<1024, 8> running kernel<7 block x 256 threads> input number is 997 bits
Computing 224 Step 1 took 11ms of CPU time / 5296ms of GPU time
Throughput: 42.299 curves per second (on average 23.64ms per Step 1)

CGBN<1024, 8> running kernel<14 block x 256 threads> input number is 997 bits
Computing 448 Step 1 took 14ms of CPU time / 5289ms of GPU time
Throughput: 84.698 curves per second (on average 11.81ms per Step 1)

CGBN<1024, 8> running kernel<28 block x 256 threads> input number is 997 bits
Computing 896 Step 1 took 33ms of CPU time / 6285ms of GPU time
Throughput: 142.559 curves per second (on average 7.01ms per Step 1)

CGBN<1024, 8> running kernel<56 block x 256 threads> input number is 997 bits
Computing 1792 Step 1 took 44ms of CPU time / 10762ms of GPU time
Throughput: 166.513 curves per second (on average 6.01ms per Step 1)

CGBN<1024, 8> running kernel<112 block x 256 threads> input number is 997 bits
Computing 3584 Step 1 took 81ms of CPU time / 21541ms of GPU time
Throughput: 166.382 curves per second (on average 6.01ms per Step 1)

CGBN<1024, 8> running kernel<224 block x 256 threads> input number is 997 bits
Computing 7168 Step 1 took 159ms of CPU time / 43201ms of GPU time
Throughput: 165.923 curves per second (on average 6.03ms per Step 1)
2022-03-08, 06:53   #132
SethTro
"Seth"
Apr 2019

110110101₂ Posts

Quote:
Originally Posted by EdH
Not really sure how to use this. Would merely changing the -gpucurves value make all the other values change or would I adjust other things? Is this in the docs?
This isn't documented anywhere, but if we talk through some good notes here, I'll happily write them up and have the script print them after it runs. It does the same thing you are doing with runecmCGBN: it runs ecm with a range of -gpucurves values and prints the time for each.

Maybe a prefix like: "This script helps you find the best -gpucurves value for your GPU. It runs ecm (<BINARY NAME>) while changing the -gpucurves parameter from your card's default, X, through a number of multiples. It runs at three sizes: 256 bits (C80), 512 bits (C150), and 1024 bits (C300). The first line in each set is the CPU timing, followed by the GPU times for different values of -gpucurves."
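For reference, the values the script walks through follow directly from the default. A sketch of the arithmetic only (hypothetical function name, not the script's actual code):

```bash
#!/usr/bin/env bash
# -gpucurves values at 1/4x, 1/2x, 1x, 2x, 4x and 8x a default curve count.
gpucurves_multiples() {
  local def=$1
  echo "$(( def / 4 )) $(( def / 2 )) $def $(( def * 2 )) $(( def * 4 )) $(( def * 8 ))"
}
```

With EdH's default of 896 this gives 224 448 896 1792 3584 7168, the same counts as in his log. On SethTro's card (default 832) the log shows 224 rather than 208 for the 1/4x run, so the real script or ecm apparently adjusts small values in a way this sketch does not model.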

After it's done, something like: "Larger values tend to produce better throughput but can double the time needed to complete the curves. We suggest the first -gpucurves value that is within 10% of the best throughput."


(written on mobile without proofreading, pre-apology for grammar and spelling)
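The proposed 10% rule is easy to apply mechanically. A sketch with a hypothetical function name; it expects "curves throughput" pairs with the smallest curve count first, as the script prints them:

```bash
#!/usr/bin/env bash
# Print the first -gpucurves value whose throughput is within 10% of
# the best throughput seen. Input: "curves throughput" lines, ascending.
suggest_gpucurves() {
  awk '{ n[NR] = $1; t[NR] = $2 + 0; if (NR == 1 || t[NR] > best) best = t[NR] }
       END { for (i = 1; i <= NR; i++) if (t[i] >= 0.9 * best) { print n[i]; exit } }'
}
```

On EdH's 2^997-1 runs (896 curves at 142.559/s, 1792 at 166.513/s, and so on) it suggests 1792, since 142.559 is more than 10% below the best value of 166.513.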

Last fiddled with by SethTro on 2022-03-08 at 06:53
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
