#122
Sep 2008
Kansas
2×17×107 Posts
#123
"Ed Hall"
Dec 2009
Adirondack Mtns
1001001011001₂ Posts
Quote:
I will have to think about some things a bit. The Colab GPU LA was quite complicated to get arranged and is still more involved than the rest, but the Colab portion actually got simpler after a bit. It started out as about five separate code blocks. Maybe for the Colab GPU GMP-ECM, I can simply add a function with a switch for whether to include CGBN. There are a few things to think about, but sometimes too many cause me to step back and go do something else.
#124
Sep 2008
Kansas
2·17·107 Posts
Quote:
Jeff Gilchrist did some work early on, when things were much easier: just ECM, GGNFS and Msieve. Now, with all the different branches, it is hard to keep them all straight. Thanks for all your work.
Quote:
#125
Sep 2009
26·37 Posts
Quote:
#126
"Ed Hall"
Dec 2009
Adirondack Mtns
7×11×61 Posts
I guess I installed everything OK and it is working. I ran a test of B1 values only (B2=0) on a c170 to compare timings for my K20X card. ECM chose 896 curves, out of the 2688 shading units (I still don't know why so few):
Code:
8875593388...97<170>:
Completed 1e3 with CGBN in 00:00
Completed 1e3 without CGBN in 00:01
Completed 15e3 with CGBN in 00:03
Completed 15e3 without CGBN in 00:08
Completed 12e4 with CGBN in 00:24
Completed 12e4 without CGBN in 01:02
Completed 1e6 with CGBN in 03:17
Completed 1e6 without CGBN in 08:35
Completed 6e6 with CGBN in 19:41
Completed 6e6 without CGBN in 51:32
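For anyone who wants to put a number on the difference, here is a small sketch that turns two of the MM:SS timings above into a speedup ratio. The function name `speedup` is made up for illustration and is not part of GMP-ECM or any script in this thread.

```shell
#!/usr/bin/env bash
# speedup SLOW FAST: both arguments are MM:SS strings as printed above.
# Prints SLOW/FAST to two decimals. Hypothetical helper, not part of GMP-ECM.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN {
        split(slow, s, ":"); split(fast, f, ":")
        slow_sec = s[1] * 60 + s[2]
        fast_sec = f[1] * 60 + f[2]
        # A 00:00 timing is below the timer resolution, so no ratio is possible.
        if (fast_sec == 0) { print "n/a"; exit }
        printf "%.2f\n", slow_sec / fast_sec
    }'
}

speedup 51:32 19:41   # B1=6e6: without CGBN vs. with CGBN
```

For the 6e6 row this works out to roughly 2.6, and the 1e6 row (08:35 vs. 03:17) gives about the same ratio.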
#127
Mar 2006
3×173 Posts
Quote:
Could you run those tests again with "-gpucurves 2688"? I'd be interested to see if you can get more curves done in the same time, or more time. Also, for some reason, I seem to remember gpu-ecm running best at half of the number of CUDA cores, but maybe that is different with CGBN? Maybe another test with "-gpucurves 1344" and/or "-gpucurves 5376"?
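A rough sketch of how such a sweep could be scripted, in case it saves some typing. `gpucurves_candidates` is a made-up helper, `ECM` defaults to a placeholder binary name, and `comp` and `b1` are assumed to hold the composite and the B1 value. The `-cgbn` switch only exists in CGBN-enabled builds, so drop it for a stock GMP-ECM.

```shell
#!/usr/bin/env bash
# gpucurves_candidates CORES: print half, 1x and 2x the CUDA core count,
# one per line. Hypothetical helper for building the list of values to try.
gpucurves_candidates() {
    local cores="$1"
    echo $(( cores / 2 ))
    echo "$cores"
    echo $(( cores * 2 ))
}

# Assumptions: ECM names a CGBN-enabled ecm binary, comp is the composite.
ECM="${ECM:-ecm}"
b1="${b1:-1e6}"

# Only run the sweep when a binary and a composite are actually available.
if command -v "$ECM" >/dev/null 2>&1 && [ -n "${comp:-}" ]; then
    for n in $(gpucurves_candidates 2688); do
        start=$SECONDS
        echo "$comp" | "$ECM" -cgbn -gpu -gpudevice 0 -gpucurves "$n" -q "$b1" 0 >/dev/null
        echo "gpucurves=$n: $(( SECONDS - start ))s"
    done
fi
```

For a 2688-core card the helper yields 1344, 2688 and 5376, the three values suggested above. Dividing each curve count by its elapsed time gives a throughput that can be compared across runs.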
#128
"Seth"
Apr 2019
19×23 Posts
Quote:
For example, on my 970 it looks like 832 is the best number of curves for 1024-bit numbers (up to C300), but for smaller numbers (< C15) 3328 is the best number of curves, which is 4x the default.
Code:
TESTING (2^269-1)/13822297 B1=128000
Step 1 took 139ms
Computing 832 Step 1 took 683ms of CPU time / 33471ms of GPU time
Throughput: 24.857 curves per second (on average 40.23ms per Step 1)
CGBN<512, 4> running kernel<4 block x 256 threads> input number is 246 bits
Computing 224 Step 1 took 35ms of CPU time / 7710ms of GPU time
Throughput: 29.054 curves per second (on average 34.42ms per Step 1)
CGBN<512, 4> running kernel<7 block x 256 threads> input number is 246 bits
Computing 416 Step 1 took 9ms of CPU time / 7646ms of GPU time
Throughput: 54.408 curves per second (on average 18.38ms per Step 1)
CGBN<512, 4> running kernel<13 block x 256 threads> input number is 246 bits
Computing 832 Step 1 took 17ms of CPU time / 7629ms of GPU time
Throughput: 109.055 curves per second (on average 9.17ms per Step 1)
CGBN<512, 4> running kernel<26 block x 256 threads> input number is 246 bits
Computing 1664 Step 1 took 21ms of CPU time / 7844ms of GPU time
Throughput: 212.141 curves per second (on average 4.71ms per Step 1)
CGBN<512, 4> running kernel<52 block x 256 threads> input number is 246 bits
Computing 3328 Step 1 took 33ms of CPU time / 13393ms of GPU time
Throughput: 248.482 curves per second (on average 4.02ms per Step 1)
CGBN<512, 4> running kernel<104 block x 256 threads> input number is 246 bits
Computing 6656 Step 1 took 83ms of CPU time / 27894ms of GPU time
Throughput: 238.620 curves per second (on average 4.19ms per Step 1)

TESTING (2^499-1)/20959 B1=64000
Step 1 took 81ms
Computing 832 Step 1 took 384ms of CPU time / 18396ms of GPU time
Throughput: 45.228 curves per second (on average 22.11ms per Step 1)
CGBN<512, 4> running kernel<4 block x 256 threads> input number is 485 bits
Computing 224 Step 1 took 18ms of CPU time / 3956ms of GPU time
Throughput: 56.626 curves per second (on average 17.66ms per Step 1)
CGBN<512, 4> running kernel<7 block x 256 threads> input number is 485 bits
Computing 416 Step 1 took 16ms of CPU time / 3882ms of GPU time
Throughput: 107.165 curves per second (on average 9.33ms per Step 1)
CGBN<512, 4> running kernel<13 block x 256 threads> input number is 485 bits
Computing 832 Step 1 took 6ms of CPU time / 3856ms of GPU time
Throughput: 215.783 curves per second (on average 4.63ms per Step 1)
CGBN<512, 4> running kernel<26 block x 256 threads> input number is 485 bits
Computing 1664 Step 1 took 14ms of CPU time / 4154ms of GPU time
Throughput: 400.610 curves per second (on average 2.50ms per Step 1)
CGBN<512, 4> running kernel<52 block x 256 threads> input number is 485 bits
Computing 3328 Step 1 took 37ms of CPU time / 7469ms of GPU time
Throughput: 445.558 curves per second (on average 2.24ms per Step 1)
CGBN<512, 4> running kernel<104 block x 256 threads> input number is 485 bits
Computing 6656 Step 1 took 47ms of CPU time / 15017ms of GPU time
Throughput: 443.217 curves per second (on average 2.26ms per Step 1)

TESTING 2^997-1 B1=32000
Step 1 took 73ms
Computing 832 Step 1 took 182ms of CPU time / 9450ms of GPU time
Throughput: 88.045 curves per second (on average 11.36ms per Step 1)
CGBN<1024, 8> running kernel<7 block x 256 threads> input number is 997 bits
Computing 224 Step 1 took 28ms of CPU time / 3294ms of GPU time
Throughput: 67.994 curves per second (on average 14.71ms per Step 1)
CGBN<1024, 8> running kernel<13 block x 256 threads> input number is 997 bits
Computing 416 Step 1 took 27ms of CPU time / 3161ms of GPU time
Throughput: 131.591 curves per second (on average 7.60ms per Step 1)
CGBN<1024, 8> running kernel<26 block x 256 threads> input number is 997 bits
Computing 832 Step 1 took 38ms of CPU time / 3450ms of GPU time
Throughput: 241.137 curves per second (on average 4.15ms per Step 1)
CGBN<1024, 8> running kernel<52 block x 256 threads> input number is 997 bits
Computing 1664 Step 1 took 37ms of CPU time / 7034ms of GPU time
Throughput: 236.566 curves per second (on average 4.23ms per Step 1)
CGBN<1024, 8> running kernel<104 block x 256 threads> input number is 997 bits
Computing 3328 Step 1 took 63ms of CPU time / 14158ms of GPU time
Throughput: 235.059 curves per second (on average 4.25ms per Step 1)
CGBN<1024, 8> running kernel<208 block x 256 threads> input number is 997 bits
Computing 6656 Step 1 took 105ms of CPU time / 29785ms of GPU time
Throughput: 223.465 curves per second (on average 4.47ms per Step 1)
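Output like this is easier to compare once it is boiled down to one row per CGBN run. The following is only a sketch: `cgbn_rows` is a made-up name, and the parsing assumes the exact wording shown in the listing above ("CGBN<", "input number is N bits", "Computing N", "Throughput: X"). It works on whitespace-separated tokens, so it does not depend on where the line breaks fall.

```shell
#!/usr/bin/env bash
# cgbn_rows: read throughput-test output on stdin and print
# "bits curves throughput" for every run that was preceded by a CGBN< line.
# Hypothetical helper; it relies on the wording of the listing above.
cgbn_rows() {
    tr -s '[:space:]' '\n' | awk '
        /^CGBN</              { in_cgbn = 1 }        # a CGBN kernel launch line
        $0 == "bits"          { bits = prev }        # "... input number is 246 bits"
        prev == "Computing"   { curves = $0 }        # "Computing 832 ..."
        prev == "Throughput:" {                      # "Throughput: 109.055 ..."
            if (in_cgbn) print bits, curves, $0
            in_cgbn = 0
        }
        { prev = $0 }'
}

# Usage, assuming the output was saved to a file called throughput.log:
# cgbn_rows < throughput.log
```

Runs without a preceding CGBN< line (the first "Computing" entry in each set) are skipped on purpose, so only the CGBN timings end up in the table.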
#129
"Ed Hall"
Dec 2009
Adirondack Mtns
11131₈ Posts
Quote:
Code:
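# B1-only ECM run (B2=0) on GPU 0 using the CGBN code path.
function runecmCGBN {
  result=$(echo "$comp" | "$HOME/Math/ecm-cgbn/ecm-cgbn" -cgbn -gpu -gpudevice 0 -q "$b1" 0)
}
# Same run without -cgbn, i.e. the original GPU code path.
function runecm {
  result=$(echo "$comp" | "$HOME/Math/ecm-cgbn/ecm-cgbn" -gpu -gpudevice 0 -q "$b1" 0)
}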
I will play with the throughput test and other values later.
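For completeness, a sketch of how "Completed ... in MM:SS" lines could be produced around functions like the two above. This is a guess at the shape of such a wrapper, not the actual script; `fmt_mmss` and `timed_run` are made-up names.

```shell
#!/usr/bin/env bash
# fmt_mmss SECS: format an elapsed time in whole seconds as MM:SS.
fmt_mmss() {
    printf '%02d:%02d\n' $(( $1 / 60 )) $(( $1 % 60 ))
}

# timed_run LABEL CMD...: run CMD in the current shell, then report the
# elapsed wall-clock time. Relies on bash's SECONDS counter.
timed_run() {
    local label="$1" start=$SECONDS
    shift
    "$@"
    echo "Completed $label in $(fmt_mmss $(( SECONDS - start )))"
}

# Usage, assuming comp, b1, runecmCGBN and runecm are defined as above:
# timed_run "$b1 with CGBN" runecmCGBN
# timed_run "$b1 without CGBN" runecm
```

Because `timed_run` calls the function directly rather than in a subshell, the `result` variable set inside `runecmCGBN` or `runecm` is still available afterwards.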
#130
"Ed Hall"
Dec 2009
Adirondack Mtns
7×11×61 Posts
Nearly the same:
Code:
8875593388...97<170>:
Completed 1e3 with CGBN in 00:00
Completed 1e3 without CGBN in 00:01
Completed 15e3 with CGBN in 00:03
Completed 15e3 without CGBN in 00:08
Completed 12e4 with CGBN in 00:24
Completed 12e4 without CGBN in 01:02
Completed 1e6 with CGBN in 03:17
Completed 1e6 without CGBN in 08:33
Completed 6e6 with CGBN in 19:38
Completed 6e6 without CGBN in 51:22
#131
"Ed Hall"
Dec 2009
Adirondack Mtns
7×11×61 Posts
Not really sure how to use this. Would merely changing the -gpucurves value make all the other values change or would I adjust other things? Is this in the docs?
Code:
$ bash gpu_throughput_test.sh
TESTING (2^269-1)/13822297 B1=128000
Step 1 took 275ms
Computing 896 Step 1 took 1219ms of CPU time / 65682ms of GPU time
Throughput: 13.641 curves per second (on average 73.31ms per Step 1)
CGBN<512, 4> running kernel<4 block x 256 threads> input number is 246 bits
Computing 224 Step 1 took 23ms of CPU time / 12473ms of GPU time
Throughput: 17.959 curves per second (on average 55.68ms per Step 1)
CGBN<512, 4> running kernel<7 block x 256 threads> input number is 246 bits
Computing 448 Step 1 took 16ms of CPU time / 12474ms of GPU time
Throughput: 35.915 curves per second (on average 27.84ms per Step 1)
CGBN<512, 4> running kernel<14 block x 256 threads> input number is 246 bits
Computing 896 Step 1 took 32ms of CPU time / 12460ms of GPU time
Throughput: 71.913 curves per second (on average 13.91ms per Step 1)
CGBN<512, 4> running kernel<28 block x 256 threads> input number is 246 bits
Computing 1792 Step 1 took 19ms of CPU time / 14248ms of GPU time
Throughput: 125.769 curves per second (on average 7.95ms per Step 1)
CGBN<512, 4> running kernel<56 block x 256 threads> input number is 246 bits
Computing 3584 Step 1 took 45ms of CPU time / 22182ms of GPU time
Throughput: 161.573 curves per second (on average 6.19ms per Step 1)
CGBN<512, 4> running kernel<112 block x 256 threads> input number is 246 bits
Computing 7168 Step 1 took 70ms of CPU time / 44416ms of GPU time
Throughput: 161.384 curves per second (on average 6.20ms per Step 1)

TESTING (2^499-1)/20959 B1=64000
Step 1 took 184ms
Computing 896 Step 1 took 617ms of CPU time / 32883ms of GPU time
Throughput: 27.248 curves per second (on average 36.70ms per Step 1)
CGBN<512, 4> running kernel<4 block x 256 threads> input number is 485 bits
Computing 224 Step 1 took 8ms of CPU time / 6256ms of GPU time
Throughput: 35.808 curves per second (on average 27.93ms per Step 1)
CGBN<512, 4> running kernel<7 block x 256 threads> input number is 485 bits
Computing 448 Step 1 took 16ms of CPU time / 6233ms of GPU time
Throughput: 71.872 curves per second (on average 13.91ms per Step 1)
CGBN<512, 4> running kernel<14 block x 256 threads> input number is 485 bits
Computing 896 Step 1 took 17ms of CPU time / 6235ms of GPU time
Throughput: 143.703 curves per second (on average 6.96ms per Step 1)
CGBN<512, 4> running kernel<28 block x 256 threads> input number is 485 bits
Computing 1792 Step 1 took 24ms of CPU time / 7151ms of GPU time
Throughput: 250.600 curves per second (on average 3.99ms per Step 1)
CGBN<512, 4> running kernel<56 block x 256 threads> input number is 485 bits
Computing 3584 Step 1 took 31ms of CPU time / 11108ms of GPU time
Throughput: 322.648 curves per second (on average 3.10ms per Step 1)
CGBN<512, 4> running kernel<112 block x 256 threads> input number is 485 bits
Computing 7168 Step 1 took 87ms of CPU time / 22239ms of GPU time
Throughput: 322.312 curves per second (on average 3.10ms per Step 1)

TESTING 2^997-1 B1=32000
Step 1 took 180ms
Computing 896 Step 1 took 326ms of CPU time / 16376ms of GPU time
Throughput: 54.714 curves per second (on average 18.28ms per Step 1)
CGBN<1024, 8> running kernel<7 block x 256 threads> input number is 997 bits
Computing 224 Step 1 took 11ms of CPU time / 5296ms of GPU time
Throughput: 42.299 curves per second (on average 23.64ms per Step 1)
CGBN<1024, 8> running kernel<14 block x 256 threads> input number is 997 bits
Computing 448 Step 1 took 14ms of CPU time / 5289ms of GPU time
Throughput: 84.698 curves per second (on average 11.81ms per Step 1)
CGBN<1024, 8> running kernel<28 block x 256 threads> input number is 997 bits
Computing 896 Step 1 took 33ms of CPU time / 6285ms of GPU time
Throughput: 142.559 curves per second (on average 7.01ms per Step 1)
CGBN<1024, 8> running kernel<56 block x 256 threads> input number is 997 bits
Computing 1792 Step 1 took 44ms of CPU time / 10762ms of GPU time
Throughput: 166.513 curves per second (on average 6.01ms per Step 1)
CGBN<1024, 8> running kernel<112 block x 256 threads> input number is 997 bits
Computing 3584 Step 1 took 81ms of CPU time / 21541ms of GPU time
Throughput: 166.382 curves per second (on average 6.01ms per Step 1)
CGBN<1024, 8> running kernel<224 block x 256 threads> input number is 997 bits
Computing 7168 Step 1 took 159ms of CPU time / 43201ms of GPU time
Throughput: 165.923 curves per second (on average 6.03ms per Step 1)
#132
"Seth"
Apr 2019
110110101₂ Posts
Quote:
Maybe a prefix like "This script helps you find the best -gpucurves value for your GPU. It runs ecm (<BINARY NAME>) while changing the -gpucurves parameter from the default on your card, X, to several multiples of X. It runs at 3 levels: 256 bits (C80), 512 bits (C150), and 1024 bits (C300). The first line in each set is the CPU timing, followed by the GPU times for different values of -gpucurves."

After it's done, something like "Large values tend to produce better throughput but can double the time to get the curves. We suggest the first -gpucurves value that is within 10% of the best throughput."

(written on mobile without proofreading, pre-apology for grammar and spelling)

Last fiddled with by SethTro on 2022-03-08 at 06:53
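The "first value within 10% of the best throughput" rule is simple enough to automate. Below is a minimal sketch under the assumption that the results have already been reduced to "curves throughput" pairs in the order they were tested; `pick_gpucurves` is a made-up name and is not part of the test script.

```shell
#!/usr/bin/env bash
# pick_gpucurves [PCT]: read "curves throughput" pairs on stdin, one per
# line in test order, and print the first curve count whose throughput is
# within PCT percent (default 10) of the best throughput seen.
# Hypothetical helper, not part of GMP-ECM.
pick_gpucurves() {
    local pct="${1:-10}"
    awk -v pct="$pct" '
        NF >= 2 {
            n++
            curves[n] = $1
            rate[n] = $2 + 0
            if (rate[n] > best) best = rate[n]
        }
        END {
            for (i = 1; i <= n; i++)
                if (rate[i] >= best * (1 - pct / 100)) { print curves[i]; exit }
        }'
}

# Example with the CGBN rows for 2^997-1 from the 970 run earlier in the thread:
pick_gpucurves <<'EOF'
224 67.994
416 131.591
832 241.137
1664 236.566
3328 235.059
6656 223.465
EOF
```

With those six rows the sketch prints 832, which agrees with the earlier observation that 832 curves is the sweet spot for 1024-bit inputs on that card.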