#34
Jul 2003
So Cal
3×17×47 Posts
Is there a simpler way to distribute stage 2 across multiple cores than writing a script to run stage 1 with the -save option and B2=0, split the save file, and then launch multiple ecm processes with -resume?
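For reference, a minimal sketch of the script-based workflow described in the question (untested; the input number, B1, curve count, and the four-way split are placeholder values, and it assumes GNU split and a gmp-ecm save file with one curve per line):
Code:
#!/bin/bash
# Stage 1 only: -save writes one residue line per curve, and B2 = 0 skips stage 2.
echo "2^997-1" | ecm -c 512 -save stage1.save 11e7 0

# Split the residues into one chunk per core (4 here, chosen arbitrarily),
# without breaking lines.
split -n l/4 stage1.save chunk_

# Run stage 2 on each chunk in parallel; passing the same B1 means stage 1
# is not redone, so only stage 2 (with the default B2) is performed.
for f in chunk_*; do
    ecm -resume "$f" 11e7 > "$f.log" &
done
wait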
#35
"Ben"
Feb 2007
7×11×47 Posts
I am working on the ability to process ecm save files with yafu, but it isn't ready yet.
#36
"Ed Hall"
Dec 2009
Adirondack Mtns
2²×1,151 Posts
Quote:
Edit: For my Colab-GPU ECM experiments, I use:
Code:
python3 ecm.py -resume residues

The latest version is here.

Last fiddled with by EdH on 2021-08-30 at 20:25
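For context, a sketch of the step that typically produces such a residues file in the first place (placeholder number and B1; it assumes gmp-ecm built with GPU support, so that -gpu and -save are available):
Code:
# Stage 1 on the GPU; -save writes one residue per curve and B2 = 0 defers stage 2.
echo "2^1277-1" | ./ecm -gpu -save residues 11e7 0

# Then stage 2 for the saved residues via ecm.py, as in the post above:
python3 ecm.py -resume residues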
#37
"Seth"
Apr 2019
19×23 Posts
@EdH I started using ECM.py again and it's great!
---

I wrote a bunch of code today so that S_BITS_PER_BATCH is dynamic and there's better verbose output.

The verbose output now includes this message when the kernel size is much larger than the input number:
Code:
Input number is 2^239-1 (72 digits)
Compiling custom kernel for 256 bits should be ~180% faster
CGBN<512, 4> running kernel<56 block x 128 threads>

Code:
- typedef cgbn_params_t<4, 512> cgbn_params_4_512;
+ typedef cgbn_params_t<4, 256> cgbn_params_4_512;

ETA and estimated throughput:
Code:
Copying 716800 bits of data to GPU
CGBN<640, 8> running kernel<112 block x 128 threads>
Computing 100 bits/call, 0/4328085 (0.0%)
Computing 110 bits/call, 100/4328085 (0.0%)
Computing 121 bits/call, 210/4328085 (0.0%)
...
Computing 256 bits/call, 1584/4328085 (0.0%)
Computing 655 bits/call, 5630/4328085 (0.1%)
Computing 1694 bits/call, 16050/4328085 (0.4%)
Computing 2049 bits/call, 35999/4328085 (0.8%), ETA 184 + 2 = 186 seconds (~104 ms/curves)
Computing 2049 bits/call, 56489/4328085 (1.3%), ETA 183 + 2 = 185 seconds (~103 ms/curves)
...
Computing 2049 bits/call, 158939/4328085 (3.7%), ETA 178 + 7 = 185 seconds (~103 ms/curves)
Computing 2049 bits/call, 363839/4328085 (8.4%), ETA 169 + 16 = 185 seconds (~103 ms/curves)
...
Computing 2049 bits/call, 1798139/4328085 (41.5%), ETA 109 + 77 = 186 seconds (~104 ms/curves)
Computing 2049 bits/call, 2003039/4328085 (46.3%), ETA 100 + 86 = 186 seconds (~104 ms/curves)
Computing 2049 bits/call, 4052039/4328085 (93.6%), ETA 12 + 175 = 187 seconds (~104 ms/curves)
Copying results back to CPU
...
Computing 1792 Step 1 took 240ms of CPU time / 186575ms of GPU time
Throughput: 9.605 curves per second (on average 104.12ms per Step 1)

I've found that doubling gpucurves can lead to 2x worse throughput! So I may need to add some warnings.

Last fiddled with by SethTro on 2021-08-31 at 08:49
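As an aside on the -gpucurves remark: a quick way to check how batch size affects throughput on a given card is a small timing sweep (untested sketch; the candidate values, input, and B1 are arbitrary, and -gpucurves is gmp-ecm's option for the number of curves run per GPU batch):
Code:
# Compare stage-1 throughput for a few batch sizes; ecm prints the
# curves-per-second figure itself.
for c in 448 896 1792 3584; do
    echo "== -gpucurves $c =="
    echo "2^997-1" | ./ecm -gpu -gpucurves $c 11e7 0
done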
#38
"Ed Hall"
Dec 2009
Adirondack Mtns
2²×1,151 Posts
Good to read. I just wish I could get my sm_30 card to do something... (I have two sm_20 cards and one sm_30, and none will do anything productive... yet. With all the install/reinstall/remove activity, the sm_30 machine is now complaining about its Linux kernel, so I've taken a break from trying more.)
#39
Aug 2020
79*6581e-4;3*2539e-3
2·271 Posts
I couldn't find it in the thread (I hope I didn't just overlook it): how does the speed of ECM on a GPU generally compare to a CPU? Say a GTX 1660 or similar.

And is it the case that only small B1 values can be used? I found this paper, and they also only seem to have used B1=50k. With a 2080 Ti they achieved "2781 ECM trials" (I guess that means curves) per second at B1=50k. That is very fast, but if the B1 size is severely limited, is a CPU still required for larger factors?

Last fiddled with by bur on 2021-08-31 at 17:56
#40
"Seth"
Apr 2019
19·23 Posts

Quote:
Both the CPU and the GPU have the same linear scaling in B1, which can be increased to any value you want. The speedup depends strongly on your CPU vs your GPU. For my 1080ti vs 2600K:

250 bits: 46x faster on GPU
500 bits: 48x faster on GPU
1000 bits: 68x faster on GPU
1500 bits: 83x faster on GPU
2000 bits: 46x faster on GPU

which means we are seeing roughly the same scaling for the GPU as for the CPU at bit levels < 2K. Informal testing with larger inputs (2,048 - 32,768 bits) shows the CPU outscales the GPU there: the speedup slowly decreases from ~50x to ~25x as the size increases from 2K to 16K bits. At the maximal value of 32K bits, performance has decreased again to 14x (from 26x at 16K bits).

Last fiddled with by SethTro on 2021-08-31 at 21:02
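A rough worked example of that linear B1 scaling, using the paper figure bur quoted above (an extrapolation for illustration, not a measurement): if a 2080 Ti runs about 2781 curves per second at B1 = 50,000, then raising B1 by a factor of 20 to 1,000,000 makes each stage-1 curve roughly 20 times as expensive, i.e. on the order of 2781 / 20 ≈ 139 curves per second on the same card.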
#41
Bamboozled!
"๐บ๐๐ท๐ท๐ญ"
May 2003
Down not across
2×5,689 Posts
It is what I used to do when GPU-enabled ECM still worked on my machines. It was a trivial script to write.

Last fiddled with by xilman on 2021-09-01 at 01:51 Reason: Fix typo
#42
"Ben"
Feb 2007
7×11×47 Posts |
I re-cloned the gpu_integration branch to capture the latest changes and went through the build process with the following caveats:
* Specifying --with-gmp together with --with-cgbn-include doesn't work; I had to use the system default gmp (6.0.0).
* With compute 70 I still have to replace __any with __any_sync(__activemask(), ...) on line 10 of cude_kernel_default.cu (a sketch of this change follows after the timings below).
* Building with gcc I get this error in cgbn_stage1.cu:
  cgbn_stage1.cu(654): error: initialization with "{...}" is not allowed for object of type "const std::vector<uint32_t, std::allocator<uint32_t>>"
  I suppose I need to build with g++ instead?

Anyway, I can get past all of that and get a working binary, and the CPU usage is now much lower. But now the GPU portion appears to be about 15% slower?

Before:
Code:
Input number is 2^997-1 (301 digits)
Computing 5120 Step 1 took 75571ms of CPU time / 129206ms of GPU time
Throughput: 39.627 curves per second (on average 25.24ms per Step 1)

After:
Code:
Input number is 2^997-1 (301 digits)
Computing 5120 Step 1 took 643ms of CPU time / 149713ms of GPU time
Throughput: 34.199 curves per second (on average 29.24ms per Step 1)
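For anyone hitting the same __any build issue mentioned above, a minimal toy sketch of the change (this is not the GMP-ECM source; the kernel and variable names are invented). CUDA 9 and later drop the implicit-warp vote functions on compute capability 7.x in favour of the _sync variants, which take an explicit mask of participating threads:
Code:
// Toy illustration only -- compile with e.g.: nvcc -arch=sm_70 -c vote_demo.cu
__global__ void vote_demo(const int *flags, int *out)
{
    int tid = threadIdx.x;

    // Old (pre-CUDA 9) form, rejected on compute capability 7.0+:
    //     if (__any(flags[tid] != 0)) ...
    //
    // New form: vote among the currently active threads of the warp.
    if (__any_sync(__activemask(), flags[tid] != 0))
        out[tid] = 1;   // every active thread sees the same vote result
    else
        out[tid] = 0;
}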
#43
Sep 2009
2²·587 Posts
Hello,
I've upgraded my system with a GTX 970 (sm_52) to openSUSE 42.2 and installed CUDA 9.0 on it. But when I try to compile ecm with GPU support, ./configure says:
Code:
configure: Using cuda.h from /usr/local/cuda/include
checking cuda.h usability... no
checking cuda.h presence... yes
configure: WARNING: cuda.h: present but cannot be compiled
configure: WARNING: cuda.h: check for missing prerequisite headers?
configure: WARNING: cuda.h: see the Autoconf documentation
configure: WARNING: cuda.h: section "Present But Cannot Be Compiled"
configure: WARNING: cuda.h: proceeding with the compiler's result
configure: WARNING: ## ----------------------------------- ##
configure: WARNING: ## Report this to ecm-discuss@inria.fr ##
configure: WARNING: ## ----------------------------------- ##
checking for cuda.h... no
configure: error: required header file missing
Code:
Some versions of CUDA are not compatible with recent versions of gcc.
To specify which C compiler is called by the CUDA compiler nvcc, type:
$ ./configure --enable-gpu --with-cuda-compiler=/PATH/DIR

If you get errors about "cuda.h: present but cannot be compiled"
Try using an older CC:
$ ./configure --enable-gpu CC=gcc-8

The value of this parameter is directly passed to nvcc via the option
"--compiler-bindir". By default, GMP-ECM lets nvcc choose what C compiler it uses.

Chris (getting slightly frustrated by now)
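A sketch of how that README advice might map onto this setup (untested; the gcc-6 choice is an assumption based on CUDA 9.0 requiring an older host compiler, so substitute whichever older gcc is installed, and adjust the CUDA path if needed):
Code:
# Point configure at the CUDA 9.0 install and hand nvcc an older host compiler.
$ ./configure --enable-gpu --with-cuda=/usr/local/cuda CC=gcc-6
$ make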
#44
"Ed Hall"
Dec 2009
Adirondack Mtns
10774₈ Posts

Quote:
* I'm curious about the gcc version number difference between yours and mine. The default Ubuntu 20.04 gcc is 9.3.0, my Debian Buster is 8.3.0, and the default for my Fedora 33 is 10.3.1. Is your version actually that old compared to mine?