#78
"Seth"
Apr 2019
2⁴×3³ Posts
Quote:
Ignore this, but for completeness' sake you can probably clone my copy of CGBN with `git clone -b cgbn_swap https://github.com/sethtroisi/CGBN.git`. The top entry from `git log` should be
Code:
    commit 1595e543801bcbffd2c36cbf978baff843c09876 (HEAD -> gpu_integration, origin/gpu_integration)
    Author: Seth Troisi <sethtroisi@google.com>
    Date: Sat Sep 4 20:26:30 2021 -0700

        reverted the cgbn_swap change till that is accepted

#79
Sep 2009
2×3³×43 Posts
I'm still stuck. I re-downloaded everything from scratch and re-ran autoreconf -si, ./configure and make. But make still fails:
Code:
    ...
    libtool: link: ( cd ".libs" && rm -f "libecm.la" && ln -s "../libecm.la" "libecm.la" )
    /bin/sh ./libtool --tag=CC --mode=link gcc-9 -g -I/usr/local/cuda/include -g -O2 -DWITH_GPU -R /usr/local/cuda/lib64 -o ecm ecm-auxi.o ecm-b1_ainc.o ecm-candi.o ecm-eval.o ecm-main.o ecm-resume.o ecm-addlaws.o ecm-torsions.o ecm-getprime_r.o aprtcle/ecm-mpz_aprcl.o ecm-memusage.o libecm.la -lgmp -lrt -lm -lm -lm -lm -lm
    libtool: link: gcc-9 -g -I/usr/local/cuda/include -g -O2 -DWITH_GPU -o ecm ecm-auxi.o ecm-b1_ainc.o ecm-candi.o ecm-eval.o ecm-main.o ecm-resume.o ecm-addlaws.o ecm-torsions.o ecm-getprime_r.o aprtcle/ecm-mpz_aprcl.o ecm-memusage.o ./.libs/libecm.a -L/usr/local/cuda/lib64 -lcudart -lgmp -lrt -lm -Wl,-rpath -Wl,/usr/local/cuda/lib64
    /usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: ./.libs/libecm.a(cgbn_stage1.o): in function `cgbn_ecm_stage1':
    tmpxft_00007e39_00000000-6_cgbn_stage1.cudafe1.cpp:(.text+0x8b3): undefined reference to `operator delete(void*)'
    /usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: tmpxft_00007e39_00000000-6_cgbn_stage1.cudafe1.cpp:(.text+0x196e): undefined reference to `operator delete(void*)'
    /usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: ./.libs/libecm.a(cgbn_stage1.o): in function `void std::vector<unsigned int, std::allocator<unsigned int> >::_M_realloc_insert<unsigned int>(__gnu_cxx::__normal_iterator<unsigned int*, std::vector<unsigned int, std::allocator<unsigned int> > >, unsigned int&&)':
    tmpxft_00007e39_00000000-6_cgbn_stage1.cudafe1.cpp:(.text._ZNSt6vectorIjSaIjEE17_M_realloc_insertIJjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_[_ZNSt6vectorIjSaIjEE17_M_realloc_insertIJjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_]+0x50): undefined reference to `operator new(unsigned long)'
    /usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: tmpxft_00007e39_00000000-6_cgbn_stage1.cudafe1.cpp:(.text._ZNSt6vectorIjSaIjEE17_M_realloc_insertIJjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_[_ZNSt6vectorIjSaIjEE17_M_realloc_insertIJjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_]+0xc8): undefined reference to `operator delete(void*)'
    /usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: ./.libs/libecm.a(cgbn_stage1.o):(.data.rel.local.DW.ref.__gxx_personality_v0[DW.ref.__gxx_personality_v0]+0x0): undefined reference to `__gxx_personality_v0'
    collect2: error: ld returned 1 exit status
    make[2]: *** [Makefile:973: ecm] Error 1
    make[2]: Leaving directory '/home/chris/ecm-cgbn/gmp-ecm'
    make[1]: *** [Makefile:1903: all-recursive] Error 1
    make[1]: Leaving directory '/home/chris/ecm-cgbn/gmp-ecm'
    make: *** [Makefile:783: all] Error 2

#80
Sep 2002
Database er0rr
23·179 Posts
Did you install with YaST the dev package of libstdc++?

#81
Sep 2009
2322₁₀ Posts
Success!
The vital bit of info came from putting "__gxx_personality_v0" into DuckDuckGo. That told me it's provided by libstdc++, which is the g++ runtime. After installing gcc9-g++ and its runtime libstdc++6-devel-gcc9, everything works. This has been an educational experience. Next step is to benchmark cgbn on my GPU.
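
The diagnosis above can be sketched in Python: scan linker output for undefined references to the three symbols from the failed link (`operator new`, `operator delete`, `__gxx_personality_v0`), all of which live in the C++ runtime. The function name and the shortened sample lines are illustrative assumptions, not part of GMP-ECM or its build system.

```python
import re

# Symbols named in the failing link; all are supplied by the C++ runtime (libstdc++).
CXX_RUNTIME_SYMBOLS = ("operator new", "operator delete", "__gxx_personality_v0")

# GNU ld quotes the symbol as `name' in "undefined reference" messages.
UNDEFINED_REF = re.compile(r"undefined reference to `([^']+)'")


def missing_cxx_runtime_symbols(linker_output: str) -> list:
    """Return the distinct undefined symbols that belong to the C++ runtime."""
    found = []
    for symbol in UNDEFINED_REF.findall(linker_output):
        if symbol.startswith(CXX_RUNTIME_SYMBOLS) and symbol not in found:
            found.append(symbol)
    return found


# Shortened lines modelled on the log in this thread (paths abbreviated for the demo).
sample = (
    "ld: cgbn_stage1.cudafe1.cpp:(.text+0x8b3): undefined reference to `operator delete(void*)'\n"
    "ld: cgbn_stage1.cudafe1.cpp:(.text+0x50): undefined reference to `operator new(unsigned long)'\n"
    "ld: libecm.a(cgbn_stage1.o): undefined reference to `__gxx_personality_v0'\n"
)

hits = missing_cxx_runtime_symbols(sample)
if hits:
    print("C++ runtime symbols unresolved; check that libstdc++ is installed and linked:")
    for name in hits:
        print("  " + name)
```

A non-empty result only hints at a missing or unlinked libstdc++; which package provides it depends on the distribution.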

#82
Sep 2009
4422₈ Posts
Benchmark results:
Code:
    chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^499-1)/20959" | ./ecm -gpu -gpucurves 3584 -sigma 3:1000 20000 0;date
    Sun 5 Sep 19:42:42 BST 2021
    GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
    Input number is (2^499-1)/20959 (146 digits)
    Using B1=20000, B2=0, sigma=3:1000-3:4583 (3584 curves)
    GPU: Using device code targeted for architecture compile_52
    GPU: Ptx version is 52
    GPU: maxThreadsPerBlock = 1024
    GPU: numRegsPerThread = 31 sharedMemPerBlock = 24576 bytes
    GPU: Block: 32x32x1 Grid: 112x1x1 (3584 parallel curves)
    Computing 3584 Step 1 took 190ms of CPU time / 20427ms of GPU time
    Sun 5 Sep 19:43:03 BST 2021
    chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000 0;date
    Sun 5 Sep 19:43:29 BST 2021
    GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
    Input number is (2^499-1)/20959 (146 digits)
    Using B1=20000, B2=0, sigma=3:1000-3:4583 (3584 curves)
    GPU: Using device code targeted for architecture compile_52
    GPU: Ptx version is 52
    GPU: maxThreadsPerBlock = 640
    GPU: numRegsPerThread = 93 sharedMemPerBlock = 0 bytes
    Computing 3584 Step 1 took 30ms of CPU time / 3644ms of GPU time
    Sun 5 Sep 19:43:33 BST 2021
    chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^997-1)" | ./ecm -gpu -sigma 3:1000 20000 0;date
    Sun 5 Sep 19:44:25 BST 2021
    GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
    Input number is (2^997-1) (301 digits)
    Using B1=20000, B2=0, sigma=3:1000-3:1831 (832 curves)
    GPU: Using device code targeted for architecture compile_52
    GPU: Ptx version is 52
    GPU: maxThreadsPerBlock = 1024
    GPU: numRegsPerThread = 31 sharedMemPerBlock = 24576 bytes
    GPU: Block: 32x32x1 Grid: 26x1x1 (832 parallel curves)
    Computing 832 Step 1 took 188ms of CPU time / 4552ms of GPU time
    Sun 5 Sep 19:44:30 BST 2021
    chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^997-1)" | ./ecm -gpu -cgbn -sigma 3:1000 20000 0;date
    Sun 5 Sep 19:44:41 BST 2021
    GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
    Input number is (2^997-1) (301 digits)
    Using B1=20000, B2=0, sigma=3:1000-3:1831 (832 curves)
    GPU: Using device code targeted for architecture compile_52
    GPU: Ptx version is 52
    GPU: maxThreadsPerBlock = 640
    GPU: numRegsPerThread = 93 sharedMemPerBlock = 0 bytes
    Computing 832 Step 1 took 8ms of CPU time / 1995ms of GPU time
    Sun 5 Sep 19:44:44 BST 2021

But my overall throughput won't increase much because my CPU can't do stage 2 as fast as the GPU can do stage 1 now. But that's not your fault. And any speedup is nice. Thanks.

Other lessons learnt: autoreconf -si creates symlinks to missing files while autoreconf -i copies them. Using -si saves space, but if you upgrade to a new level of automake you can get hanging symlinks:
Code:
    lrwxrwxrwx 1 chris users 32 Nov 12 2015 INSTALL -> /usr/share/automake-1.13/INSTALL
    lrwxrwxrwx 1 chris users 35 Nov 12 2015 ltmain.sh -> /usr/share/libtool/config/ltmain.sh
Code:
    lrwxrwxrwx 1 chris users 32 Sep 4 19:20 INSTALL -> /usr/share/automake-1.15/INSTALL
    lrwxrwxrwx 1 chris users 38 Sep 4 19:20 ltmain.sh -> /usr/share/libtool/build-aux/ltmain.sh

And suggestions for the install process:

INSTALL-ecm should tell users to run autoreconf -i (or -si) before running ./configure (which is created by autoreconf -i).

./configure compiles several small programs and runs them to check things. If the compile fails it should put out a message saying the compile failed, not one saying it found different levels of run time library etc. If the compile normally produces no output then letting any output it does produce go to the screen would be informative (eg when it can't find -lstdc++).

Chris
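
The hanging symlinks left behind by autoreconf -si after an automake upgrade can be found with a few lines of Python. This is a minimal sketch for a Unix-like system; the function name `dangling_symlinks` and the made-up `automake-0.0` path in the demo are assumptions for illustration only.

```python
import os
import tempfile


def dangling_symlinks(directory: str) -> list:
    """Return (name, target) pairs for symlinks in `directory` whose target is missing."""
    broken = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        # islink() is true for any symlink; exists() follows the link,
        # so it is false when the target no longer exists.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append((entry, os.readlink(path)))
    return broken


# Demo in a throwaway directory; the automake-0.0 path is invented so it never resolves.
with tempfile.TemporaryDirectory() as tmp:
    os.symlink("/usr/share/automake-0.0/INSTALL", os.path.join(tmp, "INSTALL"))
    open(os.path.join(tmp, "README"), "w").close()  # ordinary file, not reported
    for name, target in dangling_symlinks(tmp):
        print(f"{name} -> {target} (target missing)")
```

Running it against the source tree before ./configure would list links such as the automake-1.13 ones shown above.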

#83
"Seth"
Apr 2019
432₁₀ Posts
I'm glad we finally got here!
2.2x speedup for the 1024 bit case is almost exactly what everyone else is seeing (except bsquared, maybe because of a newer card?).

You can often improve overall throughput by adjusting to 1.2*B1 and 1/2*B2 (and checking that expected curves stays roughly the same). This can especially help if Stage 1 time < Stage 2 time / cores.

I'll reflect on your notes and see if I can improve the documentation / configure script.
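
As a rough check of these numbers, the Python sketch below computes the speedups from the GPU times posted in the benchmark (4552 ms / 1995 ms ≈ 2.28 for 2^997-1) and applies the "Stage 1 time < Stage 2 time / cores" rule. Reading the rule as per-curve times is an assumption, and the stage 2 time and core count in the last step are hypothetical placeholders, since no stage 2 timings appear in the thread.

```python
# GPU stage 1 times in ms, copied from the benchmark runs in this thread (B1=20000).
runs = {
    "(2^499-1)/20959, 3584 curves": {"curves": 3584, "default_ms": 20427, "cgbn_ms": 3644},
    "2^997-1, 832 curves": {"curves": 832, "default_ms": 4552, "cgbn_ms": 1995},
}


def speedup(default_ms: float, cgbn_ms: float) -> float:
    """Ratio of default GPU stage 1 time to CGBN GPU stage 1 time."""
    return default_ms / cgbn_ms


def stage2_is_bottleneck(stage1_ms_per_curve: float,
                         stage2_ms_per_curve: float,
                         cores: int) -> bool:
    """Rule of thumb from the post: stage 1 time < stage 2 time / cores."""
    return stage1_ms_per_curve < stage2_ms_per_curve / cores


for name, r in runs.items():
    per_curve = r["cgbn_ms"] / r["curves"]
    print(f"{name}: {speedup(r['default_ms'], r['cgbn_ms']):.2f}x faster, "
          f"{per_curve:.2f} ms of GPU time per curve with CGBN")

# HYPOTHETICAL inputs: 40 ms of CPU time per stage 2 curve on 4 cores.
stage1 = runs["2^997-1, 832 curves"]["cgbn_ms"] / 832
if stage2_is_bottleneck(stage1, stage2_ms_per_curve=40.0, cores=4):
    print("Stage 2 is the bottleneck; trying 1.2*B1 and B2/2 may help.")
```

With real stage 2 timings substituted in, the last check indicates whether shifting work toward stage 1 is worth trying.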

#84
Sep 2009
4422₈ Posts
Quote:
Code:
    diff -u INSTALL-ecm INSTALL-ecm.new
    --- INSTALL-ecm 2021-09-05 12:13:55.613439408 +0100
    +++ INSTALL-ecm.new 2021-09-07 16:37:42.903291304 +0100
    @@ -19,6 +19,7 @@
     1) check your configuration with:
    + $ autoreconf -i
      $ ./configure
     The configure script accepts several options (see ./configure --help).

#85
Mar 2006
3×173 Posts
Quote:
Looking at the various documents, I see that README.dev has the advice of running autoreconf -i.

#86
Sep 2009
100100010010₂ Posts
How about having INSTALL-ecm tell users to run autoreconf -i if they don't have a ./configure in the directory?
And if people get an official release, would the files that would be created by autoreconf -i be correct for their OS etc?
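
The check proposed here (suggest autoreconf -i only when there is no ./configure) could be sketched as follows. The helper name `bootstrap_hint` is hypothetical and not part of GMP-ECM; the sketch only tests for the file and does not run anything.

```python
import os
import tempfile


def bootstrap_hint(source_dir: str):
    """Return the command to suggest when ./configure is missing, else None."""
    if os.path.isfile(os.path.join(source_dir, "configure")):
        return None  # configure already present, e.g. shipped in a release tarball
    return "autoreconf -i"


# Demo: an empty directory stands in for a fresh checkout without ./configure.
with tempfile.TemporaryDirectory() as checkout:
    hint = bootstrap_hint(checkout)
    if hint is not None:
        print(f"No ./configure found; run '{hint}' first.")
```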

#87
"Ed Hall"
Dec 2009
Adirondack Mtns
2×2,251 Posts
@Chris: Did you get your sm_30 card working or just the higher arch one?

#88
Sep 2009
2×3³×43 Posts
Just the higher arch one (sm_52). Sorry.
PS. Does CGBN increase the maximum size of number that can be handled? I'd try it, but I'm tied up catching up with ECM work I delayed while I was getting ecm-cgbn working.