[QUOTE=chris2be8;587296]That fails:
[code] And 'git pull' does nothing:
[code]
chris@4core:~/CGBN> git pull
Already up to date.
[/code]
Unless I'm not using it correctly.[/QUOTE]
You can ignore this, but for completeness' sake: you can probably clone my copy of CGBN with `git clone -b cgbn_swap https://github.com/sethtroisi/CGBN.git`

The top entry from `git log` should be
[CODE]
commit 1595e543801bcbffd2c36cbf978baff843c09876 (HEAD -> gpu_integration, origin/gpu_integration)
Author: Seth Troisi <sethtroisi@google.com>
Date: Sat Sep 4 20:26:30 2021 -0700

reverted the cgbn_swap change till that is accepted
[/CODE]
If so, you should be able to build. If it's not, try `git fetch` then `git pull origin gpu_integration`. |
I'm still stuck. I re-downloaded everything from scratch and re-ran autoreconf -si, ./configure and make, but make still fails:
[code]
...
libtool: link: ( cd ".libs" && rm -f "libecm.la" && ln -s "../libecm.la" "libecm.la" )
/bin/sh ./libtool --tag=CC --mode=link gcc-9 -g -I/usr/local/cuda/include -g -O2 -DWITH_GPU -R /usr/local/cuda/lib64 -o ecm ecm-auxi.o ecm-b1_ainc.o ecm-candi.o ecm-eval.o ecm-main.o ecm-resume.o ecm-addlaws.o ecm-torsions.o ecm-getprime_r.o aprtcle/ecm-mpz_aprcl.o ecm-memusage.o libecm.la -lgmp -lrt -lm -lm -lm -lm -lm
libtool: link: gcc-9 -g -I/usr/local/cuda/include -g -O2 -DWITH_GPU -o ecm ecm-auxi.o ecm-b1_ainc.o ecm-candi.o ecm-eval.o ecm-main.o ecm-resume.o ecm-addlaws.o ecm-torsions.o ecm-getprime_r.o aprtcle/ecm-mpz_aprcl.o ecm-memusage.o ./.libs/libecm.a -L/usr/local/cuda/lib64 -lcudart -lgmp -lrt -lm -Wl,-rpath -Wl,/usr/local/cuda/lib64
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: ./.libs/libecm.a(cgbn_stage1.o): in function `cgbn_ecm_stage1':
tmpxft_00007e39_00000000-6_cgbn_stage1.cudafe1.cpp:(.text+0x8b3): undefined reference to `operator delete(void*)'
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: tmpxft_00007e39_00000000-6_cgbn_stage1.cudafe1.cpp:(.text+0x196e): undefined reference to `operator delete(void*)'
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: ./.libs/libecm.a(cgbn_stage1.o): in function `void std::vector<unsigned int, std::allocator<unsigned int> >::_M_realloc_insert<unsigned int>(__gnu_cxx::__normal_iterator<unsigned int*, std::vector<unsigned int, std::allocator<unsigned int> > >, unsigned int&&)':
tmpxft_00007e39_00000000-6_cgbn_stage1.cudafe1.cpp:(.text._ZNSt6vectorIjSaIjEE17_M_realloc_insertIJjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_[_ZNSt6vectorIjSaIjEE17_M_realloc_insertIJjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_]+0x50): undefined reference to `operator new(unsigned long)'
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: tmpxft_00007e39_00000000-6_cgbn_stage1.cudafe1.cpp:(.text._ZNSt6vectorIjSaIjEE17_M_realloc_insertIJjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_[_ZNSt6vectorIjSaIjEE17_M_realloc_insertIJjEEEvN9__gnu_cxx17__normal_iteratorIPjS1_EEDpOT_]+0xc8): undefined reference to `operator delete(void*)'
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: ./.libs/libecm.a(cgbn_stage1.o):(.data.rel.local.DW.ref.__gxx_personality_v0[DW.ref.__gxx_personality_v0]+0x0): undefined reference to `__gxx_personality_v0'
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:973: ecm] Error 1
make[2]: Leaving directory '/home/chris/ecm-cgbn/gmp-ecm'
make[1]: *** [Makefile:1903: all-recursive] Error 1
make[1]: Leaving directory '/home/chris/ecm-cgbn/gmp-ecm'
make: *** [Makefile:783: all] Error 2
[/code]
Any ideas? |
Did you use YaST to install the libstdc++ dev package?
|
Success!
The vital bit of info came from putting "__gxx_personality_v0" into DuckDuckGo. That told me it's provided by libstdc++, which is the g++ runtime. After installing gcc9-g++ and its runtime libstdc++6-devel-gcc9, everything works. This has been an educational experience. Next step is to benchmark cgbn on my GPU. |
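For anyone hitting the same link failure: undefined references to `operator new`, `operator delete` and `__gxx_personality_v0` usually mean a C++ object was linked without the C++ runtime. The following is a minimal sketch, not part of GMP-ECM, that scans linker output for such symbols. The function name `missing_cxx_runtime_symbols` and the hint list are illustrative assumptions, and the sample log is shortened from the output above.

```python
import re

# Substrings that typically identify C++ runtime symbols.
# This short list is an illustrative assumption, not an exhaustive catalogue.
CXX_RUNTIME_HINTS = ("operator new", "operator delete", "__gxx_personality_v0", "__cxa_")

# GNU ld quotes the missing symbol between a backtick and an apostrophe.
UNDEFINED_REF = re.compile(r"undefined reference to `([^']+)'")

def missing_cxx_runtime_symbols(linker_output):
    """Return the distinct undefined symbols that look like C++ runtime symbols."""
    found = []
    for symbol in UNDEFINED_REF.findall(linker_output):
        if any(hint in symbol for hint in CXX_RUNTIME_HINTS) and symbol not in found:
            found.append(symbol)
    return found

sample = """\
cgbn_stage1.cudafe1.cpp:(.text+0x8b3): undefined reference to `operator delete(void*)'
cgbn_stage1.cudafe1.cpp:(.text+0x50): undefined reference to `operator new(unsigned long)'
libecm.a(cgbn_stage1.o): undefined reference to `__gxx_personality_v0'
"""

symbols = missing_cxx_runtime_symbols(sample)
if symbols:
    print("Possibly missing the C++ runtime (libstdc++):", ", ".join(symbols))
```

If it reports anything, checking that g++ and the matching libstdc++ development package are installed is a reasonable first step.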
Benchmark results:
[code]
chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^499-1)/20959" | ./ecm -gpu -gpucurves 3584 -sigma 3:1000 20000 0;date
Sun 5 Sep 19:42:42 BST 2021
GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^499-1)/20959 (146 digits)
Using B1=20000, B2=0, sigma=3:1000-3:4583 (3584 curves)
GPU: Using device code targeted for architecture compile_52
GPU: Ptx version is 52
GPU: maxThreadsPerBlock = 1024
GPU: numRegsPerThread = 31 sharedMemPerBlock = 24576 bytes
GPU: Block: 32x32x1 Grid: 112x1x1 (3584 parallel curves)
Computing 3584 Step 1 took 190ms of CPU time / 20427ms of GPU time
Sun 5 Sep 19:43:03 BST 2021

chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^499-1)/20959" | ./ecm -gpu -cgbn -gpucurves 3584 -sigma 3:1000 20000 0;date
Sun 5 Sep 19:43:29 BST 2021
GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^499-1)/20959 (146 digits)
Using B1=20000, B2=0, sigma=3:1000-3:4583 (3584 curves)
GPU: Using device code targeted for architecture compile_52
GPU: Ptx version is 52
GPU: maxThreadsPerBlock = 640
GPU: numRegsPerThread = 93 sharedMemPerBlock = 0 bytes
Computing 3584 Step 1 took 30ms of CPU time / 3644ms of GPU time
Sun 5 Sep 19:43:33 BST 2021

chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^997-1)" | ./ecm -gpu -sigma 3:1000 20000 0;date
Sun 5 Sep 19:44:25 BST 2021
GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^997-1) (301 digits)
Using B1=20000, B2=0, sigma=3:1000-3:1831 (832 curves)
GPU: Using device code targeted for architecture compile_52
GPU: Ptx version is 52
GPU: maxThreadsPerBlock = 1024
GPU: numRegsPerThread = 31 sharedMemPerBlock = 24576 bytes
GPU: Block: 32x32x1 Grid: 26x1x1 (832 parallel curves)
Computing 832 Step 1 took 188ms of CPU time / 4552ms of GPU time
Sun 5 Sep 19:44:30 BST 2021

chris@4core:~/ecm-cgbn/gmp-ecm> date;echo "(2^997-1)" | ./ecm -gpu -cgbn -sigma 3:1000 20000 0;date
Sun 5 Sep 19:44:41 BST 2021
GMP-ECM 7.0.5-dev [configured with GMP 5.1.3, --enable-asm-redc, --enable-gpu, --enable-assert] [ECM]
Input number is (2^997-1) (301 digits)
Using B1=20000, B2=0, sigma=3:1000-3:1831 (832 curves)
GPU: Using device code targeted for architecture compile_52
GPU: Ptx version is 52
GPU: maxThreadsPerBlock = 640
GPU: numRegsPerThread = 93 sharedMemPerBlock = 0 bytes
Computing 832 Step 1 took 8ms of CPU time / 1995ms of GPU time
Sun 5 Sep 19:44:44 BST 2021
[/code]
So CGBN is about 5.6 times faster for (2^499-1)/20959 and about 2.3 times faster for 2^997-1, though these are all small cases. My overall throughput won't increase much, because my CPU can't do stage 2 as fast as the GPU can now do stage 1. But that's not your fault, and any speedup is nice. Thanks.

Other lessons learnt:

autoreconf -si creates symlinks to missing files while autoreconf -i copies them. Using -si saves space, but if you upgrade to a new version of automake you can end up with dangling symlinks:
[code]
lrwxrwxrwx 1 chris users 32 Nov 12 2015 INSTALL -> /usr/share/automake-1.13/INSTALL
lrwxrwxrwx 1 chris users 35 Nov 12 2015 ltmain.sh -> /usr/share/libtool/config/ltmain.sh
[/code]
They needed updating to:
[code]
lrwxrwxrwx 1 chris users 32 Sep 4 19:20 INSTALL -> /usr/share/automake-1.15/INSTALL
lrwxrwxrwx 1 chris users 38 Sep 4 19:20 ltmain.sh -> /usr/share/libtool/build-aux/ltmain.sh
[/code]
Not a common issue though.

And suggestions for the install process:

INSTALL-ecm should tell users to run autoreconf -i (or -si) before running ./configure (which is created by autoreconf -i).

./configure compiles several small programs and runs them to check things. If a compile fails, it should print a message saying the compile failed, not one saying it found different levels of run time library etc. If the compile normally produces no output, then letting any output it does produce go to the screen would be informative (e.g. when it can't find -lstdc++).

Chris |
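The stale links left behind by autoreconf -si can be spotted before they cause trouble. Here is a minimal sketch, assuming a POSIX filesystem; the helper name `find_dangling_symlinks` is made up for illustration and is not part of autotools or GMP-ECM.

```python
import os

def find_dangling_symlinks(directory):
    """Return (name, target) pairs for symlinks in `directory` whose target is missing."""
    dangling = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink() is true for any symlink; exists() follows the link,
        # so it returns False when the link target no longer exists.
        if os.path.islink(path) and not os.path.exists(path):
            dangling.append((name, os.readlink(path)))
    return dangling

if __name__ == "__main__":
    # Check the current directory, e.g. the top of a source checkout.
    for name, target in find_dangling_symlinks("."):
        print(f"{name} -> {target} (target missing)")
```

Any links it reports should get refreshed by re-running autoreconf -si (or -i to copy the files instead), which is what fixed the listing above.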
[QUOTE=chris2be8;587337]Success![/QUOTE]
I'm glad we finally got here! A 2.2x speedup for the 1024 bit case is almost exactly what everyone else is seeing (except bsquared, maybe because of a newer card?). You can often improve overall throughput by adjusting to 1.2*B1 and 1/2*B2 (and checking that the expected number of curves stays roughly the same). This can especially help if Stage 1 time < Stage 2 time / cores. I'll reflect on your notes and see if I can improve the documentation / configure script. |
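To put numbers on this, here is a small sketch that recomputes the speedups from the GPU times in the benchmark output and applies a deliberately simplified model of the stage balance. The model is an assumption for illustration only: the GPU runs stage 1 for a whole batch while the CPU cores share stage 2 for the same curves. The stage 2 times and the core count in the example are hypothetical, not measurements from this thread.

```python
def speedup(old_ms, new_ms):
    """Ratio of the old GPU time to the new (CGBN) GPU time."""
    return old_ms / new_ms

# GPU stage 1 times in ms, taken from the benchmark output in this thread.
print(f"(2^499-1)/20959: {speedup(20427, 3644):.1f}x")  # 5.6x
print(f"2^997-1:         {speedup(4552, 1995):.1f}x")   # 2.3x

def bottleneck(stage1_batch_s, curves, stage2_curve_s, cores):
    """Name the slower stage under the simplified batch model (an assumption).

    stage1_batch_s: GPU time for stage 1 on the whole batch, in seconds
    stage2_curve_s: CPU time for stage 2 on one curve, in seconds (hypothetical)
    """
    stage2_batch_s = curves * stage2_curve_s / cores
    return "stage 2 (CPU)" if stage2_batch_s > stage1_batch_s else "stage 1 (GPU)"

# Hypothetical example: 832 curves, 1.995 s of GPU time, 0.02 s of stage 2 per curve, 4 cores.
print(bottleneck(1.995, 832, 0.02, 4))
```

When the model reports the CPU as the bottleneck, shifting work towards stage 1 (a larger B1, a smaller B2) is the kind of rebalancing described above.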
[QUOTE=SethTro;587429]
I'll reflect on your notes and see if I can improve the documentation / configure script.[/QUOTE]
How about updating INSTALL-ecm like this:
[code]
diff -u INSTALL-ecm INSTALL-ecm.new
--- INSTALL-ecm 2021-09-05 12:13:55.613439408 +0100
+++ INSTALL-ecm.new 2021-09-07 16:37:42.903291304 +0100
@@ -19,6 +19,7 @@
 1) check your configuration with:
+ $ autoreconf -i
 $ ./configure
 The configure script accepts several options (see ./configure --help).
[/code]
That's a minimum change to get new users started. |
[QUOTE=chris2be8;587449]How about updating INSTALL-ecm like this:
[code]
diff -u INSTALL-ecm INSTALL-ecm.new
--- INSTALL-ecm 2021-09-05 12:13:55.613439408 +0100
+++ INSTALL-ecm.new 2021-09-07 16:37:42.903291304 +0100
@@ -19,6 +19,7 @@
 1) check your configuration with:
+ $ autoreconf -i
 $ ./configure
 The configure script accepts several options (see ./configure --help).
[/code]
That's a minimum change to get new users started.[/QUOTE]
That document describes what users should do when they have downloaded an official release. When building an official release, you do not need to run [C]autoreconf -i[/C]. You only need to run [C]autoreconf -i[/C] when you download a development version with git or svn.

I don't think adding [C]autoreconf -i[/C] to this document is a good idea. Looking at the various documents, I see that [C]README.dev[/C] already advises running [C]autoreconf -i[/C]. |
How about having INSTALL-ecm tell users to run [c]autoreconf -i[/c] if they don't have a ./configure in the directory?
And if people download an official release, would the files that autoreconf -i would otherwise create be correct for their OS etc.? |
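The "only if ./configure is missing" rule is easy to state as code. This is a minimal sketch with an illustrative helper name (`build_steps`); it only lists the commands and does not run them. It assumes, as described earlier in the thread, that official releases already contain ./configure while development checkouts do not.

```python
import os

def build_steps(source_dir):
    """List the build commands, adding autoreconf only when ./configure is absent."""
    steps = []
    if not os.path.isfile(os.path.join(source_dir, "configure")):
        # Development checkout (git or svn): generate ./configure first.
        steps.append(["autoreconf", "-i"])
    steps.append(["./configure"])
    steps.append(["make"])
    return steps

if __name__ == "__main__":
    for command in build_steps("."):
        print(" ".join(command))
```

The same check could be expressed as a single sentence in INSTALL-ecm, which keeps the instructions for official releases unchanged.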
@Chris: Did you get your sm_30 card working or just the higher arch one?
|
Just the higher arch one (sm_52). Sorry.
PS. Does CGBN increase the maximum size of number that can be handled? I'd try it, but I'm tied up catching up with ECM work I delayed while I was getting ecm-cgbn working. |