mersenneforum.org Faster GPU-ECM with CGBN

2021-11-12, 04:04 #100 SethTro "Seth" Apr 2019 433 Posts

I spent a good part of this week trying to implement fast squaring for CGBN. Ultimately my code was 10% slower and still had breaking edge cases.

In the best case, with 100% faster fast squaring, there are 4 mont_sqr and 4 mont_mul, so it would only be 8 / (4 / 2 + 4) - 1 = 33% faster. Using GMP's 50% faster number it would be 8 / (4 / 1.5 + 4) - 1 = 20% faster.

I'll reach out to the author of the repo because they mention fast squaring in their paper "Optimizing Modular Multiplication for NVIDIA’s Maxwell GPUs" http://www.acsel-lab.com/arithmetic/...a/1616a047.pdf but it's unlikely to happen.
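
The estimate above is a ratio of operation counts. A minimal Python sketch of that arithmetic, assuming (as the post does) that one mont_sqr and one mont_mul cost the same before the optimization; the function name `speedup` is made up for illustration:

```python
def speedup(n_sqr: int, n_mul: int, sqr_gain: float) -> float:
    """Overall speedup if each squaring becomes `sqr_gain` times faster.

    Assumes squarings and multiplications cost one unit each beforehand.
    """
    old_cost = n_sqr + n_mul
    new_cost = n_sqr / sqr_gain + n_mul
    return old_cost / new_cost - 1


if __name__ == "__main__":
    print(f"squaring 2.0x faster: {speedup(4, 4, 2.0):.0%}")
    print(f"squaring 1.5x faster: {speedup(4, 4, 1.5):.0%}")
```

With 4 squarings and 4 multiplications this gives the 33% and 20% figures quoted in the post.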
2021-11-27, 22:05 #101 henryzz Just call me Henry "David" Sep 2007 Liverpool (GMT/BST) 3²×5×7×19 Posts

Just tried to upgrade my version of this as I was on a fairly old version and certain numbers were crashing. Compiling has failed with the following error:

Code:
/bin/bash ./libtool --tag=CC --mode=compile /usr/local/cuda/bin/nvcc --compile -I/mnt/c/Users/david/Downloads/gmp-ecm-gpu_integration/CGBN/include/cgbn -lgmp -I/usr/local/cuda/include -DECM_GPU_CURVES_BY_BLOCK=32 --generate-code arch=compute_75,code=sm_75 --ptxas-options=-v --compiler-options -fno-strict-aliasing -O2 --compiler-options -fPIC -I/usr/local/cuda/include -DWITH_GPU -o cgbn_stage1.lo cgbn_stage1.cu -static
libtool: compile: /usr/local/cuda/bin/nvcc --compile -I/mnt/c/Users/david/Downloads/gmp-ecm-gpu_integration/CGBN/include/cgbn -lgmp -I/usr/local/cuda/include -DECM_GPU_CURVES_BY_BLOCK=32 --generate-code arch=compute_75,code=sm_75 --ptxas-options=-v --compiler-options -fno-strict-aliasing -O2 --compiler-options -fPIC -I/usr/local/cuda/include -DWITH_GPU cgbn_stage1.cu -o cgbn_stage1.o
cgbn_stage1.cu(437): error: identifier "cgbn_swap" is undefined
    detected during instantiation of "void kernel_double_add(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<4U, 512U>]"
    (800): here
cgbn_stage1.cu(444): error: identifier "cgbn_swap" is undefined
    detected during instantiation of "void kernel_double_add(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<4U, 512U>]"
    (800): here
cgbn_stage1.cu(407): warning: variable "temp" was declared but never referenced
    detected during instantiation of "void kernel_double_add(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<4U, 512U>]"
    (800): here
cgbn_stage1.cu(437): error: identifier "cgbn_swap" is undefined
    detected during instantiation of "void kernel_double_add(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<8U, 1024U>]"
    (803): here
cgbn_stage1.cu(444): error: identifier "cgbn_swap" is undefined
    detected during instantiation of "void kernel_double_add(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<8U, 1024U>]"
    (803): here
cgbn_stage1.cu(407): warning: variable "temp" was declared but never referenced
    detected during instantiation of "void kernel_double_add(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<8U, 1024U>]"
    (803): here
4 errors detected in the compilation of "cgbn_stage1.cu".

Have I messed something up while updating my local git repository, or is the gpu_integration branch currently broken?
2021-11-28, 04:43 #102 henryzz Just call me Henry "David" Sep 2007 Liverpool (GMT/BST) 3²×5×7×19 Posts

May have discovered the issue. I think I need to update CGBN.

Edit: confirmed.

Last fiddled with by henryzz on 2021-11-28 at 05:21
2022-03-03, 16:45 #103 chris2be8 Sep 2009 2335₁₀ Posts

My GTX 970 has burnt out, so I've had to replace it with an RTX 3060 Ti. That's sm_86, so I had to reinstall ECM-GPU. After updating CUDA to the latest driver and runtime version (11.6) I fetched gmp-ecm again:

git clone https://github.com/sethtroisi/gmp-ecm/ -b gpu_integration

But ./configure doesn't support GPU arch above 75, so I had to run:

./configure --enable-gpu=75 --with-cuda=/usr/local/cuda CC=gcc-9 -with-cgbn-include=/home/chris/CGBN/include/cgbn

Then manually update the makefiles to sm_86.

nvcc -h says in part:

Code:
--gpu-code <code>,... (-code)
    Specify the name of the NVIDIA GPU to assemble and optimize PTX for.
    nvcc embeds a compiled code image in the resulting executable for each specified <code> architecture, which is a true binary load image for each 'real' architecture (such as sm_50), and PTX code for the 'virtual' architecture (such as compute_50).
    During runtime, such embedded PTX code is dynamically compiled by the CUDA runtime system if no binary load image is found for the 'current' GPU.
    Architectures specified for options '--gpu-architecture' and '--gpu-code' may be 'virtual' as well as 'real', but the <code> architectures must be compatible with the <arch> architecture. When the '--gpu-code' option is used, the value for the '--gpu-architecture' option must be a 'virtual' PTX architecture.
    For instance, '--gpu-architecture=compute_60' is not compatible with '--gpu-code=sm_52', because the earlier compilation stages will assume the availability of 'compute_60' features that are not present on 'sm_52'.
    Note: the values compute_30, compute_32, compute_35, compute_37, compute_50, sm_30, sm_32, sm_35, sm_37 and sm_50 are deprecated and may be removed in a future release.
    Allowed values for this option: 'compute_35','compute_37','compute_50', 'compute_52','compute_53','compute_60','compute_61','compute_62','compute_70', 'compute_72','compute_75','compute_80','compute_86','compute_87','lto_35', 'lto_37','lto_50','lto_52','lto_53','lto_60','lto_61','lto_62','lto_70', 'lto_72','lto_75','lto_80','lto_86','lto_87','sm_35','sm_37','sm_50','sm_52', 'sm_53','sm_60','sm_61','sm_62','sm_70','sm_72','sm_75','sm_80','sm_86', 'sm_87'.

That nvcc has an option --list-gpu-code to list the GPU architectures supported by the compiler, but older versions of it don't have that option. Older versions of nvcc will probably give a list with:

nvcc -h | grep -o -E 'sm_[0-9]+' | sort -u

But that won't work for 11.6, because the help lists sm_30 as deprecated even though it's no longer valid.

It seems to work OK, but I've not tried it on a big job yet. And I need to update my scripts because the new GPU does 2432 stage 1 curves per run. Which limits its use if I just need to do t30.

@SethTro, can you update configure to support sm_86?

One other grouse is that Nvidia seem to regard details of what level of CUDA you need for a given card as top secret information. I wasted a lot of time searching for it.
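
One way around the deprecated-name problem is to count only the quoted names: in the help excerpt above the allowed values are written as 'sm_86', while the deprecation note lists bare names such as sm_30. A rough Python sketch under that assumption; `allowed_sm` is a hypothetical helper, the sample text is abridged from the excerpt above, and the quoting may differ in other nvcc versions:

```python
import re


def allowed_sm(help_text: str) -> list[int]:
    """Collect NN from every single-quoted 'sm_NN' token in nvcc help text."""
    return sorted({int(n) for n in re.findall(r"'sm_(\d+)'", help_text)})


# Abridged from the nvcc 11.6 help text quoted in the post.
SAMPLE = (
    "Note: the values compute_30, compute_32, sm_30, sm_32, sm_35, sm_37 "
    "and sm_50 are deprecated and may be removed in a future release. "
    "Allowed values for this option: 'compute_75','compute_86',"
    "'sm_35','sm_37','sm_50','sm_75','sm_80','sm_86','sm_87'."
)

if __name__ == "__main__":
    print(allowed_sm(SAMPLE))
```

On the abridged sample, sm_30 and sm_32 are skipped because they only appear unquoted in the deprecation note. Where `--list-gpu-code` is available it is the simpler choice.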
2022-03-04, 03:45   #104
EdH

"Ed Hall"
Dec 2009

2²·1,129 Posts

Quote:
 Originally Posted by chris2be8 My GTX 970 has burnt out,. . . One other grouse is that Nvidia seem to regard details of what level of CUDA you need for a given card as top secret information. I wasted a lot of time searching for it.

You're probably aware of this site, but I've been having good luck at techpowerup for all the details on the various cards.

e.g. https://www.techpowerup.com/gpu-spec...-3060-ti.c3681, which shows CUDA 8.6 (i.e. compute capability 8.6).
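
The compute capability shown on such spec pages maps directly onto the architecture names nvcc expects: 8.6 becomes compute_86 and sm_86. A small Python sketch that builds the flag in the same form as the --generate-code option seen in the build log earlier in the thread; the function name is hypothetical:

```python
def gencode_flag(compute_capability: str) -> str:
    """Turn a compute capability such as "8.6" into an nvcc --generate-code flag."""
    major, minor = compute_capability.split(".")
    arch = f"{int(major)}{int(minor)}"
    return f"--generate-code arch=compute_{arch},code=sm_{arch}"


if __name__ == "__main__":
    print(gencode_flag("8.6"))  # RTX 3060 Ti
    print(gencode_flag("7.5"))  # the value used in the earlier build log
```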

2022-03-04, 16:39 #105 chris2be8 Sep 2009 91F₁₆ Posts

And I've got another problem with the new card:

Code:
tests/b58+148> /home/chris/ecm-cgbn/gmp-ecm/ecm -gpu -cgbn -save test1.save 110000000 110000000
/home/chris/ecm-cgbn/gmp-ecm/ecm -gpu -cgbn -save test1.save 60000000 1
/home/chris/ecm-cgbn/gmp-ecm/ecm -gpu -save test2.save 110000000 1

2022-03-05, 00:44   #106
EdH

"Ed Hall"
Dec 2009

10644₈ Posts

Quote:
 Originally Posted by chris2be8 . . . @EdH, I don't think the old card can be repaired, it smells of burnt plastic. And the one thing techpowerup don't say is what level of CUDA drivers and runtime the card needs.
Which I haven't been trying to make use of yet. I still haven't figured out the correlation with sm/cores/?? and how many parallel processes are run by ECM.

2022-03-05, 06:13   #107
Gimarel

Apr 2010

2²·3·19 Posts

Quote:
 Originally Posted by chris2be8 After updating CUDA to the latest driver and runtime version (11.6) I fetched gmp-ecm again: git clone https://github.com/sethtroisi/gmp-ecm/ -b gpu_integration
CGBN has been merged into the main branch, so it's probably better to use:
git clone https://gitlab.inria.fr/zimmerma/ecm.git

Quote:
 Originally Posted by chris2be8 And I need to update my scripts because the new GPU does 2432 stage 1 curves per run.
There are 4864 shader units on this card according to the technical specs linked above. So if this is correct, it's better to run 4864 curves at once.

Quote:
 Originally Posted by chris2be8 Which limits its use if I just need to do t30.
Why? If you just want to do t30, use 4864 curves and a lower B1 bound of 37e4, and skip stage 2. That should be about t30. Unless you have a very powerful CPU, it should be faster.

2022-03-05, 13:55   #108
EdH

"Ed Hall"
Dec 2009

4516₁₀ Posts

Quote:
 Originally Posted by Gimarel CGBN has been merged into the main branch, it's probably better to use git clone https://gitlab.inria.fr/zimmerma/ecm.git
Is this where I should retrieve GMP-ECM rather than the svn source I reference, or is the svn source still current? Is the git source the official one?

Quote:
 Originally Posted by Gimarel There are 4864 shader units on this card according to the technical infos linked above. So if this is correct, it's better to run 4864 curves at once.
This is confusing to me. GMP-ECM defaults to 64 curves for an NVS 510 with 192 shading units, and for my K20X with 2688 shading units the default is 896 curves. If I double (triple, etc.) the curves, it doubles (triples, etc.) the GPU time taken. This is all with the svn download.

All help in understanding this is appreciated.
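
One pattern does fit the numbers quoted in this thread: each default equals 64 curves per streaming multiprocessor (SM). The NVS 510 has 1 SM, the K20X has 14, and the RTX 3060 Ti mentioned earlier has 38, which gives 64, 896 and 2432. The Python sketch below only checks that arithmetic; the constant 64 is inferred from these three data points, not taken from the GMP-ECM source, and all names are made up:

```python
CURVES_PER_SM = 64  # assumption inferred from the thread, not from GMP-ECM's code

# card: (SM count, shading units, default curves reported in the thread)
CARDS = {
    "NVS 510": (1, 192, 64),
    "K20X": (14, 2688, 896),
    "RTX 3060 Ti": (38, 4864, 2432),
}


def guessed_default_curves(sm_count: int) -> int:
    """Guess the default stage 1 curve count from the number of SMs."""
    return sm_count * CURVES_PER_SM


if __name__ == "__main__":
    for name, (sms, shaders, reported) in CARDS.items():
        guess = guessed_default_curves(sms)
        print(f"{name}: {shaders} shaders, guess {guess}, reported {reported}")
```

If this reading is right, the default tracks the SM count rather than the shader count, which would explain why the ratio of shaders to default curves is 3 on the two Kepler cards but 2 on the 3060 Ti.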

2022-03-05, 14:45 #109 Gimarel Apr 2010 2²·3·19 Posts

I don't know. I have a 2060 Super that has 2176 shader units. Anything below 2176 curves takes as much time as 2176 curves. Total throughput is about 5-10% better for 4352 concurrent curves.

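
If a run costs the same for any curve count up to some batch size, as described above, then it makes sense to round a requested count up to a full batch. A minimal Python sketch of that rounding; the batch size is whatever count the card handles in one pass (2176 in the post above), and `round_up_curves` is a hypothetical helper, not a GMP-ECM option:

```python
def round_up_curves(wanted: int, batch: int) -> int:
    """Round `wanted` up to the next multiple of `batch` (at least one batch)."""
    if wanted < 1 or batch < 1:
        raise ValueError("wanted and batch must be positive")
    return -(-wanted // batch) * batch  # ceiling division, then scale back up


if __name__ == "__main__":
    for wanted in (500, 2176, 3000):
        print(wanted, "->", round_up_curves(wanted, 2176))
```

Whether a multiple of the batch size is actually the best choice on a given card is something to measure; the post above reports only a 5-10% throughput gain from doubling.
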
2022-03-06, 13:30   #110
EdH

"Ed Hall"
Dec 2009

2²·1,129 Posts

Quote:
 Originally Posted by Gimarel CGBN has been merged into the main branch, it's probably better to use git clone https://gitlab.inria.fr/zimmerma/ecm.git . . .
I am confused (yet again). How do I start from scratch to compile GMP-ECM with CGBN for an sm_35 card?
