#111
EdH

"Ed Hall"
Dec 2009

1010001011010₂ Posts

Quote:
 Originally Posted by Plutie the easiest way would probably be running "msieve -nc2 -g 0" - if it outputs a line showing the VBITS value you compiled msieve with, then it's compiled properly.
Thanks, but to do that, I think -np1 would work better since I wouldn't need to create as many other files first. But I'd still need to look for a value (such as "using GPU") in the log. I was looking for a simple value check or existence check for a file, perhaps a .ptx.

2022-08-17, 18:30 #112 Plutie "Evan" Dec 2020 Montreal 2²×3×7 Posts

ah, in that case - you can look for the lanczos_kernel.ptx file (or stage1_core.ptx)

Last fiddled with by Plutie on 2022-08-17 at 18:31 Reason: oops
#113
EdH

"Ed Hall"
Dec 2009

2·5·521 Posts

Quote:
 Originally Posted by Plutie ah, in that case - you can look for the lanczos_kernel.ptx file (or stage1_core.ptx)
Thanks! I'll work with that.
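
The file-existence check suggested above could be scripted roughly as follows. This is only a sketch: the function name `has_gpu_kernels` is made up for illustration, and it assumes the .ptx files sit in the same directory as the msieve binary, which may not hold for every build.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given directory contains one of the
# CUDA kernel files mentioned in the thread.
# Assumption: the .ptx files live alongside the msieve binary.
has_gpu_kernels() {
    dir="${1:-.}"
    for f in lanczos_kernel.ptx stage1_core.ptx; do
        if [ -f "$dir/$f" ]; then
            echo "found $f"
            return 0
        fi
    done
    echo "no .ptx kernel files in $dir"
    return 1
}
```

Something like `has_gpu_kernels ./msieve_nfsathome && echo "GPU build present"` would then work as a quick test in a script. It only shows that the kernels were built, not that a GPU is actually usable at run time.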

2022-08-18, 13:14 #114 EdH "Ed Hall" Dec 2009 Adirondack Mtns 12132₈ Posts

I'm too excited to keep this to myself. I finally have sufficient cooling for my M40 GPU and am running a c173 that is in LA on both the 40-thread machine and the GPU machine.

This is the 40-thread (40GB) machine at start of LA:

Code:
Wed Aug 17 22:53:22 2022 linear algebra at 0.0%, ETA 66h33m

and, current state (08:13):

Code:
linear algebra completed 2537146 of 16995095 dimensions (14.9%, ETA 53h31m)

Here is the GPU machine at start of LA:

Code:
Wed Aug 17 23:11:13 2022 linear algebra at 0.0%, ETA 24h39m

and, current state (08:13):

Code:
linear algebra completed 6241861 of 16995095 dimensions (36.7%, ETA 15h35m)

Here's a little extra from the GPU machine log:

Code:
Wed Aug 17 22:59:01 2022 using VBITS=256
Wed Aug 17 22:59:01 2022 skipping matrix build
Wed Aug 17 22:59:04 2022 matrix starts at (0, 0)
Wed Aug 17 22:59:07 2022 matrix is 16994916 x 16995095 (5214.6 MB) with weight 1611774956 (94.84/col)
Wed Aug 17 22:59:07 2022 sparse part has weight 1163046519 (68.43/col)
Wed Aug 17 22:59:07 2022 saving the first 240 matrix rows for later
Wed Aug 17 22:59:11 2022 matrix includes 256 packed rows
Wed Aug 17 22:59:16 2022 matrix is 16994676 x 16995095 (4829.9 MB) with weight 1060776224 (62.42/col)
Wed Aug 17 22:59:16 2022 sparse part has weight 994218947 (58.50/col)
Wed Aug 17 22:59:16 2022 using GPU 0 (Tesla M40 24GB)
Wed Aug 17 22:59:16 2022 selected card has CUDA arch 5.2
Wed Aug 17 23:10:30 2022 commencing Lanczos iteration
Wed Aug 17 23:10:31 2022 memory use: 11864.2 MB

The GPU is showing "12701MiB / 22945MiB" for its memory use, so I should be able to do some even larger numbers.
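
For a rough sense of the gap between the two start-of-LA estimates above (66h33m on the 40-thread machine, 24h39m on the GPU machine), the ETAs can be converted to minutes and divided. The helper names `eta_minutes` and `speedup` are made up for this sketch, and the result is only as good as msieve's early ETA figures, which tend to drift as the run progresses.

```shell
#!/bin/sh
# Convert an msieve-style ETA such as "66h33m" to whole minutes.
eta_minutes() {
    echo "$1" | awk -F'[hm]' '{ print $1 * 60 + $2 }'
}

# Ratio of two ETAs (first divided by second), to two decimal places.
speedup() {
    awk -v a="$(eta_minutes "$1")" -v b="$(eta_minutes "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

speedup 66h33m 24h39m   # prints 2.70
```

So on the initial estimates the GPU run was projected to take a bit over a third of the CPU time for this matrix.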
2022-08-19, 03:12 #115 LaurV Romulan Interpreter "name field" Jun 2011 Thailand 10,273 Posts

Sorry I didn't follow this thread very closely. Are you saying that you do NFS completely on GPU? I mean, I knew poly can be done, and I am reading now about LA? How about sieving? If so, where can I grab the exe and the "for dummy" tutorial? Windows/Linux available? I may give it a try on local (where I run a few quite powerful AMD and Nvidia cards) or on Colab (where I have occasional access to P100, V100 and - if lucky - A100).
#116
Plutie

"Evan"
Dec 2020
Montreal

1010100₂ Posts

Quote:
 Originally Posted by LaurV Sorry I didn't follow this thread very closely. Are you saying that you do NFS completely on GPU? I mean, I knew poly can be done, and I am reading now about LA? How about sieving? If so, where can I grab the exe and the "for dummy" tutorial? Windows/Linux available? I may give it a try on local (where I run a few quite powerful AMD and Nvidia cards) or on Colab (where I have occasional access to P100, V100 and - if lucky - A100).
currently, polyselect and LA can be done on GPU - sieving and filtering are still on CPU.

here's a quick guide for linux specifically, but I don't think the process will be too different on windows.

Quote:
 find the compute capability of your GPU - can be found here. compilation example here is for a GTX 1060 (CC 6.1)

Code:
git clone https://github.com/gchilders/msieve_nfsathome -b msieve-lacuda-nfsathome
cd msieve_nfsathome
make all CUDA=61 VBITS=256
once compiled, you can run both polyselect and LA just as you would with normal msieve, just add "-g (gpu_num)" to the command. you can lower the VBITS value to fit larger matrices onto GPU during LA, but at a performance penalty.
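
The CUDA= value in the compile step is just the compute capability with the dot removed (6.1 becomes 61). A small sketch of that mapping follows; `cuda_arg` and `build_cmd` are hypothetical helper names, and the script only prints the make command rather than running it. Which VBITS values the Makefile accepts is not covered here, so the default of 256 is simply taken from the example above.

```shell
#!/bin/sh
# Turn a compute capability such as "6.1" into the value used for CUDA= ("61").
cuda_arg() {
    echo "$1" | tr -d '.'
}

# Print (do not run) the build command for a compute capability and an
# optional VBITS value; 256 is assumed as the default, as in the guide.
build_cmd() {
    echo "make all CUDA=$(cuda_arg "$1") VBITS=${2:-256}"
}

build_cmd 6.1        # GTX 1060 from the guide
build_cmd 5.2 128    # e.g. a CC 5.2 card with a smaller VBITS
```

On a machine with a recent NVIDIA driver, the compute capability can probably be read with `nvidia-smi --query-gpu=compute_cap --format=csv,noheader`, assuming the installed driver supports that query field; otherwise it has to be looked up by card model.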

2022-08-29, 01:57 #117 RichD Sep 2008 Kansas 2²×13×73 Posts

I forgot to add -g 0 to the command line and it seemed to default to device 0. I did specify use_managed=1 so maybe that was enough to invoke the GPU. Then again, I may be using an earlier release.
2022-10-05, 23:21 #118 RichD Sep 2008 Kansas 2²×13×73 Posts

Here is a data point for the crossover using a GPU for LA. This is an attempt to run with memory oversubscribed by 50+% on a 6GB card, with use_managed=1:

Code:
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 10820818 x 10821229 (4662.2 MB) with weight 1103319671 (101.96/col)
sparse part has weight 1049028095 (96.94/col)
using GPU 0 (NVIDIA GeForce GTX 1660)
selected card has CUDA arch 7.5
Nonzeros per block: 1750000000
Storing matrix in managed memory
converting matrix to CSR and copying it onto the GPU
1049028095 10820818 10821229
1049028095 10821229 10820818
commencing Lanczos iteration
vector memory use: 2311.7 MB
dense rows memory use: 330.2 MB
sparse matrix memory use: 8086.0 MB
memory use: 10727.9 MB
Allocated 761.4 MB for SpMV library
Allocated 761.4 MB for SpMV library
linear algebra at 0.1%, ETA 139h41m821229 dimensions (0.1%, ETA 139h41m)
checkpointing every 80000 dimensions21229 dimensions (0.1%, ETA 139h44m)
linear algebra completed 376713 of 10821229 dimensions (3.5%, ETA 136h25m)

Running without the use of a GPU:

Code:
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 10820818 x 10821229 (4662.2 MB) with weight 1103319671 (101.96/col)
sparse part has weight 1049028095 (96.94/col)
using block size 8192 and superblock size 147456 for processor cache size 6144 kB
commencing Lanczos iteration (4 threads)
memory use: 6409.8 MB
linear algebra at 0.0%, ETA 105h56m821229 dimensions (0.0%, ETA 105h56m)
checkpointing every 110000 dimensions1229 dimensions (0.0%, ETA 107h23m)
linear algebra completed 45961 of 10821229 dimensions (0.4%, ETA 103h24m)

