mersenneforum.org  

Go Back   mersenneforum.org > Factoring Projects > Msieve

Old 2022-08-17, 18:06   #111
EdH
 
 
"Ed Hall"
Dec 2009
Adirondack Mtns

12443₈ Posts
Default

Quote:
Originally Posted by Plutie View Post
the easiest way would probably be running "msieve -nc2 -g 0" - if it outputs a line showing the VBITS value you compiled msieve with, then it's compiled properly.
Thanks, but to do that, I think -np1 would work better, since I wouldn't need to create as many other files first. But I'd still need to look for a value (such as "using GPU") in the log. I was looking for a simple value check, or an existence check for a file, perhaps a .ptx.
EdH is offline   Reply With Quote
Old 2022-08-17, 18:30   #112
Plutie
 
"Evan"
Dec 2020
Montreal

2²×3×7 Posts
Default

ah, in that case - you can look for the lanczos_kernel.ptx file (or stage1_core.ptx)
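A scripted version of that check might look like this (a minimal sketch, assuming the compiled msieve tree is the current directory; the file names are the ones mentioned above):

```shell
# Sketch: detect a GPU-enabled msieve build by the presence of its .ptx kernels.
if [ -f lanczos_kernel.ptx ] || [ -f stage1_core.ptx ]; then
    echo "GPU build: .ptx kernels found"
else
    echo "CPU-only build: no .ptx kernels found"
fi
```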

Last fiddled with by Plutie on 2022-08-17 at 18:31 Reason: oops
Plutie is offline   Reply With Quote
Old 2022-08-17, 18:35   #113
EdH
 
 
"Ed Hall"
Dec 2009
Adirondack Mtns

7·773 Posts
Default

Quote:
Originally Posted by Plutie View Post
ah, in that case - you can look for the lanczos_kernel.ptx file (or stage1_core.ptx)
Thanks! I'll work with that.
EdH is offline   Reply With Quote
Old 2022-08-18, 13:14   #114
EdH
 
 
"Ed Hall"
Dec 2009
Adirondack Mtns

1523₁₆ Posts
Default

I'm too excited to keep this to myself. I finally have sufficient cooling for my M40 GPU and am running a c173 that is in LA on both the 40-thread machine and the GPU machine.

This is the 40-thread (40GB) machine at start of LA:
Code:
Wed Aug 17 22:53:22 2022  linear algebra at 0.0%, ETA 66h33m
and the current state (08:13):
Code:
linear algebra completed 2537146 of 16995095 dimensions (14.9%, ETA 53h31m)
Here is the GPU machine at start of LA:
Code:
Wed Aug 17 23:11:13 2022  linear algebra at 0.0%, ETA 24h39m
and the current state (08:13):
Code:
linear algebra completed 6241861 of 16995095 dimensions (36.7%, ETA 15h35m)
Here's a little extra from the GPU machine log:
Code:
Wed Aug 17 22:59:01 2022  using VBITS=256
Wed Aug 17 22:59:01 2022  skipping matrix build
Wed Aug 17 22:59:04 2022  matrix starts at (0, 0)
Wed Aug 17 22:59:07 2022  matrix is 16994916 x 16995095 (5214.6 MB) with weight 1611774956 (94.84/col)
Wed Aug 17 22:59:07 2022  sparse part has weight 1163046519 (68.43/col)
Wed Aug 17 22:59:07 2022  saving the first 240 matrix rows for later
Wed Aug 17 22:59:11 2022  matrix includes 256 packed rows
Wed Aug 17 22:59:16 2022  matrix is 16994676 x 16995095 (4829.9 MB) with weight 1060776224 (62.42/col)
Wed Aug 17 22:59:16 2022  sparse part has weight 994218947 (58.50/col)
Wed Aug 17 22:59:16 2022  using GPU 0 (Tesla M40 24GB)
Wed Aug 17 22:59:16 2022  selected card has CUDA arch 5.2
Wed Aug 17 23:10:30 2022  commencing Lanczos iteration
Wed Aug 17 23:10:31 2022  memory use: 11864.2 MB
The GPU is showing "12701MiB / 22945MiB" for its memory use, so I should be able to do some even larger numbers.
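For scale, the two starting ETAs quoted above (66h33m CPU vs 24h39m GPU) imply roughly a 2.7x projected speedup; a quick shell check of that arithmetic, using only the numbers from the logs above:

```shell
# Convert both starting ETAs to minutes and compare (integer arithmetic).
cpu_min=$(( 66*60 + 33 ))   # 66h33m -> 3993 minutes
gpu_min=$(( 24*60 + 39 ))   # 24h39m -> 1479 minutes
# Speedup in tenths; truncation makes this a slight underestimate (~2.6-2.7x).
echo "projected speedup: $(( cpu_min * 10 / gpu_min )) / 10 x"
```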
EdH is offline   Reply With Quote
Old 2022-08-19, 03:12   #115
LaurV
Romulan Interpreter
 
 
"name field"
Jun 2011
Thailand

10100000101001₂ Posts
Default

Sorry, I didn't follow this thread very closely.

Are you saying that you do NFS completely on GPU? I mean, I knew poly selection can be done, and now I am reading about LA? How about sieving?
If so, where can I grab the exe and the "for dummies" tutorial?
Windows/Linux available? I may give it a try locally (where I run a few quite powerful AMD and Nvidia cards) or on Colab (where I have occasional access to P100, V100 and - if lucky - A100).
LaurV is offline   Reply With Quote
Old 2022-08-19, 03:56   #116
Plutie
 
"Evan"
Dec 2020
Montreal

2²·3·7 Posts
Default

Quote:
Originally Posted by LaurV View Post
Sorry, I didn't follow this thread very closely.

Are you saying that you do NFS completely on GPU? I mean, I knew poly selection can be done, and now I am reading about LA? How about sieving?
If so, where can I grab the exe and the "for dummies" tutorial?
Windows/Linux available? I may give it a try locally (where I run a few quite powerful AMD and Nvidia cards) or on Colab (where I have occasional access to P100, V100 and - if lucky - A100).
Currently, polyselect and LA can be done on GPU - sieving and filtering are still on CPU.

Here's a quick guide for Linux specifically, but I don't think the process will be too different on Windows.

Quote:
find the compute capability of your GPU - can be found here.

compilation example here is for a GTX 1060 (CC 6.1)
Code:
git clone https://github.com/gchilders/msieve_nfsathome -b msieve-lacuda-nfsathome
cd msieve_nfsathome
make all CUDA=61 VBITS=256
Once compiled, you can run both polyselect and LA just as you would with normal msieve; just add "-g (gpu_num)" to the command. You can lower the VBITS value to fit larger matrices onto the GPU during LA, but at a performance penalty.
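Put together, a hypothetical end-to-end invocation might look like the following (a sketch only: "<number>" is a placeholder for your composite, and the intermediate sieving and filtering steps still run on CPU as usual):

```shell
# Hypothetical sketch, run from the msieve_nfsathome build directory.
# Polynomial selection on GPU device 0:
./msieve -v -np -g 0 "<number>"
# ... sieving and filtering on CPU as usual ...
# Linear algebra on GPU device 0:
./msieve -v -nc2 -g 0 "<number>"
```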
Plutie is offline   Reply With Quote
Old 2022-08-29, 01:57   #117
RichD
 
 
Sep 2008
Kansas

2³×479 Posts
Default

I forgot to add -g 0 to the command line and it seemed to default to device 0. I did specify use_managed=1 so maybe that was enough to invoke the GPU. Then again, I may be using an earlier release.
RichD is offline   Reply With Quote
Old 2022-10-05, 23:21   #118
RichD
 
 
Sep 2008
Kansas

2³·479 Posts
Default

Here is a data point for the crossover using a GPU for LA.

An attempt to run 50+% memory oversubscribed on a 6 GB card, with use_managed=1:
Code:
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 10820818 x 10821229 (4662.2 MB) with weight 1103319671 (101.96/col)
sparse part has weight 1049028095 (96.94/col)
using GPU 0 (NVIDIA GeForce GTX 1660)
selected card has CUDA arch 7.5
Nonzeros per block: 1750000000
Storing matrix in managed memory
converting matrix to CSR and copying it onto the GPU
1049028095 10820818 10821229
1049028095 10821229 10820818
commencing Lanczos iteration
vector memory use: 2311.7 MB
dense rows memory use: 330.2 MB
sparse matrix memory use: 8086.0 MB
memory use: 10727.9 MB
Allocated 761.4 MB for SpMV library
Allocated 761.4 MB for SpMV library
linear algebra at 0.1%, ETA 139h41m
checkpointing every 80000 dimensions
linear algebra completed 376713 of 10821229 dimensions (3.5%, ETA 136h25m)
Running without the GPU, for comparison:
Code:
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 10820818 x 10821229 (4662.2 MB) with weight 1103319671 (101.96/col)
sparse part has weight 1049028095 (96.94/col)
using block size 8192 and superblock size 147456 for processor cache size 6144 kB
commencing Lanczos iteration (4 threads)
memory use: 6409.8 MB
linear algebra at 0.0%, ETA 105h56m
checkpointing every 110000 dimensions
linear algebra completed 45961 of 10821229 dimensions (0.4%, ETA 103h24m)
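By the latest ETAs in the two logs above (136h25m for the oversubscribed GPU vs 103h24m for the 4-thread CPU), the GPU run is projected to take about 1.3x as long; a quick check of that arithmetic:

```shell
# Convert both ETAs to minutes and compute the ratio as a percentage.
gpu_min=$(( 136*60 + 25 ))  # 136h25m -> 8185 minutes
cpu_min=$(( 103*60 + 24 ))  # 103h24m -> 6204 minutes
echo "GPU/CPU ETA ratio: $(( gpu_min * 100 / cpu_min ))%"
```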
RichD is offline   Reply With Quote
Old 2023-02-24, 12:28   #119
kruoli
 
 
"Oliver"
Sep 2017
Porta Westfalica, DE

2⁵·3²·5 Posts
Default Help needed for compiling

Before going to an MPI-enabled CUDA build, I wanted to get a single-card CUDA build first. Unfortunately, the code seems to be problematic for my system. The error messages begin with:
Code:
./cub/agent/agent_merge_sort.cuh(80): error: a class or namespace qualified name is required

./cub/agent/agent_merge_sort.cuh(80): error: expected a ";"

./cub/agent/agent_merge_sort.cuh(81): error: a class or namespace qualified name is required

./cub/agent/agent_merge_sort.cuh(81): error: expected a ";"

./cub/block/specializations/../../block/../util_type.cuh(79): error: class "std::iterator_traits<<error-type>>" has no member "value_type"
          detected during:
            instantiation of type "cub::detail::value_t<<error-type>>"
./cub/block/block_load.cuh(1295): here
            processing of template argument list for "cub::BlockLoadType" based on template arguments <Policy, <error-type>>
./cub/agent/agent_merge_sort.cuh(83): here
GCC and OS details:
Code:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 10.2.1-6' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-10-Km9U7s/gcc-10-10.2.1/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-Km9U7s/gcc-10-10.2.1/debian/tmp-gcn/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-mutex
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.1 20210110 (Debian 10.2.1-6)
nvcc details:
Code:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
Makefile settings:
Code:
WIN = 0
WIN64 = 0
VBITS = 256
OMP = 1
CUDA = 86

CC = gcc
WARN_FLAGS = -Wall -W
OPT_FLAGS = -O3 -march=native -mtune=native \
            -D_FILE_OFFSET_BITS=64 -DNDEBUG -D_LARGEFILE64_SOURCE -DVBITS=$(VBITS)
kruoli is online now   Reply With Quote
Old 2023-02-24, 17:36   #120
frmky
 
 
Jul 2003
So Cal

A3D₁₆ Posts
Default

Quote:
Originally Posted by kruoli View Post
Code:
Cuda compilation tools, release 11.2, V11.2.152
Use the msieve-lacuda-nfsathome-cuda11.5 branch. Or upgrade your CUDA toolkit. CUDA 11.6 introduced breaking changes to CUB.
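In practice, switching to that branch could look like this (a sketch, not verified here; the CUDA=86 and VBITS=256 values mirror the Makefile settings quoted in the previous post - adjust CUDA to your card's compute capability):

```shell
# Fetch the CUDA 11.5-compatible branch named above and rebuild.
git clone https://github.com/gchilders/msieve_nfsathome -b msieve-lacuda-nfsathome-cuda11.5
cd msieve_nfsathome
make all CUDA=86 VBITS=256
```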

Last fiddled with by frmky on 2023-02-24 at 17:37
frmky is offline   Reply With Quote
Old 2023-02-24, 20:31   #121
kruoli
 
 
"Oliver"
Sep 2017
Porta Westfalica, DE

2⁵×3²×5 Posts
Default

Thanks!
For everyone else: Debian 11 (the current stable version) does not have any newer CUDA package in its repository. Instead, install the CUDA toolkit from NVIDIA. Run the following as root after you have purged all currently installed NVIDIA packages (CUDA, drivers, etc.):
Code:
wget 'http://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.0-1_all.deb'
dpkg -i cuda-keyring_1.0-1_all.deb
add-apt-repository contrib # may already be activated
apt update
apt install cuda
Then you need a "start-up script" containing the following:
Code:
export PATH=/usr/local/cuda-12.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Now I have problems again. NVCC says:
Code:
./dispatch_unaryspmv_orig.cuh(378): error: identifier "CUB_IS_DEVICE_CODE" is undefined
So I looked that up, and it seems this macro is deprecated and replaced by NV_IF_TARGET. So I ran
Code:
for f in cub/cub/device/dispatch/dispatch_* cub/cub/grid/grid_queue.cuh cub/dispatch_unaryspmv_orig.cuh;
do
  sed -i 's/CUB_IS_DEVICE_CODE/NV_IF_TARGET/g' $f;
done
but that did not work either; now it does not know about NV_IF_TARGET. Any hints?
kruoli is online now   Reply With Quote