#111
"Ed Hall"
Dec 2009
Adirondack Mtns
12443₈ Posts
Thanks, but to do that, I think -np1 would work better, since I wouldn't need to create as many other files first. But I'd still need to look for a value (such as "using GPU") in the log. I was looking for a simple value check, or an existence check for a file, perhaps a .ptx.
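Something like this would cover the log check (a sketch only; msieve.log is the default log name, adjust if -l pointed it elsewhere):
Code:
# Look for the GPU marker in the msieve log.
if grep -q "using GPU" msieve.log; then
    echo "GPU build confirmed"
fi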
#112
"Evan"
Dec 2020
Montreal
2²×3×7 Posts
Ah, in that case, you can look for the lanczos_kernel.ptx file (or stage1_core.ptx).
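A simple existence check along those lines (a sketch; it assumes the .ptx files sit in the working directory next to the msieve binary):
Code:
# The .ptx kernels are produced by a GPU-enabled msieve build;
# their presence is a reasonable sign the GPU code path exists.
if [ -f lanczos_kernel.ptx ] || [ -f stage1_core.ptx ]; then
    echo "GPU-capable msieve detected"
fi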
Last fiddled with by Plutie on 2022-08-17 at 18:31 Reason: oops
#114
"Ed Hall"
Dec 2009
Adirondack Mtns
1523₁₆ Posts
I'm too excited to keep this to myself. I finally have sufficient cooling for my M40 GPU and am running a c173 that is in LA on both the 40-thread machine and the GPU machine.
This is the 40-thread (40GB) machine at the start of LA:
Code:
Wed Aug 17 22:53:22 2022 linear algebra at 0.0%, ETA 66h33m
and later:
Code:
linear algebra completed 2537146 of 16995095 dimensions (14.9%, ETA 53h31m)
This is the GPU machine at the start of LA:
Code:
Wed Aug 17 23:11:13 2022 linear algebra at 0.0%, ETA 24h39m
and later:
Code:
linear algebra completed 6241861 of 16995095 dimensions (36.7%, ETA 15h35m)
Here is the GPU machine's startup log:
Code:
Wed Aug 17 22:59:01 2022 using VBITS=256
Wed Aug 17 22:59:01 2022 skipping matrix build
Wed Aug 17 22:59:04 2022 matrix starts at (0, 0)
Wed Aug 17 22:59:07 2022 matrix is 16994916 x 16995095 (5214.6 MB) with weight 1611774956 (94.84/col)
Wed Aug 17 22:59:07 2022 sparse part has weight 1163046519 (68.43/col)
Wed Aug 17 22:59:07 2022 saving the first 240 matrix rows for later
Wed Aug 17 22:59:11 2022 matrix includes 256 packed rows
Wed Aug 17 22:59:16 2022 matrix is 16994676 x 16995095 (4829.9 MB) with weight 1060776224 (62.42/col)
Wed Aug 17 22:59:16 2022 sparse part has weight 994218947 (58.50/col)
Wed Aug 17 22:59:16 2022 using GPU 0 (Tesla M40 24GB)
Wed Aug 17 22:59:16 2022 selected card has CUDA arch 5.2
Wed Aug 17 23:10:30 2022 commencing Lanczos iteration
Wed Aug 17 23:10:31 2022 memory use: 11864.2 MB
#115
Romulan Interpreter
"name field"
Jun 2011
Thailand
10100000101001₂ Posts
Sorry, I didn't follow this thread very closely. Are you saying that you do NFS completely on the GPU? I mean, I knew poly selection can be done, and I am reading now about LA. How about sieving? If so, where can I grab the exe and the "for dummies" tutorial? Are Windows/Linux versions available? I may give it a try locally (where I run a few quite powerful AMD and Nvidia cards) or on Colab (where I have occasional access to P100, V100 and, if lucky, A100).
#116
"Evan"
Dec 2020
Montreal
2²·3·7 Posts
Here's a quick guide for Linux specifically, but I don't think the process will be too different on Windows.
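In rough outline, such a build might look like the following sketch. The repository URL is an assumption (frmky's later post names only the branch), and CUDA= should be set to the card's compute capability times ten, as in the Makefile settings further down the thread:
Code:
# Hypothetical build sketch; repository URL and branch are assumptions.
# Set CUDA= to the card's compute capability x10 (e.g. 75 for a GTX 1660).
git clone -b msieve-lacuda-nfsathome-cuda11.5 https://github.com/gchilders/msieve_nfsathome.git msieve
cd msieve
make all CUDA=75 VBITS=256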
#117
Sep 2008
Kansas
2³×479 Posts
I forgot to add -g 0 to the command line, and it seemed to default to device 0. I did specify use_managed=1, so maybe that was enough to invoke the GPU. Then again, I may be using an earlier release.
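For reference, an invocation that pins the device explicitly might look like this (hypothetical; -g selects the CUDA device, and how the use_managed option is passed may vary between builds):
Code:
# Hypothetical: -g 0 selects CUDA device 0, -nc2 runs the LA stage
# with the given option string.
./msieve -v -g 0 -nc2 "use_managed=1"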
#118
Sep 2008
Kansas
2³·479 Posts
Here is a data point for the crossover using a GPU for LA: an attempt to run with memory oversubscribed by 50+% on a 6GB card, with use_managed=1.
Code:
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 10820818 x 10821229 (4662.2 MB) with weight 1103319671 (101.96/col)
sparse part has weight 1049028095 (96.94/col)
using GPU 0 (NVIDIA GeForce GTX 1660)
selected card has CUDA arch 7.5
Nonzeros per block: 1750000000
Storing matrix in managed memory
converting matrix to CSR and copying it onto the GPU
1049028095 10820818 10821229
1049028095 10821229 10820818
commencing Lanczos iteration
vector memory use: 2311.7 MB
dense rows memory use: 330.2 MB
sparse matrix memory use: 8086.0 MB
memory use: 10727.9 MB
Allocated 761.4 MB for SpMV library
Allocated 761.4 MB for SpMV library
linear algebra at 0.1%, ETA 139h41m
checkpointing every 80000 dimensions
linear algebra completed 376713 of 10821229 dimensions (3.5%, ETA 136h25m)
For comparison, the same matrix on the CPU (4 threads):
Code:
saving the first 240 matrix rows for later
matrix includes 256 packed rows
matrix is 10820818 x 10821229 (4662.2 MB) with weight 1103319671 (101.96/col)
sparse part has weight 1049028095 (96.94/col)
using block size 8192 and superblock size 147456 for processor cache size 6144 kB
commencing Lanczos iteration (4 threads)
memory use: 6409.8 MB
linear algebra at 0.0%, ETA 105h56m
checkpointing every 110000 dimensions
linear algebra completed 45961 of 10821229 dimensions (0.4%, ETA 103h24m)
#119
"Oliver"
Sep 2017
Porta Westfalica, DE
2⁵·3²·5 Posts
Before going to an MPI-enabled CUDA build, I wanted to get a single-card CUDA build working first. Unfortunately, the code does not compile on my system. The error messages begin with:
Code:
./cub/agent/agent_merge_sort.cuh(80): error: a class or namespace qualified name is required
./cub/agent/agent_merge_sort.cuh(80): error: expected a ";"
./cub/agent/agent_merge_sort.cuh(81): error: a class or namespace qualified name is required
./cub/agent/agent_merge_sort.cuh(81): error: expected a ";"
./cub/block/specializations/../../block/../util_type.cuh(79): error: class "std::iterator_traits<<error-type>>" has no member "value_type"
          detected during:
            instantiation of type "cub::detail::value_t<<error-type>>"
./cub/block/block_load.cuh(1295): here
            processing of template argument list for "cub::BlockLoadType" based on template arguments <Policy, <error-type>>
./cub/agent/agent_merge_sort.cuh(83): here
This is my compiler setup:
Code:
$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa:hsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 10.2.1-6' --with-bugurl=file:///usr/share/doc/gcc-10/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-10 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none=/build/gcc-10-Km9U7s/gcc-10-10.2.1/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-10-Km9U7s/gcc-10-10.2.1/debian/tmp-gcn/usr,hsa --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-mutex
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.1 20210110 (Debian 10.2.1-6)
Code:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Feb_14_21:12:58_PST_2021
Cuda compilation tools, release 11.2, V11.2.152
Build cuda_11.2.r11.2/compiler.29618528_0
And these are my Makefile settings:
Code:
WIN = 0
WIN64 = 0
VBITS = 256
OMP = 1
CUDA = 86
CC = gcc
WARN_FLAGS = -Wall -W
OPT_FLAGS = -O3 -march=native -mtune=native \
	-D_FILE_OFFSET_BITS=64 -DNDEBUG -D_LARGEFILE64_SOURCE -DVBITS=$(VBITS)
#120
Jul 2003
So Cal
A3D₁₆ Posts
Use the msieve-lacuda-nfsathome-cuda11.5 branch, or upgrade your CUDA toolkit; CUDA 11.6 introduced breaking changes to CUB.
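Checking that branch out could look like this (the repository URL is an assumption; only the branch name is given above):
Code:
# Hypothetical checkout of the named branch; substitute the real
# repository URL if it differs.
git clone -b msieve-lacuda-nfsathome-cuda11.5 https://github.com/gchilders/msieve_nfsathome.git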
Last fiddled with by frmky on 2023-02-24 at 17:37 |
#121
"Oliver"
Sep 2017
Porta Westfalica, DE
2⁵×3²×5 Posts
Thanks!
For everyone else: Debian 11 (the current stable version) does not have any newer CUDA package in its repository, so install the CUDA toolkit from nVidia instead. Run the following as root, after you have purged all currently installed nVidia stuff (CUDA, drivers, etc.):
Code:
wget 'http://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/cuda-keyring_1.0-1_all.deb'
dpkg -i cuda-keyring_1.0-1_all.deb
add-apt-repository contrib # may already be activated
apt update
apt install cuda
Then add the new toolkit to your paths:
Code:
export PATH=/usr/local/cuda-12.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
With that, the build failed with:
Code:
./dispatch_unaryspmv_orig.cuh(378): error: identifier "CUB_IS_DEVICE_CODE" is undefined
So I looked that up, and it seems like this macro is deprecated and replaced by NV_IF_TARGET. So I ran:
for f in cub/cub/device/dispatch/dispatch_* cub/cub/grid/grid_queue.cuh cub/dispatch_unaryspmv_orig.cuh; do sed -i 's/CUB_IS_DEVICE_CODE/NV_IF_TARGET/g' $f; done |
Thread | Thread Starter | Forum | Replies | Last Post |
Resume linear algebra | Timic | Msieve | 35 | 2020-10-05 23:08 |
use msieve linear algebra after CADO-NFS filtering | aein | Msieve | 2 | 2017-10-05 01:52 |
Has anyone tried linear algebra on a Threadripper yet? | fivemack | Hardware | 3 | 2017-10-03 03:11 |
Linear algebra at 600% | CRGreathouse | Msieve | 8 | 2009-08-05 07:25 |
Linear algebra proof | Damian | Math | 8 | 2007-02-12 22:25 |