Thanks, but I've already tried that:
[code]
4core:/etc/modprobe.d # cat 60-blacklist.nouveau.conf
blacklist nouveau
options nouveau modeset=0
[/code]
And it is in the current initramfs:
[code]
4core:/etc/modprobe.d # lsinitrd -f /etc/modprobe.d/60-blacklist.nouveau.conf
blacklist nouveau
options nouveau modeset=0
[/code]
lsmod doesn't show any nvidia kernel modules:
[code]
4core:/etc/modprobe.d # lsmod | grep -i nvidia
4core:/etc/modprobe.d #
[/code]
On my system where CUDA (but not cgbn) works:
[code]
root@sirius:~# lsmod | grep nvidia
nvidia_uvm 876544 0
nvidia_drm 49152 5
nvidia_modeset 1122304 14 nvidia_drm
nvidia 19517440 682 nvidia_uvm,nvidia_modeset
drm_kms_helper 180224 1 nvidia_drm
drm 483328 8 drm_kms_helper,nvidia_drm
ipmi_msghandler 102400 2 ipmi_devintf,nvidia
[/code] |
Did you install through Yast or a direct download from nVidia?
|
I've already tried that:
[code]
4core:/etc/modprobe.d # cat 60-blacklist.nouveau.conf
blacklist nouveau
options nouveau modeset=0
[/code]
And it is in initrd:
[code]
4core:/etc/modprobe.d # lsinitrd -f /etc/modprobe.d/60-blacklist.nouveau.conf
blacklist nouveau
options nouveau modeset=0
[/code]
Digging a bit further I don't think the nvidia kernel modules are correctly installed:
[code]
4core:/lib/modules # find . -name 'nvidia*'
./4.12.14-lp150.12.82-default/updates/nvidia-uvm.ko
./4.12.14-lp150.12.82-default/updates/nvidia-modeset.ko
./4.12.14-lp150.12.82-default/updates/nvidia.ko
./4.12.14-lp150.12.82-default/updates/nvidia-drm.ko
./5.3.18-57-default/weak-updates/updates/nvidia-uvm.ko
./5.3.18-57-default/weak-updates/updates/nvidia-modeset.ko
./5.3.18-57-default/weak-updates/updates/nvidia.ko
./5.3.18-57-default/weak-updates/updates/nvidia-drm.ko
./5.3.18-57-default/kernel/drivers/net/ethernet/nvidia
./5.3.18-57-preempt/kernel/drivers/net/ethernet/nvidia
./5.3.18-59.19-preempt/kernel/drivers/net/ethernet/nvidia
./5.3.18-59.19-default/weak-updates/updates/nvidia-uvm.ko
./5.3.18-59.19-default/weak-updates/updates/nvidia-modeset.ko
./5.3.18-59.19-default/weak-updates/updates/nvidia.ko
./5.3.18-59.19-default/weak-updates/updates/nvidia-drm.ko
./5.3.18-59.19-default/kernel/drivers/net/ethernet/nvidia
4core:/lib/modules # uname -r
5.3.18-59.19-preempt
[/code]
So the kernel I'm running won't find them because it will look in 5.3.18-59.19-preempt even though they are installed in 5.3.18-59.19-default (next question, how to fix this cleanly). But at least I think I know where I'm going now. |
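The mismatch described above (modules installed for the -default flavour while the -preempt flavour is running) can be checked in one step. The following is only a minimal sketch: the function names `has_nvidia_modules` and `check_running_kernel` are made up for illustration, and it assumes the modules are files named `nvidia*.ko` (possibly compressed) somewhere under `/lib/modules/<release>`, as in the `find` output above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if any nvidia*.ko* file exists under
# <modules_root>/<release>. Directories that are merely named "nvidia"
# (e.g. kernel/drivers/net/ethernet/nvidia) do not match the pattern.
has_nvidia_modules() {
    modules_root=$1
    release=$2
    found=$(find "$modules_root/$release" -name 'nvidia*.ko*' 2>/dev/null | head -n 1)
    [ -n "$found" ]
}

# Compare the running kernel release (uname -r) against the module tree.
check_running_kernel() {
    release=$(uname -r)
    if has_nvidia_modules /lib/modules "$release"; then
        echo "nvidia modules present for $release"
    else
        echo "no nvidia modules under /lib/modules/$release"
    fi
}

check_running_kernel
```

If this reports no modules for the running release, booting the kernel flavour that does have them is the quickest way to confirm the diagnosis.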
[QUOTE=paulunderwood;587259]Did you install through Yast or a direct download from nVidia?[/QUOTE]
Via zypper on the command line, following the instructions on Nvidia's web site: [url]https://developer.nvidia.com/cuda-downloads[/url] |
Some of the instructions I saw in the past had a separate step, almost hidden, that was required to install the driver. Is it possible there is a driver install step missing in your procedure?
For my Ubuntu repository install of 10.2, it automatically installs the 470 driver, no matter what I have beforehand. Is there an equivalent to this Ubuntu command?
[code]
sudo [B]ubuntu-drivers devices[/B]
WARNING:root:_pkg_get_support nvidia-driver-390: package has invalid Support Legacyheader, cannot determine support level
== /sys/devices/pci0000:00/0000:00:03.0/0000:01:00.0 ==
modalias : pci:v000010DEd00000FFDsv0000103Csd00000967bc03sc00i00
vendor : NVIDIA Corporation
model : GK107 [NVS 510]
driver : nvidia-driver-450-server - distro non-free
driver : nvidia-driver-450 - third-party non-free
driver : nvidia-driver-460-server - distro non-free
driver : nvidia-driver-455 - third-party non-free
driver : nvidia-driver-418-server - distro non-free
driver : nvidia-340 - distro non-free
driver : nvidia-driver-465 - third-party non-free
driver : nvidia-driver-390 - distro non-free
driver : nvidia-driver-470 - third-party non-free recommended
driver : nvidia-driver-418 - third-party non-free
driver : nvidia-driver-410 - third-party non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-440 - third-party non-free
driver : nvidia-driver-460 - third-party non-free
driver : xserver-xorg-video-nouveau - distro free builtin
[/code]
Would that be of any help? |
After rebooting using the 5.3.18-59.19-default kernel the nvidia drivers are picked up:
[code]
4core:~ # lspci -v -s 01:00
01:00.0 VGA compatible controller: NVIDIA Corporation GM204 [GeForce GTX 970] (rev a1) (prog-if 00 [VGA controller])
    Subsystem: eVga.com. Corp. Device 3978
    Flags: bus master, fast devsel, latency 0, IRQ 16
    Memory at f6000000 (32-bit, non-prefetchable) [size=16M]
    Memory at e0000000 (64-bit, prefetchable) [size=256M]
    Memory at f0000000 (64-bit, prefetchable) [size=32M]
    I/O ports at e000 [size=128]
    [virtual] Expansion ROM at f7000000 [disabled] [size=512K]
    Capabilities: [60] Power Management version 3
    Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
    Capabilities: [78] Express Legacy Endpoint, MSI 00
    Capabilities: [100] Virtual Channel
    Capabilities: [258] L1 PM Substates
    Capabilities: [128] Power Budgeting <?>
    Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
    Capabilities: [900] #19
    Kernel driver in use: nvidia
    Kernel modules: nouveau, nvidia_drm, nvidia
[/code]
I'll need to fix that but it can wait for now. Then I started testing things ... msieve works OK:
[code]
Sat Sep 4 19:10:51 2021 Msieve v. 1.54 (SVN 1043)
Sat Sep 4 19:10:51 2021 random seeds: 6e515738 cae1a347
Sat Sep 4 19:10:51 2021 factoring 1522605027922533360535618378132637429718068114961380688657908494580122963258952897654000350692006139 (100 digits)
Sat Sep 4 19:10:51 2021 no P-1/P+1/ECM available, skipping
Sat Sep 4 19:10:51 2021 commencing number field sieve (100-digit input)
Sat Sep 4 19:10:51 2021 commencing number field sieve polynomial selection
Sat Sep 4 19:10:51 2021 polynomial degree: 4
Sat Sep 4 19:10:51 2021 max stage 1 norm: 1.16e+17
Sat Sep 4 19:10:51 2021 max stage 2 norm: 8.33e+14
Sat Sep 4 19:10:51 2021 min E-value: 9.89e-09
Sat Sep 4 19:10:51 2021 poly select deadline: 54
Sat Sep 4 19:10:51 2021 time limit set to 0.01 CPU-hours
Sat Sep 4 19:10:51 2021 expecting poly E from 1.49e-08 to > 1.71e-08
Sat Sep 4 19:10:51 2021 searching leading coefficients from 10000 to 1000000
Sat Sep 4 19:10:52 2021 using GPU 0 (NVIDIA GeForce GTX 970)
Sat Sep 4 19:10:52 2021 selected card has CUDA arch 5.2
Sat Sep 4 19:11:19 2021 polynomial selection complete
Sat Sep 4 19:11:19 2021 elapsed time 00:00:28
[/code]
But I've been having fun with ecm. The problem with conftest turned out to be:
[code]
chris@4core:~> gcc-9 -o conftest -I/usr/local/cuda/include -g -O2 -I/usr/local/cuda/include -Wl,-rpath,/usr/local/cuda/lib64 -L/usr/local/cuda/lib64 conftest.c -lcudart -lstdc++ -lcuda -lrt -lm -lm -lm -lm -lm
/usr/lib64/gcc/x86_64-suse-linux/9/../../../../x86_64-suse-linux/bin/ld: cannot find -lstdc++
collect2: error: ld returned 1 exit status
[/code]
So changing ./configure line 15498 from [c]CUDALIB="-lcudart -lstdc++"[/c] to [c]CUDALIB="-lcudart"[/c] made it work OK. I then got a lot of errors like this:
[code]
Instruction 'vote' without '.sync' is not supported on .target sm_70 and higher from PTX ISA version 6.4
[/code]
So I edited the Makefile to only build for sm_52 since that's all I need. But trying to build CGBN support I get:
[code]
chris@4core:~/ecm-cgbn/gmp-ecm> make
make all-recursive
make[1]: Entering directory '/home/chris/ecm-cgbn/gmp-ecm'
Making all in x86_64
make[2]: Entering directory '/home/chris/ecm-cgbn/gmp-ecm/x86_64'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/home/chris/ecm-cgbn/gmp-ecm/x86_64'
make[2]: Entering directory '/home/chris/ecm-cgbn/gmp-ecm'
/bin/sh ./libtool --tag=CC --mode=compile /usr/local/cuda/bin/nvcc --compile -I/home/chris/CGBN/include/cgbn -lgmp -I/usr/local/cuda/include -DECM_GPU_CURVES_BY_BLOCK=32 --generate-code arch=compute_52,code=sm_52 --ptxas-options=-v --compiler-options -fno-strict-aliasing -O2 --compiler-options -fPIC -I/usr/local/cuda/include -DWITH_GPU -o cgbn_stage1.lo cgbn_stage1.cu -static
libtool: compile: /usr/local/cuda/bin/nvcc --compile -I/home/chris/CGBN/include/cgbn -lgmp -I/usr/local/cuda/include -DECM_GPU_CURVES_BY_BLOCK=32 --generate-code arch=compute_52,code=sm_52 --ptxas-options=-v --compiler-options -fno-strict-aliasing -O2 --compiler-options -fPIC -I/usr/local/cuda/include -DWITH_GPU cgbn_stage1.cu -o cgbn_stage1.o
cgbn_stage1.cu(435): error: identifier "cgbn_swap" is undefined
    detected during instantiation of "void kernel_double_add<params>(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<4U, 512U>]"
(757): here
cgbn_stage1.cu(442): error: identifier "cgbn_swap" is undefined
    detected during instantiation of "void kernel_double_add<params>(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<4U, 512U>]"
(757): here
cgbn_stage1.cu(435): error: identifier "cgbn_swap" is undefined
    detected during instantiation of "void kernel_double_add<params>(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<8U, 1024U>]"
(760): here
cgbn_stage1.cu(442): error: identifier "cgbn_swap" is undefined
    detected during instantiation of "void kernel_double_add<params>(cgbn_error_report_t *, uint32_t, uint32_t, uint32_t, char *, uint32_t *, uint32_t, uint32_t, uint32_t) [with params=cgbn_params_t<8U, 1024U>]"
(760): here
4 errors detected in the compilation of "cgbn_stage1.cu".
make[2]: *** [Makefile:2571: cgbn_stage1.lo] Error 1
make[2]: Leaving directory '/home/chris/ecm-cgbn/gmp-ecm'
make[1]: *** [Makefile:1903: all-recursive] Error 1
make[1]: Leaving directory '/home/chris/ecm-cgbn/gmp-ecm'
make: *** [Makefile:783: all] Error 2
[/code]
This is after several attempts to run make, so hopefully only the relevant messages. But I've got an older version of ecm working on the GPU (at last!) So I'll leave it for now. |
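The hand edit to ./configure described in the post above could also be scripted so it is easy to repeat or undo. This is only a sketch of that workaround: `drop_libstdcxx` is a hypothetical helper name, the `.orig` backup suffix is an arbitrary choice, and the demo runs against a scratch file rather than a real configure script. It matches the `CUDALIB` text instead of relying on line 15498, since line numbers may differ between checkouts.

```shell
#!/bin/sh
# Hypothetical helper: remove -lstdc++ from the CUDALIB assignment in the
# given script, keeping the untouched original as <script>.orig.
drop_libstdcxx() {
    script=$1
    cp "$script" "$script.orig"
    # In a basic regular expression '+' is literal, so no escaping is needed.
    sed 's/CUDALIB="-lcudart -lstdc++"/CUDALIB="-lcudart"/' "$script.orig" > "$script"
}

# Demonstration on a scratch file containing only the affected line.
demo=$(mktemp)
printf '%s\n' 'CUDALIB="-lcudart -lstdc++"' > "$demo"
drop_libstdcxx "$demo"
cat "$demo"
rm -f "$demo" "$demo.orig"
```

On the real tree the call would be `drop_libstdcxx ./configure`. Note that this only hides the missing library from the link line; it does not make libstdc++ available to the linker.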
[QUOTE=SethTro;587047]I halved compile time by adding cgbn_swap and avoiding inlining double_add_v2 twice.
[/QUOTE] Does it affect the runtime? I don't care much about the compile time. Just compile a few small kernels for testing, and once it's stable include a good coverage of kernels and just let it compile overnight if necessary. In my current build I included all of
[CODE]
typedef cgbn_params_t<4, 256> cgbn_params_256;
typedef cgbn_params_t<4, 512> cgbn_params_512;
typedef cgbn_params_t<8, 768> cgbn_params_768;
typedef cgbn_params_t<8, 1024> cgbn_params_1024;
typedef cgbn_params_t<8, 1536> cgbn_params_1536;
typedef cgbn_params_t<8, 2048> cgbn_params_2048;
typedef cgbn_params_t<16, 3072> cgbn_params_3072;
typedef cgbn_params_t<16, 4096> cgbn_params_4096;
typedef cgbn_params_t<16, 5120> cgbn_params_5120;
typedef cgbn_params_t<16, 6144> cgbn_params_6144;
typedef cgbn_params_t<16, 7168> cgbn_params_7168;
typedef cgbn_params_t<16, 8192> cgbn_params_8192;
typedef cgbn_params_t<32, 10240> cgbn_params_10240;
typedef cgbn_params_t<32, 12288> cgbn_params_12288;
typedef cgbn_params_t<32, 14336> cgbn_params_14336;
typedef cgbn_params_t<32, 16384> cgbn_params_16384;
typedef cgbn_params_t<32, 18432> cgbn_params_18432;
typedef cgbn_params_t<32, 20480> cgbn_params_20480;
typedef cgbn_params_t<32, 22528> cgbn_params_22528;
typedef cgbn_params_t<32, 24576> cgbn_params_24576;
typedef cgbn_params_t<32, 28672> cgbn_params_28672;
typedef cgbn_params_t<32, 32768> cgbn_params_32768;
[/CODE]
and it took a little over an hour to compile for sm_70. |
[QUOTE=chris2be8;587267]
So changing ./configure line 15498 from [c]CUDALIB="-lcudart -lstdc++"[/c] to [c]CUDALIB="-lcudart"[/c] made it work OK. [/QUOTE] Use YaST to search for the development package for libstdc++ and install it (and its dependencies), and then link with -lstdc++ |
[QUOTE=chris2be8;587267]
This is after several attempts to run make, so hopefully only the relevant messages. But I've got an older version of ecm working on the GPU (at last!) So i'll leave it for now.[/QUOTE] This is an easy fix, you are on the home stretch! I committed a change that depends on [url]https://github.com/NVlabs/CGBN/pull/17[/url] being accepted. I've now committed a change reverting that to 3 cgbn_set's for now. After you `git pull` everything should build! Alternatively you can replace your CGBN directory with this one. `git clone -b cgbn_swap git@github.com:sethtroisi/CGBN.git` |
[QUOTE=frmky;587274]Does it affect the runtime? I don't care much about the compile time. Just compile a few small kernels for testing, and once it's stable include a good coverage of kernels and just let it compile overnight if necessary. In my current build I included all of
[CODE]
typedef cgbn_params_t<4, 256> cgbn_params_256;
typedef cgbn_params_t<4, 512> cgbn_params_512;
typedef cgbn_params_t<8, 768> cgbn_params_768;
typedef cgbn_params_t<8, 1024> cgbn_params_1024;
.........
typedef cgbn_params_t<32, 32768> cgbn_params_32768;
[/CODE]
and it took a little over an hour to compile for sm_70.[/QUOTE] It doesn't reduce runtime, but it does make it faster for me to test things and slightly reduces register pressure. |
[QUOTE=SethTro;587290]Alternatively you can replace your CGBN directory with this one. `git clone -b cgbn_swap git@github.com:sethtroisi/CGBN.git`[/QUOTE]
That fails:
[code]
chris@4core:~> git clone -b cgbn_swap git@github.com:sethtroisi/CGBN.git
Cloning into 'CGBN'...
The authenticity of host 'github.com (140.82.121.4)' can't be established.
RSA key fingerprint is SHA256:nThbg6kXUpJWGl7E1IGOCspRomTxdCARLviKw6E5SY8.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added 'github.com,140.82.121.4' (RSA) to the list of known hosts.
git@github.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
[/code]
And 'git pull' does nothing:
[code]
chris@4core:~/CGBN> git pull
Already up to date.
[/code]
Unless I'm not using it correctly. |
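The `Permission denied (publickey)` message is what GitHub returns when the `git@github.com:` (SSH) form of a URL is used without an SSH key registered to a GitHub account. For a public repository, the HTTPS form of the same address can be cloned without a key. A minimal sketch of the conversion follows; `ssh_to_https` is a hypothetical helper name, and whether the `cgbn_swap` branch is still present on that fork is taken from the quoted post, not verified here.

```shell
#!/bin/sh
# Hypothetical helper: rewrite git@host:owner/repo.git as
# https://host/owner/repo.git (same repository, anonymous HTTPS access).
ssh_to_https() {
    printf '%s\n' "$1" | sed -E 's#^git@([^:]+):#https://\1/#'
}

url=$(ssh_to_https 'git@github.com:sethtroisi/CGBN.git')
# Print the clone command rather than running it, so it can be reviewed first.
echo "git clone -b cgbn_swap $url"
```

Separately, the `git pull` above was run in ~/CGBN. If the reverted change was committed to the gmp-ecm fork rather than to CGBN, the pull would presumably need to be run in the ~/ecm-cgbn/gmp-ecm checkout instead.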