#3521 |
Quasi Admin Thing
May 2005
2·491 Posts |
Dear everyone
I'm still very new at Linux, and now it appears to be a problem: I have not one but two offline computers trying to run mfaktc, and I need a guide from A to Z on how to make it run. The current OS is Linux Mint 20.1 (Cinnamon, 64-bit).

I downloaded mfaktc-0.21.linux64.cuda11.2 and copied mfaktc from the compressed folder to a mersenne folder, where I had an mfaktc.ini and a worktodo.txt file. First, trying to run mfaktc from a terminal, I typed: ./mfaktc (enter). This gave me a "no permission" message. I got around this, after some googling on the phone, by right-clicking and setting permissions to read and write in all available selections on the mfaktc, mfaktc.ini and worktodo.txt files.

Back in the terminal I again typed: ./mfaktc (enter). This gave me a "no such file or directory" message - no cudart 11.0 found (or something like that). After more googling it appears that CUDA 11.0 is not installed, and that may in fact be correct, because when running nvidia-smi in the terminal, the window told me that I have CUDA 11.2 installed and driver 460.32 (if I recall correctly). This raises the question: does mfaktc 0.21 not work with CUDA 11.2?

When trying to install the drivers - which is typically what is missing on Windows - I tried running this file: cuda_11.2.0_460.27.04_linux.run (downloaded from the Nvidia website), using this command: sudo sh cuda_11.2.0_460.27.04_linux.run (enter). After asking for and receiving my password, it gave a warning that the CUDA 11.2 files are already installed and that it is recommended to remove the package before installing, and at that point I aborted to avoid breaking anything.

So now KEP asks for a thorough guide on how to run mfaktc on my two Linux Mint 20.1 64-bit machines. Can I download a libcudart11.0 file somewhere, or find a version of mfaktc that actually reads and finds the CUDA drivers on my Linux machines? Any help is greatly appreciated, since I had hoped to get the two GT 1030s up and running with TF for Mersenne numbers before the severe night cold (-15 degrees Celsius) makes its entrance on the night between Christmas Eve and the day after. PM is welcome, but maybe someone can come up with a guide for the newbies that the newbies can find in the future - if such a guide exists, someone can maybe refer me to it.

Last question: is there no way, like on Windows, to download the libcudart11.0 library to a memory stick and copy that file to the folder where mfaktc is contained and then get mfaktc running? (Just like on Windows.)

Best regards
KEP
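To put that last question in terminal terms, here is a minimal sketch of what I am hoping is possible - the exact library file name is my best guess from the error message, the rest is my guess too:

Code:
cd ~/mersenne
chmod +x mfaktc                 # terminal version of the permissions fix
ldd ./mfaktc                    # lists the libraries mfaktc wants, e.g. libcudart.so.11.0
# if libcudart.so.11.0 is reported "not found", copy that file into this folder
# (for example carried over on a memory stick from a CUDA 11.x install) and run:
LD_LIBRARY_PATH=. ./mfaktc

Is that roughly how it is done on Linux, or am I on the wrong track?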
#3522 |
Sep 2011
Germany
3,413 Posts |
Is someone able to compile the mfaktc app on Ubuntu 18? We need builds for CUDA 11.1/11.2; most users do not have glibc 2.29.
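For anyone checking whether the glibc requirement is really what blocks a given prebuilt binary, a quick sketch (assumes binutils is installed and the binary is called mfaktc):

Code:
# highest glibc version the downloaded binary requires
objdump -T ./mfaktc | grep -o 'GLIBC_[0-9.]*' | sort -Vu | tail -1
# glibc version the system actually provides
ldd --version | head -1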
#3523 | |
Random Account
Aug 2009
Not U. + S.A.
9D8₁₆ Posts |
Quote:
Some members here guided me through the process, and I am grateful they took the time. I took a lot of notes, but I don't know if what I have would work with Ubuntu 18. This is for someone else to say. |
#3524 | |
Sep 2002
Database er0rr
2·3³·83 Posts |
Quote:
sudo apt-get install build-essential nvidia-cuda-toolkit

Download the source (second in the list) from https://www.mersenneforum.org/mfaktc/mfaktc-0.21/ and untar the source with:

tar -zxvf mfaktc-0.21.tar.gz

Edit the Makefile in the src directory to suit your GPU: https://en.wikipedia.org/wiki/CUDA#GPUs_supported

Comment out the lines:

Code:
# generate code for various compute capabilities
NVCCFLAGS += --generate-code arch=compute_11,code=sm_11 # CC 1.1, 1.2 and 1.3 GPUs will use this code (1.0 is not possible for mfaktc)
NVCCFLAGS += --generate-code arch=compute_20,code=sm_20 # CC 2.x GPUs will use this code, one code fits all!
NVCCFLAGS += --generate-code arch=compute_30,code=sm_30 # all CC 3.x GPUs _COULD_ use this code
NVCCFLAGS += --generate-code arch=compute_35,code=sm_35 # but CC 3.5 (3.2?) _CAN_ use funnel shift which is useful for mfaktc

and keep (or add) the line for your card's compute capability - for example, for a CC 6.1 card such as the GT 1030:

Code:
NVCCFLAGS += --generate-code arch=compute_61,code=sm_61

Last fiddled with by paulunderwood on 2022-03-19 at 17:01
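Putting the whole sequence together as a sketch (assuming the tarball unpacks to mfaktc-0.21/ and the target is a compute capability 6.1 card such as the GT 1030 mentioned earlier; adjust the arch line for other GPUs):

Code:
sudo apt-get install build-essential nvidia-cuda-toolkit
# download mfaktc-0.21.tar.gz from the page linked above, then:
tar -zxvf mfaktc-0.21.tar.gz
cd mfaktc-0.21/src
# edit Makefile: comment out the compute_11/20/30/35 lines and keep only
#   NVCCFLAGS += --generate-code arch=compute_61,code=sm_61
make
# copy the resulting binary (built as mfaktc.exe, even on Linux) into the
# folder holding mfaktc.ini and worktodo.txt, then run the selftest there:
./mfaktc.exe -st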
#3525 | |
Random Account
Aug 2009
Not U. + S.A.
4730₈ Posts |
Quote:
@paulunderwood: Any idea why the executable has a .exe file extension? It seems sort of strange in this context. |
#3526 |
Sep 2002
Database er0rr
2·3³·83 Posts |
#3527 | |
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996
2×7² Posts |
Quote:
For me, removing the whole-program optimization flags from the compiler and linker seems to have eliminated the problem! So no /GL or /LTCG. This makes some sense as a root cause in terms of the CUDA build pipeline, and it makes no discernible difference in factoring speed for me on a 1080 Ti or 3090.

I don't have an updated makefile to link to right now, but anyone building mfaktc on Windows should probably remove /LTCG from LFLAGS and /GL from CFLAGS (and -Xcompiler).

Ethan
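For concreteness, a sketch of the kind of makefile edit meant here - the variable names and the other flags are placeholders, not the actual Makefile.win contents:

Code:
# before: whole-program optimization on
#   CFLAGS    = ... /GL
#   LFLAGS    = ... /LTCG
#   NVCCFLAGS = ... -Xcompiler "/GL"
# after: the same lines with /GL and /LTCG removed, nothing else changed
CFLAGS    = ...
LFLAGS    = ...
NVCCFLAGS = ...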
#3528 |
Feb 2022
7 Posts |
Hello guys, I've been following all of your work since last November '21.
I am fascinated by this GIMPS project. I've been able to dedicate:
* RTX 2080 ULTRA XC2
* Ryzen 9 5950X (Zen 3, Vermeer)
* Ryzen 7 1700 (original Zen)
* An occasional T4 from Colab
* "Intel Core2 Duo E8400 @ 3.00GHz Linux64,v30.8,build 11" (small ECM and Cert work only)

Here is my current contribution to the cause:

Code:
226 nullcure 44763.429 | *** 279 76 7 | 72.5 1.3 17.8 6.5 0.6 1.0 |

I've spent a great deal of time researching all of your forum posts and work on different programs. I've got the RTX 2080 automated with MISFIT:

Code:
PS D:\GIMPS\mfaktfc> .\mfaktc-win-64.exe
mfaktc v0.21 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               100000
  SievePrimesAdjust         1
  SievePrimesMin            2000
  SievePrimesMax            200000
  NumStreams                10
  CPUStreams                5
  GridSize                  3
  GPU Sieving               enabled
  GPUSievePrimes            82486
  GPUSieveSize              128Mi bits
  GPUSieveProcessSize       8Ki bits
  Checkpoints               enabled
  CheckpointDelay           30s
  WorkFileAddDelay          disabled
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  (none)
  ComputerID                (none)
  AllowSleep                no
  TimeStampInResults        yes

CUDA version info
  binary compiled for CUDA  10.0
  CUDA runtime version      10.0
  CUDA driver version       11.60

CUDA device info
  name                      NVIDIA GeForce RTX 2080
  compute capability        7.5
  max threads per block     1024
  max shared memory per MP  65536 byte
  number of multiprocessors 46
  clock rate (CUDA cores)   1815MHz
  memory clock rate:        7000MHz
  memory bus width:         256 bit

Automatic parameters
  threads per grid          753664
  GPUSievePrimes (adjusted) 82486
  GPUsieve minimum exponent 1055144

running a simple selftest...
Selftest statistics
  number of tests           107
  successfull tests         107

selftest PASSED!

got assignment: exp=14200031 bit_min=72 bit_max=73 (67.36 GHz-days)
Starting trial factoring M14200031 from 2^72 to 2^73 (67.36 GHz-days)
 k_min = 166280146951380
 k_max = 332560293908488
Using GPU kernel "barrett76_mul32_gs"
found a valid checkpoint file!
  last finished class was: 1681
  found 1 factor(s) already

[date time] exponent [TF bits]: percent class #, seq | GHZ | time | ETA | #FCs | rate | SieveP. |
[Mar 28 11:51] M14200031 [72-73]: 36.8% 1693/4620,353/960 | 3379.25 | 1.794s | 18m09s | 35.99G | 20062.1M/s | 82485 |
PS D:\GIMPS\mfaktfc>

There are my mfaktc stats. It's running backlogged assignments from GPU72 (primarily because my Ryzen 9 does PRP two days faster than the RTX 2080). Now I've compiled GpuOwl from GitHub, but I feel like I'm missing something in terms of GpuOwl and PRP testing on the RTX 2080. Is there really no way to make it PRP test faster than the 5950X?

And please feel free to ask me anything; I'll help out and contribute where I can. I came onto the computer scene back in 1995 - Windows 95, those phreaking things. ;-)

I have 3 systems up and running:
* A Windows Server with 32 GB DDR4 3200 used for small lab work. I do have RDP and free user accounts over VPN; it's literally used for "lab work" - a test system for people to throw rocks at.
* The older Intel Core2 running headless Linux. Accounts and SSH are available and it is part of the "test lab" - please don't break the DDR2 memory, ordering from China takes forever. :-)
* And last, my main system, the 5950X on Win11 (Linux WSL2 with X GUI support forced my hand to upgrade from Win10). There will be no public access to this system, lol, this is my baby.

So again: whatever I can do to help contribute, I have 2 systems for private use of friends in a mini lab config for throwing rocks at (PM me for remote SSH or RDP accounts). And secondly, is my Ryzen 5950X supposed to PRP faster than the RTX 2080 on the GpuOwl that I've git-compiled for Win x64? (Attached: the latest GpuOwl compiled for Windows x64, non-dirty.) I am also having issues compiling the mfaktc source. Am I missing anything? I'm only 4 months into this project.

Thanks all.
#3529 |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×3,677 Posts |
Welcome to the forum and to GIMPS.
NVIDIA RTX 20xx, GTX 16xx, and even more so RTX 30xx are far faster (more productive) at TF than at PRP, P-1, or LL, because they have exceptionally low DP:SP performance ratios (1/32 or lower), and TF does not need double precision while the primality and P-1 work does. Teslas, or AMD GPUs, tend to have closer DP:SP ratios. See for example the theoretical performance figures at
https://www.techpowerup.com/gpu-spec...rtx-2080.c3224
https://www.techpowerup.com/gpu-specs/radeon-vii.c3358

For a large and growing compilation of reference info, see this thread and note the beginning reading recommendations there.

Last fiddled with by kriesel on 2022-03-28 at 17:21
#3530 |
Feb 2022
7 Posts |
Thank you Kriesel, funny you responded. I had posted on GitHub an issue I had with your MinGW64/MSYS2 64-bit Windows compilation guide for gpuowl.
There was nothing wrong with your guide, though, just an issue I encountered during the compile process. When gpuowl was compiling, a file called gpuowl-expanded (I forget the extension) would end up with a file size of 0. To get around this I kept the file open in Notepad during the compile so its contents remained. Problem solved, though I'm not sure what was causing it.

So you're saying that if I had a premium AMD GPU, it would outperform the 5950X in PRP crunch time?
#3531 | |
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
2×3,677 Posts |
Quote:
Try running a DC using PRP, or an LL DC, on your 5950X. Can you beat 9 hours for a ~65M exponent? 8.5 hours?

On gpuowl v6.11-380, AMD Radeon VII, Windows 10, GPU memory clock at 1200MHz, GPU power reduced by 20%, stock voltage curve (NTP time-synced system):

Code:
2022-03-27 16:16:44 test/radeonvii 65005679 FFT: 3.50M 1K:7:256 (17.71 bpw)
2022-03-27 16:16:44 test/radeonvii Expected maximum carry32: 2CD70000
2022-03-27 16:16:45 test/radeonvii OpenCL args "-DEXP=65005679u -DWIDTH=1024u -DSMALL_HEIGHT=256u -DMIDDLE=7u -DPM1=0 -DAMDGPU=1 -DWEIGHT_STEP_MINUS_1=0xe.1b17042cf73dp-6 -DIWEIGHT_STEP_MINUS_1=-0xb.8eee6898b4078p-6 -DNO_ASM=1 -cl-unsafe-math-optimizations -cl-std=CL2.0 -cl-finite-math-only "
2022-03-27 16:16:54 test/radeonvii OpenCL compilation in 9.08 s
2022-03-27 16:16:54 test/radeonvii 65005679 LL 0 loaded: 0000000000000004
...
2022-03-28 01:14:38 test/radeonvii 65005679 LL 65005677 100.00%; 517 us/it; ETA 0d 00:00; bd325b7241d0a681
2022-03-28 01:14:38 test/radeonvii waiting for the Jacobi check to finish..
2022-03-28 01:15:20 test/radeonvii 65005679 OK 65000000 (jacobi == -1)
2022-03-28 01:15:20 test/radeonvii {"status":"C", "exponent":"65005679", "worktype":"LL", "res64":"bd325b7241d0a681", "fft-length":"3670016", "shift-count":"0", "program":{"name":"gpuowl", "version":"v6.11-380-g79ea0cc"}, "user":"kriesel", "computer":"test/radeonvii", "aid":"(redacted)", "timestamp":"2022-03-28 06:15:20 UTC"}

If I boost it back up to nominal power it will go somewhat faster, maybe ~5%. A quick test on a few hundred K iterations of a higher exponent indicates ~5.4%, so ~8.518 hours estimated for a 65M exponent at nominal power.

https://www.mersenne.org/report_expo...exp_hi=&full=1

Last fiddled with by kriesel on 2022-03-29 at 20:39
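As a rough cross-check of the numbers above (my arithmetic, not part of the log): 65,005,677 iterations at ~517 µs/iteration is about 33,600 s, i.e. roughly 9.3 hours at that instantaneous rate, while the wall-clock run shown (16:16 to 01:15) came in just under 9 hours, consistent with the ~8.5-hour estimate at nominal power.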
Thread | Thread Starter | Forum | Replies | Last Post |
mfakto: an OpenCL program for Mersenne prefactoring | Bdot | GPU Computing | 1719 | 2023-01-16 15:51 |
gr-mfaktc: a CUDA program for generalized repunits prefactoring | MrRepunit | GPU Computing | 42 | 2022-12-18 05:59 |
The P-1 factoring CUDA program | firejuggler | GPU Computing | 753 | 2020-12-12 18:07 |
mfaktc 0.21 - CUDA runtime wrong | keisentraut | Software | 2 | 2020-08-18 07:03 |
World's second-dumbest CUDA program | fivemack | Programming | 112 | 2015-02-12 22:51 |