mersenneforum.org  

Old 2010-01-10, 20:38   #45
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)

167A16 Posts

Quote:
Originally Posted by TheJudger
Hi henryzz,

For factors between 2^64 and 2^71 it is about twice as fast as a single core of ath's Core 2 Quad.
Nice to know that my GPU will contribute about two cores' worth of throughput on my PC.
Quote:
I like the card: since it is relatively slow, it's easy to spot differences in the runtime of the GPU code, and the CPU never limits the throughput.

The raw speed of the GPU code can be easily estimated since it scales perfectly with the GPU's GFLOPS. I have tested this on:
- 8400GS (43.2 GFLOPS / 2.3M candidates tested per second)
- 8600GT (113 GFLOPS / 6.1M candidates tested per second)
- 8800GTX (518 GFLOPS / ~28M candidates tested per second)
- GTX 275 (1011 GFLOPS / ~54M candidates tested per second)
Nice that it scales with GFLOPS. That makes it easy to make estimates.
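As a rough illustration of such an estimate, the sketch below averages the candidates-per-GFLOPS ratio of the four cards quoted above and applies it to another GPU. The function names and the 700 GFLOPS card are made up for illustration; only the four measurements come from the quote, and nothing here is part of mfaktc.

```python
# (GFLOPS, million candidates tested per second), as quoted above
MEASURED = {
    "8400GS": (43.2, 2.3),
    "8600GT": (113.0, 6.1),
    "8800GTX": (518.0, 28.0),
    "GTX 275": (1011.0, 54.0),
}


def rate_per_gflops(measured=MEASURED):
    """Average M candidates/s per GFLOPS over all measured cards."""
    ratios = [rate / gflops for gflops, rate in measured.values()]
    return sum(ratios) / len(ratios)


def estimate_rate(gflops, measured=MEASURED):
    """Estimated M candidates/s for a card with the given GFLOPS rating."""
    return gflops * rate_per_gflops(measured)


if __name__ == "__main__":
    for name, (gflops, rate) in MEASURED.items():
        print(f"{name:8s} {rate / gflops:.4f} M candidates/s per GFLOPS")
    # Hypothetical card, not one mentioned in the thread
    print(f"700 GFLOPS card: ~{estimate_rate(700):.0f} M candidates/s")
```

All four ratios land between 0.053 and 0.055, which is what "scales with GFLOPS" amounts to here. The estimate only holds while the CPU sieve keeps up with the GPU.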
Quote:
I think it is not the right time for precompiled binaries; there are too many compile-time options in the code right now.

I forgot to mention: I have run it only on Linux so far (openSUSE 11.1 x86-64).
If you still want a binary, I can create one on my system. Let me know which CPU you have and I'll choose some settings then.
You need to install the CUDA software as well.
My attempts at compiling with compilers that are new to me often fail (I still haven't managed to compile anything major properly with Visual Studio, for example). I often end up causing more trouble than the time it would take for you to post binaries. My platform is a Q6600. I have the CUDA software and will attempt to compile tomorrow on Ubuntu 9.04 64-bit. Hopefully it will go smoothly.
Old 2010-01-10, 21:35   #46
TheJudger
"Oliver"
Mar 2005
Germany

2×3×5×37 Posts

Hi Henry,

Try this one. I won't be surprised if it doesn't work (library versions, ...).
I mistyped the model name of the 8600: I have an 8600GT here, not an 8600GTS. The GTS is faster.

Oliver
Attached Files
File Type: gz mfaktc-binary-openSUSE11.1-CUDA2.3.x64-64.tar.gz (60.0 KB, 282 views)
Old 2010-01-11, 09:30   #47
msft
Jul 2009
Tokyo

2·5·61 Posts

Hi,
On Ubuntu 9.04 / 32-bit / GTX 260:
Code:
$ time ./mfaktc.exe 66362159 64 65
mfaktc v0.01  C...
...
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 273133msec

real    4m33.207s
user    4m30.925s
sys     0m2.288s
Old 2010-01-11, 10:08   #48
TheJudger
"Oliver"
Mar 2005
Germany

2×3×5×37 Posts

Hi msft,

Can you post the compile-time options and your CPU (Q8400?), too?
I'm pretty sure this run was CPU-limited.

If you want, try running 2 (or maybe even 3) processes at the same time (in different directories, because all processes try to access results.txt).
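A minimal sketch of one way to start several runs side by side, each in its own working directory so that every process writes its own results.txt. The directory names `run0`, `run1`, ... and both helper functions are assumptions made for this example; the command line is the one used elsewhere in this thread.

```python
import os
import subprocess


def launch_plan(binary, exponent, bit_min, bit_max, n_procs):
    """Return one (working_directory, argv) pair per process."""
    # An absolute path stays valid once the process runs in another directory.
    binary = os.path.abspath(binary)
    return [
        (f"run{i}", [binary, str(exponent), str(bit_min), str(bit_max)])
        for i in range(n_procs)
    ]


def run_all(plan):
    """Start every process in its own directory and wait for all of them."""
    procs = []
    for workdir, argv in plan:
        os.makedirs(workdir, exist_ok=True)
        procs.append(subprocess.Popen(argv, cwd=workdir))
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Only prints the plan; call run_all(plan) to actually start the runs.
    for workdir, argv in launch_plan("./mfaktc.exe", 66362159, 64, 65, 2):
        print(workdir, " ".join(argv))
```

Each run0/results.txt, run1/results.txt, ... then holds the output of exactly one process.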
Old 2010-01-11, 10:36   #49
msft
Jul 2009
Tokyo

2·5·61 Posts

Yes, Q8400.
Code:
#!/bin/bash -x

mkdir compile_bla_bla
cd compile_bla_bla

gcc -Wall -O2 -c ../sieve.c -o sieve.o
nvcc -c ../mfaktc.cu -o mfaktc.o -I  /NVIDIA_GPU_Computing_SDK/C/common/inc/ --ptxas-options=-v --keep -DMUL24HI

# Patch the generated PTX: replace mul.hi.u32 with the 24-bit multiply
mv mfaktc.ptx mfaktc.ptx.old
sed 's/mul\.hi\.u32/mul24.hi.u32/' mfaktc.ptx.old > mfaktc.ptx

rm -f mfaktc.sm_10.cubin mfaktc.cu.cpp mfaktc.o
  
ptxas --key="xxxxxxxxxx"  -arch=sm_10 -v  "mfaktc.ptx"  -o "mfaktc.sm_10.cubin"
fatbin --key="xxxxxxxxxx" --source-name="../mfaktc.cu" --usage-mode="-v  " --embedded-fatbin="mfaktc.fatbin.c" "--image=profile=sm_10,file=mfaktc.sm_10.cubin" "--image=profile=compute_10,file=mfaktc.ptx"
cudafe++  --gnu_version=40302 --diag_error=host_device_limited_call --diag_error=ms_asm_decl_not_allowed --parse_templates  --gen_c_file_name "mfaktc.cudafe1.cpp" --stub_file_name "mfaktc.cudafe1.stub.c" --stub_header_file_name "mfaktc.cudafe1.stub.h" "mfaktc.cpp1.ii"
gcc -D__CUDA_ARCH__=100 -E -x c++ -DCUDA_NO_SM_12_ATOMIC_INTRINSICS -DCUDA_NO_SM_13_DOUBLE_INTRINSICS -DCUDA_FLOAT_MATH_FUNCTIONS -DCUDA_NO_SM_11_ATOMIC_INTRINSICS  "-I  /NVIDIA_GPU_Computing_SDK/C/common/inc/" -I/usr/local/cuda/include/ -I.   -o "mfaktc.cu.cpp" "mfaktc.cudafe1.cpp"
gcc -c -x c++ "-I /NVIDIA_GPU_Computing_SDK/C/common/inc/"  -I/usr/local/cuda/include/  -I.   -o "mfaktc.o" "mfaktc.cu.cpp"

gcc -fPIC -o ../mfaktc.exe sieve.o mfaktc.o  -L/usr/local/lib  -L/usr/local/cuda/lib -L/NVIDIA_GPU_Computing_SDK/C/lib -L/NVIDIA_GPU_Computing_SDK/C/common/common/lib/linux -lcudart -L/usr/local/cuda/lib -L/NVIDIA_GPU_Computing_SDK/C/lib -L/NVIDIA_GPU_Computing_SDK/C/common/lib/linux -lcufft  -lm

cd ..
rm compile_bla_bla -rf
Code:
$ time ./mfaktc.exe 66362159 64 65
mfaktc v0.01  C...
...
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 273291msec

real    4m33.374s
user    4m31.081s
sys     0m2.304s

$ time ./mfaktc.exe 66362159 64 65 &
$ time ./mfaktc.exe 66362159 64 65 &
...
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 274948msec

real    4m35.055s
user    4m31.613s
sys     0m3.392s
class  417: tested 265712378014859264 candidates in 12176232284160ms (93725704046247936/sec)
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 275090msec

real    4m35.173s
user    4m31.745s
sys     0m3.356s
Old 2010-01-11, 11:46   #50
TheJudger
"Oliver"
Mar 2005
Germany

2·3·5·37 Posts

Thank you, msft!

Actually I was asking for this:
Code:
Compiletime Options
  THREADS_PER_GRID    1048576
  THREADS_PER_BLOCK   256
  SIEVE_SIZE_LIMIT    32kiB
  SIEVE_SIZE          230945bits
  SIEVE_PRIMES        250000
  USE_PINNED_MEMORY   enabled
  USE_ASYNC_COPY      enabled
  VERBOSE_TIMING      disabled
  SELFTEST            disabled
  MORE_CLASSES        disabled
It is clearly CPU-bound with only one process on your machine (this was expected). The slowdown from one to two processes is very small.
275 s for two concurrent runs of M66362159 from 2^64 to 2^65 looks reasonable (still slightly CPU-limited). My GTX 275 paired with a fast Core 2 Duo does it in ~220 seconds.
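To put a number on "very small": the sketch below compares the combined throughput of the two concurrent runs with the single run, using the "total time spent" values from post #49. The function name is made up for this example.

```python
def aggregate_speedup(single_ms, parallel_ms):
    """Throughput of N concurrent runs relative to one run on its own.

    single_ms   -- wall time of one process running alone
    parallel_ms -- wall times of the processes when run concurrently
    """
    # N ranges finish in max(parallel_ms) instead of N * single_ms.
    return len(parallel_ms) * single_ms / max(parallel_ms)


if __name__ == "__main__":
    # "total time spent" values from post #49
    speedup = aggregate_speedup(273291, [274948, 275090])
    print(f"two processes deliver {speedup:.2f}x the throughput of one")
```

With those timings the second process almost doubles the work done per unit of time, which fits a single process being limited by the CPU rather than by the GPU.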


Quote:
class 417: tested 265712378014859264 candidates in 12176232284160ms (93725704046247936/sec)
This doesn't look right.
Can you edit mfaktc.cu, line 615, and
replace
Code:
printf("class %4d: tested...
with
Code:
printf("class %4Lu: tested...
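One plausible reading of the garbled line, assuming a 32-bit build (msft runs 32-bit Ubuntu) in which all four printed values are 64-bit integers passed as pairs of 32-bit words: `%4d` consumes only the low word of the class number, so every following 64-bit conversion starts one word too early and combines the high word of the previous argument with the low word of its own. The sketch below simulates that on a little-endian stack. The argument types, the 2835 ms and the rate of 21822216/sec are inferred from the garbled output (each number is an exact multiple of 2^32), not taken from the mfaktc source.

```python
import struct


def misread(values):
    """Simulate reading 64-bit varargs with a format that starts with %d.

    Assumes a 32-bit little-endian ABI in which each 64-bit argument
    occupies two consecutive 32-bit stack words.
    """
    stack = b"".join(struct.pack("<Q", v) for v in values)
    first = struct.unpack_from("<i", stack, 0)[0]        # %4d reads 4 bytes
    rest = [
        struct.unpack_from("<Q", stack, 4 + 8 * i)[0]    # each reads 8 bytes
        for i in range(len(values) - 1)
    ]
    return [first] + rest


if __name__ == "__main__":
    # class, candidates, milliseconds, candidates per second (inferred)
    print(misread([417, 61865984, 2835, 21822216]))
```

The simulated values are 417, 265712378014859264, 12176232284160 and 93725704046247936, the same numbers as in the garbled line. As far as I know, glibc accepts `L` before an integer conversion as a synonym for `ll`, which would explain why `%4Lu` works; `%4llu` is the portable spelling.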
This is an example output on my Pentium D with the 8600GT:
Code:
./mfaktc.exe 66362159 1 64
mfaktc v0.01
...
Compiletime Options
  THREADS_PER_GRID    1048576
  THREADS_PER_BLOCK   256
  SIEVE_SIZE_LIMIT    32kiB
  SIEVE_SIZE          230945bits
  SIEVE_PRIMES        250000
  USE_PINNED_MEMORY   enabled
  USE_ASYNC_COPY      enabled
  VERBOSE_TIMING      disabled
  SELFTEST            disabled
  MORE_CLASSES        disabled
tf(66362159, 1, 64);
 k_min = 0
 k_max = 138985412407
sieve_init(): sieving factor candidates with small primes up to 3497867
class    0: tested 54525952 candidates in 9014ms (6049029/sec)
class    4: tested 54525952 candidates in 9014ms (6049029/sec)
...
class   49: tested 54525952 candidates in 9014ms (6049029/sec)
Result[00]: M66362159 has a factor: 6901664537
...
class   61: tested 54525952 candidates in 9014ms (6049029/sec)
Result[00]: M66362159 has a factor: 9157977943
...
class  301: tested 54525952 candidates in 9015ms (6048358/sec)
Result[00]: M66362159 has a factor: 124246422648815633
...
class  417: tested 54525952 candidates in 9014ms (6049029/sec)
found 3 factors for M66362159 with  1 to 64 bits
tf(): total time spent: 891193msec
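
The `k_max` value in this output follows from the fact that every factor q of a Mersenne number M_p = 2^p - 1 (p an odd prime) has the form q = 2kp + 1, so testing q up to 2^bits means testing k up to 2^bits / (2p). A minimal sketch that reproduces `k_max` and recovers k for the three factors reported above; the helper names are invented here, and how mfaktc rounds its bounds internally is not shown in this thread.

```python
def k_limit(p, bits):
    """floor(2**bits / (2*p)): upper bound on k for factors below 2**bits."""
    return 2**bits // (2 * p)


def k_of(p, q):
    """Return k if q == 2*k*p + 1 for an integer k, else None."""
    k, rem = divmod(q - 1, 2 * p)
    return k if rem == 0 else None


if __name__ == "__main__":
    p = 66362159
    print("k_max for 64 bits:", k_limit(p, 64))
    for q in (6901664537, 9157977943, 124246422648815633):
        print(q, "k =", k_of(p, q), "bits =", q.bit_length())
```

For p = 66362159 this gives k_max = 138985412407 for 64 bits, as in the log, and k = 52, 69 and 936124024 for the three factors.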

If you want to spend more time on this, please edit params.h and enable "SELFTEST" and "MORE_CLASSES" (remove the // from the defines). It should find one factor per Mersenne number (check results.txt after the run).

Last fiddled with by TheJudger on 2010-01-11 at 11:52
Old 2010-01-11, 12:30   #51
msft
Jul 2009
Tokyo

2×5×61 Posts

Hi,
Code:
Compiletime Options
  THREADS_PER_GRID    1048576
  THREADS_PER_BLOCK   256
  SIEVE_SIZE_LIMIT    32kiB
  SIEVE_SIZE          230945bits
  SIEVE_PRIMES        50000
  USE_PINNED_MEMORY   enabled
  USE_ASYNC_COPY      enabled
  VERBOSE_TIMING      disabled
  SELFTEST            disabled
  MORE_CLASSES        disabled
Quote:
Originally Posted by TheJudger
If you want to spend more time on this, please edit params.h and enable "SELFTEST" and "MORE_CLASSES" (remove the // from the defines). It should find one factor per Mersenne number (check results.txt after the run).
The log is in typescript.gz.
Code:
$ cat results.txt 
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
no factor for M66362159 from 2^64 to 2^65 bits
M50804297 has a factor: 180620316395899877719
M50725243 has a factor: 230316474510833959177
M49635893 has a factor: 280164061095680036711
M51332417 has a factor: 297892586972172587537
M51413951 has a factor: 317216341513975685569
M51265327 has a factor: 348552331323478392193
M50787953 has a factor: 408564895570348290031
M51161503 has a factor: 415469688496323219041
M51061601 has a factor: 427900063728254374393
M51082547 has a factor: 465935689349117544521
M51437311 has a factor: 503858403232211768047
M51486859 has a factor: 510284989447684180297
M51408359 has a factor: 522238472503709826367
M51532279 has a factor: 541792563550794873377
M50751637 has a factor: 550221472071174741833
M51302663 has a factor: 603656963178941666303
M51163433 has a factor: 684192107898332819377
M50896831 has a factor: 705640111241611518359
M51375383 has a factor: 713108825973682051703
M51133343 has a factor: 796838010410767671769
M51023447 has a factor: 931398820964215340641
M50863909 has a factor: 959145688648033584641
M50920721 has a factor: 1253793135671017237321
M48630643 has a factor: 1396673413347982098001
M51250613 has a factor: 1412902407482377985447
M51406301 has a factor: 1426645377855974696807
M50893061 has a factor: 1441854080374870808777
M50979079 has a factor: 1443184588520125697329
M51064417 has a factor: 1464103704184177492831
M51293899 has a factor: 1595148557829097879457
M51132959 has a factor: 1609354388906437820393
M51125413 has a factor: 1754609807377017622201
M50781589 has a factor: 1771605458538879435223
M51321659 has a factor: 1782972607557912437543
M49715873 has a factor: 2029034084175690064751
M49915309 has a factor: 2085962683046854861393
M51152869 has a factor: 2105744115640061414321
M50909147 has a factor: 2218183397480493562177
M51340871 has a factor: 2283988614248258513047
M47644171 has a factor: 2357049767161724465927
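
The reported factors can be checked independently of mfaktc: q divides 2^p - 1 exactly when 2^p mod q equals 1. A minimal sketch that parses lines in the format shown above and checks each one; the regular expression and the function name are assumptions made for this example.

```python
import re

# Matches lines such as "M50804297 has a factor: 180620316395899877719"
LINE = re.compile(r"^M(\d+) has a factor: (\d+)$")


def check_results(lines):
    """Yield (exponent, factor, bits, ok) for every 'has a factor' line."""
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # e.g. "no factor for ..." lines
        p, q = int(m.group(1)), int(m.group(2))
        ok = pow(2, p, q) == 1  # q divides 2**p - 1
        yield p, q, q.bit_length(), ok


if __name__ == "__main__":
    sample = [
        "no factor for M66362159 from 2^64 to 2^65 bits",
        "M50804297 has a factor: 180620316395899877719",
        "M47644171 has a factor: 2357049767161724465927",
    ]
    for p, q, bits, ok in check_results(sample):
        print(f"M{p}: {q} ({bits} bits) {'OK' if ok else 'NOT a factor'}")
```

To check a whole file, pass an open file object instead of the sample list, for example `check_results(open("results.txt"))`.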
Attached Files
File Type: gz typescript.gz (3.2 KB, 260 views)
Old 2010-01-11, 12:55   #52
TheJudger
"Oliver"
Mar 2005
Germany

2·3·5·37 Posts

Thank you!

This was with the modified printf in mfaktc.cu line 615, right?
results.txt and the screen output (typescript.gz) are as expected. :)

I just noticed another bug. Look at results.txt:
Code:
no factor for M66362159 from 2^64 to 2^65 bits
"2^64 to 2^65 bits" is way too much; the message should say either "2^64 to 2^65" or "64 to 65 bits". ;)
Old 2010-01-11, 13:14   #53
msft
Jul 2009
Tokyo

2·5·61 Posts

Quote:
Originally Posted by TheJudger
This was with the modified printf in mfaktc.cu line 615, right?
Right.
Old 2010-01-11, 18:20   #54
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)

2·3·7·137 Posts

Quote:
Originally Posted by TheJudger
Hi Henry,

Try this one. I won't be surprised if it doesn't work (library versions, ...).
I mistyped the model name of the 8600: I have an 8600GT here, not an 8600GTS. The GTS is faster.

Oliver
Thanks Oliver,

The binary works fine. However, running it makes my PC respond so slowly that it becomes almost unusable. Do you have any suggestions to cure this? Would increasing the sieve bound, so that it becomes CPU-bound, help? The PC seems to respond each time the program moves on to the next class.
Here is a benchmark which is the same as the first one in #49.
Code:
time ./mfaktc.exe 66362159 64 65
mfaktc v0.01  Copyright (C) 2009, 2010  Oliver Weihe (o.weihe@t-online.de)
This program comes with ABSOLUTELY NO WARRANTY; for details see COPYING.
This is free software, and you are welcome to redistribute it
under certain conditions; see COPYING for details.
Compiletime Options
  THREADS_PER_GRID    1048576
  THREADS_PER_BLOCK   256
  SIEVE_SIZE_LIMIT    32kiB
  SIEVE_SIZE          230945bits
  SIEVE_PRIMES        50000
  USE_PINNED_MEMORY   enabled
  USE_ASYNC_COPY      enabled
  VERBOSE_TIMING      disabled
  SELFTEST            disabled
  MORE_CLASSES        disabled
tf(66362159, 64, 65);
 k_min = 138985412160
 k_max = 277970824814
sieve_init(): sieving factor candidates with small primes up to 611957
class    0: tested 61865984 candidates in 8318ms (7437603/sec)
class    4: tested 61865984 candidates in 8311ms (7443867/sec)
class    9: tested 61865984 candidates in 8311ms (7443867/sec)
class   12: tested 61865984 candidates in 8314ms (7441181/sec)
class   16: tested 61865984 candidates in 8307ms (7447452/sec)
class   21: tested 61865984 candidates in 8315ms (7440286/sec)
class   24: tested 61865984 candidates in 8303ms (7451039/sec)
class   25: tested 61865984 candidates in 8312ms (7442972/sec)
class   37: tested 61865984 candidates in 8311ms (7443867/sec)
class   40: tested 61865984 candidates in 8311ms (7443867/sec)
class   45: tested 61865984 candidates in 8311ms (7443867/sec)
class   49: tested 61865984 candidates in 8312ms (7442972/sec)
class   52: tested 61865984 candidates in 8313ms (7442076/sec)
class   60: tested 61865984 candidates in 8319ms (7436709/sec)
class   61: tested 61865984 candidates in 8316ms (7439392/sec)
class   69: tested 61865984 candidates in 8309ms (7445659/sec)
class   72: tested 61865984 candidates in 8304ms (7450142/sec)
class   76: tested 61865984 candidates in 8309ms (7445659/sec)
class   81: tested 61865984 candidates in 8316ms (7439392/sec)
class   84: tested 61865984 candidates in 8317ms (7438497/sec)
class   96: tested 61865984 candidates in 8314ms (7441181/sec)
class   97: tested 61865984 candidates in 8314ms (7441181/sec)
class  100: tested 61865984 candidates in 8314ms (7441181/sec)
class  105: tested 61865984 candidates in 8317ms (7438497/sec)
class  109: tested 61865984 candidates in 8311ms (7443867/sec)
class  112: tested 61865984 candidates in 8314ms (7441181/sec)
class  117: tested 61865984 candidates in 8318ms (7437603/sec)
class  121: tested 61865984 candidates in 8314ms (7441181/sec)
class  124: tested 61865984 candidates in 8303ms (7451039/sec)
class  129: tested 61865984 candidates in 8308ms (7446555/sec)
class  132: tested 61865984 candidates in 68309ms (905678/sec)
class  136: tested 61865984 candidates in 68353ms (905095/sec)
class  144: tested 61865984 candidates in 8321ms (7434921/sec)
class  145: tested 61865984 candidates in 8317ms (7438497/sec)
class  156: tested 61865984 candidates in 8309ms (7445659/sec)
class  157: tested 61865984 candidates in 8313ms (7442076/sec)
class  160: tested 61865984 candidates in 8315ms (7440286/sec)
class  165: tested 61865984 candidates in 8313ms (7442076/sec)
class  172: tested 61865984 candidates in 8310ms (7444763/sec)
class  177: tested 61865984 candidates in 8315ms (7440286/sec)
class  180: tested 61865984 candidates in 8313ms (7442076/sec)
class  181: tested 61865984 candidates in 8310ms (7444763/sec)
class  184: tested 61865984 candidates in 8316ms (7439392/sec)
class  189: tested 61865984 candidates in 8305ms (7449245/sec)
class  192: tested 61865984 candidates in 8308ms (7446555/sec)
class  196: tested 61865984 candidates in 8316ms (7439392/sec)
class  201: tested 61865984 candidates in 8314ms (7441181/sec)
class  205: tested 61865984 candidates in 8313ms (7442076/sec)
class  216: tested 61865984 candidates in 8308ms (7446555/sec)
class  217: tested 61865984 candidates in 8315ms (7440286/sec)
class  220: tested 61865984 candidates in 8313ms (7442076/sec)
class  229: tested 61865984 candidates in 8307ms (7447452/sec)
class  237: tested 61865984 candidates in 8315ms (7440286/sec)
class  240: tested 61865984 candidates in 8303ms (7451039/sec)
class  241: tested 61865984 candidates in 8311ms (7443867/sec)
class  244: tested 61865984 candidates in 8315ms (7440286/sec)
class  249: tested 61865984 candidates in 8317ms (7438497/sec)
class  252: tested 61865984 candidates in 8311ms (7443867/sec)
class  256: tested 61865984 candidates in 8313ms (7442076/sec)
class  261: tested 61865984 candidates in 8316ms (7439392/sec)
class  264: tested 61865984 candidates in 8307ms (7447452/sec)
class  265: tested 61865984 candidates in 8316ms (7439392/sec)
class  276: tested 61865984 candidates in 8303ms (7451039/sec)
class  277: tested 61865984 candidates in 8314ms (7441181/sec)
class  280: tested 61865984 candidates in 8311ms (7443867/sec)
class  285: tested 61865984 candidates in 8316ms (7439392/sec)
class  289: tested 61865984 candidates in 8317ms (7438497/sec)
class  292: tested 61865984 candidates in 8313ms (7442076/sec)
class  297: tested 61865984 candidates in 8314ms (7441181/sec)
class  300: tested 61865984 candidates in 8314ms (7441181/sec)
class  301: tested 61865984 candidates in 8317ms (7438497/sec)
class  304: tested 61865984 candidates in 8309ms (7445659/sec)
class  312: tested 61865984 candidates in 8317ms (7438497/sec)
class  321: tested 61865984 candidates in 8313ms (7442076/sec)
class  324: tested 61865984 candidates in 8316ms (7439392/sec)
class  325: tested 61865984 candidates in 8315ms (7440286/sec)
class  336: tested 61865984 candidates in 8313ms (7442076/sec)
class  340: tested 61865984 candidates in 8313ms (7442076/sec)
class  345: tested 61865984 candidates in 8316ms (7439392/sec)
class  349: tested 61865984 candidates in 8312ms (7442972/sec)
class  352: tested 61865984 candidates in 8318ms (7437603/sec)
class  357: tested 61865984 candidates in 8314ms (7441181/sec)
class  360: tested 61865984 candidates in 8313ms (7442076/sec)
class  361: tested 61865984 candidates in 8315ms (7440286/sec)
class  364: tested 61865984 candidates in 8317ms (7438497/sec)
class  369: tested 61865984 candidates in 8313ms (7442076/sec)
class  376: tested 61865984 candidates in 8315ms (7440286/sec)
class  381: tested 61865984 candidates in 8315ms (7440286/sec)
class  384: tested 61865984 candidates in 8314ms (7441181/sec)
class  385: tested 61865984 candidates in 8316ms (7439392/sec)
class  396: tested 61865984 candidates in 8313ms (7442076/sec)
class  397: tested 61865984 candidates in 8319ms (7436709/sec)
class  405: tested 61865984 candidates in 8317ms (7438497/sec)
class  409: tested 61865984 candidates in 8310ms (7444763/sec)
class  412: tested 61865984 candidates in 8312ms (7442972/sec)
class  417: tested 61865984 candidates in 8312ms (7442972/sec)
no factor for M66362159 from 2^64 to 2^65 bits
tf(): total time spent: 922393msec

real    15m22.494s
user    13m25.326s
sys    0m0.820s
Whenever I try to use my PC, the times suddenly jump to about 68 seconds per class.

I will now have a go at compiling it myself and see how I fare.
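
The class numbers in the log above (0, 4, 9, 12, ..., 417) are consistent with splitting k by its remainder modulo 420 = 4·3·5·7 and skipping every remainder for which q = 2kp + 1 can never be a factor: a factor of a Mersenne number must be congruent to 1 or 7 modulo 8, and a q divisible by 3, 5 or 7 is not prime. This is inferred from the output, not taken from the mfaktc source, and the function name is invented for this sketch.

```python
def surviving_classes(p, n_classes=420, small_primes=(3, 5, 7)):
    """Remainders k mod n_classes for which 2*k*p + 1 can still be a factor."""
    classes = []
    for k in range(n_classes):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue  # factors of M_p are 1 or 7 modulo 8
        if any(q % s == 0 for s in small_primes):
            continue  # divisible by 3, 5 or 7, so never prime
        classes.append(k)
    return classes


if __name__ == "__main__":
    classes = surviving_classes(66362159)
    print(len(classes), "classes:", classes[:10], "...", classes[-1])
```

For p = 66362159 this leaves 96 of the 420 remainders, starting 0, 4, 9, 12, 16, 21, 24, 25, 37, 40 and ending with 417, which matches the "class" lines in the log.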
Old 2010-01-11, 18:48   #55
henryzz
Just call me Henry
"David"
Sep 2007
Cambridge (GMT/BST)

2·3·7·137 Posts

I just compiled successfully after changing the CUDA directory in the script.
The binary built with the old version of the script runs at two-thirds the speed of the one built with the hack, which performs the same as your compilation. I will now try different sieve bounds.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.