mersenneforum.org > Prime Search Projects > Twin Prime Search
2010-09-04, 21:37   #1
Ken_g6
Jan 2005
Caught in a sieve
5·79 Posts
TPSieve CUDA Testing Thread

You asked for it, and I've finally made it. Download TPSieve-CUDA here.

I haven't done extensive testing on twin primes, so probably somebody should go over a short range (100G? Perhaps 1T?) with TPSieve-CUDA to make sure it gets the same factors.
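One way to run that comparison is sketched below in Python. It assumes each factor is printed on its own line in the `p | k*2^n+1` (or `-1`) form that appears in the outputs later in this thread; `parse_factors`, `compare_factor_files` and any file names are hypothetical, not part of TPSieve. Both outputs are loaded as sets, so ordering and progress lines don't matter.

```python
import re

# Matches factor lines such as "710005185071411 | 5012115*2^481782+1".
FACTOR_RE = re.compile(r"^\s*(\d+)\s*\|\s*(\d+)\*2\^(\d+)([+-])1\s*$")


def parse_factors(lines):
    """Return the set of (p, k, n, sign) tuples found in the given lines.

    Non-factor lines (banners, "p=..., M p/sec" progress reports) are skipped.
    """
    found = set()
    for line in lines:
        m = FACTOR_RE.match(line)
        if m:
            p, k, n, sign = m.groups()
            found.add((int(p), int(k), int(n), sign))
    return found


def compare_factor_files(path_a, path_b):
    """Hypothetical helper: list the factors that appear in only one output."""
    with open(path_a) as fa, open(path_b) as fb:
        a, b = parse_factors(fa), parse_factors(fb)
    return sorted(a - b), sorted(b - a)
```

Usage: `only_gpu, only_cpu = compare_factor_files("gpu.txt", "cpu.txt")`; if the two runs agree over the same range, both lists come back empty.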

I hope it works well for everyone!

2010-09-04, 22:18   #2
Karl M Johnson
Mar 2010
3×137 Posts

Could you please give an example to run? I see it needs cudart linked in; that file comes with the CUDA SDK. Was it compiled with CUDA toolkit 3.1?

2010-09-04, 22:25   #3
Ken_g6
Jan 2005
Caught in a sieve
5×79 Posts

OK. Supposing you downloaded the 480000-484999_30aug2010.txt sieve file, if you run:

./tpsieve-cuda-x86_64-linux -i 480000-484999_30aug2010.txt -p 710005180000000 -P 710005200000000

It should output:
710005185071411 | 5012115*2^481782+1
710005192340203 | 4018161*2^483419-1

very quickly. (I tested this on the emulator, so it runs really slow for me!) Expand the range, and you should get more of Mdettweiler's results.
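Any reported line can be spot-checked independently of the GPU code. A minimal Python sketch (the function name `divides` is ours, not part of TPSieve): p divides k*2^n+1 exactly when k*2^n mod p equals p-1, and divides k*2^n-1 exactly when it equals 1. The built-in `pow(2, n, p)` computes 2^n mod p without ever building the roughly 145,000-digit number.

```python
def divides(p, k, n, sign):
    """True if p divides k*2^n + 1 (sign '+') or k*2^n - 1 (sign '-').

    pow(2, n, p) is modular exponentiation, so k*2^n is never computed in full.
    """
    r = (k * pow(2, n, p)) % p  # k*2^n mod p
    if sign == "+":
        return (r + 1) % p == 0
    return (r - 1) % p == 0
```

For example, `divides(710005185071411, 5012115, 481782, "+")` should return True if the first line above is a genuine factor.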

./tpsieve-cuda-x86_64-linux -i 480000-484999_30aug2010.txt -p 710T -P 715T

would produce all of them, for instance.
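The T suffix stands for 10^12: the log in post #5 echoes `-p 710T -P 715T` as `710000000000000 <= p < 715000000000000`, so this range is 5*10^12 wide. The Python helper below is hypothetical; treating K, M and G as 10^3, 10^6 and 10^9 is an assumption that they follow the same decimal pattern.

```python
# T = 10**12 is confirmed by the logs in this thread; K, M and G are assumed.
SUFFIXES = {"K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def expand_bound(text):
    """Hypothetical helper: turn '710T' into 710000000000000."""
    text = text.strip().upper()
    if text and text[-1] in SUFFIXES:
        return int(text[:-1]) * SUFFIXES[text[-1]]
    return int(text)  # plain integers, as in the first example command
```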

Edit: Compiled with the 2.3 toolkit. One place to get the appropriate libcudart.so would be here or here.

Last fiddled with by Ken_g6 on 2010-09-04 at 22:48

2010-09-05, 06:50   #4
Karl M Johnson
Mar 2010
110011011₂ Posts


Where do I get that fancy file?
I don't need that libcudart.so file; I'm on Windows.

2010-09-05, 08:47   #5
Karl M Johnson
Mar 2010
3·137 Posts

Ah, I got it working.
Found that fancy file in this thread.
Here's the output:
Code:
tpsieve-cuda>tpsieve-cuda-x86-windows.exe -i 480000-484999_30aug2010.txt -p 710T -P 715T
tpsieve version cuda-0.1.5b (testing)
Found K's from 3 to 9999999.
Found N's from 480000 to 484999.
nstart=480000, nstep=27, gpu_nstep=27
Read 18013513 terms from NewPGen format input file `480000-484999_30aug2010.txt'
ppsieve initialized: 3 <= k <= 9999999, 480000 <= n <= 484999
Sieve started: 710000000000000 <= p < 715000000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 285
Detected compute capability: 1.3
Detected 30 multiprocessors.
710001064441429 | 1473435*2^480477+1
710001781836203 | 3090555*2^482969+1
710002017069043 | 1947711*2^484889-1
710002639870109 | 7153191*2^483771+1
710003699276149 | 5489211*2^481645-1
710004831474721 | 9156609*2^482469+1
710005185071411 | 5012115*2^481782+1
710005192340203 | 4018161*2^483419-1
710005390472317 | 3240861*2^484861+1
710005916032213 | 5469669*2^482131+1
710006212449883 | 9438471*2^480253+1
710006478541837 | 942801*2^484681-1
p=710007273971713, 121.2M p/sec, 0.34 CPU cores, 0.1% done. ETA 05 Sep 23:11
710007380861971 | 3067731*2^482247-1
710007392845019 | 7483995*2^483443-1
710007480582299 | 1724049*2^480073-1
710008202353481 | 5813421*2^481371-1
710008811001043 | 9322383*2^480292-1
710008912579171 | 6024705*2^482149-1
710009562402587 | 5037609*2^482129-1
710010162887723 | 6614673*2^481762+1
710010987465557 | 6749691*2^483663+1
710011016356171 | 1349535*2^480408-1
710011368918931 | 7281273*2^482722-1
710011521417881 | 8617299*2^483945+1
710013019046899 | 3562503*2^481238-1
710013536554247 | 2683773*2^482840-1
p=710013762297857, 108.1M p/sec, 0.46 CPU cores, 0.3% done. ETA 05 Sep 23:50
710013880633081 | 4357815*2^480333+1
710013961546411 | 6488649*2^484015+1
710014319798129 | 1676877*2^480670-1
710014611723727 | 3195591*2^483289+1
710015165703751 | 1844445*2^483863+1
710016591664817 | 2155857*2^482100+1
710017445315627 | 9930375*2^480732-1
710017473222427 | 8642289*2^480555+1
710018153777579 | 5008965*2^484938-1
710018465445529 | 9185721*2^480167+1
p=710019807338497, 100.7M p/sec, 0.50 CPU cores, 0.4% done. ETA 06 Sep 00:21
710020260919457 | 1584663*2^483746-1

Now, would compiling x64 Windows binaries cause any trouble?
I've had an "out of memory" error, even though I had about 1GB out of 4 free.

P.S.
It's not using the GPU completely.
Peak GPU usage is reported as 40%.
But I guess you already know that?

Last fiddled with by Karl M Johnson on 2010-09-05 at 08:52
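The status lines above can be turned into a progress figure and a rough time-remaining estimate. This Python sketch assumes "p/sec" means how far the p counter advances per second and that the rate stays constant for the rest of the range; `progress` is a hypothetical helper and is not claimed to reproduce tpsieve's own ETA calculation.

```python
import re

STATUS_RE = re.compile(r"p=(\d+), ([\d.]+)M p/sec")


def progress(status_line, p_lo, p_hi):
    """Return (percent_done, seconds_left) for a tpsieve status line.

    Assumes the run covers p_lo <= p < p_hi and the reported rate holds.
    """
    m = STATUS_RE.search(status_line)
    if m is None:
        raise ValueError("not a status line")
    p = int(m.group(1))
    rate = float(m.group(2)) * 1e6  # range of p covered per second
    percent = 100.0 * (p - p_lo) / (p_hi - p_lo)
    seconds_left = (p_hi - p) / rate
    return percent, seconds_left
```

For the first status line above (range 710T to 715T), this gives about 0.15% done and roughly 41,200 seconds (about 11.4 hours) left at 121.2M p/sec.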

2010-09-05, 09:16   #6
amphoria
"Dave"
Sep 2005
UK
23·347 Posts

I tried it on a GTX465 under 64-bit Linux, using a range I had already tested so that I could compare the results. However, I didn't get very far before hitting an error.

Quote:
./tpsieve-cuda-x86_64-linux -i 480000-484999_19jun2010.txt -p 510T -P 515T
tpsieve version cuda-0.1.5b (testing)
Compiled Sep 4 2010 with GCC 4.3.3
Found K's from 3 to 9999999.
Found N's from 480000 to 484999.
nstart=480000, nstep=26, gpu_nstep=26
Read 18977477 terms from NewPGen format input file `480000-484999_19jun2010.txt'
ppsieve initialized: 3 <= k <= 9999999, 480000 <= n <= 484999
Sieve started: 510000000000000 <= p < 515000000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 465
Detected compute capability: 2.0
Detected 11 multiprocessors.
510000064759291 | 604839*2^481707-1
510000994356869 | 2198475*2^482446+1
510001808585051 | 6049827*2^482948+1
510001965458981 | 9867039*2^480087-1
510002179900517 | 3334131*2^481253+1
510002930897567 | 8814495*2^481041+1
510003018137897 | 7665489*2^480401-1
510003129240001 | 4959981*2^480291+1
510003356427241 | 2391561*2^483615-1
510003644411923 | 7580307*2^484486-1
510003728553343 | 8313309*2^482255-1
510003886955161 | 3607413*2^482256-1
510004210312339 | 5073345*2^483515-1
Cuda error: cudaStreamCreate: out of memory

2010-09-05, 09:27   #7
Karl M Johnson
Mar 2010
3×137 Posts

amphoria, that's exactly the same error I had on x86 Windows.
Here it pops up again:
Code:
tpsieve-cuda-x86-windows.exe -i 480000-484999_30aug2010.txt -p 900T -P 901T
tpsieve version cuda-0.1.5b (testing)
Found K's from 3 to 9999999.
Found N's from 480000 to 484999.
nstart=480000, nstep=27, gpu_nstep=27
Read 18013513 terms from NewPGen format input file `480000-484999_30aug2010.txt'
ppsieve initialized: 3 <= k <= 9999999, 480000 <= n <= 484999
Sieve started: 900000000000000 <= p < 901000000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 285
Detected compute capability: 1.3
Detected 30 multiprocessors.
900000899028509 | 3182751*2^483513-1
900001860603749 | 9998469*2^481563+1
900001934059139 | 1853133*2^482022-1
900002540407273 | 8064075*2^482811+1
900002726446853 | 5749455*2^480565-1
900003355059173 | 3695019*2^484373-1
900003556591063 | 9754467*2^480376-1
900003917464219 | 7522179*2^481393-1
900004723972547 | 5306133*2^484306+1
900005287423111 | 6879159*2^482887-1
900007745466833 | 451935*2^481504+1
900009608245457 | 5425383*2^480786+1
p=900010489954305, 87.42M p/sec, 0.50 CPU cores, 1.0% done. ETA 05 Sep 15:17
900010638873601 | 378417*2^481830-1
900011291258897 | 6507645*2^482813+1
900011626245037 | 1340685*2^481238+1
900012104179271 | 645705*2^484085-1
900016125631741 | 2968161*2^483961+1
900016501038581 | 8124711*2^484951+1
900016817068751 | 75363*2^484216+1
900017662186813 | 525711*2^480789+1
900018252281867 | 6892521*2^484727-1
900020059012663 | 8598285*2^481068+1
900021150322181 | 4615461*2^482939+1
900021336561331 | 8746389*2^484435+1
900021361527311 | 3408945*2^482966-1
p=900021998075905, 95.90M p/sec, 0.54 CPU cores, 2.2% done. ETA 05 Sep 15:08
900022619382521 | 6958245*2^482800-1
900022833913493 | 6580995*2^483721+1
900022917366103 | 4560555*2^482723-1
900023322448907 | 1472211*2^480431-1
900024288211007 | 4808679*2^480371+1
900026935242913 | 3056079*2^482407-1
900028117404131 | 5600343*2^481214-1
900029413721059 | 815793*2^483818-1
900029829750299 | 4354917*2^483802-1
900030812047093 | 5639913*2^483592+1
p=900033017561089, 91.82M p/sec, 0.53 CPU cores, 3.3% done. ETA 05 Sep 15:08
900033789053611 | 7495449*2^481653-1
900034220560883 | 9094419*2^483205-1
900034570657763 | 9890505*2^480175-1
900035606160989 | 5867385*2^481157-1
900037077781057 | 8390829*2^481741+1
900037229605601 | 1863285*2^484553-1
900038990324497 | 3815157*2^482054+1
900040739108881 | 3513243*2^482350+1
900041542191221 | 6049533*2^482774-1
900042730035877 | 9304977*2^481916+1
900043309201403 | 136581*2^482397+1
p=900044056969217, 91.99M p/sec, 0.55 CPU cores, 4.4% done. ETA 05 Sep 15:08
900044321638183 | 8388129*2^484645-1
900044489973593 | 8240649*2^483659+1
900044550938063 | 7226823*2^484696-1
900045358508729 | 7763775*2^483076-1
900047216136989 | 2338305*2^482753-1
900047780897267 | 4008369*2^483695+1
900048470300299 | 2963115*2^481453-1
900048762025013 | 383355*2^480270-1
900049228276043 | 8622855*2^483971-1
900049467999349 | 660627*2^481816-1
900049796295679 | 2937537*2^483980-1
900052042582919 | 385575*2^484714+1
900052572323899 | 7711221*2^484603+1
900053267475361 | 7173609*2^483949+1
900053714040401 | 633879*2^480079+1
900053996550817 | 6894867*2^480856-1
p=900055633248257, 96.46M p/sec, 0.54 CPU cores, 5.6% done. ETA 05 Sep 15:06
900055866972487 | 6849789*2^483481-1
900056741014807 | 2245995*2^482732-1
900056770768759 | 814365*2^482000-1
900057523274303 | 642045*2^480196+1
900057941699027 | 3370071*2^480999+1
900058480102739 | 9883737*2^484374+1
900060511991023 | 8680035*2^484611-1
900060730024969 | 7366341*2^482195+1
900060738679177 | 1099155*2^483395-1
900063136569923 | 6597225*2^483763-1
900063669798383 | 5873829*2^481137-1
900064551341591 | 9219153*2^483872+1
900064734779653 | 7558803*2^483916-1
900065290605601 | 7338225*2^482126-1
900065587257671 | 7356405*2^481242-1
900065728724587 | 8091525*2^484942-1
p=900067529342977, 99.13M p/sec, 0.51 CPU cores, 6.8% done. ETA 05 Sep 15:04
900067553916287 | 8588259*2^483407+1
900068224207921 | 5907333*2^480414-1
900068309721587 | 4858185*2^483053+1
900069742623089 | 7249299*2^483067-1
900071614223911 | 974289*2^484133-1
900072154118867 | 3615069*2^480585-1
900072931824211 | 6749313*2^480668-1
900073013900513 | 1479111*2^482079-1
900073241850151 | 3667035*2^484867-1
900075811775299 | 5091681*2^482559+1
900076383783517 | 6995187*2^481406-1
p=900079152807937, 96.86M p/sec, 0.52 CPU cores, 7.9% done. ETA 05 Sep 15:03
900079180930459 | 8088465*2^482743+1
900080177837117 | 9137745*2^481706+1
900081068828399 | 116547*2^480962+1
900082664855509 | 4331577*2^481696-1
900084606014311 | 7923375*2^480228+1
900084897625079 | 7498953*2^482154+1
900085127281819 | 3059145*2^480229-1
900086877470243 | 2313279*2^483107+1
900087308304337 | 5166585*2^482543-1
p=900090529857537, 94.80M p/sec, 0.52 CPU cores, 9.1% done. ETA 05 Sep 15:03
900090831293629 | 9965295*2^481322-1
900091902233021 | 9990753*2^481446-1
900095462990077 | 9537003*2^480320+1
900096420949717 | 8525847*2^481988-1
900096832377143 | 1048245*2^483598+1
900096929852677 | 2153943*2^481358-1
900098509267721 | 7751367*2^481256+1
900099157340237 | 9244893*2^480360-1
900099669905143 | 9687819*2^484633+1
900101450465951 | 1940013*2^484300+1
p=900101851332609, 94.34M p/sec, 0.52 CPU cores, 10.2% done. ETA 05 Sep 15:03
900102028546621 | 2525739*2^482461+1
900102230642357 | 9699093*2^482344+1
900102319841591 | 8400777*2^481706-1
900102426091157 | 3881955*2^483157-1
900102488675867 | 337989*2^481711+1
900102580103633 | 9216783*2^482100+1
900102741563621 | 2272611*2^480277+1
900103553433571 | 7722345*2^483866-1
900104117029049 | 505821*2^480631-1
900105270926371 | 8850651*2^483739-1
900105302568581 | 7921695*2^482577+1
900106241542903 | 5146383*2^482750-1
900107926468921 | 5710305*2^481576-1
900110050114909 | 8376111*2^480199+1
900110665560263 | 7689909*2^483029-1
p=900112516399105, 88.77M p/sec, 0.52 CPU cores, 11.3% done. ETA 05 Sep 15:04
900113522958017 | 4037511*2^483349+1
900113818670537 | 2881989*2^483381+1
900114293440121 | 9168045*2^484941-1
900114895651987 | 1452225*2^484402+1
900116209588091 | 9696105*2^483814-1
900119156145683 | 1042815*2^481830+1
900120506278387 | 7095909*2^480787+1
900120924004501 | 8100075*2^480374-1
900121363886917 | 7647465*2^481616-1
900122553451477 | 5630019*2^482963+1
900122578082093 | 4954053*2^480218-1
900122941358243 | 6183075*2^482076+1
p=900123264303105, 89.57M p/sec, 0.55 CPU cores, 12.3% done. ETA 05 Sep 15:05
900124366202023 | 5674563*2^484186-1
900126758233421 | 757953*2^481946+1
900127178299367 | 2843865*2^484173-1
900128009292293 | 5166717*2^484434-1
900128886707417 | 4666179*2^484269+1
900129038645509 | 4965093*2^481026-1
900129732119477 | 8266305*2^480961-1
900131300857937 | 9063171*2^481141-1
900131738429219 | 7244415*2^480666+1
900131757667309 | 9087471*2^481013-1
900132376847051 | 8236677*2^480080+1
900133053419231 | 4099683*2^480928-1
p=900134169493505, 90.88M p/sec, 0.57 CPU cores, 13.4% done. ETA 05 Sep 15:05
900136707397699 | 4107933*2^480032+1
900138484885813 | 9160035*2^484905-1
900139177590013 | 8492325*2^481494+1
900141735781361 | 2119167*2^480722+1
900141821615489 | 5879169*2^480907+1
900143923881347 | 8023179*2^481613+1
900144031193809 | 1315365*2^482686+1
p=900144518938625, 86.23M p/sec, 0.60 CPU cores, 14.5% done. ETA 05 Sep 15:06
900146872538153 | 6797847*2^483152+1
900146934657221 | 31053*2^482810-1
900148138109243 | 6732345*2^484116-1
900149419577411 | 7471065*2^483931+1
900150206766011 | 4495635*2^480957-1
900152425932013 | 2111517*2^480778+1
900152581520117 | 8415135*2^481699-1
900153500000561 | 4769493*2^483890-1
900153813027347 | 4079283*2^481164+1
p=900154149060609, 80.25M p/sec, 0.58 CPU cores, 15.4% done. ETA 05 Sep 15:08
900155149794317 | 9979257*2^481892+1
900155755904123 | 2521533*2^483994+1
900158445390817 | 3743577*2^480768-1
900158816255227 | 6612759*2^481673-1
900159262356971 | 7420557*2^482234-1
900159600424079 | 7586655*2^481304+1
900159986271683 | 943605*2^481215+1
Cuda error: cudaStreamCreate: out of memory

tpsieve-cuda>pause
Press any key to continue . . .

Last fiddled with by Karl M Johnson on 2010-09-05 at 09:37 Reason: Yes

2010-09-05, 15:45   #8
Ken_g6
Jan 2005
Caught in a sieve
18B₁₆ Posts

I get the feeling I have a severe memory leak on the GPU that I didn't know I had. Someone helped me with the stream synchronization code, and it worked, but I'm starting to suspect that each event and stream that is created also has to be destroyed. I'll fix it in the next release.
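That diagnosis fits the way both crashes above appeared only after many factors had been printed: if every batch creates a stream (or event) and never destroys it, handles pile up until creation fails. The Python toy below only models that pattern; `ToyDevice`, its capacity and `run_batches` are invented for illustration and are not CUDA calls. In real CUDA host code the matching cleanup calls are `cudaStreamDestroy` and `cudaEventDestroy`.

```python
class ToyDevice:
    """Toy stand-in for GPU memory: a fixed budget of stream handles.

    NOT the CUDA API; it only models why create-without-destroy fails late.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.live = 0  # handles created but not yet destroyed

    def stream_create(self):
        if self.live >= self.capacity:
            raise MemoryError("cudaStreamCreate: out of memory")
        self.live += 1
        return object()  # opaque handle

    def stream_destroy(self, stream):
        self.live -= 1  # handle released; the object itself is not inspected


def run_batches(device, batches, destroy):
    """Create one stream per batch; free it afterwards only if destroy=True."""
    for _ in range(batches):
        stream = device.stream_create()
        try:
            pass  # ... launch work on `stream` and wait for it here ...
        finally:
            if destroy:
                device.stream_destroy(stream)
```

With `destroy=False` the loop runs fine for `capacity` batches and then fails; with `destroy=True` the live count returns to zero after every batch, however long the run.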

2010-09-06, 17:20   #9
Ken_g6
Jan 2005
Caught in a sieve
110001011₂ Posts

v0.1.6, of both PPSieve and TPSieve, is released. Many changes and fixes are included.

- Faster on the GPU than 0.1.5b (though about the same as 0.1.5c)
- Uses less CPU
- A huge memory leak on the GPU should be fixed.
- Input files are more often read correctly.
- Many other bugfixes and tweaks.

Get it at the usual URL, in the first post.

Edit: P.S. I've forgotten to post the source location!

Last fiddled with by Ken_g6 on 2010-09-06 at 17:24

2010-09-07, 17:24   #10
amphoria
"Dave"
Sep 2005
UK
23×347 Posts

Quote:
Originally Posted by Ken_g6
v0.1.6, of both PPSieve and TPSieve, is released. Many changes and fixes are included.
I have completed sieving 510-515T and the factors match those I previously found. I got 138M p/sec on a GTX465 using 0.41 CPU on a single core of a Core i7@3.6GHz. As the single core was not maxed out I decided to try running 2 instances on a single core (the other 3 cores were running instances of LLR). With 2 instances I got a combined throughput of 210M p/sec with 0.68 CPU used. This would suggest that the GTX465 wasn't maxed out either with a single instance.
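The gain from the second instance is easy to put into numbers. A small Python sketch using only the figures in the post above; `hours_for_range` is a hypothetical helper that assumes the rate stays constant over the whole range.

```python
def hours_for_range(width, rate):
    """Hours to sieve `width` values of p at `rate` p/sec (rate held constant)."""
    return width / rate / 3600


single = 138e6  # one instance on the GTX465, p/sec
double = 210e6  # combined rate of two instances, p/sec
width = 5e12    # the 510T-515T range
speedup = double / single
```

That works out to a speedup of about 1.52x, or roughly 10.1 hours versus 6.6 hours for the whole 5T range.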

2010-09-07, 18:28   #11
Ken_g6
Jan 2005
Caught in a sieve
5·79 Posts

Quote:
Originally Posted by amphoria
I have completed sieving 510-515T and the factors match those I previously found.
Good!

Quote:
Originally Posted by amphoria
With 2 instances I got a combined throughput of 210M p/sec with 0.68 CPU used. This would suggest that the GTX465 wasn't maxed out either with a single instance.
Interesting! Try fiddling with the -m option (probably going up from 8 in increments of 1), and see if you can make a single instance do any better.
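A Python sketch for that experiment, assuming the binary and sieve file names from post #3 and using only the -i, -p, -P and -m options that appear in this thread (`time_run` and `sweep` are hypothetical helpers). It runs the same short range for each -m value and reports both the wall-clock rate, which also includes reading the input file, and the last p/sec figure tpsieve printed, if any.

```python
import re
import subprocess
import time

# Names taken from earlier posts in this thread; adjust to your setup.
BINARY = "./tpsieve-cuda-x86_64-linux"
SIEVE_FILE = "480000-484999_30aug2010.txt"

RATE_RE = re.compile(r"([\d.]+)M p/sec")


def last_reported_rate(output):
    """Return the last 'NNN.NM p/sec' figure in tpsieve output, in p/sec, or None."""
    rates = RATE_RE.findall(output)
    return float(rates[-1]) * 1e6 if rates else None


def time_run(m, p_lo, p_hi):
    """Sieve p_lo <= p < p_hi with '-m m'; return (wall-clock p/sec, reported p/sec)."""
    cmd = [BINARY, "-i", SIEVE_FILE, "-p", str(p_lo), "-P", str(p_hi), "-m", str(m)]
    start = time.perf_counter()
    result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True, check=True)
    elapsed = time.perf_counter() - start
    return (p_hi - p_lo) / elapsed, last_reported_rate(result.stdout)


def sweep(m_values, p_lo=710000000000000, p_hi=710020000000000):
    """Try each -m value on the same 20G range and print both rates."""
    for m in m_values:
        wall, reported = time_run(m, p_lo, p_hi)
        print(f"-m {m}: {wall / 1e6:.1f}M p/sec wall-clock, last reported {reported}")
```

Usage: `sweep(range(8, 13))`. At the rates reported in this thread, each 20G run should take a few minutes.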

Last fiddled with by Ken_g6 on 2010-09-07 at 18:30 Reason: Wrong start for -m

All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.