#1 |
Jan 2005
Caught in a sieve
5×79 Posts |
You asked for it, and I've finally made it. Download TPSieve-CUDA here.
I haven't done extensive testing on twin primes, so probably somebody should go over a short range (100G? Perhaps 1T?) with TPSieve-CUDA to make sure it gets the same factors. I hope it works well for everyone!
#2 |
Mar 2010
110011011₂ Posts |
Could you please give an example command to run? I see it needed cudart static linking; that file comes with the CUDA SDK. Was it compiled with CUDA toolkit 3.1?
#3 |
Jan 2005
Caught in a sieve
5·79 Posts |
OK. Supposing you downloaded the 480000-484999_30aug2010.txt sieve file, if you run:
./tpsieve-cuda-x86_64-linux -i 480000-484999_30aug2010.txt -p 710005180000000 -P 710005200000000
It should output:
710005185071411 | 5012115*2^481782+1
710005192340203 | 4018161*2^483419-1
very quickly. (I tested this on the emulator, so it runs really slow for me!) Expand the range, and you should get more of Mdettweiler's results.
./tpsieve-cuda-x86_64-linux -i 480000-484999_30aug2010.txt -p 710T -P 715T
would produce all of them, for instance.
Edit: Compiled with the 2.3 toolkit. One place to get the appropriate libcudart.so would be here or here.
Last fiddled with by Ken_g6 on 2010-09-04 at 22:48
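For anyone who wants to double-check reported factors independently of the sieve, here is a minimal sketch in Python. It assumes each factor line has the form `p | k*2^n+1` or `p | k*2^n-1`, as in the output above; the function names (`parse_factor_line`, `divides`, `check_line`) are made up for this illustration and are not part of tpsieve.

```python
import re

# Matches lines such as "710005185071411 | 5012115*2^481782+1".
FACTOR_LINE = re.compile(r"^\s*(\d+)\s*\|\s*(\d+)\*2\^(\d+)([+-])1\s*$")


def parse_factor_line(line):
    """Return (p, k, n, c) with c = +1 or -1, or None if this is not a factor line."""
    m = FACTOR_LINE.match(line)
    if m is None:
        return None
    p, k, n = int(m.group(1)), int(m.group(2)), int(m.group(3))
    c = 1 if m.group(4) == "+" else -1
    return p, k, n, c


def divides(p, k, n, c):
    """True if p divides k*2^n + c.

    pow(2, n, p) does modular exponentiation, so 2^n is never built in full.
    """
    return (k * pow(2, n, p) + c) % p == 0


def check_line(line):
    """True only for a well-formed factor line whose p really divides the candidate."""
    parsed = parse_factor_line(line)
    return parsed is not None and divides(*parsed)
```

Calling `check_line("710005185071411 | 5012115*2^481782+1")` on each line of a run would flag any line where the claimed divisor does not divide the candidate; status and banner lines simply return `False` because they do not parse.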
#4 |
Mar 2010
3×137 Posts |
Where do I get that fancy file? I don't need that libcudart.so file; I'm on Windows.
#5 |
Mar 2010
3×137 Posts |
Ah, I got it working.
Found that fancy file from this thread.
Here's the output:
Code:
tpsieve-cuda>tpsieve-cuda-x86-windows.exe -i 480000-484999_30aug2010.txt -p 710T -P 715T
tpsieve version cuda-0.1.5b (testing)
Found K's from 3 to 9999999.
Found N's from 480000 to 484999.
nstart=480000, nstep=27, gpu_nstep=27
Read 18013513 terms from NewPGen format input file `480000-484999_30aug2010.txt'
ppsieve initialized: 3 <= k <= 9999999, 480000 <= n <= 484999
Sieve started: 710000000000000 <= p < 715000000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 285
Detected compute capability: 1.3
Detected 30 multiprocessors.
710001064441429 | 1473435*2^480477+1
710001781836203 | 3090555*2^482969+1
710002017069043 | 1947711*2^484889-1
710002639870109 | 7153191*2^483771+1
710003699276149 | 5489211*2^481645-1
710004831474721 | 9156609*2^482469+1
710005185071411 | 5012115*2^481782+1
710005192340203 | 4018161*2^483419-1
710005390472317 | 3240861*2^484861+1
710005916032213 | 5469669*2^482131+1
710006212449883 | 9438471*2^480253+1
710006478541837 | 942801*2^484681-1
p=710007273971713, 121.2M p/sec, 0.34 CPU cores, 0.1% done. ETA 05 Sep 23:11
710007380861971 | 3067731*2^482247-1
710007392845019 | 7483995*2^483443-1
710007480582299 | 1724049*2^480073-1
710008202353481 | 5813421*2^481371-1
710008811001043 | 9322383*2^480292-1
710008912579171 | 6024705*2^482149-1
710009562402587 | 5037609*2^482129-1
710010162887723 | 6614673*2^481762+1
710010987465557 | 6749691*2^483663+1
710011016356171 | 1349535*2^480408-1
710011368918931 | 7281273*2^482722-1
710011521417881 | 8617299*2^483945+1
710013019046899 | 3562503*2^481238-1
710013536554247 | 2683773*2^482840-1
p=710013762297857, 108.1M p/sec, 0.46 CPU cores, 0.3% done. ETA 05 Sep 23:50
710013880633081 | 4357815*2^480333+1
710013961546411 | 6488649*2^484015+1
710014319798129 | 1676877*2^480670-1
710014611723727 | 3195591*2^483289+1
710015165703751 | 1844445*2^483863+1
710016591664817 | 2155857*2^482100+1
710017445315627 | 9930375*2^480732-1
710017473222427 | 8642289*2^480555+1
710018153777579 | 5008965*2^484938-1
710018465445529 | 9185721*2^480167+1
p=710019807338497, 100.7M p/sec, 0.50 CPU cores, 0.4% done. ETA 06 Sep 00:21
710020260919457 | 1584663*2^483746-1
I've had an "out of memory" error, even though I had about 1 GB out of 4 free.
P.S. It's not using the GPU completely. Peak GPU usage is reported as 40%. But I guess you already know that?
Last fiddled with by Karl M Johnson on 2010-09-05 at 08:52
#6 | |
"Dave"
Sep 2005
UK
2776₁₀ Posts |
I tried on a GTX465 on 64-bit Linux, using a range I had already tested so that I could compare the results. However, I didn't get very far before getting an error.
Quote:
#7 |
Mar 2010
110011011₂ Posts |
Amorphia, that's exactly the same error I had on x86 windows.
Here it pops again:
Code:
tpsieve-cuda-x86-windows.exe -i 480000-484999_30aug2010.txt -p 900T -P 901T
tpsieve version cuda-0.1.5b (testing)
Found K's from 3 to 9999999.
Found N's from 480000 to 484999.
nstart=480000, nstep=27, gpu_nstep=27
Read 18013513 terms from NewPGen format input file `480000-484999_30aug2010.txt'
ppsieve initialized: 3 <= k <= 9999999, 480000 <= n <= 484999
Sieve started: 900000000000000 <= p < 901000000000000
Thread 0 starting
Detected GPU 0: GeForce GTX 285
Detected compute capability: 1.3
Detected 30 multiprocessors.
900000899028509 | 3182751*2^483513-1
900001860603749 | 9998469*2^481563+1
900001934059139 | 1853133*2^482022-1
900002540407273 | 8064075*2^482811+1
900002726446853 | 5749455*2^480565-1
900003355059173 | 3695019*2^484373-1
900003556591063 | 9754467*2^480376-1
900003917464219 | 7522179*2^481393-1
900004723972547 | 5306133*2^484306+1
900005287423111 | 6879159*2^482887-1
900007745466833 | 451935*2^481504+1
900009608245457 | 5425383*2^480786+1
p=900010489954305, 87.42M p/sec, 0.50 CPU cores, 1.0% done. ETA 05 Sep 15:17
900010638873601 | 378417*2^481830-1
900011291258897 | 6507645*2^482813+1
900011626245037 | 1340685*2^481238+1
900012104179271 | 645705*2^484085-1
900016125631741 | 2968161*2^483961+1
900016501038581 | 8124711*2^484951+1
900016817068751 | 75363*2^484216+1
900017662186813 | 525711*2^480789+1
900018252281867 | 6892521*2^484727-1
900020059012663 | 8598285*2^481068+1
900021150322181 | 4615461*2^482939+1
900021336561331 | 8746389*2^484435+1
900021361527311 | 3408945*2^482966-1
p=900021998075905, 95.90M p/sec, 0.54 CPU cores, 2.2% done. ETA 05 Sep 15:08
900022619382521 | 6958245*2^482800-1
900022833913493 | 6580995*2^483721+1
900022917366103 | 4560555*2^482723-1
900023322448907 | 1472211*2^480431-1
900024288211007 | 4808679*2^480371+1
900026935242913 | 3056079*2^482407-1
900028117404131 | 5600343*2^481214-1
900029413721059 | 815793*2^483818-1
900029829750299 | 4354917*2^483802-1
900030812047093 | 5639913*2^483592+1
p=900033017561089, 91.82M p/sec, 0.53 CPU cores, 3.3% done. ETA 05 Sep 15:08
900033789053611 | 7495449*2^481653-1
900034220560883 | 9094419*2^483205-1
900034570657763 | 9890505*2^480175-1
900035606160989 | 5867385*2^481157-1
900037077781057 | 8390829*2^481741+1
900037229605601 | 1863285*2^484553-1
900038990324497 | 3815157*2^482054+1
900040739108881 | 3513243*2^482350+1
900041542191221 | 6049533*2^482774-1
900042730035877 | 9304977*2^481916+1
900043309201403 | 136581*2^482397+1
p=900044056969217, 91.99M p/sec, 0.55 CPU cores, 4.4% done. ETA 05 Sep 15:08
900044321638183 | 8388129*2^484645-1
900044489973593 | 8240649*2^483659+1
900044550938063 | 7226823*2^484696-1
900045358508729 | 7763775*2^483076-1
900047216136989 | 2338305*2^482753-1
900047780897267 | 4008369*2^483695+1
900048470300299 | 2963115*2^481453-1
900048762025013 | 383355*2^480270-1
900049228276043 | 8622855*2^483971-1
900049467999349 | 660627*2^481816-1
900049796295679 | 2937537*2^483980-1
900052042582919 | 385575*2^484714+1
900052572323899 | 7711221*2^484603+1
900053267475361 | 7173609*2^483949+1
900053714040401 | 633879*2^480079+1
900053996550817 | 6894867*2^480856-1
p=900055633248257, 96.46M p/sec, 0.54 CPU cores, 5.6% done. ETA 05 Sep 15:06
900055866972487 | 6849789*2^483481-1
900056741014807 | 2245995*2^482732-1
900056770768759 | 814365*2^482000-1
900057523274303 | 642045*2^480196+1
900057941699027 | 3370071*2^480999+1
900058480102739 | 9883737*2^484374+1
900060511991023 | 8680035*2^484611-1
900060730024969 | 7366341*2^482195+1
900060738679177 | 1099155*2^483395-1
900063136569923 | 6597225*2^483763-1
900063669798383 | 5873829*2^481137-1
900064551341591 | 9219153*2^483872+1
900064734779653 | 7558803*2^483916-1
900065290605601 | 7338225*2^482126-1
900065587257671 | 7356405*2^481242-1
900065728724587 | 8091525*2^484942-1
p=900067529342977, 99.13M p/sec, 0.51 CPU cores, 6.8% done. ETA 05 Sep 15:04
900067553916287 | 8588259*2^483407+1
900068224207921 | 5907333*2^480414-1
900068309721587 | 4858185*2^483053+1
900069742623089 | 7249299*2^483067-1
900071614223911 | 974289*2^484133-1
900072154118867 | 3615069*2^480585-1
900072931824211 | 6749313*2^480668-1
900073013900513 | 1479111*2^482079-1
900073241850151 | 3667035*2^484867-1
900075811775299 | 5091681*2^482559+1
900076383783517 | 6995187*2^481406-1
p=900079152807937, 96.86M p/sec, 0.52 CPU cores, 7.9% done. ETA 05 Sep 15:03
900079180930459 | 8088465*2^482743+1
900080177837117 | 9137745*2^481706+1
900081068828399 | 116547*2^480962+1
900082664855509 | 4331577*2^481696-1
900084606014311 | 7923375*2^480228+1
900084897625079 | 7498953*2^482154+1
900085127281819 | 3059145*2^480229-1
900086877470243 | 2313279*2^483107+1
900087308304337 | 5166585*2^482543-1
p=900090529857537, 94.80M p/sec, 0.52 CPU cores, 9.1% done. ETA 05 Sep 15:03
900090831293629 | 9965295*2^481322-1
900091902233021 | 9990753*2^481446-1
900095462990077 | 9537003*2^480320+1
900096420949717 | 8525847*2^481988-1
900096832377143 | 1048245*2^483598+1
900096929852677 | 2153943*2^481358-1
900098509267721 | 7751367*2^481256+1
900099157340237 | 9244893*2^480360-1
900099669905143 | 9687819*2^484633+1
900101450465951 | 1940013*2^484300+1
p=900101851332609, 94.34M p/sec, 0.52 CPU cores, 10.2% done. ETA 05 Sep 15:03
900102028546621 | 2525739*2^482461+1
900102230642357 | 9699093*2^482344+1
900102319841591 | 8400777*2^481706-1
900102426091157 | 3881955*2^483157-1
900102488675867 | 337989*2^481711+1
900102580103633 | 9216783*2^482100+1
900102741563621 | 2272611*2^480277+1
900103553433571 | 7722345*2^483866-1
900104117029049 | 505821*2^480631-1
900105270926371 | 8850651*2^483739-1
900105302568581 | 7921695*2^482577+1
900106241542903 | 5146383*2^482750-1
900107926468921 | 5710305*2^481576-1
900110050114909 | 8376111*2^480199+1
900110665560263 | 7689909*2^483029-1
p=900112516399105, 88.77M p/sec, 0.52 CPU cores, 11.3% done. ETA 05 Sep 15:04
900113522958017 | 4037511*2^483349+1
900113818670537 | 2881989*2^483381+1
900114293440121 | 9168045*2^484941-1
900114895651987 | 1452225*2^484402+1
900116209588091 | 9696105*2^483814-1
900119156145683 | 1042815*2^481830+1
900120506278387 | 7095909*2^480787+1
900120924004501 | 8100075*2^480374-1
900121363886917 | 7647465*2^481616-1
900122553451477 | 5630019*2^482963+1
900122578082093 | 4954053*2^480218-1
900122941358243 | 6183075*2^482076+1
p=900123264303105, 89.57M p/sec, 0.55 CPU cores, 12.3% done. ETA 05 Sep 15:05
900124366202023 | 5674563*2^484186-1
900126758233421 | 757953*2^481946+1
900127178299367 | 2843865*2^484173-1
900128009292293 | 5166717*2^484434-1
900128886707417 | 4666179*2^484269+1
900129038645509 | 4965093*2^481026-1
900129732119477 | 8266305*2^480961-1
900131300857937 | 9063171*2^481141-1
900131738429219 | 7244415*2^480666+1
900131757667309 | 9087471*2^481013-1
900132376847051 | 8236677*2^480080+1
900133053419231 | 4099683*2^480928-1
p=900134169493505, 90.88M p/sec, 0.57 CPU cores, 13.4% done. ETA 05 Sep 15:05
900136707397699 | 4107933*2^480032+1
900138484885813 | 9160035*2^484905-1
900139177590013 | 8492325*2^481494+1
900141735781361 | 2119167*2^480722+1
900141821615489 | 5879169*2^480907+1
900143923881347 | 8023179*2^481613+1
900144031193809 | 1315365*2^482686+1
p=900144518938625, 86.23M p/sec, 0.60 CPU cores, 14.5% done. ETA 05 Sep 15:06
900146872538153 | 6797847*2^483152+1
900146934657221 | 31053*2^482810-1
900148138109243 | 6732345*2^484116-1
900149419577411 | 7471065*2^483931+1
900150206766011 | 4495635*2^480957-1
900152425932013 | 2111517*2^480778+1
900152581520117 | 8415135*2^481699-1
900153500000561 | 4769493*2^483890-1
900153813027347 | 4079283*2^481164+1
p=900154149060609, 80.25M p/sec, 0.58 CPU cores, 15.4% done. ETA 05 Sep 15:08
900155149794317 | 9979257*2^481892+1
900155755904123 | 2521533*2^483994+1
900158445390817 | 3743577*2^480768-1
900158816255227 | 6612759*2^481673-1
900159262356971 | 7420557*2^482234-1
900159600424079 | 7586655*2^481304+1
900159986271683 | 943605*2^481215+1
Cuda error: cudaStreamCreate: out of memory
tpsieve-cuda>pause
Press any key to continue . . .
Last fiddled with by Karl M Johnson on 2010-09-05 at 09:37 Reason: Yes
#8 |
Jan 2005
Caught in a sieve
5×79 Posts |
I get the feeling I have a severe memory leak on the GPU that I didn't know I had. Someone helped me with the stream synchronization code, and it worked, but I'm starting to suspect that each event and stream that is created also has to be destroyed. I'll fix it in the next release.
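The suspected failure mode (handles created on every pass but never released) can be illustrated without a GPU. The following is a toy model in Python, not tpsieve's code: `ToyDevice`, `stream_create`, `stream_destroy`, and `run_batches` are hypothetical names standing in for a runtime with a finite pool of stream handles. In real CUDA code the corresponding pairs are `cudaStreamCreate`/`cudaStreamDestroy` and `cudaEventCreate`/`cudaEventDestroy`.

```python
class OutOfMemory(Exception):
    """Raised by the toy device when no stream slots remain."""


class ToyDevice:
    """Hypothetical stand-in for a GPU runtime with a fixed number of stream slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.live = set()      # handles created and not yet destroyed
        self.next_id = 0

    def stream_create(self):
        if len(self.live) >= self.capacity:
            raise OutOfMemory("stream_create: out of memory")
        handle = self.next_id
        self.next_id += 1
        self.live.add(handle)
        return handle

    def stream_destroy(self, handle):
        self.live.remove(handle)


def run_batches(device, batches, destroy):
    """Create one stream per batch; release it afterwards only if destroy is True.

    Returns the number of streams still alive at the end.
    """
    for _ in range(batches):
        stream = device.stream_create()
        try:
            pass  # the work for this batch would be queued on the stream here
        finally:
            if destroy:
                device.stream_destroy(stream)
    return len(device.live)
```

With `destroy=True` the loop can run for any number of batches and leaves nothing alive; with `destroy=False` it raises `OutOfMemory` once the pool is exhausted, however much ordinary memory is free. That would be consistent with the error appearing only after the sieve has run for a while, though whether this is the actual cause is an assumption here.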
#9 |
Jan 2005
Caught in a sieve
613₈ Posts |
v0.1.6, of both PPSieve and TPSieve, is released. Many changes and fixes are included.
- Faster on the GPU than 0.1.5b (though about the same as 0.1.5c)
- Uses less CPU
- A huge memory leak on the GPU should be fixed.
- Input files are more often read correctly.
- Many other bugfixes and tweaks.
Get it at the usual URL, in the first post.
Edit: P.S. I've forgotten to post the source location!
Last fiddled with by Ken_g6 on 2010-09-06 at 17:24
#10 |
"Dave"
Sep 2005
UK
AD8₁₆ Posts |
I have completed sieving 510-515T and the factors match those I previously found. I got 138M p/sec on a GTX465 using 0.41 CPU on a single core of a Core i7@3.6GHz. As the single core was not maxed out I decided to try running 2 instances on a single core (the other 3 cores were running instances of LLR). With 2 instances I got a combined throughput of 210M p/sec with 0.68 CPU used. This would suggest that the GTX465 wasn't maxed out either with a single instance.
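A comparison like the one described (new factors against a previous run over the same range) can be scripted. This is a minimal sketch in Python, assuming both runs' output is available as lists of text lines in the `p | k*2^n±1` format shown earlier in the thread; `load_factors` and `compare` are names invented for this example.

```python
def load_factors(lines):
    """Collect factor lines ('p | k*2^n+1' or 'p | k*2^n-1') from sieve output.

    Banner and status lines (e.g. 'p=..., 121.2M p/sec, ...') are skipped
    because they do not start with an all-digit field followed by ' | '.
    """
    factors = set()
    for line in lines:
        line = line.strip()
        if " | " in line and line.split(" | ", 1)[0].isdigit():
            factors.add(line)
    return factors


def compare(old_lines, new_lines):
    """Return (missing, extra): factors only in the old run, and only in the new run."""
    old, new = load_factors(old_lines), load_factors(new_lines)
    return sorted(old - new), sorted(new - old)
```

Two runs agree when both returned lists are empty. Using sets means the order in which factors were printed does not matter, which helps when the two runs used different thread or instance counts.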
#11 | |
Jan 2005
Caught in a sieve
5·79 Posts |
Quote:
Interesting! Try fiddling with the -m option (probably going up from 8 in increments of 1), and see if you can make a single instance do any better.
Last fiddled with by Ken_g6 on 2010-09-07 at 18:30 Reason: Wrong start for -m