#1 |
Dec 2009
Peine, Germany
331 Posts |
Hi,
Here is the latest version of the PDF known as the "GPU Computing Cheat Sheet". It's the essence of many GPU Computing thread posts on a single piece of paper.
Current latest is: 1.04
GIMPS GPU Computing Cheat Sheet latest (pdf)
All files
Bye, Brain |
#2 |
Dec 2009
Peine, Germany
331 Posts |
Changes: mfakto 0.11 integrated
Please report errors / suggestions.
Last fiddled with by Brain on 2012-08-05 at 09:54 |
#3 |
Dec 2009
Peine, Germany
331 Posts |
Quick update:
Changes: mfakto 0.12, CUDALucas 2.03
Last fiddled with by Brain on 2012-08-05 at 09:57 |
#4 |
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts |
Some notes on CUDALucas 2.03: In many places, examples and instructions are for 2.01, not 2.03. All command line options are the same (except for -i which prints device info), however the work file requires pseudo-GIMPS format.
Code:
Test=<exp>
Test=<AID>,<exp>,<tf>[,<p-1>]
As of sometime before 2.00 but after 1.2, save files should be O(n) in size, where n is the FFT length. A length of 1474560 (1440K, though 2.03 isn't that smart (2.04 is!)) should have a save file a bit under 1.5 MB.
As for max FFT, threads is capped at 1024, and max FFT is capped at 64K*threads, or 64M. That assumes, of course, that there is sufficient memory, that's an excellent point.
I would add that if a user gets an "over specifications Grid" error (as you once did) the solution is either to increase threads or decrease FFT length (again assuming sufficient memory). (That help message is added in 2.04 as well.)
Also, thanks for the links to the .dlls. Whenever I feel like cleaning up the SourceForge files page, I'll make use of those. (LaurV was able to provide some of them, but not all.) |
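The two work file forms and the FFT cap described in this post can be sketched roughly as follows. This is not CUDALucas's actual parser: `parse_work_line` and `max_fft_length` are hypothetical names, and reading `<tf>` as a trial-factoring depth and `<p-1>` as a "P-1 done" flag is an assumption borrowed from Prime95-style worktodo lines.

```python
MAX_THREADS = 1024            # thread cap quoted in the post
FFT_PER_THREAD = 64 * 1024    # "64K*threads" from the post


def parse_work_line(line):
    """Parse 'Test=<exp>' or 'Test=<AID>,<exp>,<tf>[,<p-1>]' into a dict.

    Field meanings are assumptions, not taken from the CUDALucas source.
    """
    key, _, value = line.strip().partition("=")
    if key != "Test" or not value:
        raise ValueError(f"not a Test line: {line!r}")
    fields = value.split(",")
    if len(fields) == 1:
        # Bare exponent form: Test=<exp>
        return {"aid": None, "exponent": int(fields[0]), "tf": None, "pm1": None}
    if len(fields) in (3, 4):
        # Pseudo-GIMPS form: Test=<AID>,<exp>,<tf>[,<p-1>]
        pm1 = int(fields[3]) if len(fields) == 4 else None
        return {"aid": fields[0], "exponent": int(fields[1]),
                "tf": int(fields[2]), "pm1": pm1}
    raise ValueError(f"unexpected number of fields: {line!r}")


def max_fft_length(threads):
    """Largest FFT length allowed for a given thread count (memory permitting)."""
    if not 1 <= threads <= MAX_THREADS:
        raise ValueError("threads must be between 1 and 1024")
    return FFT_PER_THREAD * threads


print(parse_work_line("Test=26000000"))
print(max_fft_length(1024))  # 67108864, i.e. 64M
```

With the thread count at its cap of 1024, the limit works out to 64K * 1024 = 64M, matching the figure in the post.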
#5 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
2×5,179 Posts |
This is a good sign. If this PDF file has appeared, it means we will soon have CUDALucas 2.04 made official and all bugs solved (I have been using it for ages, I don't remember ever using a different version).
This usually happens. When we have new models of our products, we heavily advertise the old models and discount them to get rid of them... I bought two of my laptops (in an interval of 10-11 years I changed my laptop 4 times) just before new models were launched, which within a few months were better and cheaper than the price I paid for my models, already aged, hehe... I still use the Sony FW46, Core 2 Duo, bought in 2009, just before the big launch of exactly the same laptop, at exactly the same price, but with Core i7 and more memory... (they were the only laptops with a full HD 1920 screen for the coming years; now such things are very common, and they have the CUDA GTX 580M too, even better!, but I did not get one yet, and don't plan to).
And by the way, trying not to be totally off-topic, the "B" in the FFT length [edit: in the PDF file] makes no sense and is technically incorrect. Please correct that. The FFT length is not measured in bytes. In fact, each FFT "element" has 8 bytes, and what Dubslow said [edit: about the length of the save files] is therefore wrong: the 1440K FFT size (or 1474560 FFT size, or 1.44M FFT size, but NOT 1440KB FFT, nor 1.44MB FFT, these are WRONG) produces a save file of about 11 megabytes, if you do not compress it with gzip or some other compression algorithm (compressing it would be a very bad idea, because it would no longer be possible to directly compare residue files using a binary editor/viewer; someone did it in the past for former releases and people, including me, got mad about it).
There is also something wrong with the latest versions related to the "big FFT" part: I had in my hands GTX 580s with 1.5GB and 3GB, and Teslas with 6GB, and I could NOT run any 100M-digit exponent. But this is more technical and it was/will be discussed in the suitable threads; it has nothing to do with the advertising (i.e. the PDF file under discussion says what CL advertises; it is not your fault the program does not work as advertised, hehe).
Last fiddled with by LaurV on 2012-08-02 at 04:16 Reason: did a total amalgrammar, it seems I woke up in a bad mood today... |
#6 |
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts |
Umm... I never said anything except 1440K. The only place I see a B is 1.5MB. But you're right, I forgot that sizeof(double) != 1, it's 8 as you point out, so 11.8MB plus a few bytes overhead.
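The arithmetic behind that figure, as a quick sketch. It assumes the save file is essentially one 8-byte double per FFT element; `save_file_bytes` is a made-up helper, and the real header/overhead size is not known here, so it defaults to zero.

```python
BYTES_PER_DOUBLE = 8  # sizeof(double) for an IEEE 754 binary64 value


def save_file_bytes(fft_length, overhead_bytes=0):
    """Approximate save file size: one double per FFT element plus overhead."""
    return fft_length * BYTES_PER_DOUBLE + overhead_bytes


size = save_file_bytes(1440 * 1024)   # the 1440K (1474560) FFT length
print(size)                           # 11796480 bytes
print(f"{size / 10**6:.2f} MB")       # 11.80 MB (decimal megabytes)
print(f"{size / 2**20:.2f} MiB")      # 11.25 MiB (binary megabytes)
```

The same file is roughly 11.8 MB in decimal units and 11.25 MiB in binary units, which may explain the slightly different numbers quoted in the two posts.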
[OT] As for 2.04, I've not seen any progress made, and I haven't seen flash on in a while. I haven't kept particularly close track of the issues either, since it appeared to be platform specific. I too have been using 2.04 Beta since then (at least a month) without any file locking issues (or issues of any kind). |
#7 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
2·5,179 Posts |
Edited to explain (the two "[edit...]" brackets). Added text, corrected grammar as much as I could spot. I woke up in the wrong posture today... I'd better refrain from posting until after the fourth coffee cup...
Related to installing CUDA and MSVS: well, I already did, but besides trying to recompile some of flashjh's older releases, I didn't do too much. I can't really find the time for programming at home (at the office there is no way! plenty of little things and Thai minions are pissing me off every minute), and I still have a list of "things to program at home", including that P-1 stuff. I have not written a line of code for it in months, but for that project the story is different: besides scarce time, there is also scarce inspiration/knowledge. I am still playing with P-1 in pari/gp, trying to optimize it (from the theoretical point of view) as much as possible for Mersenne numbers, and trying to get it as parallel as possible, but besides multiplying primes in pairs on different threads there is not too much to optimize. I have learned a lot of things from this, but the magic spark is still missing. That may be a good reason why other (more clever) people haven't implemented P-1 on CUDA until now. If any spark comes, I may return to writing "close to the metal" (i.e. CUDA) again, but the chances are low for the time being. |
#8 |
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts |
Re the B thing: Ah yes, that makes much more sense. Please fix it Brain.
Re CUDALucas: flash did some programming, but none I couldn't have done myself. (It would have taken me quite a bit more time than him though.)
Re P-1: Since P-1 is also a bunch of multiplication mod Mp, like the LL test, I think our best bet is to modify CUDALucas. First step would be to learn (and I mean learn) about this thingy. That would be something that would require some serious tutoring from the smart people, e.g. msft, ewmayer, Prime95, etc. This might be a good place to start, being the genesis of all modern LL programs, CUDALucas and Prime95 included (though perhaps excluding Mlucas, you'll have to ask ewmayer about that), and being written by Richard Crandall, one of the guys who came up with the IBDWT. (If you look closely, some of the comments and functions in CUDALucas.cu are actually verbatim (or close to it) leftovers from that link.)
PS: I have been considering starting such a tutorial thread, but between fixing YAFU's minrels, BOINCifying a modified Msieve (not to mention Prime95) and restarting university in less than a month, I figured it was too much.
PPS: Wiki says this: "If we perform carrying on the negacyclic convolution, the result is equivalent to the product of the inputs mod B^n + 1." together with "If we perform carrying on the cyclic convolution, the result is equivalent to the product of the inputs mod B^n − 1." But then it says: "In this algorithm, it will be more useful to compute the negacyclic convolution" but it seems to me that using the cyclic convolution would make more sense since our test should be mod 2^p-1? Or is the negacyclic thingy still faster, and you just have to be sure to catch values that are 2^p and 2^p-1 (which should reduce to 1 and 0 mod 2^p-1)?
PPPS: How do you represent a bignum as an array of doubles? How many bits of the num does each double represent? (Do we assume a double has 64 bits of memory? Do you assume IEEE 754 format? Can you use the exponent bits, e.g. via shifts?)
Last fiddled with by Dubslow on 2012-08-02 at 05:16 |
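On the PPS question, a small sketch may help. It uses plain Python integers instead of doubles and a direct O(n^2) convolution instead of an FFT, so it is not how CUDALucas or Prime95 work internally; the function names and the tiny base B = 16 are made up for illustration. It only checks the two quoted statements: wrapping the high terms around with a plus sign gives the product mod B^n − 1, and wrapping them with a minus sign gives it mod B^n + 1.

```python
def to_digits(x, base, n):
    """Split x (assumed 0 <= x < base**n) into n little-endian digits."""
    digits = []
    for _ in range(n):
        x, d = divmod(x, base)
        digits.append(d)
    return digits


def evaluate(digits, base):
    """Evaluate sum(digits[k] * base**k); this stands in for carry propagation."""
    value = 0
    for d in reversed(digits):
        value = value * base + d
    return value


def convolve(a, b, sign):
    """Length-n convolution: sign=+1 is cyclic, sign=-1 is negacyclic."""
    n = len(a)
    out = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                out[k] += a[i] * b[j]
            else:
                out[k - n] += sign * a[i] * b[j]  # wrapped-around term
    return out


def mul_mod(x, y, base, n, sign):
    """x*y mod (base**n - 1) for sign=+1, or mod (base**n + 1) for sign=-1."""
    modulus = base**n - sign
    conv = convolve(to_digits(x, base, n), to_digits(y, base, n), sign)
    return evaluate(conv, base) % modulus


x, y, base, n = 1234, 3021, 16, 3  # base**n = 4096
print(mul_mod(x, y, base, n, +1) == (x * y) % (base**n - 1))  # cyclic check
print(mul_mod(x, y, base, n, -1) == (x * y) % (base**n + 1))  # negacyclic check
```

Both comparisons should come out true, because B^n is congruent to 1 mod B^n − 1 and to −1 mod B^n + 1. With B a power of two, B^n − 1 has the form 2^p − 1, which is why the cyclic case looks like the natural fit for Mersenne numbers. In a floating-point version each digit would sit in one double, and since an IEEE 754 double carries a 53-bit significand, the digits would have to be narrow enough that the convolution sums are still recovered exactly after rounding.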
#9 | |
Dec 2009
Peine, Germany
331 Posts |
Quote:
I assume v1.03 will come out soon because of the upcoming mfaktc/o kernels and CL 2.04. File now here.
Last fiddled with by Brain on 2012-08-05 at 09:58 |
#10 | |
Dec 2009
Peine, Germany
331 Posts |
Quote:
The web space I use is very limited. I will have to remove older versions for publishing new CUDA dlls. I think they have a nice place to be over there at sourceforge. |
#11 |
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts |
No, I mean I don't have the dlls you've linked, so in the next few days I'll download those and update the SF page with them.
(And btw, with 2.04 the assignment format will be more flexible, among some other changes. I'll detail them when it's actually released, if the file locking ever gets fixed.) |