division/remainder algorithm (trial factoring)
Hello,
can somebody give me a hint how to calculate a division / remainder with software integers in an efficent way? Size of the numbers: up to ~80/160 bits (Trialfactoring of factors up to ~80 bits) Limitations:  (virtually) NO if/then/else!  only 16/32bit integers  only single precission floats (non IEEE rounding)  no assembler (no carry flag, ...), only C if/then/else is OK if (and only if) the code takes in >99.9% of the cases (different input data) the same code path. Background: I'm thinking of trial factoring on graphics cards (Nvidia). There are so many limitations for running calculations on Nvidia cards but TF might be possible ;) Currently I've done a quick&dirty implemantation with the limitations above which runs on my main CPU (C2D @ 3GHz). For 65+ bit factors the performance is about 60 times slower than prime95. I think if I can improve the speed to lets say "only" 1015 times slower than prime95 it might be worth to try it on the GFX. e.g.: Geforce 8800GTX (128 shader units, 768MiB memory) some recommendations for optimal performance:  at least 6 threads per shader unit to hide the latency of the memory (so we will need 768 threads <=> only 1MiB per thread available)  these threads should be strictly SIMD otherwise the threads won't be done in parallel... that's why I've said no if/then/else. Loops are OK if (and only if) the number ot iterations for each thread are the same. TheJudger P.S. I'm pretty sure that the lack of double precission isn't the main reason for not doing LL on the current GFX cards. The limitations mentioned above should be even more problematic... we'll need a threadparallel multiplication algorithm which is SIMD for all threads and which doesn't need much communication between the threads... 
[QUOTE=TheJudger;116554]Hello,
can somebody give me a hint how to calculate a division / remainder with software integers in an efficent way? Size of the numbers: up to ~80/160 bits (Trialfactoring of factors up to ~80 bits) ...[/QUOTE] Read Knuth, The Art of Computer Programming, Vol II 
Why not just use GMP and test a different number in each thread? You could be doing 768 tests in parallel.

[QUOTE=ShiningArcanine;116577]Why not just use GMP and test a different number in each thread? You could be doing 768 tests in parallel.[/QUOTE]
We must have read different posts. I could swear that the O.P. asked "how to calculate" and not "what code do I use". :missingteeth: 
[QUOTE=ShiningArcanine;116577]Why not just use GMP and test a different number in each thread? You could be doing 768 tests in parallel.[/QUOTE]
Yes, that's what I want to do.. test alot of possible factors for a single exponent at the same time. I'm pretty sure the GMP won't fit to the limitations... 
All times are UTC. The time now is 16:46. 
Powered by vBulletin® Version 3.8.11
Copyright ©2000  2022, Jelsoft Enterprises Ltd.