![]() |
![]() |
#1 |
Feb 2016
1510 Posts |
![]()
My latest version of QS is moving in fifth of the speed of light as it only 5 times slower than Tills version now, lol.
40% of my sieving code is spent on the below loop. Do you have an idea how to improve it? Code:
// that loop takes 40% of the performance for (Wheel wheel : wheels) { wheel.update(logs); } // Wheel update method public void update(byte[] logs) { for (; currentPosition < logs.length; currentPosition += prime) { // currentPosition is int logs[currentPosition] += log; } this.currentPosition -= logs.length; } |
![]() |
![]() |
![]() |
#2 |
"Robert Gerbicz"
Oct 2005
Hungary
144510 Posts |
![]() |
![]() |
![]() |
![]() |
#3 | |
"Ben"
Feb 2007
337110 Posts |
![]() Quote:
cp1 = currentPosition[i] cp2 = currentPosition[i+1] cp3 = currentPosition[i+2] cp4 = currentPosition[i+3] p1 = primes[i] p2 = primes[i+1] p3 = primes[i+2] p4 = primes[i+3] maxCP = MAX(cp1, cp2, cp3, cp4) maxP = MAX(p1, p2, p3, p4) for (; maxCP < logs.length; ) { logs[cp1] += log; logs[cp2] += log; logs[cp3] += log; logs[cp4] += log; cp1 += p1; cp2 += p2; cp3 += p3; cp4 += p4; maxCP += maxP; } // now finish the last part of each range with individual loops // finally update all positions for the next block currentPosition[i] = cp1 - logs.length currentPosition[i+1] = cp2 - logs.length currentPosition[i+2] = cp3 - logs.length currentPosition[i+3] = cp4 - logs.length Last fiddled with by bsquared on 2020-08-26 at 20:24 |
|
![]() |
![]() |
![]() |
#4 |
"Ben"
Feb 2007
D2B16 Posts |
![]()
Also, usually people initialize the sieve byte values to a cutoff:
for (i=0; i < sieve_length; i++) { sieve[i] = cutoff; } Then sieve just like normal, but subtract instead of add. Now, you can easily find locations that are ready for trial division by ANDing with a hi-bit mask. if (sieve[i] & 0x80) { trial_divide(i) } But most locations won't need trial division, so you can check and eliminate many at once by overloading the sieve to an 8-byte unsigned long long type: if (((uint64_t)sieve[i] & 0x8080808080808080ULL) == 0) { continue;' } Makes scanning for sieve reports much faster. |
![]() |
![]() |
![]() |
#5 | |||
Feb 2016
178 Posts |
![]() Quote:
My sieving bound M is represented by an array of the size S where Sk = M for a small value of k(I use it as a square of M), I found it more memory efficient. Then my sieving loop looks like this: Code:
for (i=0;i<k;i++){ for (Wheel wheel : wheels) { wheel.update(logs); } } ---------------------------------------------------- Quote:
Quote:
Last fiddled with by Ilya Gazman on 2020-08-26 at 21:37 |
|||
![]() |
![]() |
![]() |
#6 | |
Feb 2016
3×5 Posts |
![]() Quote:
Code:
public void update(byte[] logs) { int length = logs.length; int currentPosition = this.currentPosition; byte log = this.log; int prime = this.prime; while (currentPosition < length) { logs[currentPosition] += log; currentPosition += prime; } this.currentPosition = currentPosition - length; } |
|
![]() |
![]() |
![]() |
#7 | |
"Ben"
Feb 2007
3,371 Posts |
![]() Quote:
All this might be moot, as the bottleneck of the loop is probably memory access and not loop overhead. Unrolling 4x might actually help with memory access since some percentage of the time the writes might hit the same cache line (this is a good thing). Anyway, it might not be a big win but it might help. |
|
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Bypass Sieving Stage? | Sam Kennedy | Factoring | 1 | 2016-11-29 02:59 |
Stage 1 with mprime/prime95, stage 2 with GMP-ECM | D. B. Staple | Factoring | 2 | 2007-12-14 00:21 |
Need help to run stage 1 and stage 2 separately | jasong | GMP-ECM | 9 | 2007-10-25 22:32 |
Stage 1 and stage 2 tests missing | Matthias C. Noc | PrimeNet | 5 | 2004-08-25 15:42 |
A distributed-computing project to optimize GIMPS FFT? Genetic algorithms | GP2 | Software | 10 | 2003-12-09 20:41 |