![]() |
![]() |
#122 | |
Mar 2003
New Zealand
13×89 Posts |
![]() Quote:
Did you need to make any change to the timestamp() function, or does the default gettimeofday() have high enough resolution to choose the right method? It would be interesting to see the output from running it on riesel.dat with the -vv switch. |
|
![]() |
![]() |
![]() |
#123 | |
Oct 2006
On a Suzuki Boulevard C90
F616 Posts |
![]() Quote:
![]() |
|
![]() |
![]() |
![]() |
#124 |
Oct 2006
On a Suzuki Boulevard C90
2×3×41 Posts |
![]()
Test with original asm-ppc64.h
sr2sieve 1.5.6 -- A sieve for the Sierpinski/Riesel Base 2 projects. L1 data cache 32Kb (detected), L2 cache 512Kb (default). Reading `riesel.dat' ... Read 4293433 terms for 67 sequences from dat format file `riesel.dat'. Split 67 base 2 sequences into 849 base 2^360 subsequences. Opening version 1 cache file `sr2cache.bin'... Loaded Legendre symbol lookup tables for 67 sequences from `sr2cache.bin'. Using 32 Kb for the baby-steps giant-steps hashtable, maximum density 0.42. Using ladder method gen/1. Using 256Kb for the Sieve of Eratosthenes bitmap. Expecting to find factors for about 0.01 terms in this range. sr2sieve started: 464 <= n <= 20000000, 1867875000000000 <= p <= 1867875100000000 p=1867875013500547, 224926 p/sec, 0 factors, 13.50% done, ETA 20 Jun 02:32 p=1867875027001231, 224696 p/sec, 0 factors, 27.00% done, ETA 20 Jun 02:32 p=1867875040501993, 225091 p/sec, 0 factors, 40.50% done, ETA 20 Jun 02:32 p=1867875054002329, 224960 p/sec, 0 factors, 54.00% done, ETA 20 Jun 02:32 p=1867875067502201, 225057 p/sec, 0 factors, 67.50% done, ETA 20 Jun 02:31 p=1867875081003013, 225017 p/sec, 0 factors, 81.00% done, ETA 20 Jun 02:31 p=1867875094503673, 224457 p/sec, 0 factors, 94.50% done, ETA 20 Jun 02:31 sr2sieve stopped: at p=1867875100000000 because range is complete. Found factors for 0 terms in 448.851 cpu sec. (expected about 0.01) Test with original asm-ppc64.h + vec-ppc64.txt sr2sieve 1.5.6 -- A sieve for the Sierpinski/Riesel Base 2 projects. L1 data cache 32Kb (detected), L2 cache 512Kb (default). Reading `riesel.dat' ... Read 4293433 terms for 67 sequences from dat format file `riesel.dat'. Split 67 base 2 sequences into 849 base 2^360 subsequences. Opening version 1 cache file `sr2cache.bin'... Loaded Legendre symbol lookup tables for 67 sequences from `sr2cache.bin'. Using 32 Kb for the baby-steps giant-steps hashtable, maximum density 0.42. Best time for baby step method gen/2: 120. Best time for baby step method gen/4: 107. Best time for baby step method gen/8: 102. Best time for baby step method gen/1: 112. Best time for giant step method gen/2: 78. Best time for giant step method gen/4: 76. Best time for giant step method gen/8: 76. Best time for giant step method gen/1: 83. Best time for ladder method gen/2: 5. Best time for ladder method gen/4: 4. Best time for ladder method gen/8: 4. Best time for ladder method gen/1: 9. Best time for ladder method add/1: 7. Using baby step method gen/8, giant step method gen/4, ladder method gen/4. Using 256Kb for the Sieve of Eratosthenes bitmap. Expecting to find factors for about 0.01 terms in this range. sr2sieve started: 464 <= n <= 20000000, 1867875000000000 <= p <= 1867875100000000 p=1867875014680547, 243753 p/sec, 0 factors, 14.68% done, ETA 20 Jun 02:38 p=1867875029230103, 243560 p/sec, 0 factors, 29.23% done, ETA 20 Jun 02:38 p=1867875043909729, 243884 p/sec, 0 factors, 43.91% done, ETA 20 Jun 02:38 p=1867875058590089, 243455 p/sec, 0 factors, 58.59% done, ETA 20 Jun 02:38 p=1867875073138559, 243790 p/sec, 0 factors, 73.14% done, ETA 20 Jun 02:38 p=1867875087819641, 243305 p/sec, 0 factors, 87.82% done, ETA 20 Jun 02:38 sr2sieve stopped: at p=1867875100000000 because range is complete. Found factors for 0 terms in 414.741 cpu sec. (expected about 0.01) |
![]() |
![]() |
![]() |
#125 |
Oct 2006
On a Suzuki Boulevard C90
2×3×41 Posts |
![]()
Test with original asm-ppc64.h
sr2sieve 1.5.6 -- A sieve for the Sierpinski/Riesel Base 2 projects. L1 data cache 32Kb (default), L2 cache 1024Kb (detected). Reading `riesel.dat' ... Read 4293433 terms for 67 sequences from dat format file `riesel.dat'. Split 67 base 2 sequences into 849 base 2^360 subsequences. Opening version 1 cache file `sr2cache.bin'... Loaded Legendre symbol lookup tables for 67 sequences from `sr2cache.bin'. Using 32 Kb for the baby-steps giant-steps hashtable, maximum density 0.42. Using ladder method gen/1. Using 512Kb for the Sieve of Eratosthenes bitmap. Expecting to find factors for about 0.01 terms in this range. sr2sieve started: 464 <= n <= 20000000, 1867875000000000 <= p <= 1867875100000000 p=1867875015468277, 258874 p/sec, 0 factors, 15.47% done, ETA 19 Jun 20:40 p=1867875031065427, 258372 p/sec, 0 factors, 31.07% done, ETA 19 Jun 20:40 p=1867875046531853, 258631 p/sec, 0 factors, 46.53% done, ETA 19 Jun 20:40 p=1867875062128679, 258576 p/sec, 0 factors, 62.13% done, ETA 19 Jun 20:40 p=1867875077595349, 258553 p/sec, 0 factors, 77.60% done, ETA 19 Jun 20:40 p=1867875093061433, 258133 p/sec, 0 factors, 93.06% done, ETA 19 Jun 20:40 sr2sieve stopped: at p=1867875100000000 because range is complete. Found factors for 0 terms in 390.175 cpu sec. (expected about 0.01) Test with original asm-ppc64.h + vec-ppc64.txt sr2sieve 1.5.6 -- A sieve for the Sierpinski/Riesel Base 2 projects. L1 data cache 32Kb (default), L2 cache 1024Kb (detected). Reading `riesel.dat' ... Read 4293433 terms for 67 sequences from dat format file `riesel.dat'. Split 67 base 2 sequences into 849 base 2^360 subsequences. Opening version 1 cache file `sr2cache.bin'... Loaded Legendre symbol lookup tables for 67 sequences from `sr2cache.bin'. Using 32 Kb for the baby-steps giant-steps hashtable, maximum density 0.42. Best time for baby step method gen/2: 105. Best time for baby step method gen/4: 94. Best time for baby step method gen/8: 89. Best time for baby step method gen/1: 97. Best time for giant step method gen/2: 68. Best time for giant step method gen/4: 66. Best time for giant step method gen/8: 65. Best time for giant step method gen/1: 72. Best time for ladder method gen/2: 4. Best time for ladder method gen/4: 3. Best time for ladder method gen/8: 3. Best time for ladder method gen/1: 7. Best time for ladder method add/1: 6. Using baby step method gen/8, giant step method gen/8, ladder method gen/4. Using 512Kb for the Sieve of Eratosthenes bitmap. Expecting to find factors for about 0.01 terms in this range. sr2sieve started: 464 <= n <= 20000000, 1867875000000000 <= p <= 1867875100000000 p=1867875016777271, 278335 p/sec, 0 factors, 16.78% done, ETA 19 Jun 20:46 p=1867875033425419, 278718 p/sec, 0 factors, 33.43% done, ETA 19 Jun 20:46 p=1867875050202569, 278750 p/sec, 0 factors, 50.20% done, ETA 19 Jun 20:46 p=1867875066848389, 278544 p/sec, 0 factors, 66.85% done, ETA 19 Jun 20:46 p=1867875083626081, 278435 p/sec, 0 factors, 83.63% done, ETA 19 Jun 20:46 sr2sieve stopped: at p=1867875100000000 because range is complete. Found factors for 0 terms in 362.522 cpu sec. (expected about 0.01) |
![]() |
![]() |
![]() |
#126 | |
Mar 2003
New Zealand
48516 Posts |
![]() Quote:
Does anyone know how to use a more precise timer on the ppc64? If not then I could run each method a few times in a loop and take the total time. |
|
![]() |
![]() |
![]() |
#127 | |
"Mark"
Apr 2003
Between here and the
11001111101012 Posts |
![]() Quote:
#include <time.h> #include <stdio.h> #include <sys/sysctl.h> uin64_t get_cycles_per_clock(void) { uint64_t i, t[100]; uint64_t cpufrequency = 0, tbfrequency = 0, tbcycles; size_t cpufrequencylen = sizeof(uint64_t), tbfrequencylen = sizeof(uint64_t); double u; static int cpumib[2] = { CTL_HW, HW_CPU_FREQ } ; static int tbmib[2] = { CTL_HW, HW_TB_FREQ } ; sysctl(cpumib,2,&cpufrequency,&cpufrequencylen,0,0); sysctl(tbmib,2,&tbfrequency,&tbfrequencylen,0,0); u = (double) cpufrequency / (double) tbfrequency; tbcycles = (uint64_t) u; while (tbcycles + 0.5 < u) tbcycles += 1; while (tbcycles - 0.5 > u) tbcycles -= 1; for (i=0; i<100; ++i) t[i] = mach_absolute_time() * tbcycles; for (i=1; i<100; ++i) if (t[i] - t[i-1] > 0) break; return t[i] - t[i-1]; } uint64_t get_clocks(void) { unsigned long retval; asm volatile("mftb %0": "=r"(retval)); return retval; } Call get_cycles_per_clock() once. Call get_clocks() before and after the function call you are timing. Multiple the difference by the returned value from get_cycles_per_clock(). BTW that value is 75 on a 2.5 GHz PPC and 60 on a 2.0 GHz PPC. |
|
![]() |
![]() |
![]() |
#128 | |
Mar 2003
New Zealand
13·89 Posts |
![]() Quote:
|
|
![]() |
![]() |
![]() |
#129 |
"Mark"
Apr 2003
Between here and the
3×5×443 Posts |
![]() |
![]() |
![]() |
![]() |
#130 |
Oct 2006
On a Suzuki Boulevard C90
2×3×41 Posts |
![]()
geoff, rogue, Just to let you know that I tested the get_clocks() routine and it works correctly under Linux.
The get_cycles_per_clock() routine does not compile under Linux, as I believe that it is using the OS-X sysctl() API. |
![]() |
![]() |
![]() |
#131 |
Mar 2003
New Zealand
115710 Posts |
![]()
I have added the vector methods code along with rogue's assembler for timing to the version 1.5.7 source.
I think there is still room to improve performance on the ppc64, in at least two ways: 1. The mulmod routine should be able to be re-arranged so that the multiplications are done in a different order and the intermediate product re-used in the next mulmod. At the moment we are effectively multiplying pMagic*b every time, although both pMagic and b are constant over many mulmods. 2. Assuming there are enough spare registers, it should be possible to interleave the instructions from two seperate mulmods, which should allow the CPU to get better throughput. e.g. y1 = x1*b (mod p) and y2 = x2*b (mod p) are independent computations so it should be possible to intermix the instructions from each. At the moment there are unnecessary dependenncies between them because they both use the same scratch registers. |
![]() |
![]() |
![]() |
#132 | ||
"Mark"
Apr 2003
Between here and the
664510 Posts |
![]() Quote:
Quote:
|
||
![]() |
![]() |
![]() |
Thread Tools | |
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
srsieve/sr2sieve enhancements | rogue | Software | 304 | 2021-11-06 13:51 |
32-bit of sr1sieve and sr2sieve for Win | pepi37 | Software | 5 | 2013-08-09 22:31 |
sr2sieve question | SaneMur | Information & Answers | 2 | 2011-08-21 22:04 |
sr2sieve client | mgpower0 | Prime Sierpinski Project | 54 | 2008-07-15 16:50 |
How to use sr2sieve | nuggetprime | Riesel Prime Search | 40 | 2007-12-03 06:01 |