#45
Oct 2006
On a Suzuki Boulevard C90
2·3·41 Posts

#46
Mar 2003
New Zealand
13·89 Posts

#47
Oct 2006
On a Suzuki Boulevard C90
2×3×41 Posts
Next, I will experiment some more with the BABY_WORK, GIANT_WORK, EXP_WORK, and SUBSEQ_WORK suggestions you made.

#48
Oct 2006
On a Suzuki Boulevard C90
366₈ Posts
I promise, last post of the night. [Bet you're all going to be glad when I go away.]

#49
Mar 2003
New Zealand
13×89 Posts
In version 1.4.8 I have made another attempt at the inline mulmod. There are no guarantees it will work, but if you want to test it, just compile with -DUSE_INLINE_MULMOD added to CPPFLAGS.
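As a rough illustration of how a `-D` flag in CPPFLAGS can switch between two definitions at compile time, here is a minimal C sketch. Only the `USE_INLINE_MULMOD` macro name comes from the post (and `mulmod64` from the asm quoted later in this thread); both function bodies are portable stand-ins using the GCC/Clang `unsigned __int128` extension, not the real PPC64 assembly, and the out-of-line branch merely plays the role of a separately assembled routine.

```c
#include <assert.h>
#include <stdint.h>

/* Compile with -DUSE_INLINE_MULMOD (e.g. added to CPPFLAGS) to select the
   inline definition; otherwise an out-of-line function is used.
   Both bodies are portable stand-ins, NOT the real PPC64 assembly. */
#ifdef USE_INLINE_MULMOD
/* Inline variant: the compiler can fold this into each caller's loop. */
static inline uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t p)
{
  assert(p != 0);
  return (uint64_t)(((unsigned __int128)a * b) % p);
}
#else
/* Out-of-line variant: stands in for a routine that would normally live in
   its own object file and be called through a normal function call. */
__attribute__((noinline)) uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t p)
{
  assert(p != 0);
  return (uint64_t)(((unsigned __int128)a * b) % p);
}
#endif
```

With `gcc -DUSE_INLINE_MULMOD` the first branch is compiled; without it, every call goes through the out-of-line function, which is the cost the inline version tries to avoid.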

#50
Oct 2006
On a Suzuki Boulevard C90
2·3·41 Posts
Note that these tests were run on a different machine, with a slower CPU than the one behind the results at the top of this page, so the numbers can't be compared directly.
Last fiddled with by BlisteringSheep on 2006-12-07 at 04:56. Reason: speed disclaimer

#51
Oct 2006
On a Suzuki Boulevard C90
2×3×41 Posts
On the 2.5 GHz 970MPs, the speedup is more significant: from about 308000 p/sec to 331000 p/sec (over 7%).
There is a similarly significant improvement on the 2.2 GHz 970FX, from about 266000 p/sec to 286000 p/sec (again over 7%). One thing I did remember to do in these tests, unlike the one last night on the slower CPU, was to remove mulmod-ppc64.o from the list of ASM_OBJS when using USE_INLINE_MULMOD.
Last fiddled with by BlisteringSheep on 2006-12-07 at 16:00. Reason: added 2.2 GHz 970FX results
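For reference, the relative gains implied by the rates quoted above (p/sec before and after building with USE_INLINE_MULMOD) work out as:

$$
\frac{331000 - 308000}{308000} = \frac{23000}{308000} \approx 7.47\,\%, \qquad
\frac{286000 - 266000}{266000} = \frac{20000}{266000} \approx 7.52\,\%
$$

Both agree with the "over 7%" figures in the post.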

#52
Mar 2003
New Zealand
13×89 Posts
I don't really know enough about PPC assembler to guess what is likely to work best, though, and trial and error would be a long process without a machine at hand to test it on. If you have the patience to do this yourself, the basic idea is to replace a register in the clobber list with an entry in the output list associated with a temporary variable. The current code looks like this:
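```c
/* pMagic and pShift are per-prime constants set up elsewhere. */
static inline uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t p)
{
  register uint64_t ret;

  asm ("li %0, 64\n\t"
       "sub %0, %0, %5\n\t"      /* ret = 64 - pShift */
       "mulld r7, %1, %2\n\t"    /* r8:r7 = a * b */
       "mulhdu r8, %1, %2\n\t"
       "mulld r26, r7, %4\n\t"   /* partial products of (r8:r7) * pMagic */
       "mulhdu r27, r7, %4\n\t"
       "mulld r28, r8, %4\n\t"
       "mulhdu r29, r8, %4\n\t"
       "addc r9, r27, r28\n\t"   /* addc sets CA for addze; nothing earlier sets it */
       "addze r10, r29\n\t"      /* r10:r9 = top 128 bits of the product */
       "srd r9, r9, %5\n\t"      /* q = (r10:r9) >> pShift */
       "sld r10, r10, %0\n\t"
       "or r9, r9, r10\n\t"
       "mulld r9, r9, %3\n\t"    /* ret = low word of a*b - q*p */
       "sub %0, r7, r9\n\t"
       "cmpdi cr6, %0, 0\n\t"    /* if ret < 0 then ret += p */
       "bge+ cr6, 0f\n\t"
       "add %0, %0, %3\n"
       "0:"
       : "=&r" (ret)
       : "r" (a), "r" (b), "r" (p), "r" (pMagic), "r" (pShift)
       : "r7", "r8", "r9", "r10", "r26", "r27", "r28", "r29", "cr6");
  return ret;
}
```

With r7 and r8 moved from the clobber list into the output list as the temporaries tmp1 and tmp2, it becomes: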
```c
static inline uint64_t mulmod64(uint64_t a, uint64_t b, uint64_t p)
{
  register uint64_t ret, tmp1, tmp2;   /* tmp1/tmp2 take the place of r7/r8 */

  asm ("li %0, 64\n\t"
       "sub %0, %0, %7\n\t"      /* ret = 64 - pShift */
       "mulld %1, %3, %4\n\t"    /* tmp2:tmp1 = a * b */
       "mulhdu %2, %3, %4\n\t"
       "mulld r26, %1, %6\n\t"   /* partial products of (tmp2:tmp1) * pMagic */
       "mulhdu r27, %1, %6\n\t"
       "mulld r28, %2, %6\n\t"
       "mulhdu r29, %2, %6\n\t"
       "addc r9, r27, r28\n\t"   /* addc sets CA for addze; nothing earlier sets it */
       "addze r10, r29\n\t"
       "srd r9, r9, %7\n\t"      /* q = (r10:r9) >> pShift */
       "sld r10, r10, %0\n\t"
       "or r9, r9, r10\n\t"
       "mulld r9, r9, %5\n\t"    /* ret = tmp1 - q*p */
       "sub %0, %1, r9\n\t"
       "cmpdi cr6, %0, 0\n\t"    /* if ret < 0 then ret += p */
       "bge+ cr6, 0f\n\t"
       "add %0, %0, %5\n"
       "0:"
       : "=&r" (ret), "=&r" (tmp1), "=&r" (tmp2)
       : "r" (a), "r" (b), "r" (p), "r" (pMagic), "r" (pShift)
       : "r9", "r10", "r26", "r27", "r28", "r29", "cr6");
  return ret;
}
```
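To make the arithmetic in the asm easier to follow (and to have something to check the asm variants against), here is a portable C sketch that mirrors it step by step, with the corresponding registers noted in comments. It relies on the GCC/Clang `unsigned __int128` extension. The thread does not show how `pMagic` and `pShift` are computed, so the helper `mulmod64_setup` and its choice pShift = bitlength(p) - 1, pMagic = ceil(2^(64+pShift) / p) are assumptions, not sr2sieve's actual code. With that choice, and a, b < p < 2^63, the quotient estimate q is never too small and at most one too large, which is exactly the case the final compare-and-add corrects.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* Per-prime constants used by the asm. Hypothetical choice (assumption):
   pShift = bitlength(p) - 1, pMagic = ceil(2^(64+pShift) / p). */
static uint64_t pMagic;
static unsigned pShift;

/* Hypothetical helper: p must be odd, 3 <= p < 2^63. */
static void mulmod64_setup(uint64_t p)
{
  assert(p >= 3 && (p & 1) && p < ((uint64_t)1 << 63));
  pShift = 63 - (unsigned)__builtin_clzll(p);        /* bitlength(p) - 1 */
  u128 two_k = (u128)1 << (64 + pShift);
  pMagic = (uint64_t)((two_k + p - 1) / p);          /* fits: p is odd */
}

/* Portable mirror of the asm; requires a, b < p. */
static uint64_t mulmod64_c(uint64_t a, uint64_t b, uint64_t p)
{
  u128 ab = (u128)a * b;
  uint64_t lo = (uint64_t)ab;                /* r7: mulld  */
  uint64_t hi = (uint64_t)(ab >> 64);        /* r8: mulhdu */

  /* Top 128 bits (r10:r9) of the 192-bit product (hi:lo) * pMagic.
     The lowest word (r26) cannot carry into them, so it is dropped. */
  u128 top = ((u128)lo * pMagic) >> 64;      /* r27 */
  top += (u128)hi * pMagic;                  /* + r29:r28, carry into r10 */

  uint64_t q = (uint64_t)(top >> pShift);    /* srd/sld/or: quotient estimate */
  int64_t r = (int64_t)(lo - q * p);         /* mulld, sub (wraps mod 2^64) */
  if (r < 0)                                 /* q was one too large */
    r += (int64_t)p;                         /* cmpdi / bge+ / add */
  return (uint64_t)r;
}
```

Calling `mulmod64_setup(p)` once per prime and then `mulmod64_c` for each product matches how the asm reads `pMagic` and `pShift` as inputs that are fixed for a given p.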

#53
Mar 2003
New Zealand
10010000101₂ Posts
In version 1.4.9 I have made a small change to the inline PRE2_MULMOD64 macro that should allow GCC to recognise that the initial subtraction produces a loop invariant. You can try this change out by replacing the `#if 1` with `#if 0` in asm-ppc64.h.
Last fiddled with by geoff on 2006-12-10 at 23:06
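The idea behind the 1.4.9 change can be illustrated in plain C. GCC cannot look inside an asm string, so a `64 - pShift` computed by `li`/`sub` inside the asm is executed on every call; computed in C and passed in as an operand, it becomes an ordinary expression that GCC is free to hoist out of a loop. The sketch below is generic and hypothetical: `shr128_lo` and `shr128_batch` are not from sr2sieve, and the real PRE2_MULMOD64 macro is not shown in this thread. It reproduces the asm's srd/sld/or shift with the complement computed once per batch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Low 64 bits of the 128-bit value hi:lo shifted right by s, using the
   same srd / sld / or pattern as the asm. s_comp must equal 64 - s.
   Unlike PPC's sld, a C shift by 64 is undefined, so 1 <= s <= 63. */
static inline uint64_t shr128_lo(uint64_t hi, uint64_t lo,
                                 unsigned s, unsigned s_comp)
{
  return (lo >> s) | (hi << s_comp);
}

/* Hypothetical batch loop: s is fixed for the whole batch, so 64 - s is
   computed once here instead of once per element. */
static void shr128_batch(const uint64_t *hi, const uint64_t *lo,
                         uint64_t *out, size_t n, unsigned s)
{
  assert(s >= 1 && s <= 63);
  const unsigned s_comp = 64 - s;   /* loop invariant, hoisted */
  for (size_t i = 0; i < n; i++)
    out[i] = shr128_lo(hi[i], lo[i], s, s_comp);
}
```

In the asm setting, the equivalent move is to compute the complement in C before the asm and pass it as an extra input operand, so the `li`/`sub` pair disappears from the inner loop.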

#54
Oct 2006
On a Suzuki Boulevard C90
2·3·41 Posts

#55
Oct 2006
On a Suzuki Boulevard C90
2×3×41 Posts
On the faster chips, this hurt performance.