mersenneforum.org Sr2sieve on PPC/Linux

2007-06-29, 23:01   #166
Cruelty

May 2005

3134₈ Posts

Here are my results on a C2D@2.4GHz running version 1.5.10 of the linux.x86-64 binary:
Code:
sr2sieve 1.5.10 -- A sieve for multiple sequences k*b^n+/-1.
L1 data cache 32Kb (detected), L2 cache 2048Kb (detected).
Read 233216 terms for 14 sequences from dat format file `riesel.dat'.
Split 14 base 2 sequences into 425 base 2^60 subsequences.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.16.
Best time for baby step method gen/2: 36891.
Best time for baby step method gen/4: 28404.
Best time for baby step method gen/8: 29061.
Best time for baby step method gen/1: 42660.
Best time for giant step method gen/2: 18009.
Best time for giant step method gen/4: 16902.
Best time for giant step method gen/8: 16911.
Best time for giant step method gen/1: 22446.
Best time for ladder method gen/2: 1467.
Best time for ladder method gen/4: 1143.
Best time for ladder method gen/8: 1269.
Best time for ladder method gen/1: 2538.
Best time for ladder method add/1: 2943.
Using baby step method gen/4, giant step method gen/4, ladder method gen/4.
Using 1024Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 4552.40 terms in this range.
sr2sieve started: 680004 <= n <= 1000000, 11000000000000 <= p <= 20000000000000
p=11000122028941, 2040651 p/sec, 0 factors, 0.00% done, ETA 20 Aug 06:13
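The tuning pass in that log times every candidate implementation of each step and then reports which ones it will use. A minimal Python sketch of the selection rule as it appears from the log alone (lowest "Best time" wins per step); `choose_methods` is a hypothetical helper for illustration, not sr2sieve's own code:

```python
import re

# Timing lines copied from the sr2sieve 1.5.10 log above (lower is better).
LOG = """\
Best time for baby step method gen/2: 36891.
Best time for baby step method gen/4: 28404.
Best time for baby step method gen/8: 29061.
Best time for baby step method gen/1: 42660.
Best time for giant step method gen/2: 18009.
Best time for giant step method gen/4: 16902.
Best time for giant step method gen/8: 16911.
Best time for giant step method gen/1: 22446.
Best time for ladder method gen/2: 1467.
Best time for ladder method gen/4: 1143.
Best time for ladder method gen/8: 1269.
Best time for ladder method gen/1: 2538.
Best time for ladder method add/1: 2943.
"""

PATTERN = re.compile(r"Best time for (.+?) method (\S+): (\d+)\.")


def choose_methods(log: str) -> dict[str, str]:
    """Hypothetical helper: pick the fastest method for each step."""
    best: dict[str, tuple[int, str]] = {}
    for step, method, time in PATTERN.findall(log):
        t = int(time)
        if step not in best or t < best[step][0]:
            best[step] = (t, method)
    return {step: method for step, (_, method) in best.items()}


print(choose_methods(LOG))
# {'baby step': 'gen/4', 'giant step': 'gen/4', 'ladder': 'gen/4'}
```

This reproduces the log's own choice of gen/4 for all three steps.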
2007-07-02, 02:32   #167
geoff

Mar 2003
New Zealand

1157₁₀ Posts

I have had a look at the code produced for the ppc64. It seems that GCC is unable to move some constants outside of the critical loops because of aliasing issues, so it does some redundant reloading of registers. In version 1.5.12 I have made a few changes that might help, but the real solution is probably to code the whole loop in assembly, not just the loop body as we do now.

BlisteringSheep: If you can send me the assembled bsgs.s file for 1.5.12 as before, I will check whether the changes have had the expected effect.
Attached Files
 ppc_asm.txt (1.2 KB)

Last fiddled with by geoff on 2007-07-02 at 02:39 Reason: added attachment

2007-07-02, 03:16   #168
BlisteringSheep

Oct 2006
On a Suzuki Boulevard C90

2×3×41 Posts
bsgs-1.5.12 assembled

Here are the gcc-4.1.1 and gcc-4.1.2 versions. Everything compiled cleanly. I haven't done correctness or performance testing yet, but will start right away.

A few questions:
1. Is the sr5check.txt test with p from 5e9 to 5.1e9 sufficient for exercising the code with factors > 2^32?
2. Do you have any opinions on the quality of the code generated by gcc-4.1.1 vs. that generated by gcc-4.1.2? I don't see any measurable difference in execution times.
3. Does including the output from multiple compilers as I've been doing help or just add noise? I haven't done timing checks with gcc-3.x compilers in a while because in the beginning gcc-4.x was demonstrably faster.
Attached Files
 bsgs-1.5.12.zip (30.2 KB)

Last fiddled with by BlisteringSheep on 2007-07-02 at 03:16 Reason: ^riesel.dat^sr5check.txt

2007-07-02, 04:06   #169
geoff

Mar 2003
New Zealand

13·89 Posts

Quote:
 Originally Posted by BlisteringSheep Here are the gcc-4.1.1 and gcc-4.1.2 versions. Everything did compile cleanly.
The changes worked; I've attached the new loop for comparison: three fewer loads and one fewer subtraction. Whether it makes much difference I don't know.

Quote:
 Is the sr5check.txt test with p from 5e9 to 5.1e9 sufficient for exercising the code with factors > 2^32?
Not entirely: each combination of baby-step and giant-step methods could have different bugs :-(. A comprehensive test would require running with the switches `-Bgen/x -Ggen/y' for each combination of x and y in {1,2,4,8}, which would be quite tedious. A lot of the code for the gen/8 methods is just cut and paste from gen/4, and gen/4 is cut and paste from gen/2, etc., so I expect any bugs would be most likely to show up in the gen/8 methods.
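The comprehensive test described above is a cross product of four baby-step and four giant-step variants. A minimal Python sketch that only builds the switch strings; how each pair is combined with the input file and p range on a real sr2sieve command line is deliberately left out, since those options are not given in this post:

```python
from itertools import product

WIDTHS = (1, 2, 4, 8)  # the x and y values named in the post


def all_switch_pairs() -> list[str]:
    """Every -Bgen/x -Ggen/y pairing: 4 baby-step x 4 giant-step variants."""
    return [f"-Bgen/{x} -Ggen/{y}" for x, y in product(WIDTHS, WIDTHS)]


pairs = all_switch_pairs()
print(len(pairs))  # 16
print(pairs[0])    # -Bgen/1 -Ggen/1
```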

Quote:
 Do you have any opinions on the quality of the code generated by gcc-4.1.1 vs. that generated by gcc-4.1.2? I don't see any measurable difference in execution times.
I can't see any significant difference, but I only looked at a few spots in the code. I have found with the x86 builds that GCC 3.4 is faster than GCC 4.1, which is a change from the 1.4.x versions of sr2sieve.

Quote:
 Does including the output from multiple compilers as I've been doing help or just add noise? I haven't done timing checks with gcc-3.x compilers in a while because in the beginning gcc-4.x was demonstrably faster.
It is probably not necessary to do both versions unless you notice a performance difference between them. Even then it may be hard to figure out why one is faster than the other.
Attached Files
 ppc-asm.txt.txt (786 Bytes)

2007-07-02, 05:21   #170
BlisteringSheep

Oct 2006
On a Suzuki Boulevard C90

2×3×41 Posts

sr5check passed with both 100e6-150e6 and 5e9-5.1e9. I will set up a test to run the comprehensive list of methods. Is there any way to tell it not to print the factors to the screen? I'd like to capture the output from -vv to confirm that I'm passing the command-line flags correctly, but don't really need to see all 10000 factors.

Timing results for riesel.dat and SoB.dat with 1.5.10 EXP 1 vs. 1.5.12. Another measurable speedup.

Summary:
v1.5.10 riesel.dat: 301294 p/sec
v1.5.12 riesel.dat: 314009 p/sec
v1.5.10 SoB.dat: 504277 p/sec
v1.5.12 SoB.dat: 521006 p/sec
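To put a number on "another measurable speedup", a short Python sketch converting the summary figures into percent throughput gains (the rates are copied from the summary; the helper name is illustrative):

```python
# p/sec figures from the summary above: (1.5.10 EXP 1, 1.5.12).
RATES = {
    "riesel.dat": (301294, 314009),
    "SoB.dat": (504277, 521006),
}


def percent_gain(old: float, new: float) -> float:
    """Relative throughput increase of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


for name, (old, new) in RATES.items():
    print(f"{name}: {percent_gain(old, new):.1f}% faster")
# riesel.dat: 4.2% faster
# SoB.dat: 3.3% faster
```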

Full results with -vv output is attached. I also did some testing with gcc-3.4.6, and gcc-4.1.x is still measurably faster.
Attached Files
 test_js21-1.5.10-1.5.12_riesel_sob.txt (7.6 KB)

Last fiddled with by BlisteringSheep on 2007-07-02 at 05:35 Reason: forgot a smiley :)

2007-07-02, 06:31   #171
BlisteringSheep

Oct 2006
On a Suzuki Boulevard C90

2×3×41 Posts

Quote:
 Originally Posted by geoff Not entirely, each of the combinations of baby and giant step methods could have different bugs :-(. A comprehensive test would require running with switches `-Bgen/x -Ggen/y' for each combination of x and y in {1,2,4,8}. That would be quite tedious. A lot of the code for the gen/8 methods is just cut and paste from gen/4, and gen/4 is cut and paste from gen/2, etc. So I expect any bugs would be likely to show up in the gen/8 methods.
I ran all combinations of x & y with both the 100e6-150e6 and 5e9-5.1e9 ranges and every one passed. Good work!
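The 5e9-5.1e9 range was chosen to exercise factors above 2^32 (4294967296). A quick Python check of where each of the two sr5check ranges sits relative to that boundary; `classify` is a hypothetical helper for illustration and says nothing about which code paths sr2sieve itself selects:

```python
LIMIT = 2**32  # 4294967296


def classify(pmin: float, pmax: float) -> str:
    """Say where a sieve range [pmin, pmax] sits relative to 2^32."""
    if pmin > LIMIT:
        return "entirely above 2^32"
    if pmax < LIMIT:
        return "entirely below 2^32"
    return "straddles 2^32"


# The two sr5check ranges used in this thread.
for label, (pmin, pmax) in {"100e6-150e6": (100e6, 150e6),
                            "5e9-5.1e9": (5e9, 5.1e9)}.items():
    print(f"{label}: {classify(pmin, pmax)}")
# 100e6-150e6: entirely below 2^32
# 5e9-5.1e9: entirely above 2^32
```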

2007-07-05, 23:43   #172
geoff

Mar 2003
New Zealand

13×89 Posts

Version 1.5.13 has a few small changes; it should be a little faster than 1.5.12, but if not then I will undo them. There are no changes to the assembler routines, so just a brief test should suffice. I realise now that testing each of the 16 combinations of -Bgen/x and -Ggen/y options was not necessary: just testing the 4 combinations with -Bgen/x and -Ggen/x would have been enough to exercise all the assembler routines.
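The counting argument behind "4 instead of 16" can be sketched in Python, assuming (as the post implies, not checked against the sources) that each -B and each -G choice selects one independent assembler routine, so a test set covers a routine as soon as it uses it once in either role:

```python
WIDTHS = (1, 2, 4, 8)

# Exhaustive: every baby-step width paired with every giant-step width.
full = [(x, y) for x in WIDTHS for y in WIDTHS]
# Diagonal: -Bgen/x -Ggen/x only.
diagonal = [(x, x) for x in WIDTHS]


def routines_covered(pairs):
    """Set of (step, width) routines exercised by a list of (B, G) pairs."""
    return {("baby", b) for b, _ in pairs} | {("giant", g) for _, g in pairs}


print(len(full), len(diagonal))                              # 16 4
print(routines_covered(full) == routines_covered(diagonal))  # True
```

Under that independence assumption the diagonal set reaches all 8 routines; it would not catch a bug that only appears for a specific mismatched pairing.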
2007-07-06, 21:29   #173
BlisteringSheep

Oct 2006
On a Suzuki Boulevard C90

2×3×41 Posts

Passed both of my sr5check tests fine. It is indeed faster (timings from 2.0 GHz 970FX):

1.5.12 SoB.dat: 250971 p/sec
1.5.13 SoB.dat: 258296 p/sec
1.5.12 riesel.dat: 417483 p/sec
1.5.13 riesel.dat: 435247 p/sec

sr5check timings:
1.5.12 100e6-150e6: 84.432 cpu sec
1.5.13 100e6-150e6: 80.542 cpu sec
1.5.12 5e9-5.1e9: 141.644 cpu sec
1.5.13 5e9-5.1e9: 135.267 cpu sec

Looks like I'll be deploying this version. :)

Last fiddled with by BlisteringSheep on 2007-07-06 at 21:32 Reason: added sr5check timings
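The sr5check CPU times show the same improvement as shorter runs. A Python sketch turning them into percent of run time saved (figures copied from the post; helper name illustrative). Note that time saved is slightly smaller than the equivalent throughput gain for the same pair of runs, since old/new - 1 is always larger than 1 - new/old:

```python
# sr5check CPU seconds on the 2.0 GHz 970FX: (1.5.12, 1.5.13).
CPU_SECONDS = {
    "100e6-150e6": (84.432, 80.542),
    "5e9-5.1e9": (141.644, 135.267),
}


def percent_saved(old: float, new: float) -> float:
    """Share of the old run time saved by the new version, in percent."""
    return (old - new) / old * 100.0


for label, (old, new) in CPU_SECONDS.items():
    print(f"{label}: {percent_saved(old, new):.1f}% less CPU time")
# 100e6-150e6: 4.6% less CPU time
# 5e9-5.1e9: 4.5% less CPU time
```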


All times are UTC.
