mersenneforum.org  

Old 2007-06-29, 23:01   #166
Cruelty
 
May 2005

3134₈ Posts

Here are my results on a C2D @ 2.4 GHz running version 1.5.10 of the linux.x86-64 binary:
Code:
sr2sieve 1.5.10 -- A sieve for multiple sequences k*b^n+/-1.
L1 data cache 32Kb (detected), L2 cache 2048Kb (detected).
Read 233216 terms for 14 sequences from dat format file `riesel.dat'.
Split 14 base 2 sequences into 425 base 2^60 subsequences.
Using 16 Kb for the baby-steps giant-steps hashtable, maximum density 0.16.
Best time for baby step method gen/2: 36891.
Best time for baby step method gen/4: 28404.
Best time for baby step method gen/8: 29061.
Best time for baby step method gen/1: 42660.
Best time for giant step method gen/2: 18009.
Best time for giant step method gen/4: 16902.
Best time for giant step method gen/8: 16911.
Best time for giant step method gen/1: 22446.
Best time for ladder method gen/2: 1467.
Best time for ladder method gen/4: 1143.
Best time for ladder method gen/8: 1269.
Best time for ladder method gen/1: 2538.
Best time for ladder method add/1: 2943.
Using baby step method gen/4, giant step method gen/4, ladder method gen/4.
Using 1024Kb for the Sieve of Eratosthenes bitmap.
Expecting to find factors for about 4552.40 terms in this range.
sr2sieve started: 680004 <= n <= 1000000, 11000000000000 <= p <= 20000000000000
p=11000122028941, 2040651 p/sec, 0 factors, 0.00% done, ETA 20 Aug 06:13
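
For readers unfamiliar with the "baby-steps giant-steps" wording in the log, here is a minimal Python sketch of the idea for a single Riesel-form sequence k*b^n-1 and a single prime p. It is not sr2sieve's implementation (which, per the log, splits sequences into subsequences and uses a small fixed-size hashtable). The function name `bsgs_first_n` and the use of a Python dict are assumptions made for illustration, and p is assumed to be a prime dividing neither k nor b.

```python
from math import isqrt


def bsgs_first_n(k, b, p, n_min, n_max):
    """Return the smallest n in [n_min, n_max] with p | k*b^n - 1, else None.

    Illustrative sketch only; needs Python 3.8+ for pow() with a
    negative exponent and a modulus.
    """
    span = n_max - n_min + 1
    m = isqrt(span - 1) + 1  # number of baby steps, chosen so m*m >= span

    # Baby steps: remember the smallest j that produces each residue b^j mod p.
    baby = {}
    x = 1
    for j in range(m):
        baby.setdefault(x, j)
        x = x * b % p

    # Write n = n_min + i*m + j. Then k*b^n == 1 (mod p) is equivalent to
    # b^j == k^-1 * b^-n_min * (b^-m)^i (mod p).
    target = pow(k, -1, p) * pow(b, -n_min, p) % p
    giant = pow(b, -m, p)

    # Giant steps: advance i until the target appears in the baby-step table.
    for i in range(m):
        j = baby.get(target)
        if j is not None:
            n = n_min + i * m + j
            return n if n <= n_max else None
        target = target * giant % p
    return None
```

A call such as `bsgs_first_n(3, 2, 11, 0, 100)` searches 0 <= n <= 100 for the first n where 11 divides 3*2^n-1, using about sqrt(101) table entries and lookups instead of 101 trial exponents.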
Old 2007-07-02, 02:32   #167
geoff
 
Mar 2003
New Zealand

1157₁₀ Posts

I have had a look at the code produced for ppc64. It seems that GCC is unable to move some constants outside of the critical loops because of aliasing issues, so it does some redundant reloading of registers. In version 1.5.12 I have made a few changes that might help, but the real solution is probably to code the whole loop in assembly, not just the loop body as we do now.

BlisteringSheep: If you can send me the assembled bsgs.s file for 1.5.12 as before, I will check whether the changes have had the expected effect.
Attached Files
File Type: txt ppc_asm.txt (1.2 KB, 193 views)

Last fiddled with by geoff on 2007-07-02 at 02:39 Reason: added attachment
Old 2007-07-02, 03:16   #168
BlisteringSheep
 
Oct 2006
On a Suzuki Boulevard C90

2×3×41 Posts
bsgs-1.5.12 assembled

Here are the gcc-4.1.1 and gcc-4.1.2 versions. Everything did compile cleanly. I haven't done correctness or performance testing yet, but will start right away.

A couple of questions:
  1. Is the sr5check.txt test with p from 5e9 to 5.1e9 sufficient for exercising the code with factors > 2^32?
  2. Do you have any opinions on the quality of the code generated by gcc-4.1.1 vs. that generated by gcc-4.1.2? I don't see any measurable difference in execution times.
  3. Does including the output from multiple compilers as I've been doing help or just add noise? I haven't done timing checks with gcc-3.x compilers in a while because in the beginning gcc-4.x was demonstrably faster.
Attached Files
File Type: zip bsgs-1.5.12.zip (30.2 KB, 188 views)

Last fiddled with by BlisteringSheep on 2007-07-02 at 03:16 Reason: ^riesel.dat^sr5check.txt
Old 2007-07-02, 04:06   #169
geoff
 
Mar 2003
New Zealand

13·89 Posts

Quote:
Originally Posted by BlisteringSheep
Here are the gcc-4.1.1 and gcc-4.1.2 versions. Everything did compile cleanly.
The changes worked. I've attached the new loop for comparison: it has three fewer loads and one fewer subtraction. Whether that makes much difference I don't know.

Quote:
Is the sr5check.txt test with p from 5e9 to 5.1e9 sufficient for exercising the code with factors > 2^32?
Not entirely, each of the combinations of baby and giant step methods could have different bugs :-(. A comprehensive test would require running with switches `-Bgen/x -Ggen/y' for each combination of x and y in {1,2,4,8}. That would be quite tedious. A lot of the code for the gen/8 methods is just cut and paste from gen/4, and gen/4 is cut and paste from gen/2, etc. So I expect any bugs would be likely to show up in the gen/8 methods.
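
The 16 switch combinations described in this reply can be enumerated with a short script. This is only a sketch: it builds the two switches named in the post and nothing else, so the input file and range arguments (not shown in this thread) would still have to be supplied separately. The function name `method_flag_combinations` is hypothetical.

```python
from itertools import product

# The method widths mentioned in the post: gen/1, gen/2, gen/4, gen/8.
METHODS = (1, 2, 4, 8)


def method_flag_combinations():
    """Return every [-Bgen/x, -Ggen/y] pair for x and y in METHODS."""
    return [[f"-Bgen/{x}", f"-Ggen/{y}"] for x, y in product(METHODS, repeat=2)]


# Print one line per combination, ready to paste into a test command.
for flags in method_flag_combinations():
    print(" ".join(flags))
```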


Quote:
Do you have any opinions on the quality of the code generated by gcc-4.1.1 vs. that generated by gcc-4.1.2? I don't see any measurable difference in execution times.
I can't see any significant difference, but I only looked at a few spots in the code. I have found with the x86 builds that GCC 3.4 is faster than GCC 4.1, which is a change since the 1.4.x versions of sr2sieve.

Quote:
Does including the output from multiple compilers as I've been doing help or just add noise? I haven't done timing checks with gcc-3.x compilers in a while because in the beginning gcc-4.x was demonstrably faster.
It is probably not necessary to do both versions unless you notice a performance difference between them. Even then it may be hard to figure out why one is faster than the other.
Attached Files
File Type: txt ppc-asm.txt.txt (786 Bytes, 204 views)
Old 2007-07-02, 05:21   #170
BlisteringSheep
 
Oct 2006
On a Suzuki Boulevard C90

2×3×41 Posts

sr5check passed with both 100e6-150e6 and 5e9-5.1e9. I will set up a test to run the comprehensive list of methods. Is there any way to tell it not to print the factors to the screen? I'd like to capture the output from -vv to confirm that I'm passing the command-line flags correctly, but I don't really need to see all 10000 factors.

Timing results for riesel.dat and SoB.dat with 1.5.10 EXP 1 vs. 1.5.12. Another measurable speedup.

Summary:
v1.5.10 riesel.dat: 301294 p/sec
v1.5.12 riesel.dat: 314009 p/sec
v1.5.10 SoB.dat: 504277 p/sec
v1.5.12 SoB.dat: 521006 p/sec

Full results with -vv output are attached. I also did some testing with gcc-3.4.6, and gcc-4.1.x is still measurably faster.
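
To put the summary figures in relative terms, here is a small sketch that turns each pair of p/sec rates into a percentage gain. The rates are copied from the summary; the function name `throughput_gain_pct` is made up for this example. It works out to about 4.2% for riesel.dat and 3.3% for SoB.dat.

```python
def throughput_gain_pct(old_rate, new_rate):
    """Percentage increase in p/sec going from old_rate to new_rate."""
    return (new_rate - old_rate) / old_rate * 100.0


# (v1.5.10 p/sec, v1.5.12 p/sec), taken from the summary in this post.
results = {
    "riesel.dat": (301294, 314009),
    "SoB.dat": (504277, 521006),
}

for name, (old, new) in results.items():
    print(f"{name}: {throughput_gain_pct(old, new):.2f}% more p/sec")
```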
Attached Files
File Type: txt test_js21-1.5.10-1.5.12_riesel_sob.txt (7.6 KB, 202 views)

Last fiddled with by BlisteringSheep on 2007-07-02 at 05:35 Reason: forgot a smiley :)
Old 2007-07-02, 06:31   #171
BlisteringSheep
 
Oct 2006
On a Suzuki Boulevard C90

2×3×41 Posts

Quote:
Originally Posted by geoff
Not entirely, each of the combinations of baby and giant step methods could have different bugs :-(. A comprehensive test would require running with switches `-Bgen/x -Ggen/y' for each combination of x and y in {1,2,4,8}. That would be quite tedious. A lot of the code for the gen/8 methods is just cut and paste from gen/4, and gen/4 is cut and paste from gen/2, etc. So I expect any bugs would be likely to show up in the gen/8 methods.
I ran all permutations of x & y with both the 100e6-150e6 and 5e9-5.1e9 ranges and every one passed. Good work!
Old 2007-07-05, 23:43   #172
geoff
 
Mar 2003
New Zealand

13×89 Posts

Version 1.5.13 has a few small changes. It should be a little faster than 1.5.12, but if not then I will undo them. There are no changes to the assembler routines, so a brief test should suffice.

I realise now that testing each of the 16 combinations of -Bgen/x and -Ggen/y options was not necessary: testing just the 4 combinations with -Bgen/x and -Ggen/x would have been enough to exercise all the assembler routines.
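
The coverage argument can be checked mechanically. The sketch below assumes, as the post implies, that each baby step routine and each giant step routine behaves the same regardless of which method is chosen for the other role. The labels "baby" and "giant" and the function name `routines_exercised` are illustrative, not taken from the sr2sieve source.

```python
METHODS = (1, 2, 4, 8)


def routines_exercised(pairs):
    """Return the set of (role, method) routines hit by the given (x, y) runs."""
    covered = set()
    for x, y in pairs:
        covered.add(("baby", x))   # run used -Bgen/x
        covered.add(("giant", y))  # run used -Ggen/y
    return covered


all_pairs = [(x, y) for x in METHODS for y in METHODS]  # the 16 runs
diagonal = [(x, x) for x in METHODS]                    # the 4 runs with x == y

# Both test plans touch the same 8 routines.
print(routines_exercised(diagonal) == routines_exercised(all_pairs))
```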
Old 2007-07-06, 21:29   #173
BlisteringSheep
 
Oct 2006
On a Suzuki Boulevard C90

2×3×41 Posts

Passed both of my sr5check tests fine. It is indeed faster (timings from 2.0 GHz 970FX):

1.5.12 SoB.dat: 250971 p/sec
1.5.13 SoB.dat: 258296 p/sec

1.5.12 riesel.dat: 417483 p/sec
1.5.13 riesel.dat: 435247 p/sec

sr5check timings:
1.5.12 100e6-150e6: 84.432 cpu sec
1.5.13 100e6-150e6: 80.542 cpu sec

1.5.12 5e9-5.1e9: 141.644 cpu sec
1.5.13 5e9-5.1e9: 135.267 cpu sec
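
For the sr5check timings, lower is better, so the comparison runs the other way round from the p/sec figures. A small sketch, using the cpu times listed here and two hypothetical helper names, gives the share of cpu time saved and the equivalent speedup factor:

```python
def cpu_time_saved_pct(old_sec, new_sec):
    """Percentage of cpu time saved going from old_sec to new_sec."""
    return (old_sec - new_sec) / old_sec * 100.0


def speedup_factor(old_sec, new_sec):
    """How many times faster the new run is (greater than 1.0 means faster)."""
    return old_sec / new_sec


# (1.5.12 cpu sec, 1.5.13 cpu sec), taken from the sr5check timings.
timings = {
    "100e6-150e6": (84.432, 80.542),
    "5e9-5.1e9": (141.644, 135.267),
}

for name, (old, new) in timings.items():
    print(f"{name}: {cpu_time_saved_pct(old, new):.2f}% less cpu time, "
          f"{speedup_factor(old, new):.3f}x")
```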


Looks like I'll be deploying this version. :)

Last fiddled with by BlisteringSheep on 2007-07-06 at 21:32 Reason: added sr5check timings

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
