mersenneforum.org  

2007-08-05, 21:49   #45
geoff
Mar 2003
New Zealand
13×89 Posts
gcwsieve 1.0.14

The main loop for SSE2 and x86-64 machines is now 100% assembly instead of a mixture of C and inline assembly, and tries to read memory in a more predictable way.

The 32-bit executable runs about 15% faster on my P4, and the 64-bit executable runs about 60% faster on my C2D (64-bit is now almost twice as fast as 32-bit on that machine).

2007-08-08, 23:49   #46
geoff
Mar 2003
New Zealand
13×89 Posts
gcwsieve 1.0.15

The main loop for x86 machines without SSE2 is now 100% assembly. It runs about 30% faster on my P3.

2007-08-13, 05:25   #47
geoff
Mar 2003
New Zealand
1157₁₀ Posts
gcwsieve 1.0.16

Version 1.0.16 has support for software prefetching, using the prefetchnta instruction available on SSE machines, or GCC's __builtin_prefetch() function for non-x86/x86-64 builds.

Prefetching should result in a speedup when the sieve is too large to fit in L2 cache (each sieve term takes 8 bytes), but on some machines it results in a slowdown instead, probably because it interferes with the automatic hardware prefetcher.

So before sieving starts, some test runs are made with and without prefetch, and the faster method is selected. Use the --verbose switch to see whether prefetch was selected. To override the automatic selection, use these new switches:

--prefetch: Force use of prefetch.
--no-prefetch: Prevent use of prefetch.
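
The selection logic described above could be sketched roughly like this. It is only an illustration in Python, not gcwsieve's actual code: `pick_faster`, `use_prefetch`, `variants`, `runs` and `clock` are hypothetical names, and the real program times its own sieve loop rather than arbitrary callables.

```python
import time

def pick_faster(variants, runs=3, clock=time.perf_counter):
    """Time each callable over `runs` calls and return (fastest label, totals).

    `variants` maps a label such as "prefetch" to a zero-argument callable.
    """
    totals = {}
    for name, fn in variants.items():
        start = clock()
        for _ in range(runs):
            fn()
        totals[name] = clock() - start
    best = min(totals, key=totals.get)
    return best, totals

def use_prefetch(force, variants, runs=3, clock=time.perf_counter):
    """Decide whether to prefetch.

    force=True models --prefetch, force=False models --no-prefetch,
    and force=None models the automatic benchmark-based choice.
    """
    if force is not None:
        return force  # an explicit switch skips the benchmark entirely
    best, _ = pick_faster(variants, runs, clock)
    return best == "prefetch"
```

Passing the clock in as a parameter is just a convenience so the choice can be checked with fake timings.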


Here are some times for a 216000-term sieve (PrimeGrid Cullen 10M) at p=1000e9:
Code:
CPU                   --no-prefetch   --prefetch     Change
P3 450MHz, 512KB L2:   1167 p/sec      1502 p/sec     +29%
P3 600MHz, 256KB L2:   1462 p/sec      1993 p/sec     +36%
P4 2.9GHz, 512KB L2:  12224 p/sec     11711 p/sec      -4%
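
For reference, the arithmetic behind these figures can be checked with a few lines of Python. This is just a sketch: the function names are made up here, and the 8 bytes per term is the figure quoted in this post. At 8 bytes per term the 216000-term sieve occupies 1,728,000 bytes, well over the 256KB and 512KB L2 caches listed, so none of the three machines can hold it in L2.

```python
BYTES_PER_TERM = 8  # from the post: each sieve term takes 8 bytes

def sieve_bytes(terms):
    """Approximate memory used by the sieve terms, in bytes."""
    return terms * BYTES_PER_TERM

def fits_in_l2(terms, l2_kib):
    """True if the sieve terms would fit in an L2 cache of `l2_kib` KiB."""
    return sieve_bytes(terms) <= l2_kib * 1024

def change_percent(base_rate, new_rate):
    """Relative change from base_rate to new_rate, as a percentage."""
    return (new_rate / base_rate - 1.0) * 100.0

if __name__ == "__main__":
    print(sieve_bytes(216000), fits_in_l2(216000, 512))
    for base, new in [(1167, 1502), (1462, 1993), (12224, 11711)]:
        print(f"{change_percent(base, new):+.0f}%")
```

The P4 slows down even though its L2 is also too small for the sieve, which fits the hardware-prefetcher explanation above.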

2007-08-13, 07:02   #48
VolMike
Jun 2007
Moscow,Russia
85₁₆ Posts

Could you provide an executable for a Windows XP Athlon machine?

2007-08-15, 03:49   #49
geoff
Mar 2003
New Zealand
1157₁₀ Posts

Quote:
Originally Posted by VolMike
Could you provide an executable for a Windows XP Athlon machine?

The executable in gcwsieve-X.Y.Z-windows-x86.zip at http://www.geocities.com/g_w_reynolds/gcwsieve/ should work, or is there a problem on that machine?

2007-08-18, 23:19   #50
geoff
Mar 2003
New Zealand
1157₁₀ Posts
gcwsieve 1.0.17

Version 1.0.17 should properly detect the availability of prefetch instructions on AMD machines with 3DNow! but without SSE (some earlier Athlons).

A more compact ABC file format will now be written by default. The old format will still be written if the --multisieve switch is given. Either format can be used for the input file:

Old format:
Code:
ABC $a*$b^$a$c // CW Sieved to: 100000000000 with gcwsieve
2000055 2 +1
2000110 2 +1
2000116 2 +1
2000128 2 +1
New format:
Code:
ABC $a*2^$a+1 // CW Sieved to: 100000000000 with gcwsieve
2000055
2000110
2000116
2000128
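
For anyone reading these files with their own scripts, here is a minimal Python sketch of a reader that accepts both layouts. It is not taken from gcwsieve: `parse_abc` is a hypothetical name, and it only understands the two header forms shown above (other programs accept further ABC variants that this ignores).

```python
import re

def parse_abc(text):
    """Return a list of (n, b, c) tuples, each meaning n*b^n+c."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    # Drop the trailing "// ..." comment and the leading "ABC" keyword.
    expr = header.split("//")[0].split()[1]
    if expr == "$a*$b^$a$c":
        # Old layout: every row carries n, the base b and the signed c.
        return [(int(n), int(b), int(c)) for n, b, c in (r.split() for r in rows)]
    m = re.fullmatch(r"\$a\*(\d+)\^\$a([+-]1)", expr)
    if m is None:
        raise ValueError("unrecognised ABC header: " + header)
    # New layout: b and c are fixed in the header, rows hold only n.
    b, c = int(m.group(1)), int(m.group(2))
    return [(int(r), b, c) for r in rows]
```

Both sample files above should give the same list of terms, starting with (2000055, 2, 1).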

2007-08-22, 23:05   #51
geoff
Mar 2003
New Zealand
13·89 Posts
gcwsieve 1.0.18

This version has two minor bugfixes:

Test for Extended 3DNow! instead of just 3DNow! to determine whether the prefetchnta instruction is available on AMD CPUs. This affected K6-2 CPUs.

Use the best benchmark time instead of the average benchmark time when deciding whether or not to use software prefetching. The average times could be inaccurate when there were other processes running on the same CPU.
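
To illustrate why the best time is more robust than the average, here is a small Python sketch. The timings are invented for the example (they are not measurements from gcwsieve) and `pick` is a hypothetical helper: one prefetch run is slowed by another process, which is enough to flip the decision when averaging.

```python
from statistics import mean

def pick(timings, aggregate):
    """Return the label whose aggregated timing is smallest."""
    return min(timings, key=lambda name: aggregate(timings[name]))

# Made-up benchmark times in seconds; the 1.60 stands for a run that was
# disturbed by some other process on the same CPU.
sample = {
    "prefetch": [1.00, 1.01, 1.60],
    "no-prefetch": [1.10, 1.11, 1.10],
}

if __name__ == "__main__":
    print("by average:", pick(sample, mean))
    print("by best:   ", pick(sample, min))
```

Interference from other processes can only make a run slower, so the minimum stays close to the undisturbed time while the mean gets pulled upward.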

There are also some changes to the status line display. The percentage of CPU usage (cpu_time/elapsed_time) is now reported, and the status line alternates between these two sets of stats:
Code:
p=1071802477019, 249775 p/sec, 16 factors, 100.0% cpu, 2953 sec/factor
p=1071817422251, 249836 p/sec, 16 factors, 16.9% done, ETA 24 Aug 14:23
And there are two new switches that change the information displayed on the status line:

-R --report-primes

Reports primes/sec (the number of prime factors tested per second) instead of p/sec (the increase in p per second).

-e --elapsed-time

Reports p/sec, primes/sec, and sec/factor using elapsed time instead of CPU time.
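
The status line quantities relate to each other roughly as in the Python sketch below. The function names are made up, and the primes/sec estimate rests on my assumption that the prime number theorem (about 1 in ln(p) integers near p is prime) is a good enough approximation; the program itself presumably counts the primes it actually tests.

```python
import math

def cpu_percent(cpu_time, elapsed_time):
    """CPU usage as reported on the status line: cpu_time/elapsed_time."""
    return 100.0 * cpu_time / elapsed_time

def sec_per_factor(seconds, factors):
    """Seconds (CPU or elapsed, depending on -e) spent per factor found."""
    return seconds / factors

def approx_primes_per_sec(p, p_per_sec):
    """Estimate primes/sec from p/sec using the density 1/ln(p)."""
    return p_per_sec / math.log(p)

if __name__ == "__main__":
    # p and p/sec taken from the first sample status line above.
    print(round(approx_primes_per_sec(1071802477019, 249775)))
```

For the sample line at p=1071802477019 and 249775 p/sec this gives on the order of 9,000 primes/sec.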

2007-09-02, 03:09   #52
geoff
Mar 2003
New Zealand
1157₁₀ Posts
gcwsieve 1.0.20 (x86-64)

The x86-64 executable now has separate code paths optimised for Intel (Core 2) and AMD (Athlon 64) CPUs. The Athlon 64 code should be about 15% faster than previous versions. Thanks to jmblazek for testing it.

The appropriate code path should be selected automatically, but can be overridden with the --amd or --intel command-line switches.
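
The dispatch idea can be sketched in a few lines of Python. This is a guess at the general shape, not gcwsieve's code: `select_code_path` is a hypothetical name, the vendor string is assumed to come from CPUID ("GenuineIntel" or "AuthenticAMD"), and falling back to the Intel path for any other vendor is purely my assumption.

```python
def select_code_path(vendor, override=None):
    """Choose "amd" or "intel".

    vendor:   CPUID vendor string, e.g. "AuthenticAMD" or "GenuineIntel".
    override: "amd" (models --amd), "intel" (models --intel) or None (auto).
    """
    if override is not None:
        if override not in ("amd", "intel"):
            raise ValueError("override must be 'amd', 'intel' or None")
        return override  # a command-line switch beats auto-detection
    if vendor == "AuthenticAMD":
        return "amd"
    return "intel"  # assumed default for every other vendor string
```

For example, `select_code_path("AuthenticAMD")` returns "amd", while `select_code_path("AuthenticAMD", override="intel")` returns "intel".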

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.