
geoff 2007-08-05 21:49

gcwsieve 1.0.14
The main loop for SSE2 and x86-64 machines is now 100% assembly instead of a mixture of C and inline assembly, and tries to read memory in a more predictable way.

The 32-bit executable runs about 15% faster on my P4, and the 64-bit executable runs about 60% faster on my C2D, where 64-bit is now almost twice as fast as 32-bit.

geoff 2007-08-08 23:49

gcwsieve 1.0.15
The main loop for x86 machines without SSE2 is now 100% assembly. It runs about 30% faster on my P3.

geoff 2007-08-13 05:25

gcwsieve 1.0.16
Version 1.0.16 has support for software prefetching, using the prefetchnta instruction on machines with SSE, or GCC's __builtin_prefetch() function in builds for targets other than x86/x86-64.
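A minimal sketch of a sieve loop that prefetches a fixed distance ahead, assuming (hypothetically) that each term is stored as its exponent n in a uint64_t. The names PREFETCH_DISTANCE, prefetch_term, pow2_mod and count_divisible are illustrative, not gcwsieve's. The per-term test is the Cullen condition n*2^n+1 ≡ 0 (mod p), done here with a plain modular exponentiation for clarity, which a real sieve would not do per term. On SSE targets the hint _MM_HINT_NTA maps to prefetchnta; elsewhere __builtin_prefetch(addr, 0, 0) requests a read prefetch with no temporal locality.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical tuning parameter: how many terms ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Prefetch one term for reading, with no temporal locality. */
static inline void prefetch_term(const void *addr)
{
#if defined(__SSE__)
    _mm_prefetch((const char *)addr, _MM_HINT_NTA);  /* prefetchnta */
#else
    __builtin_prefetch(addr, 0, 0);  /* rw = 0 (read), locality = 0 */
#endif
}

/* 2^e mod p by square-and-multiply; p < 2^32 keeps products in 64 bits. */
static uint64_t pow2_mod(uint64_t e, uint64_t p)
{
    uint64_t result = 1 % p, base = 2 % p;
    while (e) {
        if (e & 1)
            result = result * base % p;
        base = base * base % p;
        e >>= 1;
    }
    return result;
}

/* Count the Cullen numbers n*2^n+1 (n taken from terms[]) divisible by p. */
static size_t count_divisible(const uint64_t *terms, size_t count,
                              uint64_t p, int use_prefetch)
{
    size_t hits = 0;
    assert(p > 1 && p < (UINT64_C(1) << 32));
    for (size_t i = 0; i < count; i++) {
        if (use_prefetch && i + PREFETCH_DISTANCE < count)
            prefetch_term(&terms[i + PREFETCH_DISTANCE]);
        uint64_t r = (terms[i] % p) * pow2_mod(terms[i], p) % p;
        if ((r + 1) % p == 0)  /* n*2^n ≡ -1 (mod p) */
            hits++;
    }
    return hits;
}
```

The prefetch only changes when data arrives in cache, never the result, so both settings must find the same factors.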

Prefetching should give a speedup when the sieve is too large to fit in the L2 cache (each sieve term takes 8 bytes), but on some machines it causes a slowdown instead, probably because it interferes with the automatic hardware prefetcher.

So before sieving starts, some test runs are made with and without prefetch, and the faster method is selected. Use the --verbose switch to see whether prefetch was selected. To override the automatic selection, use these new switches:

--prefetch: Force use of prefetch.
--no-prefetch: Prevent use of prefetch.
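The selection rule and its two overrides can be sketched as a small decision function. The enum and function names are hypothetical, and the two times are assumed to be seconds taken by the same benchmark run without and with prefetch.

```c
#include <assert.h>

/* Hypothetical names for the three states the switches imply. */
enum prefetch_mode {
    PREFETCH_AUTO,   /* no switch: decide from the benchmark */
    PREFETCH_FORCE,  /* --prefetch */
    PREFETCH_NEVER   /* --no-prefetch */
};

/* Return 1 to use prefetch, 0 otherwise. A tie keeps the plain loop. */
static int choose_prefetch(enum prefetch_mode mode,
                           double secs_without, double secs_with)
{
    switch (mode) {
    case PREFETCH_FORCE:
        return 1;
    case PREFETCH_NEVER:
        return 0;
    case PREFETCH_AUTO:
    default:
        assert(secs_without > 0 && secs_with > 0);
        return secs_with < secs_without;
    }
}
```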

Here are some times for a 216000-term sieve (PrimeGrid Cullen 10M) at p=1000e9:

CPU                    --no-prefetch     --prefetch  Change
P3 450MHz, 512KB L2       1167 p/sec     1502 p/sec    +29%
P3 600MHz, 256KB L2       1462 p/sec     1993 p/sec    +36%
P4 2.9GHz, 512KB L2      12224 p/sec    11711 p/sec     -4%
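The memory arithmetic behind these results: at 8 bytes per term, the 216000-term sieve needs 216000 × 8 = 1,728,000 bytes (about 1.7 MB), several times larger than any of the L2 caches listed. The small helpers below (hypothetical names) reproduce that figure and the rounded percentage changes in the last column.

```c
#include <assert.h>
#include <stddef.h>

#define BYTES_PER_TERM 8  /* each sieve term takes 8 bytes */

/* Memory needed for the sieve's term array. */
static size_t sieve_bytes(size_t terms)
{
    return terms * BYTES_PER_TERM;
}

/* Percentage change from rate `before` to rate `after`, rounded to the
   nearest whole percent (halves rounded away from zero). */
static int percent_change(double before, double after)
{
    assert(before > 0);
    double d = (after - before) * 100.0 / before;
    return d >= 0 ? (int)(d + 0.5) : -(int)(-d + 0.5);
}
```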

VolMike 2007-08-13 07:02

Could you provide an executable for a Windows XP Athlon machine?

geoff 2007-08-15 03:49

[QUOTE=VolMike;112344]Could you provide an executable for a Windows XP Athlon machine?[/QUOTE]

The executable at [url][/url] should work. Or is there a problem on that machine?

geoff 2007-08-18 23:19

gcwsieve 1.0.17
Version 1.0.17 should properly detect the availability of prefetch instructions on AMD machines that have 3DNow! but not SSE (some earlier Athlons).

A more compact ABC file format will now be written by default. The old format will still be written if the --multisieve switch is given. Either format can be used for the input file:

Old format:
ABC $a*$b^$a$c // CW Sieved to: 100000000000 with gcwsieve
2000055 2 +1
2000110 2 +1
2000116 2 +1
2000128 2 +1

New format:
ABC $a*2^$a+1 // CW Sieved to: 100000000000 with gcwsieve
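Going by the two headers, the new format fixes the base at 2 and the constant at +1, so presumably only $a has to be stored for each term. The converter below is a hypothetical sketch under that assumption (the data lines of the new format are not shown here, and the function names are illustrative): it reads an old-format line such as 2000055 2 +1, checks that it really is a Cullen term, and emits just the n value.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Parse one data line of the old "ABC $a*$b^$a$c" format,
   e.g. "2000055 2 +1". Returns 1 on success. */
static int parse_old_term(const char *line, uint64_t *a, int *b, int *c)
{
    return sscanf(line, "%" SCNu64 " %d %d", a, b, c) == 3;
}

/* Hypothetical conversion for the "ABC $a*2^$a+1" header: a term fits
   only if b == 2 and c == +1, and then only "$a\n" is written to out.
   Returns 1 on success, 0 if the term does not fit or out is too small. */
static int to_new_format(const char *old_line, char *out, size_t size)
{
    uint64_t a;
    int b, c;
    if (!parse_old_term(old_line, &a, &b, &c) || b != 2 || c != 1)
        return 0;
    int n = snprintf(out, size, "%" PRIu64 "\n", a);
    assert(n > 0);
    return (size_t)n < size;
}
```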

geoff 2007-08-22 23:05

gcwsieve 1.0.18
This version has two minor bugfixes:

Test for Extended 3DNow! instead of just 3DNow! to determine whether the prefetchnta instruction is available on AMD CPUs. This affected K6-2 CPUs, which have 3DNow! but not prefetchnta.
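A sketch of the corrected check, using the __get_cpuid helper from GCC's <cpuid.h>: prefetchnta is treated as available when CPUID reports SSE (leaf 1, EDX bit 25) or, following the test described above, Extended 3DNow! (leaf 0x80000001, EDX bit 30), but not for plain 3DNow! (EDX bit 31) alone. This is not gcwsieve's actual code; the function names are hypothetical, and on non-x86 targets the check simply reports "not available".

```c
#include <assert.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* 1 if EDX bit `bit` is set for CPUID leaf `leaf`; 0 if it is clear,
   the leaf is unsupported, or the target is not x86. */
static int cpuid_edx_bit(unsigned int leaf, int bit)
{
    assert(bit >= 0 && bit < 32);
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(leaf, &eax, &ebx, &ecx, &edx))
        return (edx >> bit) & 1;
#else
    (void)leaf;
#endif
    return 0;
}

static int has_prefetchnta(void)
{
    if (cpuid_edx_bit(1, 25))           /* SSE */
        return 1;
    if (cpuid_edx_bit(0x80000001, 30))  /* Extended 3DNow! (AMD) */
        return 1;
    return 0;  /* plain 3DNow! (bit 31, e.g. K6-2) is not enough */
}
```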

Use the best benchmark time instead of the average benchmark time when deciding whether or not to use software prefetching. The average times could be inaccurate when there were other processes running on the same CPU.
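The usual rationale for preferring the best time: another process can only add time to a benchmark run, never remove it, so the minimum over several runs is the closest estimate of undisturbed speed, while one disturbed run can pull the mean far enough to flip the decision. A minimal sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stddef.h>

/* Smallest time among n benchmark runs. */
static double best_time(const double *t, size_t n)
{
    assert(n > 0);
    double best = t[0];
    for (size_t i = 1; i < n; i++)
        if (t[i] < best)
            best = t[i];
    return best;
}

/* Mean of n benchmark runs, for comparison. */
static double mean_time(const double *t, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / n;
}
```

With made-up runs of 1.0, 1.0 and 5.0 seconds without prefetch (the last one disturbed) against 1.2 seconds each with prefetch, the means (about 2.33 vs 1.2) favour prefetch, while the best times (1.0 vs 1.2) correctly favour no prefetch.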

There are also some changes to the status line display. The percentage of CPU usage (cpu_time/elapsed_time) is now reported, and the status line alternates between these two sets of stats:
p=1071802477019, 249775 p/sec, 16 factors, 100.0% cpu, 2953 sec/factor
p=1071817422251, 249836 p/sec, 16 factors, 16.9% done, ETA 24 Aug 14:23

There are also two new switches that change the information displayed on the status line:

-R --report-primes

Reports primes/sec (the number of prime factors tested per second) instead of p/sec (the increase in p per second).

-e --elapsed-time

Reports p/sec, primes/sec, and sec/factor using elapsed time instead of CPU time.
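A sketch of how the first status line could be assembled, assuming (hypothetically) that p/sec is the increase in p since the start divided by CPU seconds, primes/sec is the number of primes tested divided by the same time, and sec/factor is that time divided by the factors found. -R and -e map onto the report_primes and use_elapsed flags. The struct and function are illustrative, not gcwsieve's code, and the alternate line with % done and ETA is not modelled.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical snapshot of sieve progress. */
struct sieve_stats {
    uint64_t p_start, p_now;  /* first p sieved, current p */
    uint64_t primes_tested;   /* prime factors tried so far */
    unsigned factors;         /* factors found so far */
    double cpu_sec, elapsed_sec;
};

/* report_primes mirrors -R, use_elapsed mirrors -e.
   CPU usage is always cpu_time/elapsed_time. */
static void format_status(char *buf, size_t size, const struct sieve_stats *s,
                          int report_primes, int use_elapsed)
{
    assert(size > 0 && s->cpu_sec > 0 && s->elapsed_sec > 0);
    double secs = use_elapsed ? s->elapsed_sec : s->cpu_sec;
    double rate = report_primes ? s->primes_tested / secs
                                : (s->p_now - s->p_start) / secs;
    snprintf(buf, size, "p=%" PRIu64 ", %.0f %s/sec, %u factors, %.1f%% cpu",
             s->p_now, rate, report_primes ? "primes" : "p",
             s->factors, 100.0 * s->cpu_sec / s->elapsed_sec);
    if (s->factors > 0) {  /* sec/factor is undefined with no factors */
        size_t used = strlen(buf);
        snprintf(buf + used, size - used, ", %.0f sec/factor",
                 secs / s->factors);
    }
}
```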

geoff 2007-09-02 03:09

gcwsieve 1.0.20 (x86-64)
The x86-64 executable now has separate code paths optimised for Intel (Core 2) and AMD (Athlon 64) CPUs. The Athlon 64 code path should be about 15% faster than in previous versions. Thanks to jmblazek for testing it.

The appropriate code path should be selected automatically, but can be overridden with the --amd or --intel command-line switches.
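The automatic choice can be sketched from the CPUID vendor string (leaf 0, returned in EBX, EDX, ECX), with the two switches acting as overrides. The enum and function names are hypothetical, and falling back to the Intel path for vendors other than AMD is an assumption, not documented behaviour.

```c
#include <assert.h>
#include <string.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

enum code_path { PATH_INTEL, PATH_AMD };
enum path_override { OVERRIDE_NONE, OVERRIDE_INTEL, OVERRIDE_AMD }; /* --intel, --amd */

/* Copy the 12-character vendor string into out (empty if unavailable). */
static void cpu_vendor(char out[13])
{
    memset(out, 0, 13);
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx)) {
        memcpy(out, &ebx, 4);
        memcpy(out + 4, &edx, 4);
        memcpy(out + 8, &ecx, 4);
    }
#endif
}

/* An explicit switch wins; otherwise "AuthenticAMD" selects the AMD path
   and any other vendor falls back to the Intel path (an assumption). */
static enum code_path select_path(enum path_override ov)
{
    char vendor[13];
    assert(ov == OVERRIDE_NONE || ov == OVERRIDE_INTEL || ov == OVERRIDE_AMD);
    if (ov == OVERRIDE_INTEL)
        return PATH_INTEL;
    if (ov == OVERRIDE_AMD)
        return PATH_AMD;
    cpu_vendor(vendor);
    return strcmp(vendor, "AuthenticAMD") == 0 ? PATH_AMD : PATH_INTEL;
}
```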

All times are UTC.