mersenneforum.org

-   Prime Cullen Prime (https://www.mersenneforum.org/forumdisplay.php?f=79)
-   -   gcwsieve (https://www.mersenneforum.org/showthread.php?t=7788)

em99010pepe 2007-04-08 14:03

gcwsieve
 
Geoff,

How do I feed the client with more ranges without stopping it? I like the srwork.txt thing you have on the other clients.
How do I turn sse2 on?

Best Regards,

Carlos

geoff 2007-04-10 04:26

[QUOTE=em99010pepe;103269]How do I feed the client with more ranges without stopping it? I like the srwork.txt thing you have on the other clients.[/QUOTE]
I haven't added that feature yet. Unlike with srsieve, it should usually be a little faster to start a new sieve job using the latest sieve file than to continue using the old file anyway. On Linux you can use the batch command to queue up jobs.

[QUOTE]How do I turn sse2 on?[/QUOTE]
For the linux-x86_64 binary it is always on. For the 32-bit binaries it should be detected automatically: running with the --verbose option will print "Using SSE2 code path" if it is being used. You can force it on with the --sse2 switch, or force it off with the --no-sse2 switch. If it is not being detected automatically then that is a bug.

I am not sure how much advantage machines other than the Pentium 4 will get from the SSE2 code, so it may be worthwhile trying both --sse2 and --no-sse2 to see which is faster.
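For anyone curious how "detected automatically, but can be forced on or off" might fit together, here is a minimal C sketch. It is not gcwsieve's code: the enum and function names are hypothetical, and it assumes a GCC or Clang compiler that provides `<cpuid.h>`. On 32-bit x86 it reads SSE2 support from CPUID leaf 1 (bit 26 of EDX); every x86-64 CPU has SSE2, which matches the 64-bit binary always using it.

```c
#include <assert.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical tri-state mirroring "no switch", --sse2 and --no-sse2. */
enum sse2_choice { SSE2_AUTO, SSE2_FORCE_ON, SSE2_FORCE_OFF };

/* Runtime check: CPUID leaf 1 reports SSE2 support in bit 26 of EDX. */
static int cpu_has_sse2(void)
{
#if defined(__x86_64__)
    return 1;                          /* SSE2 is part of the x86-64 baseline */
#elif defined(__i386__)
    unsigned a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d))
        return 0;                      /* CPUID leaf 1 not available */
    return (int)((d >> 26) & 1);
#else
    return 0;                          /* not an x86 CPU */
#endif
}

/* An explicit switch always wins; otherwise fall back to detection. */
static int use_sse2(enum sse2_choice choice)
{
    assert(choice == SSE2_AUTO || choice == SSE2_FORCE_ON || choice == SSE2_FORCE_OFF);
    if (choice == SSE2_FORCE_ON)
        return 1;
    if (choice == SSE2_FORCE_OFF)
        return 0;
    return cpu_has_sse2();
}
```

Keeping the override separate from the detection is what makes the suggested experiment (timing both --sse2 and --no-sse2) possible on any machine that has SSE2.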

em99010pepe 2007-04-10 20:43

Thanks Geoff.

Carlos

em99010pepe 2007-04-17 17:20

Geoff,

I want to bring more machines (adding at least 4 cores) to sieve but the client needs to be run in:

1º hidden mode
2º have the ability to queue up jobs
3º save its progress to an output file

The way the client is started, with that list of flags, it's very difficult to hide when using HideItX. I prefer the srsieve feature....

What's your opinion again? Could you implement the srsieve feature in gcwsieve?
By the way, gcwsieve 1.0.4 is faster than gcwsieve 1.0.6 on my AMD 64 but I still need to make more tests.


Carlos

Citrix 2007-04-18 02:45

[QUOTE=em99010pepe;103893]

The way the client is started, with that list of flags, it's very difficult to hide when using HideItX. I prefer the srsieve feature....


Carlos[/QUOTE]

It is possible. Create a batch (.bat) file containing the command line and start the .bat file with HideItX.

geoff 2007-04-18 05:21

[QUOTE=em99010pepe;103893]1º hidden mode[/QUOTE]
If you create a file called `gcwsieve-command-line.txt' with one line consisting of the command line you want to use, then run gcwsieve without any command line arguments, it will read that file and run as if you had typed that line at the console. Does this help?
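To illustrate the mechanism, here is a minimal C sketch of reading the first line of such a file and splitting it into an argv-style array. It is only an illustration under simple assumptions (whitespace-separated arguments, no quoting, no program name in the result); `read_command_line` is a hypothetical helper, not gcwsieve's own parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read the first line of f into buf and split it on whitespace.
   Pointers into buf are stored in argv[0..n-1]. Returns n, or -1 on
   read error or if there are more than max_args arguments. */
static int read_command_line(FILE *f, char *buf, size_t bufsize,
                             char **argv, int max_args)
{
    assert(f != NULL && buf != NULL && bufsize > 0 && max_args > 0);
    if (fgets(buf, (int)bufsize, f) == NULL)
        return -1;                     /* empty file or read error */
    int argc = 0;
    for (char *tok = strtok(buf, " \t\r\n"); tok != NULL;
         tok = strtok(NULL, " \t\r\n")) {
        if (argc == max_args)
            return -1;                 /* too many arguments for argv[] */
        argv[argc++] = tok;
    }
    return argc;                       /* only the first line is used */
}
```

A program using this would call it only when it was started with no arguments of its own, then parse the returned array exactly as it would parse the real command line.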

[QUOTE]2º have the ability to queue up jobs[/QUOTE]
I will add a work file facility sometime soon.

[QUOTE]3º save its progress to an output file[/QUOTE]
You can do this by creating a copy of the input file, call it sieve.txt say, change the number on the first line to <pmin>, then run `gcwsieve -i sieve.txt -o sieve.txt -f factors.txt -s<minutes_between_saves> -P<pmax>...'. If you stop and restart with exactly the same command line each time, it will continue from the last save point.

[QUOTE]By the way, gcwsieve 1.0.4 is faster than gcwsieve 1.0.6 on my AMD 64 but I still need to make more tests.[/QUOTE]

This is interesting :-) There are two changes between 1.0.4 and 1.0.6 that might affect performance:

The 1.0.6 SSE2 code processes 4 terms at a time instead of 2 at a time in 1.0.4. The idea behind this code was to overcome the long Pentium 4 pipeline and it does run a lot faster on my P4, but it is possible that this code is slower on the AMD64 -- it would be interesting if that is the case. You could test this by running both versions with the --no-sse2 switch and see if 1.0.6 is still slower.

In 1.0.6 the AMD cache size detection has (hopefully) been fixed. You can run using the old settings from version 1.0.4 by adding the command line switches -l16 -L256. If you are running multiple instances of the program on an n-core machine then it might be faster to set the L2 cache size to 1/n of its actual size. The L1 cache size probably doesn't have much effect.

Let me know what the results of your experiments are.

em99010pepe 2007-04-18 07:43

Geoff,

The 1.0.6 SSE2 code doubled the output of my new work machine, a dual core P4 3.0GHz: it went from 107 kp/s to 222 kp/s... thank you.
I noticed that on my AMD the highest performance was achieved by the 1.0.4 SSE2 code with caches -l16 -L256. Playing with the cache flags on the 1.0.6 code, I never achieved the performance of the 1.0.4 SSE2 code.
Later today I will check the ability to hide the client.

Cheers,

Carlos

Citrix 2007-04-18 22:16

Geoff, so is gcwsieve.exe now faster than multisieve?:tu:

geoff 2007-04-19 04:17

[QUOTE=Citrix;103964]Geoff, so is gcwsieve.exe now faster than multisieve?:tu:[/QUOTE]

For sieving this project with a Pentium 4 it looks like it is :-) Actually, I am sieving with two threads on my 2.9GHz hyperthreaded P4 and getting 475 kp/s total, which is a lot faster per clock than my 800MHz P3 which gets 82 kp/s on the current sieve file.

Unfortunately from Carlos's results it looks like the new SSE2 code is only suited to the P4. Since the only Windows machines I have access to are P4 and Celeron D, I can't really compare to Multisieve for other machines.

If you think there is some way the SSE2 code from gcwsieve can be used in MultiSieve, let me know. I may be able to put it into the form of an external function that could be linked by MSC.

Carlos: In a future version I think I'll use the old SSE2 code in the gcwsieve-amd build and only use the new code in the gcwsieve-intel build.

Citrix 2007-04-19 04:28

[QUOTE=geoff;103989]
If you think there is some way the SSE2 code from gcwsieve can be used in MultiSieve, let me know. I may be able to put it into the form of an external function that could be linked by MSC.

[/QUOTE]


I think this will be useful, I can give it a try. Do you have any fast assembly routines for modular 64 bit addition too? (There are none built into multisieve.) I am working on a new algorithm for this project, where I can replace the multiplications with additions.

Btw, for the 3^16 search, your program says it can calculate 6 million primes/sec. But it takes about 100 sec to do 1 billion. So it is looking at approx 600 million primes per billion. But there are not 600 million primes in a billion, as 1/2 the numbers are even. So where is the error in the calculations?

geoff 2007-04-19 05:32

[QUOTE=Citrix;103990]I think this will be useful, I can give it a try. Do you have any fast assembly routines for modular 64 bit addition too? (There are none built into multisieve.) I am working on a new algorithm for this project, where I can replace the multiplications with additions.
[/QUOTE]
I don't know if there is anything fundamentally faster than this. Assuming 0 <= a,b < p < 2^63:
[code]
x = a+b;
if (x >= p) x -= p;
[/code]
Two can be done in parallel with SSE2, and the `if' can use a conditional move instead of a branch. I think you would need to be adding two large arrays together, or adding a constant to each element of a large array, before hand-written assembler would be much faster than what the C compiler generates.
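Here is the two-line fragment as a small, compilable C sketch under the same assumption (0 <= a,b < p < 2^63, so a+b cannot overflow 64 bits), plus the "add a constant to each element of a large array" case. The names `addmod` and `addmod_const` are illustrative and not taken from MultiSieve or gcwsieve; whether the compiler actually emits a conditional move depends on the compiler and flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (a + b) mod p, assuming 0 <= a, b < p < 2^63 so a + b fits in 64 bits. */
static inline uint64_t addmod(uint64_t a, uint64_t b, uint64_t p)
{
    assert(a < p && b < p && p < (UINT64_C(1) << 63));
    uint64_t x = a + b;
    /* A select rather than an if-statement: compilers often use cmov here. */
    return x >= p ? x - p : x;
}

/* Add a constant c to every element of v (mod p), in place.
   This is the kind of loop where hand-written SSE2 (two 64-bit lanes
   per instruction) might beat compiler output. */
static void addmod_const(uint64_t *v, size_t len, uint64_t c, uint64_t p)
{
    for (size_t i = 0; i < len; i++)
        v[i] = addmod(v[i], c, p);
}
```

Because the result is always reduced back into [0, p), the output of one `addmod` can be fed straight into the next without any extra correction step.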

[QUOTE]
Btw, for the 3^16 search, your program says it can calculate 6 million primes/sec. But it takes about 100 sec to do 1 billion. So it is looking at approx 600 million primes per billion. But there are not 600 million primes in a billion, as 1/2 the numbers are even. So where is the error in the calculations?[/QUOTE]
The p/sec figure is just the increase in p per second, not the number of primes calculated. I know it is not good notation :-(

The reason testing 3^16*2^(16*n)+1 is so fast is that most of the primes don't even have to be generated. Every prime factor of 3^16*2^(16*n)+1 = (3*2^n)^16+1 has the form 32*x+1, so the Sieve of Eratosthenes only sieves numbers of that form; it doesn't even look at the others.
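Why only 32*x+1: if a prime p divides y^16+1 with y = 3*2^n, then y^16 ≡ -1 (mod p) and y^32 ≡ 1 (mod p), so y has multiplicative order exactly 32 and 32 divides p-1. Only about one prime in 16 has this form, which is part of why the p/sec figure is so much larger than the number of primes actually tested. The C sketch below sieves just that arithmetic progression and counts the primes it leaves. It is a small illustration, not the program's sieve: the function names are hypothetical, and a real siever would use a segmented sieve rather than one array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Trial division; only used for the small sieving primes q <= sqrt(limit). */
static int is_small_prime(uint64_t q)
{
    if (q < 2)
        return 0;
    for (uint64_t d = 2; d * d <= q; d++)
        if (q % d == 0)
            return 0;
    return 1;
}

/* Smallest x >= 0 with 32*x + 1 == 0 (mod q); it exists because gcd(32, q) = 1. */
static uint64_t first_x(uint64_t q)
{
    assert(q % 2 == 1);
    uint64_t x = 0;
    while ((32 * x + 1) % q != 0)
        x++;                                 /* at most q - 1 steps */
    return x;
}

/* Count primes p <= limit with p == 1 (mod 32), sieving only candidates 32*x + 1. */
static size_t count_primes_1mod32(uint64_t limit)
{
    if (limit < 2)
        return 0;
    uint64_t nx = (limit - 1) / 32 + 1;      /* candidates x = 0 .. nx-1 */
    unsigned char *composite = calloc(nx, 1);
    if (composite == NULL)
        return 0;                            /* allocation failure: report nothing */
    composite[0] = 1;                        /* 32*0 + 1 = 1 is not prime */
    for (uint64_t q = 3; q * q <= limit; q += 2) {
        if (!is_small_prime(q))
            continue;
        for (uint64_t x = first_x(q); x < nx; x += q)
            if (32 * x + 1 != q)             /* don't cross off q itself */
                composite[x] = 1;
    }
    size_t count = 0;
    for (uint64_t x = 0; x < nx; x++)
        count += !composite[x];
    free(composite);
    return count;
}
```

Below 1000 this leaves 97, 193, 257, 353, 449, 577, 641, 673, 769 and 929, while the even numbers and the other residue classes are never looked at.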

geoff 2007-04-20 02:55

[QUOTE=em99010pepe;103893]2º have the ability to queue up jobs
3º save its progress to an output file
[/QUOTE]

I have added the workfile and checkpoint features in version 1.0.7, and have also added default names for the input and factors files, so now if you start it without specifying the -p, -P, -i or -f switches you get this behaviour:

Input sieve read from `sieve.txt'
Factors written to `factors.txt'
Checkpoints written to `checkpoint.txt'
Ranges read from `work.txt'

work.txt has the same format as for sr5sieve: one range per line, in billions, e.g.:
1500,1600
2000,2200
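A minimal C sketch of reading this format, assuming each line is exactly "pmin,pmax" in billions as in the example. The names `parse_work_line` and `count_work_ranges` are illustrative, not sr5sieve's or gcwsieve's own parser, and rejecting empty or reversed ranges is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BILLION UINT64_C(1000000000)

/* Parse one "pmin,pmax" line given in billions into absolute p bounds.
   Returns 1 on success, 0 for malformed, empty or reversed ranges. */
static int parse_work_line(const char *line, uint64_t *pmin, uint64_t *pmax)
{
    assert(line != NULL && pmin != NULL && pmax != NULL);
    unsigned long long lo, hi;
    if (sscanf(line, "%llu,%llu", &lo, &hi) != 2 || lo >= hi)
        return 0;
    *pmin = (uint64_t)lo * BILLION;
    *pmax = (uint64_t)hi * BILLION;
    return 1;
}

/* Read the file line by line, as a queue of ranges; count the valid ones.
   A real client would sieve each range in turn instead of just counting. */
static int count_work_ranges(FILE *f)
{
    char line[128];
    uint64_t lo, hi;
    int n = 0;
    while (fgets(line, sizeof line, f) != NULL)
        if (parse_work_line(line, &lo, &hi))
            n++;
    return n;
}
```

With the example file, the first line becomes the range 1.5e12 <= p < 1.6e12 and the second 2.0e12 <= p < 2.2e12.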

To run multiple gcwsieve processes you will either need to create a separate directory for each process, or else specify the file names using the command line switches.

Carlos: I have made the SSE2 a little (4%) faster on my Northwood P4, can you check whether it is still slower than version 1.0.4 on your AMD64? If it is still slower then I will add the 1.0.4 code to the AMD build in the next version.

em99010pepe 2007-04-20 06:23

Geoff,

A quick test shows it is still slower (by 5-8 kp/s) than version 1.0.4.

Carlos

em99010pepe 2007-04-20 08:28

Geoff,

Just tested the latest version on my P4 3.0GHz HT and got a decrease in sieve speed from 222 kp/s to 190 kp/s.

Carlos

geoff 2007-04-21 03:14

[QUOTE=em99010pepe;104083]Just tested the latest version on my P4 3.0Ghz HT and got a decrease of sieve speed from 222 kp/s to 190kp/s.
[/QUOTE]

Thanks. For comparison at p=1500e9 with the 4352 term sieve file: on my 2.9GHz HT P4 I get 290 kp/s, or 505 kp/s by running two hyperthreads.

I'm not really sure what is going on with the speeds here, some of the code is the same as is used in sr5sieve, but the main loop is different so there are more possible complications. It may be that by trying to finely optimise for my own machines, I am making code that will not run fast on any others.

em99010pepe 2007-04-21 08:24

I always had this problem even when I was running sr5sieve. One of the reasons I stopped helping was because the sieve speed was decreasing each time a new version was released.

The previous sieve speeds were measured at p=1800e9.

For my AMD 64 3000+ the optimal cache settings seem to be L1=16KB and L2=256KB. I also don't know what's going on... the only way I can help is by testing your clients on different machines.

Carlos

geoff 2007-04-21 22:55

I have made versions 1.0.4a and 1.0.6a using the SSE2 routines from 1.0.4 and 1.0.6, but otherwise they are the same as version 1.0.7.

There is one other setting that you can use to try to speed up the sieve: The -d switch allows you to set the maximum gap between exponents manually. Extra dummy terms will be added to fill any larger gaps. You can see the gap size chosen by running with the -v switch, and try something a bit smaller or larger. The optimal gap size will change with each new sieve file unfortunately.
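As a rough illustration of how the -d setting changes the number of terms, here is a C sketch that counts the minimum number of dummy exponents needed so that no gap between consecutive exponents exceeds a chosen maximum. How gcwsieve actually places its dummy terms isn't described in the post, and `dummies_needed` is a hypothetical helper. Presumably (an assumption, the post doesn't say) a smaller maximum gap means fewer step values to precompute per prime but more dummy terms to work through, which would explain why the optimum changes with each sieve file.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum number of dummy exponents so no gap between consecutive exponents
   exceeds max_gap: a gap g needs ceil(g / max_gap) - 1 = (g - 1) / max_gap. */
static unsigned long dummies_needed(const unsigned long *n, size_t count,
                                    unsigned long max_gap)
{
    assert(max_gap > 0);
    unsigned long extra = 0;
    for (size_t i = 1; i < count; i++) {
        assert(n[i] > n[i - 1]);             /* exponents sorted, no duplicates */
        extra += (n[i] - n[i - 1] - 1) / max_gap;
    }
    return extra;
}
```

For example, exponents 2000055, 2000110, 2000116 and 2000128 (gaps 55, 6 and 12) need 6 dummy terms with a maximum gap of 10, 13 with a maximum gap of 5, and none with a maximum gap of 64.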

Carlos: If you happen to remember which sr5sieve version was fastest for your machines I can make it available for download again. But as you can see here with gcwsieve, between three machines -- two of them Pentium 4's -- we have three different versions already :-).

em99010pepe 2007-04-22 09:54

Geoff,

I really can't remember which sr5sieve version was fastest on my machines. Last November I moved all machines to another project...

About gcwsieve, I think the problem here is a memory one.
Version 1.0.4 with L1=16KB and L2=256KB is faster (by about 6 kp/s) than version 1.0.4a with the same cache settings. I noticed that the latter detects the cache sizes, while 1.0.4 uses L1=16KB and L2=256KB by default.

Carlos

em99010pepe 2007-04-23 20:44

Geoff,

I don't know if this matters, but I tried sr1sieve for this [url=http://www.mersenneforum.org/showthread.php?t=6514]sieve project[/url] and got my fastest times with an L1 cache size of 32KB and an L2 cache size of 512KB, the real sizes for my machine.

Carlos

geoff 2007-06-10 22:18

gcwsieve 1.0.8
 
Compared to the previous version, this one is about 10% faster on my P4 and about 5% faster on my P3. There is no need to upgrade if it turns out to be slower on your machine.

On Windows the p/sec and sec/factor rates are now measured in CPU-seconds for consistency with the Unix versions. (Same as recent versions of sr5sieve). To get a valid comparison with previous versions, run them on an otherwise idle CPU.

geoff 2007-06-18 22:40

gcwsieve 1.0.9
 
Some changes to the 32-bit SSE2 assembler (taken from sr5sieve 1.5.6) have had a big effect on P4 performance. Here are some times for my 2.9 GHz P4 at p=4200e9 using the 2947 term sieve file for the 2.5-5.0 million n range:
[code]
Version   Single Thread     2 Hyperthreads
-------   -------------     --------------
1.0.7:    405 kp/s          700 kp/s
1.0.8:    446 kp/s (+10%)   720 kp/s (+3%)
1.0.9:    644 kp/s (+44%)   786 kp/s (+9%)
[/code]
I don't know whether other SSE2 machines will benefit by as much, or at all.

My P4 is still better off LLR testing, but my P4/Celeron would now be more productive sieving, even just on the 2.5-5 million range. (It is the same speed as a full P4 for sieving, but only half speed for LLR testing which it is doing at the moment).

hhh 2007-06-19 09:43

Great news indeed.

Sieving the 5-25M range will be done up to 100G in a couple of days, and I think we can start a new sieve drive then. Is that OK for you, or do you suggest to reopen the late 1.5-5M range?

Yours H.

Citrix 2007-06-20 02:13

[QUOTE=hhh;108541]Great news indeed.

Sieving the 5-25M range will be done up to 100G in a couple of days, and I think we can start a new sieve drive then. Is that OK for you, or do you suggest to reopen the late 1.5-5M range?

Yours H.[/QUOTE]

Since 1.5-5M has been P-1ed, it might be better if we start to sieve 5M to 25M.

Can the 1.5 to 5M numbers be P+1ed? What is the command line for Prime95 to do this?:smile:

hhh 2007-06-20 13:05

P-1 has been done only to 3M. See the [URL="http://www.mersenneforum.org/showthread.php?t=7615"]reservation thread[/URL].

I decided to stop the support of sieving mainly because the human overhead was just too big.
But I can look up the P+1 command, and publish some worktodo.ini's, for 3M upwards. Would there be some interest?

H.

geoff 2007-06-21 02:10

[QUOTE=hhh;108541]Sieving the 5-25M range will be done up to 100G in a couple of days, and I think we can start a new sieve drive then. Is that OK for you, or do you suggest to reopen the late 1.5-5M range?
[/QUOTE]

I wouldn't mind sieving 2.5M < n < 5M, 4200T < p < 5000T on my P4/Celeron, as it will be a little more productive than LLR testing, but if you don't want to reopen sieving for that range then that is OK, I will just continue with LLR.

hhh 2007-06-21 19:49

Go ahead. If it's a big chunk reservation it should be fine. I'll import it with the P-1 results very much later then.

Sieving to 100G finished today; I will do the necessary tomorrow, but I'll create a poll already. H.

geoff 2007-07-15 21:23

gcwsieve 1.0.10 (x86-64)
 
This version has a new main loop in x86-64 assembler. It should be a lot faster than previous versions for those running 64-bit Linux (is there anyone?), but the code has not been tested at all, so please check that the results match those produced by the 32-bit binary.

There is no need to upgrade if you are using the 32-bit binary.

jasong 2007-07-18 04:10

[QUOTE=geoff;110448]This version has a new main loop in x86-64 assembler. It should be a lot faster than previous versions for those running 64-bit Linux (is there anyone?), but the code has not been tested at all, so please check that the results match those produced by the 32-bit binary.

There is no need to upgrade if you are using the 32-bit binary.[/QUOTE]
I'd be willing to try it, but not until tomorrow afternoon. The soonest I can give a result, unless I have insomnia, is about 16 hours from now.

jasong 2007-07-20 03:33

Um, the sieve file seems to be screwed up. The lines have the n-value printed twice, but judging from the equation at the beginning, they should only be printed once per line.

geoff 2007-07-20 03:54

[QUOTE=jasong;110785]Um, the sieve file seems to be screwed up. The lines have the n-value printed twice, but judging from the equation at the beginning, they should only be printed once per line.[/QUOTE]

Yes it is :-(. You can delete the first column with the cut command:

[code]
$ cut -d\  -f 2,3,4 INFILE > OUTFILE
[/code]

Note that there need to be two spaces after the -d\ (the escaped space is the delimiter, and the second space separates it from -f).

jasong 2007-07-20 04:41

64-bit Linux on AMD gives a little over 620 kp/s.

geoff 2007-07-20 23:17

[QUOTE=jasong;110789]64-bit Linux on AMD give a little over 620K a second.[/QUOTE]

If possible could you send me a copy of the factors for a range of about 5G or so for double checking? Email address is in the README file.

Also, are you able to compare that with the 32-bit binary on the same machine? It is possible that the 32-bit SSE2 code is faster than the 64-bit code, and if that was the case then I could probably improve it using 32-bit SSE2 together with the extra SSE registers on the x86-64.

jasong 2007-07-21 20:53

The 64-bit code works perfectly. When I unzipped the 32-bit version to the same directory and tried to run it, the OS claimed the file didn't exist, even though the 'ls' command listed it as being there.

My command was ./gcwsieve. I even verified that there were no invisible spaces in the filename.

geoff 2007-07-22 23:21

[QUOTE=jasong;110891]The 64-bit code works perfectly. When I unzipped the 32-bit version to the same directory and tried to run it, the OS claimed the file didn't exist, even though the 'ls' command listed it as being there.
[/QUOTE]

OK, that is probably because you don't have 32-bit system libraries installed.

My main concern was to check that the 64-bit code works correctly, as it hasn't been tested before, so thanks for helping with that.

jasong 2007-07-23 04:50

[QUOTE=geoff;110941]OK, that is probably because you don't have 32-bit system libraries installed.

My main concern was to check that the 64-bit code works correctly, as it hasn't been tested before, so thanks for helping with that.[/QUOTE]

So, I won't be able to run ANY 32-bit apps?

geoff 2007-07-24 00:29

[QUOTE=jasong;110950]So, I won't be able to run ANY 32-bit apps?[/QUOTE]

If the problem is missing 32-bit libraries, then I guess you will only be able to run statically linked 32-bit apps. It should be simple enough to install the 32-bit libraries though, unless you are using a live CD or something like that.

jasong 2007-07-24 03:03

[QUOTE=geoff;111011]If the problem is missing 32-bit libraries, then I guess you will only be able to run statically linked 32-bit apps. It should be simple enough to install the 32-bit libraries though, unless you are using a live CD or something like that.[/QUOTE]
I've got more than a week before gcwsieve completes the range, so not a big concern.

geoff 2007-07-30 23:49

In version 1.0.9 the 32-bit code was actually faster than the 64-bit code. But in version 1.0.10 the 64-bit code is faster again.

A quick test run on a PrimeGrid range with a C2D @ 2.67GHz:
[code]
version 1.0.9  64-bit:   61 kp/s
version 1.0.11 32-bit:   83 kp/s
version 1.0.11 64-bit:  100 kp/s
[/code]

edit: from the Cullen 2M sieve, p=1000e9

rogue 2007-08-01 01:16

Version 1.0.11 on PowerPC64 (at 2.5 GHz):

457243 p/sec

:shock:

geoff 2007-08-01 02:01

[QUOTE=rogue;111456]Version 1.0.11 on PowerPC64 (at 2.5 GHz):

457243 p/sec

:shock:[/QUOTE]
Which sieve file was that with? If it is with the current 5.0M < n < 7.5M file for this project then that is a good time, but not too surprising.

For comparison a 2.9GHz P4 does about 500 kp/s on that file at p=100e9.

There is room for improvement in the ppc64 code. Currently the ppc64 uses the same method as the non-SSE2 x86 machines, which process the candidates one at a time, while the SSE2 and x86-64 code does them 4 at a time.

rogue 2007-08-01 02:32

That was with the above file (after fixing the input). It was at p=1000e9.

Are you saying it doesn't have the improvements that were done to sr2sieve and sr5sieve?

geoff 2007-08-01 03:36

[QUOTE=rogue;111463]That was with the above file (after fixing the input). It was at p=1000e9.

Are you saying it doesn't have the improvements that were done to sr2sieve and sr5sieve?[/QUOTE]

No, the main loop in gcwsieve doesn't benefit from those improvements because each new computation a*b (mod p) has new values of a and b.

The x86 and ppc64 main loop looks a bit like this:
[code]
for (i = 0; i < n; i++) {
    X[i] = X[i] * Y[i] (mod p)
    if (X[i] == Z[i])
        /* Found a factor */
}
[/code]
With the SSE2 and (from 1.0.10) the x86-64 versions it is vectorised a bit like this:
[code]
m = n/4
for (i = 0; i < m; i++) {
    X[i+0*m] = X[i+0*m] * Y[i+0*m] (mod p)
    ...
    X[i+3*m] = X[i+3*m] * Y[i+3*m] (mod p)
    if (X[i+0*m] == Z[i+0*m] || ... || X[i+3*m] == Z[i+3*m])
        /* Found a factor */
}
[/code]
The vectorisation can't be done automatically by the C compiler because the initial values of X[0], X[m], X[2*m], X[3*m] can't be inferred from the original loop. (They are computed separately with powmod.)
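To make the two loop shapes concrete, here is a C sketch for Cullen numbers n*2^n+1, where a term is removed when 2^n ≡ -1/n (mod p). It assumes, as the powmod remark suggests, that X is carried forward as a running power of 2 (each step multiplies by 2^gap), so in the scalar loop every multiply waits for the previous one, while the four strands each get their own powmod start value and can run side by side. This is an illustration of the data flow, not gcwsieve's code: Y and Z are recomputed inline for brevity where a real siever would precompute them, the function names are hypothetical, and `unsigned __int128` (a GCC/Clang extension on 64-bit targets) stands in for the SSE2/x86-64 assembler.

```c
#include <assert.h>
#include <stdint.h>

/* a*b mod p via the unsigned __int128 extension. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p)
{
    return (uint64_t)(((unsigned __int128)a * b) % p);
}

static uint64_t powmod(uint64_t b, uint64_t e, uint64_t p)
{
    uint64_t r = 1 % p;
    for (b %= p; e > 0; e >>= 1) {
        if (e & 1)
            r = mulmod(r, b, p);
        b = mulmod(b, b, p);
    }
    return r;
}

/* Z for exponent n: n*2^n + 1 == 0 (mod p) iff 2^n == -1/n (mod p).
   Assumes p is an odd prime not dividing n (inverse via Fermat). */
static uint64_t target(uint64_t n, uint64_t p)
{
    assert(n % p != 0);
    return p - powmod(n % p, p - 2, p);
}

/* Scalar loop: one running X = 2^ns[i], each multiply depends on the last. */
static int scan_scalar(const uint64_t *ns, int count, uint64_t p, int *hit)
{
    int found = 0;
    uint64_t x = powmod(2, ns[0], p);
    for (int i = 0; i < count; i++) {
        if (i > 0)
            x = mulmod(x, powmod(2, ns[i] - ns[i - 1], p), p);
        hit[i] = (x == target(ns[i], p));
        found += hit[i];
    }
    return found;
}

/* Four strands of m terms; strand s starts at index s*m with its own powmod,
   so the four multiplies in each iteration are independent. */
static int scan_4way(const uint64_t *ns, int count, uint64_t p, int *hit)
{
    assert(count > 0 && count % 4 == 0);
    int m = count / 4, found = 0;
    uint64_t x[4];
    for (int s = 0; s < 4; s++)
        x[s] = powmod(2, ns[s * m], p);
    for (int j = 0; j < m; j++) {
        for (int s = 0; s < 4; s++) {
            int i = s * m + j;
            if (j > 0)
                x[s] = mulmod(x[s], powmod(2, ns[i] - ns[i - 1], p), p);
            hit[i] = (x[s] == target(ns[i], p));
            found += hit[i];
        }
    }
    return found;
}
```

With n = 1..12 and p = 13 both versions flag the same four terms, n = 4, 7, 11 and 12 (for example 4*2^4+1 = 65 = 5*13), which is the point: the split changes the order of work, not the result.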

geoff 2007-08-01 04:23

Just to correct the previous post: Yes the improvements to the ppc64 assembler are in gcwsieve 1.0.10. They may speed up some other parts of the code, but they don't help with the main loop, there is still room for improvement there.

geoff 2007-08-03 01:24

gcwsieve 1.0.13
 
This version fixes a memory allocation bug that could cause the program to abort at the end of a sieve range, or a memory leak if there were multiple ranges queued up in the work file.

No work needs to be repeated, as all results for the range would have been written to file before the abort. The affected builds were:

Windows: versions 1.0.0 - 1.0.10.
OS X: versions 1.0.0 - 1.0.12.

The bug didn't affect the Linux builds. Thanks rogue for finding it.

geoff 2007-08-05 21:49

gcwsieve 1.0.14
 
The main loop for SSE2 and x86-64 machines is now 100% assembly instead of a mixture of C and inline assembly, and tries to read memory in a more predictable way.

The 32-bit executable runs about 15% faster on my P4, and the 64-bit executable runs about 60% faster on my C2D. (64-bit is now almost twice as fast as 32-bit on the C2D).

geoff 2007-08-08 23:49

gcwsieve 1.0.15
 
The main loop for x86 machines without SSE2 is now 100% assembly. It runs about 30% faster on my P3.

geoff 2007-08-13 05:25

gcwsieve 1.0.16
 
Version 1.0.16 has support for software prefetching, using the prefetchnta instruction available for SSE machines, or GCC's __builtin_prefetch() function for non x86/x86-64 builds.

Prefetching should result in a speedup in the case that the sieve is too large to fit in L2 cache (each sieve term takes 8 bytes), but on some machines it results in a slowdown instead, probably because it interferes with the automatic hardware prefetcher.

So before sieving starts some test runs are made with and without prefetch, and the faster method selected. Use the --verbose switch to see whether prefetch was selected. To override the automatic selection, use these new switches:

--prefetch: Force use of prefetch.
--no-prefetch: Prevent use of prefetch.


Here are some times for a 216000 term sieve (Primegrid Cullen 10M) at p=1000e9:
[code]
                       --no-prefetch   --prefetch
P3 450MHz, 512KB L2:      1167 p/sec   1502 p/sec   +29%
P3 600MHz, 256KB L2:      1462 p/sec   1993 p/sec   +36%
P4 2.9GHz, 512KB L2:     12224 p/sec  11711 p/sec    -4%
[/code]
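Since each sieve term takes 8 bytes, the 216000-term file occupies about 216000 × 8 = 1,728,000 bytes (roughly 1.65 MiB), several times the 256KB or 512KB L2 caches in the table, which is the situation where software prefetching can help. Below is a minimal C sketch of prefetching array data ahead of use with GCC/Clang's `__builtin_prefetch(addr, rw, locality)`; locality 0 requests no temporal locality, which GCC typically maps to prefetchnta on x86. `mul_arrays` is a stand-in for the main loop rather than gcwsieve's code, `PREFETCH_DIST` is a hypothetical distance (the best value is machine dependent), and `unsigned __int128` needs a 64-bit GCC or Clang target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance, in elements. */
#define PREFETCH_DIST 64

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t p)
{
    return (uint64_t)(((unsigned __int128)a * b) % p);
}

/* x[i] = x[i] * y[i] (mod p), optionally prefetching data PREFETCH_DIST
   elements ahead. The bounds check keeps the prefetch address inside the arrays. */
static void mul_arrays(uint64_t *x, const uint64_t *y, size_t len,
                       uint64_t p, int prefetch)
{
    assert(p > 1);
    for (size_t i = 0; i < len; i++) {
        if (prefetch && i + PREFETCH_DIST < len) {
            __builtin_prefetch(&x[i + PREFETCH_DIST], 1, 0);  /* will be written */
            __builtin_prefetch(&y[i + PREFETCH_DIST], 0, 0);  /* read only */
        }
        x[i] = mulmod(x[i], y[i], p);
    }
}
```

Prefetching never changes the results, only the timing, so the two settings can be benchmarked against each other and the faster one kept.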

VolMike 2007-08-13 07:02

Could you provide an executable for a Windows XP Athlon machine?

geoff 2007-08-15 03:49

[QUOTE=VolMike;112344]Could you provide executable for windows XP athlon machine?[/QUOTE]

The executable in gcwsieve-X.Y.Z-windows-x86.zip at [url]http://www.geocities.com/g_w_reynolds/gcwsieve/[/url] should work, or is there a problem on that machine?

geoff 2007-08-18 23:19

gcwsieve 1.0.17
 
Version 1.0.17 should properly detect the availability of prefetch instructions on AMD machines with 3DNow! but without SSE. (Some earlier Athlons).

A more compact ABC file format will now be written by default. The old format will still be written if the --multisieve switch is given. Either format can be used for the input file:

Old format:
[code]
ABC $a*$b^$a$c // CW Sieved to: 100000000000 with gcwsieve
2000055 2 +1
2000110 2 +1
2000116 2 +1
2000128 2 +1
[/code]

New format:
[code]
ABC $a*2^$a+1 // CW Sieved to: 100000000000 with gcwsieve
2000055
2000110
2000116
2000128
[/code]
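Since either format is accepted as input, a reader only needs the first number on each data line. A minimal C sketch, assuming data lines look exactly like the two examples (old: "n 2 +1" or "n 2 -1"; new: just "n", with base and sign given by the header). `parse_abc_term` is hypothetical and does not check the header line itself.

```c
#include <assert.h>
#include <stdio.h>

/* Extract n from one data line in either ABC layout.
   Returns 1 and stores n on success, 0 otherwise (including the header line). */
static int parse_abc_term(const char *line, unsigned long *n)
{
    assert(line != NULL && n != NULL);
    unsigned long a, b;
    int c;
    int fields = sscanf(line, "%lu %lu %d", &a, &b, &c);
    if (fields == 1 || (fields == 3 && b == 2 && (c == 1 || c == -1))) {
        *n = a;
        return 1;
    }
    return 0;
}
```

Converting an old-format file to the new one then amounts to rewriting the header and printing each parsed n on its own line.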

geoff 2007-08-22 23:05

gcwsieve 1.0.18
 
This version has two minor bugfixes:

Test for Extended 3DNow instead of just 3DNow to determine whether the prefetchnta instruction is available on AMD CPUs. This affected K6-2 CPUs.

Use the best benchmark time instead of the average benchmark time when deciding whether or not to use software prefetching. The average times could be inaccurate when there were other processes running on the same CPU.
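The reason the best time is steadier than the average: another process sharing the CPU can only add time to a benchmark run, never remove it, so the fastest run is the least disturbed one. A small C sketch of that decision rule, with hypothetical function names (not gcwsieve's code):

```c
#include <assert.h>
#include <stddef.h>

/* Fastest of n timings (seconds). */
static double best_time(const double *t, size_t n)
{
    assert(n > 0);
    double best = t[0];
    for (size_t i = 1; i < n; i++)
        if (t[i] < best)
            best = t[i];
    return best;
}

/* Mean of n timings, for comparison only. */
static double mean_time(const double *t, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Use prefetch only if its best run beat the best run without it. */
static int choose_prefetch(const double *with_pf, size_t n_with,
                           const double *without_pf, size_t n_without)
{
    return best_time(with_pf, n_with) < best_time(without_pf, n_without);
}
```

If one prefetch run is slowed down by another process, its average can look worse even though prefetch is actually faster; comparing best times is not fooled by that single disturbed run.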

There are also some changes to the status line display: the percentage of CPU usage (cpu_time/elapsed_time) is now reported, and the status line alternates between these two sets of stats:
[code]
p=1071802477019, 249775 p/sec, 16 factors, 100.0% cpu, 2953 sec/factor
p=1071817422251, 249836 p/sec, 16 factors, 16.9% done, ETA 24 Aug 14:23
[/code]
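The figures on these status lines come from simple arithmetic. The CPU percentage is cpu_time/elapsed_time as described; the percent-done and ETA formulas in this C sketch are assumptions about how such figures are usually computed, not taken from gcwsieve, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* CPU usage as a percentage: cpu_time / elapsed_time. */
static double cpu_percent(double cpu_sec, double elapsed_sec)
{
    assert(elapsed_sec > 0);
    return 100.0 * cpu_sec / elapsed_sec;
}

/* Share of the range [pmin, pmax] already sieved, as a percentage. */
static double percent_done(uint64_t pmin, uint64_t pmax, uint64_t p)
{
    assert(pmin < pmax && pmin <= p && p <= pmax);
    return 100.0 * (double)(p - pmin) / (double)(pmax - pmin);
}

/* Seconds left at the current rate. For a wall-clock ETA the rate should be
   measured in elapsed time, not CPU time. */
static double eta_seconds(uint64_t pmax, uint64_t p, double p_per_sec)
{
    assert(p_per_sec > 0 && p <= pmax);
    return (double)(pmax - p) / p_per_sec;
}
```

Adding the ETA in seconds to the current time gives a date like the one shown on the second line.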

There are also two new switches that change the information displayed on the status line:

-R --report-primes

Reports primes/sec (the number of prime factors tested per second) instead of p/sec (the increase in p per second).

-e --elapsed-time

Reports p/sec, primes/sec, and sec/factor using elapsed time instead of CPU time.
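For a rough link between the two rates: by the prime number theorem about one integer in ln p is prime near p, so if every prime in the range is tested (an assumption about the sieve in use), p/sec and primes/sec differ by roughly a factor of ln p. For the first status line in this post (p ≈ 1.07e12, 249775 p/sec):

$$\text{primes/sec} \approx \frac{\text{p/sec}}{\ln p}, \qquad \ln\left(1.07\times 10^{12}\right) \approx 27.7, \qquad \frac{249775}{27.7} \approx 9.0\times 10^{3}$$

For the 3^16*2^(16*n)+1 search, where only primes of the form 32*x+1 (about one prime in 16) are tested, the gap between the two figures is larger again.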

geoff 2007-09-02 03:09

gcwsieve 1.0.20 (x86-64)
 
The x86-64 executable now has separate code paths optimised for Intel (Core 2) and AMD (Athlon 64) CPUs. The Athlon 64 code should be about 15% faster than previous versions. Thanks to jmblazek for testing it.

The appropriate code path should be selected automatically, but can be overridden with the --amd or --intel command-line switches.
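The post doesn't say how the CPU is identified; a common approach, assumed here, is the CPUID vendor string ("GenuineIntel" or "AuthenticAMD", returned in EBX, EDX, ECX by leaf 0). A minimal C sketch using GCC/Clang's `<cpuid.h>`, where the enum, the generic fallback for other vendors and the function names are all hypothetical, and treating --amd and --intel as mutually exclusive is this sketch's own choice:

```c
#include <assert.h>
#include <string.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

enum code_path { PATH_GENERIC, PATH_INTEL, PATH_AMD };

static enum code_path path_for_vendor(const char *vendor)
{
    if (strcmp(vendor, "GenuineIntel") == 0)
        return PATH_INTEL;
    if (strcmp(vendor, "AuthenticAMD") == 0)
        return PATH_AMD;
    return PATH_GENERIC;                       /* other vendors: hypothetical fallback */
}

/* Fill out with the 12-character CPUID vendor string, or "unknown". */
static void cpu_vendor(char out[13])
{
    strcpy(out, "unknown");
#if defined(__i386__) || defined(__x86_64__)
    unsigned a, b, c, d;
    if (__get_cpuid(0, &a, &b, &c, &d)) {
        memcpy(out, &b, 4);                    /* vendor string order: EBX, EDX, ECX */
        memcpy(out + 4, &d, 4);
        memcpy(out + 8, &c, 4);
        out[12] = '\0';
    }
#endif
}

/* Command-line overrides win; otherwise pick the path from the vendor. */
static enum code_path select_path(int force_amd, int force_intel)
{
    assert(!(force_amd && force_intel));
    if (force_amd)
        return PATH_AMD;
    if (force_intel)
        return PATH_INTEL;
    char vendor[13];
    cpu_vendor(vendor);
    return path_for_vendor(vendor);
}
```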


All times are UTC.