mersenneforum.org Sr2sieve on PPC/Linux

2007-02-17, 02:26   #100
geoff

Mar 2003
New Zealand

13×89 Posts

Quote:
 Originally Posted by Greenbank And the program is not optimised for Riesel at all. I wanted Sierpinski sieving to be as fast as possible. Sorting it out properly for Riesel is on my big list of stuff to do.
I think part of the reason that sr2sieve does well with riesel.dat (I hear that it is even faster at riesel.dat than JJsieve on x86) is not so much the large number of k, but more the narrower range of n. Perhaps this is because sr2sieve puts less effort into trying to reduce the work done in BSGS: as the range of n widens and BSGS becomes more expensive, sr2sieve becomes slower than proth_sieve.

These are some timings for sr2sieve 1.4.x vs proth_sieve 0.42 done on two of my machines, both running Debian Linux.

I tested at p=100e12 (100T) because the proth_sieve speed starts to drop when p becomes too much larger than this, and I think this may be a problem with the code rather than a true indication of performance. (The speed per p should increase as p increases, as there are fewer primes to test).

Speeds are in kp/s (thousands of p covered per CPU second), to 3 s.f. where known. The hyperthreaded figures were taken by running two instances of the program and adding the kp/s figures for both.

Pentium 3 @ 600MHz (Coppermine EB, 16Kb L1, 256Kb L2), p=100e12
Code:
                             8k SoB.dat    19k SoB.dat    69k riesel.dat
                             ----------    -----------    --------------
proth_sieve_cmov 0.42        151           86             31
sr2sieve-i686 1.4.18         122           75.9           45.6
sr2sieve-i686 1.4.21         138           81.5           47.0
sr2sieve-i686 1.4.23         145           85.4           48.9
Pentium 4 @ 2.9GHz (Northwood C, 8Kb L1, 512Kb L2), p=100e12
Code:
Single thread                8k SoB.dat    19k SoB.dat    69k riesel.dat
-------------                ----------    -----------    --------------
proth_sieve_sse2 0.42        342           201            82
sr2sieve-pentium4 1.4.18     279           177            107
sr2sieve-pentium4 1.4.21     318           189            113
sr2sieve-pentium4 1.4.23     328           197            116

Two hyperthreads             8k SoB.dat    19k SoB.dat    69k riesel.dat
----------------             ----------    -----------    --------------
proth_sieve_sse2 0.42        554           330            130
sr2sieve-pentium4 1.4.18     413           262            157
sr2sieve-pentium4 1.4.21     469           279            162
sr2sieve-pentium4 1.4.23     488           288            167

2007-02-20, 12:00   #101
Greenbank

Jul 2005

2·193 Posts

That looks great Geoff. I hope my message didn't come over as "my sieve is faster than yours"; it certainly wasn't meant that way. If we work together and share code/results we can make each other's code even faster!

2007-02-21, 03:58   #102
geoff

Mar 2003
New Zealand

2205₈ Posts

Quote:
 Originally Posted by Greenbank I hope my message didn't come over as "my sieve is faster than yours", it certainly wasn't meant that way.
Not at all :-) I just found it interesting that the SoB.dat times could be so much faster than the riesel.dat times, when for sr2sieve it is the other way around.

I don't know how much of that is due to the effort to make proth_sieve run fast for SoB.dat without regard to riesel.dat speed, and how much is because of differences between the proth_sieve and sr2sieve algorithms.

I suspect that sr2sieve does a lot less work in trying to eliminate candidates before running BSGS, and that may be a better approach when the range of n is small. The 20 million range of riesel.dat vs the 50 million range of SoB.dat could be the important factor, rather than the number of k in the sieve.

2007-03-09, 00:07   #103
geoff

Mar 2003
New Zealand

10010000101₂ Posts

Does anyone know how to detect the size of the L1 and L2 data cache on ppc64? Is 32Kb L1, 512Kb L2 a reasonable default if it can't be detected?

2007-03-09, 03:27   #104
BlisteringSheep

Oct 2006
On a Suzuki Boulevard C90

F6₁₆ Posts

Geoff,

That's what it has been for every ppc64 that I've encountered. I don't know an easy way to do it for Linux; there are external tools but they can't be depended upon. For example, on my home PowerMac, /proc/cpuinfo shows the 512K unified L2 cache, but doesn't mention the L1. lshw says that the same machine has 128 terabytes of L1 and 2 petabytes of L2. However, lshw on the IBM blades shows the correct L1 & L2.

2007-03-09, 13:21   #105
rogue

"Mark"
Apr 2003
Between here and the

2⁶·103 Posts

Quote:
 Originally Posted by geoff Does anyone know how to detect the size of the L1 and L2 data cache on ppc64? Is 32Kb L1, 512Kb L2 a reasonable default if it can't be detected?
For 64-bit PowerPC CPUs, 512Kb is the minimum L2 cache size. Some have 1Mb. I don't know if there is an easy way to determine the L2 cache size.

2007-03-09, 15:19   #106
Greenbank

Jul 2005

110000010₂ Posts

MacOS X command line:

sysctl hw.l1icachesize
sysctl hw.l1dcachesize
sysctl hw.l2cachesize

So I'm guessing there'll be something in the sysctl() function call... indeed, in /usr/include/sys/sysctl.h:

#define HW_L1ICACHESIZE 17 /* int: L1 I Cache Size in Bytes */
#define HW_L1DCACHESIZE 18 /* int: L1 D Cache Size in Bytes */
#define HW_L2SETTINGS 19 /* int: L2 Cache Settings */
#define HW_L2CACHESIZE 20 /* int: L2 Cache Size in Bytes */
#define HW_L3SETTINGS 21 /* int: L3 Cache Settings */
#define HW_L3CACHESIZE 22 /* int: L3 Cache Size in Bytes */

Don't have any time right now to knock up an example program but the stuff on the sysctl() man page (on MacOS X) should help.

[EDIT] For my Quad G5 (2.5GHz PPC) I've got 64KB L1 instruction cache, 32KB L1 data cache and 1MB L2 cache (per CPU).

Last fiddled with by Greenbank on 2007-03-09 at 15:20

2007-03-09, 15:37   #107
Greenbank

Jul 2005

602₈ Posts

Must be compiled with -m64.

Only tested on MacOS X on 64-bit PPC, not Linux (not sure if the sysctl interface is the same).
Code:
#include <stdio.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/sysctl.h>

int main(void)
{
    int64_t i;
    int ret;
    size_t len;

    /* len is in/out: reset it to the buffer size before every call. */
    i = 0; len = sizeof(i);
    ret = sysctlbyname("hw.l1icachesize", &i, &len, NULL, 0);
    if (ret == -1) {
        perror("sysctl:");
    } else {
        printf("l1icachesize=%lld\n", (long long)i);
    }

    i = 0; len = sizeof(i);
    ret = sysctlbyname("hw.l1dcachesize", &i, &len, NULL, 0);
    if (ret == -1) {
        perror("sysctl:");
    } else {
        printf("l1dcachesize=%lld\n", (long long)i);
    }

    i = 0; len = sizeof(i);
    ret = sysctlbyname("hw.l2cachesize", &i, &len, NULL, 0);
    if (ret == -1) {
        perror("sysctl:");
    } else {
        printf("l2cachesize=%lld\n", (long long)i);
    }

    return(0);
}
Output:
Code:
l1icachesize=65536
l1dcachesize=32768
l2cachesize=1048576
which matches the real output.

Last fiddled with by Greenbank on 2007-03-09 at 15:49

2007-03-09, 22:05   #108
geoff

Mar 2003
New Zealand

13×89 Posts

Quote:
 Originally Posted by Greenbank Only tested on MacOS X on 64-bit PPC, not Linux (not sure if the sysctl interface is the same).
Thanks, I'll use this in the next version, with a 32Kb/512Kb default if sysctl fails or the detected value doesn't make sense.

2007-03-10, 04:47   #109
BlisteringSheep

Oct 2006
On a Suzuki Boulevard C90

2·3·41 Posts

Quote:
 Originally Posted by Greenbank Only tested on MacOS X on 64-bit PPC, not Linux (not sure if the sysctl interface is the same).
Unfortunately, the interfaces are very different, and the Linux version provides completely different information.

Quote:
 Originally Posted by geoff Thanks, I'll use this in the next version, with a 32Kb/512Kb default if sysctl fails or the detected value doesn't make sense.
As an aside, I did figure out what's wrong with lshw; it reads the sizes into an unsigned long, but they're only int long.

Here's an ugly, probably non-portable hack:
Code:
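#include <stdio.h>

int main(void)
{
    FILE *fp;
    unsigned data = 0;

    /* Each property file is read as one raw, native-endian unsigned value.
       No error checking: fopen() returns NULL if the path does not exist. */
    fp = fopen("/proc/device-tree/cpus/PowerPC,970@0/d-cache-size","r");
    fread(&data, sizeof(data), 1, fp);
    fclose(fp);
    printf("d-cache-size: %u\n", data);

    fp = fopen("/proc/device-tree/cpus/PowerPC,970@0/i-cache-size","r");
    fread(&data, sizeof(data), 1, fp);
    fclose(fp);
    printf("i-cache-size: %u\n", data);

    fp = fopen("/proc/device-tree/cpus/PowerPC,970@0/l2-cache/d-cache-size","r");
    fread(&data, sizeof(data), 1, fp);
    fclose(fp);
    printf("l2-cache/d-cache-size: %u\n", data);

    return(0);
}
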

2007-03-13, 03:29   #110
geoff

Mar 2003
New Zealand

2205₈ Posts

Quote:
 Originally Posted by BlisteringSheep fp = fopen("/proc/device-tree/cpus/PowerPC,970@0/d-cache-size","r"); fread(&data, sizeof(data), 1, fp);
Are the sizes in bytes or kilobytes?

Do you know which compiler symbols I should test to decide whether this code should be included? I assume __linux__ and __powerpc64__ and one other for the CPU type.
