#1 |
P90 years forever!
Aug 2002
Yeehaw, FL
8254₁₀ Posts |
The SSE2 FFTs were coded for the Willamette's 256KB L2 cache. To better utilize the various L2 cache sizes now available, I've added some alternate implementations of each FFT size.
It would be helpful if I could get some benchmarks from architectures I don't have access to, namely P4s with an L2 cache other than 512KB and AMD64s with an L2 cache other than 1MB. Download and unzip ftp://mersenne.org/gimps/p95tst.zip. Add the lines "AllBench=1" and "FullBench=1" to prime.ini. Run a benchmark and post the results from results.txt here, or email the file to me. The benchmark could take half an hour or more, and if you have too little RAM it will cause thrashing (stop the benchmark if that happens).
Oh, and DO NOT use this version to run any LL tests. Install it in a separate directory, run the benchmark with the prime.ini change, and then delete the executable. Thanks for the help.
Last fiddled with by Prime95 on 2005-05-26 at 06:35 |
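For anyone who would rather script the prime.ini change than edit the file by hand, here is a minimal sketch. It assumes prime.ini is a plain text file of `Key=Value` lines sitting in the directory where the test build was unzipped; the function name `enable_full_benchmark` is made up for this illustration and is not part of Prime95.

```python
from pathlib import Path

# The two options named in the post above.
BENCH_OPTIONS = ("AllBench=1", "FullBench=1")


def enable_full_benchmark(ini_path):
    """Append the benchmark options to prime.ini unless they are already there.

    Hypothetical helper: it only assumes a plain-text file of Key=Value lines.
    Returns the resulting list of lines.
    """
    path = Path(ini_path)
    lines = path.read_text().splitlines() if path.exists() else []
    present = {line.strip() for line in lines}
    for option in BENCH_OPTIONS:
        if option not in present:
            lines.append(option)
    path.write_text("\n".join(lines) + "\n")
    return lines
```

Calling `enable_full_benchmark("prime.ini")` from the separate test directory leaves existing settings untouched and is safe to run twice.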
#2 |
Nov 2002
Anchorage, AK
101100101₂ Posts |
IBM ThinkPad T42p (2.1 GHz Dothan Pentium M, 1GB RAM)
[Wed May 25 22:00:58 2005]
Compare your results to other computers at http://www.mersenne.org/bench.htm
That web page also contains instructions on how your results can be included.
Intel(R) Pentium(R) M processor 2.10GHz
CPU speed: 2093.16 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 32 KB
L2 cache size: 2048 KB
L1 cache line size: 64 bytes
L2 cache line size: 64 bytes
TLBS: 128
Prime95 32-bit version 24.12, RdtscTiming=1
Time FFTlen=512K, Levels2=11, clm=4: 35.677 ms.
Time FFTlen=512K, Levels2=11, clm=4: 37.461 ms.
Time FFTlen=640K, Levels2=12, clm=4: 45.008 ms.
Time FFTlen=640K, Levels2=11, clm=8: 45.479 ms.
Time FFTlen=640K, Levels2=11, clm=4: 45.978 ms.
Time FFTlen=640K, Levels2=11, clm=2: 45.766 ms.
Time FFTlen=640K, Levels2=8, clm=1: 50.175 ms.
Time FFTlen=768K, Levels2=12, clm=4: 55.386 ms.
Time FFTlen=768K, Levels2=11, clm=4: 56.552 ms.
Time FFTlen=768K, Levels2=11, clm=2: 56.433 ms.
Time FFTlen=768K, Levels2=8, clm=1: 62.710 ms.
Time FFTlen=896K, Levels2=12, clm=4: 66.374 ms.
Time FFTlen=896K, Levels2=11, clm=4: 67.894 ms.
Time FFTlen=896K, Levels2=11, clm=2: 68.190 ms.
Time FFTlen=896K, Levels2=11, clm=1: 71.276 ms.
Time FFTlen=896K, Levels2=8, clm=1: 75.151 ms.
Time FFTlen=1024K, Levels2=12, clm=4: 74.469 ms.
Time FFTlen=1024K, Levels2=11, clm=4: 76.846 ms.
Time FFTlen=1024K, Levels2=11, clm=2: 76.847 ms.
Time FFTlen=1024K, Levels2=11, clm=1: 80.142 ms.
Time FFTlen=1024K, Levels2=8, clm=1: 84.767 ms.
Time FFTlen=1280K, Levels2=13, clm=4: 96.482 ms.
Time FFTlen=1280K, Levels2=12, clm=4: 95.168 ms.
Time FFTlen=1280K, Levels2=11, clm=4: 99.008 ms.
Time FFTlen=1280K, Levels2=11, clm=2: 99.660 ms.
Time FFTlen=1280K, Levels2=11, clm=1: 104.348 ms.
Time FFTlen=1536K, Levels2=13, clm=4: 118.248 ms.
Time FFTlen=1536K, Levels2=12, clm=4: 116.078 ms.
Time FFTlen=1536K, Levels2=11, clm=4: 121.289 ms.
Time FFTlen=1536K, Levels2=11, clm=4: 121.673 ms.
Time FFTlen=1536K, Levels2=11, clm=2: 121.707 ms.
Time FFTlen=1536K, Levels2=11, clm=1: 127.805 ms.
Time FFTlen=1792K, Levels2=13, clm=4: 141.526 ms.
Time FFTlen=1792K, Levels2=12, clm=4: 139.139 ms.
Time FFTlen=1792K, Levels2=11, clm=4: 144.485 ms.
Time FFTlen=1792K, Levels2=11, clm=2: 145.995 ms.
Time FFTlen=1792K, Levels2=11, clm=1: 153.673 ms.
Time FFTlen=1792K, Levels2=11, clm=1: 154.764 ms.
Time FFTlen=2048K, Levels2=13, clm=4: 158.784 ms.
Time FFTlen=2048K, Levels2=12, clm=4: 156.618 ms.
Time FFTlen=2048K, Levels2=11, clm=4: 162.955 ms.
Time FFTlen=2048K, Levels2=11, clm=2: 163.839 ms.
Time FFTlen=2048K, Levels2=11, clm=1: 172.666 ms.
Time FFTlen=2048K, Levels2=11, clm=1: 174.661 ms.
Time FFTlen=2560K, Levels2=13, clm=4: 202.275 ms.
Time FFTlen=2560K, Levels2=12, clm=4: 201.542 ms.
Time FFTlen=2560K, Levels2=12, clm=2: 204.774 ms.
Time FFTlen=2560K, Levels2=11, clm=2: 209.470 ms.
Time FFTlen=2560K, Levels2=11, clm=1: 219.825 ms.
Time FFTlen=2560K, Levels2=11, clm=1: 222.471 ms.
Time FFTlen=3072K, Levels2=13, clm=4: 246.571 ms.
Time FFTlen=3072K, Levels2=12, clm=4: 246.813 ms.
Time FFTlen=3072K, Levels2=12, clm=2: 250.427 ms.
Time FFTlen=3072K, Levels2=11, clm=2: 258.232 ms.
Time FFTlen=3072K, Levels2=11, clm=1: 269.098 ms.
[Wed May 25 22:06:01 2005]
Time FFTlen=3072K, Levels2=11, clm=1: 273.834 ms.
Time FFTlen=3584K, Levels2=13, clm=4: 294.792 ms.
Time FFTlen=3584K, Levels2=12, clm=4: 293.923 ms.
Time FFTlen=3584K, Levels2=12, clm=2: 299.520 ms.
Time FFTlen=3584K, Levels2=11, clm=2: 308.745 ms.
Time FFTlen=3584K, Levels2=11, clm=1: 324.570 ms.
Time FFTlen=3584K, Levels2=11, clm=1: 329.313 ms.
Time FFTlen=4096K, Levels2=13, clm=4: 332.280 ms.
Time FFTlen=4096K, Levels2=13, clm=2: 337.007 ms.
Time FFTlen=4096K, Levels2=12, clm=4: 330.752 ms.
Time FFTlen=4096K, Levels2=12, clm=2: 335.654 ms.
Time FFTlen=4096K, Levels2=11, clm=2: 346.815 ms.
Time FFTlen=4096K, Levels2=11, clm=1: 366.021 ms.
Time FFTlen=4096K, Levels2=11, clm=1: 372.843 ms.
Best time for 58 bit trial factors: 6.281 ms.
Best time for 59 bit trial factors: 6.302 ms.
Best time for 60 bit trial factors: 6.278 ms.
Best time for 61 bit trial factors: 6.256 ms.
Best time for 62 bit trial factors: 8.935 ms.
Best time for 63 bit trial factors: 8.944 ms.
Best time for 64 bit trial factors: 15.385 ms.
Best time for 65 bit trial factors: 15.329 ms.
Best time for 66 bit trial factors: 15.254 ms.
Best time for 67 bit trial factors: 15.242 ms. |
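With several alternate implementations timed per FFT size, it can help to pull out the fastest one for each size. The following is a rough sketch that assumes every timing entry has the `Time FFTlen=...K, Levels2=..., clm=...: ... ms.` shape seen in the results above. The thread does not explain what `Levels2` and `clm` mean, so the code treats them as opaque labels; the name `best_per_fft` is invented for this example.

```python
import re

# Matches one timing entry as it appears in the benchmark results above.
TIMING = re.compile(
    r"Time FFTlen=(\d+)K, Levels2=(\d+), clm=(\d+): ([\d.]+) ms\."
)


def best_per_fft(text):
    """Return {fft_len_in_K: (ms, levels2, clm)} for the fastest entry per size."""
    best = {}
    for match in TIMING.finditer(text):
        fft_len = int(match.group(1))
        levels2 = int(match.group(2))
        clm = int(match.group(3))
        ms = float(match.group(4))
        # Keep only the fastest timing seen so far for this FFT length.
        if fft_len not in best or ms < best[fft_len][0]:
            best[fft_len] = (ms, levels2, clm)
    return best


# Three entries copied from the Pentium M results above.
sample = (
    "Time FFTlen=1280K, Levels2=13, clm=4: 96.482 ms. "
    "Time FFTlen=1280K, Levels2=12, clm=4: 95.168 ms. "
    "Time FFTlen=1280K, Levels2=11, clm=4: 99.008 ms."
)
print(best_per_fft(sample))  # {1280: (95.168, 12, 4)}
```

Because the pattern is searched rather than matched per line, it works on results.txt as written by the program and on the run-together text that forum software tends to produce. Entries that share the same labels (for example the two 512K, Levels2=11, clm=4 lines) are presumably distinct implementations; the sketch simply keeps the faster of the two.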
#3 |
"6800 descendent"
Feb 2005
Colorado
5×149 Posts |
[Wed May 25 23:57:00 2005]
Compare your results to other computers at http://www.mersenne.org/bench.htm
That web page also contains instructions on how your results can be included.
Intel(R) Celeron(R) CPU 2.80GHz
CPU speed: 2799.45 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 8 KB
L2 cache size: 128 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 24.12, RdtscTiming=1
Time FFTlen=512K, Levels2=11, clm=4: 50.355 ms.
Time FFTlen=512K, Levels2=11, clm=4: 46.070 ms.
Time FFTlen=640K, Levels2=12, clm=4: 99.994 ms.
Time FFTlen=640K, Levels2=11, clm=8: 86.964 ms.
Time FFTlen=640K, Levels2=11, clm=4: 69.750 ms.
Time FFTlen=640K, Levels2=11, clm=2: 60.770 ms.
Time FFTlen=640K, Levels2=8, clm=1: 134.716 ms.
Time FFTlen=768K, Levels2=12, clm=4: 120.820 ms.
Time FFTlen=768K, Levels2=11, clm=4: 90.124 ms.
Time FFTlen=768K, Levels2=11, clm=2: 81.473 ms.
Time FFTlen=768K, Levels2=8, clm=1: 178.886 ms.
Time FFTlen=896K, Levels2=12, clm=4: 152.102 ms.
Time FFTlen=896K, Levels2=11, clm=4: 115.429 ms.
Time FFTlen=896K, Levels2=11, clm=2: 104.153 ms.
Time FFTlen=896K, Levels2=11, clm=1: 70.295 ms.
Time FFTlen=896K, Levels2=8, clm=1: 228.997 ms.
Time FFTlen=1024K, Levels2=12, clm=4: 180.197 ms.
Time FFTlen=1024K, Levels2=11, clm=4: 139.551 ms.
Time FFTlen=1024K, Levels2=11, clm=2: 124.928 ms.
Time FFTlen=1024K, Levels2=11, clm=1: 83.740 ms.
Time FFTlen=1024K, Levels2=8, clm=1: 278.085 ms.
Time FFTlen=1280K, Levels2=13, clm=4: 303.984 ms.
Time FFTlen=1280K, Levels2=12, clm=4: 233.396 ms.
Time FFTlen=1280K, Levels2=11, clm=4: 211.137 ms.
Time FFTlen=1280K, Levels2=11, clm=2: 175.252 ms.
Time FFTlen=1280K, Levels2=11, clm=1: 130.983 ms.
Time FFTlen=1536K, Levels2=13, clm=4: 371.218 ms.
Time FFTlen=1536K, Levels2=12, clm=4: 296.462 ms.
Time FFTlen=1536K, Levels2=11, clm=4: 267.498 ms.
Time FFTlen=1536K, Levels2=11, clm=4: 267.168 ms.
Time FFTlen=1536K, Levels2=11, clm=2: 218.874 ms.
Time FFTlen=1536K, Levels2=11, clm=1: 173.738 ms.
Time FFTlen=1792K, Levels2=13, clm=4: 448.385 ms.
Time FFTlen=1792K, Levels2=12, clm=4: 361.122 ms.
Time FFTlen=1792K, Levels2=11, clm=4: 356.173 ms.
[Thu May 26 00:02:05 2005]
Time FFTlen=1792K, Levels2=11, clm=2: 294.110 ms.
Time FFTlen=1792K, Levels2=11, clm=1: 250.506 ms.
Time FFTlen=1792K, Levels2=11, clm=1: 203.129 ms.
Time FFTlen=2048K, Levels2=13, clm=4: 521.893 ms.
Time FFTlen=2048K, Levels2=12, clm=4: 430.165 ms.
Time FFTlen=2048K, Levels2=11, clm=4: 421.714 ms.
Time FFTlen=2048K, Levels2=11, clm=2: 333.901 ms.
Time FFTlen=2048K, Levels2=11, clm=1: 296.267 ms.
Time FFTlen=2048K, Levels2=11, clm=1: 240.527 ms.
Time FFTlen=2560K, Levels2=13, clm=4: 699.221 ms.
Time FFTlen=2560K, Levels2=12, clm=4: 600.006 ms.
Time FFTlen=2560K, Levels2=12, clm=2: 535.369 ms.
Time FFTlen=2560K, Levels2=11, clm=2: 482.100 ms.
Time FFTlen=2560K, Levels2=11, clm=1: 407.768 ms.
Time FFTlen=2560K, Levels2=11, clm=1: 324.119 ms.
Time FFTlen=3072K, Levels2=13, clm=4: 882.695 ms.
Time FFTlen=3072K, Levels2=12, clm=4: 739.638 ms.
Time FFTlen=3072K, Levels2=12, clm=2: 683.837 ms.
Time FFTlen=3072K, Levels2=11, clm=2: 630.538 ms.
Time FFTlen=3072K, Levels2=11, clm=1: 528.111 ms.
Time FFTlen=3072K, Levels2=11, clm=1: 429.357 ms.
Time FFTlen=3584K, Levels2=13, clm=4: 1066.447 ms.
Time FFTlen=3584K, Levels2=12, clm=4: 977.193 ms.
[Thu May 26 00:07:19 2005]
Time FFTlen=3584K, Levels2=12, clm=2: 900.288 ms.
Time FFTlen=3584K, Levels2=11, clm=2: 847.074 ms.
Time FFTlen=3584K, Levels2=11, clm=1: 739.278 ms.
Time FFTlen=3584K, Levels2=11, clm=1: 568.176 ms.
Time FFTlen=4096K, Levels2=13, clm=4: 1245.311 ms.
Time FFTlen=4096K, Levels2=13, clm=2: 1047.303 ms.
Time FFTlen=4096K, Levels2=12, clm=4: 1136.646 ms.
Time FFTlen=4096K, Levels2=12, clm=2: 1044.060 ms.
Time FFTlen=4096K, Levels2=11, clm=2: 1002.227 ms.
Time FFTlen=4096K, Levels2=11, clm=1: 844.239 ms.
Time FFTlen=4096K, Levels2=11, clm=1: 659.219 ms.
Best time for 58 bit trial factors: 10.510 ms.
Best time for 59 bit trial factors: 10.514 ms.
Best time for 60 bit trial factors: 10.425 ms.
Best time for 61 bit trial factors: 10.463 ms.
Best time for 62 bit trial factors: 11.447 ms.
Best time for 63 bit trial factors: 11.514 ms.
Best time for 64 bit trial factors: 13.937 ms.
Best time for 65 bit trial factors: 13.848 ms.
Best time for 66 bit trial factors: 13.858 ms.
Best time for 67 bit trial factors: 13.823 ms. |
#4 |
"6800 descendent"
Feb 2005
Colorado
2E9₁₆ Posts |
Compare your results to other computers at http://www.mersenne.org/bench.htm
That web page also contains instructions on how your results can be included.
Genuine Intel(R) CPU 3.40GHz
CPU speed: 3415.28 MHz
CPU features: RDTSC, CMOV, Prefetch, MMX, SSE, SSE2
L1 cache size: 16 KB
L2 cache size: 1024 KB
L1 cache line size: 64 bytes
L2 cache line size: 128 bytes
TLBS: 64
Prime95 32-bit version 24.12, RdtscTiming=1
Time FFTlen=512K, Levels2=11, clm=4: 15.177 ms.
Time FFTlen=512K, Levels2=11, clm=4: 16.171 ms.
Time FFTlen=640K, Levels2=12, clm=4: 18.936 ms.
Time FFTlen=640K, Levels2=11, clm=8: 19.371 ms.
Time FFTlen=640K, Levels2=11, clm=4: 19.146 ms.
Time FFTlen=640K, Levels2=11, clm=2: 19.624 ms.
Time FFTlen=640K, Levels2=8, clm=1: 21.059 ms.
Time FFTlen=768K, Levels2=12, clm=4: 23.107 ms.
Time FFTlen=768K, Levels2=11, clm=4: 23.163 ms.
Time FFTlen=768K, Levels2=11, clm=2: 23.711 ms.
Time FFTlen=768K, Levels2=8, clm=1: 25.725 ms.
Time FFTlen=896K, Levels2=12, clm=4: 27.780 ms.
Time FFTlen=896K, Levels2=11, clm=4: 27.892 ms.
Time FFTlen=896K, Levels2=11, clm=2: 28.479 ms.
Time FFTlen=896K, Levels2=11, clm=1: 29.659 ms.
Time FFTlen=896K, Levels2=8, clm=1: 30.929 ms.
Time FFTlen=1024K, Levels2=12, clm=4: 31.156 ms.
Time FFTlen=1024K, Levels2=11, clm=4: 31.423 ms.
Time FFTlen=1024K, Levels2=11, clm=2: 31.813 ms.
Time FFTlen=1024K, Levels2=11, clm=1: 32.816 ms.
Time FFTlen=1024K, Levels2=8, clm=1: 35.094 ms.
Time FFTlen=1280K, Levels2=13, clm=4: 40.932 ms.
Time FFTlen=1280K, Levels2=12, clm=4: 39.088 ms.
Time FFTlen=1280K, Levels2=11, clm=4: 40.970 ms.
Time FFTlen=1280K, Levels2=11, clm=2: 42.155 ms.
Time FFTlen=1280K, Levels2=11, clm=1: 43.482 ms.
Time FFTlen=1536K, Levels2=13, clm=4: 49.742 ms.
Time FFTlen=1536K, Levels2=12, clm=4: 47.431 ms.
Time FFTlen=1536K, Levels2=11, clm=4: 49.761 ms.
Time FFTlen=1536K, Levels2=11, clm=4: 49.636 ms.
Time FFTlen=1536K, Levels2=11, clm=2: 50.868 ms.
Time FFTlen=1536K, Levels2=11, clm=1: 52.562 ms.
Time FFTlen=1792K, Levels2=13, clm=4: 60.016 ms.
Time FFTlen=1792K, Levels2=12, clm=4: 57.282 ms.
Time FFTlen=1792K, Levels2=11, clm=4: 59.985 ms.
Time FFTlen=1792K, Levels2=11, clm=2: 61.189 ms.
Time FFTlen=1792K, Levels2=11, clm=1: 63.484 ms.
Time FFTlen=1792K, Levels2=11, clm=1: 74.613 ms.
Time FFTlen=2048K, Levels2=13, clm=4: 67.241 ms.
Time FFTlen=2048K, Levels2=12, clm=4: 64.142 ms.
Time FFTlen=2048K, Levels2=11, clm=4: 66.992 ms.
Time FFTlen=2048K, Levels2=11, clm=2: 67.893 ms.
Time FFTlen=2048K, Levels2=11, clm=1: 70.398 ms.
Time FFTlen=2048K, Levels2=11, clm=1: 83.118 ms.
Time FFTlen=2560K, Levels2=13, clm=4: 84.546 ms.
Time FFTlen=2560K, Levels2=12, clm=4: 83.883 ms.
Time FFTlen=2560K, Levels2=12, clm=2: 85.977 ms.
Time FFTlen=2560K, Levels2=11, clm=2: 85.382 ms.
Time FFTlen=2560K, Levels2=11, clm=1: 88.231 ms.
Time FFTlen=2560K, Levels2=11, clm=1: 105.568 ms.
Time FFTlen=3072K, Levels2=13, clm=4: 102.246 ms.
Time FFTlen=3072K, Levels2=12, clm=4: 101.433 ms.
[Thu May 26 00:32:13 2005]
Time FFTlen=3072K, Levels2=12, clm=2: 103.427 ms.
Time FFTlen=3072K, Levels2=11, clm=2: 106.629 ms.
Time FFTlen=3072K, Levels2=11, clm=1: 109.385 ms.
Time FFTlen=3072K, Levels2=11, clm=1: 130.107 ms.
Time FFTlen=3584K, Levels2=13, clm=4: 122.885 ms.
Time FFTlen=3584K, Levels2=12, clm=4: 122.558 ms.
Time FFTlen=3584K, Levels2=12, clm=2: 124.972 ms.
Time FFTlen=3584K, Levels2=11, clm=2: 128.695 ms.
Time FFTlen=3584K, Levels2=11, clm=1: 132.744 ms.
Time FFTlen=3584K, Levels2=11, clm=1: 156.190 ms.
Time FFTlen=4096K, Levels2=13, clm=4: 137.744 ms.
Time FFTlen=4096K, Levels2=13, clm=2: 140.119 ms.
Time FFTlen=4096K, Levels2=12, clm=4: 135.801 ms.
Time FFTlen=4096K, Levels2=12, clm=2: 138.502 ms.
Time FFTlen=4096K, Levels2=11, clm=2: 142.473 ms.
Time FFTlen=4096K, Levels2=11, clm=1: 147.108 ms.
Time FFTlen=4096K, Levels2=11, clm=1: 173.449 ms.
Best time for 58 bit trial factors: 8.034 ms.
Best time for 59 bit trial factors: 8.072 ms.
Best time for 60 bit trial factors: 8.047 ms.
Best time for 61 bit trial factors: 8.070 ms.
Best time for 62 bit trial factors: 11.317 ms.
Best time for 63 bit trial factors: 11.295 ms.
Best time for 64 bit trial factors: 12.835 ms.
Best time for 65 bit trial factors: 12.904 ms.
Best time for 66 bit trial factors: 13.038 ms.
Best time for 67 bit trial factors: 12.960 ms. |
#5 |
P90 years forever!
Aug 2002
Yeehaw, FL
2·4,127 Posts |
Sorry, I only wrote new code for larger L2 caches. Is the 128KB Celeron a big enough seller that I should write code for it? I thought most Celerons had a 256KB L2 cache. I think this might get a 5% improvement for smaller FFT sizes (1024K and under).
Last fiddled with by Prime95 on 2005-05-26 at 07:12 |
#6 |
P90 years forever!
Aug 2002
Yeehaw, FL
2·4,127 Posts |
PhilF, please download a new p95tst and run it on the 128KB Celeron. I've added some new options for FFTs between 512K and 1536K.
On a 1024K FFT, your times are slower than my 1.4GHz Willamette. Something screwy is going on; I might need you to run some more tests after this one. |
#7 |
P90 years forever!
Aug 2002
Yeehaw, FL
2×4,127 Posts |
delta_t, can you download ftp://mersenne.org/gimps/p95tst2.zip and try your benchmark again? This version issues twice as many prefetch instructions to make up for your Pentium M's smaller L2 cache line size. I'm curious as to how much this will help.
Last fiddled with by Prime95 on 2005-05-26 at 08:29 |
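The "twice as many" figure follows from the cache line sizes reported in the results earlier in the thread: 64 bytes for the Pentium M's L2 versus 128 bytes on the two P4-family chips. Here is a back-of-the-envelope sketch, under the simplifying assumption that one prefetch instruction brings in exactly one cache line. The 4096-byte block size is an arbitrary value picked for illustration and is not taken from Prime95.

```python
def prefetches_needed(block_bytes, line_bytes):
    """Number of cache lines (one prefetch each, by assumption) covering a block."""
    return -(-block_bytes // line_bytes)  # ceiling division


block = 4096  # hypothetical block size in bytes, for illustration only
p4_style = prefetches_needed(block, 128)   # 128-byte L2 lines
pentium_m = prefetches_needed(block, 64)   # 64-byte L2 lines
print(p4_style, pentium_m)  # 32 64
```

Halving the line size doubles the number of lines in any block whose size is a multiple of both, which is why code tuned for 128-byte lines would leave half of each block unprefetched on a 64-byte-line part.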
#8 |
Aug 2002
Termonfeckin, IE
5320₈ Posts |
George,
A small question. Are you planning to move the default cutoff for LL and DC for 24.12? Doublechecking is slowly but surely falling behind, and a 1.2GHz Athlon just doesn't cut it on a 29.8M test any more. Thanks |
#9 |
May 2005
2²·11·37 Posts |
First setup: 3400+ (default speed, Newcastle CG core)
|
#10 |
May 2005
2²·11·37 Posts |
Second setup: 3500+ (OC 10x252, Venice E3 core)
|
#11 | |
Aug 2002
North San Diego Coun
2⁴×3×17 Posts |
Quote:
|
|