#155 |
"Sander"
Oct 2002
52.345322,5.52471
4A5₁₆ Posts |
I've tried the newest 64bit core2 version from Jeff's site and tested it against the above c85
Code:
GMP-ECM 6.2.2 [powered by GMP 4.2.1_MPIR_1.0.0] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=3473972786
Step 1 took 8673ms
Step 2 took 7332ms
Code:
GMP-ECM 6.2.2 [powered by GMP 4.2.4] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=1798745233
Step 1 took 7872ms
Step 2 took 4660ms
#156 | |
Jun 2003
Ottawa, Canada
7×167 Posts |
The Windows MSVC code uses a different set of assembler routines than the Linux code, so it doesn't surprise me that the timing is different.
If you choose the same sigma for both your Windows and Linux tests, and choose a larger B2 value so the test runs a little longer, do you still see the huge difference? Try running each test twice to make sure the numbers are similar, in case your system did something during the test that artificially slowed down the benchmark for one of them.
Jeff.
Last fiddled with by Jeff Gilchrist on 2009-04-07 at 14:59
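The comparison suggested here (same sigma on both systems, each run repeated) is easier to judge when the step timings are pulled out of the logs mechanically. Below is a minimal sketch, assuming the `Step N took Xms` output format seen in the logs above; the helper names `parse_step_times` and `spread` are hypothetical and not part of GMP-ECM.

```python
import re

# Matches lines such as "Step 1 took 8673ms" in GMP-ECM output.
STEP_RE = re.compile(r"Step (\d+) took (\d+)ms")

def parse_step_times(log: str) -> dict:
    """Collect every 'Step N took Xms' timing in a log, grouped by step number."""
    times = {}
    for step, ms in STEP_RE.findall(log):
        times.setdefault(int(step), []).append(int(ms))
    return times

def spread(values) -> float:
    """Relative spread (max - min) / min; a large value hints at a disturbed run."""
    return (max(values) - min(values)) / min(values)

# The two logs posted above (note: they used different sigma values).
mpir_log = "Step 1 took 8673ms\nStep 2 took 7332ms\n"
gmp_log = "Step 1 took 7872ms\nStep 2 took 4660ms\n"

mpir = parse_step_times(mpir_log)
gmp = parse_step_times(gmp_log)
for step in sorted(mpir):
    ratio = mpir[step][0] / gmp[step][0]
    print(f"step {step}: MPIR build / GMP build = {ratio:.2f}")
```

To pin sigma for a repeat run, GMP-ECM accepts a `-sigma` option, for example `echo <N> | ecm -sigma 3473972786 3000000`; check your build's usage text if in doubt. Feeding the concatenated output of two such runs to `parse_step_times` gives two values per step, and `spread` then shows how far apart they are.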
#157 |
"Sander"
Oct 2002
52.345322,5.52471
29×41 Posts |
I see that you used B1=300M; I used 3M.
I wasn't comparing directly with your run. I did two runs on my laptop (Core 2 Duo T7800 @ 2.6 GHz), one on the host (64-bit Vista) and one in a VM (64-bit Ubuntu 8.10).
#158 |
Sep 2005
Berlin
1028 Posts |
@smh:
Could you please post this binary and/or compare it with my 64-bit binary? I found binaries optimised for Athlon64 are even faster on Core2, in comparison to Core2-optimised ones. |
#159 | |
Jun 2003
Ottawa, Canada
7·167 Posts |
Ah, that would explain the difference.
As I said before, Brian Gladman had to translate the assembler from the syntax used by GCC to the one that YASM (used in the MSVC build) understands. I think he said that some of the code in the Linux source is still newer than what he has translated. Since I'm not familiar with the code, I'm not sure why there is such a big difference.
Jeff.
#160 | |
"Sander"
Oct 2002
52.345322,5.52471
29·41 Posts |
I did limited testing, but with larger composites yours might also be faster in step 1. Note that I used GMP-ECM 6.2.2 and GMP 4.2.4 (with the core2 patch), so it might be apples and oranges.
With B1=3M
Code:
GMP-ECM 6.2.1 [powered by GMP 4.2.3] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=959787799
Step 1 took 8008ms
Step 2 took 4496ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=1211299266
Step 1 took 7865ms
Step 2 took 4328ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=573230298
Step 1 took 7989ms
Step 2 took 4340ms
GMP-ECM 6.2.2 [powered by GMP 4.2.4] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=3000000, B2=5706890290, polynomial Dickson(6), sigma=937001321
Step 1 took 7808ms
Step 2 took 4500ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=1410435444
Step 1 took 7773ms
Step 2 took 4500ms
Using B1=3000000, B2=3000000-5706890290, polynomial Dickson(6), sigma=3426145601
Step 1 took 7921ms
Step 2 took 4500ms
Code:
GMP-ECM 6.2.1 [powered by GMP 4.2.3] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=1064336844
Step 1 took 29329ms
Step 2 took 14061ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=3355605506
Step 1 took 28858ms
Step 2 took 14157ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=191990272
Step 1 took 29342ms
Step 2 took 14181ms
GMP-ECM 6.2.2 [powered by GMP 4.2.4] [ECM]
Input number is 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561 (85 digits)
Using B1=11000000, B2=35133391030, polynomial Dickson(12), sigma=1387859769
Step 1 took 28389ms
Step 2 took 14777ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=4281716356
Step 1 took 27850ms
Step 2 took 14685ms
Using B1=11000000, B2=11000000-35133391030, polynomial Dickson(12), sigma=3779197836
Step 1 took 27638ms
Step 2 took 14681ms
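Since each build was run three times with different sigma values, averaging the runs gives a fairer picture than any single line. A small sketch for the B1=3M logs above; the timings are copied from the post, while the dictionary layout and the `averages` helper are only illustrative.

```python
from statistics import mean

# Step timings in ms from the B1=3M logs, keyed by the banner line of each build.
runs = {
    "GMP-ECM 6.2.1 / GMP 4.2.3": {
        "step1": [8008, 7865, 7989],
        "step2": [4496, 4328, 4340],
    },
    "GMP-ECM 6.2.2 / GMP 4.2.4": {
        "step1": [7808, 7773, 7921],
        "step2": [4500, 4500, 4500],
    },
}

def averages(timings):
    """Mean time per step over all runs of one build."""
    return {step: mean(values) for step, values in timings.items()}

summary = {build: averages(timings) for build, timings in runs.items()}
for build, avg in summary.items():
    print(f"{build}: step 1 {avg['step1']:.0f} ms, step 2 {avg['step2']:.0f} ms")
```

On these figures the two builds differ by only a few percent, and in opposite directions for the two steps, which fits the "apples and oranges" caveat.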
#161 |
Jun 2003
Ottawa, Canada
7×167 Posts |
I took ECM 6.2.2 and compiled it with MPIR 1.0 in cygwin to compare the Linux code with what the Windows MSVC code is doing. I saw a similar pattern to all of you. This is all 32-bit code run on an Intel Core2 Q9550 @ 3.4GHz. Each build was run twice; the two runs are shown side by side.
ECM Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
B1=20000000 Sigma: 980060817
MSVC 6.2.2 with new SSE2:
Step 1 took 82837ms | Step 1 took 82790ms
Step 2 took 41137ms | Step 2 took 41402ms
MSVC 6.2.2 without SSE2:
Step 1 took 82867ms | Step 1 took 83071ms
Step 2 took 42557ms | Step 2 took 43337ms
GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 78359ms | Step 1 took 78531ms
Step 2 took 34695ms | Step 2 took 34086ms
GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 78375ms | Step 1 took 78718ms
Step 2 took 24445ms | Step 2 took 24367ms
P-1 Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
B1=20000000 x0: 524328229
MSVC 6.2.2 with new SSE2:
Step 1 took 9469ms | Step 1 took 9563ms
Step 2 took 7098ms | Step 2 took 7051ms
MSVC 6.2.2 without SSE2:
Step 1 took 9360ms | Step 1 took 9235ms
Step 2 took 11731ms | Step 2 took 11404ms
GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 8751ms | Step 1 took 8487ms
Step 2 took 5788ms | Step 2 took 5740ms
GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 8455ms | Step 1 took 8658ms
Step 2 took 5788ms | Step 2 took 5710ms
P+1 Factoring: 1877138824359859508015524119652506869600959721781289179190693027302028679377371001561
B1=20000000 x0: 524328229
MSVC 6.2.2 with new SSE2:
Step 1 took 17082ms | Step 1 took 17145ms
Step 2 took 8596ms | Step 2 took 8408ms
MSVC 6.2.2 without SSE2:
Step 1 took 17675ms | Step 1 took 17566ms
Step 2 took 15585ms | Step 2 took 15553ms
GCC cygwin (--enable-sse2 -enable-asm-redc) builds as pentium3
Step 1 took 14570ms | Step 1 took 14617ms
Step 2 took 7566ms | Step 2 took 7816ms
GCC cygwin (--enable-sse2 -enable-asm-redc --build=pentium4-pc-cygwin)
Step 1 took 14929ms | Step 1 took 14602ms
Step 2 took 7706ms | Step 2 took 7862ms
You can see that the new MSVC build that uses SSE2 is much faster in Stage 2 than the old build (most visibly for P-1 and P+1), but the Linux code built with gcc (in cygwin on Windows or elsewhere) is faster in both Stage 1 and Stage 2. So if you want the fastest possible ECM/P-1/P+1, you could install cygwin/mingw or run Linux, natively or in a VM.
Jeff.
Last fiddled with by Jeff Gilchrist on 2009-04-10 at 14:49
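To put numbers on "faster", the ECM rows of the table above can be turned into speedup factors relative to the MSVC build without SSE2. This is only a sketch: the timings are copied from the post, and the short build labels and the `speedup` helper are made up for the example.

```python
from statistics import mean

# ECM timings in ms (two runs per build), copied from the table above.
ecm = {
    "MSVC with SSE2": {"step1": (82837, 82790), "step2": (41137, 41402)},
    "MSVC without SSE2": {"step1": (82867, 83071), "step2": (42557, 43337)},
    "GCC cygwin pentium3": {"step1": (78359, 78531), "step2": (34695, 34086)},
    "GCC cygwin pentium4": {"step1": (78375, 78718), "step2": (24445, 24367)},
}

def speedup(baseline, candidate, step):
    """How many times faster `candidate` is than `baseline` for one step."""
    return mean(ecm[baseline][step]) / mean(ecm[candidate][step])

for build in ecm:
    s1 = speedup("MSVC without SSE2", build, "step1")
    s2 = speedup("MSVC without SSE2", build, "step2")
    print(f"{build}: step 1 x{s1:.2f}, step 2 x{s2:.2f}")
```

The same dictionary shape can be filled with the P-1 and P+1 rows to repeat the comparison for those methods.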
#162 | |
"Nancy"
Aug 2002
Alexandria
100110100011₂ Posts |
Then, with build type pentium4, the mulredc asm code from pentium4/ should be used instead of the code from athlon/, so on an actual Pentium 4 at least, the stage 1 time should differ. On what CPU type did you run these tests?
Alex
#163 | |
Jun 2003
Ottawa, Canada
2221₈ Posts |
Both config.h files contain #define HAVE_SSE2 1
Both linked the mulredc files from pentium4/
Jeff.
Last fiddled with by Jeff Gilchrist on 2009-04-10 at 15:15
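A check like the one described, confirming that two builds ended up with the same feature macros, can be done by comparing the `#define` lines of the two config.h files. A minimal sketch; `defined_macros` is a hypothetical helper, and the sample text is invented apart from the `HAVE_SSE2` line quoted in the post.

```python
import re

# One "#define NAME [VALUE]" per line; commented-out "#undef" lines are ignored.
DEFINE_RE = re.compile(
    r"^[ \t]*#[ \t]*define[ \t]+(\w+)[ \t]*(.*?)[ \t]*$", re.MULTILINE
)

def defined_macros(config_text: str) -> dict:
    """Map each macro defined in a config.h-style text to its value string."""
    return {name: value for name, value in DEFINE_RE.findall(config_text)}

# Hypothetical excerpt; only the HAVE_SSE2 line comes from the post.
sample = """\
/* illustrative config.h fragment */
#define HAVE_SSE2 1
/* #undef SOME_OTHER_FEATURE */
"""

macros = defined_macros(sample)
print(macros.get("HAVE_SSE2"))
```

Running `defined_macros` on both files and comparing the resulting dictionaries would show any macro that differs between the pentium3 and pentium4 configurations.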
#164 |
"Mark"
Apr 2003
Between here and the
1100001100101₂ Posts |
Are you referring to GMP or GMP-ECM thinking it is a P3? My understanding (from the GMP folks) is that the Core 2 is built on a P3 architecture, not the P4 architecture, thus the P3 optimizations work better than the P4 optimizations. That doesn't explain the difference in your ECM run.
#165 | |
Jun 2003
Ottawa, Canada
7·167 Posts |
![]() Quote:
Jeff. |